Hei*_*nzi 9 language-agnostic algorithm escaping
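I am trying to write two functions, escape(text, delimiter) and unescape(text, delimiter), with the following properties:
1. The result of escape does not contain delimiter.
2. unescape is the inverse of escape, i.e.
unescape(escape(text, delimiter), delimiter) == text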
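for all values of text and delimiter.
It is fine to restrict the allowed values of delimiter.
Background: I want to create a delimiter-separated string of values. To be able to extract the same list from that string again, I must make sure that the individual strings do not contain the delimiter.
What I've tried: I came up with a simple solution (pseudocode):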
escape(text, delimiter): return text.Replace("\", "\\").Replace(delimiter, "\d")
unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\\", "\")
but found that property 2 fails on the test string "\d<delimiter>". Currently, I have the following working solution:
escape(text, delimiter): return text.Replace("\", "\b").Replace(delimiter, "\d")
unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\b", "\")
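This seems to work as long as delimiter is not \, b or d (which is fine, I don't want to use those as delimiters anyway). However, since I have not formally proven its correctness, I am worried that I have missed a case in which one of the properties is violated. Since this is such a common problem, I assume a "well-known, proven-correct" algorithm for it already exists, hence my question (see title).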
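Short of a formal proof, the two properties can be checked by brute force over every short string drawn from a small alphabet. The following is a minimal Python sketch of that idea; the names escape_v1, escape_v2 and counterexamples, the alphabet and the length limit are all assumptions made for illustration, and passing such a check is evidence, not a proof.

```python
from itertools import product

BS = "\\"  # a single backslash character

def escape_v1(text, delim):
    # First attempt from the question: "\" -> "\\", delimiter -> "\d"
    return text.replace(BS, BS + BS).replace(delim, BS + "d")

def unescape_v1(text, delim):
    # Two chained replacements, as in the question's pseudocode
    return text.replace(BS + "d", delim).replace(BS + BS, BS)

def escape_v2(text, delim):
    # Second attempt from the question: "\" -> "\b", delimiter -> "\d"
    return text.replace(BS, BS + "b").replace(delim, BS + "d")

def unescape_v2(text, delim):
    return text.replace(BS + "d", delim).replace(BS + "b", BS)

def counterexamples(escape, unescape, delim, alphabet, max_len):
    """Return every string up to max_len that violates property 1 or 2."""
    bad = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            text = "".join(chars)
            escaped = escape(text, delim)
            if delim in escaped or unescape(escaped, delim) != text:
                bad.append(text)
    return bad

# Alphabet chosen to include every character that plays a special role.
ALPHABET = [BS, "b", "d", ";", "a"]
v1_bad = counterexamples(escape_v1, unescape_v1, ";", ALPHABET, 4)
v2_bad = counterexamples(escape_v2, unescape_v2, ";", ALPHABET, 4)

print([repr(s) for s in v1_bad[:3]])  # a few failures of the first attempt
print(v2_bad)                         # failures of the second attempt
```

With this setup the first attempt already fails on the two-character string "\d" (no delimiter needed), while the second attempt produces no counterexample for the delimiter ";".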
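Your first algorithm is correct.
The bug is in the implementation of unescape(): you need to replace "\d" by delimiter and "\\" by "\" in the same pass. You cannot chain multiple calls to Replace() like this.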
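To illustrate what "the same pass" means, here is a minimal Python sketch that keeps the escape from the question's first attempt and only changes unescape to a single left-to-right scan. Using re.sub for the scan is one possible choice, not the only one, and the function names are assumptions for illustration. The delimiter is assumed to be a single character other than \ and d.

```python
import re

BS = "\\"  # a single backslash character

def escape(text, delim):
    # Same as the first attempt: "\" -> "\\", then delimiter -> "\d"
    return text.replace(BS, BS + BS).replace(delim, BS + "d")

def unescape(text, delim):
    # One scan: every match consumes a backslash together with the
    # character after it, so the second half of "\\" can never be
    # mistaken for the start of "\d".
    def expand(match):
        following = match.group(1)
        return delim if following == "d" else following
    return re.sub(r"\\(.)", expand, text, flags=re.DOTALL)

sample = BS + "d;"                 # the failing test string from the question
encoded = escape(sample, ";")
print(repr(encoded))
print(unescape(encoded, ";") == sample)
```

Because re.sub consumes matches from left to right without overlap, the escaped form of "\d;" is read as the tokens "\\", "d", "\d" and decodes back to the original. A lone trailing backslash in malformed input is left untouched by this sketch.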
Here is some sample C# code for safely quoting delimiter-separated strings:
static string QuoteSeparator(string str,
    char separator, char quoteChar, char otherChar) // "~" -> "~~"     ";" -> "~s"
{
    var sb = new StringBuilder(str.Length);
    foreach (char c in str)
    {
        if (c == quoteChar)
        {
            sb.Append(quoteChar);
            sb.Append(quoteChar);
        }
        else if (c == separator)
        {
            sb.Append(quoteChar);
            sb.Append(otherChar);
        }
        else
        {
            sb.Append(c);
        }
    }
    return sb.ToString(); // no separator in the result -> Join/Split is safe
}
static string UnquoteSeparator(string str,
    char separator, char quoteChar, char otherChar) // "~~" -> "~"     "~s" -> ";"
{
    var sb = new StringBuilder(str.Length);
    bool isQuoted = false;
    foreach (char c in str)
    {
        if (isQuoted)
        {
            if (c == otherChar)
                sb.Append(separator);
            else
                sb.Append(c);
            isQuoted = false;
        }
        else
        {
            if (c == quoteChar)
                isQuoted = true;
            else
                sb.Append(c);
        }
    }
    if (isQuoted)
        throw new ArgumentException("input string is not correctly quoted");
    return sb.ToString(); // ";" are restored
}
/// <summary>
/// Encodes the given strings as a single string.
/// </summary>
/// <param name="input">The strings.</param>
/// <param name="separator">The separator.</param>
/// <param name="quoteChar">The quote char.</param>
/// <param name="otherChar">The other char.</param>
/// <returns></returns>
public static string QuoteAndJoin(this IEnumerable<string> input,
    char separator = ';', char quoteChar = '~', char otherChar = 's')
{
    CommonHelper.CheckNullReference(input, "input");
    if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
        throw new ArgumentException("cannot quote: ambiguous format");
    return string.Join(new string(separator, 1),
        (from str in input select QuoteSeparator(str, separator, quoteChar, otherChar)).ToArray());
}
/// <summary>
/// Decodes the strings encoded in a single string.
/// </summary>
/// <param name="encoded">The encoded.</param>
/// <param name="separator">The separator.</param>
/// <param name="quoteChar">The quote char.</param>
/// <param name="otherChar">The other char.</param>
/// <returns></returns>
public static IEnumerable<string> SplitAndUnquote(this string encoded,
    char separator = ';', char quoteChar = '~', char otherChar = 's')
{
    CommonHelper.CheckNullReference(encoded, "encoded");
    if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
        throw new ArgumentException("cannot unquote: ambiguous format");
    return from s in encoded.Split(separator)
           select UnquoteSeparator(s, separator, quoteChar, otherChar);
}
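For readers who want to try the scheme without a C# project, the following is a rough Python port of the four functions above, written as a sketch for experimentation. The snake_case names are assumptions for illustration, and the null check done by CommonHelper in the C# version is omitted here.

```python
def quote(text, sep=";", quote_char="~", other="s"):
    # "~" -> "~~" and ";" -> "~s"; everything else is copied unchanged
    out = []
    for c in text:
        if c == quote_char:
            out.append(quote_char * 2)
        elif c == sep:
            out.append(quote_char + other)
        else:
            out.append(c)
    return "".join(out)

def unquote(text, sep=";", quote_char="~", other="s"):
    # Single pass with one bit of state: "was the previous char a quote?"
    out = []
    quoted = False
    for c in text:
        if quoted:
            out.append(sep if c == other else c)
            quoted = False
        elif c == quote_char:
            quoted = True
        else:
            out.append(c)
    if quoted:
        raise ValueError("input string is not correctly quoted")
    return "".join(out)

def check_format(sep, quote_char, other):
    # The three special characters must be pairwise distinct
    if len({sep, quote_char, other}) != 3:
        raise ValueError("ambiguous format")

def quote_and_join(items, sep=";", quote_char="~", other="s"):
    check_format(sep, quote_char, other)
    return sep.join(quote(s, sep, quote_char, other) for s in items)

def split_and_unquote(encoded, sep=";", quote_char="~", other="s"):
    check_format(sep, quote_char, other)
    return [unquote(s, sep, quote_char, other) for s in encoded.split(sep)]

items = ["a;b", "~s", "", "plain"]
encoded = quote_and_join(items)
print(encoded)
print(split_and_unquote(encoded) == items)
```

One edge case to keep in mind with any join/split scheme of this kind: an empty list and a list holding a single empty string both encode to the empty string, so decoding "" yields one empty item. If that distinction matters, it has to be handled outside these functions.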