What is the simplest algorithm to escape a single character?

Hei*_*nzi 9 language-agnostic algorithm escaping

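I'm trying to write two functions, escape(text, delimiter) and unescape(text, delimiter), with the following properties:

  1. The result of escape does not contain delimiter.

  2. unescape is the inverse of escape, i.e.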

    unescape(escape(text, delimiter), delimiter) == text
    

    for all values of text and delimiter.

It is acceptable to restrict the allowed values of delimiter.


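Background: I want to create a delimiter-separated string of values. To be able to extract the same list from that string again, I have to make sure that the individual strings do not contain the delimiter.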


What I've tried: I came up with a simple solution (pseudocode):

escape(text, delimiter):   return text.Replace("\", "\\").Replace(delimiter, "\d")
unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\\", "\")

but found that property 2 fails on the test string "\d<delimiter>". Currently, I have the following working solution:

escape(text, delimiter):   return text.Replace("\", "\b").Replace(delimiter, "\d")
unescape(text, delimiter): return text.Replace("\d", delimiter).Replace("\b", "\")

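This seems to work as long as delimiter is not one of \, b, or d (which is fine, I don't want to use those as delimiters anyway). However, since I have not formally proven its correctness, I'm afraid I've missed a case in which one of the properties is violated. Since this is such a common problem, I assume a "well-known, proven-correct" algorithm already exists, hence my question (see title).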
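Short of a formal proof, both pseudocode variants can be checked exhaustively on short strings. The following is a minimal Python sketch, not part of the original post: the names escape_v1, unescape_v1, escape_v2, unescape_v2 and counterexamples are hypothetical, and the delimiter ";" and the five-character alphabet are assumptions chosen to cover the interesting cases. A brute-force search like this can only find counterexamples; an empty result is evidence, not a proof.

```python
from itertools import product

# First attempt from the question: "\" -> "\\", delimiter -> "\d".
def escape_v1(text, delimiter):
    return text.replace("\\", "\\\\").replace(delimiter, "\\d")

def unescape_v1(text, delimiter):
    return text.replace("\\d", delimiter).replace("\\\\", "\\")

# Second attempt from the question: "\" -> "\b", delimiter -> "\d".
def escape_v2(text, delimiter):
    return text.replace("\\", "\\b").replace(delimiter, "\\d")

def unescape_v2(text, delimiter):
    return text.replace("\\d", delimiter).replace("\\b", "\\")

def counterexamples(escape, unescape, delimiter=";",
                    alphabet="\\bd;x", max_len=5):
    """Return every string up to max_len that violates property 1 or 2."""
    bad = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "".join(chars)
            escaped = escape(text, delimiter)
            # Property 1: no delimiter in the escaped text.
            # Property 2: unescape must restore the original text.
            if delimiter in escaped or unescape(escaped, delimiter) != text:
                bad.append(text)
    return bad

failures_v1 = counterexamples(escape_v1, unescape_v1)
failures_v2 = counterexamples(escape_v2, unescape_v2)
```

Under these assumptions the first variant fails already on the two-character string "\d", while the second variant produces no counterexample within the searched range.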

Eld*_*rum 5

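Your first algorithm is correct.

The error is in the implementation of unescape(): you need to replace both \d by delimiter and \\ by \ in the same pass. You cannot chain multiple calls to Replace() like this, because the first call may match a \d that is really the second half of an escaped backslash followed by a literal d.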
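To illustrate the single-pass idea in terms of the question's own scheme, here is a minimal Python sketch. It is not the code from this answer: the regex-based decoding is one possible way to do a single left-to-right pass, and it assumes that delimiter is a single character other than \ and d and that the input was produced by escape.

```python
import re

def escape(text, delimiter):
    # Order matters: double the backslashes first, then encode the delimiter.
    return text.replace("\\", "\\\\").replace(delimiter, "\\d")

def unescape(text, delimiter):
    # Each match consumes a backslash together with the character after it,
    # so the second backslash of "\\" can never be misread as the start of "\d".
    def decode(match):
        follower = match.group(1)
        return delimiter if follower == "d" else follower
    return re.sub(r"\\(.)", decode, text, flags=re.DOTALL)

original = "\\d;"                      # the failing test string: \d;
round_trip = unescape(escape(original, ";"), ";")
```

With this unescape, the test string from the question survives the round trip, because the scanner decides what each backslash means exactly once.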

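Here is some sample C# code for safely quoting delimiter-separated strings. Because QuoteAndJoin and SplitAndUnquote are extension methods, they have to be placed in a static class, and the code needs using directives for System, System.Collections.Generic, System.Linq and System.Text: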

    static string QuoteSeparator(string str,
        char separator, char quoteChar, char otherChar) // "~" -> "~~"     ";" -> "~s"
    {
        var sb = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            if (c == quoteChar)
            {
                sb.Append(quoteChar);
                sb.Append(quoteChar);
            }
            else if (c == separator)
            {
                sb.Append(quoteChar);
                sb.Append(otherChar);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString(); // no separator in the result -> Join/Split is safe
    }
    static string UnquoteSeparator(string str,
        char separator, char quoteChar, char otherChar) // "~~" -> "~"     "~s" -> ";"
    {
        var sb = new StringBuilder(str.Length);
        bool isQuoted = false;
        foreach (char c in str)
        {
            if (isQuoted)
            {
                if (c == otherChar)
                    sb.Append(separator);
                else
                    sb.Append(c);
                isQuoted = false;
            }
            else
            {
                if (c == quoteChar)
                    isQuoted = true;
                else
                    sb.Append(c);
            }
        }
        if (isQuoted)
            throw new ArgumentException("input string is not correctly quoted");
        return sb.ToString(); // ";" are restored
    }

    /// <summary>
    /// Encodes the given strings as a single string.
    /// </summary>
    /// <param name="input">The strings.</param>
    /// <param name="separator">The separator.</param>
    /// <param name="quoteChar">The quote char.</param>
    /// <param name="otherChar">The other char.</param>
    /// <returns></returns>
    public static string QuoteAndJoin(this IEnumerable<string> input,
        char separator = ';', char quoteChar = '~', char otherChar = 's')
    {
        if (input == null) // standard null check; replaces the external CommonHelper call
            throw new ArgumentNullException("input");
        if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
            throw new ArgumentException("cannot quote: ambiguous format");
        return string.Join(new string(separator, 1), (from str in input select QuoteSeparator(str, separator, quoteChar, otherChar)).ToArray());
    }

    /// <summary>
    /// Decodes the strings encoded in a single string.
    /// </summary>
    /// <param name="encoded">The encoded.</param>
    /// <param name="separator">The separator.</param>
    /// <param name="quoteChar">The quote char.</param>
    /// <param name="otherChar">The other char.</param>
    /// <returns></returns>
    public static IEnumerable<string> SplitAndUnquote(this string encoded,
        char separator = ';', char quoteChar = '~', char otherChar = 's')
    {
        if (encoded == null) // standard null check; replaces the external CommonHelper call
            throw new ArgumentNullException("encoded");
        if (separator == quoteChar || quoteChar == otherChar || separator == otherChar)
            throw new ArgumentException("cannot unquote: ambiguous format");
        return from s in encoded.Split(separator) select UnquoteSeparator(s, separator, quoteChar, otherChar);
    }
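For readers who want to try the join/split round trip quickly, the following Python sketch mirrors the scheme above ("~" -> "~~", ";" -> "~s"). It is an illustrative port, not the author's code: the function names are hypothetical, and the three characters are assumed to be distinct single characters, which the C# version checks explicitly.

```python
import re

def quote(text, separator=";", quote_char="~", other_char="s"):
    # Double the quote character first, then encode the separator.
    return (text.replace(quote_char, quote_char * 2)
                .replace(separator, quote_char + other_char))

def unquote(text, separator=";", quote_char="~", other_char="s"):
    # Single pass: a quote character always consumes the character after it.
    pattern = re.escape(quote_char) + "(.)"
    def decode(match):
        follower = match.group(1)
        return separator if follower == other_char else follower
    return re.sub(pattern, decode, text, flags=re.DOTALL)

def quote_and_join(items, separator=";"):
    return separator.join(quote(item, separator) for item in items)

def split_and_unquote(encoded, separator=";"):
    return [unquote(part, separator) for part in encoded.split(separator)]

items = ["a;b", "~s", "", "plain"]
encoded = quote_and_join(items)
decoded = split_and_unquote(encoded)
```

One edge case worth knowing about: in this sketch an empty list and a list containing a single empty string both encode to the empty string, so decoding "" yields one empty item. If that distinction matters, the list length has to be stored separately.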