How do I parse escape sequences?

t3c*_*b0t 0 c# string parsing escaping

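I am writing a parser for my own markup, and I need to handle some escape sequences, but I am not sure which strategy to choose.

Specifically, I have two of them.

Here is an example with both of them, \\\<, in context: foo \\\<bar baz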

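When I scan the string character by character,

  1. should I detect the backslash \ and then check whether the next character is an escapable one, or
  2. should I check for the character and then look back to see whether it is preceded by a backslash \?

Does either approach have any major (dis)advantages?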
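To make the two options concrete, here is a minimal sketch of both. It assumes the two escapes are \\ and \<, which is how the example string appears to read; the class and method names (EscapeScanDemo, LookAhead, IsEscapedAt) are hypothetical and only for illustration.

```csharp
using System;
using System.Text;

public static class EscapeScanDemo
{
    // Assumption: '\\' and '<' are the only characters that may follow a backslash.
    private static bool IsEscapable(char c) => c == '\\' || c == '<';

    // Option 1: on a backslash, look ahead and consume the next character with it.
    public static string LookAhead(string s)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == '\\' && i + 1 < s.Length && IsEscapable(s[i + 1]))
            {
                sb.Append(s[i + 1]); // emit the escaped character literally
                i++;                 // skip it so it is never examined again
            }
            else
            {
                sb.Append(s[i]);     // ordinary character (or a stray backslash)
            }
        }
        return sb.ToString();
    }

    // Option 2: for a given character, look behind. One step back is not enough:
    // the character is escaped only if an odd number of backslashes precedes it.
    public static bool IsEscapedAt(string s, int index)
    {
        int backslashes = 0;
        for (int i = index - 1; i >= 0 && s[i] == '\\'; i--)
        {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }
}
```

With the example input, LookAhead(@"foo \\\<bar baz") yields foo \<bar baz, because each backslash is consumed together with the character it escapes. Looking behind has to count the whole run of backslashes: in @"a\\<" the < is directly preceded by a backslash, yet it is not escaped, since that backslash is itself the second half of \\.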

15e*_*153 5

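You need to know where you are, and the way to do that is a state machine. If you only handle \r, \t, \n, \", and \\, you can get away with a very simple one. Something like this: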

using System;

public static class StringExtensions
{
    private enum UnescapeState
    {
        Unescaped,
        Escaped
    }

    public static String Unescape(this String s)
    {
        var sb = new System.Text.StringBuilder();
        UnescapeState state = UnescapeState.Unescaped;

        foreach (var ch in s)
        {
            switch (state)
            {
                case UnescapeState.Escaped:
                    switch (ch)
                    {
                        case 't':
                            sb.Append('\t');
                            break;
                        case 'n':
                            sb.Append('\n');
                            break;
                        case 'r':
                            sb.Append('\r');
                            break;

                        case '\\':
                        case '\"':
                            sb.Append(ch);
                            break;

                        default:
                            throw new Exception("Unrecognized escape sequence '\\" + ch + "'");

                        //  Finally, what about stuff like '\x0a'? That's a much more 
                        //  complicated state machine. When you see 'x' in Escaped state,
                        //  you transition to UnescapeState.HexDigit0, then either 
                        //  UnescapeState.HexDigit1 or throw an exception, etc. 
                        //  Wicked fun to write. 
                    }
                    state = UnescapeState.Unescaped;
                    break;

                case UnescapeState.Unescaped:
                    if (ch == '\\')
                    {
                        state = UnescapeState.Escaped;
                    }
                    else
                    {
                        sb.Append(ch);
                    }
                    break;
            }
        }

        if (state == UnescapeState.Escaped)
        {
            throw new Exception("Unterminated escape sequence");
        }

        return sb.ToString();
    }
}
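The comment in the default branch describes how \x0a could be handled with extra states. A rough sketch of that idea follows, assuming the escape is always exactly two hex digits (\xNN); that restriction, the class name HexUnescapeSketch, the HexValue helper, and the use of FormatException are choices made here for illustration and are not part of the code above.

```csharp
using System;
using System.Text;

public static class HexUnescapeSketch
{
    private enum State { Unescaped, Escaped, HexDigit0, HexDigit1 }

    // Returns 0..15 for a hex digit and throws for anything else.
    private static int HexValue(char ch)
    {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        throw new FormatException("Invalid hex digit '" + ch + "'");
    }

    public static string Unescape(string s)
    {
        var sb = new StringBuilder();
        var state = State.Unescaped;
        int value = 0; // high nibble collected in HexDigit0

        foreach (var ch in s)
        {
            switch (state)
            {
                case State.Unescaped:
                    if (ch == '\\') state = State.Escaped;
                    else sb.Append(ch);
                    break;

                case State.Escaped:
                    if (ch == 'x')
                    {
                        state = State.HexDigit0; // expect the first hex digit next
                        break;
                    }
                    switch (ch)
                    {
                        case 't': sb.Append('\t'); break;
                        case 'n': sb.Append('\n'); break;
                        case 'r': sb.Append('\r'); break;
                        case '\\':
                        case '"': sb.Append(ch); break;
                        default:
                            throw new FormatException("Unrecognized escape sequence '\\" + ch + "'");
                    }
                    state = State.Unescaped;
                    break;

                case State.HexDigit0:
                    value = HexValue(ch);
                    state = State.HexDigit1; // expect the second hex digit next
                    break;

                case State.HexDigit1:
                    sb.Append((char)(value * 16 + HexValue(ch)));
                    state = State.Unescaped;
                    break;
            }
        }

        // Ending in any state other than Unescaped means an escape was cut short.
        if (state != State.Unescaped)
            throw new FormatException("Unterminated escape sequence");

        return sb.ToString();
    }
}
```

Under these assumptions, HexUnescapeSketch.Unescape(@"a\x41\tb") returns a, A, a tab, and b, while an input that stops after \x4 throws because the machine ends in HexDigit1. Each new escape form only adds states and transitions; the main loop stays the same.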