C# regex gives an unexpected result

Kal*_*nGi 0 c# regex

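I have a regular expression that is supposed to match a standalone "S" in a string. I used the following version, which works by rejecting the string "Saint Charles":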

regex = new Regex(@"(^|\s)(?<stuff>S?)(\s|$)");
Match match = regex.Match("Saint Charles");

The match fails, as expected.

My question is why the second version below accepts the string:

regex = new Regex(@"(^|\b)(?<stuff>S?)(\b|$)");
Match match = regex.Match("Saint Charles");

The match succeeds, but I expected it to fail.
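A minimal sketch for reproducing both outcomes side by side. The class name BoundaryDemo and the helper Describe are hypothetical names used only for illustration; the patterns and the input are the ones from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Returns "index:length" of the first match, or "no match".
    public static string Describe(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? $"{m.Index}:{m.Length}" : "no match";
    }

    public static void Main()
    {
        const string input = "Saint Charles";

        // Whitespace-delimited version: rejected.
        Console.WriteLine(Describe(@"(^|\s)(?<stuff>S?)(\s|$)", input)); // no match

        // Word-boundary version: succeeds, but with an empty match at index 0.
        Console.WriteLine(Describe(@"(^|\b)(?<stuff>S?)(\b|$)", input)); // 0:0
    }
}
```

The zero length of the second match is the detail to look at: the regex "succeeds" without consuming the S at all.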

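Update: here is some background on what I am trying to achieve.

I have a list of location names that are misspelled or need different wording: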

string[] locationNames =
            {
                "Ponte Vedra Beac",
                "Newton Upper Fal",
                "Howey In The Hil",
                "Mc Donough",
                "East Mc Dowell",
                "Saint Charles",
                "Cape Saint Clair",
                "Marine On Saint",
                "W Mifflin Fin",
                "Mt Sylvan",
                "Bromley Mtn",
                "S Richmond Hill"
            }; 
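
From looking at the data, I determined that some replacements should happen at the end of the location name, some at the beginning, and others anywhere within it.

I am using a dictionary to determine 1) the correct replacement and 2) the type of regex required.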

var alternateSpellings = new Dictionary<string, string>()
                                {
                                    {"Beac$", "Beach"},
                                    {"Fal$", "Falls"},
                                    {"Hil$", "Hills"},
                                    {"Mc ", "Mc"},
                                    {"\bMt\b", "Mount"},
                                    {"\bMtn\b", "Mountain"},
                                    {"\bS\b", "South"},
                                    {"\bSaint\b", "St."}

                                };
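One detail about the dictionary keys as written above: in a regular (non-verbatim) C# string literal, "\b" is the backspace character (U+0008), not the regex word-boundary escape. A small sketch of the difference, where EscapeDemo is a hypothetical class name:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    public static void Main()
    {
        string regular = "\bS\b";    // backspace, 'S', backspace
        string verbatim = @"\bS\b";  // backslash, 'b', 'S', backslash, 'b'

        Console.WriteLine(regular.Length);   // 3
        Console.WriteLine(verbatim.Length);  // 5
        Console.WriteLine((int)regular[0]);  // 8 (U+0008, backspace)

        // Only the verbatim form reaches the regex engine as a word boundary.
        Console.WriteLine(Regex.IsMatch("S Richmond Hill", verbatim)); // True
        Console.WriteLine(Regex.IsMatch("S Richmond Hill", regular));  // False
    }
}
```

This is why the later Key.Replace("\b", "") call still finds the markers: both sides use the backspace character, so the keys act as tags rather than as regex syntax.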

I loop through the list and choose a regex based on the metacharacters embedded in the key. The options are:

regex = new Regex(".*(?<stuff>" + alternateSpelling.Key.Replace("$", "") + ")$");

or

regex = new Regex(@"(^|\s)(?<stuff>" + alternateSpelling.Key.Replace("\b", "") + @")(\s|$)");

Note: I abandoned \b in favor of \s.

or

regex = new Regex(".*(?<stuff>" + alternateSpelling.Key + ").*");

Once I find a match, I do the replacement:

if (match.Success)
                {
                    var stuff = match.Groups["stuff"].Value;
                    var stuffPosition = match.Groups["stuff"].Index;

                    newLocationName = locationName.Remove(stuffPosition, stuff.Length).Insert(stuffPosition, alternateSpelling.Value);

                }
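For comparison, a hedged sketch of the same clean-up done with Regex.Replace instead of the Remove/Insert approach above. This is an alternative technique, not the code from the question: the class LocationFixer, the Rules table, and the Fix method are hypothetical, and the patterns are written as verbatim strings so that \b reaches the regex engine as a word boundary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LocationFixer
{
    // Hypothetical rule table: each pattern carries its own anchors,
    // so one code path handles start, end, and anywhere replacements.
    private static readonly (string Pattern, string Replacement)[] Rules =
    {
        (@"Beac$", "Beach"),
        (@"Fal$", "Falls"),
        (@"Hil$", "Hills"),
        (@"Mc ", "Mc"),
        (@"\bMt\b", "Mount"),
        (@"\bMtn\b", "Mountain"),
        (@"\bS\b", "South"),
        (@"\bSaint\b", "St."),
    };

    // Applies every rule in order and returns the corrected name.
    public static string Fix(string name)
    {
        foreach (var (pattern, replacement) in Rules)
        {
            name = Regex.Replace(name, pattern, replacement);
        }
        return name;
    }

    public static void Main()
    {
        Console.WriteLine(Fix("Saint Charles"));    // St. Charles
        Console.WriteLine(Fix("S Richmond Hill"));  // South Richmond Hill
        Console.WriteLine(Fix("Bromley Mtn"));      // Bromley Mountain
    }
}
```

Because none of these patterns can match the empty string, the empty-match problem from the first part of the question does not arise here.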

nu1*_*73R 5

How (^|\b)(?<stuff>S?)(\b|$) matches Saint Charles

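^      =>    Matches at the start of the string

             Saint Charles
            ^

S?     =>    Optional. First tries to match the S

             Saint Charles
             ^

(\b|$) =>    Tried after the S, but neither \b nor $ can match there
             (S and a are both word characters). The engine backtracks,
             so S? now matches the empty string at the start

             Saint Charles
            ^

\b     =>    Matches at the start of the string, because the next
             character (S) is a word character

             Saint Charles
            ^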
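
Hence the match succeeds, with an empty match at index 0.

  • \b matches at a word boundary. That includes the start and the end of the string when the first or last character is a word character.

How to correct it

A simple modification helps: removing the ? makes the S mandatory, so the regex matches only a standalone S.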
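To see where the word boundaries fall in this input, the following sketch lists every position at which \b matches. WordBoundaries and Indexes are hypothetical names for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordBoundaries
{
    // Returns the index of every zero-width \b match in the input.
    public static int[] Indexes(string input)
    {
        return Regex.Matches(input, @"\b")
                    .Cast<Match>()
                    .Select(m => m.Index)
                    .ToArray();
    }

    public static void Main()
    {
        // Boundaries: before S, after t, before C, after the final s.
        Console.WriteLine(string.Join(", ", Indexes("Saint Charles"))); // 0, 5, 6, 13
    }
}
```

Index 0 is the boundary that lets the pattern with the optional S succeed on an empty match.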

(^|\b)(?<stuff>S)(\b|$)

Regex example
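A quick check of the corrected pattern against the location names from the question. This is only a sketch; the class name CorrectedPatternDemo and the method Matching are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CorrectedPatternDemo
{
    private static readonly Regex StandaloneS =
        new Regex(@"(^|\b)(?<stuff>S)(\b|$)");

    // Returns only the names that contain a standalone "S".
    public static string[] Matching(string[] names)
    {
        return names.Where(n => StandaloneS.IsMatch(n)).ToArray();
    }

    public static void Main()
    {
        string[] locationNames =
        {
            "Ponte Vedra Beac", "Newton Upper Fal", "Howey In The Hil",
            "Mc Donough", "East Mc Dowell", "Saint Charles",
            "Cape Saint Clair", "Marine On Saint", "W Mifflin Fin",
            "Mt Sylvan", "Bromley Mtn", "S Richmond Hill"
        };

        // With S mandatory, the empty match is no longer possible.
        foreach (string name in Matching(locationNames))
        {
            Console.WriteLine(name); // S Richmond Hill
        }
    }
}
```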