Asked by Sve*_*sen, score 6. Tags: .net, c#, regex, regex-lookarounds
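I'm struggling to get this regex pattern exactly right, and I'm open to options other than regex if anyone has a better alternative.
Situation: I'm basically trying to parse a T-SQL "in" clause against a text column in C#. So I need to take a string value like this: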
"'don''t', 'do', 'anything', 'stupid'"
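and interpret it as a list of values (I'll deal with the doubled single quotes later):
"don''t"
"do"
"anything"
"stupid"
I have a regex that works for most cases, but I'm struggling to generalize it so that my group accepts any character or a doubled single quote:
(?:')([a-z0-9\s(?:'(?='))]+)(?:')[,\w]*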
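I'm quite experienced with regex, but I have rarely, if ever, needed lookarounds (so adjust your estimate of my regex experience downward accordingly).
So, put another way: I want to take a string of comma-separated values, each enclosed in single quotes but possibly containing doubled single quotes, and output each such value.
EDIT: Here is an example that my current regex does not handle (the problem is that my group needs to accept every character, and stop when it reaches a single quote that is not followed by a second single quote):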
"'don''t', 'do?', 'anything!', '#stupid$'"
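The rule in the EDIT ("stop at a single quote not followed by a second one") can be written as a regex without any lookaround. Inside a character class, `(?:'(?='))` is not a group: it is just the literal characters `(`, `?`, `:`, `'`, `=` and `)`. Also, `[a-z0-9\s]` excludes uppercase letters and punctuation such as `?`, `!`, `#` and `$`, which is why the example above fails. A minimal sketch, assuming the pattern `'((?:[^']|'')*)'` (a quote, then any run of non-quote characters or doubled quotes, then a closing quote) and an unescaping step with `string.Replace`. The class name `InClauseRegex` is hypothetical and is not from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InClauseRegex
{
    // Hypothetical helper for illustration. Each item inside the group is either
    // one non-quote character (newlines included) or an escaped pair '' .
    private static readonly Regex TermPattern = new Regex(@"'((?:[^']|'')*)'");

    public static IEnumerable<string> Parse(string input)
    {
        foreach (Match m in TermPattern.Matches(input))
        {
            // Collapse each escaped '' back into a single '
            yield return m.Groups[1].Value.Replace("''", "'");
        }
    }

    public static void Main()
    {
        foreach (string term in Parse("'don''t', 'do?', 'anything!', '#stupid$'"))
        {
            Console.WriteLine(term);
        }
    }
}
```

`Main` should print `don't`, `do?`, `anything!` and `#stupid$`, one per line. Because the group consumes `''` as a unit, a lone quote can only match the closing `'`, so no lookahead is needed; empty terms (`''`) come back as empty strings.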
For maintainability, I decided against a regex and followed the suggestion to use a state machine. Here is the core of my implementation:
using System;
using System.Collections.Generic;
using System.Text;

// Class and method names are hypothetical; the original showed only the loop body.
public static class InClauseParser
{
    private enum State { BetweenTerms, InTerm }

    // encloser is a client-specified char to look for (e.g. ')
    // converterFunc is a client-specified Func<string,T> to return terms in the specified type (to allow for converting to int, for example)
    public static IEnumerable<T> Parse<T>(string valueToParse, char encloser, Func<string, T> converterFunc)
    {
        var currentTerm = new StringBuilder();
        State currentState = State.BetweenTerms;
        for (int index = 0; index < valueToParse.Length; index++)
        {
            char c = valueToParse[index];
            switch (currentState)
            {
                // between terms, only an encloser matters: it signals the start of a new term
                case State.BetweenTerms:
                    if (c == encloser)
                    {
                        currentState = State.InTerm;
                    }
                    break;
                case State.InTerm:
                    if (c == encloser)
                    {
                        if (index + 1 < valueToParse.Length && valueToParse[index + 1] == encloser)
                        {
                            // doubled encloser: add one copy and skip the second so it cannot close the term
                            currentTerm.Append(c);
                            index++;
                        }
                        else
                        {
                            // lone encloser, so this term is done (empty terms included)
                            yield return converterFunc(currentTerm.ToString());
                            currentTerm.Clear();
                            currentState = State.BetweenTerms;
                        }
                    }
                    else
                    {
                        currentTerm.Append(c);
                    }
                    break;
            }
        }
        // an unterminated final term is still returned
        if (currentTerm.Length > 0)
        {
            yield return converterFunc(currentTerm.ToString());
        }
    }
}
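A possible variation on the same state machine, sketched under the assumption that peeking ahead at the next character is unwanted (for example, if the input were later read from a stream): a third state records that an encloser was just seen, and the following character decides whether it was an escape or the end of the term. `ThreeStateParser`, `Parse` and `QuoteInTerm` are hypothetical names; the generic converter mirrors the client-specified `converterFunc`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class ThreeStateParser
{
    // Hypothetical states: QuoteInTerm means "just saw an encloser inside a term".
    private enum State { BetweenTerms, InTerm, QuoteInTerm }

    public static IEnumerable<T> Parse<T>(string input, char encloser, Func<string, T> convert)
    {
        var term = new StringBuilder();
        var state = State.BetweenTerms;
        foreach (char c in input)
        {
            switch (state)
            {
                case State.BetweenTerms:
                    // Separators are skipped; an encloser opens a term.
                    if (c == encloser) state = State.InTerm;
                    break;
                case State.InTerm:
                    // An encloser may close the term or start an escaped pair; the next char decides.
                    if (c == encloser) state = State.QuoteInTerm;
                    else term.Append(c);
                    break;
                case State.QuoteInTerm:
                    if (c == encloser)
                    {
                        // Doubled encloser: keep one literal copy and stay inside the term.
                        term.Append(c);
                        state = State.InTerm;
                    }
                    else
                    {
                        // The previous encloser closed the term; c is a separator.
                        yield return convert(term.ToString());
                        term.Clear();
                        state = State.BetweenTerms;
                    }
                    break;
            }
        }
        // A term closed by the very last character is still pending here.
        // An unterminated term (state InTerm) is dropped in this sketch.
        if (state == State.QuoteInTerm)
        {
            yield return convert(term.ToString());
        }
    }

    public static void Main()
    {
        foreach (string word in Parse("'don''t', 'do?', 'anything!', '#stupid$'", '\'', t => t))
        {
            Console.WriteLine(word);
        }
        // Converting terms to int, as converterFunc is meant to allow.
        Console.WriteLine(Parse<int>("'1', '22', '333'", '\'', t => int.Parse(t)).Sum());
    }
}
```

`Main` should print the four words followed by `356`. The trade-off is one extra state in exchange for never indexing past the current character; unlike the loop above, which returns leftover text, this sketch silently drops an unterminated final term.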