Regular expressions: backreferencing the character that was NOT matched in a character set

fla*_*hon 7 java regex logic backreference

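I want to construct a regular expression that matches either ' or ", then matches other characters, and ends when a ' or a " respectively is matched, depending on which one was encountered at the start. This looks simple enough to solve with a backreference at the end. Here is the regex code (it is in Java, so mind the extra escape characters, such as the \ before the "):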

private static String seekerTwo = "(['\"])([a-zA-Z])([a-zA-Z0-9():;/`\\=\\.\\,\\- ]+)(\\1)";

This code successfully handles inputs such as:

"hello my name is bob"
'i live in bethnal green'

The problem appears when I have a string like this:

"hello this seat 'may be taken' already"

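With the regular expression above, the attempt that starts at the opening " fails as soon as it reaches the ', because ' is not in the third group's character set. The engine then moves on and successfully matches 'may be taken'. That is obviously not good enough: I need the whole string to match.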
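The behaviour described above can be reproduced with a minimal sketch. The pattern is the one from the question; the class name `SeekerTwoDemo` and the helper `findAll` are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeekerTwoDemo {
    // The pattern from the question, unchanged.
    private static final Pattern SEEKER_TWO = Pattern.compile(
        "(['\"])([a-zA-Z])([a-zA-Z0-9():;/`\\=\\.\\,\\- ]+)(\\1)");

    // Hypothetical helper: collects every substring the pattern finds in the input.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = SEEKER_TWO.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // A plain double-quoted string is matched as a whole.
        System.out.println(findAll("\"hello my name is bob\""));
        // With nested single quotes, only the inner part is found.
        System.out.println(findAll("\"hello this seat 'may be taken' already\""));
    }
}
```

For the second input the only hit is the inner 'may be taken', since the outer attempt cannot get past the first ' and the trailing " has nothing after it to match.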

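What I think I need is a way to take the type of quotation mark that was NOT matched in the first group and include it among the characters allowed in the third group's character set. However, I know of no way to do this. Is there some sort of sneaky NOT-backreference feature I can use to refer to the character that was NOT matched in the first group? Or is there some other way out of my predicament?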

Tim*_*ker 12

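This can be done with a negative lookahead assertion. The following solution even takes into account that quotes inside the string may be escaped: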

(["'])(?:\\.|(?!\1).)*\1

Explanation:

(["'])    # Match and remember a quote.
(?:       # Either match...
 \\.      # an escaped character
|         # or
 (?!\1)   # (unless that character is identical to the quote character in \1)
 .        # any character
)*        # any number of times.
\1        # Match the corresponding quote.

This correctly matches "hello this seat 'may be taken' already" as well as "hello this seat \"may be taken\" already".
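What the `\\.` branch contributes can be checked with a small Java sketch. `FULL` is the pattern above written as a Java string literal. `NO_ESCAPE_BRANCH` is a hypothetical variant with that branch removed, included only for comparison. The class name and the helper `firstMatch` are likewise assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeBranchDemo {
    // The pattern from the answer, as a Java string literal.
    static final Pattern FULL =
        Pattern.compile("([\"'])(?:\\\\.|(?!\\1).)*\\1");

    // Hypothetical variant for comparison only: the same pattern minus the \\. branch.
    static final Pattern NO_ESCAPE_BRANCH =
        Pattern.compile("([\"'])(?:(?!\\1).)*\\1");

    // Hypothetical helper: returns the first match in the input, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String nested = "\"hello this seat 'may be taken' already\"";
        String escaped = "\"hello this seat \\\"may be taken\\\" already\"";
        System.out.println(firstMatch(FULL, nested));
        System.out.println(firstMatch(FULL, escaped));
        System.out.println(firstMatch(NO_ESCAPE_BRANCH, escaped));
    }
}
```

Without the escape branch, the match on the second input ends at the first escaped quote, because the lookahead alone cannot tell an escaped quote from the closing one.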

In Java, with all the backslashes, and with the comments kept inside the pattern via Pattern.COMMENTS:

Pattern regex = Pattern.compile(
    "([\"'])   # Match and remember a quote.\n" +
    "(?:       # Either match...\n" +
    " \\\\.    # an escaped character\n" +
    "|         # or\n" +
    " (?!\\1)  # (unless that character is identical to the matched quote char)\n" +
    " .        # any character\n" +
    ")*        # any number of times.\n" +
    "\\1       # Match the corresponding quote", 
    Pattern.COMMENTS);
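As a usage sketch, the compiled pattern can be applied with `Matcher.find()` to pull every quoted string out of a longer piece of text. The pattern is the one above. The class name `QuoteExtractor`, the method `extract`, and the sample input are hypothetical and serve only as an illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteExtractor {
    // The commented pattern from the answer; Pattern.COMMENTS makes the engine
    // ignore the whitespace and the # comments inside the pattern string.
    static final Pattern QUOTED = Pattern.compile(
        "([\"'])   # Match and remember a quote.\n" +
        "(?:       # Either match...\n" +
        " \\\\.    # an escaped character\n" +
        "|         # or\n" +
        " (?!\\1)  # (unless that character is identical to the matched quote char)\n" +
        " .        # any character\n" +
        ")*        # any number of times.\n" +
        "\\1       # Match the corresponding quote",
        Pattern.COMMENTS);

    // Hypothetical helper: returns every quoted string found in the text, quotes included.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Each quoted string may contain the other kind of quote.
        for (String s : extract("say \"it's fine\" and 'a \"b\" c'")) {
            System.out.println(s);
        }
    }
}
```

Note that `.` does not match line terminators unless `Pattern.DOTALL` is also set, so as written this sketch only finds quoted strings that sit on a single line.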