Goal:
I want to match sentences that contain the word 'no', but only when 'no' is not preceded by 'with', 'there is', or 'there are', in R.
Input:
The ground was rocky with no cracks in it
No diggedy, no doubt
Understandably, there is no way an elephant can be green
Expected output:
The ground was rocky with no cracks in it
Understandably, there is no way an elephant can be green
Attempt:
gsub(".*(?:((?<!with )|(?<!there is )|(?<!there are ))\\bno\\b(?![?:A-Za-z])|([?:]\\s*N?![A-Za-z])).*\\R*", "", input_string, perl=TRUE, ignore.case=TRUE)
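Problem:
The negative lookbehinds seem to be ignored, so all of the sentences are replaced. Is the problem the use of alternation in the lookbehind statements?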
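On the alternation question, here is a minimal sketch of why a group like (?:(?<!with )|(?<!there is )|(?<!there are )) never blocks a match: only one branch has to succeed, and the text before 'no' cannot end in all three phrases at once. The test vector x and the names alt and chained are illustrative, not part of the original code.

```r
x <- c("with no cracks", "there is no way", "no doubt")

# Alternation: ONE successful lookbehind is enough, and at any position
# at least one of the three succeeds, so nothing is ever excluded.
alt <- "(?i)(?:(?<!with )|(?<!there is )|(?<!there are ))\\bno\\b"
alt_hits <- grepl(alt, x, perl = TRUE)
print(alt_hits)      # TRUE TRUE TRUE

# Chaining: ALL lookbehinds must succeed at the same position.
chained <- "(?i)(?<!with )(?<!there is )(?<!there are )\\bno\\b"
chained_hits <- grepl(chained, x, perl = TRUE)
print(chained_hits)  # FALSE FALSE TRUE
```

Chained lookbehinds only cover exactly one space before 'no', because a PCRE lookbehind cannot contain an unbounded quantifier such as \\h+.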
You can use
(?mxi)^ # Start of a line (and free-spacing/case insensitive modes are on)
(?: # Outer container group start
(?!.*\b(?:with|there\h(?:is|are))\h+no\b) # no 'with/there is/are no' before 'no'
.*\bno\b # 'no' whole word after 0+ chars
(?![?:]) # cannot be followed with ? or :
| # or
.* # any 0+ chars
[?:]\h*n(?![a-z]) # ? or : followed with 0+ spaces, 'n' not followed with any letter
) # container group end
.* # the rest of the line and
\R* # 0+ line breaks
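See the regex demo. In short: the pattern matches either of two kinds of line. The first kind contains no as a whole word that is not preceded by with, there is, or there are, and is not immediately followed by ? or :. The second kind contains ? or : followed by 0+ horizontal whitespace characters (\h) and then an n that is not followed by another letter. Because the negative lookahead sits at the start of the line, a line is skipped by the first alternative whenever with no, there is no, or there are no occurs anywhere in it.
See the R demo: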
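The two alternatives can also be checked one line at a time. This is a rough sketch, assuming the input has already been split into a character vector; input_lines, rx_line, to_drop, and kept are illustrative names, and the fourth line is a made-up example that exercises the second alternative.

```r
input_lines <- c(
  "The ground was rocky with no cracks in it",
  "No diggedy, no doubt",
  "Understandably, there is no way an elephant can be green",
  "Cracks: N"  # hypothetical line, not from the original input
)

# Same logic as the pattern above, written on one line:
# no free-spacing mode, and no trailing .* and \\R* since grepl only tests for a match
rx_line <- "(?i)^(?:(?!.*\\b(?:with|there\\h(?:is|are))\\h+no\\b).*\\bno\\b(?![?:])|.*[?:]\\h*n(?![a-z]))"

# TRUE marks the lines that the pattern would remove
to_drop <- grepl(rx_line, input_lines, perl = TRUE)
print(to_drop)  # FALSE TRUE FALSE TRUE

kept <- input_lines[!to_drop]
print(kept)
```

Filtering a vector with grepl sidesteps the line-break handling that the gsub version needs \\R* for.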
sentences <- "The ground was rocky with no cracks in it\r\nNo diggedy, no doubt\r\nUnderstandably, there is no way an elephant can be green"
rx <- "(?mxi)^ # Start of a line
(?: # Outer container group start
(?!.*\\b(?:with|there\\h(?:is|are))\\h+no\\b) # no 'with/there is/are no' before 'no'
.*\\bno\\b # 'no' whole word after 0+ chars
(?![?:]) # cannot be followed with ? or :
| # or
.* # any 0+ chars
[?:]\\h*n(?![a-z]) # ? or : followed with 0+ spaces, 'n' not followed with any letter
) # container group end
.* # the rest of the line and 0+ line breaks
\\R*"
res <- gsub(rx, "", sentences, perl=TRUE)
cat(res, sep="\n")
Output:
The ground was rocky with no cracks in it
Understandably, there is no way an elephant can be green
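Thanks to the x modifier, you can add comments to the regex pattern and format it with whitespace for readability. Note that every literal whitespace character must be written as \\h (horizontal whitespace), \\s (any whitespace), \\n (LF), \\r (CR), etc. for the pattern to work in this mode.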
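A small sketch of that whitespace rule, using throwaway strings rather than the original input (the variable names are illustrative):

```r
# In (?x) mode the literal space in the pattern is ignored,
# so "with no" is compiled as "withno".
plain_space <- grepl("(?x)with no", "with no cracks", perl = TRUE)
print(plain_space)   # FALSE

squashed <- grepl("(?x)with no", "withno", perl = TRUE)
print(squashed)      # TRUE

# Writing the space as \\h keeps it significant; the spaces around it are ignored.
escaped_space <- grepl("(?x)with \\h no", "with no cracks", perl = TRUE)
print(escaped_space) # TRUE
```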
The (?i) modifier stands for ignore.case=TRUE.
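As a quick check of that equivalence, the sketch below compares the inline modifier with the function argument on one of the input sentences (variable names are illustrative):

```r
x <- "No diggedy, no doubt"

# Inline modifier inside the pattern
inline_flag <- grepl("(?i)^no\\b", x, perl = TRUE)

# Same pattern with the flag passed as an argument instead
arg_flag <- grepl("^no\\b", x, perl = TRUE, ignore.case = TRUE)

# Without either, the capital "No" at the start is not matched
case_sensitive <- grepl("^no\\b", x, perl = TRUE)

print(c(inline_flag, arg_flag, case_sensitive))  # TRUE TRUE FALSE
```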