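I'm trying to grep data out of a "weather conditions" column that holds multiple indicators for different weather types. I want to grep "+SN", "SN", and "-SN" separately, but I'm having trouble avoiding partial matches.
Here is an example of what the column I'm grepping may contain: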
c("-SN", " ", "SN FR", "HZ +SN", "SN", "+SN", " ", "+BC -SN")
Grepping "-SN" works fine, but grepping "+SN" is tricky because + is itself a regex operator. Using an escape character gives me the following error:
> grep( "\+SN" ,aa)
Error: '\+' is an unrecognized escape in character string starting ""\+"
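Also, grepping "SN" without also getting "+SN" or "-SN" is a challenge. As you can see, I can't use ^SN$ or ^SN to exclude the + or - sign, because a single entry may hold several indicators, and the one I'm looking for may come before or after another one. Is there an equivalent of grep != or -v in R? How would you go about this? Regex in R seems more limited in functionality.
Thanks.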
You need to use a regex based on negative lookarounds.
> x <- c("-SN", " ", "SN FR", "HZ +SN", "SN", "+SN", " ", "+BC -SN")
> regmatches(x, regexpr("(?<!\\S)[-+]?SN(?!\\S)", x, perl=TRUE))
[1] "-SN" "SN" "+SN" "SN" "+SN" "-SN"
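(?<!\\S) asserts that the match is not preceded by a non-whitespace character; likewise, (?!\\S) asserts that it is not followed by one.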
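The same lookarounds can be used to test for each indicator separately rather than extracting all three at once. The following is a minimal sketch, assuming the sample vector from the question; the variable names (`plus_sn`, `has_plus`, and so on) are made up for illustration.

```r
x <- c("-SN", " ", "SN FR", "HZ +SN", "SN", "+SN", " ", "+BC -SN")

# Whole-token patterns: each indicator must be bounded by whitespace
# or by the start/end of the string.
plus_sn  <- "(?<!\\S)\\+SN(?!\\S)"  # "\\+" in R source is the regex \+
minus_sn <- "(?<!\\S)-SN(?!\\S)"
plain_sn <- "(?<!\\S)SN(?!\\S)"

# grepl() returns one TRUE/FALSE per element, usable as a row filter.
has_plus  <- grepl(plus_sn,  x, perl = TRUE)
has_minus <- grepl(minus_sn, x, perl = TRUE)
has_plain <- grepl(plain_sn, x, perl = TRUE)

which(has_plus)   # 4 6
which(has_minus)  # 1 8
which(has_plain)  # 3 5
```

The logical vectors can be used directly to subset the column or the data frame it belongs to, e.g. `x[has_plain]`.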
Or
Use anchors to match the whole string exactly. This only matches entries that consist of a single indicator.
> x <- c("-SN", " ", "SN FR", "HZ +SN", "SN", "+SN", " ", "+BC -SN")
> regmatches(x, regexpr("^[-+]?SN$", x))
[1] "-SN" "SN" "+SN"
Or
> grep("^[-+]?SN$", x, value=TRUE)
[1] "-SN" "SN" "+SN"
Or
To get SN on its own, i.e., SN that is not preceded by + or -:
> x <- c("-SN", " ", "SN FR", "HZ +SN", "SN", "+SN", " ", "+BC -SN")
> regmatches(x, regexpr("(?<![+-])SN\\b", x, perl=TRUE))
[1] "SN" "SN"
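On the two side issues raised in the question: the escape error comes from R's string parser rather than from the regex engine, so the backslash itself has to be doubled, and `grep()` has an `invert` argument that behaves like `grep -v`. A short sketch, again assuming the sample vector above (the result names are illustrative only):

```r
x <- c("-SN", " ", "SN FR", "HZ +SN", "SN", "+SN", " ", "+BC -SN")

# Three ways to match a literal "+" followed by "SN".
by_escape <- grep("\\+SN", x)              # R string "\\+" is the regex \+
by_class  <- grep("[+]SN", x)              # inside [...] the + is literal
by_fixed  <- grep("+SN", x, fixed = TRUE)  # pattern treated as a plain string

by_escape  # 4 6
by_class   # 4 6
by_fixed   # 4 6

# invert = TRUE returns the elements that do NOT match, like grep -v.
no_sn <- grep("SN", x, invert = TRUE, value = TRUE)
no_sn      # " " " "
```

Note that these simple patterns still match "+SN" as a substring; combine them with the lookarounds shown earlier if other indicators could contain the same characters.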