I want to extract a pattern from my vector of strings:
string <- c(
"P10000101 - Przychody netto ze sprzedazy produktów" ,
"P10000102_PL - Przychody nettozy uslug",
"P1000010201_PL - Handlowych, marketingowych, szkoleniowych",
"P100001020101 - - Handlowych,, szkoleniowych - refaktury",
"- Handlowych, marketingowych,P100001020102, - pozostale"
)
As the result, I want to get exactly the part matched by the regex:
result <- c(
"P10000101",
"P10000102_PL",
"P1000010201_PL",
"P100001020101",
"P100001020102"
)
I tried the pattern "([PLA]\\d+)" with different combinations of value = T, fixed = T and perl = T:
grep(x = string, pattern = "([PLA]\\d+(_PL)?)", fixed = T)
We can try str_extract from stringr:
library(stringr)
str_extract(string, "P\\d+(_[A-Z]+)*")
#[1] "P10000101" "P10000102_PL" "P1000010201_PL" "P100001020101" "P100001020102"
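grep is for finding out whether a pattern is present in a particular string: it returns the indices of the matching elements (or, with value = TRUE, the whole elements), not the matched substrings. Also, fixed = TRUE makes grep treat the pattern as a literal string instead of a regular expression, so it cannot match here. To extract the match itself, use sub, gregexpr/regmatches or str_extract.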
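A small base R sketch of that difference on the question's data. The name pat and the sub pattern are illustrative choices, not part of the original answer; sub extracts by replacing the whole string with the first captured code, and perl = TRUE is used here so the lazy .*? reliably stops at the first code.

```r
# The input vector from the question
string <- c(
  "P10000101 - Przychody netto ze sprzedazy produktów",
  "P10000102_PL - Przychody nettozy uslug",
  "P1000010201_PL - Handlowych, marketingowych, szkoleniowych",
  "P100001020101 - - Handlowych,, szkoleniowych - refaktury",
  "- Handlowych, marketingowych,P100001020102, - pozostale"
)
pat <- "P\\d+(_[A-Z]+)*"   # illustrative name for the answer's regex

# fixed = TRUE searches for the literal text P\d+(_[A-Z]+)*, so nothing matches
idx_fixed <- grep(pat, string, fixed = TRUE)

# As a regex, grep only reports WHICH elements match ...
idx <- grep(pat, string)
# ... and value = TRUE returns the whole matching elements, not the codes
whole <- grep(pat, string, value = TRUE)

# sub: replace the entire string with the captured code.
# Note: a string without a match would be returned unchanged.
codes <- sub("^.*?(P\\d+(?:_[A-Z]+)*).*$", "\\1", string, perl = TRUE)
print(codes)
```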
Using base R (regexpr/regmatches):
regmatches(string, regexpr("P\\d+(_[A-Z]+)*", string))
#[1] "P10000101" "P10000102_PL" "P1000010201_PL" "P100001020101" "P100001020102"
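Basically, the pattern matches a P followed by one or more digits (\\d+), followed by zero or more (greedy *) repetitions of an underscore and one or more uppercase letters (_[A-Z]+).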
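To see each piece of the pattern at work, here is a sketch on a few made-up strings (they are hypothetical, not from the question), plus gregexpr for the case where one string holds more than one code:

```r
pat <- "P\\d+(_[A-Z]+)*"

# Hypothetical test strings, one per feature of the pattern
x <- c("P123",             # P followed by digits only
       "P123_PL",          # plus one _UPPERCASE group
       "P123_PL_EU rest",  # * allows several _UPPERCASE groups
       "P123_pl",          # lowercase suffix is not part of the match
       "no code here")     # no match at all

# regexpr finds the first match per element; regmatches drops non-matches
m <- regmatches(x, regexpr(pat, x))
print(m)

# gregexpr finds every match in a string
y <- "P1_PL and P2"
all_m <- regmatches(y, gregexpr(pat, y))[[1]]
print(all_m)
```

Note that regexpr/regmatches returns only the matches, so the result can be shorter than the input, whereas str_extract keeps the input length and returns NA for a string without a match.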