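I am trying to delete the text up to and including the first comma in strings that contain one or more commas. For some reason, every pattern I try deletes everything up to the last comma instead.
The strings look like: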
OCR - (some text), Variant - (some text), Bad Subtype - (some text)
My regex returns:
Bad Subtype - (some text)
when the desired output is:
Variant - (some text), Bad Subtype - (some text)
Variant is not guaranteed to be in second position.
# select all rows whose Tags value begins with OCR
clean <- subset(all, grepl("^OCR", all$Tags))
# trim the OCR text up to the first comma, and store it in a new column called Tag
clean$Tag <- gsub(".*,", "", clean$Tags)
or
clean$Tag <- gsub(".*\\,", "", clean$Tags)
or
clean$Tag <- sub(".*,", "", clean$Tags)
and so on.
Here is a regex that does the job.
x <- "OCR - (some text), Variant - (some text), Bad Subtype - (some text) and my regex is returning: Bad Subtype - (some text) when the desired output is: Variant - (some text), Bad Subtype - (some text)"
sub("^[^,]*,", "", x)
#[1] " Variant - (some text), Bad Subtype - (some text) and my regex is returning: Bad Subtype - (some text) when the desired output is: Variant - (some text), Bad Subtype - (some text)"
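Explanation
1. ^ matches the beginning of the string;
2. ^[^,]* matches any run of characters at the beginning other than ",", repeated zero or more times;
3. ^[^,]*, is the pattern from point 2 followed by a comma.
This pattern is replaced by the empty string "".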
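To make the difference concrete, here is a small sketch that puts the greedy pattern from the question next to the negated character class. The trailing `\\s*`, the lazy variant, and the `tags_df` data frame are my own additions for illustration and are not part of the answer above; `tags_df` is a hypothetical stand-in for the asker's `all`.

```r
# Sample string in the shape shown in the question
x <- "OCR - (some text), Variant - (some text), Bad Subtype - (some text)"

# Greedy ".*" gives back only as far as the LAST comma, so one item survives
greedy <- sub(".*,", "", x)
greedy
#[1] " Bad Subtype - (some text)"

# "[^,]*" cannot cross a comma, so the match ends at the FIRST comma;
# the added "\\s*" also removes the space that follows it
first_only <- sub("^[^,]*,\\s*", "", x)
first_only
#[1] "Variant - (some text), Bad Subtype - (some text)"

# A lazy quantifier gives the same result here
lazy <- sub("^.*?,\\s*", "", x, perl = TRUE)
identical(lazy, first_only)
#[1] TRUE

# Hypothetical data frame standing in for the asker's `all`
tags_df <- data.frame(
  Tags = c(x, "Other - (d), Variant - (e)", "OCR - (f)"),
  stringsAsFactors = FALSE
)
clean <- subset(tags_df, grepl("^OCR", Tags))
clean$Tag <- sub("^[^,]*,\\s*", "", clean$Tags)  # vectorised over the column
```

Note that switching from gsub to sub alone does not help, because a single greedy match already runs to the last comma. A value with no comma at all, such as "OCR - (f)", is left unchanged by this pattern.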
Another option is trimws from base R:
trimws(x, whitespace = "^[^,]+,\\s*")
-output
#[1] "Variant - (some text), Bad Subtype - (some text) and my regex is returning: Bad Subtype - (some text) when the desired output is: Variant - (some text), Bad Subtype - (some text)"
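A possible refinement, offered as a sketch rather than as part of the answer: the `whitespace` argument of trimws needs R 3.6.0 or later, and with the default `which = "both"` the pattern is, as far as I can tell, also tried against the end of the string. Passing `which = "left"` restricts it to the start, which is all the question asks for. The shorter sample string below is an assumption for illustration.

```r
# Sample string in the shape shown in the question
x <- "OCR - (some text), Variant - (some text), Bad Subtype - (some text)"

# which = "left" strips only a leading match of the custom pattern
trimmed <- trimws(x, which = "left", whitespace = "^[^,]+,\\s*")
trimmed
#[1] "Variant - (some text), Bad Subtype - (some text)"
```

Because the pattern uses `[^,]+` rather than `[^,]*`, a string that starts with a comma is left untouched.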