Remove letters from a string while protecting specific words

Mar*_*ler 5 regex string r

I want to remove the letters from a string but protect specific words. Here is an example:

my.string <- "Water the 12 gold marigolds please, but not the 45 trees!"

desired.result <- "12 marigolds, 45 trees"

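I tried the code below, and the result surprised me. I thought () would protect everything it encloses. Instead, it did exactly the opposite: only the words inside () were removed (plus the character after each one, such as the !).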

gsub("(marigolds|trees)\\D", "", my.string)

# [1] "Water the 12 gold please, but not the 45 "
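Part of the surprise is that parentheses in a regex create a capturing group: they group and remember text, but they do not exclude it from the match, so gsub() still deletes everything the whole pattern matches. A quick way to see what gets deleted is to list the matches with base R's regmatches() and gregexpr() (a diagnostic sketch, not part of the original question):

```r
my.string <- "Water the 12 gold marigolds please, but not the 45 trees!"

# List every substring the pattern matches; gsub() replaces exactly these with "".
# \\D also consumes the non-digit right after each word (a space, then the "!").
matched <- regmatches(my.string, gregexpr("(marigolds|trees)\\D", my.string))[[1]]
print(matched)
# [1] "marigolds " "trees!"
```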

Here is an example with a longer string:

my.string <- "Water the 12 gold marigolds please, but not the 45 trees!, The 7 orange marigolds are fine."

desired.result <- "12 marigolds, 45 trees, 7 marigolds"

gsub("(marigolds|trees)\\D", "", my.string)

This returns:

[1] "Water the 12 gold please, but not the 45 , The 7 orange are fine."

Thanks for any suggestions. I would prefer a regex solution in base R.

fal*_*tru 7

Use word boundaries and a negative lookahead assertion:

> my.string <- "Water the 12 gold marigolds please, but not the 45 trees!"
> gsub("\\b(?!marigolds\\b|trees\\b)[A-Za-z]+\\s*", "", my.string, perl=TRUE)
[1] "12 marigolds , 45 trees!"
> gsub("\\b(?!marigolds\\b|trees\\b)[A-Za-z]+\\s*|!", "", my.string, perl=TRUE)
[1] "12 marigolds , 45 trees"
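The pattern deletes each run of letters at a word boundary unless a protected word starts there; the \\b inside the lookahead keeps longer words that merely begin with "trees" or "marigolds" from being protected. If the list of protected words changes, the lookahead can be assembled from a character vector. The helper keep_words() below is hypothetical and only wraps the answer's pattern; it assumes the words contain no regex metacharacters.

```r
# Hypothetical helper wrapping the answer's pattern.
# Assumes `words` contains no regex metacharacters (they are not escaped).
keep_words <- function(x, words) {
  # e.g. c("marigolds", "trees") -> "marigolds\\b|trees\\b"
  lookahead <- paste0(words, "\\b", collapse = "|")
  # Delete letter runs (plus trailing whitespace) unless a protected word starts there,
  # and delete any "!".
  pattern <- paste0("\\b(?!", lookahead, ")[A-Za-z]+\\s*|!")
  gsub(pattern, "", x, perl = TRUE)
}

my.string <- "Water the 12 gold marigolds please, but not the 45 trees!"
keep_words(my.string, c("marigolds", "trees"))
# [1] "12 marigolds , 45 trees"
```

As in the answer, the result keeps a space before the comma, because the space after "marigolds" is never part of any match.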


Cas*_*yte 2

Another approach, using a capturing group:

my.string <- "Water the 12 gold marigolds please, but not the 45 trees!, The 7 orange marigolds are fine."
gsub("(?i)\\b(?:(marigolds|trees)|[a-z]+)\\b\\s*|[.?!]", "\\1", my.string, perl=TRUE)
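Here the replacement "\\1" does the protecting: when the alternation matches a protected word, group 1 holds it and gsub() writes it back without the whitespace consumed by \\s*; when [a-z]+ or [.?!] matches instead, group 1 is unset and is substituted as an empty string. Because gsub() is vectorised, the same call can be checked against both example strings from the question (a quick check, not part of the original answer):

```r
strings <- c(
  "Water the 12 gold marigolds please, but not the 45 trees!",
  "Water the 12 gold marigolds please, but not the 45 trees!, The 7 orange marigolds are fine."
)

# Group 1 captures a protected word; any other letter run or [.?!] leaves it empty.
pattern <- "(?i)\\b(?:(marigolds|trees)|[a-z]+)\\b\\s*|[.?!]"
result <- gsub(pattern, "\\1", strings, perl = TRUE)

result[1]  # "12 marigolds, 45 trees"
result[2]  # "12 marigolds, 45 trees, 7 marigolds"
```

Both results match the desired.result strings from the question, including the comma spacing, since each protected word's trailing whitespace is dropped.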