Remove all text between two brackets

Mic*_*son 14 regex r stringr

Suppose I have text like this:

text<-c("[McCain]: We need tax policies that respect the wage earners and job creators. [Obama]: It's harder to save. It's harder to retire. [McCain]: The biggest problem with American healthcare system is that it costs too much. [Obama]: We will have a healthcare system, not a disease-care system. We have the chance to solve problems that we've been talking about... [Text on screen]: Senators McCain and Obama are talking about your healthcare and financial security. We need more than talk. [Obama]: ...year after year after year after year. [Announcer]: Call and make sure their talk turns into real solutions. AARP is responsible for the content of this advertising.")

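I would like to remove (edit: get rid of) all of the text between [ and ], along with the brackets themselves. What's the best way to do this? Here is my feeble attempt using a regex and the stringr package: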

str_extract(text, "\\[[a-z]*\\]")

Thanks for your help!

zx8*_*x81 21

Use this:

gsub("\\[[^\\]]*\\]", "", text, perl = TRUE)

What the regex means:

  \[                       # a literal '['
  [^\]]*                   # any character except ']', 0 or more times
                           # (greedy: matches as many as possible)
  \]                       # a literal ']'
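As a quick check of the breakdown above, here is a minimal sketch that applies the pattern to a shortened input. The variable names sample_text and no_tags are made up for illustration and are not part of the original answer.

```r
# Short sample in the same "[Speaker]: text" shape as the question's input
sample_text <- "[McCain]: We need tax policies. [Obama]: It's harder to save."

# '[', then any run of characters that are not ']', then ']'
# perl = TRUE so that \] inside the character class is read as an escaped ']'
no_tags <- gsub("\\[[^\\]]*\\]", "", sample_text, perl = TRUE)

print(no_tags)
# ": We need tax policies. : It's harder to save."
```

Only the bracketed tags are removed; the colon and the space that follow each tag stay in the result.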

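  • @MichaelDavidson You're very welcome. A negated character class like this one is usually faster than the lazy dot-star .*?, because with the lazy version the engine backtracks at every step. It's no big deal in this case, though; both solutions are fine. :) (2 upvotes)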
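One way to look at the comparison made in this comment is to run both patterns on the same input, confirm they agree, and time them. This is only a rough sketch: sample_text, long_text and the repeat count of 1000 are arbitrary choices for illustration, and the timings will vary by machine and R version.

```r
sample_text <- "[Obama]: It's harder to save. [Announcer]: Call now."

# Same task, two patterns: negated character class vs. lazy dot-star
negated <- gsub("\\[[^\\]]*\\]", "", sample_text, perl = TRUE)
lazy    <- gsub("\\[.*?\\]", "", sample_text, perl = TRUE)

# Both patterns should remove exactly the same text
print(identical(negated, lazy))

# Rough timing on a longer input built by repeating the sample
long_text <- paste(rep(sample_text, 1000), collapse = " ")
print(system.time(gsub("\\[[^\\]]*\\]", "", long_text, perl = TRUE)))
print(system.time(gsub("\\[.*?\\]", "", long_text, perl = TRUE)))
```

On inputs this small the difference is unlikely to matter, which matches the comment's conclusion that either solution is fine here.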

jba*_*ums 9

The following should do the trick. The ? forces a lazy match, so .* matches as few characters as possible before the following ].

gsub('\\[.*?\\]', '', text)
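To see why the ? matters, the sketch below runs the greedy and lazy versions of this pattern on a short string with two bracketed tags. The names sample_text, greedy and lazy are illustrative only.

```r
sample_text <- "[A]: one [B]: two"

# Greedy: .* runs on to the LAST ']' in the string, so everything from
# the first '[' to the last ']' is removed as a single match
greedy <- gsub("\\[.*\\]", "", sample_text)

# Lazy: .*? stops at the FIRST ']' after each '[', so each tag is
# matched and removed on its own
lazy <- gsub("\\[.*?\\]", "", sample_text)

print(greedy)  # ": two"
print(lazy)    # ": one : two"
```

Without the ?, the text between the two tags would be lost along with the tags themselves.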