My goal is to remove duplicate words only inside the parentheses in a set of strings.
a = c( 'I (have|has|have) certain (words|word|worded|word) certain',
'(You|You|Youre) (can|cans|can) do this (works|works|worked)',
'I (am|are|am) (sure|sure|surely) you know (what|when|what) (you|her|you) should (do|do)' )
What I want is this:
a
[1] "I (have|has) certain (words|word|worded) certain"
[2] "(You|Youre) (can|cans) do this (works|worked)"
[3] "I (am|are) (sure|surely) you know (what|when) (you|her) should (do)"
To get this result, I used the following code:
a = gsub('\\|', " | ", a)
a = gsub('\\(', "( ", a)
a = gsub('\\)', " )", a)
a = vapply(strsplit(a, " "), function(x) paste(unique(x), collapse = " "), character(1L))
However, it produced the wrong output:
a
[1] "I ( have | has ) certain words word worded"
[2] "( You | Youre ) can cans do this works worked"
[3] "I ( am | are ) sure surely you know what when her should do"
Why does my code drop the parentheses in the later part of each string? How can I get the result I want?
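Your code goes wrong because unique() is applied to all the tokens of a whole string, not to each parenthesized group separately. After the first group, every later "(", "|" and ")" duplicates a token already seen, so it is dropped; words are also deduplicated against words outside the parentheses (for example, the "you" inside "(you|her|you)" disappears because "you" already appeared earlier in the string).

We can use gsubfn instead. The idea is to match an opening parenthesis (\\(; it must be escaped because it is a regex metacharacter) followed by one or more characters that are not a closing parenthesis ([^)]+), capturing those characters as a group. In the replacement, we split the captured text (x) on "|" with strsplit, unlist the list output, keep the unique elements, and paste them back together with "|". The closing parenthesis is not part of the match, so it stays in place.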
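library(gsubfn)
# "\\(([^)]+)" matches "(" plus everything up to the next ")"; the text after
# "(" is captured and passed to the formula as x. The ")" is not consumed.
gsubfn("\\(([^)]+)",
       ~ paste0("(", paste(unique(unlist(strsplit(x, "[|]"))), collapse = "|")),
       a)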
#[1] "I (have|has) certain (words|word|worded) certain"
#[2] "(You|Youre) (can|cans) do this (works|worked)"
#[3] "I (am|are) (sure|surely) you know (what|when) (you|her) should (do)"
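If installing gsubfn is not an option, the same per-group deduplication can be sketched in base R with gregexpr() and the regmatches<- replacement function. This is an alternative technique, not the one used in the answer above. It assumes groups are not nested and that alternatives are separated by "|" with no surrounding spaces; dedup_group is a hypothetical helper name introduced here for illustration.

```r
a <- c('I (have|has|have) certain (words|word|worded|word) certain',
       '(You|You|Youre) (can|cans|can) do this (works|works|worked)',
       'I (am|are|am) (sure|sure|surely) you know (what|when|what) (you|her|you) should (do|do)')

# Hypothetical helper: deduplicate the "|"-separated alternatives of one
# complete "(...)" group, assuming no nested parentheses.
dedup_group <- function(g) {
  inner <- substr(g, 2, nchar(g) - 1)                     # strip "(" and ")"
  parts <- unique(strsplit(inner, "|", fixed = TRUE)[[1]])
  paste0("(", paste(parts, collapse = "|"), ")")
}

# Locate every "(...)" group, then write the cleaned groups back in place;
# text outside the parentheses is never touched.
m <- gregexpr("\\([^)]*\\)", a)
regmatches(a, m) <- lapply(regmatches(a, m), function(gs)
  vapply(gs, dedup_group, character(1), USE.NAMES = FALSE))
a
```

Because only the matched groups are rewritten, words outside the parentheses (such as "you" in "you know") no longer interfere with the deduplication inside them.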