Regex to replace a specific character only outside parentheses

Ani*_*yal 7 regex r string-substitution gsub

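I am looking for a regex (preferably in R) that replaces every occurrence of a specific character, say ;, with, say, ;;, but only when that character is not inside parentheses () within the text string.

Note: 1. The character to be replaced may also occur more than once inside the parentheses.

2. There are no nested parentheses in the data/vector.

Examples

  • text;othertext should be replaced with text;;othertext
  • text;other(texttt;some;someother);more should be replaced with text;;other(texttt;some;someother);;more (i.e. ; is replaced with the replacement text only outside the ()).

I will try to explain further if clarification is needed.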

in_vec <- c("abcd;ghi;dfsF(adffg;adfsasdf);dfg;(asd;fdsg);ag", "zvc;dfasdf;asdga;asd(asd;hsfd)", "adsg;(asdg;ASF;DFG;ASDF;);sdafdf", "asagf;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;sdfa")

in_vec
#> [1] "abcd;ghi;dfsF(adffg;adfsasdf);dfg;(asd;fdsg);ag"
#> [2] "zvc;dfasdf;asdga;asd(asd;hsfd)"             
#> [3] "adsg;(asdg;ASF;DFG;ASDF;);sdafdf"           
#> [4] "asagf;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;sdfa"

Expected output (worked out by hand)

[1] "abcd;;ghi;;dfsF(adffg;adfsasdf);;dfg;;(asd;fdsg);;ag" 
[2] "zvc;;dfasdf;;asdga;;asd(asd;hsfd)"             
[3] "adsg;;(asdg;ASF;DFG;ASDF;);;sdafdf"            
[4] "asagf;;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;;sdfa"

GKi*_*GKi 10

You can use gsub with the pattern ;(?![^(]*\\)) and perl=TRUE:

gsub(";(?![^(]*\\))", ";;", in_vec, perl=TRUE)
#[1] "abcd;;ghi;;dfsF(adffg;adfsasdf);;dfg;;(asd;fdsg);;ag"
#[2] "zvc;;dfasdf;;asdga;;asd(asd;hsfd)"                   
#[3] "adsg;;(asdg;ASF;DFG;ASDF;);;sdafdf"                  
#[4] "asagf;;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;;sdfa"       

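; finds ;, (?!) .. negative lookahead (the replacement is made when it does not match), [^(] .. any character except (, * .. repeat the previous item 0 to n times, \\) .. followed by a literal ).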
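As a minimal sketch of what the negative lookahead selects, the second example string from the question can be probed with gregexpr to list the positions of the semicolons the pattern matches. The variable names x, pattern, hits and result are only illustrative.

```r
# Example string taken from the question
x <- "text;other(texttt;some;someother);more"

# ';' that is NOT followed by "any run of non-'(' characters, then ')'"
pattern <- ";(?![^(]*\\))"

# Positions of the semicolons the pattern matches (those outside parentheses)
hits <- as.integer(gregexpr(pattern, x, perl = TRUE)[[1]])
hits
#> [1]  5 34

# Only those two semicolons are doubled
result <- gsub(pattern, ";;", x, perl = TRUE)
result
#> [1] "text;;other(texttt;some;someother);;more"
```

The two semicolons inside the parentheses (positions 18 and 23) are left alone because a ) can be reached from them without passing a (.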

Or

gsub(";(?=[^)]*($|\\())", ";;", in_vec, perl=TRUE)
#[1] "abcd;;ghi;;dfsF(adffg;adfsasdf);;dfg;;(asd;fdsg);;ag"
#[2] "zvc;;dfasdf;;asdga;;asd(asd;hsfd)"                   
#[3] "adsg;;(asdg;ASF;DFG;ASDF;);;sdafdf"                  
#[4] "asagf;;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;;sdfa"       

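; finds ;, (?=) .. positive lookahead (the replacement is made when it does match), [^)] .. any character except ), * .. repeat the previous item 0 to n times, ($|\\() .. match the end of the string $ or a literal (.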
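The two lookahead patterns can be checked against each other and against the expected output from the question. This is only a sanity-check sketch. The names neg_pattern, pos_pattern, expected, out_neg and out_pos are illustrative, and the data are copied from the question.

```r
# Input and hand-computed expected output, copied from the question
in_vec <- c("abcd;ghi;dfsF(adffg;adfsasdf);dfg;(asd;fdsg);ag",
            "zvc;dfasdf;asdga;asd(asd;hsfd)",
            "adsg;(asdg;ASF;DFG;ASDF;);sdafdf",
            "asagf;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;sdfa")
expected <- c("abcd;;ghi;;dfsF(adffg;adfsasdf);;dfg;;(asd;fdsg);;ag",
              "zvc;;dfasdf;;asdga;;asd(asd;hsfd)",
              "adsg;;(asdg;ASF;DFG;ASDF;);;sdafdf",
              "asagf;;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;;sdfa")

neg_pattern <- ";(?![^(]*\\))"       # no ')' reachable before the next '('
pos_pattern <- ";(?=[^)]*($|\\())"   # end of string or '(' reachable before the next ')'

out_neg <- gsub(neg_pattern, ";;", in_vec, perl = TRUE)
out_pos <- gsub(pos_pattern, ";;", in_vec, perl = TRUE)

# Both patterns agree with each other and with the expected output
identical(out_neg, out_pos)
identical(out_neg, expected)
```

Both lookarounds need perl = TRUE, because R's default regex engine does not support lookahead assertions.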

Or use gregexpr and regmatches to extract the parts between ( and ), and make the replacement only in the non-matched substrings:

x <- gregexpr("\\(.*?\\)", in_vec)  #Find the part between ( and )
mapply(function(a, b) {
  paste(matrix(c(gsub(";", ";;", b), a, ""), 2, byrow=TRUE), collapse = "")
}, regmatches(in_vec, x), regmatches(in_vec, x, TRUE))
#[1] "abcd;;ghi;;dfsF(adffg;adfsasdf);;dfg;;(asd;fdsg);;ag"
#[2] "zvc;;dfasdf;;asdga;;asd(asd;hsfd)"                   
#[3] "adsg;;(asdg;ASF;DFG;ASDF;);;sdafdf"                  
#[4] "asagf;;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;;sdfa"       
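The matrix trick in the mapply call is compact but dense, so here is a step-by-step sketch of the same idea on a single string from the question. The variable names (m, inside, outside, pieces, result) are illustrative, and the interleaving is written out with index vectors instead of a matrix.

```r
x <- "text;other(texttt;some;someother);more"

# Locate every "( ... )" group (non-greedy, so each group ends at its first ')')
m <- gregexpr("\\(.*?\\)", x)

# The parenthesised parts, and the parts between them
inside  <- regmatches(x, m)[[1]]                 # "(texttt;some;someother)"
outside <- regmatches(x, m, invert = TRUE)[[1]]  # "text;other"  ";more"

# Replace only in the parts outside the parentheses
outside <- gsub(";", ";;", outside, fixed = TRUE)

# Interleave: outside[1], inside[1], outside[2], inside[2], ..., outside[n + 1]
pieces <- character(length(outside) + length(inside))
pieces[seq(1, by = 2, length.out = length(outside))] <- outside
pieces[seq(2, by = 2, length.out = length(inside))]  <- inside

result <- paste(pieces, collapse = "")
result
#> [1] "text;;other(texttt;some;someother);;more"
```

With invert = TRUE, regmatches always returns one more piece than there are matches, which is why the answer's version pads the matches with "" before building the two-row matrix.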

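But all of these only work for simple combinations of an opening ( and a closing ), i.e. balanced parentheses without nesting.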
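To illustrate the caveat, here is a sketch with a nested input, which the question explicitly rules out. The lookahead pattern doubles a semicolon inside the outer parentheses. One possible alternative, which is not part of the answer above, is a PCRE pattern that matches a balanced group recursively with (?1) and discards it with the backtracking verbs (*SKIP)(*F). The input string and the variable names are made up for illustration, and the sketch assumes every ( has a matching ).

```r
# Hypothetical nested input (not from the question)
nested <- "a;(b;(c;d);e);f"

# The lookahead approach also doubles the ';' after 'b', which is inside the outer ( )
lookahead <- gsub(";(?![^(]*\\))", ";;", nested, perl = TRUE)
lookahead
#> [1] "a;;(b;;(c;d);e);;f"

# Group 1 matches one balanced ( ... ) group, recursing into itself via (?1);
# (*SKIP)(*F) then throws that match away, so only ';' outside any group is left
balanced_skip <- "(\\((?:[^()]++|(?1))*\\))(*SKIP)(*F)|;"
recursive <- gsub(balanced_skip, ";;", nested, perl = TRUE)
recursive
#> [1] "a;;(b;(c;d);e);;f"
```

Recursion and backtracking verbs are PCRE features, so this only works with perl = TRUE.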