Ani*_*yal 7 regex r string-substitution gsub
我正在寻找正则表达式(最好是 in R
),它可以用 say 替换(任意数量的)特定字符;
say;;
但仅当文本字符串内的括号内不存在时()
。
注意: 1. 括号内也可能存在多个替换字符
2.数据/向量中没有嵌套括号
例子
text;othertext
替换为 text;;othertext
text;other(texttt;some;someother);more
要替换为text;;other(texttt;some;someother);;more
. (即;
仅在外部()
被替换文本替换)如果需要澄清,我会尝试解释
in_vec <- c("abcd;ghi;dfsF(adffg;adfsasdf);dfg;(asd;fdsg);ag", "zvc;dfasdf;asdga;asd(asd;hsfd)", "adsg;(asdg;ASF;DFG;ASDF;);sdafdf", "asagf;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;sdfa")
in_vec
#> [1] "abcd;ghi;dfsF(adffg;adfsasdf);dfg;(asd;fdsg);ag"
#> [2] "zvc;dfasdf;asdga;asd(asd;hsfd)"
#> [3] "adsg;(asdg;ASF;DFG;ASDF;);sdafdf"
#> [4] "asagf;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;sdfa"
Run Code Online (Sandbox Code Playgroud)
预期输出(手动计算)
[1] "abcd;;ghi;;dfsF(adffg;adfsasdf);;dfg;;(asd;fdsg);;ag"
[2] "zvc;;dfasdf;;asdga;;asd(asd;hsfd)"
[3] "adsg;;(asdg;ASF;DFG;ASDF;);;sdafdf"
[4] "asagf;;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;;sdfa"
Run Code Online (Sandbox Code Playgroud)
GKi*_*GKi 10
你可以用gsub
与;(?![^(]*\\))
:
gsub(";(?![^(]*\\))", ";;", in_vec, perl=TRUE)
#[1] "abcd;;ghi;;dfsF(adffg;adfsasdf);;dfg;;(asd;fdsg);;ag"
#[2] "zvc;;dfasdf;;asdga;;asd(asd;hsfd)"
#[3] "adsg;;(asdg;ASF;DFG;ASDF;);;sdafdf"
#[4] "asagf;;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;;sdfa"
Run Code Online (Sandbox Code Playgroud)
;
finds ;
, (?!)
.. Negative Lookahead (在它不匹配时进行替换), [^(]
.. 一切但不是(
,*
重复前面的 0 到 n 次, \\)
.. 流过)
.
或者
gsub(";(?=[^)]*($|\\())", ";;", in_vec, perl=TRUE)
#[1] "abcd;;ghi;;dfsF(adffg;adfsasdf);;dfg;;(asd;fdsg);;ag"
#[2] "zvc;;dfasdf;;asdga;;asd(asd;hsfd)"
#[3] "adsg;;(asdg;ASF;DFG;ASDF;);;sdafdf"
#[4] "asagf;;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;;sdfa"
Run Code Online (Sandbox Code Playgroud)
;
finds ;
, (?=)
.. Positive Lookahead (在匹配时进行替换), [^)]
.. 除不匹配外的所有内容)
,*
重复前面的 0 到 n 次, ($|\\()
.. match end$
或(
.
或者使用gregexpr
和regmatches
提取和之间的部分(
并)
在不匹配的子字符串中进行替换:
x <- gregexpr("\\(.*?\\)", in_vec) #Find the part between ( and )
mapply(function(a, b) {
paste(matrix(c(gsub(";", ";;", b), a, ""), 2, byrow=TRUE), collapse = "")
}, regmatches(in_vec, x), regmatches(in_vec, x, TRUE))
#[1] "abcd;;ghi;;dfsF(adffg;adfsasdf);;dfg;;(asd;fdsg);;ag"
#[2] "zvc;;dfasdf;;asdga;;asd(asd;hsfd)"
#[3] "adsg;;(asdg;ASF;DFG;ASDF;);;sdafdf"
#[4] "asagf;;(fafgf;sadg;sdag;a;gddfg;fd)gsfg;;sdfa"
Run Code Online (Sandbox Code Playgroud)
但所有这些都只适用于简单的开(
闭)
组合。