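I have a data set like the one below. I want to replace every dot between two 1s with a 1, as shown in the desired result. Can I do this with a regex in base R?
I tried: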
regexpr("^1\\.1$", my.data$my.string, perl = TRUE)
Here is a solution in C#.
Thank you for any advice.
my.data <- read.table(text='
my.string state
................1...............1. A
......1..........................1 A
.............1.....2.............. B
......1.................1...2..... B
....1....2........................ B
1...2............................. C
..........1....................1.. C
.1............................1... C
.................1...........1.... C
........1....2.................... C
......1........................1.. C
....1....1...2.................... D
......1....................1...... D
.................1...2............ D
', header = TRUE, na.strings = 'NA', stringsAsFactors = FALSE)
desired.result <- read.table(text='
my.string state
................11111111111111111. A
......1111111111111111111111111111 A
.............1.....2.............. B
......1111111111111111111...2..... B
....1....2........................ B
1...2............................. C
..........1111111111111111111111.. C
.111111111111111111111111111111... C
.................1111111111111.... C
........1....2.................... C
......11111111111111111111111111.. C
....111111...2.................... D
......1111111111111111111111...... D
.................1...2............ D
', header = TRUE, na.strings = 'NA', stringsAsFactors = FALSE)
hwn*_*wnd 12
Here is an option using gsub with the \G feature and lookaround assertions.
> gsub('(?:1|\\G(?<!^))\\K\\.(?=\\.*1)', '1', my.data$my.string, perl = TRUE)
# [1] "................11111111111111111." "......1111111111111111111111111111"
# [3] ".............1.....2.............." "......1111111111111111111...2....."
# [5] "....1....2........................" "1...2............................."
# [7] "..........1111111111111111111111.." ".111111111111111111111111111111..."
# [9] ".................1111111111111...." "........1....2...................."
# [11] "......11111111111111111111111111.." "....111111...2...................."
# [13] "......1111111111111111111111......" ".................1...2............"
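The \G feature is an anchor that can match at one of two positions: the start of the string, or the position where the last match ended. Since it seems you want to avoid the dots at the start of the string, we use a lookaround assertion, \G(?<!^), to exclude the start of the string.
The \K escape sequence resets the starting point of the reported match, so any previously consumed characters are no longer included in it.
You can find an overall breakdown explaining the regular expression here.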
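To see the pieces working together, here is a small sketch on a made-up string (the inputs `"ab"` and `s` are illustrative assumptions, not part of the data above). It first isolates the effect of \K, then applies the full pattern. The dots on either side of the `2` should stay untouched, because the lookahead `(?=\.*1)` only allows dots between the current position and the next 1.

```r
# \K drops the already-matched "a" from the reported match,
# so only the "b" is replaced.
k_demo <- sub("a\\Kb", "X", "ab", perl = TRUE)
print(k_demo)

# Hypothetical input: two fillable gaps separated by a 2.
s <- "..1..1..2..1.1."

# Same pattern as in the answer above, applied to the toy string.
out <- gsub("(?:1|\\G(?<!^))\\K\\.(?=\\.*1)", "1", s, perl = TRUE)
print(out)
```

Each replaced dot is matched on its own: the first one is reached through the `1` alternative, and the following ones through \G, which continues from where the previous match ended.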
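Using gsubfn, the first argument is a regular expression that matches the 1s and the characters between them, capturing the latter. The second argument is a function, expressed in formula notation, which uses gsub to replace each character in the captured string with 1: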
library(gsubfn)
transform(my.data, my.string = gsubfn("1(.*)1", ~ gsub(".", 1, x), my.string))
If a string can contain multiple pairs of 1s, use "1(.*?)1" as the regular expression instead.
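The difference between the greedy and the lazy quantifier can be checked in base R without gsubfn. The following is a minimal sketch on a hypothetical string `s` containing two separate pairs of 1s; it only extracts the matches so the two behaviors can be compared side by side.

```r
# Hypothetical string containing two separate pairs of 1s.
s <- "..1..1..1..1.."

# Greedy: a single match running from the first 1 to the last 1.
greedy <- regmatches(s, gregexpr("1.*1", s, perl = TRUE))[[1]]

# Lazy: each match stops at the nearest closing 1.
lazy <- regmatches(s, gregexpr("1.*?1", s, perl = TRUE))[[1]]

print(greedy)
print(lazy)
```

Note that with the lazy form the dots between the second and third 1 belong to no match, so they would be left alone. That differs from the \G approach above, which fills every run of dots that has a 1 on both sides, so which behavior is wanted depends on how "pairs" should be interpreted.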
Visualization: The regular expression here is simple enough to be understood directly, but here is a debuggex visualization anyway:
1(.*)1

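Here is an option that uses a relatively simple regular expression together with the standard combination of gregexpr(), regmatches(), and regmatches<-() to identify, extract, operate on, and then replace the substrings matching that regular expression.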
## Copy the character vector
x <- my.data$my.string
## Find sequences of "."s bracketed on either end by a "1"
m <- gregexpr("(?<=1)\\.+(?=1)", x, perl=TRUE)
## Standard template for operating on and replacing matched substrings
regmatches(x,m) <- sapply(regmatches(x,m), function(X) gsub(".", "1", X))
## Check that it worked
head(x)
# [1] "................11111111111111111." "......1111111111111111111111111111"
# [3] ".............1.....2.............." "......1111111111111111111...2....."
# [5] "....1....2........................" "1...2............................."
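For reuse, the same gregexpr()/regmatches() template can be wrapped in a function. This is only a sketch under a few assumptions: the name `fill_between_ones` and the test vector `toy` are hypothetical, and `lapply()` with `fixed = TRUE` is used here in place of the `sapply()` call above so that the replacement value is always a list and the dot is treated literally.

```r
# Hypothetical helper: replace runs of dots that have a 1 on both sides.
fill_between_ones <- function(x) {
  # Locate runs of "." bracketed by a "1" on either end.
  m <- gregexpr("(?<=1)\\.+(?=1)", x, perl = TRUE)
  # Turn every dot in each matched run into a "1"; elements without
  # a match yield character(0) and are left unchanged.
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    function(s) gsub(".", "1", s, fixed = TRUE)
  )
  x
}

# Made-up inputs: one plain gap, one gap interrupted by a 2, two gaps.
toy <- c("..1...1.", "..1..2..1", "1.1.1")
res <- fill_between_ones(toy)
print(res)
```

On these inputs the interrupted gap in the second string is not matched at all, since the run of dots after the first 1 is followed by a 2 rather than a 1.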