I have a vector of strings:
c("YSAHEEHHYDK", "HEHISSDYAGK", "TFAHTESHISK", "ISLGEHEGGGK",
"LSSGYDGTSYK", "FGTGTYAGGEK", "VGASTGYSGLK", "TASGVGGFSTK", "SYASDFGSSAK",
"LYSYYSSTESK")
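For each string, I want to replace "Y", "S", or "T" with "pY", "pS", or "pT". However, I don't want all the replacements to end up in the same final string; I want each replacement to produce a new string. For example,
"YSAHEEHHYDK" becomes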
c("pYSAHEEHHYDK",
"YpSAHEEHHYDK",
"YSAHEEHHpYDK")
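You can do this in base R, using a zero-length match as shown by @GKi: setting match.length to 0 makes `regmatches<-` insert the "p" in front of each match rather than replace the matched character.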
strings <- c("YSAHEEHHYDK", "HEHISSDYAGK", "TFAHTESHISK", "ISLGEHEGGGK",
"LSSGYDGTSYK", "FGTGTYAGGEK", "VGASTGYSGLK", "TASGVGGFSTK",
"SYASDFGSSAK", "LYSYYSSTESK")
reg <- gregexpr("[YST]", strings)
`regmatches<-`(rep(strings, lengths(reg)),
`attr<-`(unlist(reg), "match.length", 0), value = 'p')
#> [1] "pYSAHEEHHYDK" "YpSAHEEHHYDK" "YSAHEEHHpYDK" "HEHIpSSDYAGK" "HEHISpSDYAGK"
#> [6] "HEHISSDpYAGK" "pTFAHTESHISK" "TFAHpTESHISK" "TFAHTEpSHISK" "TFAHTESHIpSK"
#> [11] "IpSLGEHEGGGK" "LpSSGYDGTSYK" "LSpSGYDGTSYK" "LSSGpYDGTSYK" "LSSGYDGpTSYK"
#> [16] "LSSGYDGTpSYK" "LSSGYDGTSpYK" "FGpTGTYAGGEK" "FGTGpTYAGGEK" "FGTGTpYAGGEK"
#> [21] "VGApSTGYSGLK" "VGASpTGYSGLK" "VGASTGpYSGLK" "VGASTGYpSGLK" "pTASGVGGFSTK"
#> [26] "TApSGVGGFSTK" "TASGVGGFpSTK" "TASGVGGFSpTK" "pSYASDFGSSAK" "SpYASDFGSSAK"
#> [31] "SYApSDFGSSAK" "SYASDFGpSSAK" "SYASDFGSpSAK" "LpYSYYSSTESK" "LYpSYYSSTESK"
#> [36] "LYSpYYSSTESK" "LYSYpYSSTESK" "LYSYYpSSTESK" "LYSYYSpSTESK" "LYSYYSSpTESK"
#> [41] "LYSYYSSTEpSK"
Created on 2023-02-14 with reprex v2.0.2
You can wrap this in a small helper function:
my_replace <- function(x){
  reg <- gregexpr("[YST]", x)
  `regmatches<-`(rep(x, lengths(reg)), structure(unlist(reg), match.length = 0), value = "p")
}
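To make the zero-length trick less opaque, here is a minimal sketch of the same insertion written out with substring() and paste0(). The helper name insert_p is hypothetical (it does not appear in the answers here), and returning an empty character vector for a string without any match is an assumption of this sketch, not the behaviour of my_replace.

```r
# insert_p() is a hypothetical helper, not part of the answers above.
# It returns one copy of `s` per Y/S/T, each with a "p" inserted before that match.
insert_p <- function(s) {
  pos <- gregexpr("[YST]", s)[[1]]        # start positions of every Y, S or T
  if (pos[1] == -1L) {
    return(character(0))                  # assumption: no match gives an empty result
  }
  # Text before the match, then "p", then the text from the match onwards
  paste0(substring(s, 1, pos - 1), "p", substring(s, pos))
}

insert_p("YSAHEEHHYDK")
#> [1] "pYSAHEEHHYDK" "YpSAHEEHHYDK" "YSAHEEHHpYDK"
```

Applied to the whole input with unlist(lapply(strings, insert_p)), this should give a flat vector in the same order as the output shown earlier.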
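Using the input xx from the Note at the end (the same as in the question, plus some edge-case tests), we use stringi functions. Note in particular that stri_sub can insert a p character. If an input string is empty, i.e. "", or contains no Y, S or T, NA is returned for that string.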
library(stringi)
add_p <- function(s, loc) {
  start <- loc[, "start"]
  stri_sub(s, start, start-1) <- "p"
  s
}
Map(add_p, xx, stri_locate_all(xx, regex = "[YST]"))
giving
[1] NA
$ABC
[1] NA
$YSAHEEHHYDK
[1] "pYSAHEEHHYDK" "YpSAHEEHHYDK" "YSAHEEHHpYDK"
$HEHISSDYAGK
[1] "HEHIpSSDYAGK" "HEHISpSDYAGK" "HEHISSDpYAGK"
$TFAHTESHISK
[1] "pTFAHTESHISK" "TFAHpTESHISK" "TFAHTEpSHISK" "TFAHTESHIpSK"
# ...snip...
Note: the input xx is the same as in the question, except that we have added the first two strings.
xx <- c("", "ABC", "YSAHEEHHYDK", "HEHISSDYAGK", "TFAHTESHISK", "ISLGEHEGGGK",
"LSSGYDGTSYK", "FGTGTYAGGEK", "VGASTGYSGLK", "TASGVGGFSTK", "SYASDFGSSAK",
"LYSYYSSTESK")
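If the NA entries for unmatched inputs are not wanted, they can be dropped before flattening the list. The following is a minimal base R sketch; res is a hypothetical name for the list returned by the Map() call, filled in here by hand with the first four elements of the output shown above so that the snippet runs without stringi.

```r
# `res` is a hypothetical stand-in for the result of the Map() call above,
# hand-copied from the first four elements of the printed output.
res <- list(
  NA_character_,
  ABC = NA_character_,
  YSAHEEHHYDK = c("pYSAHEEHHYDK", "YpSAHEEHHYDK", "YSAHEEHHpYDK"),
  HEHISSDYAGK = c("HEHIpSSDYAGK", "HEHISpSDYAGK", "HEHISSDpYAGK")
)

# Keep only the elements that contain no NA, then flatten to a plain vector
keep <- !vapply(res, anyNA, logical(1))
flat <- unlist(res[keep], use.names = FALSE)
flat
#> [1] "pYSAHEEHHYDK" "YpSAHEEHHYDK" "YSAHEEHHpYDK" "HEHIpSSDYAGK" "HEHISpSDYAGK"
#> [6] "HEHISSDpYAGK"
```

Filtering before unlist() matters here because unlist() alone would keep the NA values in the flat vector.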
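Perhaps something similar with stringr and purrr. str_locate_all() returns, for each string, a 2-column matrix with the start and end positions of the pattern, and str_sub(string, start) <- "p" conveniently accepts that same matrix as start. Subtracting 1 from the end column (i.e. [1, 1] becomes [1, 0]) keeps all existing characters and inserts p.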
library(stringr)
library(purrr)
str_ <- c("YSAHEEHHYDK", "HEHISSDYAGK", "TFAHTESHISK", "ISLGEHEGGGK",
"LSSGYDGTSYK", "FGTGTYAGGEK", "VGASTGYSGLK", "TASGVGGFSTK",
"SYASDFGSSAK", "LYSYYSSTESK")
map2(set_names(str_),
str_locate_all(str_,"Y|S|T"),
function(x, y) {
y[,2] <- y[,2] - 1
str_sub(x, y) <- "p"
x
})
The result is a named list:
#> $YSAHEEHHYDK
#> [1] "pYSAHEEHHYDK" "YpSAHEEHHYDK" "YSAHEEHHpYDK"
#>
#> $HEHISSDYAGK
#> [1] "HEHIpSSDYAGK" "HEHISpSDYAGK" "HEHISSDpYAGK"
#>
#> $TFAHTESHISK
#> [1] "pTFAHTESHISK" "TFAHpTESHISK" "TFAHTEpSHISK" "TFAHTESHIpSK"
#>
#> $ISLGEHEGGGK
#> [1] "IpSLGEHEGGGK"
#>
#> $LSSGYDGTSYK
#> [1] "LpSSGYDGTSYK" "LSpSGYDGTSYK" "LSSGpYDGTSYK" "LSSGYDGpTSYK" "LSSGYDGTpSYK"
#> [6] "LSSGYDGTSpYK"
#>
#> $FGTGTYAGGEK
#> [1] "FGpTGTYAGGEK" "FGTGpTYAGGEK" "FGTGTpYAGGEK"
#>
#> $VGASTGYSGLK
#> [1] "VGApSTGYSGLK" "VGASpTGYSGLK" "VGASTGpYSGLK" "VGASTGYpSGLK"
#>
#> $TASGVGGFSTK
#> [1] "pTASGVGGFSTK" "TApSGVGGFSTK" "TASGVGGFpSTK" "TASGVGGFSpTK"
#>
#> $SYASDFGSSAK
#> [1] "pSYASDFGSSAK" "SpYASDFGSSAK" "SYApSDFGSSAK" "SYASDFGpSSAK" "SYASDFGSpSAK"
#>
#> $LYSYYSSTESK
#> [1] "LpYSYYSSTESK" "LYpSYYSSTESK" "LYSpYYSSTESK" "LYSYpYSSTESK" "LYSYYpSSTESK"
#> [6] "LYSYYSpSTESK" "LYSYYSSpTESK" "LYSYYSSTEpSK"
Created on 2023-02-15 with reprex v2.0.2