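I'm currently working with character vectors in R, which I split word by word using strsplit. I'd like to know whether there is a function that can check the whole list for a specific word and, if possible, report which elements of the list contain it.
For example: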
a = c("a","b","c")
b= c("b","d","e")
c = c("a","e","f")
If z = list(a, b, c), then f("a", z) would ideally yield [1] 1 3, and f("b", z) would ideally yield [1] 1 2.
Any help would be wonderful.
Hon*_*Ooi 23
As alexwhan says, grep is the function to use. However, be careful about using it with a list: it isn't doing what you think it's doing. For example:
grep("c", z)
[1] 1 2 3 # ?
grep(",", z)
[1] 1 2 3 # ???
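What's happening behind the scenes is that grep coerces its second argument to character using as.character. When applied to a list, as.character returns the character representation of that list as obtained by deparsing. (Modulo an unlist.)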
as.character(z)
[1] "c(\"a\", \"b\", \"c\")" "c(\"b\", \"d\", \"e\")" "c(\"a\", \"e\", \"f\")"
cat(as.character(z))
c("a", "b", "c") c("b", "d", "e") c("a", "e", "f")
That deparsed text is what grep is actually searching.
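If you want to run grep on a list, a safer method is to use lapply. This returns another list, which you can then manipulate to extract what you're interested in.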
res <- lapply(z, function(ch) grep("a", ch))
res
[[1]]
[1] 1
[[2]]
integer(0)
[[3]]
[1] 1
# which vectors contain a search term
sapply(res, function(x) length(x) > 0)
[1] TRUE FALSE TRUE
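Because grep does regular-expression matching, the lapply approach above would also report partial matches (a search for "a" would hit "apple"). Below is a minimal sketch of an exact-match variant that returns the indices the question asks for. It assumes the list z from the question; the helper name find_word is hypothetical and not part of any package.

```r
# The list from the question, built directly
z <- list(c("a", "b", "c"), c("b", "d", "e"), c("a", "e", "f"))

# Hypothetical helper: indices of the list elements that contain
# `word` as an exact element (no regex, no partial matching)
find_word <- function(word, lst) {
  hits <- vapply(lst, function(ch) any(ch == word), logical(1))
  which(hits)
}

find_word("a", z)  # [1] 1 3
find_word("b", z)  # [1] 1 2
```

vapply is used instead of sapply here so that the result is always a logical vector, even when the list is empty.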
Much faster than grep:
# x is the list, "A" is the search term
sapply(x, function(y) "A" %in% y)
If you want the indices, of course, just use which():
# x is the list, "A" is the search term
which(sapply(x, function(y) "A" %in% y))
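One detail worth knowing when the list has names: sapply keeps them, so which() returns a named integer vector. A small sketch follows, using a made-up three-element list in place of the benchmark data (the values are illustrative assumptions, not taken from the timing run).

```r
# Illustrative named list (assumed data, not the benchmark list)
x <- list(A = c("A", "M", "B"), B = c("H", "G", "F"), C = c("P", "A", "C"))

hits <- sapply(x, function(y) "A" %in% y)  # named logical vector
idx <- which(hits)                         # named integer vector

unname(idx)  # [1] 1 3
names(idx)   # [1] "A" "C"
```

Use unname() if only the positions are wanted, or names() to get the labels of the matching elements.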
Proof!
x = setNames(replicate(26, list(sample(LETTERS, 10, replace = TRUE))), sapply(LETTERS, list))
head(x)
$A
[1] "A" "M" "B" "X" "B" "J" "P" "L" "M" "L"
$B
[1] "H" "G" "F" "R" "B" "E" "D" "I" "L" "R"
$C
[1] "P" "R" "C" "N" "K" "E" "R" "S" "N" "P"
$D
[1] "F" "B" "B" "Z" "E" "Y" "J" "R" "H" "P"
$E
[1] "O" "P" "E" "X" "S" "Q" "S" "A" "H" "B"
$F
[1] "Y" "P" "T" "T" "P" "N" "K" "P" "G" "P"
system.time(replicate(1000, grep("A", x)))
user system elapsed
0.11 0.00 0.11
system.time(replicate(1000, sapply(x, function(y) "A" %in% y)))
user system elapsed
0.05 0.00 0.05
You are looking for grep():
grep("a", z)
#[1] 1 3
grep("b", z)
#[1] 1 2