I have a data frame df:
df <- structure(list(page = c(12, 6, 9, 65),
text = structure(c(4L,2L, 1L, 3L),
.Label = c("I just bought a brand new AudiA6", "Get 2 years engine replacement warranty on BMW X6",
"Volkswagen is the parent company of BMW", "ToyotaCorolla is offering new car exchange offers"),
class = "factor")), .Names = c("page","text"), row.names = c(NA, -4L), class = "data.frame")
I also have a list of words:
wordlist <- c("Audi", "BMW", "extended", "engine", "replacement", "Volkswagen", "company", "Toyota","exchange", "brand")
I check whether the words in wordlist occur in the text column by unlisting the text and using grepl.
library(data.table)
setDT(df)[, match := paste(wordlist[unlist(lapply(wordlist, function(x) grepl(x, text, …
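The snippet above is cut off, so its remaining arguments are unknown. As a rough illustration of the per-row grepl idea described here, the following is a minimal base-R sketch rather than the original code. The helper name match_words, the use of fixed = TRUE (literal, case-sensitive substring matching), and the ", " separator are all assumptions made for this example.

```r
# Sample data from the question; text is stored as character instead of factor.
df <- data.frame(
  page = c(12, 6, 9, 65),
  text = c("ToyotaCorolla is offering new car exchange offers",
           "Get 2 years engine replacement warranty on BMW X6",
           "I just bought a brand new AudiA6",
           "Volkswagen is the parent company of BMW"),
  stringsAsFactors = FALSE
)

wordlist <- c("Audi", "BMW", "extended", "engine", "replacement",
              "Volkswagen", "company", "Toyota", "exchange", "brand")

# Hypothetical helper: return the words from `words` that occur in one string.
# fixed = TRUE is an assumption: it treats each word as a literal substring.
match_words <- function(txt, words) {
  found <- vapply(words, function(w) grepl(w, txt, fixed = TRUE), logical(1))
  paste(words[found], collapse = ", ")
}

# Apply the helper to each row separately, so every row gets its own matches.
df$match <- vapply(as.character(df$text), match_words, character(1),
                   words = wordlist, USE.NAMES = FALSE)

print(df[, c("page", "match")])
```

Applying grepl to one string at a time keeps the matches row-specific; matched words appear in the order they occur in wordlist, not in the order they occur in the text.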