R: agrep with a vector of patterns

Ale*_*tov 3 fuzzy-search r agrep

I have a vector of patterns and need to run agrep on each of them. The problem is that agrep seems to accept only one pattern at a time.

patt <- c("test","10 Barrel")
lut  <- c("1 Barrel","10 Barrel Brewing","Harpoon 100 Barrel Series","resr","rest","tesr")

# Call agrep once per pattern, allowing an edit distance of at most 1
for (i in seq_along(patt)) {
  print(agrep(patt[i], lut, max.distance = 1, value = TRUE))
}

Result:

[1] "rest" "tesr"
[1] "1 Barrel"                  "10 Barrel Brewing"         "Harpoon 100 Barrel Series"

The for loop is slow when the pattern vector is long, so I tried a vectorized form:

VecMatch1 = function(string, stringVector){
  stringVector[agrep(string, stringVector, max = 1)]
}
a = VecMatch1(patt,lut)

Warning message:
In agrep(string, stringVector, max = 1) :
  argument 'pattern' has length > 1 and only the first element will be used

Could a function such as lapply help here? Thanks!

Ser*_*asa 5

Use lapply:

lapply(patt, agrep, x=lut, max.distance=c(cost=1, all=1), value=TRUE)

[[1]]
[1] "rest" "tesr"

[[2]]
[1] "1 Barrel"                  "10 Barrel Brewing"         "Harpoon 100 Barrel Series"
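
The unnamed list above does not show which pattern produced which matches. As a minimal sketch (the names `matches` and `matches_df` are illustrative and not part of the original answer, and it uses the question's `max.distance = 1` setting), the result can be named by pattern and then flattened into a long data frame with base R's `stack`:

```r
patt <- c("test", "10 Barrel")
lut  <- c("1 Barrel", "10 Barrel Brewing", "Harpoon 100 Barrel Series",
          "resr", "rest", "tesr")

# One agrep call per pattern; each list element holds that pattern's matches
matches <- lapply(patt, function(p) agrep(p, lut, max.distance = 1, value = TRUE))

# Label each element with the pattern that produced it
names(matches) <- patt

# Long format: one row per (match, pattern) pair; 'ind' holds the pattern
matches_df <- stack(matches)
```

This still calls agrep once per pattern, so it only tidies the output and is not expected to change the running time.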

You may get faster performance with dplyr or data.table.
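
The answer does not show a dplyr or data.table version. A different, base-R option is `adist`, which accepts whole vectors and, with `partial = TRUE`, is documented as computing the same kind of approximate substring distance that agrep uses by default. The sketch below rests on that assumption; the names `d` and `hits` are illustrative, and whether it is actually faster than the lapply version would need to be measured on real data.

```r
patt <- c("test", "10 Barrel")
lut  <- c("1 Barrel", "10 Barrel Brewing", "Harpoon 100 Barrel Series",
          "resr", "rest", "tesr")

# Distance matrix: one row per pattern, one column per candidate string.
# partial = TRUE lets a pattern match a substring of the candidate.
d <- adist(patt, lut, partial = TRUE)

# For each pattern, keep the candidates within edit distance 1
hits <- lapply(seq_along(patt), function(i) lut[d[i, ] <= 1])
names(hits) <- patt
```

Note that `adist` builds the full length(patt) by length(lut) matrix in memory, which can become large when both vectors are long.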

  • How would you use dplyr or data.table to handle this? (2 upvotes)