Don*_*yck 2 parsing for-loop r lapply
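I want to write a function that splits a string into a list of trigrams (overlapping three-character substrings), for example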
"JOHNSTEWART" --> chr [1:9] "JOH" "OHN" "HNS" "NST" "STE" "TEW" "EWA" "WAR" "ART"
I can write this with a for loop:
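ngram_function <- function(x){
  # Only split non-missing strings with at least 3 characters
  if(!is.na(x) && (nchar(x) > 2)){
    # Preallocate one slot per trigram (rep() has no `n` argument)
    ngram <- character(nchar(x) - 3 + 1)
    # Parentheses matter: 1:nchar(x)-2 is evaluated as (1:nchar(x)) - 2
    for (i in 1:(nchar(x) - 2)){
      ngram[i] <- substr(x, start = i, stop = i - 1 + 3)
    }
    return(ngram)
  }
  else{
    return(x)
  }
}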
But it takes a long time when applied to a large number of values. Is there a more optimized way to do this in R?
Here is a version that uses sapply:
myfun <- function(x, n){
sapply(1:(nchar(x)-n+1), function(z) substr(x, z, z+n-1))
}
myfun("JOHNSTEWART", 3)
[1] "JOH" "OHN" "HNS" "NST" "STE" "TEW" "EWA" "WAR" "ART"
myfun("JOHNSTEWART", 4)
[1] "JOHN" "OHNS" "HNST" "NSTE" "STEW" "TEWA" "EWAR" "WART"
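Since the question is about speed, one further option is to drop the per-position function call entirely: base R's substring() is vectorized over its start and stop arguments, so all n-grams of one string can be extracted in a single call. The following is only a sketch under assumptions. The name ngrams_vec is hypothetical, and returning the input unchanged for NA or too-short strings is a choice carried over from the for-loop version in the question, not something the sapply answer does.

```r
# Hypothetical helper (not from the original answer): all n-grams of a
# single string, using the vectorized substring() instead of a loop.
ngrams_vec <- function(x, n = 3) {
  # Mirror the question's behavior: return NA or too-short input as-is.
  # is.na() is checked first so nchar() is never compared when x is NA.
  if (is.na(x) || nchar(x) < n) {
    return(x)
  }
  # One start position per n-gram: 1, 2, ..., nchar(x) - n + 1
  starts <- seq_len(nchar(x) - n + 1)
  # substring() recycles x across the vectors of start/stop positions
  substring(x, starts, starts + n - 1)
}

ngrams_vec("JOHNSTEWART", 3)
# [1] "JOH" "OHN" "HNS" "NST" "STE" "TEW" "EWA" "WAR" "ART"
```

For many values, something like lapply(names_vector, ngrams_vec), where names_vector is a placeholder for your own character vector, would apply it to each string in turn. Whether this is noticeably faster than the sapply version depends on the data, so it is worth timing both on a realistic sample.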