Pre*_*yas | tags: nlp, r, text-mining, string-matching, lemmatization
I want to lemmatize English words so that every inflected form is reduced to the same base form (lemma). For example:
c("ran","run","running")
should become c("run","run","run").
I have tried R packages such as tm, wordnet, RTextTools and SnowballC, but all of them produce c("ran","run","run"). As you can see, they do not convert "ran" to "run".
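The behaviour described above comes from stemming: a stemmer strips suffixes by rule, so "running" loses its "-ing", but the irregular past tense "ran" has no suffix to strip and is left unchanged. A minimal sketch reproducing it with SnowballC's `wordStem()` and the Porter stemmer, assuming SnowballC is installed (SnowballC is used here only as one representative of the packages listed):

```r
# Assumes the SnowballC package is installed.
library(SnowballC)

words <- c("ran", "run", "running")

# Rule-based suffix stripping: "running" -> "run", while the irregular
# form "ran" has no suffix to remove and comes back unchanged.
stems <- wordStem(words, language = "porter")
print(stems)
# [1] "ran" "run" "run"
```

Getting "ran" to "run" needs a dictionary lookup (lemmatization) rather than suffix rules.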
Take a look at the textstem package, which I maintain:
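# Install pacman if needed, then use it to install/load textstem
if (!require("pacman")) install.packages("pacman")
pacman::p_load(textstem)
# Replace each word with its dictionary lemma
lemmatize_words(c("ran","run","running"))
## [1] "run" "run" "run"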
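Note that if you actually have strings (e.g. whole sentences) rather than a vector of individual words, you probably want the lemmatize_strings function instead.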
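A minimal sketch of the string case, assuming textstem is installed and using its default lemma dictionary; the example sentence is made up for illustration. Unlike `lemmatize_words()`, which expects one word per vector element, `lemmatize_strings()` takes whole strings and returns one lemmatized string per input element:

```r
# Assumes textstem is installed (e.g. via pacman::p_load(textstem)).
library(textstem)

# One string containing several words, rather than a vector of single words.
# (Hypothetical example sentence.)
sentence <- "the dogs ran while the cat was running"

# Lemmatize the words inside the string; the result is still a single string,
# with forms such as "ran" and "running" expected to map to "run".
lemmatized <- lemmatize_strings(sentence)
print(lemmatized)
```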