How to find and replace values between two data frames in R

Question

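I have a data frame from tidytext that contains the individual words from some free-response survey comments. It has just under 500,000 rows. Being free-response data, it is riddled with typos. textclean::replace_misspellings took care of nearly 13,000 misspelled words, but there are still about 700 unique misspellings that I have identified manually.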

I now have a second table with two columns: the first holds the misspelling and the second holds the correction.

For example:

allComments <- data.frame("Number" = 1:5, "Word" = c("organization","orginization", "oragnization", "help", "hlp"))
misspellings <- data.frame("Wrong" = c("orginization", "oragnization", "hlp"), "Right" = c("organization", "organization", "help"))

How can I replace every value in allComments$Word that matches misspellings$Wrong with the corresponding misspellings$Right?

I feel like this is probably very basic and my R ignorance is showing...

Answer 1

GKi (score: 5)

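You can use match to look up the position of each word of allComments$Word in misspellings$Wrong, and then use those indices to pick the replacements from misspellings$Right.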
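
To make the indexing step concrete, here is what match returns on the example words. For each element of its first argument, match gives the position of its first match in the second argument, or NA when there is none. The vectors below are just the question's example columns copied out as plain character vectors (the names words, wrong and right are illustrative only):

```r
# The question's example words and the known misspellings/corrections
words <- c("organization", "orginization", "oragnization", "help", "hlp")
wrong <- c("orginization", "oragnization", "hlp")
right <- c("organization", "organization", "help")

# Position of each word in `wrong`; NA means "not a known misspelling"
tt <- match(words, wrong)
print(tt)
# [1] NA  1  2 NA  3

# The non-NA positions select the corrections, in the order the words appear
print(right[tt[!is.na(tt)]])
```

The logical vector !is.na(tt) marks which words to overwrite, and tt[!is.na(tt)] says which correction goes into each of those slots.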

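# Position of each word in misspellings$Wrong (NA if it is not a known misspelling)
tt <- match(allComments$Word, misspellings$Wrong)
# Overwrite only the matched words with their corrections
allComments$Word[!is.na(tt)]  <- misspellings$Right[tt[!is.na(tt)]]
allComments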
#  Number         Word
#1      1 organization
#2      2 organization
#3      3 organization
#4      4         help
#5      5         help
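
An equivalent base-R idiom, offered here as an alternative sketch rather than as part of the original answer, is to turn the misspellings table into a named character vector and index it by word. The data frames are rebuilt with stringsAsFactors = FALSE so the sketch behaves the same in every R version; lookup and hit are illustrative variable names.

```r
allComments <- data.frame(Number = 1:5,
                          Word = c("organization", "orginization", "oragnization", "help", "hlp"),
                          stringsAsFactors = FALSE)
misspellings <- data.frame(Wrong = c("orginization", "oragnization", "hlp"),
                           Right = c("organization", "organization", "help"),
                           stringsAsFactors = FALSE)

# Named vector: names are the misspellings, values are the corrections
lookup <- setNames(misspellings$Right, misspellings$Wrong)

# Replace only the words that appear among the names of `lookup`
hit <- allComments$Word %in% names(lookup)
allComments$Word[hit] <- unname(lookup[allComments$Word[hit]])
allComments
```

Like match, indexing by name uses the first entry if a misspelling happens to be listed twice, so both versions give the same result.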

If allComments$Word is not already a character vector (for example, if it is a factor), convert it first:

allComments$Word <- as.character(allComments$Word)
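
Before R 4.0.0, data.frame() defaulted to stringsAsFactors = TRUE, so the question's example produces factor columns there. The sketch below forces that situation explicitly, converts the word column as the answer suggests, and then applies the same match step. Converting misspellings$Right with as.character as well is an extra precaution, not something the original answer requires.

```r
# Recreate the question's data with factor columns (the pre-R-4.0 default)
allComments <- data.frame(Number = 1:5,
                          Word = c("organization", "orginization", "oragnization", "help", "hlp"),
                          stringsAsFactors = TRUE)
misspellings <- data.frame(Wrong = c("orginization", "oragnization", "hlp"),
                           Right = c("organization", "organization", "help"),
                           stringsAsFactors = TRUE)

# Work on plain character vectors so new values are not limited to existing factor levels
allComments$Word <- as.character(allComments$Word)
right <- as.character(misspellings$Right)

# match() converts factors to character, so the factor Wrong column can be used directly
tt <- match(allComments$Word, misspellings$Wrong)
allComments$Word[!is.na(tt)] <- right[tt[!is.na(tt)]]
str(allComments)
```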

Archived: 6 years ago
Views: 108
Last activity: 6 years ago