Partial animal string matching in R

tes*_*123 16 string r matching dataframe

I have a data frame:

d<-data.frame(name=c("brown cat", "blue cat", "big lion", "tall tiger",
                     "black panther", "short cat", "red bird",
                     "short bird stuffed", "big eagle", "bad sparrow",
                     "dog fish", "head dog", "brown yorkie",
                     "lab short bulldog"), label=1:14)

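I want to search the name column, and if any of the words "cat", "lion", "tiger", or "panther" appears, I want to assign the string "feline" to the corresponding row of a new column, species.

If any of the words "bird", "eagle", or "sparrow" appears, I want to assign the string "avian" to the corresponding row of the new species column.

If any of the words "dog", "yorkie", or "bulldog" appears, I want to assign the string "canine" to the corresponding row of the new species column.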

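Ideally, I would store this in a list or something similar that I can keep at the top of the script, so that when new variants of a species show up in the name column, it is easy to update what qualifies as feline, avian, or canine.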

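This question is almost answered here (How to create new column in dataframe based on partial string matching other column in R), but that answer does not cover the twist in this problem: several names map to each category.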

pin*_*ing 26

There may be a more elegant solution than this, but you can use grep with | to specify alternative matches.

d[grep("cat|lion|tiger|panther", d$name), "species"] <- "feline"
d[grep("bird|eagle|sparrow", d$name), "species"] <- "avian"
d[grep("dog|yorkie", d$name), "species"] <- "canine"

I assumed you meant "avian", and I left out "bulldog" because it already contains "dog".
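Dropping "bulldog" works because grep looks for the pattern anywhere in the string, not only as a whole word. A small check in base R illustrates this. The \\b word-boundary variant is an aside for cases where whole-word matching is wanted, not something the patterns above need, and the vector x is made up for the illustration:

```r
# Illustrative input only; these values mirror a few rows of d$name.
x <- c("head dog", "lab short bulldog", "dog fish")

# Plain substring match: "dog" is also found inside "bulldog".
substring_hits <- grepl("dog", x)

# Whole-word match: \\b marks a word boundary, so "bulldog" no longer matches.
whole_word_hits <- grepl("\\bdog\\b", x)

print(data.frame(x, substring_hits, whole_word_hits))
```

With plain substring matching, a short keyword can also hit unrelated names (for example "cat" inside a longer word), so the whole-word form may be worth considering if the name column grows.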

You may want to add ignore.case = TRUE to grep.

Output:

#                 name label species
#1           brown cat     1  feline
#2            blue cat     2  feline
#3            big lion     3  feline
#4          tall tiger     4  feline
#5       black panther     5  feline
#6           short cat     6  feline
#7            red bird     7   avian
#8  short bird stuffed     8   avian
#9           big eagle     9   avian
#10        bad sparrow    10   avian
#11           dog fish    11  canine
#12           head dog    12  canine
#13       brown yorkie    13  canine
#14  lab short bulldog    14  canine
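The question also asks for a list kept at the top of the script. One possible way to drive the same grep-style matching from such a list is sketched below. The name species_keywords and the loop are assumptions made for illustration, not part of the original answer; ignore.case = TRUE is included as suggested above.

```r
# Hypothetical lookup table: species label -> keywords that identify it.
# Keep this at the top of the script and extend it as new names appear.
species_keywords <- list(
  feline = c("cat", "lion", "tiger", "panther"),
  avian  = c("bird", "eagle", "sparrow"),
  canine = c("dog", "yorkie", "bulldog")
)

d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"), label = 1:14)

# Start with NA so that unmatched names are easy to spot afterwards.
d$species <- NA_character_

for (sp in names(species_keywords)) {
  # Collapse the keywords into one alternation, e.g. "cat|lion|tiger|panther".
  pattern <- paste(species_keywords[[sp]], collapse = "|")
  # If a name matches several species, the later list entry overwrites the earlier one.
  d$species[grepl(pattern, d$name, ignore.case = TRUE)] <- sp
}

print(d)
```

Adding a new variant then only requires editing the list, for example appending "lynx" to the feline vector. Rows that match no keyword keep NA in species.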