tes*_*123 16 string r matching dataframe
I have a data frame:
d<-data.frame(name=c("brown cat", "blue cat", "big lion", "tall tiger",
"black panther", "short cat", "red bird",
"short bird stuffed", "big eagle", "bad sparrow",
"dog fish", "head dog", "brown yorkie",
"lab short bulldog"), label=1:14)
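I want to search the name column, and if any of the words "cat", "lion", "tiger", or "panther" appears, assign the string feline to the corresponding row of a new column, species.
If any of the words "bird", "eagle", or "sparrow" appears, I want to assign the string avian to the corresponding row of species.
If any of the words "dog", "yorkie", or "bulldog" appears, I want to assign the string canine to the corresponding row of species.
Ideally, I would store this in a list or something similar that I can keep at the beginning of the script, so that when new variants of a species show up in the name column, it is easy to update what qualifies as feline, avian, and canine.
This question is almost answered here (How to create new column in dataframe based on partial string matching other column in R), but that answer does not address the multiple-name twist present in this problem.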
pin*_*ing 26
There may be a more elegant solution than this, but you could use grep with | to specify alternative matches.
d[grep("cat|lion|tiger|panther", d$name), "species"] <- "feline"
d[grep("bird|eagle|sparrow", d$name), "species"] <- "avian"
d[grep("dog|yorkie", d$name), "species"] <- "canine"
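I have assumed you meant "avian", and I left "bulldog" out of the pattern because it contains "dog", so the "dog" alternative already matches it.
You may want to add ignore.case = TRUE to grep.
Output: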
#                 name label species
#1           brown cat     1  feline
#2            blue cat     2  feline
#3            big lion     3  feline
#4          tall tiger     4  feline
#5       black panther     5  feline
#6           short cat     6  feline
#7            red bird     7   avian
#8  short bird stuffed     8   avian
#9           big eagle     9   avian
#10        bad sparrow    10   avian
#11           dog fish    11  canine
#12           head dog    12  canine
#13       brown yorkie    13  canine
#14  lab short bulldog    14  canine
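The question also asks for a lookup that can sit at the top of the script and be extended as new names appear. One possible sketch, not part of the original answer, keeps the keywords in a named list and builds each grep-style pattern from it. The name species_keywords is hypothetical, and the loop uses grepl (the logical counterpart of grep) with ignore.case = TRUE as an assumption about the desired matching.

```r
# Hypothetical lookup kept at the top of the script: one entry per species,
# each holding the keywords that should map to it.
species_keywords <- list(
  feline = c("cat", "lion", "tiger", "panther"),
  avian  = c("bird", "eagle", "sparrow"),
  canine = c("dog", "yorkie", "bulldog")
)

d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"), label = 1:14)

# Start with NA so that names matching no keyword stay unlabelled.
d$species <- NA_character_

for (sp in names(species_keywords)) {
  # Join the keywords with "|" to form an alternation, e.g. "cat|lion|tiger|panther".
  pattern <- paste(species_keywords[[sp]], collapse = "|")
  # grepl returns one TRUE/FALSE per row; matching rows get this species.
  d$species[grepl(pattern, d$name, ignore.case = TRUE)] <- sp
}
```

Two caveats on this sketch: the patterns match substrings, so "cat" would also match a name such as "catfish", and if a name matches keywords from more than one species, the entry that comes later in the list wins.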