Transforming a vector of values using a key-value mapping (HashMap equivalent) in R

dat*_*ole 5 mapping r hashmap dataframe

I need to transform the values in a vector according to a mapping of key-value pairs:

vector <- c("dog","ant","eagle","ant","eagle","parrot") 

  "dog"  "ant"  "eagle"  "ant"  "eagle"  "parrot"


mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),value=c("mammal","mammal","mammal","insect","bird","bird"))

  key      value
  dog      mammal
  cat      mammal
  elephant mammal
  ant      insect
  parrot   bird
  eagle    bird

The desired output would look like this:

output <- c("mammal", "insect", "bird", "insect", "bird", "bird")

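In the real dataset, I have to translate ~10,000 input vectors with an average length of ~15. The mapping data frame has on the order of one million keys, with roughly 100,000 unique classes on the value side.

The problem itself seems basic to me, but the bottleneck is runtime. In other programming languages you would probably use a HashMap for the mapping and then loop over the vector. So far, every solution I have tried in R has been orders of magnitude slower than a simple HashMap-based one in Java or Python (see the comments below).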

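Is there a more efficient data structure than a data frame for storing the mapping?

What is the most runtime-efficient solution to this problem in R?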

avi*_*seR 5

There is a package called hashmap that is well suited to this:

library(hashmap)

hash_lookup = hashmap(mapping$key, mapping$value)

output = hash_lookup[[vector]]

Result:

> hash_lookup
## (character) => (character)
##       [cat] => [mammal]   
##  [elephant] => [mammal]   
##       [ant] => [insect]   
##       [dog] => [mammal]   
##     [eagle] => [bird]     
##    [parrot] => [bird]     

> output
[1] "mammal" "insect" "bird"   "insect" "bird"   "bird"

Data:

vector <- c("dog","ant","eagle","ant","eagle","parrot")

mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),
                      value=c("mammal","mammal","mammal","insect","bird","bird"),
                      stringsAsFactors = FALSE)

Note:

This still needs to be tested on a larger dataset, but the approach should be very fast because it is implemented internally with Rcpp.
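For a dependency-free point of comparison, the same lookup can be sketched in base R with `match()`, which is vectorized and, as far as I know, hashes the lookup table internally. This is an alternative to the `hashmap` call above rather than part of that approach, and it has not been timed here on data of the size described in the question.

```r
# Base R sketch: vectorized key -> value lookup with match(), no extra packages.
vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")

mapping <- data.frame(
  key   = c("dog", "cat", "elephant", "ant", "parrot", "eagle"),
  value = c("mammal", "mammal", "mammal", "insect", "bird", "bird"),
  stringsAsFactors = FALSE
)

# For each element of `vector`, match() returns the row index of the first
# matching key in mapping$key (NA if the key is absent).
idx <- match(vector, mapping$key)

# Index the value column with those row positions.
output <- mapping$value[idx]
print(output)
```

With many short input vectors, it is probably cheaper to concatenate them, do a single lookup, and split the result again than to call `match()` once per vector, but that is an assumption worth checking with a benchmark.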


CJB*_*CJB 0

How about a list? Start with:

FamLst <- list(mammal = c("elephant", "dog"), bird = c("parrot", "eagle"))

You can then add to the list bit by bit. For example, FamLst$mammal returns all the mammals. To test whether "dog" is a mammal, use "dog" %in% FamLst$mammal
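To get from this list to the vector-to-vector translation the question asks for, one option is to invert the list into a named character vector keyed by animal. The sketch below assumes the list has been extended to cover the remaining animals (the `insect` entry and `"cat"` are added here purely for illustration), and `lookup` is a hypothetical name.

```r
# Sketch: invert a family -> members list into a member -> family lookup.
FamLst <- list(mammal = c("elephant", "dog"), bird = c("parrot", "eagle"))

# Add entries incrementally (illustrative additions, not in the original list).
FamLst$insect <- "ant"
FamLst$mammal <- c(FamLst$mammal, "cat")

# Repeat each family name once per member, and name the result by member.
lookup <- setNames(rep(names(FamLst), lengths(FamLst)),
                   unlist(FamLst, use.names = FALSE))

vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")

# Indexing by name translates the whole vector at once;
# animals missing from the list come back as NA.
output <- unname(lookup[vector])
print(output)
```

Calling `%in%` once per family scans every family for every animal, whereas the inverted vector resolves all animals in a single indexing step.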