I need to translate the values in a vector according to a mapping of key-value pairs:
vector <- c("dog","ant","eagle","ant","eagle","parrot")
"dog" "ant" "eagle" "ant" "eagle" "parrot"
mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),value=c("mammal","mammal","mammal","insect","bird","bird"))
key value
dog mammal
cat mammal
elephant mammal
ant insect
parrot bird
eagle bird
The desired output would look like this:
output <- c("mammal", "insect", "bird", "insect", "bird", "bird")
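In the real data set I have to translate ~10000 input vectors with an average length of ~15, and the mapping data frame has on the order of one million keys, with roughly 100000 unique classes on the value side.
The problem itself seems basic to me, but the bottleneck is run time. In other programming languages you would probably use a HashMap for the mapping and then loop over the vectors. So far, every solution I have found in R is orders of magnitude slower than a simple HashMap-based one in Java or Python (see the comments below).
Is there a more efficient data structure than a data frame for storing the mapping?
What is the most run-time-efficient solution to this problem in R?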
There is a package called hashmap that is well suited to this purpose:
library(hashmap)
hash_lookup = hashmap(mapping$key, mapping$value)
output = hash_lookup[[vector]]
Result:
> hash_lookup
## (character) => (character)
## [cat] => [mammal]
## [elephant] => [mammal]
## [ant] => [insect]
## [dog] => [mammal]
## [eagle] => [bird]
## [parrot] => [bird]
> output
[1] "mammal" "insect" "bird" "insect" "bird" "bird"
Data:
vector <- c("dog","ant","eagle","ant","eagle","parrot")
mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),
value=c("mammal","mammal","mammal","insect","bird","bird"),
stringsAsFactors = FALSE)
Note:
This still has to be tested on a larger data set, but the approach should be very fast because it is implemented internally with Rcpp.
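For comparison, the same lookup can be sketched in base R without an extra package. This is a minimal sketch and not part of the original answer: it assumes the keys in mapping are unique, and the names vectors, flat, mapped, groups, outputs, env and output_env are made up for illustration. The idea is that match() is vectorized, so flattening the many short input vectors into one long vector lets the lookup run once instead of once per vector. An environment is shown as well, since it is the closest base R counterpart to a HashMap. No timings are claimed here; both variants would need benchmarking against the real data.

```r
vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")
mapping <- data.frame(
  key   = c("dog", "cat", "elephant", "ant", "parrot", "eagle"),
  value = c("mammal", "mammal", "mammal", "insect", "bird", "bird"),
  stringsAsFactors = FALSE
)

# Single vector: match() gives the row index of each key (NA for unknown keys).
output <- mapping$value[match(vector, mapping$key)]

# Many short vectors (hypothetical list `vectors`):
# flatten, look everything up in one call, then split back by origin.
vectors <- list(c("dog", "ant"), c("eagle", "parrot", "cat"))
flat    <- unlist(vectors, use.names = FALSE)
mapped  <- mapping$value[match(flat, mapping$key)]
groups  <- factor(rep(seq_along(vectors), lengths(vectors)),
                  levels = seq_along(vectors))  # keeps empty input vectors
outputs <- split(mapped, groups)

# Environment used as a hash map: one binding per key.
# Note that mget() raises an error for a key that is not present.
env <- list2env(as.list(setNames(mapping$value, mapping$key)), hash = TRUE)
output_env <- unlist(mget(vector, envir = env), use.names = FALSE)
```

Here output and output_env hold the translated vector, and outputs is a list with one translated vector per element of vectors.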
What about a list? Start with:
FamLst <- list(mammal = c("elephant", "dog"), bird = c("parrot", "eagle"))
You can then add to the list bit by bit. For example, you can bring up the list of all mammals with FamLst$mammal. If you want to test whether "dog" is a mammal, use "dog" %in% FamLst$mammal.
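To get from such a list to the per-element translation asked for in the question, the list can be inverted into a named vector. The following is a minimal sketch, assuming each animal appears under exactly one family; the name lookup and the added entries ("ant", "cat") are introduced here for illustration only. Animals that are not in FamLst come back as NA.

```r
FamLst <- list(mammal = c("elephant", "dog"), bird = c("parrot", "eagle"))

# Add to the list bit by bit.
FamLst$insect <- "ant"
FamLst$mammal <- c(FamLst$mammal, "cat")

# Membership test from the answer.
is_mammal <- "dog" %in% FamLst$mammal

# Invert the list: element names are animals, values are families.
lookup <- setNames(rep(names(FamLst), lengths(FamLst)),
                   unlist(FamLst, use.names = FALSE))

# Translate the input vector by indexing the named vector with the keys.
vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")
output <- unname(lookup[vector])
```

The list keeps families as the primary grouping, which is convenient for membership tests, while the inverted lookup serves the key-to-value direction.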