jbu*_*unk 36 lookup r dataframe
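I'm having some trouble replacing values in a data frame. I want to replace values based on a separate table. Below is an example of what I'm trying to do.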
I have a table where each row is a customer and each column is an animal they purchased. Let's call this data frame table.
> table
# P1 P2 P3
# 1 cat lizard parrot
# 2 lizard parrot cat
# 3 parrot cat lizard
I also have a table I will use as a reference, called lookUp.
> lookUp
# pet class
# 1 cat mammal
# 2 lizard reptile
# 3 parrot bird
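What I want to do is create a new table called new, using a function that replaces every value in table with the matching value from the class column of lookUp. I tried this myself with an lapply function, but I got the following warnings.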
new <- as.data.frame(lapply(table, function(x) {
gsub('.*', lookUp[match(x, lookUp$pet) ,2], x)}), stringsAsFactors = FALSE)
Warning messages:
1: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used
2: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used
3: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
argument 'replacement' has length > 1 and only the first element will be used
Any ideas on how to make this work?
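The warnings come from gsub(): its replacement argument is meant to be a single string, so when it receives a vector only the first element is used. The lookup itself only needs match(). A minimal sketch, with both tables typed in by hand from the printed output above (x, idx and classes are hypothetical names):

```r
# Recreate the two tables from the printed output above
table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)
lookUp <- data.frame(pet   = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

x <- table$P2                 # one column: "lizard" "parrot" "cat"

# match() returns, for each element of x, its row position in lookUp$pet
idx <- match(x, lookUp$pet)   # 2 3 1

# indexing the class column by those positions performs the lookup,
# so no gsub() call is needed
classes <- lookUp$class[idx]  # "reptile" "bird" "mammal"
classes
```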
tal*_*lat 34
The approach you posted in your question was not bad. Here is a similar one:
new <- df # create a copy of df
# using lapply, loop over columns and match values to the look up table. store in "new".
new[] <- lapply(df, function(x) look$class[match(x, look$pet)])
Another, faster approach is:
new <- df
new[] <- look$class[match(unlist(df), look$pet)]
Note that in both cases I use empty brackets ([]) to keep the structure of new (a data.frame).
(In my answer I use df instead of table, and look instead of lookUp.)
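To see what the empty brackets do, here is a small sketch comparing the plain lapply() result with the [] assignment, using df and look as in the answer (the data is typed in by hand; as_list and new2 are hypothetical names):

```r
df <- data.frame(P1 = c("cat", "lizard", "parrot"),
                 P2 = c("lizard", "parrot", "cat"),
                 P3 = c("parrot", "cat", "lizard"),
                 stringsAsFactors = FALSE)
look <- data.frame(pet   = c("cat", "lizard", "parrot"),
                   class = c("mammal", "reptile", "bird"),
                   stringsAsFactors = FALSE)

# Without [], lapply() just returns a list
as_list <- lapply(df, function(x) look$class[match(x, look$pet)])
class(as_list)       # "list"

# With [], the values are written into a copy of df, which stays a data.frame
new <- df
new[] <- lapply(df, function(x) look$class[match(x, look$pet)])

# Vectorised version: unlist() stacks all columns into one vector,
# and `new2[] <-` fills that vector back in column by column
new2 <- df
new2[] <- look$class[match(unlist(df), look$pet)]
class(new2)          # "data.frame"
```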
Thi*_*rry 20
Another option is to combine tidyr and dplyr:
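library(dplyr)
library(tidyr)
table %>%
  mutate(id = row_number()) %>%                 # row id so the original rows can be rebuilt
  gather(key = "col", value = "pet", -id) %>%   # long format: one row per (id, column)
  left_join(lookUp, by = "pet") %>%             # add the class of each pet
  select(-pet) %>%
  spread(key = col, value = class) %>%          # back to wide: one column per P*
  select(-id)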
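In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same reshape, join, reshape idea with the newer functions, assuming both packages are installed (the id and col columns are hypothetical helpers, and the data is typed in by hand):

```r
library(dplyr)
library(tidyr)

table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)
lookUp <- data.frame(pet   = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

new <- table %>%
  mutate(id = row_number()) %>%                       # remember the original row
  pivot_longer(-id, names_to = "col", values_to = "pet") %>%
  left_join(lookUp, by = "pet") %>%                   # one join for all columns
  select(-pet) %>%                                    # drop so only id identifies rows
  pivot_wider(names_from = col, values_from = class) %>%
  select(-id)
new
```

The id column matters: without it, pivot_wider() has nothing that tells it which values belonged to the same customer.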
Mic*_*ico 14
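Any time you have two separate data.frames and are trying to bring information from one into the other, the answer is to merge.
Everyone has their own favorite merge method in R; mine is data.table.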
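Also, since you want to do this for many columns, it will be faster to melt and dcast: rather than looping over the columns, apply the merge once to the reshaped table, then reshape it back.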
library(data.table)
#the row names will be our ID variable for melting
setDT(table, keep.rownames = TRUE)
setDT(lookUp)
#now melt, merge, recast
# melting (reshape wide to long)
table[ , melt(.SD, id.vars = 'rn')
# merging
][lookUp, new_value := i.class, on = c(value = 'pet')
#reform back to original shape
][ , dcast(.SD, rn ~ variable, value.var = 'new_value')]
# rn P1 P2 P3
# 1: 1 mammal reptile bird
# 2: 2 reptile bird mammal
# 3: 3 bird mammal reptile
If you find the dcast/melt bit a little intimidating, here is an approach that just loops over the columns; dcast/melt simply sidestep the loop for this problem.
setDT(table) #don't need row names this time
setDT(lookUp)
sapply(names(table), #(or to whichever are the relevant columns)
function(cc) table[lookUp, (cc) := #merge, replace
#need to pass a _named_ vector to 'on', so use setNames
i.class, on = setNames("pet", cc)])
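The melt, merge, reshape idea does not depend on data.table. As a rough base R sketch (not the answer's code; long and wide are hypothetical names), the table can be stacked into long form, joined once with merge(), and put back into shape with matrix indexing:

```r
table <- data.frame(P1 = c("cat", "lizard", "parrot"),
                    P2 = c("lizard", "parrot", "cat"),
                    P3 = c("parrot", "cat", "lizard"),
                    stringsAsFactors = FALSE)
lookUp <- data.frame(pet   = c("cat", "lizard", "parrot"),
                     class = c("mammal", "reptile", "bird"),
                     stringsAsFactors = FALSE)

# "melt": one row per (row id, column, value); unlist() goes column by column
long <- data.frame(rn       = rep(seq_len(nrow(table)), times = ncol(table)),
                   variable = rep(names(table), each = nrow(table)),
                   pet      = unlist(table, use.names = FALSE),
                   stringsAsFactors = FALSE)

# "merge": a single join; all.x = TRUE keeps unmatched pets (class becomes NA)
long <- merge(long, lookUp, by = "pet", all.x = TRUE)

# "cast": write each class back to its (row, column) position;
# merge() reorders rows, so positions come from rn and variable, not row order
wide <- matrix(NA_character_, nrow = nrow(table), ncol = ncol(table),
               dimnames = list(NULL, names(table)))
wide[cbind(long$rn, match(long$variable, names(table)))] <- long$class
new <- as.data.frame(wide, stringsAsFactors = FALSE)
new
```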
Another option is to create a named vector, then loop over each column and match:
# make lookup vector with names
lookUp1 <- setNames(as.character(lookUp$class), lookUp$pet)
lookUp1
# cat lizard parrot
# "mammal" "reptile" "bird"
# match on names, get values from lookup vector
# (as.character() so factor columns are matched by name, not by integer code)
res <- data.frame(lapply(df1, function(i) lookUp1[as.character(i)]))
# reset rownames
rownames(res) <- NULL
# res
# P1 P2 P3
# 1 mammal reptile bird
# 2 reptile bird mammal
# 3 bird mammal reptile
# example data
df1 <- read.table(text = "
P1 P2 P3
1 cat lizard parrot
2 lizard parrot cat
3 parrot cat lizard", header = TRUE)
lookUp <- read.table(text = "
pet class
1 cat mammal
2 lizard reptile
3 parrot bird", header = TRUE)
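One caveat with the named-vector approach: it only looks up by name when the index is character. Before R 4.0.0, read.table() and data.frame() defaulted to stringsAsFactors = TRUE, so the columns of df1 would be factors, and indexing a vector with a factor uses the factor's integer codes rather than its labels. A small sketch of the difference (pets, by_code and by_name are hypothetical names):

```r
lookUp1 <- c(cat = "mammal", lizard = "reptile", parrot = "bird")

pets <- factor(c("parrot", "cat"))
# levels are sorted (cat, parrot), so the codes are parrot = 2, cat = 1
as.integer(pets)

# indexing with the factor uses those codes as positions: wrong animals
by_code <- unname(lookUp1[pets])                # "reptile" "mammal"

# converting to character first makes the lookup go by name
by_name <- unname(lookUp1[as.character(pets)])  # "bird" "mammal"
```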