Data frames:
df 1: contains multiple rows sharing the same id, with 500 value columns
id|val.1|val.2|...|val.500
---------------------------------
1 | 240 | 234 |...|228
1 | 224 | 222 |...|230
1 | 238 | 240 |...|240
2 | 277 | 270 |...|255
2 | 291 | 290 |...|265
2 | 284 | 282 |...|285
df 2: contains exactly one row per unique id, matching the id column of df 1, with the same 500 value columns
id|val.1|val.2|...|val.500
---------------------------------
1 | 250 | 240 |...|245
2 | 280 | 282 |...|281
I want to divide the column values of df 1 by the corresponding column values in df 2, matched on id, to end up with df 3:
id|val.1|val.2|...|val.500
---------------------------------
1 | 0.96| 0.98|...|0.93
1 | 0.90| 0.93|...|0.94
1 | 0.95| 1.00|...|0.98
2 | 0.99| 0.96|...|0.91
2 | 1.04| 1.03|...|0.94
2 | 1.01| 1.00|...|1.01
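Basically, I want to weight the df 1 values by the df 2 values according to id and column. I have been scratching my head over the best way to do this and have not made much progress. Any guidance would be appreciated. Thanks.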
Two possible approaches:
1: The "wide" approach
Using the dplyr and purrr packages:
library(dplyr)
library(purrr)
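# after the join, the value columns of df1 come first, followed by those of df2
df12 <- left_join(df1, df2, by = 'id')
# the ranges 2:4 and 5:7 fit the 3-column sample data; widen them for more value columns
cbind(id = df12[,1], map2_df(df12[,2:4], df12[,5:7], `/`))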
Using the data.table package (approach borrowed from elsewhere):
library(data.table)
# convert to 'data.tables'
setDT(df1)
setDT(df2)
# create two vectors of matching column names: df1's columns and their 'i.'-prefixed df2 counterparts
xcols = names(df1)[-1]
icols = paste0("i.", xcols)
# join and do the calculation
df1[df2, on = 'id', Map('/', mget(xcols), mget(icols)), by = .EACHI]
Both give:
id val.1 val.2 val.3
1: 1 0.9600000 0.9750000 0.9306122
2: 1 0.8960000 0.9250000 0.9387755
3: 1 0.9520000 1.0000000 0.9795918
4: 2 0.9892857 0.9574468 0.9074733
5: 2 1.0392857 1.0283688 0.9430605
6: 2 1.0142857 1.0000000 1.0142349
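As a package-free alternative to the two options above (not part of the original answer), the wide calculation can also be sketched in base R. The idea is to use `match()` to look up, for every row of df1, the row of df2 with the same id, and then divide the two equally shaped blocks of value columns. The names `val_cols` and `idx` are made up for this illustration, and the sketch assumes that every value column of df1 also exists in df2 under the same name.

```r
# Sample data from the question, cut down to three value columns.
df1 <- data.frame(id    = c(1, 1, 1, 2, 2, 2),
                  val.1 = c(240, 224, 238, 277, 291, 284),
                  val.2 = c(234, 222, 240, 270, 290, 282),
                  val.3 = c(228, 230, 240, 255, 265, 285))
df2 <- data.frame(id    = c(1, 2),
                  val.1 = c(250, 280),
                  val.2 = c(240, 282),
                  val.3 = c(245, 281))

# All value columns, however many there are (3 here, 500 in the question).
val_cols <- setdiff(names(df1), "id")

# For every row of df1, the row number of the matching id in df2.
idx <- match(df1$id, df2$id)

# Repeat the df2 rows to line up with df1, then divide element-wise.
df3 <- df1
df3[val_cols] <- df1[val_cols] / df2[idx, val_cols]

df3
```

Because the columns are selected by name, the same code should carry over to the 500-column case without hard-coded positions. An id that is present in df1 but missing from df2 yields `NA` in `idx`, and therefore `NA` in the corresponding rows of the result.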
2: The "long" approach
Another option is to reshape both data frames into long format, then merge/join them and do the calculation.
Using the data.table package:
library(data.table)
dt1 <- melt(setDT(df1), id = 1)
dt2 <- melt(setDT(df2), id = 1)
dt1[dt2, on = c('id','variable'), value := value/i.value][]
Using the dplyr and tidyr packages:
library(dplyr)
library(tidyr)
df1 %>%
  gather(variable, value, -id) %>%
  left_join(., df2 %>% gather(variable, value, -id), by = c('id','variable')) %>%
  mutate(value = value.x/value.y) %>%
  select(id, variable, value)
Both give:
id variable value
1: 1 val.1 0.9600000
2: 1 val.1 0.8960000
3: 1 val.1 0.9520000
4: 2 val.1 0.9892857
5: 2 val.1 1.0392857
6: 2 val.1 1.0142857
7: 1 val.2 0.9750000
8: 1 val.2 0.9250000
9: 1 val.2 1.0000000
10: 2 val.2 0.9574468
11: 2 val.2 1.0283688
12: 2 val.2 1.0000000
13: 1 val.3 0.9306122
14: 1 val.3 0.9387755
15: 1 val.3 0.9795918
16: 2 val.3 0.9074733
17: 2 val.3 0.9430605
18: 2 val.3 1.0142349
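For comparison, the same long-format idea can be sketched in base R without any packages. This is an illustration rather than part of the original answer: `to_long()` is a hypothetical helper written for this example, and the suffixes `.num` and `.den` are arbitrary names for the numerator (df1) and denominator (df2) values.

```r
# Sample data from the question, cut down to three value columns.
df1 <- data.frame(id    = c(1, 1, 1, 2, 2, 2),
                  val.1 = c(240, 224, 238, 277, 291, 284),
                  val.2 = c(234, 222, 240, 270, 290, 282),
                  val.3 = c(228, 230, 240, 255, 265, 285))
df2 <- data.frame(id    = c(1, 2),
                  val.1 = c(250, 280),
                  val.2 = c(240, 282),
                  val.3 = c(245, 281))

# Hypothetical helper: stack all value columns into id / variable / value.
to_long <- function(df) {
  vals <- df[setdiff(names(df), "id")]
  data.frame(id       = rep(df$id, times = ncol(vals)),
             variable = rep(names(vals), each = nrow(df)),
             value    = unlist(vals, use.names = FALSE))
}

# Join on id and variable; the two value columns get the given suffixes.
long <- merge(to_long(df1), to_long(df2),
              by = c("id", "variable"), suffixes = c(".num", ".den"))

# Divide the df1 value by the matching df2 value and keep the result only.
long$value <- long$value.num / long$value.den
long <- long[c("id", "variable", "value")]

long
```

Note that `merge()` sorts the result by the key columns, so the rows may come out in a different order than in the output shown above; the values themselves should be the same.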
Data used:
df1 <- structure(list(id = c(1, 1, 1, 2, 2, 2), val.1 = c(240, 224, 238, 277, 291, 284),
val.2 = c(234, 222, 240, 270, 290, 282), val.3 = c(228, 230, 240, 255, 265, 285)),
.Names = c("id", "val.1", "val.2", "val.3"), class = "data.frame", row.names = c(NA, -6L))
df2 <- structure(list(id = c(1, 2), val.1 = c(250, 280), val.2 = c(240, 282), val.3 = c(245, 281)),
.Names = c("id", "val.1", "val.2", "val.3"), class = "data.frame", row.names = c(NA, -2L))