How do I perform a rowwise operation that uses values from other rows (in dplyr/tidy style)? Suppose I have this df:
library(dplyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(value = c(5, 6, 7, 3, 4),
             group = c(1, 2, 2, 3, 3),
             group.to.use = c(2, 3, 3, 1, 1))
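I want to create a new variable, new.value, equal to each row's own value plus the maximum value among the rows whose "group" equals that row's "group.to.use". So for the first row:
new.value = 5 + max(value[group == 2]) = 5 + 7 = 12
Desired output: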
# A tibble: 5 x 4
value group group.to.use new.value
<dbl> <dbl> <dbl> <dbl>
1 5. 1. 2. 12.
2 6. 2. 3. 10.
3 7. 2. 3. 11.
4 3. 3. 1. 8.
5 4. 3. 1. 9.
Pseudocode:
df %<>%
  mutate(new.value = value + max(value[group == <group.to.use for this row>]))
In a rowwise operation, you can refer to the whole data.frame as ., and to an entire column with the usual data.frame syntax, .$colname or .[['col.name']]:
df %>%
  rowwise() %>%
  mutate(new.value = value + max(.$value[.$group == group.to.use])) %>%
  ungroup()
# # A tibble: 5 x 4
# value group group.to.use new.value
# <dbl> <dbl> <dbl> <dbl>
# 1 5 1 2 12
# 2 6 2 3 10
# 3 7 2 3 11
# 4 3 3 1 8
# 5 4 3 1 9
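The same per-row lookup can also be spelled out without rowwise(), which may make it easier to see what is being computed. The following is only a minimal sketch, assuming the df from the question; max.of.target is a name made up for this illustration, and vapply() is base R rather than anything dplyr-specific.

```r
library(dplyr)

df <- tibble(value = c(5, 6, 7, 3, 4),
             group = c(1, 2, 2, 3, 3),
             group.to.use = c(2, 3, 3, 1, 1))

# For each row's group.to.use, take the max of `value` over the rows
# whose `group` matches it. (max.of.target is a hypothetical helper name.)
max.of.target <- vapply(
  df$group.to.use,
  function(g) max(df$value[df$group == g]),
  numeric(1)
)

# Add the looked-up maximum to each row's own value.
result <- df %>% mutate(new.value = value + max.of.target)
result$new.value
# 12 10 11  8  9
```

Like the rowwise() version, this scans the whole column once per row, so it is probably best suited to small data.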
Alternatively, you can precompute the maximum for each group and then do a left join:
df.max <- df %>%
  group_by(group) %>%
  summarise(max.value = max(value))
df %>%
  left_join(df.max, by = c('group.to.use' = 'group')) %>%
  mutate(new.value = value + max.value) %>%
  select(-max.value)
# # A tibble: 5 x 4
# value group group.to.use new.value
# <dbl> <dbl> <dbl> <dbl>
# 1 5 1 2 12
# 2 6 2 3 10
# 3 7 2 3 11
# 4 3 3 1 8
# 5 4 3 1 9
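One difference between the two approaches shows up when a group.to.use value has no matching group. This case is not part of the original question, so the sketch below uses a made-up df2 whose last row points at a nonexistent group 9. The join leaves NA for that row, while max() over an empty selection returns -Inf with a warning, which the rowwise version then adds to value.

```r
library(dplyr)

# Hypothetical data: the last row refers to group 9, which does not exist.
df2 <- tibble(value = c(5, 6, 7),
              group = c(1, 2, 2),
              group.to.use = c(2, 1, 9))

# Join approach: an unmatched key gets NA for max.value.
df2.max <- df2 %>%
  group_by(group) %>%
  summarise(max.value = max(value))

joined <- df2 %>%
  left_join(df2.max, by = c("group.to.use" = "group")) %>%
  mutate(new.value = value + max.value) %>%
  select(-max.value)
joined$new.value
# 12 11 NA

# Rowwise approach: max() of an empty vector is -Inf (with a warning,
# suppressed here), so the last row becomes 7 + -Inf = -Inf.
rowwise.result <- suppressWarnings(
  df2 %>%
    rowwise() %>%
    mutate(new.value = value + max(.$value[.$group == group.to.use])) %>%
    ungroup()
)
rowwise.result$new.value
# 12 11 -Inf
```

Which behaviour is preferable depends on the data; NA from the join is usually easier to detect downstream.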