Comparing values within a column inside groups in dplyr

Jav*_*rdo 3 r dplyr

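I want to use dplyr to compare values within a grouped data.frame and create a dummy variable (or something similar) indicating which one is larger. I can't figure it out!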

Here is some reproducible code:

table <- structure(list(species = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Adelophryne adiastola", 
"Adelophryne gutturosa"), class = "factor"), scenario = structure(c(3L, 
1L, 2L, 3L, 1L, 2L), .Label = c("future1", "future2", "present"
), class = "factor"), amount = c(5L, 3L, 2L, 50L, 60L, 40L)), .Names = c("species", 
"scenario", "amount"), class = "data.frame", row.names = c(NA, 
-6L))
> table
                species scenario amount
1 Adelophryne adiastola  present      5
2 Adelophryne adiastola  future1      3
3 Adelophryne adiastola  future2      2
4 Adelophryne gutturosa  present     50
5 Adelophryne gutturosa  future1     60
6 Adelophryne gutturosa  future2     40

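I would group the df by species. I want to create a new column, say increase_amount, in which each "future" amount is compared with the "present" amount. I would get 1 when the value increases and 0 when it decreases.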

I have been trying a for loop that goes through each species, but the df contains more than 50,000 species, and it takes too long given how often I have to redo the operation...

Does anyone know a way to do this? Many thanks!

Sca*_*bee 5

You can do it like this:

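library(dplyr)

table %>% 
  group_by(species) %>% 
  # tmp holds the "present" amount of each species, recycled to every row of the group
  mutate(tmp = amount[scenario == "present"]) %>% 
  # 1 if the row's amount exceeds the present amount, otherwise 0
  mutate(increase_amount = ifelse(amount > tmp, 1, 0))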
# Source: local data frame [6 x 5]
# Groups: species [2]
# 
#                 species scenario amount   tmp increase_amount
#                  <fctr>   <fctr>   <int> <int>           <dbl>
# 1 Adelophryne adiastola  present      5     5               0
# 2 Adelophryne adiastola  future1      3     5               0
# 3 Adelophryne adiastola  future2      2     5               0
# 4 Adelophryne gutturosa  present     50    50               0
# 5 Adelophryne gutturosa  future1     60    50               1
# 6 Adelophryne gutturosa  future2     40    50               0
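If the helper column tmp is not needed afterwards, the comparison can probably be folded into a single mutate(). The following is only a sketch under a few assumptions: every species has exactly one "present" row (a species without one would need separate handling), the data frame is named df here rather than table so that it does not mask base R's table() function, and the data is rebuilt with data.frame() purely for brevity.

```r
library(dplyr)

# Same example values as in the question, rebuilt with data.frame() for brevity.
# The name df is an assumption, chosen to avoid masking base::table().
df <- data.frame(
  species  = rep(c("Adelophryne adiastola", "Adelophryne gutturosa"), each = 3),
  scenario = rep(c("present", "future1", "future2"), times = 2),
  amount   = c(5L, 3L, 2L, 50L, 60L, 40L)
)

result <- df %>%
  group_by(species) %>%
  # Compare each row with its group's "present" amount in one step.
  # Assumes exactly one "present" row per species.
  mutate(increase_amount = as.integer(amount > amount[scenario == "present"])) %>%
  ungroup()

print(result)
```

Using as.integer() on the logical comparison gives an integer 0/1 column instead of the double produced by ifelse(amount > tmp, 1, 0), and ungroup() drops the grouping so that later steps do not run per species by accident. As in the original answer, the "present" row itself gets 0 because it is compared with its own value.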