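I'm working with data using dplyr. After grouping the data, I would like to subtract the first or second value in each group from all values (i.e. subtract a baseline). Is it possible to do this in a single pipe step?
MWE: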
test <- tibble(one=c("c","d","e","c","d","e"), two=c("a","a","a","b","b","b"), three=1:6)
test %>% group_by(`two`) %>% mutate(new=three-three[.$`one`=="d"])
My desired output is:
# A tibble: 6 x 4
# Groups: two [2]
one two three new
<chr> <chr> <int> <int>
1 c a 1 -1
2 d a 2 0
3 e a 3 1
4 c b 4 -1
5 d b 5 0
6 e b 6 1
But this is the output I get:
# A tibble: 6 x 4
# Groups: two [2]
one two three new
<chr> <chr> <int> <int>
1 c a 1 -1
2 d a 2 NA
3 e a 3 1
4 c b 4 -1
5 d b 5 NA
6 e b 6 1
We can use first from dplyr:
test %>%
  group_by(two) %>%
  mutate(new = three - first(three))
# A tibble: 6 x 4
# Groups: two [2]
# one two three new
# <chr> <chr> <int> <int>
#1 c a 1 0
#2 d a 2 1
#3 e a 3 2
#4 c b 4 0
#5 d b 5 1
#6 e b 6 2
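If we are subsetting the 'three' values based on the string "c" in 'one', then we don't need .$ because .$one refers to the whole ungrouped column 'one' rather than the values within the current group, so the logical index no longer lines up with the group's rows: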
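The question also asks about the second value in each group. A minimal sketch, assuming the same test tibble as in the question, uses dplyr's nth(), which is evaluated per group just like first(). The name res_nth is only for illustration.

```r
library(dplyr)

# Same example data as in the question
test <- tibble(one = c("c", "d", "e", "c", "d", "e"),
               two = c("a", "a", "a", "b", "b", "b"),
               three = 1:6)

# Subtract the second value of 'three' within each group of 'two'
res_nth <- test %>%
  group_by(two) %>%
  mutate(new = three - nth(three, 2)) %>%
  ungroup()
```

Because this picks the baseline by position, it depends on the row order within each group.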
test %>%
  group_by(two) %>%
  mutate(new = three - three[one == "c"])
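The desired output in the question uses the row where 'one' is "d" as the baseline rather than "c". A minimal sketch of that variant follows, assuming every group contains exactly one row with one == "d" (with zero or several matches the subtraction would not be well defined). The name res_d is only for illustration.

```r
library(dplyr)

# Same example data as in the question
test <- tibble(one = c("c", "d", "e", "c", "d", "e"),
               two = c("a", "a", "a", "b", "b", "b"),
               three = 1:6)

# Inside a grouped mutate, bare column names refer to the current group,
# so one == "d" is evaluated against that group's rows only.
# Assumption: exactly one "d" row per group.
res_d <- test %>%
  group_by(two) %>%
  mutate(new = three - three[one == "d"]) %>%
  ungroup()
```

Unlike the positional approach, this does not depend on the row order within each group.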