Situation
I have a data frame df:
df <- structure(list(person = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L), .Label = c("pA", "pB", "pC"), class = "factor"), date = structure(c(16071,
16102, 16130, 16161, 16071, 16102, 16130, 16071, 16102), class = "Date")), .Names = c("person",
"date"), row.names = c(NA, -9L), class = "data.frame")
> df
  person       date
1     pA 2014-01-01
2     pA 2014-02-01
3     pA 2014-03-01
4     pA 2014-04-01
5     pB 2014-01-01
6     pB 2014-02-01
7     pB 2014-03-01
8     pC 2014-01-01
9     pC 2014-02-01
Question
How can I select the last 2 (or n) entries for each person, ordered by date, so that I get the following data frame df1:
> df1
  person       date
1     pA 2014-03-01
2     pA 2014-04-01
3     pB 2014-02-01
4     pB 2014-03-01
5     pC 2014-01-01
6     pC 2014-02-01
I tried the following combination:
library(dplyr)
df1 <- df %>%
group_by(person) %>%
select(tail(df, 2))
No luck.
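Why the attempt fails: select() chooses columns, not rows, and tail(df, 2) is evaluated on the whole data frame, so it never sees the grouping. The sketch below rebuilds the example data with data.frame() (assumed equivalent to the dput() output above, just written out for readability) and shows what tail() alone returns.

```r
# Example data from the question, rebuilt with data.frame()
df <- data.frame(
  person = factor(c("pA", "pA", "pA", "pA", "pB", "pB", "pB", "pC", "pC")),
  date = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01",
                   "2014-01-01", "2014-02-01", "2014-03-01",
                   "2014-01-01", "2014-02-01"))
)

# tail() knows nothing about groups: it returns the last 2 rows of the
# entire data frame, which here both belong to pC
last_two <- tail(df, 2)
print(last_two)
#   person       date
# 8     pC 2014-01-01
# 9     pC 2014-02-01
```

The per-person selection therefore has to happen inside a grouped operation, which is what the answers below do.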
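You could try slice():
library(dplyr)
df %>%
  group_by(person) %>%
  arrange(date, person) %>%  # sorts ignoring groups, so each person's rows end up in date order
  slice((n()-1):n())         # keep the last two rows of each group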
#   person       date
# 1     pA 2014-03-01
# 2     pA 2014-04-01
# 3     pB 2014-02-01
# 4     pB 2014-03-01
# 5     pC 2014-01-01
# 6     pC 2014-02-01
Or, instead of the last step:
do(tail(., 2))
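In dplyr 1.0.0 and later, do() is superseded, and the "last n per group" step can be written with slice_tail(), which takes the count directly through its n argument. This is a sketch assuming a recent dplyr; n_last is a hypothetical variable name standing in for the "n" from the question, and the data is the question's df rebuilt with data.frame().

```r
library(dplyr)

# Example data from the question, rebuilt with data.frame()
df <- data.frame(
  person = factor(c("pA", "pA", "pA", "pA", "pB", "pB", "pB", "pC", "pC")),
  date = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01",
                   "2014-01-01", "2014-02-01", "2014-03-01",
                   "2014-01-01", "2014-02-01"))
)

n_last <- 2  # hypothetical name: how many rows to keep per person

df1 <- df %>%
  arrange(person, date) %>%   # date order within each person
  group_by(person) %>%
  slice_tail(n = n_last) %>%  # last n_last rows of every group
  ungroup()

print(df1)
```

If a person has fewer than n_last rows, slice_tail() returns all of that person's rows instead of failing.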
Using data.table:
library(data.table)
# order by person, then date; within each person keep the last 2 rows of .SD
setDT(df)[order(person, date), tail(.SD, 2L), by = person]
#    person       date
# 1:     pA 2014-03-01
# 2:     pA 2014-04-01
# 3:     pB 2014-02-01
# 4:     pB 2014-03-01
# 5:     pC 2014-01-01
# 6:     pC 2014-02-01
We order by person and then by date, group by person, and select the last two rows of each group's data subset .SD.
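For comparison, the same selection can be done in base R without any packages. This is a swapped-in technique rather than part of either answer: ave() numbers each row by its position counted from the end of its person's group, and rows numbered n_last or lower are kept. n_last, df_sorted and from_end are hypothetical names for this sketch.

```r
# Example data from the question, rebuilt with data.frame()
df <- data.frame(
  person = factor(c("pA", "pA", "pA", "pA", "pB", "pB", "pB", "pC", "pC")),
  date = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01",
                   "2014-01-01", "2014-02-01", "2014-03-01",
                   "2014-01-01", "2014-02-01"))
)

n_last <- 2  # hypothetical name: how many rows to keep per person

# Sort by person, then by date within person
df_sorted <- df[order(df$person, df$date), ]

# Position of each row counted from the end of its own group:
# the last row of each person gets 1, the one before it 2, and so on
from_end <- ave(seq_len(nrow(df_sorted)), df_sorted$person,
                FUN = function(i) rev(seq_along(i)))

df1 <- df_sorted[from_end <= n_last, ]
rownames(df1) <- NULL  # renumber rows 1..k as in the question's df1
print(df1)
```

This reproduces df1 from the question; changing n_last selects a different number of trailing rows per person.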