How do I select the 'x' most recent values in each group in R?

Wil*_*T-E 2 r greatest-n-per-group

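I am trying to select/filter the most recent values within each group of a data frame in R. For example, I want to select the 3 most recent values (i.e. the dates closest to today) from each Player group in the following data frame: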

Player  Date    Result
 Sam    03/15/2015  1
 Sam    03/22/2015  0
 Sam    04/04/2015  2
 Sam    04/12/2015  1
 Sam    04/18/2015  1
 Sam    04/26/2015  0
 Sam    08/08/2015  3
Steve   02/17/2015  0
Steve   02/21/2015  0
Steve   03/04/2015  4
Steve   03/11/2015  2
Steve   03/15/2015  1
Steve   03/22/2015  0
Steve   04/12/2015  0
Steve   04/18/2015  2
Steve   04/26/2015  1
Steve   04/29/2015  2
Steve   08/16/2015  4
Jasper  03/15/2015  3
Jasper  03/22/2015  3.5
Jasper  04/04/2015  4
Jasper  04/12/2015  4
Jasper  04/18/2015  5
Jasper  04/26/2015  0

I have already written the as.Date() code, so R now understands the date format, but what code can I use to select only the 3 (say) most recent values from each group?

akr*_*run 7

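We can use dplyr. We convert 'Date' to the Date class with as.Date. After grouping by 'Player', we arrange the 'Date' column in descending order and use slice to get the 3 most recent values. If we don't want to change the class of 'Date', we can drop the mutate step and do the conversion inside arrange, i.e. arrange(desc(as.Date(Date, '%m/%d/%Y')))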

library(dplyr)
df1 %>%
    mutate(Date=as.Date(Date, '%m/%d/%Y')) %>%   # parse the month/day/year strings as Date
    group_by(Player) %>% 
    arrange(desc(Date)) %>%                      # newest rows first
    slice(1:3)                                   # first 3 rows of each Player group
#    Player       Date Result
#1 Jasper 2015-04-26      0
#2 Jasper 2015-04-18      5
#3 Jasper 2015-04-12      4
#4    Sam 2015-08-08      3
#5    Sam 2015-04-26      0
#6    Sam 2015-04-18      1
#7  Steve 2015-08-16      4
#8  Steve 2015-04-29      2
#9  Steve 2015-04-26      1
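
The variant mentioned above, where the conversion happens inside arrange so that the 'Date' column keeps its original character class, could look roughly like the sketch below. To keep it self-contained it uses a cut-down copy of the question's data (the last four rows per player), and the name `res` is only there for illustration.

```r
library(dplyr)

# Cut-down copy of the question's data: the last four rows per player
df1 <- data.frame(
  Player = rep(c("Sam", "Steve", "Jasper"), each = 4),
  Date = c("04/12/2015", "04/18/2015", "04/26/2015", "08/08/2015",
           "04/18/2015", "04/26/2015", "04/29/2015", "08/16/2015",
           "04/04/2015", "04/12/2015", "04/18/2015", "04/26/2015"),
  Result = c(1, 1, 0, 3, 2, 1, 2, 4, 4, 4, 5, 0),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(Player) %>%
  # sort on the parsed date only; the Date column itself is left untouched
  arrange(desc(as.Date(Date, "%m/%d/%Y"))) %>%
  slice(1:3) %>%   # first 3 rows of each Player group
  ungroup()

res
```

The selected rows should match the ones shown above, except that 'Date' is still printed in its original month/day/year text form.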

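Alternatively, after grouping by 'Player', we can use top_n, specifying 'n' and the 'wt' variable to rank by. Note that top_n does not sort its output, so the rows stay in their original order.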

df1 %>% 
   mutate(Date=as.Date(Date, '%m/%d/%Y')) %>%
   group_by(Player)  %>%
   top_n(n = 3, Date)   # keep the 3 rows with the largest Date in each group
#  Player       Date Result
#1    Sam 2015-04-18      1
#2    Sam 2015-04-26      0
#3    Sam 2015-08-08      3
#4  Steve 2015-04-26      1
#5  Steve 2015-04-29      2
#6  Steve 2015-08-16      4
#7 Jasper 2015-04-12      4
#8 Jasper 2015-04-18      5
#9 Jasper 2015-04-26      0
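
In dplyr 1.0.0 and later, top_n is superseded by slice_max. A minimal sketch of the same selection with that function follows, assuming such a dplyr version is installed. It again uses a cut-down copy of the question's data, and `res` is an illustrative name. Setting with_ties = FALSE caps each group at exactly 3 rows, whereas top_n keeps tied rows.

```r
library(dplyr)

# Cut-down copy of the question's data: the last four rows per player
df1 <- data.frame(
  Player = rep(c("Sam", "Steve", "Jasper"), each = 4),
  Date = c("04/12/2015", "04/18/2015", "04/26/2015", "08/08/2015",
           "04/18/2015", "04/26/2015", "04/29/2015", "08/16/2015",
           "04/04/2015", "04/12/2015", "04/18/2015", "04/26/2015"),
  Result = c(1, 1, 0, 3, 2, 1, 2, 4, 4, 4, 5, 0),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
  group_by(Player) %>%
  # the 3 rows with the latest Date per group; ties do not add extra rows
  slice_max(order_by = Date, n = 3, with_ties = FALSE) %>%
  ungroup()

res
```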

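Using data.table, we convert the 'data.frame' to a 'data.table' (setDT(df1)). We order the rows by 'Date' in descending order after converting it to a date class (as.IDate) and then, grouped by 'Player', use head to get the first 3 rows of each group. The 'Date' column itself is not modified, so it keeps its original format in the output.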

library(data.table)
# sort newest first in i, then take the first 3 rows of each Player group in j
setDT(df1)[order(-as.IDate(Date, '%m/%d/%Y')),head(.SD, 3) , by = Player]
#   Player       Date Result
#1:  Steve 08/16/2015      4
#2:  Steve 04/29/2015      2
#3:  Steve 04/26/2015      1
#4:    Sam 08/08/2015      3
#5:    Sam 04/26/2015      0
#6:    Sam 04/18/2015      1
#7: Jasper 04/26/2015      0
#8: Jasper 04/18/2015      5
#9: Jasper 04/12/2015      4
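
If neither package is available, the same "sort, then keep the first 3 rows per group" idea can be sketched in base R. This is not part of the approaches above, only an illustration under the same assumptions. It uses a cut-down copy of the question's data, and `parsed`, `sorted`, `idx` and `res` are illustrative names.

```r
# Cut-down copy of the question's data: the last four rows per player
df1 <- data.frame(
  Player = rep(c("Sam", "Steve", "Jasper"), each = 4),
  Date = c("04/12/2015", "04/18/2015", "04/26/2015", "08/08/2015",
           "04/18/2015", "04/26/2015", "04/29/2015", "08/16/2015",
           "04/04/2015", "04/12/2015", "04/18/2015", "04/26/2015"),
  Result = c(1, 1, 0, 3, 2, 1, 2, 4, 4, 4, 5, 0),
  stringsAsFactors = FALSE
)

# Parse the dates without touching the original column
parsed <- as.Date(df1$Date, "%m/%d/%Y")

# Sort by player, then by date with the newest first
sorted <- df1[order(df1$Player, -as.numeric(parsed)), ]

# Running row number within each player; keep rows 1 to 3
idx <- ave(seq_len(nrow(sorted)), sorted$Player, FUN = seq_along)
res <- sorted[idx <= 3, ]

res
```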

Data

df1 <- structure(list(Player = c("Sam", "Sam", "Sam", "Sam", "Sam", 
"Sam", "Sam", "Steve", "Steve", "Steve", "Steve", "Steve", "Steve", 
"Steve", "Steve", "Steve", "Steve", "Steve", "Jasper", "Jasper", 
"Jasper", "Jasper", "Jasper", "Jasper"), Date = c("03/15/2015", 
"03/22/2015", "04/04/2015", "04/12/2015", "04/18/2015", "04/26/2015", 
"08/08/2015", "02/17/2015", "02/21/2015", "03/04/2015", "03/11/2015", 
"03/15/2015", "03/22/2015", "04/12/2015", "04/18/2015", "04/26/2015", 
"04/29/2015", "08/16/2015", "03/15/2015", "03/22/2015", "04/04/2015", 
"04/12/2015", "04/18/2015", "04/26/2015"), Result = c(1, 0, 2, 
1, 1, 0, 3, 0, 0, 4, 2, 1, 0, 0, 2, 1, 2, 4, 3, 3.5, 4, 4, 5, 
0)), .Names = c("Player", "Date", "Result"),
class = "data.frame", row.names = c(NA,  -24L))