Group lag/lead in R and dplyr

ant*_*ant 6 r dplyr

I'm having trouble lagging the dates grouped by team, so that every row of a team gets the date of the previous team.

Data:

df <- data.frame(Team = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
                 Date = c("2016-05-10", "2016-05-10", "2016-05-10", "2016-05-10",
                          "2016-05-12", "2016-05-12", "2016-05-12",
                          "2016-05-15", "2016-05-15",
                          "2016-05-30", "2016-05-30"),
                 Points = c(1, 4, 3, 2, 1, 5, 6, 1, 2, 3, 9),
                 stringsAsFactors = FALSE)  # keep Date a character column on R < 4.0 too

Team      Date       Points
 A     2016-05-10      1
 A     2016-05-10      4
 A     2016-05-10      3
 A     2016-05-10      2
 B     2016-05-12      1
 B     2016-05-12      5
 B     2016-05-12      6
 C     2016-05-15      1
 C     2016-05-15      2
 D     2016-05-30      3
 D     2016-05-30      9

Expected result:

Team      Date       Points   Date_Lagged
 A     2016-05-10      1          NA
 A     2016-05-10      4          NA
 A     2016-05-10      3          NA
 A     2016-05-10      2          NA
 B     2016-05-12      1      2016-05-10 
 B     2016-05-12      5      2016-05-10 
 B     2016-05-12      6      2016-05-10 
 C     2016-05-15      1      2016-05-12
 C     2016-05-15      2      2016-05-12
 D     2016-05-30      3      2016-05-15
 D     2016-05-30      9      2016-05-15

I'm scratching my head, having realized that the following is not the right solution:

df %>% group_by(Date) %>% mutate(Date_lagged = lag(Date))  
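To see why that attempt fails, here is a minimal sketch (assuming dplyr is installed and the data are built with stringsAsFactors = FALSE so that Date is a character column). Grouping by Date leaves a single distinct date in each group, so lag() can only return NA for a group's first row or repeat that same date on the remaining rows; it never reaches the previous team's date. The name wrong is just an illustrative variable.

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE keeps Date a
# character column on R versions older than 4.0 as well.
df <- data.frame(Team = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
                 Date = c("2016-05-10", "2016-05-10", "2016-05-10", "2016-05-10",
                          "2016-05-12", "2016-05-12", "2016-05-12",
                          "2016-05-15", "2016-05-15",
                          "2016-05-30", "2016-05-30"),
                 Points = c(1, 4, 3, 2, 1, 5, 6, 1, 2, 3, 9),
                 stringsAsFactors = FALSE)

# Each Date group holds only one distinct date, so lag() yields NA on the
# group's first row and the group's own date on every other row.
wrong <- df %>%
  group_by(Date) %>%
  mutate(Date_lagged = lag(Date)) %>%
  ungroup()

print(wrong$Date_lagged)
```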

Any idea how to solve this?

akr*_*run 7

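By default, lag uses an offset of n = 1. However, 'Team' and 'Date' contain repeated elements, so lagging row by row only returns the previous row's (usually identical) date. To get the expected output, we take the distinct rows of 'Team' and 'Date', create 'Date_Lagged' by applying lag to 'Date', and then right_join (or left_join) the result with the original dataset.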

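library(dplyr)

distinct(df, Team, Date) %>%                       # one row per Team/Date pair
        mutate(Date_Lagged = lag(Date)) %>%        # date of the previous pair
        right_join(df, by = c("Team", "Date")) %>% # back to all original rows
        select(Team, Date, Points, Date_Lagged)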
#   Team       Date Points Date_Lagged
#1     A 2016-05-10      1        <NA>
#2     A 2016-05-10      4        <NA>
#3     A 2016-05-10      3        <NA>
#4     A 2016-05-10      2        <NA>
#5     B 2016-05-12      1  2016-05-10
#6     B 2016-05-12      5  2016-05-10
#7     B 2016-05-12      6  2016-05-10
#8     C 2016-05-15      1  2016-05-12
#9     C 2016-05-15      2  2016-05-12
#10    D 2016-05-30      3  2016-05-15
#11    D 2016-05-30      9  2016-05-15
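The join approach lags the distinct Team/Date rows in the order they first appear, which matches date order only because the example data are already sorted. A hedged variant that sorts them first, assuming "previous" means the chronologically previous date and that each team appears on a single date as in the example (lookup and res are illustrative names):

```r
library(dplyr)

df <- data.frame(Team = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
                 Date = c("2016-05-10", "2016-05-10", "2016-05-10", "2016-05-10",
                          "2016-05-12", "2016-05-12", "2016-05-12",
                          "2016-05-15", "2016-05-15",
                          "2016-05-30", "2016-05-30"),
                 Points = c(1, 4, 3, 2, 1, 5, 6, 1, 2, 3, 9),
                 stringsAsFactors = FALSE)

# One row per Team/Date pair, put in chronological order before lagging
# (assumes each team has a single date, as in the example data).
lookup <- df %>%
  distinct(Team, Date) %>%
  mutate(Date = as.Date(Date)) %>%
  arrange(Date) %>%
  mutate(Date_Lagged = lag(Date))

# Attach the lagged date to every original row; left_join() keeps df's row order.
res <- df %>%
  mutate(Date = as.Date(Date)) %>%
  left_join(lookup, by = c("Team", "Date"))

print(res)
```

Converting with as.Date() makes the sort explicitly chronological; for ISO-formatted strings like these, a plain character sort would give the same order.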

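Or we can also do the following, which relies on the rows already being sorted by Date (table() counts dates in sorted order, whereas unique() keeps them in order of appearance):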

df %>% 
    mutate(Date_Lagged = rep(lag(unique(Date)), table(Date)))
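If the rows might not be sorted or contiguous by Date, a sketch that looks up each row's position among the sorted distinct dates with match() avoids depending on table() order. lag_distinct() is a hypothetical helper name, and the sketch assumes Date is a character column holding ISO-formatted dates:

```r
library(dplyr)

df <- data.frame(Team = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
                 Date = c("2016-05-10", "2016-05-10", "2016-05-10", "2016-05-10",
                          "2016-05-12", "2016-05-12", "2016-05-12",
                          "2016-05-15", "2016-05-15",
                          "2016-05-30", "2016-05-30"),
                 Points = c(1, 4, 3, 2, 1, 5, 6, 1, 2, 3, 9),
                 stringsAsFactors = FALSE)

# Hypothetical helper: for each element of x, return the distinct value that
# precedes it in sorted order (NA for the earliest value).
lag_distinct <- function(x) {
  u <- sort(unique(x))        # distinct values, sorted (chronological for ISO dates)
  dplyr::lag(u)[match(x, u)]  # each row's position in u, mapped to the lagged value
}

res <- df %>%
  mutate(Date_Lagged = lag_distinct(Date))

print(res)
```

Because the lookup goes through match() rather than rep(), the rows can arrive in any order and each still gets the previous distinct date.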