R - Replicate values within groups

All*_*nLC 3 copy r replicate

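I have a data frame with the total points scored over the past 3 years (2016, 2017, 2018), along with one column per year holding the points scored in that year.

My data frame looks like this: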

myDF <- data.frame(ID =c(1,1,1,2,2,3,4),
 Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
 Total_Points = c(5, 5, 5, 4, 4, 2, 3),
 Points2016 = c(3, NA, NA, 2, NA, NA, 3),
 Points2017 = c(NA,1,NA,NA,2,NA,NA),
 Points2018= c(NA,NA,1, NA, NA, 2, NA))

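The problem is that I would like to replicate the values of the columns "Points2016", "Points2017" and "Points2018" within each group, so that every row of a group has the same entries.

I'm not sure the explanation is clear, so here is my expected output: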

myDF_final <- data.frame(ID =c(1,1,1,2,2,3,4),
               Dates= c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
               Total_Points = c(5, 5, 5, 4, 4, 2, 3),
               Points2016 = c(3, 3, 3, 2, 2, NA, 3),
               Points2017 = c(1,1,1,2,2,NA,NA),
               Points2018= c(1,1,1, NA, NA, 2, NA))

Basically, I want the "Points201X" columns to hold the same value in every row of each ID.

tyl*_*uRp 9

I think you can fill by group (ID) in both directions. With dplyr and tidyr we can do:

library(dplyr)
library(tidyr)

myDF %>% 
  group_by(ID) %>% 
  fill(Points2016, Points2017, Points2018) %>% 
  fill(Points2016, Points2017, Points2018, .direction = "up")

Returns:

  ID Dates Total_Points Points2016 Points2017 Points2018
1  1  2016            5          3          1          1
2  1  2017            5          3          1          1
3  1  2018            5          3          1          1
4  2  2016            4          2          2         NA
5  2  2017            4          2          2         NA
6  3  2018            2         NA         NA          2
7  4  2016            3          3         NA         NA

Also, if you have a whole bunch of years, say 1970 to 2018, you could do:

myDF %>% 
  gather(points_year, points, -c(ID, Dates, Total_Points)) %>% 
  group_by(ID, points_year) %>% 
  fill(points) %>% 
  fill(points, .direction = "up") %>% 
  spread(points_year, points)
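
In newer tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough equivalent of the reshaping approach might look like the sketch below. It assumes tidyr 1.0.0 or later, and the result name filled_long is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

filled_long <- myDF %>%
  # One row per (original row, year column)
  pivot_longer(starts_with("Points"),
               names_to = "points_year", values_to = "points") %>%
  # Fill within each ID and year column, first down, then up
  group_by(ID, points_year) %>%
  fill(points, .direction = "down") %>%
  fill(points, .direction = "up") %>%
  ungroup() %>%
  # Back to one column per year
  pivot_wider(names_from = points_year, values_from = points)
```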

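The long-format approach saves typing out every year by hand. However, it involves gathering and spreading the data, which may be unnecessary when the variables we need to fill follow a consistent naming convention. In this case there is a consistent naming convention, so we can use the tidyselect helpers behind dplyr to fill every variable that starts with "Points":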

myDF %>% 
  group_by(ID) %>% 
  fill(starts_with("Points"), .direction = "down") %>% 
  fill(starts_with("Points"), .direction = "up")
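
If your tidyr is 1.0.0 or later, fill() also accepts .direction = "downup", which should collapse the two calls into one. A minimal sketch under that assumption (the name filled is only for illustration):

```r
library(dplyr)
library(tidyr)

# Same data as in the question
myDF <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 4),
                   Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                   Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                   Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                   Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                   Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

filled <- myDF %>%
  group_by(ID) %>%
  # "downup" fills downwards first, then upwards, within each ID
  fill(starts_with("Points"), .direction = "downup") %>%
  ungroup()
```

Calling ungroup() at the end is optional; it just avoids carrying the grouping into later steps.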

Alternatively, this seems to work with data.table and zoo:

library(data.table)
library(zoo)

dt <- as.data.table(myDF)

dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x)), by = ID, .SDcols = 4:6]
dt <- dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf0(x, fromLast = TRUE)), by = ID, .SDcols = 4:6]

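This one-liner also seems to do it in one go. It relies on data.table recycling the result within each group, because na.locf() drops leading NAs by default (na.rm = TRUE); if that fails with your data.table version, fall back to the two-step na.locf0() version above.

dt[, names(dt)[4:6] := lapply(.SD, function(x) na.locf(x)), by = ID, .SDcols = 4:6]

Returns:
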
   ID Dates Total_Points Points2016 Points2017 Points2018
1:  1  2016            5          3          1          1
2:  1  2017            5          3          1          1
3:  1  2018            5          3          1          1
4:  2  2016            4          2          2         NA
5:  2  2017            4          2          2         NA
6:  3  2018            2         NA         NA          2
7:  4  2016            3          3         NA         NA
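
As a possible variation that avoids the zoo dependency, recent data.table versions ship nafill(), which supports last-observation-carried-forward ("locf") and next-observation-carried-backward ("nocb") fills. The sketch below assumes a data.table version that provides nafill() and numeric columns; selecting the columns by name pattern instead of by position (4:6) is my own choice, not part of the answer above.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(ID = c(1, 1, 1, 2, 2, 3, 4),
                 Dates = c("2016", "2017", "2018", "2016", "2017", "2018", "2016"),
                 Total_Points = c(5, 5, 5, 4, 4, 2, 3),
                 Points2016 = c(3, NA, NA, 2, NA, NA, 3),
                 Points2017 = c(NA, 1, NA, NA, 2, NA, NA),
                 Points2018 = c(NA, NA, 1, NA, NA, 2, NA))

# Columns whose names start with "Points" (Total_Points is not matched)
cols <- grep("^Points", names(dt), value = TRUE)

# Fill forward, then backward, within each ID; updates dt by reference
dt[, (cols) := lapply(.SD, function(x) nafill(nafill(x, type = "locf"), type = "nocb")),
   by = ID, .SDcols = cols]
```

Because nafill() always returns a vector of the same length as its input, this version does not depend on recycling within groups.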