Add a new column to longitudinal data in R based on match, time, response, and grouping information

Mik*_*nen 0 statistics r dplyr

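The example data consists of 4 columns (plus an ID column):

  • TreatmentGroup
  • Response
  • Time
  • Match

In the TreatmentGroup column there are three treatments:

  • treatment1
  • control
  • treatment2

In the Response column there is a unique response for each ID at a given time. The time is shown in the Time column. The Match column holds the grouping information. The groups are:

  • group_1
  • group_2

So each treatment group (control, treatment1, ...) also belongs to a match group (group_1, group_2).

The task is to compute a new column based on the grouping information in the TreatmentGroup column and the matching information in the Match column. The new column should contain the response of treatmentX minus the response of the control in the same match group.
For example: treatment2 response minus control (1.2 - 1.8 = -0.6), and in the next row 1.4 - 2.0 = -0.6. So the treatment response is compared with the control response at the same time point (0 or 1). Example data and result table (calculated by hand):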

   TreatmentGroup  ID Response Time   Match
1      treatment2 ID1      1.2    0 group_1
2      treatment2 ID1      1.4    1 group_1
3         control ID2      1.8    0 group_1
4         control ID2      2.0    1 group_1
5      treatment1 ID3      1.5    0 group_1
6      treatment1 ID3      1.8    1 group_1
7      treatment2 ID4      0.2    0 group_2
8      treatment2 ID4      0.3    1 group_2
9         control ID5      2.5    0 group_2
10        control ID5      2.8    1 group_2
11     treatment1 ID6      3.2    0 group_2
12     treatment1 ID6      3.5    1 group_2


   TreatmentGroup  ID Response Time   Match Paired_sub
1      treatment2 ID1      1.2    0 group_1       -0.6
2      treatment2 ID1      1.4    1 group_1       -0.6
3         control ID2      1.8    0 group_1        0.0
4         control ID2      2.0    1 group_1        0.0
5      treatment1 ID3      1.5    0 group_1       -0.3
6      treatment1 ID3      1.8    1 group_1       -0.2
7      treatment2 ID4      0.2    0 group_2       -2.3
8      treatment2 ID4      0.3    1 group_2       -2.5
9         control ID5      2.5    0 group_2        0.0
10        control ID5      2.8    1 group_2        0.0
11     treatment1 ID6      3.2    0 group_2        0.7
12     treatment1 ID6      3.5    1 group_2        0.7

What is the best approach (or answer) for this kind of problem?
Code to generate the example tables:

df <- data.frame("TreatmentGroup"=c("treatment2", "treatment2", "control", "control",  "treatment1", "treatment1"),
                 "ID" = c("ID1","ID1", "ID2","ID2","ID3","ID3", "ID4","ID4", "ID5","ID5", "ID6","ID6"),
                 "Response"=c(1.2, 1.4, 1.8, 2.0, 1.5, 1.8, 0.2,0.3,2.5,2.8,3.2,3.5),
                 "Time" = c(0,1,0,1,0,1),
                 "Match" = c("group_1", "group_1","group_1", "group_1","group_1", "group_1","group_2", "group_2","group_2", "group_2","group_2", "group_2")
                 )


result <- data.frame("TreatmentGroup"=c("treatment2", "treatment2", "control", "control",  "treatment1", "treatment1"),
                 "ID" = c("ID1","ID1", "ID2","ID2","ID3","ID3", "ID4","ID4", "ID5","ID5", "ID6","ID6"),
                 "Response"=c(1.2, 1.4, 1.8, 2.0, 1.5, 1.8, 0.2,0.3,2.5,2.8,3.2,3.5),
                 "Time" = c(0,1,0,1,0,1),
                 "Match" = c("group_1", "group_1","group_1", "group_1","group_1", "group_1","group_2", "group_2","group_2", "group_2","group_2", "group_2"),
                 "Paired_sub" = c(-0.6,-0.6,0,0,-0.3,-0.2,-2.3,-2.5, 0,0, 0.7,0.7)
                 )

tal*_*lat 5

There are many options; one of them is to use dplyr:

require(dplyr)
# Within each Match/Time group, subtract the control row's Response from every row
df %>% 
  group_by(Match, Time) %>%
  mutate(Paired_sub = Response - Response[TreatmentGroup == "control"])

#Source: local data frame [12 x 6]
#Groups: Match, Time
#
#  TreatmentGroup  ID Response Time   Match Paired_sub
#1      treatment2 ID1      1.2    0 group_1       -0.6
#2      treatment2 ID1      1.4    1 group_1       -0.6
#3         control ID2      1.8    0 group_1        0.0
#4         control ID2      2.0    1 group_1        0.0
#5      treatment1 ID3      1.5    0 group_1       -0.3
#6      treatment1 ID3      1.8    1 group_1       -0.2
#7      treatment2 ID4      0.2    0 group_2       -2.3
#8      treatment2 ID4      0.3    1 group_2       -2.5
#9         control ID5      2.5    0 group_2        0.0
#10        control ID5      2.8    1 group_2        0.0
#11     treatment1 ID6      3.2    0 group_2        0.7
#12     treatment1 ID6      3.5    1 group_2        0.7

The equivalent data.table approach is:

require(data.table)
setDT(df)[, Paired_sub := Response - Response[TreatmentGroup == "control"], by = list(Match, Time)]
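
Both one-liners above index `Response[TreatmentGroup == "control"]` inside each Match/Time group, so they appear to rely on every such group holding exactly one control row. If a group had no control, or several, the subtraction would no longer line up one-to-one. A minimal base R sketch of a check for that assumption, run on the question's example data (the name `n_control` is only illustrative, and the data frame is rebuilt with `rep()` so the block runs on its own):

```r
# Example data from the question, with the recycled columns written out explicitly
df <- data.frame(
  TreatmentGroup = rep(c("treatment2", "control", "treatment1"), each = 2, times = 2),
  ID = rep(paste0("ID", 1:6), each = 2),
  Response = c(1.2, 1.4, 1.8, 2.0, 1.5, 1.8, 0.2, 0.3, 2.5, 2.8, 3.2, 3.5),
  Time = rep(c(0, 1), times = 6),
  Match = rep(c("group_1", "group_2"), each = 6)
)

# Count the control rows in every Match x Time cell
n_control <- with(df, tapply(TreatmentGroup == "control", list(Match, Time), sum))
print(n_control)

# Stop early if any cell does not contain exactly one control row
stopifnot(all(n_control == 1))
```

If the check fails on real data, the control values would have to be chosen or summarised per group before subtracting.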

And another option, using base R:

df <- do.call(rbind, lapply(split(df, interaction(df$Match, df$Time)), function(dd) {
  dd$Paired_sub <- with(dd, Response - Response[TreatmentGroup == "control"])
  dd}))
rownames(df) <- NULL

The only difference here is the row order; the numbers in Paired_sub should be the same as with the other approaches.
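
One way to confirm that only the order changes is to carry a row index through the split and sort by it afterwards. The following is a sketch under that assumption, not part of the original answer: the helper column `row` and the vector `expected` are introduced here for illustration, and `expected` holds the Paired_sub values from the dplyr output shown earlier.

```r
# Example data from the question, with the recycled columns written out explicitly
df <- data.frame(
  TreatmentGroup = rep(c("treatment2", "control", "treatment1"), each = 2, times = 2),
  ID = rep(paste0("ID", 1:6), each = 2),
  Response = c(1.2, 1.4, 1.8, 2.0, 1.5, 1.8, 0.2, 0.3, 2.5, 2.8, 3.2, 3.5),
  Time = rep(c(0, 1), times = 6),
  Match = rep(c("group_1", "group_2"), each = 6)
)

# Remember the original row order, because split() + rbind() regroups the rows
df$row <- seq_len(nrow(df))

# Same split/apply/combine step as in the base R answer
parts <- lapply(split(df, interaction(df$Match, df$Time)), function(dd) {
  dd$Paired_sub <- with(dd, Response - Response[TreatmentGroup == "control"])
  dd
})
out <- do.call(rbind, parts)

# Restore the original order and drop the generated row names
out <- out[order(out$row), ]
rownames(out) <- NULL

# Paired_sub values taken from the dplyr output, in original row order
expected <- c(-0.6, -0.6, 0, 0, -0.3, -0.2, -2.3, -2.5, 0, 0, 0.7, 0.7)
stopifnot(isTRUE(all.equal(out$Paired_sub, expected)))
```

`all.equal()` is used instead of `==` because the differences are floating-point numbers.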