Mik*_*nen 0 statistics r dplyr
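The example data consists of five columns: TreatmentGroup, ID, Response, Time and Match.
The TreatmentGroup column contains three groups: control, treatment1 and treatment2.
The Response column holds one response per ID at a given time; the time is given in the Time column. The Match column holds the grouping information, with two groups: group_1 and group_2.
So each treatment group (control, treatment1, ...) also belongs to a match group (group_1 or group_2).
The task is to compute a new column from the grouping information in TreatmentGroup and the matching information in Match. The new column should contain the treatmentX response minus the control response for the same match group.
For example, for treatment2 minus control: 1.2 - 1.8 = -0.6, and on the next row 1.4 - 2.0 = -0.6. In other words, each treatment response is compared with the control response at the same time point (0 or 1). Example data and the result table (calculated by hand):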
   TreatmentGroup  ID Response Time   Match
1      treatment2 ID1      1.2    0 group_1
2      treatment2 ID1      1.4    1 group_1
3         control ID2      1.8    0 group_1
4         control ID2      2.0    1 group_1
5      treatment1 ID3      1.5    0 group_1
6      treatment1 ID3      1.8    1 group_1
7      treatment2 ID4      0.2    0 group_2
8      treatment2 ID4      0.3    1 group_2
9         control ID5      2.5    0 group_2
10        control ID5      2.8    1 group_2
11     treatment1 ID6      3.2    0 group_2
12     treatment1 ID6      3.5    1 group_2
   TreatmentGroup  ID Response Time   Match Paired_sub
1      treatment2 ID1      1.2    0 group_1       -0.6
2      treatment2 ID1      1.4    1 group_1       -0.6
3         control ID2      1.8    0 group_1        0.0
4         control ID2      2.0    1 group_1        0.0
5      treatment1 ID3      1.5    0 group_1       -0.3
6      treatment1 ID3      1.8    1 group_1       -0.2
7      treatment2 ID4      0.2    0 group_2       -2.3
8      treatment2 ID4      0.3    1 group_2       -2.5
9         control ID5      2.5    0 group_2        0.0
10        control ID5      2.8    1 group_2        0.0
11     treatment1 ID6      3.2    0 group_2        0.7
12     treatment1 ID6      3.5    1 group_2        0.7
What is the best approach (or answer) for this kind of problem?
Code to generate the example tables:
# Example data. TreatmentGroup and Time have fewer than 12 elements;
# data.frame() recycles them to the full 12 rows.
df <- data.frame(
  TreatmentGroup = c("treatment2", "treatment2", "control", "control", "treatment1", "treatment1"),
  ID = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID4", "ID4", "ID5", "ID5", "ID6", "ID6"),
  Response = c(1.2, 1.4, 1.8, 2.0, 1.5, 1.8, 0.2, 0.3, 2.5, 2.8, 3.2, 3.5),
  Time = c(0, 1, 0, 1, 0, 1),
  Match = c("group_1", "group_1", "group_1", "group_1", "group_1", "group_1",
            "group_2", "group_2", "group_2", "group_2", "group_2", "group_2")
)
# Expected result, with Paired_sub calculated by hand (treatment response minus control response).
result <- data.frame(
  TreatmentGroup = c("treatment2", "treatment2", "control", "control", "treatment1", "treatment1"),
  ID = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID4", "ID4", "ID5", "ID5", "ID6", "ID6"),
  Response = c(1.2, 1.4, 1.8, 2.0, 1.5, 1.8, 0.2, 0.3, 2.5, 2.8, 3.2, 3.5),
  Time = c(0, 1, 0, 1, 0, 1),
  Match = c("group_1", "group_1", "group_1", "group_1", "group_1", "group_1",
            "group_2", "group_2", "group_2", "group_2", "group_2", "group_2"),
  Paired_sub = c(-0.6, -0.6, 0, 0, -0.3, -0.2, -2.3, -2.5, 0, 0, 0.7, 0.7)
)
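There are many options; one of them is to use dplyr:
library(dplyr)
# Within each Match/Time group, subtract the control response from every response.
df %>%
  group_by(Match, Time) %>%
  mutate(Paired_sub = Response - Response[TreatmentGroup == "control"])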
#Source: local data frame [12 x 6]
#Groups: Match, Time
#
#   TreatmentGroup  ID Response Time   Match Paired_sub
#1      treatment2 ID1      1.2    0 group_1       -0.6
#2      treatment2 ID1      1.4    1 group_1       -0.6
#3         control ID2      1.8    0 group_1        0.0
#4         control ID2      2.0    1 group_1        0.0
#5      treatment1 ID3      1.5    0 group_1       -0.3
#6      treatment1 ID3      1.8    1 group_1       -0.2
#7      treatment2 ID4      0.2    0 group_2       -2.3
#8      treatment2 ID4      0.3    1 group_2       -2.5
#9         control ID5      2.5    0 group_2        0.0
#10        control ID5      2.8    1 group_2        0.0
#11     treatment1 ID6      3.2    0 group_2        0.7
#12     treatment1 ID6      3.5    1 group_2        0.7
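The subsetting `Response[TreatmentGroup == "control"]` relies on every Match/Time group containing exactly one control row. If that is not guaranteed, a join-based variant might be easier to reason about. The following is only a sketch: `controls`, `ControlResponse` and `paired` are names made up for this illustration, and the example data is rebuilt with `rep()` so the block runs on its own.

```r
library(dplyr)

# Same example data as in the question, built with rep() for brevity
df <- data.frame(
  TreatmentGroup = rep(rep(c("treatment2", "control", "treatment1"), each = 2), times = 2),
  ID = rep(paste0("ID", 1:6), each = 2),
  Response = c(1.2, 1.4, 1.8, 2.0, 1.5, 1.8, 0.2, 0.3, 2.5, 2.8, 3.2, 3.5),
  Time = rep(c(0, 1), times = 6),
  Match = rep(c("group_1", "group_2"), each = 6)
)

# Lookup table: the control response for each Match/Time combination
controls <- df %>%
  filter(TreatmentGroup == "control") %>%
  select(Match, Time, ControlResponse = Response)

# Stop early if any Match/Time combination has more than one control row
stopifnot(!any(duplicated(controls[c("Match", "Time")])))

# Attach the control response to every row, then subtract it
paired <- df %>%
  left_join(controls, by = c("Match", "Time")) %>%
  mutate(Paired_sub = Response - ControlResponse) %>%
  select(-ControlResponse)

paired
```

With this variant, rows whose Match/Time combination has no control get NA in Paired_sub, and `left_join()` keeps the rows of `df` in their original order.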
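The equivalent data.table approach is:
library(data.table)
# setDT() converts df to a data.table in place; := adds Paired_sub by reference.
setDT(df)[, Paired_sub := Response - Response[TreatmentGroup == "control"], by = list(Match, Time)]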
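Because `setDT()` converts `df` in place and `:=` adds the column by reference, `df` itself is changed by the call above. If the original data frame should stay untouched, one possible sketch is to work on a separate object created with `as.data.table()`. The name `dt` is made up here, and the example data is rebuilt so the block runs on its own.

```r
library(data.table)

# Same example data as in the question, built with rep() for brevity
df <- data.frame(
  TreatmentGroup = rep(rep(c("treatment2", "control", "treatment1"), each = 2), times = 2),
  ID = rep(paste0("ID", 1:6), each = 2),
  Response = c(1.2, 1.4, 1.8, 2.0, 1.5, 1.8, 0.2, 0.3, 2.5, 2.8, 3.2, 3.5),
  Time = rep(c(0, 1), times = 6),
  Match = rep(c("group_1", "group_2"), each = 6)
)

# as.data.table() returns a new object; df is not converted
dt <- as.data.table(df)

# Same grouped subtraction as above, applied to dt only
dt[, Paired_sub := Response - Response[TreatmentGroup == "control"], by = .(Match, Time)]

# df keeps its five original columns; dt has the extra Paired_sub column
names(df)
names(dt)
```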
Yet another option, using base R:
# Split by Match/Time, subtract the control response within each piece, then recombine.
df <- do.call(rbind, lapply(split(df, interaction(df$Match, df$Time)), function(dd) {
  dd$Paired_sub <- with(dd, Response - Response[TreatmentGroup == "control"])
  dd
}))
rownames(df) <- NULL
Only the row order differs here; the values in Paired_sub should be the same as in the other answers.
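To compare the base R result with the others row by row, the original order can be restored afterwards. A small sketch, assuming a temporary helper column is acceptable; `row_id` and `out` are names invented for this illustration, and the example data is rebuilt so the block runs on its own.

```r
# Same example data as in the question, built with rep() for brevity
df <- data.frame(
  TreatmentGroup = rep(rep(c("treatment2", "control", "treatment1"), each = 2), times = 2),
  ID = rep(paste0("ID", 1:6), each = 2),
  Response = c(1.2, 1.4, 1.8, 2.0, 1.5, 1.8, 0.2, 0.3, 2.5, 2.8, 3.2, 3.5),
  Time = rep(c(0, 1), times = 6),
  Match = rep(c("group_1", "group_2"), each = 6)
)

# Remember the original row positions before splitting
df$row_id <- seq_len(nrow(df))

# Split by Match/Time, subtract the control response in each piece, recombine
out <- do.call(rbind, lapply(split(df, interaction(df$Match, df$Time)), function(dd) {
  dd$Paired_sub <- with(dd, Response - Response[TreatmentGroup == "control"])
  dd
}))

# Put the rows back in their original order and drop the helper column
out <- out[order(out$row_id), ]
out$row_id <- NULL
rownames(out) <- NULL

out
```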