How to do group matching in R?

lil*_*ets 5 grouping r match dataframe

Suppose I have the data.frame below, where treat == 1 means the id received the treatment and prob is the computed probability that treat == 1.

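# R 3.6.0 changed the default algorithm used by sample(); to reproduce output
# generated with older versions, use set.seed(1, sample.kind = "Rounding").
set.seed(1)
df <- data.frame(id = 1:10, treat = sample(0:1, 10, replace = TRUE))
df$prob <- ifelse(df$treat, rnorm(10, .8, .1), rnorm(10, .4, .4))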
df
   id treat      prob
1   1     0 0.3820266
2   2     0 0.3935239
3   3     1 0.8738325
4   4     1 0.8575781
5   5     0 0.6375605
6   6     1 0.9511781
7   7     1 0.8389843
8   8     1 0.7378759
9   9     1 0.5785300
10 10     0 0.6479303

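To minimize selection bias, I now want to build pseudo treatment and control groups based on the values of treat and prob:

  • When any id with treat == 1 is within 0.1 prob of any id with treat == 0, I want group to be "treated".

  • When any id with treat == 0 is within 0.1 prob of any id with treat == 1, I want group to be "control".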

Below is an example of the result I want.

df$group <- c(NA, NA, NA, NA, 'control', NA, NA, 'treated', 'treated', 'control')
df
   id treat      prob   group
1   1     0 0.3820266    <NA>
2   2     0 0.3935239    <NA>
3   3     1 0.8738325    <NA>
4   4     1 0.8575781    <NA>
5   5     0 0.6375605 control
6   6     1 0.9511781    <NA>
7   7     1 0.8389843    <NA>
8   8     1 0.7378759 treated
9   9     1 0.5785300 treated
10 10     0 0.6479303 control
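The two rules above amount to a pairwise distance check with a caliper of 0.1. A minimal base-R sketch, assuming the 0.1 bound is inclusive and typing the printed data in directly so the result does not depend on the RNG: outer() builds the matrix of absolute differences between every treated prob and every control prob, and rowSums()/colSums() flag the units that have at least one partner within the caliper. The names caliper, is_t and d are illustrative only.

```r
# Data as printed above, typed in so the sketch does not depend on set.seed().
df <- data.frame(
  id    = 1:10,
  treat = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  prob  = c(0.3820266, 0.3935239, 0.8738325, 0.8575781, 0.6375605,
            0.9511781, 0.8389843, 0.7378759, 0.5785300, 0.6479303)
)

caliper <- 0.1  # maximum allowed difference in prob (assumed inclusive)
is_t <- df$treat == 1

# Absolute prob differences: rows = treated units, columns = control units.
d <- abs(outer(df$prob[is_t], df$prob[!is_t], "-"))

df$group <- NA_character_
# A treated unit is "treated" if at least one control lies within the caliper.
df$group[is_t][rowSums(d <= caliper) > 0] <- "treated"
# A control unit is "control" if at least one treated unit lies within it.
df$group[!is_t][colSums(d <= caliper) > 0] <- "control"

df
```

On this data it reproduces the desired group column (ids 8 and 9 "treated", ids 5 and 10 "control"). The matrix has one cell per treated/control pair, so memory grows with the product of the two group sizes.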

How can I do this? In the example above the matching is done with replacement, but a solution without replacement would also be fine.
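For matching without replacement, one common option is greedy nearest-neighbour matching within a caliper. The sketch below uses a hypothetical helper, match_greedy (not from any package), that assumes an inclusive 0.1 caliper and processes treated units in id order; each one takes the closest control that is still unused, and that control is then removed from the pool. It also records the partner's id in an illustrative match_id column.

```r
df <- data.frame(
  id    = 1:10,
  treat = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  prob  = c(0.3820266, 0.3935239, 0.8738325, 0.8575781, 0.6375605,
            0.9511781, 0.8389843, 0.7378759, 0.5785300, 0.6479303)
)

# Hypothetical helper: greedy 1:1 nearest-neighbour matching without
# replacement. Treated units are visited in row order; each takes the closest
# still-unused control whose prob is within `caliper`.
match_greedy <- function(df, caliper = 0.1) {
  treated  <- which(df$treat == 1)
  controls <- which(df$treat == 0)
  df$group    <- NA_character_
  df$match_id <- NA_integer_
  for (i in treated) {
    if (length(controls) == 0) break  # no controls left to use
    d <- abs(df$prob[controls] - df$prob[i])
    j <- which.min(d)
    if (d[j] <= caliper) {
      k <- controls[j]
      df$group[c(i, k)] <- c("treated", "control")
      df$match_id[i] <- df$id[k]
      df$match_id[k] <- df$id[i]
      controls <- controls[-j]  # without replacement: drop the used control
    }
  }
  df
}

res <- match_greedy(df, caliper = 0.1)
res
```

On this data it happens to give the same group column as the desired output, pairing id 8 with id 10 and id 9 with id 5. Greedy matching is order-dependent and not globally optimal; the MatchIt package provides nearest-neighbour and optimal matching with calipers if a more rigorous procedure is needed.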

989*_*989 2

I think cut from base R is a good fit for this problem. Here is a vectorized way to do it:

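# For the rows picked by the logical vector r, return the ids whose prob falls
# inside the +/- 0.1 windows built around the probs of the other group (!r).
# Caution: cut() returns NA only outside the overall range of the breaks, so a
# prob lying in a gap between two disjoint windows is also counted as matched.
f <- function(r) {
  x <- cut(df$prob[r], breaks = c(df$prob[!r] - 0.1, df$prob[!r] + 0.1))
  df$id[r][!is.na(x)]
}

ones <- df$treat == 1
df$group <- NA

# Indexing the column (rather than df[...]$group <- ...) also works when f()
# returns no ids.
df$group[df$id %in% f(ones)]  <- "treated"
df$group[df$id %in% f(!ones)] <- "control"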

df
#    id treat      prob   group
# 1   1     0 0.3820266    <NA>
# 2   2     0 0.3935239    <NA>
# 3   3     1 0.8738325    <NA>
# 4   4     1 0.8575781    <NA>
# 5   5     0 0.6375605 control
# 6   6     1 0.9511781    <NA>
# 7   7     1 0.8389843    <NA>
# 8   8     1 0.7378759 treated
# 9   9     1 0.5785300 treated
# 10 10     0 0.6479303 control
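A caveat about the cut() approach: cut() sorts the breaks and bins every value inside their overall range, so a prob that falls in the gap between two disjoint ±0.1 windows still gets an interval and is counted as matched. On the question's data no prob lands in such a gap, so the result is correct there. The sketch below demonstrates the issue with a hypothetical treated prob of 0.51 and shows an exact drop-in for f(), named f_exact here for illustration, that checks the distance to every unit of the other group directly (0.1 assumed inclusive).

```r
df <- data.frame(
  id    = 1:10,
  treat = c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0),
  prob  = c(0.3820266, 0.3935239, 0.8738325, 0.8575781, 0.6375605,
            0.9511781, 0.8389843, 0.7378759, 0.5785300, 0.6479303)
)
ctrl <- df$prob[df$treat == 0]

# Hypothetical treated prob between the windows around ~0.39 and ~0.64:
# every control is more than 0.1 away from it.
p <- 0.51
cut_says_match <- !is.na(cut(p, breaks = c(ctrl - 0.1, ctrl + 0.1)))
exact_match    <- any(abs(p - ctrl) <= 0.1)
c(cut = cut_says_match, exact = exact_match)  # cut = TRUE, exact = FALSE

# Exact drop-in for f(): keep the rows selected by r that have at least one
# unit of the other group within `caliper`.
f_exact <- function(r, caliper = 0.1) {
  other <- df$prob[!r]
  hit <- vapply(df$prob[r], function(p) any(abs(p - other) <= caliper),
                logical(1))
  df$id[r][hit]
}

ones <- df$treat == 1
f_exact(ones)   # ids 8 and 9
f_exact(!ones)  # ids 5 and 10
```

f_exact can replace f in the answer's two assignment lines without any other change.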