Bootstrap CI for several variables in a data frame column

jon*_*nas 1 r function summary dataframe dplyr

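I want to bootstrap a confidence interval for a proportion from a data.frame, with a separate result for each value of the variable in one of my columns. I have managed to run the bootstrap for a vector, but I don't know how to extend it to a data.frame from here. As a simplified example, set a threshold of 10 and look at the proportion of values in the data that are less than 10.

Vector solution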

library(boot)

vec <- abs(rnorm(1000)*10) #generate example vector

data_to_tb <- vec

tb <- function(data) {
  sum(data < 10, na.rm = FALSE)/length(data) #function for generating the proportion
}

tb(data_to_tb)

boot.out <- boot(data = data_to_tb, function(u,i) tb(u[i]),  R = 999)
quantile(boot.out$t, c(.025,.975))

From here, I want to do the same for a data.frame containing two columns. If possible, I would like the result returned as a data.frame with the columns (x, n, proportion, CI):

x    n   proportion  CI

A    xx  xx          xx
B    xx  xx          xx
C    xx  xx          xx

It would be even better if the dplyr package could be used. Here is a simplified example of my data:

Example:

dataframe <- data.frame(x = sample(c("A","B","C"),100,replace = TRUE), vec =abs(rnorm(100)*10))

head(dataframe)
##   x        vec
## 1 B 0.06735163
## 2 C 0.48612358
## 3 B 2.34190635
## 4 C 0.36393262
## 5 A 7.99762969
## 6 B 1.43293330

sha*_*dow 5

You can use group_by and summarise from dplyr to get the desired result. See the code below.

# load required packages (boot provides boot(); tb() is defined in the question)
library(dplyr)
library(boot)
# function to calculate the bootstrap percentile bound(s) given by probs
CIfun <- function(v, probs = c(.025, .975)) {
  quantile(boot(data = v, function(u,i) tb(u[i]),  R = 999)$t, probs)
}
# using summarise from dplyr
dataframe %>% group_by(x) %>%
  summarise(n = n(), 
            proportion = tb(vec), 
            `2.5%` = CIfun(vec, .025), 
            `97.5%`= CIfun(vec, .975))
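
Note that the code above calls CIfun twice per group, so the lower and upper bounds come from two separate bootstrap runs. A possible variation, given only as a sketch, runs the bootstrap once per group and takes both quantiles from the same resamples. The helper name boot_ci, the list column ci, the output columns lower and upper, and the set.seed call are assumptions added for illustration; they are not part of the original question or answer.

```r
library(boot)
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# example data in the same shape as in the question
dataframe <- data.frame(
  x   = sample(c("A", "B", "C"), 100, replace = TRUE),
  vec = abs(rnorm(100) * 10)
)

# proportion of values below the threshold of 10 (same statistic as in the question)
tb <- function(data) {
  sum(data < 10, na.rm = FALSE) / length(data)
}

# hypothetical helper: one bootstrap run, returns both percentile bounds
boot_ci <- function(v, probs = c(.025, .975), R = 999) {
  b <- boot(data = v, statistic = function(u, i) tb(u[i]), R = R)
  unname(quantile(b$t, probs))
}

result <- dataframe %>%
  group_by(x) %>%
  summarise(n = n(),
            proportion = tb(vec),
            ci = list(boot_ci(vec))) %>%   # list column: c(lower, upper) per group
  mutate(lower = sapply(ci, `[`, 1),
         upper = sapply(ci, `[`, 2)) %>%
  select(-ci)

result
```

Storing the pair in a list column and unpacking it afterwards keeps each group to a single call of boot(), at the cost of one extra mutate step.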