use*_*089 6 r confidence-interval statistics-bootstrap
问题:如何使用boostrap来获得在协方差矩阵的特征值上计算的统计数据的置信区间,分别针对数据框中的每个组(因子级别)?
问题:我无法确定我需要包含适合该boot函数的这些结果的数据结构,或者是一种在组中"映射"引导程序并以适合绘图的形式获得置信区间的方法.
上下文:在heplots包中,boxM计算协方差矩阵相等的Box的M检验.有一种绘图方法可以生成进入此测试的对数决定因素的有用图.该图中的置信区间基于渐近理论近似.
> library(heplots)
> iris.boxm <- boxM(iris[, 1:4], iris[, "Species"])
> iris.boxm
Box's M-test for Homogeneity of Covariance Matrices
data: iris[, 1:4]
Chi-Sq (approx.) = 140.94, df = 20, p-value < 2.2e-16
> plot(iris.boxm, gplabel="Species")
Run Code Online (Sandbox Code Playgroud)
绘图方法还可以显示特征值的其他函数,但在这种情况下没有可用的理论置信区间.
op <- par(mfrow=c(2,2), mar=c(5,4,1,1))
plot(iris.boxm, gplabel="Species", which="product")
plot(iris.boxm, gplabel="Species", which="sum")
plot(iris.boxm, gplabel="Species", which="precision")
plot(iris.boxm, gplabel="Species", which="max")
par(op)
Run Code Online (Sandbox Code Playgroud)
因此,我希望能够使用boostrap计算这些CI,并将其显示在相应的图中.
我尝试过的:
下面是提升这些统计数据的函数,但对于总样本,不考虑group(Species).
cov_stat_fun <- function(data, indices,
stats=c("logdet", "prod", "sum", "precision", "max")
) {
dat <- data[indices,]
cov <- cov(dat, use="complete.obs")
eigs <- eigen(cov)$values
res <- c(
"logdet" = log(det(cov)),
"prod" = prod(eigs),
"sum" = sum(eigs),
"precision" = 1/ sum(1/eigs),
"max" = max(eigs)
)
}
boot_cov_stat <- function(data, R=500, ...) {
boot(data, cov_stat_fun, R=R, ...)
}
Run Code Online (Sandbox Code Playgroud)
这有效,但我需要按组进行结果(以及总样本)
> iris.boot <- boot_cov_stat(iris[,1:4])
>
> iris.ci <- boot.ci(iris.boot)
> iris.ci
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 500 bootstrap replicates
CALL :
boot.ci(boot.out = iris.boot)
Intervals :
Level Normal Basic Studentized
95% (-6.622, -5.702 ) (-6.593, -5.653 ) (-6.542, -5.438 )
Level Percentile BCa
95% (-6.865, -5.926 ) (-6.613, -5.678 )
Calculations and Intervals on Original Scale
Some BCa intervals may be unstable
>
Run Code Online (Sandbox Code Playgroud)
我还编写了一个函数来计算每个组的单独协方差矩阵,但我无法看到如何在我的引导函数中使用它.有人可以帮忙吗?
# calculate covariance matrices by group and pooled
covs <- function(Y, group) {
Y <- as.matrix(Y)
gname <- deparse(substitute(group))
if (!is.factor(group)) group <- as.factor(as.character(group))
valid <- complete.cases(Y, group)
if (nrow(Y) > sum(valid))
warning(paste(nrow(Y) - sum(valid)), " cases with missing data have been removed.")
Y <- Y[valid,]
group <- group[valid]
nlev <- nlevels(group)
lev <- levels(group)
mats <- aux <- list()
for(i in 1:nlev) {
mats[[i]] <- cov(Y[group == lev[i], ])
}
names(mats) <- lev
pooled <- cov(Y)
c(mats, "pooled"=pooled)
}
Run Code Online (Sandbox Code Playgroud)
编辑:在一个看似相关的问题,按群组引导,建议通过使用strata参数来提供答案boot(),但是没有给出的例子.[啊:这个strata论点只是确保在自举样本中表示层与数据中的频率有关.]
为我的问题尝试这个,我没有进一步开悟,因为我想得到的是每个人的单独置信区间Species.
> iris.boot.strat <- boot_cov_stat(iris[,1:4], strata=iris$Species)
>
> boot.ci(iris.boot.strat, conf=0.95, type=c("basic", "bca"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 500 bootstrap replicates
CALL :
boot.ci(boot.out = iris.boot.strat, conf = 0.95, type = c("basic",
"bca"))
Intervals :
Level Basic BCa
95% (-6.587, -5.743 ) (-6.559, -5.841 )
Calculations and Intervals on Original Scale
Some BCa intervals may be unstable
>
Run Code Online (Sandbox Code Playgroud)
如果我理解您的问题,您可以按组运行引导函数,如下所示:
library(boot)
library(tidyverse)
# Pooled
iris.boot <- boot_cov_stat(iris[,1:4])
iris.ci <- boot.ci(iris.boot)
# By Species
boot.list = setNames(unique(iris$Species), unique(iris$Species)) %>%
map(function(group) {
iris.boot = boot_cov_stat(iris[iris$Species==group, 1:4])
boot.ci(iris.boot)
})
# Combine pooled and by-Species results
boot.list = c(boot.list, list(Pooled=iris.ci))
boot.list
Run Code Online (Sandbox Code Playgroud)
Run Code Online (Sandbox Code Playgroud)$setosa BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 500 bootstrap replicates CALL : boot.ci(boot.out = iris.boot) Intervals : Level Normal Basic Studentized 95% (-13.69, -11.86 ) (-13.69, -11.79 ) (-13.52, -10.65 ) Level Percentile BCa 95% (-14.34, -12.44 ) (-13.65, -11.99 ) Calculations and Intervals on Original Scale Warning : BCa Intervals used Extreme Quantiles Some BCa intervals may be unstable $versicolor BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 500 bootstrap replicates CALL : boot.ci(boot.out = iris.boot) Intervals : Level Normal Basic Studentized 95% (-11.37, -9.81 ) (-11.36, -9.78 ) (-11.25, -8.97 ) Level Percentile BCa 95% (-11.97, -10.39 ) (-11.35, -10.09 ) Calculations and Intervals on Original Scale Warning : BCa Intervals used Extreme Quantiles Some BCa intervals may be unstable $virginica BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 500 bootstrap replicates CALL : boot.ci(boot.out = iris.boot) Intervals : Level Normal Basic Studentized 95% (-9.467, -7.784 ) (-9.447, -7.804 ) (-9.328, -6.959 ) Level Percentile BCa 95% (-10.050, -8.407 ) ( -9.456, -8.075 ) Calculations and Intervals on Original Scale Warning : BCa Intervals used Extreme Quantiles Some BCa intervals may be unstable $Pooled BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on 500 bootstrap replicates CALL : boot.ci(boot.out = iris.boot) Intervals : Level Normal Basic Studentized 95% (-6.620, -5.714 ) (-6.613, -5.715 ) (-6.556, -5.545 ) Level Percentile BCa 95% (-6.803, -5.906 ) (-6.624, -5.779 ) Calculations and Intervals on Original Scale Some BCa intervals may be unstable