Descriptive statistics for MI data in R: Take 3

ksr*_*ogl 1 r summary r-mice imputation

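As an R beginner, I am finding it very hard to figure out how to compute descriptive statistics on multiply imputed data (more so than running some other basic analyses, such as correlations and regressions).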

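Questions of this type are prefaced with apologies (Descriptive statistics (Means, StdDevs) using multiply imputed data: R), yet have either gone unanswered (https://stats.stackexchange.com/questions/296193/pooling-basic-descriptives-from-several-multiply-imputed-datasets-using-mice) or been quickly down-voted.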

Here is the description of a miceadds function (https://www.rdocumentation.org/packages/miceadds/versions/2.10-14/topics/stats0), which I found hard to follow for data stored in mids format.

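I have gotten some output such as the mean, median, min and max using summary(complete(imp)), but I would like to know how to get additional summary output (e.g., skew/kurtosis, standard deviation, variance).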

An illustration borrowed from the previous poster above:

  > imp <- mice(nhanes, seed = 23109)

    iter imp variable
    1   1  bmi  hyp  chl
    1   2  bmi  hyp  chl
    1   3  bmi  hyp  chl
    1   4  bmi  hyp  chl
    1   5  bmi  hyp  chl
    2   1  bmi  hyp  chl
    2   2  bmi  hyp  chl
    2   3  bmi  hyp  chl

  > summary(complete(imp))
   age         bmi        hyp         chl     
   1:12   Min.   :20.40   1:18   Min.   :113  
   2: 7   1st Qu.:24.90   2: 7   1st Qu.:186  
   3: 6   Median :27.40          Median :199  
          Mean   :27.37          Mean   :194  
          3rd Qu.:30.10          3rd Qu.:218  
          Max.   :35.30          Max.   :284  

Would someone take the time to illustrate how to use the mids object to get basic descriptives?

小智 7

There are several mistakes in both your code and Katia's answer, and the link Katia provided is no longer available.

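To compute simple statistics after multiple imputation, you must follow Rubin's rules, which is the method mice uses for a selected set of model fits.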

When you use

library(mice)
imp <- mice(nhanes, seed = 23109)
mat <- complete(imp)
mat
   age  bmi hyp chl
1    1 28.7   1 199
2    2 22.7   1 187
3    1 30.1   1 187
4    3 22.7   2 204
5    1 20.4   1 113
6    3 24.9   2 184
7    1 22.5   1 118
8    1 30.1   1 187
9    2 22.0   1 238
10   2 30.1   1 229
11   1 35.3   1 187
12   2 27.5   1 229
13   3 21.7   1 206
14   2 28.7   2 204
15   1 29.6   1 238
16   1 29.6   1 238
17   3 27.2   2 284
18   2 26.3   2 199
19   1 35.3   1 218
20   3 25.5   2 206
21   1 33.2   1 238
22   1 33.2   1 229
23   1 27.5   1 131
24   3 24.9   1 284
25   2 27.4   1 186

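you get back only the first imputed dataset, whereas five datasets are imputed by default. See ?mice::complete: "The default action = 1L returns the first imputed data set." To get all five imputed datasets, you have to specify the action argument of mice::complete: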

mat2 <- complete(imp, "long")
mat2
    .imp .id age  bmi hyp chl
1      1   1   1 28.7   1 199
2      1   2   2 22.7   1 187
3      1   3   1 30.1   1 187
4      1   4   3 22.7   2 204
5      1   5   1 20.4   1 113
6      1   6   3 24.9   2 184
7      1   7   1 22.5   1 118
8      1   8   1 30.1   1 187
9      1   9   2 22.0   1 238
10     1  10   2 30.1   1 229
...
115    5  15   1 29.6   1 187
116    5  16   1 25.5   1 187
117    5  17   3 27.2   2 284
118    5  18   2 26.3   2 199
119    5  19   1 35.3   1 218
120    5  20   3 25.5   2 218
121    5  21   1 22.7   1 238
122    5  22   1 33.2   1 229
123    5  23   1 27.5   1 131
124    5  24   3 24.9   1 186
125    5  25   2 27.4   1 186
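As a small sketch of the action argument, complete(imp, "all") returns the m completed datasets as a list, which can be easier to loop over than the stacked long format. The names dats and bmi_means are only illustrative, and printFlag = FALSE is used just to silence the iteration log.

```r
library(mice)

# Same imputation as above, without the iteration log
imp <- mice(nhanes, seed = 23109, printFlag = FALSE)

# A list holding one completed data frame per imputation (m = 5 by default)
dats <- complete(imp, "all")
length(dats)

# One mean of bmi per completed dataset
bmi_means <- sapply(dats, function(d) mean(d$bmi))
bmi_means

# Average of the m per-dataset means
mean(bmi_means)
```

Each element of dats is an ordinary data frame, so any function that works on a data frame can be applied to it inside sapply() or lapply().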

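Both summary(mat) and summary(mat2) are wrong. Let's focus on bmi. The first gives the mean bmi of the first imputed dataset only. The second gives the mean of an artificial dataset that is m times larger than the real one, and the variance obtained from that second dataset is also too low.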

mean(mat$bmi)
27.484
mean(mat2$bmi)
26.5192
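A rough way to see the problem with the stacked data: because every completed dataset has the same number of rows, the stacked mean equals the average of the per-imputation means, so what goes wrong is the uncertainty rather than the point estimate. The sketch below compares a naive standard error from the stacked rows with the standard error inside each completed dataset. The names long, se_stacked and se_each are illustrative.

```r
library(mice)

imp <- mice(nhanes, seed = 23109, printFlag = FALSE)
long <- complete(imp, "long")

# Naive standard error of the mean: treats the stack as m * n independent rows
se_stacked <- sd(long$bmi) / sqrt(nrow(long))

# Standard error of the mean within each completed dataset of n rows
se_each <- tapply(long$bmi, long$.imp, function(x) sd(x) / sqrt(length(x)))

se_stacked
se_each
```

The stacked value is smaller only because the row count was inflated by the imputation, not because more people were observed.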

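I have not found a better solution than applying Rubin's rules to the mean estimate by hand. The correct estimate is simply the average of the estimates from all the imputed datasets: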

res <- with(imp, mean(bmi)) #get the mean for each imputed dataset, stored in res$analyses
do.call(sum, res$analyses) / 5 #compute mean over m = 5 mean estimations
26.5192
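One possible shortcut, offered as a sketch rather than as the only way: an intercept-only linear model estimates the mean of bmi, and it is one of the model fits that pool() can combine. This assumes a recent version of mice, where summary() of the pooled object returns a data frame with estimate and std.error columns.

```r
library(mice)

imp <- mice(nhanes, seed = 23109, printFlag = FALSE)

# The intercept of bmi ~ 1 is the mean of bmi in each completed dataset
fit <- with(imp, lm(bmi ~ 1))

# pool() combines the m fits with Rubin's rules
pooled <- pool(fit)
res <- summary(pooled)
res
```

The estimate column should match the hand-computed average of the five means, and std.error carries the between-imputation variability that a single completed dataset ignores.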

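The variance / standard deviation has to be computed appropriately as well. You can use Rubin's rules to compute whatever simple statistic you want. You can find how to do it here: https://bookdown.org/mwheymans/bookmi/rubins-rules.html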
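For reference, here is a minimal base R sketch of Rubin's rules for a scalar estimate. pool_rubin is a hypothetical helper written for this illustration, not a mice function, and the input numbers are made up purely to show the arithmetic.

```r
# Pool m estimates and their standard errors with Rubin's rules.
# pool_rubin is a hypothetical helper, not part of mice.
pool_rubin <- function(est, se) {
  m <- length(est)
  qbar <- mean(est)                 # pooled point estimate
  ubar <- mean(se^2)                # within-imputation variance
  b <- var(est)                     # between-imputation variance
  total <- ubar + (1 + 1 / m) * b   # total variance
  c(estimate = qbar, se = sqrt(total))
}

# Made-up estimates and standard errors from m = 3 imputations
out <- pool_rubin(est = c(10, 12, 11), se = c(1, 1, 1))
out
```

With real data, est would hold the statistic computed in each completed dataset and se its standard error in that dataset, for example mean(bmi) and sd(bmi) / sqrt(n).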

Hope this helps.


Kat*_*tia 6

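Here are some steps you can take to better understand what happens to the R objects after each step. I would also recommend looking at this tutorial: https://gerkovink.github.io/miceVignettes/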

library(mice)

# nhanes object is just a simple dataframe: 
data(nhanes)
str(nhanes)
# 'data.frame':  25 obs. of  4 variables:
# $ age: num  1 2 1 3 1 3 1 1 2 2 ...
# $ bmi: num  NA 22.7 NA NA 20.4 NA 22.5 30.1 22 NA ...
# $ hyp: num  NA 1 1 NA 1 NA 1 1 1 NA ...
# $ chl: num  NA 187 187 NA 113 184 118 187 238 NA ...

# you can generate multivariate imputation using mice() function
imp <- mice(nhanes, seed=23109)

#The output variable is an object of class "mids" which you can explore using str() function
str(imp)
# List of 17
# $ call           : language mice(data = nhanes)
# $ data           :'data.frame':  25 obs. of  4 variables:
#  ..$ age: num [1:25] 1 2 1 3 1 3 1 1 2 2 ...
#  ..$ bmi: num [1:25] NA 22.7 NA NA 20.4 NA 22.5 30.1 22 NA ...
#  ..$ hyp: num [1:25] NA 1 1 NA 1 NA 1 1 1 NA ...
#  ..$ chl: num [1:25] NA 187 187 NA 113 184 118 187 238 NA ...
# $ m              : num 5
# ...
# $ imp            :List of 4
#  ..$ age: NULL
#  ..$ bmi:'data.frame':    9 obs. of  5 variables:
#  .. ..$ 1: num [1:9] 28.7 30.1 22.7 24.9 30.1 35.3 27.5 29.6 33.2
#  .. ..$ 2: num [1:9] 27.2 30.1 27.2 25.5 29.6 26.3 26.3 30.1 30.1
#  .. ..$ 3: num [1:9] 22.5 30.1 20.4 22.5 27.4 22 26.3 27.4 35.3
#  .. ..$ 4: num [1:9] 27.2 22 22.7 21.7 25.5 27.2 24.9 30.1 22
#  .. ..$ 5: num [1:9] 28.7 28.7 20.4 21.7 25.5 22.5 22.5 25.5 22.7
# ...


#You can extract individual components of this object using $, for example
#To view the actual imputation for bmi column
imp$imp$bmi
#       1    2    3    4    5
# 1  28.7 27.2 22.5 27.2 28.7
# 3  30.1 30.1 30.1 22.0 28.7
# 4  22.7 27.2 20.4 22.7 20.4
# 6  24.9 25.5 22.5 21.7 21.7
# 10 30.1 29.6 27.4 25.5 25.5
# 11 35.3 26.3 22.0 27.2 22.5
# 12 27.5 26.3 26.3 24.9 22.5
# 16 29.6 30.1 27.4 30.1 25.5
# 21 33.2 30.1 35.3 22.0 22.7

# The above output is again just a regular dataframe:
str(imp$imp$bmi)
# 'data.frame':  9 obs. of  5 variables:
# $ 1: num  28.7 30.1 22.7 24.9 30.1 35.3 27.5 29.6 33.2
# $ 2: num  27.2 30.1 27.2 25.5 29.6 26.3 26.3 30.1 30.1
# $ 3: num  22.5 30.1 20.4 22.5 27.4 22 26.3 27.4 35.3
# $ 4: num  27.2 22 22.7 21.7 25.5 27.2 24.9 30.1 22
# $ 5: num  28.7 28.7 20.4 21.7 25.5 22.5 22.5 25.5 22.7

# complete() function returns imputed dataset:
mat <- complete(imp)

# The output of this function is a regular data frame:
str(mat)
# 'data.frame':  25 obs. of  4 variables:
# $ age: num  1 2 1 3 1 3 1 1 2 2 ...
# $ bmi: num  28.7 22.7 30.1 22.7 20.4 24.9 22.5 30.1 22 30.1 ...
# $ hyp: num  1 1 1 2 1 2 1 1 1 1 ...
# $ chl: num  199 187 187 204 113 184 118 187 238 229 ...

# So you can run any descriptive statistics you need with this object
# Just like you would do with a regular dataframe:
summary(mat)
# age            bmi             hyp            chl       
# Min.   :1.00   Min.   :20.40   Min.   :1.00   Min.   :113.0  
# 1st Qu.:1.00   1st Qu.:24.90   1st Qu.:1.00   1st Qu.:187.0  
# Median :2.00   Median :27.50   Median :1.00   Median :204.0  
# Mean   :1.76   Mean   :27.48   Mean   :1.24   Mean   :204.9  
# 3rd Qu.:2.00   3rd Qu.:30.10   3rd Qu.:1.00   3rd Qu.:229.0  
# Max.   :3.00   Max.   :35.30   Max.   :2.00   Max.   :284.0  
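The same kind of summary can be repeated for every completed dataset rather than only the first one. The sketch below computes the mean, standard deviation, variance and skewness of bmi per imputation and then averages each statistic over the imputations. skew is a helper defined here for illustration (moment-based skewness), and simple averaging of the standard deviation and skewness is an assumption of this sketch, a common convention rather than a rule taken from mice.

```r
library(mice)

imp <- mice(nhanes, seed = 23109, printFlag = FALSE)

# Moment-based sample skewness; helper defined only for this sketch
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Describe bmi in each of the m completed datasets
dats <- complete(imp, "all")
per_imp <- t(sapply(dats, function(d) {
  c(mean = mean(d$bmi), sd = sd(d$bmi), var = var(d$bmi), skew = skew(d$bmi))
}))
per_imp   # one row per imputation

# Average each statistic over the imputations
colMeans(per_imp)
```

Looking at per_imp before averaging also shows how much each statistic varies from one imputation to the next.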