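I am trying to run a normality test (Shapiro-Wilk) on a dataset, and I want to get the statistic and p-value for all columns at once. I have read the other pages on SO ("R: Shapiro test by group won't produce p-values and corrupt data frame warning", "Using shapiro.test on multiple columns in a data frame") but still can't figure it out. Any help would be much appreciated!
For example, here is the dataset: it has one character vector (NVL) and the rest are numeric, and I want to group by NVL (NV/VL).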
     NVL  Var1  Var2  Var3  Var4  Var5
1.   NV   22.5  26.8  89.2  35.7  100
2.   NV   34.7  67.4  29.8  12.4  100
3.   NV   68.3  34.5  44.5  23.8  100
4.   NV   11.2  55.3  17.5  77.9  100
5.   VL   55.6  77.2  59.7  89.6  100
6.   VL   60.5  88.7  65.4  99.6  100
7.   VL   89.4  87.5  65.9  89.5  100
8.   VL   65.4  74.2  75.4  89.5  100
9.   VL   81.8  78.5  95.4  92.5  100
Here is the code:
library(dplyr)
normalityVar1<-mydata %>%
group_by(NVL) %>%
summarise(statistic = shapiro.test(Var1)$statistic, 
p.value = shapiro.test(Var1)$p.value)
Here is the output:
NVL statistic   p.value
  <chr>     <dbl>     <dbl>
1    VL 0.9125239 0.1985486
2    NV 0.8983501 0.2101248
Now, how do I edit this code so that I get the output for all the variables (Var2, Var3, Var4, Var5) at once? I even tried aggregate and apply, but I am stuck.
aggregate(formula = Var1 ~ NVL,
data = mydata,
FUN = function(x) {y <- shapiro.test(x); c(y$statistic, y$p.value)}) 
As you can see, I can only do this for one variable at a time! I know I am close, but I just can't work out the rest. Thanks in advance for your help!
mydata <- read.table(text="
   NVL  Var1  Var2  Var3  Var4  Var5
1   NV   22.5  26.8   89.2  35.7   100
2   NV   34.7  67.4   29.8  12.4   100
3   NV   68.3  34.5   44.5  23.8   50
4   NV   11.2  55.3   17.5  77.9   100
5   VL   55.6  77.2   59.7  89.6   100
6   VL   60.5  88.7   65.4  99.6   100
7   VL   89.4  87.5   65.9  89.5   100
8   VL   65.4  74.2   75.4  89.5   90
9   VL   81.8  78.5   95.4  92.5   90
", header = TRUE)
library(dplyr)
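# Run the Shapiro-Wilk test on one column x within each level of group
myfun <- function(x, group) {
  data.frame(x, group) %>%
  group_by(group) %>%
  summarise(
    # shapiro.test() stops with an error when all values are identical,
    # so return NA for groups whose standard deviation is zero
    statistic = ifelse(sd(x)!=0,shapiro.test(x)$statistic,NA), 
    p.value = ifelse(sd(x)!=0,shapiro.test(x)$p.value,NA)
  )
}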
(lst <- lapply(mydata[,-1], myfun, group=mydata[,1]))
The output is:
$Var1
# A tibble: 2 x 3
   group statistic   p.value
  <fctr>     <dbl>     <dbl>
1     NV 0.9313476 0.6023421
2     VL 0.9149572 0.4979450
$Var2
# A tibble: 2 x 3
   group statistic   p.value
  <fctr>     <dbl>     <dbl>
1     NV 0.9409576 0.6601747
2     VL 0.8736587 0.2815562
$Var3
# A tibble: 2 x 3
   group statistic   p.value
  <fctr>     <dbl>     <dbl>
1     NV 0.9096322 0.4804557
2     VL 0.8644349 0.2446131
$Var4
# A tibble: 2 x 3
   group statistic    p.value
  <fctr>     <dbl>      <dbl>
1     NV 0.9003135 0.43261822
2     VL 0.7260939 0.01760713
$Var5
# A tibble: 2 x 3
   group statistic     p.value
  <fctr>     <dbl>       <dbl>
1     NV 0.6297763 0.001240726
2     VL 0.6840289 0.006470001
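A note on the `sd(x) != 0` check in `myfun`: `shapiro.test()` stops with an error when all values in a group are identical (as Var5 is in the question's data), and also when a group has fewer than 3 or more than 5000 values. If you would rather catch every such failure in one place, something like the sketch below might work. The helper name `safe_shapiro` is made up for illustration and is not part of the code above.

```r
# Hypothetical helper (an assumption, not from the answer above):
# returns NA for both values whenever shapiro.test() signals an error.
safe_shapiro <- function(x) {
  tryCatch({
    res <- shapiro.test(x)
    c(statistic = unname(res$statistic), p.value = res$p.value)
  }, error = function(e) {
    c(statistic = NA_real_, p.value = NA_real_)
  })
}

ok        <- safe_shapiro(c(22.5, 34.7, 68.3, 11.2))  # ordinary sample
constant  <- safe_shapiro(c(100, 100, 100, 100))      # identical values
too_short <- safe_shapiro(c(1, 2))                    # fewer than 3 values
print(rbind(ok, constant, too_short))
```

The trade-off is that `tryCatch()` hides the reason for the failure, so the explicit `sd(x) != 0` check may still be preferable when you only expect constant columns.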
The resulting list lst can be converted into a single data.frame object:
do.call(cbind, lst)[,-seq(4,3*(ncol(mydata)-1),3)]
Here is the output:
  Var1.group Var1.statistic Var1.p.value Var2.statistic Var2.p.value Var3.statistic Var3.p.value Var4.statistic Var4.p.value Var5.statistic Var5.p.value
1         NV      0.9313476    0.6023421      0.9409576    0.6601747      0.9096322    0.4804557      0.9003135   0.43261822      0.6297763  0.001240726
2         VL      0.9149572    0.4979450      0.8736587    0.2815562      0.8644349    0.2446131      0.7260939   0.01760713      0.6840289  0.006470001
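If a long table (one row per variable and group) is easier to work with than the wide one above, a base R loop is one possible way to build it. This is only a sketch under the assumption that no group is constant; the names `vars`, `rows` and `long` are my own and not part of the code above.

```r
# Same example data as in the answer
mydata <- read.table(text = "
   NVL  Var1  Var2  Var3  Var4  Var5
1   NV   22.5  26.8   89.2  35.7   100
2   NV   34.7  67.4   29.8  12.4   100
3   NV   68.3  34.5   44.5  23.8   50
4   NV   11.2  55.3   17.5  77.9   100
5   VL   55.6  77.2   59.7  89.6   100
6   VL   60.5  88.7   65.4  99.6   100
7   VL   89.4  87.5   65.9  89.5   100
8   VL   65.4  74.2   75.4  89.5   90
9   VL   81.8  78.5   95.4  92.5   90
", header = TRUE)

vars <- setdiff(names(mydata), "NVL")   # every column except the grouping one
rows <- list()
for (v in vars) {
  for (g in unique(mydata$NVL)) {
    x   <- mydata[mydata$NVL == g, v]   # values of variable v in group g
    res <- shapiro.test(x)              # errors if x is constant or too short
    rows[[length(rows) + 1]] <- data.frame(
      variable  = v,
      NVL       = g,
      statistic = unname(res$statistic),
      p.value   = res$p.value
    )
  }
}
long <- do.call(rbind, rows)            # one row per variable/group pair
print(long)
```

The long layout makes it straightforward to filter, for example `long[long$p.value < 0.05, ]` to list the variable/group combinations where normality is rejected at the 5% level.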