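This is a follow-up to the question "R: t-test over all columns".
Suppose I have a large data set and create many subsets from it according to some conditions. The subsets should all have the same number of columns. I then want to run t-tests on two subsets at a time (outer loop) and, for each combination of subsets, go through all the columns one at a time (inner loop).
Below is what I came up with based on the earlier answer. It stops with an error.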
C <- c("c1","c1","c1","c1","c1",
"c2","c2","c2","c2","c2",
"c3","c3","c3","c3","c3",
"c4","c4","c4","c4","c4",
"c5","c5","c5","c5","c5",
"c6","c6","c6","c6","c6",
"c7","c7","c7","c7","c7",
"c8","c8","c8","c8","c8",
"c9","c9","c9","c9","c9",
"c10","c10","c10","c10","c10")
X <- rnorm(n=50, mean = 10, sd = 5)
Y <- rnorm(n=50, mean = 15, sd = 6)
Z <- rnorm(n=50, mean = 20, sd = 5)
Data <- data.frame(C, X, Y, Z)
Data.c1 = subset(Data, C == "c1",select=X:Z)
Data.c2 = subset(Data, C == "c2",select=X:Z)
Data.c3 = subset(Data, C == "c3",select=X:Z)
Data.c4 = subset(Data, C == "c4",select=X:Z)
Data.c5 = subset(Data, C == "c5",select=X:Z)
Data.Subsets = c("Data.c1",
"Data.c2",
"Data.c3",
"Data.c4",
"Data.c5")
library(plyr)
combo1 <- combn(length(Data.Subsets),1)
adply(combo1, 1, function(x) {
combo2 <- combn(ncol(Data.Subsets[x]),2)
adply(combo2, 2, function(y) {
test <- t.test( Data.Subsets[x][, y[1]], Data.Subsets[x][, y[2]], na.rm=TRUE)
out <- data.frame("Subset" = rownames(Data.Subsets[x]),
, "Row" = colnames(x)[y[1]]
, "Column" = colnames(x[y[2]])
, "t.value" = round(test$statistic,3)
, "df"= test$parameter
, "p.value" = round(test$p.value, 3)
)
return(out)
} )
} )
First, you can define the data set more easily by using gl, which also avoids creating a separate variable for each column.
Data <- data.frame(
C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
X = rnorm(n = 50, mean = 10, sd = 5),
Y = rnorm(n = 50, mean = 15, sd = 6),
Z = rnorm(n = 50, mean = 20, sd = 5)
)
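To see what gl produces here, the following minimal sketch builds the same factor on its own and checks its layout. The name grp and the set.seed call are illustrative additions that do not appear in the original.

```r
# gl(n, k) generates a factor with n levels, each repeated k times in a row.
grp <- gl(10, 5, labels = paste("c", 1:10, sep = ""))

levels(grp)  # the ten labels "c1" ... "c10"
table(grp)   # five observations per level

# Fixing the seed (an addition, not in the original) makes the rnorm() draws repeatable.
set.seed(42)
Data <- data.frame(
  C = grp,
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)
str(Data)
```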
Next, convert the data to "long" format using melt from the reshape package. (You could also use the base reshape function.)
library(reshape)  # provides melt
longData <- melt(Data, id.vars = "C")
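For readers who prefer to avoid the extra package, the base reshape route mentioned above might look roughly like this. It is a sketch, not the answer's own code: the name longBase and the seed are assumptions, and the result carries an additional id column that melt does not produce.

```r
set.seed(42)  # assumption: seed added only for reproducibility
Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)

# Stack X, Y and Z into one "value" column; "variable" records the source column.
longBase <- reshape(
  Data,
  direction = "long",
  varying   = c("X", "Y", "Z"),
  v.names   = "value",
  timevar   = "variable",
  times     = c("X", "Y", "Z")
)

# reshape() stores the times as character; convert to a factor to mirror melt().
longBase$variable <- factor(longBase$variable)
head(longBase)
```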
Now use pairwise.t.test to compute t-tests between all X/Y/Z pairs for each level of C.
with(longData, pairwise.t.test(value, interaction(C, variable)))
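Note that it is important to use pairwise.t.test rather than a large number of separate calls to t.test, because when you run many tests you need to adjust the p-values. (See, for example, the xkcd explanation.)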
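If the subset-by-subset, column-by-column layout from the question is still wanted, one way to keep the adjustment is to collect the raw p-values and pass them through p.adjust afterwards. The sketch below is an illustration under stated assumptions rather than the answer's method: the names subsets, pairs and results are hypothetical, only the first five levels of C are compared (as in the question), and "holm" is chosen because it is the default method of pairwise.t.test. Note also that t.test defaults to the Welch test, whereas pairwise.t.test pools the standard deviation by default, so the two approaches will not give identical numbers.

```r
set.seed(42)  # assumption: seed added only for reproducibility
Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)

# Keep the subsets in a named list of data frames instead of a vector of names.
subsets <- split(Data[c("X", "Y", "Z")], Data$C)[paste("c", 1:5, sep = "")]

# Outer loop: every pair of subsets. Inner loop: every column.
pairs <- combn(names(subsets), 2)
results <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(i) {
  a <- pairs[1, i]
  b <- pairs[2, i]
  do.call(rbind, lapply(c("X", "Y", "Z"), function(col) {
    test <- t.test(subsets[[a]][[col]], subsets[[b]][[col]])
    data.frame(
      Subset1 = a,
      Subset2 = b,
      Column  = col,
      t.value = unname(test$statistic),
      df      = unname(test$parameter),
      p.value = test$p.value
    )
  }))
}))

# Adjust all raw p-values together for multiple testing.
results$p.adjusted <- p.adjust(results$p.value, method = "holm")
head(results)
```

Storing the subsets in a list is the main design choice here: a character vector such as c("Data.c1", "Data.c2") holds only names, so ncol and column indexing have nothing to work on.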
In general, pairwise t-tests are inferior to regression, so be careful about how you use them.
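The answer does not say which regression it has in mind. One possible reading, offered purely as an assumption, is a single linear model fitted to the long-format data with both factors as predictors. In this sketch longData is rebuilt with base R so that it runs on its own, and the model formula is an illustrative choice, not something taken from the original.

```r
set.seed(42)  # assumption: seed added only for reproducibility
Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)

# Long format built by hand: one row per (original row, measured column) pair.
longData <- data.frame(
  C        = rep(Data$C, times = 3),
  variable = factor(rep(c("X", "Y", "Z"), each = nrow(Data))),
  value    = c(Data$X, Data$Y, Data$Z)
)

# One model for all groups: main effects of variable and C plus their interaction.
fit <- lm(value ~ variable * C, data = longData)
anova(fit)
```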