p-value from fisher.test() does not match phyper()

R-P*_*eys 5 statistics r contingency p-value hypothesis-test

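Fisher's exact test is related to the hypergeometric distribution, so I expected these two commands to return identical p-values. Can anyone explain what I am doing wrong, given that they do not match?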

#data (variable names chosen to match dhyper() argument names)
x = 14
m = 20
n = 41047
k = 40

#Fisher test, alternative = 'greater'
(fisher.test(matrix(c(x, m-x, k-x, n-(k-x)),2,2), alternative='greater'))$p.value 
#returns 2.01804e-39

#hypergeometric distribution, lower.tail = FALSE, i.e. P[X > x]
phyper(x, m, n, k, lower.tail = FALSE, log.p = FALSE)
#returns 5.115862e-43

De *_*ica 7

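In this case, the phyper call that fisher.test actually makes is phyper(x - 1, m, n, k, lower.tail = FALSE). Look at the part of the fisher.test source code that is relevant to your call, fisher.test(matrix(c(x, m-x, k-x, n-(k-x)),2,2), alternative='greater'). At line 138, PVAL is set to: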

switch(alternative,
    less = pnhyper(x, or),
    greater = pnhyper(x, or, upper.tail = TRUE),
    two.sided = {
        if (or == 0) as.numeric(x == lo)
        else if (or == Inf) as.numeric(x == hi)
        else {
            relErr <- 1 + 10^(-7)
            d <- dnhyper(or)
            sum(d[d <= d[x - lo + 1] * relErr])
        }
    })

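Because alternative = 'greater', PVAL is set to pnhyper(x, or, upper.tail = TRUE). You can see the definition of pnhyper at line 122. Here or = 1, which is passed as ncp, so the call becomes phyper(x - 1, m, n, k, lower.tail = FALSE). In other words, the one-sided Fisher p-value is P[X >= x], which includes the observed count, whereas phyper(x, m, n, k, lower.tail = FALSE) gives P[X > x], which excludes it.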

With your values:

x = 14
m = 20
n = 41047
k = 40
phyper(x - 1, m, n, k, lower.tail = FALSE)
# [1] 2.01804e-39
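
As a rough check of the off-by-one explanation above, the sketch below compares the two tail probabilities directly. The variable names (p_fisher, p_upper_incl, p_upper_excl, p_sum) and the helper rel_diff are made up for this illustration. It uses a relative difference rather than all.equal(), on the assumption that all.equal() falls back to an absolute comparison for values this small and would therefore not be informative here.

```r
x <- 14; m <- 20; n <- 41047; k <- 40

# One-sided Fisher p-value, using the same table as in the question
tab <- matrix(c(x, m - x, k - x, n - (k - x)), 2, 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# P[X >= x]: upper tail that includes the observed count
p_upper_incl <- phyper(x - 1, m, n, k, lower.tail = FALSE)

# The same probability as an explicit sum over the point masses
p_sum <- sum(dhyper(x:min(m, k), m, n, k))

# P[X > x]: upper tail that excludes the observed count (the question's call)
p_upper_excl <- phyper(x, m, n, k, lower.tail = FALSE)

# Hypothetical helper: relative difference between two positive numbers
rel_diff <- function(a, b) abs(a / b - 1)

rel_diff(p_fisher, p_upper_incl)        # should be essentially zero
rel_diff(p_sum, p_upper_incl)           # should be essentially zero
# The gap between the two tails is the point mass at x
rel_diff(p_upper_incl - p_upper_excl, dhyper(x, m, n, k))
```

If the three relative differences come out near zero, the two numbers in the question differ by exactly P[X = x], which is the term that lower.tail = FALSE drops when it is called with x instead of x - 1.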