How to calculate the statistical "t-test" with numpy

Question


Mar*_*ark 25 python statistics numpy scipy

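I'd like to generate some statistics on a model I created in Python. I'd like to run a t-test on it, but I was wondering whether there is an easy way to do this with numpy/scipy. Are there any good explanations around?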

For example, I have three related datasets that look like this:

[55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0]


Now I would like to perform a Student's t-test on them.

Answer 1

van*_*van 27

The scipy.stats package provides several ttest_... functions. See the example here:

>>> print('t-statistic = %6.3f pvalue = %6.4f' % stats.ttest_1samp(x, m))
t-statistic =  0.391 pvalue = 0.6955
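The snippet above tests a single sample `x` against a fixed mean `m`. The following is a minimal sketch of the three main variants applied to the question's first dataset. It assumes the second dataset is the illustrative sample used in the numpy-only answer (not the asker's real data) and that the reference mean is a hypothetical value of 55. `ttest_1samp` tests one sample against a fixed mean, `ttest_rel` is the paired test for related samples, and `ttest_ind` is the independent two-sample test.

```python
import numpy as np
from scipy import stats

# first dataset from the question; the second is the illustrative
# sample from the numpy-only answer (assumption, not the asker's data)
sample1 = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
sample2 = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])

# one-sample test against a hypothetical reference mean of 55
one = stats.ttest_1samp(sample1, 55.0)
# paired ("related") samples, e.g. two measurements on the same subjects
rel = stats.ttest_rel(sample1, sample2)
# independent samples with pooled variance; equal_var=False gives Welch's test
ind = stats.ttest_ind(sample1, sample2)

for name, res in [("1samp", one), ("rel", rel), ("ind", ind)]:
    print("%-5s t-statistic = %6.3f pvalue = %6.4f" % (name, res.statistic, res.pvalue))
```

Which function fits depends on how the datasets relate. If each value in one dataset corresponds to the same subject or run as the value at the same position in the other, `ttest_rel` is the appropriate choice.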


Answer 2

Ech*_*tor 8

van's answer using scipy is exactly right, and the scipy.stats.ttest_* functions are very convenient.

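However, I came to this page looking for a pure numpy solution, as the title asks, to avoid depending on scipy. For that purpose, let me point to the example given here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.standard_t.html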

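The main problem is that numpy has no cumulative distribution function (CDF), so my conclusion is that you should really use scipy. Still, it is possible with numpy alone.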
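One way to avoid random sampling is to approximate the CDF deterministically by numerically integrating the Student's t density. The following is a minimal sketch. It assumes that the trapezoidal rule on a fixed grid is accurate enough, which holds for moderate |t| but is not tuned for extreme tails. The helpers `t_pdf` and `t_cdf` are hypothetical names, not numpy functions. The usage lines reuse the two example samples from this answer.

```python
import math
import numpy as np

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # log of the normalising constant; lgamma avoids overflow for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return np.exp(log_c - (df + 1) / 2 * np.log1p(x**2 / df))

def t_cdf(t, df, n_points=200_001):
    """Approximate P(T <= t) by integrating the density from 0 to |t|."""
    x = np.linspace(0.0, abs(t), n_points)
    y = t_pdf(x, df)
    dx = x[1] - x[0]
    area = dx * (y.sum() - 0.5 * (y[0] + y[-1]))  # trapezoidal rule
    # the density is symmetric around 0, so F(t) = 0.5 + area or 0.5 - area
    return 0.5 + math.copysign(area, t)

# usage: paired test on the example samples, n - 1 degrees of freedom
sample1 = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
sample2 = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])
d = sample1 - sample2
t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
p = 2 * t_cdf(-abs(t), len(d) - 1)  # two-sided p-value
print("t = {:.3f}, two-sided p = {:.4f}".format(t, p))
```

Because the same grid is always used, this approach gives the same answer on every run, unlike the sampling approach. For serious work, `scipy.stats.t.cdf` remains the better tool.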

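From the original question, I guess you want to compare your datasets and use a t-test to judge whether there is a significant deviation, and that the samples are paired (see https://en.wikipedia.org/wiki/Student%27s_t-test#Unpaired_and_paired_two-sample_t-tests). In that case, you can compute the t and p values as follows.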
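Written out, the paired statistic is the mean of the pairwise differences divided by its standard error. It is compared against a t distribution with n - 1 degrees of freedom. The last line plugs in the two example samples used in the code:

```latex
d_i = x_{1,i} - x_{2,i}, \qquad
\bar d = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar d)^2}

t = \frac{\bar d}{s_d / \sqrt{n}}, \qquad \nu = n - 1

% example: d = (1, -1, -1, 1, -1, -1, 0, 1), n = 8
\bar d = -0.125, \quad s_d = \sqrt{6.875 / 7} \approx 0.991, \quad
t \approx \frac{-0.125}{0.991 / \sqrt{8}} \approx -0.357, \quad \nu = 7
```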

import numpy as np
sample1 = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
sample2 = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])
# paired samples -> under the null hypothesis the differences have mean 0
difference = sample1 - sample2
n = len(difference)
# the t-value is easily computed with numpy (ddof=1: sample standard deviation)
t = np.mean(difference) / (difference.std(ddof=1) / np.sqrt(n))
# unfortunately, numpy does not have a built-in CDF;
# here is a crude work-around that estimates it by sampling.
# A paired t-test has n - 1 degrees of freedom.
s = np.random.standard_t(n - 1, size=100000)
p = np.mean(s < t)  # estimate of P(T < t)
# two-sided test
print("Two-sided p-value: {:.1f} %".format(2 * min(p, 1 - p) * 100))

This prints a two-sided p-value of roughly 73 %. The exact figure varies a little between runs because of the random sampling. Note that this is not the probability that the two means are equal. It is the probability of observing a t-value at least this extreme if the means were equal. Because this is far above any usual significance level (e.g. 5 %), the test gives no evidence of a difference, and you should not draw any conclusion about a difference between the means in this case.
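If the samples are not paired (the other case covered by the Wikipedia section linked above), the pairwise differences cannot be used. The following is a minimal sketch of the unpaired statistic using numpy alone. It assumes unequal variances (Welch's t-test, which this answer does not otherwise cover) and uses the Welch-Satterthwaite approximation for the degrees of freedom. `welch_t` is a hypothetical helper name.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent (unpaired) samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # squared standard error of each sample mean (ddof=1: sample variance)
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

sample1 = [55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0]
sample2 = [54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0]
t, df = welch_t(sample1, sample2)
print("t = {:.4f}, df = {:.2f}".format(t, df))
```

For the two example samples, this gives t ≈ -0.049 with about 14 degrees of freedom. The degrees of freedom are generally not an integer. A p-value can then be estimated as in the sampling work-around above, because `np.random.standard_t` accepts a non-integer `df`.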

Archived:	16 years ago
Views:	51319
Last activity:	8 years, 8 months ago