Mia*_*Mia · tags: python, statistics, numpy, scipy, pandas
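I'm new to both Python and statistics. I'm trying to apply a chi-squared test to determine whether previous success affects whether a person changes strategy. Percentage-wise it does seem to, but I want to see whether my results are statistically significant.

My question is: am I doing this correctly? My result says the p-value is 0.0, which would mean there is a significant relationship between my variables. That is of course what I want, but 0 seems a bit too perfect for a p-value, so I'm wondering whether I made a mistake in my code.

Here is what I did: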
```python
import numpy as np
import pandas as pd
import scipy.stats as stats

d = {'Previously Successful': pd.Series([129.3, 182.7, 312], index=['Yes - changed strategy', 'No', 'col_totals']),
     'Previously Unsuccessful': pd.Series([260.17, 711.83, 972], index=['Yes - changed strategy', 'No', 'col_totals']),
     'row_totals': pd.Series([(129.3 + 260.17), (182.7 + 711.83), (312 + 972)], index=['Yes - changed strategy', 'No', 'col_totals'])}
total_summarized = pd.DataFrame(d)

# First two rows and columns = the 2x2 table of observed counts
# (.ix was removed in pandas 1.0; .iloc does the same positional slice)
observed = total_summarized.iloc[0:2, 0:2]
```
[Output: the `observed` table (image not preserved)]
```python
# Expected counts from the row and column totals
# (.ix replaced by .iloc/.loc, since .ix no longer exists in pandas)
expected = np.outer(total_summarized["row_totals"].iloc[0:2],
                    total_summarized.loc["col_totals"].iloc[0:2]) / 1000
expected = pd.DataFrame(expected)
expected.columns = ["Previously Successful", "Previously Unsuccessful"]
expected.index = ["Yes - changed strategy", "No"]

chi_squared_stat = (((observed - expected) ** 2) / expected).sum().sum()
print(chi_squared_stat)

crit = stats.chi2.ppf(q=0.95,  # Find the critical value for 95% confidence
                      df=8)
print("Critical value")
print(crit)

p_value = 1 - stats.chi2.cdf(x=chi_squared_stat,  # Find the p-value
                             df=8)
print("P value")
print(p_value)

stats.chi2_contingency(observed=observed)
```
[Output: the printed statistic, critical value and p-value (image not preserved)]
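On why the script printed exactly 0.0: `1 - stats.chi2.cdf(...)` subtracts a number extremely close to 1 from 1, and when the true upper-tail probability is far below double-precision resolution near 1 (about 1e-16), that difference can collapse to zero. `stats.chi2.sf` (the survival function) computes the upper tail directly and avoids the cancellation. A minimal sketch, using plain NumPy arrays in place of the DataFrames above and deliberately keeping the script's ÷1000 and `df=8`; the variable names are illustrative, not from the post:

```python
import numpy as np
from scipy import stats

# Observed counts from the question (rows: changed strategy yes / no;
# columns: previously successful / unsuccessful)
observed = np.array([[129.3, 260.17],
                     [182.7, 711.83]])

# Reproduce the script's expected table: outer product of the margins / 1000
expected_q = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / 1000
stat_q = ((observed - expected_q) ** 2 / expected_q).sum()

# 1 - cdf: subtracting a value that is ~1.0 from 1.0 loses any tail
# probability much smaller than ~1e-16, so the result is at or near 0
p_one_minus_cdf = 1 - stats.chi2.cdf(stat_q, df=8)

# sf: computes the upper-tail probability directly, so it stays positive
p_sf = stats.chi2.sf(stat_q, df=8)

print(stat_q, p_one_minus_cdf, p_sf)
```

Here the statistic comes out at roughly 99, and `sf` returns a tiny but nonzero probability. This only explains the exact zero; the inflated statistic itself comes from the ÷1000 and `df=8` choices, which the answer below addresses.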
Answer by War*_*ser:
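A few corrections:

- Your `expected` array is incorrect. You must divide by `observed.sum().sum()`, which is 1284 (312 + 972), not 1000.
- For a 2x2 contingency table like this one, the degrees of freedom is (2 - 1) × (2 - 1) = 1, not 8.
- Your calculation of `chi_squared_stat` does not include a continuity correction. (Not using it isn't necessarily wrong; that's a judgment call for the statistician.)

All of the calculations you perform (expected matrix, statistic, degrees of freedom, p-value) are done by `chi2_contingency`: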
```
In [65]: observed
Out[65]: 
                        Previously Successful  Previously Unsuccessful
Yes - changed strategy                  129.3                   260.17
No                                      182.7                   711.83

In [66]: from scipy.stats import chi2_contingency

In [67]: chi2, p, dof, expected = chi2_contingency(observed)

In [68]: chi2
Out[68]: 23.383138325890453

In [69]: p
Out[69]: 1.3273696199438626e-06

In [70]: dof
Out[70]: 1

In [71]: expected
Out[71]: 
array([[  94.63757009,  294.83242991],
       [ 217.36242991,  677.16757009]])
```
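To see where the default numbers come from, here is a sketch that should reproduce them by hand. It assumes the textbook form of Yates' continuity correction, shrinking each |O - E| by 0.5 (never below zero) before squaring; for this 2x2 table every |O - E| is about 34.66, so the clipping at zero never matters. It uses plain NumPy arrays instead of the DataFrame, and the variable names are my own:

```python
import numpy as np
from scipy import stats

observed = np.array([[129.3, 260.17],
                     [182.7, 711.83]])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()  # 1284, not 1000
expected = np.outer(row_totals, col_totals) / grand_total

# Degrees of freedom for an r x c table: (r - 1) * (c - 1), which is 1 here
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Yates' continuity correction: reduce each |O - E| by 0.5, clipped at 0
adjusted = np.clip(np.abs(observed - expected) - 0.5, 0, None)
chi2_yates = (adjusted ** 2 / expected).sum()

# Upper-tail p-value via the survival function
p_yates = stats.chi2.sf(chi2_yates, dof)

print(expected)
print(chi2_yates, p_yates, dof)
```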
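By default, `chi2_contingency` applies Yates' continuity correction when the degrees of freedom is 1, which is the case for a 2x2 contingency table. If you don't want the correction, disable it with the argument `correction=False`: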
```
In [73]: chi2, p, dof, expected = chi2_contingency(observed, correction=False)

In [74]: chi2
Out[74]: 24.072616672232893

In [75]: p
Out[75]: 9.2770200776879643e-07
```
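For comparison with the question's own script, here is a sketch of that script with only the first two corrections applied (divide by the grand total, use df = 1) and no continuity correction; it should match the `correction=False` numbers. Swapping in plain NumPy arrays for the DataFrames and `chi2.sf` for `1 - chi2.cdf` are my substitutions, not part of the answer:

```python
import numpy as np
from scipy import stats

# Observed counts from the question
observed = np.array([[129.3, 260.17],
                     [182.7, 711.83]])

# Fix 1: divide the outer product of the margins by the grand total (1284)
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# The question's uncorrected Pearson statistic: sum of (O - E)^2 / E
chi2_plain = ((observed - expected) ** 2 / expected).sum()

# Fix 2: a 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom
crit = stats.chi2.ppf(0.95, df=1)           # critical value at the 5% level
p_plain = stats.chi2.sf(chi2_plain, df=1)   # upper-tail p-value

print(chi2_plain, crit, p_plain)
```

The statistic (about 24.07) is well above the critical value (about 3.84), so the association is significant at the 5% level, with a p-value on the order of 1e-6 rather than exactly 0.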