What is the Python equivalent of R's data.chisq$residuals?

ama*_*ity 1 python r scipy

I have the following data:

array([[33, 250, 196, 136, 32],
       [55, 293, 190,  71, 13]])

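I can get the p-value from it with stats.chi2_contingency(data).

Is there something similar to the R object data.chisq$residuals for getting Pearson's residuals and the standardized residuals?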

War*_*ser 6

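SciPy does not return them, so you have to compute them yourself or use statsmodels. Further down is a short module that defines functions for these residuals. They take the observed frequencies and the expected frequencies (as returned by chi2_contingency). Note that while chi2_contingency and the residuals function work for n-dimensional arrays, stdres as implemented here works only for 2D arrays.

If you have statsmodels, its Table class provides both kinds of residuals directly: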

In [2]: import numpy as np                                                                                   

In [3]: import statsmodels.api as sm                                                                         

In [4]: F = np.array([[33, 250, 196, 136, 32], [55, 293, 190,  71, 13]])                                     

In [5]: table = sm.stats.Table(F)                                                                            

In [6]: table.resid_pearson  # Pearson's residuals
Out[6]: 
array([[-1.77162519, -1.61362277, -0.05718356,  2.96508777,  1.89079393],
       [ 1.80687785,  1.64573143,  0.05832142, -3.02408853, -1.92841787]])

In [7]: table.standardized_resids  # Standardized residuals
Out[7]: 
array([[-2.62309082, -3.0471942 , -0.09791681,  4.6295814 ,  2.74991911],
       [ 2.62309082,  3.0471942 ,  0.09791681, -4.6295814 , -2.74991911]])

Here is the module, which needs only NumPy and SciPy:

from __future__ import division

import numpy as np
from scipy.stats.contingency import margins


def residuals(observed, expected):
    return (observed - expected) / np.sqrt(expected)

def stdres(observed, expected):
    n = observed.sum()
    rsum, csum = margins(observed)
    # With integers, the calculation
    #     csum * rsum * (n - rsum) * (n - csum)
    # might overflow, so convert rsum and csum to floating point.
    rsum = rsum.astype(np.float64)
    csum = csum.astype(np.float64)
    v = csum * rsum * (n - rsum) * (n - csum) / n**3
    return (observed - expected) / np.sqrt(v)

With your data, these functions reproduce the statsmodels output:

>>> from scipy.stats import chi2_contingency

>>> F = np.array([[33, 250, 196, 136, 32], [55, 293, 190, 71, 13]])

>>> chi2, p, dof, expected = chi2_contingency(F)

>>> residuals(F, expected)
array([[-1.77162519, -1.61362277, -0.05718356,  2.96508777,  1.89079393],
       [ 1.80687785,  1.64573143,  0.05832142, -3.02408853, -1.92841787]])

>>> stdres(F, expected)
array([[-2.62309082, -3.0471942 , -0.09791681,  4.6295814 ,  2.74991911],
       [ 2.62309082,  3.0471942 ,  0.09791681, -4.6295814 , -2.74991911]])
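
As a sanity check on the numbers above, here is a minimal self-contained sketch (not part of the original module; the variable names are only illustrative). It relies on two facts: the squared Pearson residuals sum to the chi-square statistic, and the variance term in stdres can be rewritten as E * (1 - row_total/n) * (1 - col_total/n), because E = row_total * col_total / n. It assumes a 2D table, and it assumes chi2_contingency applies no continuity correction, which holds here since the table has more than one degree of freedom.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[33, 250, 196, 136, 32],
                     [55, 293, 190, 71, 13]])

chi2, p, dof, expected = chi2_contingency(observed)

# Pearson residuals: (O - E) / sqrt(E)
pearson = (observed - expected) / np.sqrt(expected)

# The squared Pearson residuals add up to the chi-square statistic
# (the continuity correction is only used when dof == 1).
chi2_from_residuals = (pearson ** 2).sum()

# Standardized residuals, written in terms of the expected counts:
#     v = E * (1 - row_total / n) * (1 - col_total / n)
n = observed.sum()
row_prop = observed.sum(axis=1, keepdims=True) / n  # shape (rows, 1)
col_prop = observed.sum(axis=0, keepdims=True) / n  # shape (1, cols)
v = expected * (1 - row_prop) * (1 - col_prop)
standardized = (observed - expected) / np.sqrt(v)

print(np.isclose(chi2_from_residuals, chi2))
print(standardized.round(4))
```

For a table with only two rows, the standardized residuals in the second row are the negatives of those in the first, which matches the output shown above.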