Pat*_*ick 6 python interaction statsmodels
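I'm having trouble with statsmodels' get_margeff method for logit models that include an interaction term. For a main-effects-only model the marginal effects are computed correctly and match the Stata and R results, but this is no longer the case once an interaction term is included. The effects are then wrong, and a marginal effect is also reported for the interaction term itself, which is not meaningful. The following code illustrates this: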
import pandas as pd
import statsmodels.formula.api as sm
import statsmodels.api as sm2
df=sm2.datasets.heart.load_pandas().data
regression = sm.logit(formula='censors~survival+age', data=df).fit()
#only for illustration purposes; does not make real sense
print(regression.get_margeff().summary())
# the marginal effects computed here are correct and match the Stata and R results
dy/dx std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
survival -0.0004 7.95e-05 -4.672 0.000 -0.001 -0.000
age 0.0148 0.005 3.262 0.001 0.006 0.024
==============================================================================
regression = sm.logit(formula='censors~survival+age+survival*age', data=df).fit()
print(regression.get_margeff().summary())
## effects for survival and age are not correct and a marginal effect for survival:age is reported which does not make sense
================================================================================
dy/dx std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------
survival -0.0009 0.001 -1.040 0.298 -0.003 0.001
age 0.0120 0.006 1.857 0.063 -0.001 0.025
survival:age 1.08e-05 1.8e-05 0.599 0.549 -2.45e-05 4.61e-05
================================================================================
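Does anyone know how to solve this, so that the marginal effects of survival and age in the second model [used here for illustration purposes only] match the Stata and R results?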
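One possible workaround, sketched under assumptions rather than offered as a statsmodels feature: approximate the average marginal effect of an underlying variable by a central finite difference on the predicted probabilities. Because a formula-based result rebuilds the design matrix from the data frame passed to predict, shifting survival also shifts survival:age. The helper name average_marginal_effect and the step size eps are hypothetical choices made for this sketch; it returns point estimates only (no delta-method standard errors) and has not been checked here against the Stata figures.

```python
def average_marginal_effect(result, data, var, eps=1e-4):
    """Central-difference average marginal effect of `var` on the prediction.

    `result` is assumed to be a formula-based fitted model whose
    predict(DataFrame) re-evaluates the formula, so interaction columns
    are recomputed from the shifted variable. Hypothetical helper, not
    part of statsmodels.
    """
    up = data.copy()
    down = data.copy()
    up[var] = up[var] + eps      # nudge the raw variable up
    down[var] = down[var] - eps  # and down by the same amount

    # Per-observation slope of the predicted probability, then the average.
    slopes = (result.predict(up) - result.predict(down)) / (2 * eps)
    return float(slopes.mean())
```

With the objects from the question, this would be called as average_marginal_effect(regression, df, 'survival') and average_marginal_effect(regression, df, 'age') after fitting the interaction model.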
Edit, April 11:
In response to user "StupidWolf", here are the corresponding Stata results:
use "heart.dta"
qui logit censors survival age
margins, dydx(*)
Average marginal effects Number of obs = 69
Model VCE : OIM
Expression : Pr(censors), predict()
dy/dx w.r.t. : survival age
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
survival | -.0003716 .0000795 -4.67 0.000 -.0005275 -.0002157
age | .014813 .0045409 3.26 0.001 .0059131 .0237129
------------------------------------------------------------------------------
qui logit censors survival age c.survival#c.age
margins, dydx(*)
Average marginal effects Number of obs = 69
Model VCE : OIM
Expression : Pr(censors), predict()
dy/dx w.r.t. : survival age
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
survival | -.0003816 .0000814 -4.68 0.000 -.0005412 -.0002219
age | .0162289 .0051163 3.17 0.002 .0062012 .0262567
------------------------------------------------------------------------------
There is extensive discussion of why marginal effects should not be computed for interaction terms, for example: https://www3.nd.edu/~rwilliam/stats/Margins01.pdf and https://www.stata.com/statalist/archive/2013-01/msg00293.html
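The core of that argument can be written out for the model in this question. The notation below (the coefficients β and the linear predictor η) is assumed for illustration and does not come from either link:

```latex
\begin{aligned}
\eta &= \beta_0 + \beta_1\,\mathrm{survival} + \beta_2\,\mathrm{age}
        + \beta_3\,(\mathrm{survival}\times\mathrm{age}),\\
p &= \Pr(\mathrm{censors}=1) = \frac{1}{1+e^{-\eta}},\\
\frac{\partial p}{\partial\,\mathrm{survival}} &= p\,(1-p)\,\bigl(\beta_1 + \beta_3\,\mathrm{age}\bigr),\\
\frac{\partial p}{\partial\,\mathrm{age}} &= p\,(1-p)\,\bigl(\beta_2 + \beta_3\,\mathrm{survival}\bigr).
\end{aligned}
```

If the product column is instead treated as a separate regressor, which is what the survival:age row in the statsmodels output above suggests, the effect of survival reduces to p(1-p)β1 and the β3·age part is lost. A "marginal effect" p(1-p)β3 for the product would describe changing survival×age while holding both survival and age fixed, which cannot happen.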