rat*_*ite 5 python random-effects mixed-models statsmodels linearmodels
我试图理解 Python statsmodel 包提供的混合线性模型的结果。我想避免数据分析和解释中的陷阱。问题在数据加载/输出代码块之后。
加载数据并拟合模型:
import statsmodels.api as sm
import statsmodels.formula.api as smf
data = sm.datasets.get_rdataset("dietox", "geepack").data
md = smf.mixedlm("Weight ~ Time", data, groups=data["Pig"])
mdf = md.fit()
print mdf.summary()
Mixed Linear Model Regression Results
========================================================
Model: MixedLM Dependent Variable: Weight
No. Observations: 861 Method: REML
No. Groups: 72 Scale: 11.3669
Min. group size: 11 Likelihood: -2404.7753
Max. group size: 12 Converged: Yes
Mean group size: 12.0
--------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------
Intercept 15.724 0.788 19.952 0.000 14.179 17.268
Time 6.943 0.033 207.939 0.000 6.877 7.008
Group Var 40.394 2.149
========================================================
Run Code Online (Sandbox Code Playgroud)
Q1. (a) 组 Var 系数 (params) 到底是什么?我认为这是 Group Var (cov_params) 的方差,但默认输出与内置方法输出不匹配。
Q1. (b) “Group Var”参数(params)是什么意思?
print "-----Parameters-----"
print mdf.params
print
print "-----Covariance matrix-----"
print mdf.cov_params()
-----Parameters-----
Intercept 15.723523
Time 6.942505
Group Var 3.553634
dtype: float64
-----Covariance matrix-----
Intercept Time Group Var
Intercept 0.621028 -0.007222 0.000052
Time -0.007222 0.001115 -0.000012
Group Var 0.000052 -0.000012 0.406197
Run Code Online (Sandbox Code Playgroud)
Q2。(a) Var 组的标准误差 (bse) 是什么意思?为什么默认输出中未报告组 Var 估计?难道不重要吗?
Q2。(b) 与方差标准误差 (bse_re) 有何不同?
print "-----Standard errors-----"
print mdf.bse
print
print "-----Standard errors of random effects-----"
print mdf.bse_re
-----Standard errors-----
Intercept 0.788053
Time 0.033387
Group Var 0.637336
dtype: float64
-----Standard errors of random effects-----
Group Var 2.148771
dtype: float64
Run Code Online (Sandbox Code Playgroud)
Q3。为什么在summary()中没有报告随机参数的t值和p值?
print "-----t-values (or z-values?)-----"
print mdf.tvalues
print
print "-----p-values-----"
print mdf.pvalues
-----t-values (or z-values?)-----
Intercept 19.952366
Time 207.938608
Group Var 5.575760
dtype: float64
-----p-values-----
Intercept 1.429597e-88
Time 0.000000e+00
Group Var 2.464519e-08
dtype: float64
Run Code Online (Sandbox Code Playgroud)
参考: https: //www.statsmodels.org/dev/mixed_linear.html