Python: Hide dummy variables in a statsmodels summary

use*_*074 2 python regression statsmodels

I'm using statsmodels to create some regression output:

import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col
import numpy as np 
import pandas as pd 

x1 = pd.Series(np.random.randn(2000))
x2 = pd.Series(np.random.randn(2000))
aa_milne_arr = ['a', 'b', 'c', 'd', "e", "f", "g", "h", "i"]
dummy = pd.Series(np.random.choice(aa_milne_arr, 2000))
depen = pd.Series(np.random.randn(2000))
df = pd.DataFrame({"y": depen, "x1": x1, "x2": x2, "dummy": dummy})
df['const'] = 1  # unused: the formula API adds an Intercept column automatically
df['xsqr'] = df['x1']**2  
mod = smf.ols('y ~ x1 + x2 + dummy', data=df)
mod2 = smf.ols('y ~ x1 + x2 + xsqr + dummy', data=df)
res = mod.fit()
res2 = mod2.fit()

print (summary_col([res,res2],stars=True,float_format='%0.3f',
                  model_names=['one\n(0)','two\n(1)'],
                  info_dict={'N':lambda x: "{0:d}".format(int(x.nobs)),
                             'R2':lambda x: "{:.2f}".format(x.rsquared)}))

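This works fine, but my real dataset has many dummy variables (far more than in this example). I'd like to exclude the dummies from the summary output, while still keeping them in the regression itself. Is this possible?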

Ern*_*ler 5

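I would use the regressor_order parameter of summary_col. It lets you specify which regressors are shown first. If you also pass drop_omitted=True, every regressor not in that list is left out of the table entirely.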
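Any name-based filter has to match the labels that the formula interface assigns to the dummy columns. The sketch below prints those labels. It uses a small synthetic dataset: the data, the seed, and the three-level column are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: two numeric columns and one string column with levels a, b, c.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "y": rng.standard_normal(200),
    "x1": rng.standard_normal(200),
    "dummy": rng.choice(list("abc"), 200),
})

res = smf.ols("y ~ x1 + dummy", data=df).fit()

# The string column is treatment-coded: level "a" is the reference category,
# and every other level becomes its own column in the design matrix.
print(res.model.exog_names)
```

Each non-reference level shows up under a name of the form dummy[T.b]. This is the pattern a name-based filter needs to match.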

Example:

# Collect every regressor name used by either model.
all_regressors = sorted(set(res.model.exog_names) | set(res2.model.exog_names))
# Drop the dummies using some logic on their names: the formula API labels the levels
# of `dummy` as "dummy[T.b]", "dummy[T.c]", ... (or "C(dummy)[T.b]", ... if C() is used).
all_regressors_no_fe = [var_name for var_name in all_regressors
                        if not var_name.startswith(('dummy[', 'C('))]

print(summary_col([res, res2], stars=True, float_format='%0.3f',
                  model_names=['one\n(0)', 'two\n(1)'],
                  info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
                             'R2': lambda x: "{:.2f}".format(x.rsquared)},
                  regressor_order=all_regressors_no_fe,
                  drop_omitted=True))
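Below is a self-contained sketch that combines the question's setup with this approach. It makes two assumptions. First, the random data comes from numpy's default_rng. Second, it uses a hypothetical helper, is_dummy, which treats any name containing "[T." as a treatment-coded dummy. That heuristic also covers C(...) terms and any other categorical column, so you don't have to list each variable by hand.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Same shape of data as in the question (values are random).
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "y": rng.standard_normal(n),
    "x1": rng.standard_normal(n),
    "x2": rng.standard_normal(n),
    "dummy": rng.choice(list("abcdefghi"), n),
})
df["xsqr"] = df["x1"] ** 2

res = smf.ols("y ~ x1 + x2 + dummy", data=df).fit()
res2 = smf.ols("y ~ x1 + x2 + xsqr + dummy", data=df).fit()


def is_dummy(name):
    # Hypothetical heuristic: treatment-coded levels are labelled "<term>[T.<level>]".
    return "[T." in name


# Union of regressor names across both models, keeping first-seen order.
all_names = []
for r in (res, res2):
    for name in r.model.exog_names:
        if name not in all_names:
            all_names.append(name)
keep = [name for name in all_names if not is_dummy(name)]

# drop_omitted=True removes every regressor that is not listed in regressor_order.
table = summary_col(
    [res, res2],
    stars=True,
    float_format="%0.3f",
    model_names=["one\n(0)", "two\n(1)"],
    info_dict={"N": lambda x: f"{int(x.nobs):d}",
               "R2": lambda x: f"{x.rsquared:.2f}"},
    regressor_order=keep,
    drop_omitted=True,
)
print(table)
```

The union keeps names in first-seen order rather than sorting them. As a result, Intercept, x1 and x2 appear in the table ahead of xsqr, which only the second model contains.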