I'm using statsmodels to produce some regression output:
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col
import numpy as np
import pandas as pd
x1 = pd.Series(np.random.randn(2000))
x2 = pd.Series(np.random.randn(2000))
aa_milne_arr = ['a', 'b', 'c', 'd', "e", "f", "g", "h", "i"]
dummy = pd.Series(np.random.choice(aa_milne_arr, 2000,))
depen = pd.Series(np.random.randn(2000))
df = pd.DataFrame({"y": depen, "x1": x1, "x2": x2, "dummy": dummy})
df['const'] = 1
df['xsqr'] = df['x1']**2
mod = smf.ols('y ~ x1 + x2 + dummy', data=df)
mod2 = smf.ols('y ~ x1 + x2 + xsqr + dummy', data=df)
res = mod.fit()
res2 = mod2.fit()
print(summary_col([res, res2], stars=True, float_format='%0.3f',
                  model_names=['one\n(0)', 'two\n(1)'],
                  info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
                             'R2': lambda x: "{:.2f}".format(x.rsquared)}))
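It works fine, but I have a large dataset with many dummy variables (far more than in this example). I would therefore like to exclude the dummies from the summary output, but not from the regression itself. Is that possible?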
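I would use the regressor_order argument of summary_col, which lets you specify which regressors are shown first. If you also pass drop_omitted=True, any regressor not listed in regressor_order is left out of the table entirely.
Example: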
all_regressors = sorted(set(res.model.exog_names) | set(res2.model.exog_names))
# Drop the dummies using some logic on their names. With the formula above
# they are named like 'dummy[T.b]'; had it used C(dummy), they would start with 'C('.
all_regressors_no_fe = [var_name for var_name in all_regressors
                        if not var_name.startswith(('dummy[', 'C('))]
print(summary_col([res, res2], stars=True, float_format='%0.3f',
                  model_names=['one\n(0)', 'two\n(1)'],
                  info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
                             'R2': lambda x: "{:.2f}".format(x.rsquared)},
                  regressor_order=all_regressors_no_fe,
                  drop_omitted=True))
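If the alphabetical order produced by sorted() is not what you want in the table, the name filtering can be pulled into a small helper that keeps regressors in the order the models list them. This is only a sketch: regressors_to_show is a hypothetical helper, not part of statsmodels, and the sample name lists are illustrative guesses at how the formula interface labels the terms, so check them against your own res.model.exog_names.

```python
def regressors_to_show(exog_name_lists, hidden_prefixes):
    """Return regressor names in first-seen order, skipping hidden ones.

    Hypothetical helper (not a statsmodels function): a name is hidden
    when it starts with any of the given prefixes.
    """
    prefixes = tuple(hidden_prefixes)  # str.startswith accepts a tuple
    shown = []
    for names in exog_name_lists:
        for name in names:
            if name.startswith(prefixes) or name in shown:
                continue
            shown.append(name)
    return shown


# Illustrative name lists for the two models in the question
# (assumed labels, not copied from a real run).
names_one = ["Intercept", "dummy[T.b]", "dummy[T.c]", "x1", "x2"]
names_two = ["Intercept", "dummy[T.b]", "dummy[T.c]", "x1", "x2", "xsqr"]

keep = regressors_to_show([names_one, names_two], ["dummy[", "C("])
print(keep)  # ['Intercept', 'x1', 'x2', 'xsqr']
```

With fitted results, you would pass [res.model.exog_names, res2.model.exog_names] as the first argument and hand the returned list to regressor_order, together with drop_omitted=True.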