Sag*_*tha 10 python pandas statsmodels
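I am using statsmodels.formula.api (version 0.9.0) on Windows 10 to run a multiple linear regression. After fitting the model and requesting the summary with the following lines, I get the summary back as a Summary object.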
X_opt = X[:, [0,1,2,3]]
regressor_OLS = sm.OLS(endog= y, exog= X_opt).fit()
regressor_OLS.summary()
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.948
Method:                 Least Squares   F-statistic:                     296.0
Date:                Wed, 08 Aug 2018   Prob (F-statistic):           4.53e-30
Time:                        00:46:48   Log-Likelihood:                -525.39
No. Observations:                  50   AIC:                             1059.
Df Residuals:                      46   BIC:                             1066.
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5.012e+04   6572.353      7.626      0.000    3.69e+04    6.34e+04
x1             0.8057      0.045     17.846      0.000       0.715       0.897
x2            -0.0268      0.051     -0.526      0.602      -0.130       0.076
x3             0.0272      0.016      1.655      0.105      -0.006       0.060
==============================================================================
Omnibus:                       14.838   Durbin-Watson:                   1.282
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               21.442
Skew:                          -0.949   Prob(JB):                     2.21e-05
Kurtosis:                       5.586   Cond. No.                     1.40e+06
==============================================================================
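I want to do backward elimination on the p-values at a significance level of 0.05. To do that, I need to remove the predictor with the highest p-value and run the code again.
I would like to know whether there is a way to extract the p-values from the summary object, so that I can run a loop with a conditional statement and find the significant variables without repeating these steps by hand.
Thanks.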
Zax*_*axR 12
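@Michael B's answer works well, but it requires "recreating" the table. The table itself is actually available directly from the summary().tables attribute. Each table in this attribute (which is a list of tables) is a SimpleTable, which has methods for outputting different formats. We can then read any of those formats back in as a pd.DataFrame: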
from io import StringIO

import pandas as pd
import statsmodels.api as sm

model = sm.OLS(y, x)
results = model.fit()
results_summary = results.summary()
# Note that tables is a list. The table at index 1 is the "core" coefficient table.
results_as_html = results_summary.tables[1].as_html()
# read_html returns a list of DataFrames, so we want index 0.
# Wrapping the string in StringIO avoids the literal-HTML deprecation warning in recent pandas.
pd.read_html(StringIO(results_as_html), header=0, index_col=0)[0]
Mic*_*l B 11
Store the fitted model in a variable, results, like so:
import statsmodels.api as sm
model = sm.OLS(y,x)
results = model.fit()
Then create a function like the following:
import pandas as pd

def results_summary_to_dataframe(results):
    '''Take a statsmodels results object and return its key statistics as a DataFrame.'''
    # Assumes the model was fit on pandas objects, so these are labeled Series/DataFrames
    pvals = results.pvalues
    coeff = results.params
    conf_lower = results.conf_int()[0]   # column 0: lower confidence bound
    conf_higher = results.conf_int()[1]  # column 1: upper confidence bound
    results_df = pd.DataFrame({"pvals": pvals,
                               "coeff": coeff,
                               "conf_lower": conf_lower,
                               "conf_higher": conf_higher
                               })
    # Reordering...
    results_df = results_df[["coeff", "pvals", "conf_lower", "conf_higher"]]
    return results_df
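You can explore all the attributes of the results object further by printing dir(results), and then add whichever ones you need to the function and the DataFrame.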
A simple solution is just one line of code:
LRresult = (result.summary2().tables[1])
This gives you a DataFrame object:
type(LRresult)
pandas.core.frame.DataFrame
To get the significant variables and run the test again:
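import statsmodels.formula.api as smf

# Keep the terms with p <= 0.05; drop the intercept by name, since the formula adds it automatically
significant = LRresult[LRresult['P>|z|'] <= 0.05].index
newlist = [name for name in significant if name != 'Intercept']
myform1 = 'binary_Target' + ' ~ ' + ' + '.join(newlist)
M1_test2 = smf.logit(formula=myform1, data=myM1_1)
result2 = M1_test2.fit(maxiter=200)
LRresult2 = result2.summary2().tables[1]
LRresult2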
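The snippet above is for a logit model, where the p-value column is labeled 'P>|z|'. For the OLS model in the question, the same summary2() table should label it 'P>|t|' instead. A minimal sketch under that assumption, using made-up data and the hypothetical names `df`, `target`, `a` and `b`, picks the column by its "P>" prefix so the same filter works for either model type:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: "a" matters, "b" is noise
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 2)), columns=["a", "b"])
df["target"] = 1.0 + 2.0 * df["a"] + rng.normal(size=100)

result = smf.ols("target ~ a + b", data=df).fit()
coef_table = result.summary2().tables[1]  # already a DataFrame

# Find the p-value column regardless of whether it is P>|t| or P>|z|
pcol = [c for c in coef_table.columns if c.startswith("P>")][0]

# Significant predictors, with the formula's intercept term left out
significant = coef_table.index[coef_table[pcol] <= 0.05]
predictors = [name for name in significant if name != "Intercept"]
print(pcol, predictors)
```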