pol*_*s11 3 python linear-regression data-science
我试图通过 coef 打印 VIF(方差膨胀因子)。但是,我似乎无法从 statsmodels 中找到任何说明如何操作的文档?我有一个需要处理的 n 个变量的模型,所有变量的多重共线性值无助于删除具有最高共线性的值。
这看起来像一个答案
但是我将如何针对此工作簿运行它。
http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv
下面是代码和摘要输出,这也是我现在所在的位置。
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
# read data into a DataFrame
data = pd.read_csv('somepath', index_col=0)
print(data.head())
#multiregression
lm = smf.ols(formula='Sales ~ TV + Radio + Newspaper', data=data).fit()
print(lm.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Sales R-squared: 0.897
Model: OLS Adj. R-squared: 0.896
Method: Least Squares F-statistic: 570.3
Date: Wed, 15 Feb 2017 Prob (F-statistic): 1.58e-96
Time: 13:28:29 Log-Likelihood: -386.18
No. Observations: 200 AIC: 780.4
Df Residuals: 196 BIC: 793.6
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept 2.9389 0.312 9.422 0.000 2.324 3.554
TV 0.0458 0.001 32.809 0.000 0.043 0.049
Radio 0.1885 0.009 21.893 0.000 0.172 0.206
Newspaper -0.0010 0.006 -0.177 0.860 -0.013 0.011
==============================================================================
Omnibus: 60.414 Durbin-Watson: 2.084
Prob(Omnibus): 0.000 Jarque-Bera (JB): 151.241
Skew: -1.327 Prob(JB): 1.44e-33
Kurtosis: 6.332 Cond. No. 454.
==============================================================================
Run Code Online (Sandbox Code Playgroud)
要获取 VIF 列表:
from statsmodels.stats.outliers_influence import variance_inflation_factor
variables = lm.model.exog
vif = [variance_inflation_factor(variables, i) for i in range(variables.shape[1])]
vif
Run Code Online (Sandbox Code Playgroud)
要得到他们的意思:
np.array(vif).mean()
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
6103 次 |
| 最近记录: |