Jay*_*s01 7 python linear-regression scikit-learn statsmodels
I am looking for influence statistics after fitting a linear regression. In R they can be obtained like this (for example):
hatvalues(fitted_model) #hatvalues (leverage)
cooks.distance(fitted_model) #Cook's D values
rstandard(fitted_model) #standardized residuals
rstudent(fitted_model) #studentized residuals
and so on.
How can I get the same statistics in Python with statsmodels after fitting a model like this:
#import statsmodels
import statsmodels.api as sm
#Fit linear model to any dataset
model = sm.OLS(Y,X)
results = model.fit()
#Create a table that includes the studentized residuals
#(a DataFrame if the model was fit on pandas data, otherwise an array)
outliers = results.outlier_test()
Edit: see the answers below...
Sco*_*ter 14
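Although the accepted answer is correct, I found it helpful to access the statistics individually as attributes of the influence instance (returned by statsmodels.regression.linear_model.OLSResults.get_influence) after fitting the model. This saved me from having to index summary_frame, since I was only interested in one of the statistics rather than all of them. So maybe this helps someone else: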
import statsmodels.api as sm
#Fit linear model to any dataset
model = sm.OLS(Y,X)
results = model.fit()
#create instance of influence
influence = results.get_influence()
#leverage (hat values)
leverage = influence.hat_matrix_diag
#Cook's D values and p-values, returned as a tuple of two arrays
#(use cooks_d[0] for the distances alone)
cooks_d = influence.cooks_distance
#standardized residuals
standardized_residuals = influence.resid_studentized_internal
#studentized residuals
studentized_residuals = influence.resid_studentized_external