How do I find the feature names for the coefficients of a scikit-learn linear regression?

ame*_*hta 22 python machine-learning linear-regression scikit-learn

from sklearn import linear_model

# training the model
model_1_features = ['sqft_living', 'bathrooms', 'bedrooms', 'lat', 'long']
model_2_features = model_1_features + ['bed_bath_rooms']
model_3_features = model_2_features + ['bedrooms_squared', 'log_sqft_living', 'lat_plus_long']

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])

model_2 = linear_model.LinearRegression()
model_2.fit(train_data[model_2_features], train_data['price'])

model_3 = linear_model.LinearRegression()
model_3.fit(train_data[model_3_features], train_data['price'])

# extracting the coefficients
print(model_1.coef_)
print(model_2.coef_)
print(model_3.coef_)

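If I change the order of the features, the coefficients are still printed in the same order, so I would like to know how the features map to the coefficients.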

Rob*_*ess 14

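The trick is that right after you have trained the model, you know the order of the coefficients: it is the order of the feature columns you passed to fit.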

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])
print(list(zip(model_1.coef_, model_1_features)))

This prints each coefficient together with its correct feature. (Tested with a pandas DataFrame.)

If you want to reuse the coefficients later, you can also put them in a dictionary:

coef_dict = {}
for coef, feat in zip(model_1.coef_,model_1_features):
    coef_dict[feat] = coef

(You can test this by training two models on the same features but, as you said, with the order of the features changed.)
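For anyone who wants to run that check, here is a minimal sketch. The data is synthetic and made up for illustration (it is not the asker's `train_data`), and `fit_coef_dict` is a hypothetical helper name. The price is built as an exact linear function of the features so the expected coefficients are known in advance.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for train_data (assumption: not the real housing data).
# price is an exact linear function of the features, so the true
# coefficients are 3.0, -2.0 and 0.5.
rng = np.random.default_rng(0)
train_data = pd.DataFrame(
    rng.normal(size=(50, 3)),
    columns=["sqft_living", "bathrooms", "bedrooms"],
)
train_data["price"] = (
    3.0 * train_data["sqft_living"]
    - 2.0 * train_data["bathrooms"]
    + 0.5 * train_data["bedrooms"]
)


def fit_coef_dict(features):
    """Fit on the given column order and map each feature to its coefficient."""
    model = LinearRegression()
    model.fit(train_data[features], train_data["price"])
    # coef_ follows the column order passed to fit, so zip pairs them correctly
    return dict(zip(features, model.coef_))


coefs_a = fit_coef_dict(["sqft_living", "bathrooms", "bedrooms"])
coefs_b = fit_coef_dict(["bedrooms", "sqft_living", "bathrooms"])

# Same feature -> same coefficient, whatever the column order was
for name in coefs_a:
    print(name, round(coefs_a[name], 6), round(coefs_b[name], 6))
```

Both dictionaries should agree feature by feature (up to floating-point noise), which is what makes the zip approach safe as long as you zip with the same list you used for fitting.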


roc*_*ady 7

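@Robin posted a great answer, but I had to tweak it slightly to get it working the way I wanted. The tweak concerns the dimensions of the `coef_` np.array: I changed it to model_1.coef_[0,:], as follows: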

coef_dict = {}
for coef, feat in zip(model_1.coef_[0,:],model_1_features):
    coef_dict[feat] = coef

The dict was then created as I had imagined, with {'feature_name': coefficient_value} pairs.
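A likely reason the extra `[0,:]` is needed in some setups: `coef_` is 1-D when the target `y` is 1-D, but 2-D with shape (n_targets, n_features) when `y` is passed as a 2-D array, even with a single column. The sketch below uses made-up data and feature names to show both cases, and uses `np.ravel` as one possible way to handle either shape when there is only one target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data for illustration: y is an exact linear function of X
features = ["sqft_living", "bathrooms"]
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 4.0 * X[:, 0] - 1.0 * X[:, 1]

# 1-D target -> coef_ has shape (n_features,)
coef_1d = LinearRegression().fit(X, y).coef_
# 2-D target with one column -> coef_ has shape (1, n_features)
coef_2d = LinearRegression().fit(X, y.reshape(-1, 1)).coef_

print(coef_1d.shape, coef_2d.shape)

# np.ravel flattens either shape; only appropriate for a single target
dict_1d = dict(zip(features, np.ravel(coef_1d)))
dict_2d = dict(zip(features, np.ravel(coef_2d)))
```

With more than one target, flattening would mix the rows together, so in that case you would index one row of `coef_` per target instead.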


小智 7

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)

coef_table = pd.DataFrame(list(X_train.columns)).copy()
coef_table.insert(len(coef_table.columns),"Coefs",regressor.coef_.transpose())
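A variation on the same table idea, in case it is useful: a pandas Series indexed by the column names keeps each coefficient next to its feature and can be sorted or looked up by name. This is only a sketch on synthetic data; `X_train` and `y_train` here are made up, and `coef_series` is a name chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic training data (assumption: stands in for the real X_train / y_train)
rng = np.random.default_rng(0)
X_train = pd.DataFrame(
    rng.normal(size=(40, 3)),
    columns=["lat", "long", "bedrooms"],
)
y_train = 1.5 * X_train["lat"] - 0.5 * X_train["long"] + 2.0 * X_train["bedrooms"]

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Index the coefficients by the same columns that were used for fitting;
# np.ravel guards against a (1, n_features) coef_ for a single 2-D target
coef_series = pd.Series(
    np.ravel(regressor.coef_),
    index=X_train.columns,
    name="Coefs",
)

print(coef_series.sort_values())
```

Individual values can then be read by name, for example `coef_series["bedrooms"]`.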