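I've seen other posts about this, but none of them helped me. I'm using a Jupyter notebook with Python 3.6.0 on a Windows x64 machine. I have a large dataset, but I kept only part of it to run my model.
This is the code I'm using: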
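# Import everything up front so model_selection exists before train_test_split is called
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import Imputer  # removed in scikit-learn 0.22; newer versions use sklearn.impute.SimpleImputer
from sklearn.preprocessing import StandardScaler

# Keep only the columns needed for the model (loan_2 is the full dataset loaded earlier)
df = loan_2.reindex(columns=['term_clean', 'grade_clean', 'annual_inc', 'loan_amnt',
                             'int_rate', 'purpose_clean', 'installment', 'loan_status_clean'])
# ffill() (same as fillna(method='ffill')) returns a new DataFrame, so the result must be assigned back.
# No astype(int): it raises while any NaN is left and would truncate float columns such as int_rate.
df = df.ffill()

# Replace the remaining NaNs (e.g. in the first row, which ffill cannot fill) with each column's median
imp = Imputer(missing_values='NaN', strategy='median', axis=0)
array = df.values
imp.fit(array)
array_imp = imp.transform(array)
# loan_status_clean is column 7 of array_imp and is already imputed here, so it needs no separate
# Imputer; reshaping it with reshape(1, -1) would turn every sample into its own column.

# Column indices follow the reindex() list: 4 is int_rate, 7 is loan_status_clean
X = array_imp[:, 0:4]
Y = array_imp[:, 4]

validation_size = 0.20
seed = 7
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(
    X, Y, test_size=validation_size, random_state=seed)
scoring = 'accuracy'
# …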
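Note that `fillna` and `ffill` return a new DataFrame instead of modifying it in place, so an unassigned call changes nothing. The sketch below uses a small made-up DataFrame (hypothetical data, not the loan dataset) to show that, plus two related points: casting to `int` fails while any NaN is left, and a forward fill cannot fill a NaN in the first row, which is why a median fill is still needed afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "a" starts with a NaN, column "b" has one in the middle
df = pd.DataFrame({"a": [np.nan, 1.0, np.nan, 4.0],
                   "b": [2.0, np.nan, 6.0, 8.0]})

# ffill() returns a new DataFrame; the unassigned call leaves df untouched
df.ffill()
still_missing = int(df.isna().sum().sum())  # all 3 original NaNs are still in df

# Casting to int fails while any NaN remains (pandas raises a ValueError subclass)
try:
    df.ffill().astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# Assigning the result is what actually updates the data
filled = df.ffill()
# The NaN in the first row has no earlier value to copy, so it survives the forward fill
leading_nan = bool(np.isnan(filled.loc[0, "a"]))

# Filling what is left with each column's median mirrors strategy='median'
filled = filled.fillna(filled.median())
print(still_missing, leading_nan, cast_failed, int(filled.isna().sum().sum()))  # prints: 3 True True 0
```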
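A median imputer working with `axis=0` computes one statistic per column. Reshaping a 1-D target with `reshape(1, -1)` turns it into a single row, so each sample becomes its own column and a missing label has nothing to take a median over; `reshape(-1, 1)` gives the intended single column. This sketch shows the difference using NumPy's `np.nanmedian` as a stand-in for the imputer (a simplification, not scikit-learn's implementation) on hypothetical label values.

```python
import warnings

import numpy as np

# Hypothetical target with one missing label
y = np.array([0.0, 1.0, np.nan, 1.0, 1.0])

row = y.reshape(1, -1)  # shape (1, 5): five columns holding one value each
col = y.reshape(-1, 1)  # shape (5, 1): one column holding all five values

# Column-wise medians ignoring NaN, i.e. what a median imputer computes per column
with warnings.catch_warnings():
    # nanmedian warns about the all-NaN column that the row layout produces
    warnings.simplefilter("ignore", RuntimeWarning)
    row_medians = np.nanmedian(row, axis=0)
col_medians = np.nanmedian(col, axis=0)

# Replace NaNs with their column's median
row_filled = np.where(np.isnan(row), row_medians, row)
col_filled = np.where(np.isnan(col), col_medians, col).ravel()

print(row_medians)  # the missing label's column has no median, so it stays NaN
print(col_filled)   # the missing label becomes 1.0, the median of 0, 1, 1, 1
```

In the row layout the missing label cannot be filled at all, and depending on the scikit-learn version an all-missing column may simply be dropped by the imputer, so the output can end up shorter than the input.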
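Slicing `array_imp` by position makes it easy to pick the wrong column: in the list passed to `reindex`, index 4 is `int_rate` (a continuous value) and index 7 is `loan_status_clean`. Assuming the goal is to predict `loan_status_clean` with a classifier scored by accuracy, a less error-prone option is to select features and target by name before converting to arrays. The three rows below are made-up values used only to illustrate the selection.

```python
import pandas as pd

# Same column order as the reindex() call; the values are hypothetical
columns = ['term_clean', 'grade_clean', 'annual_inc', 'loan_amnt',
           'int_rate', 'purpose_clean', 'installment', 'loan_status_clean']
df = pd.DataFrame([[36, 2, 55000.0, 10000.0, 13.5, 1, 339.3, 1],
                   [60, 4, 42000.0, 15000.0, 18.2, 3, 382.8, 0],
                   [36, 1, 80000.0, 5000.0, 7.9, 2, 156.4, 1]],
                  columns=columns)

# Position-based slicing: column 4 is int_rate, not the loan status
print(columns[4], columns[7])

# Name-based selection: every other column as features, the label column as target
target = 'loan_status_clean'
X = df.drop(columns=target).values
Y = df[target].values
```

Here `Y` holds discrete class labels, whereas a continuous column such as `int_rate` is generally rejected by scikit-learn classifiers as a target.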