Asked by SDS*_*SDS · score: 1 · tags: python, python-3.x, scikit-learn
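I have 2 lists, features and labels. features contains disease, age, gender and PIN; labels contains the health plan. The user supplies user_input in the same format as features, and the code should predict that user's health plan with the DecisionTree classifier from the sklearn API.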
Some of the values in features are strings, for example disease and gender, so I am encoding them with LabelEncoder to avoid the error 'ValueError: could not convert string to float'.
Now, after using LabelEncoder, I get the following exception: 'ValueError: bad input shape'.
How do I fix this, and how do I reverse the encoding afterwards while still avoiding the string-to-float error? Please help.
from sklearn import tree
from sklearn.preprocessing import LabelEncoder
features = [['TB', 28, 'MALE', 121001], ['TB', 28, 'FEMALE', 121002], ['CANCER', 28, 'MALE', 121001], ['CANCER', 28, 'FEMALE', 121001]]
labels = ['X125434', 'X125436', 'X125437', 'X125437']
user_input = ['TB', 28, 'MALE', 121001]
le = LabelEncoder()
# This line raises "ValueError: bad input shape": LabelEncoder expects 1-D input,
# but features is a 2-D list (newer scikit-learn versions word the message as
# "y should be a 1d array, got an array of shape (4, 4) instead.")
X = le.fit_transform(features)
y = le.fit_transform(labels)
new_user_input = le.fit_transform(user_input)
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, y)
print(clf.predict([new_user_input]))
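The following is a minimal sketch of where the 'bad input shape' error comes from, using the question's own data. LabelEncoder only accepts a 1-D array of values, so it rejects the 2-D features list, while a single column encodes without problems. The exact wording of the error depends on the scikit-learn version, so the sketch only records whether a ValueError was raised.

```python
from sklearn.preprocessing import LabelEncoder

features = [['TB', 28, 'MALE', 121001], ['TB', 28, 'FEMALE', 121002],
            ['CANCER', 28, 'MALE', 121001], ['CANCER', 28, 'FEMALE', 121001]]

le = LabelEncoder()

# Passing the whole 2-D feature list fails: LabelEncoder expects 1-D input.
try:
    le.fit_transform(features)
    shape_error = None
except ValueError as exc:
    shape_error = str(exc)
print("2-D input rejected:", shape_error)

# Encoding a single column (here the disease column) works.
disease_column = [row[0] for row in features]
disease_codes = le.fit_transform(disease_column)
print(le.classes_.tolist())   # ['CANCER', 'TB'] (classes are sorted)
print(disease_codes.tolist()) # [1, 1, 0, 0]
```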
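Answer: It is not recommended to use the same label encoder for every feature in the dataset. LabelEncoder works on a single 1-D array, which is why passing the whole 2-D features list fails with 'bad input shape'. In addition, each fit_transform call refits the encoder and discards the mapping learned for the previous column. It is safer to create one label encoder per column, since each feature has its own set of values.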
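The sketch below illustrates why a single shared encoder is risky, using only the disease and gender columns from the question. Every fit_transform call refits the encoder, so after it encodes gender it no longer holds the disease mapping. Separate encoders each keep their own classes_ and can still be inverted.

```python
from sklearn.preprocessing import LabelEncoder

diseases = ['TB', 'TB', 'CANCER', 'CANCER']
genders = ['MALE', 'FEMALE', 'MALE', 'FEMALE']

# One shared encoder: the second fit_transform replaces the first mapping.
shared = LabelEncoder()
disease_codes = shared.fit_transform(diseases)
shared.fit_transform(genders)
print(shared.classes_.tolist())                          # ['FEMALE', 'MALE']
print(shared.inverse_transform(disease_codes).tolist())  # ['MALE', 'MALE', 'FEMALE', 'FEMALE']

# One encoder per column: each mapping survives and can be inverted.
le_disease = LabelEncoder().fit(diseases)
le_gender = LabelEncoder().fit(genders)
print(le_disease.inverse_transform(le_disease.transform(diseases)).tolist())
```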
from sklearn import tree
from sklearn.preprocessing import LabelEncoder
import pandas as pd
features = [['TB', 28, 'MALE', 121001], ['TB', 28, 'FEMALE', 121002], ['CANCER', 28, 'MALE', 121001], ['CANCER', 28, 'FEMALE', 121001]]
labels = ['X125434', 'X125436', 'X125437', 'X125437']
feature_names = ['Disease', 'Age', 'Gender', 'PIN']
user_input = ['TB', 28, 'MALE', 121001]
# Training data, with the health plan in its own column
train = pd.DataFrame(data=features, columns=feature_names)
train['Labels'] = labels
# One-row DataFrame holding the user's input, with the same columns
test = pd.DataFrame(data=[user_input], columns=feature_names)
# A separate LabelEncoder for each string column and for the labels
le_disease = LabelEncoder()
le_gender = LabelEncoder()
le_labels = LabelEncoder()
# Fit the encoders on the training data ...
train['Disease'] = le_disease.fit_transform(train['Disease'])
train['Gender'] = le_gender.fit_transform(train['Gender'])
train['Labels'] = le_labels.fit_transform(train['Labels'])
# ... then only transform (do not refit) the user input with the same encoders
test['Disease'] = le_disease.transform(test['Disease'])
test['Gender'] = le_gender.transform(test['Gender'])
clf = tree.DecisionTreeClassifier()
clf = clf.fit(train[feature_names], train['Labels'])
# Map the predicted integer label back to the original health-plan string
print(le_labels.inverse_transform(clf.predict(test[feature_names])))
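The answer does not use this, but scikit-learn's OrdinalEncoder (available since version 0.20) is an alternative. It encodes several feature columns at once from 2-D input and keeps one category list per column, while LabelEncoder stays on the 1-D target. The sketch below assumes that only Disease and Gender need encoding and that Age and PIN pass through as numbers. STRING_COLS, NUMERIC_COLS and split are illustrative names introduced here.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

features = [['TB', 28, 'MALE', 121001], ['TB', 28, 'FEMALE', 121002],
            ['CANCER', 28, 'MALE', 121001], ['CANCER', 28, 'FEMALE', 121001]]
labels = ['X125434', 'X125436', 'X125437', 'X125437']
user_input = ['TB', 28, 'MALE', 121001]

# Illustrative column split (assumption): Disease and Gender are strings,
# Age and PIN are already numeric.
STRING_COLS = [0, 2]
NUMERIC_COLS = [1, 3]

def split(rows):
    """Split rows into a string sub-matrix and a numeric sub-matrix."""
    strings = [[row[i] for i in STRING_COLS] for row in rows]
    numbers = [[row[i] for i in NUMERIC_COLS] for row in rows]
    return strings, numbers

# OrdinalEncoder accepts 2-D input and learns one category list per column.
ord_enc = OrdinalEncoder()
train_strings, train_numbers = split(features)
X_train = np.hstack([ord_enc.fit_transform(train_strings),
                     np.array(train_numbers, dtype=float)])

# LabelEncoder is kept for the 1-D target only.
le_labels = LabelEncoder()
y_train = le_labels.fit_transform(labels)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Transform (do not refit) the user input with the already-fitted encoder.
user_strings, user_numbers = split([user_input])
X_user = np.hstack([ord_enc.transform(user_strings),
                    np.array(user_numbers, dtype=float)])
prediction = le_labels.inverse_transform(clf.predict(X_user))
print(prediction)
```

OrdinalEncoder also provides inverse_transform, so you can decode the encoded feature columns the same way as the labels.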
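LabelEncoder.inverse_transform() can be used to get the original values back. The last line of the code above uses it to turn the predicted integer label back into the health-plan string.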
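Here is a minimal round trip with a label encoder. fit_transform maps each health plan to an integer index into the sorted classes_, and inverse_transform maps indices back. The predicted_codes list is a hypothetical stand-in for clf.predict output. The sketch also shows that transform raises a ValueError for a value never seen during fit, which matters if a user enters a disease or gender that is missing from the training data.

```python
from sklearn.preprocessing import LabelEncoder

labels = ['X125434', 'X125436', 'X125437', 'X125437']

le_labels = LabelEncoder()
encoded = le_labels.fit_transform(labels)
print(le_labels.classes_.tolist())  # ['X125434', 'X125436', 'X125437']
print(encoded.tolist())             # [0, 1, 2, 2]

# Hypothetical stand-in for clf.predict(...) output: integer codes to decode.
predicted_codes = [2, 0]
print(le_labels.inverse_transform(predicted_codes).tolist())  # ['X125437', 'X125434']

# Values not seen during fit cannot be transformed.
try:
    le_labels.transform(['X999999'])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True
print("unseen value rejected:", unseen_rejected)
```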