I am writing a very simple script. All I want to do is read data with pandas and then train a decision tree on that data. The data I am using is:
https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data
Here is my script:
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn import tree
from sklearn.metrics import accuracy_score
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the car data: comma-separated, no header row
balance_data = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
    sep=",", header=None)

# print("Dataset:: ")
# balance_data.head()

X = balance_data.values[:, 0:5]  # attribute columns 0-4 (the slice stops before column 5)
Y = balance_data.values[:, 6]    # class label

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=100)

clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                  max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)  # the ValueError is raised on this line
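For reference, `DecisionTreeClassifier.fit` expects numeric feature arrays, so string categories such as "med" have to be encoded before fitting. The following is a minimal sketch of one possible approach, not necessarily the fix the asker ended up using. It assumes a handful of made-up rows in the same seven-column layout as car.data (not the real file), uses `OrdinalEncoder` as one encoding option among several, and also checks for leading spaces in the values. The names `SAMPLE`, `data`, and `has_leading_space` are hypothetical.

```python
import io

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Made-up rows in the car.data layout (six attributes + class); NOT the real file.
SAMPLE = """\
vhigh,vhigh,2,2,small,low,unacc
vhigh,vhigh,2,2,small,med,unacc
low,low,4,4,big,high,vgood
med,low,4,more,med,high,good
high,med,3,4,big,med,acc
low,med,5more,more,big,high,vgood
med,med,2,4,small,med,acc
high,high,4,2,med,low,unacc
"""

# Read every column as a string so that all of them are treated as categories.
data = pd.read_csv(io.StringIO(SAMPLE), header=None, dtype=str)

# Check whether any value starts with a space.
has_leading_space = bool(data.apply(lambda col: col.str.startswith(" ")).any().any())
print("leading spaces found:", has_leading_space)

# Columns 0-5 are the attributes, column 6 is the class label.
X_raw = data.iloc[:, 0:6]
y = data.iloc[:, 6]

# Map each category to a number, column by column.
encoder = OrdinalEncoder()
X = encoder.fit_transform(X_raw)

# String class labels are fine for the target; only the features must be numeric.
clf = DecisionTreeClassifier(criterion="gini", random_state=100, max_depth=3)
clf.fit(X, y)
print(clf.predict(X))
```

`OrdinalEncoder` assigns codes in sorted category order, which does not reflect the natural ranking of values like "low", "med", "high"; one-hot encoding (for example `pd.get_dummies`) is an alternative when that matters.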
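From the error, I guess it cannot convert the "med" attribute value to float. Looking at the data, my wild guess is that there is a space before some value while "med" has none, and that this is what confuses it. But I am not sure. Please tell me what might be wrong. PS: the error occurs on the last line; here is the traceback: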
ValueError Traceback (most recent call last) …

I am fitting a decision tree on the following dataset:
https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data
Here is my code:
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

balance_data = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
    sep=",", header=None)

# Encode every column (attributes and class) as integer codes; the encoder is refit per column
le = preprocessing.LabelEncoder()
balance_data = balance_data.apply(le.fit_transform)

X = balance_data.values[:, 0:5]  # attribute columns 0-4 (the slice stops before column 5)
Y = balance_data.values[:, 6]    # class label
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=100)

# using Gini index
clf_gini = DecisionTreeClassifier(criterion="gini", random_state=100,
                                  max_depth=3, min_samples_leaf=5)
clf_gini.fit(X_train, y_train)

# using Information Gain
clf_entropy = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                     max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)

# Gini prediction
y_pred = clf_gini.predict(X_test)
print(y_pred)

# IG prediction
y_pred_en = clf_entropy.predict(X_test)
print(y_pred_en)
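The two sets of predictions above can be compared against the held-out labels with `accuracy_score`, which the first script imports but never calls. A minimal sketch follows, assuming synthetic integer-coded data in place of the encoded car data (the features, the labeling rule, and the `scores` dictionary are all made up for illustration); the tree settings mirror the ones in the code above.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for label-encoded data: 200 rows, 6 integer-coded attributes.
rng = np.random.default_rng(100)
X = rng.integers(0, 4, size=(200, 6))
# Made-up labeling rule so that the trees have something to learn.
y = (X[:, 3] + X[:, 5] >= 4).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=100)

# Fit one tree per split criterion and score each on the held-out rows.
scores = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=100,
                                 max_depth=3, min_samples_leaf=5)
    clf.fit(X_train, y_train)
    scores[criterion] = accuracy_score(y_test, clf.predict(X_test))

for criterion, score in scores.items():
    print(f"{criterion}: {score:.3f}")
```

With the real data, the same loop would run on the `X_train`, `X_test`, `y_train`, and `y_test` arrays produced by the script above.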
In the Gini index …