Posts by Han*_*nna

ValueError: could not convert string to float: med

I am writing a very simple script. All I want to do is read the data with pandas and then train a decision tree on it. The data I am using is:

https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data

Here is my script:

import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
from sklearn import preprocessing
import pandas as pd
balance_data=pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
                           sep= ',', header= None)
#print "Dataset:: "

#df1.head()

X = balance_data.values[:, 0:5]
Y = balance_data.values[:,6]
X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.2, random_state = 100)
clf_gini = DecisionTreeClassifier(criterion = "gini", random_state = 100,
                               max_depth=3, min_samples_leaf=5)

clf_gini.fit(X_train, y_train)

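From the error, I am guessing that it cannot convert the attribute value "med" to a float. Looking at the data, my wild guess is that some values have a leading space and med does not, and that this is what confuses it. But I am not sure. Please tell me what could be wrong. PS: the error occurs on the last line, and here is the traceback: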

ValueError                                Traceback (most recent call last) …
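For readers hitting the same message: scikit-learn's tree estimators expect numeric feature arrays, so string categories such as "med" generally have to be encoded before calling fit. The following is a minimal sketch of that idea, not the accepted answer to this post. It assumes scikit-learn 0.20 or later (for OrdinalEncoder), and the SAMPLE rows are illustrative data written in the same comma-separated layout as car.data, not a verified excerpt of the file.

```python
import io

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Illustrative rows (an assumption for this sketch) in the car.data layout:
# six categorical attributes followed by the class label.
SAMPLE = """\
vhigh,vhigh,2,2,small,low,unacc
high,med,4,4,med,high,acc
med,low,3,more,big,med,acc
low,med,5more,4,small,high,good
low,low,4,more,big,high,vgood
vhigh,high,2,2,med,low,unacc
"""

# Read every column as a string so all attributes are treated as categories.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, dtype=str)

X_raw = df.iloc[:, :6]  # the six attribute columns, still strings
y = df.iloc[:, 6]       # class label; string targets are accepted by the classifier

# Map each category string to a numeric code, column by column.
encoder = OrdinalEncoder()
X = encoder.fit_transform(X_raw)

clf = DecisionTreeClassifier(criterion="gini", random_state=100)
clf.fit(X, y)  # the features are numeric now, so no string-to-float conversion is attempted
```

OrdinalEncoder assigns codes in sorted category order, which imposes an arbitrary ordering on the categories; one-hot encoding is a common alternative when that ordering is a concern.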

python scikit-learn

1 vote, 1 answer, 8071 views

Decision tree predicts only one class

I am fitting a decision tree on the following dataset:

https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data

Here is my code:

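import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

balance_data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
                           sep=',', header=None)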

le = preprocessing.LabelEncoder()
balance_data = balance_data.apply(le.fit_transform)
X = balance_data.values[:, 0:5]
Y = balance_data.values[:,6]
X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.2, random_state = 100)

#using Gini index
clf_gini = DecisionTreeClassifier(criterion = "gini", random_state = 100,
                               max_depth=3, min_samples_leaf=5)

clf_gini.fit(X_train, y_train)

#using Information Gain
clf_entropy = DecisionTreeClassifier(criterion = "entropy", random_state = 100,
                               max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)


#Gini prediction
y_pred = clf_gini.predict(X_test)
y_pred

#IG prediction
y_pred_en = clf_entropy.predict(X_test)
y_pred_en

With the Gini index …
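When a classifier seems to return a single class, two quick checks are often useful: how the labels are distributed in the training set and in the predictions, and which columns the feature slice actually selects. The sketch below is a hedged illustration only. The helper class_counts and the demo arrays are hypothetical stand-ins for y_train and y_pred, and the source does not confirm that either check explains the behaviour in this post.

```python
import numpy as np


def class_counts(labels):
    """Return a {label: count} dict for an array of labels (hypothetical helper)."""
    values, counts = np.unique(labels, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))


# Hypothetical encoded labels standing in for y_train and y_pred.
y_train_demo = np.array([2, 2, 2, 2, 2, 2, 2, 0, 0, 1])
y_pred_demo = np.array([2, 2, 2, 2])

train_counts = class_counts(y_train_demo)
pred_counts = class_counts(y_pred_demo)
print(train_counts)  # {0: 2, 1: 1, 2: 7}
print(pred_counts)   # {2: 4}

# Slice check: 0:5 selects column indices 0..4 (five columns), while 0:6
# selects 0..5 (six columns). With the label in column 6, the slice 0:5
# leaves column 5 out of the features.
column_ids = np.arange(7)
five_cols = column_ids[0:5].tolist()
six_cols = column_ids[0:6].tolist()
print(five_cols)  # [0, 1, 2, 3, 4]
print(six_cols)   # [0, 1, 2, 3, 4, 5]
```

If the training labels are heavily skewed toward one class, a shallow tree (here max_depth=3 with min_samples_leaf=5) may end up predicting mostly that class, so comparing the two count dictionaries is a reasonable first diagnostic.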

python machine-learning scikit-learn

0 votes, 1 answer, 2062 views

Tag statistics

python ×2

scikit-learn ×2

machine-learning ×1