Tho*_*ott 4 python csv machine-learning scikit-learn
I am following this tutorial to write a Naive Bayes classifier: http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/
I keep getting this error:
dataset[i] = [float(x) for x in dataset[i]]
ValueError: could not convert string to float:
This is the part of my code where the error occurs:
def loadDatasetNB(filename):
    lines = csv.reader(open(filename, "rt"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset
This is how the file is loaded:
def NB_Analysis():
    filename = 'fvectors.csv'
    splitRatio = 0.67
    dataset = loadDatasetNB(filename)
    trainingSet, testSet = splitDatasetNB(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClassNB(trainingSet)
    # test model
    predictions = getPredictionsNB(summaries, testSet)
    accuracy = getAccuracyNB(testSet, predictions)
    print('Accuracy: {0}%'.format(accuracy))

NB_Analysis()
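What is going wrong here, and how can I fix it?
Try skipping the header row. The empty header cell in the first column is what causes the problem: float() cannot convert an empty or blank string.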
>>> float(' ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float:
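To confirm which cells are responsible before changing the loader, a small diagnostic can list every value that float() rejects. This is only a sketch: `find_bad_cells` is a hypothetical helper, and the inline sample is an assumed stand-in for fvectors.csv (a header row whose first cell is empty), not the asker's real data.

```python
import csv
import io


def find_bad_cells(rows):
    """Return (row_index, column_index, value) for every cell float() rejects."""
    bad = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            try:
                float(cell)
            except ValueError:
                bad.append((r, c, cell))
    return bad


# Assumed sample standing in for fvectors.csv: header row with an empty first cell.
sample = io.StringIO(",f1,f2\n0,1.5,2.5\n1,3.0,4.0\n")
bad_cells = find_bad_cells(csv.reader(sample))
print(bad_cells)
```

If every reported cell sits in row 0, the header is the only culprit and skipping it is enough.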
(1) If you want to skip the header, you can do it like this:
import csv

def loadDatasetNB(filename):
    lines = csv.reader(open(filename, "rt"))
    next(lines, None)  # skip the header row
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset
(2) Or you can ignore the exception:
try:
    float(element)
except ValueError:
    pass
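If you go with option (2), make sure that only the first row, or only rows that contain text, get skipped, and that you know this for certain. Otherwise malformed data rows will be dropped silently.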
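One cautious way to apply option (2) is to catch the exception per row and keep a record of what was skipped, so the result can be checked afterwards. The sketch below assumes a hypothetical helper `load_numeric_rows` that takes an open file object; the inline sample is made-up data, not the asker's file.

```python
import csv
import io


def load_numeric_rows(csv_file):
    """Convert each row to floats; collect failing rows instead of dropping them silently."""
    dataset, skipped = [], []
    for line_number, row in enumerate(csv.reader(csv_file), start=1):
        if not row:
            continue  # blank line in the file, nothing to convert
        try:
            dataset.append([float(x) for x in row])
        except ValueError:
            skipped.append((line_number, row))
    return dataset, skipped


# Assumed sample: header row with an empty first cell, then two numeric rows.
sample = io.StringIO(",f1,f2\n0,1.5,2.5\n1,3.0,4.0\n")
dataset, skipped = load_numeric_rows(sample)
print(dataset)
print(skipped)
```

Inspecting `skipped` after loading shows whether only the header was dropped; anything beyond that points to bad data that should be fixed at the source.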