"ValueError: could not convert string to float" error in scikit-learn

Question

I am running the following script:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
dataset = pd.read_csv('data/50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
onehotencoder = OneHotEncoder(categorical_features=3,
                              handle_unknown='ignore')
onehotencoder.fit(X)


The head of the data looks like this (image: data)

I get this:

ValueError: could not convert string to float: 'New York'
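For reference, the same message can be reproduced with plain NumPy. This is only a minimal sketch, assuming that the encoder tries to convert the whole array to a numeric dtype when categorical_features is set; the row below is the first sample row without its last column.

```python
import numpy as np

# First sample row (features only), stored as a mixed-type object array.
row = np.array([165349.2, 136897.8, 471784.1, "New York"], dtype=object)

message = None
try:
    # Forcing a numeric dtype fails as soon as NumPy reaches the string.
    row.astype(float)
except ValueError as exc:
    message = str(exc)

print(message)
```

The space in 'New York' plays no role here: any non-numeric string in the array would fail the same way.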

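I read the answers to similar questions and then opened the scikit-learn documentation, but as you can see there, the scikit-learn authors have no issues with spaces in strings.

I know that I can use LabelEncoder from sklearn.preprocessing and then OHE, and that works well, but in that case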

In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.
warnings.warn(msg, FutureWarning)

message appears.

You can use the full csv file or

[[165349.2, 136897.8, 471784.1, 'New York', 192261.83],
[162597.7, 151377.59, 443898.53, 'California', 191792.06],
[153441.51, 101145.55, 407934.54, 'Florida', 191050.39],
[144372.41, 118671.85, 383199.62, 'New York', 182901.99],
[142107.34, 91391.77, 366168.42, 'Florida', 166187.94]]

the first 5 rows to test this code.
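To try the script without the CSV file, the five rows can be loaded into a DataFrame directly. A minimal sketch; the column names are assumptions added for illustration, since the question only shows the values.

```python
import pandas as pd

# Column names are assumed for illustration; only the values come from the question.
columns = ["R&D Spend", "Administration", "Marketing Spend", "State", "Profit"]
rows = [
    [165349.2, 136897.8, 471784.1, "New York", 192261.83],
    [162597.7, 151377.59, 443898.53, "California", 191792.06],
    [153441.51, 101145.55, 407934.54, "Florida", 191050.39],
    [144372.41, 118671.85, 383199.62, "New York", 182901.99],
    [142107.34, 91391.77, 366168.42, "Florida", 166187.94],
]
dataset = pd.DataFrame(rows, columns=columns)

# Same slicing as in the script: every column but the last is a feature.
X = dataset.iloc[:, :-1].values  # mixed-type array, string column at index 3
y = dataset.iloc[:, 4].values    # target values from the last column

print(X.shape, y.shape)
```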

Answer 1

DYZ*_*DYZ 4

It is categorical_features=3 that hurts you. You cannot use categorical_features with string data. Remove this option, and luck will be with you. Also, you probably need fit_transform rather than fit.

onehotencoder = OneHotEncoder(handle_unknown='ignore')
# Encode only the string column (index 3); X is the 5-row sample with all five columns
transformed = onehotencoder.fit_transform(X[:, [3]]).toarray()
# Keep the three numeric columns, then the one-hot columns, then the last column
X1 = np.concatenate([X[:, :3], transformed, X[:, 4:]], axis=1)
#array([[165349.2, 136897.8, 471784.1, 0.0, 0.0, 1.0, 192261.83],
#       [162597.7, 151377.59, 443898.53, 1.0, 0.0, 0.0, 191792.06],
#       [153441.51, 101145.55, 407934.54, 0.0, 1.0, 0.0, 191050.39],
#       [144372.41, 118671.85, 383199.62, 0.0, 0.0, 1.0, 182901.99],
#       [142107.34, 91391.77, 366168.42, 0.0, 1.0, 0.0, 166187.94]])

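The same idea can also be written with ColumnTransformer from sklearn.compose, which applies the encoder to one column and passes the rest through, so no manual concatenation is needed. This is a sketch of an alternative, not the method used above; it assumes a scikit-learn version that ships sklearn.compose, and the transformer name "state" is made up for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Feature columns of the five sample rows; the string column is at index 3.
X = np.array([
    [165349.2, 136897.8, 471784.1, "New York"],
    [162597.7, 151377.59, 443898.53, "California"],
    [153441.51, 101145.55, 407934.54, "Florida"],
    [144372.41, 118671.85, 383199.62, "New York"],
    [142107.34, 91391.77, 366168.42, "Florida"],
], dtype=object)

ct = ColumnTransformer(
    # "state" is an arbitrary label chosen for this sketch.
    transformers=[("state", OneHotEncoder(handle_unknown="ignore"), [3])],
    remainder="passthrough",  # keep the numeric columns unchanged
    sparse_threshold=0,       # always return a dense array
)

# The one-hot columns come first, followed by the passed-through columns.
X_encoded = ct.fit_transform(X)
print(X_encoded.shape)
```

Note that the column order differs from the concatenation approach: here the encoded columns are placed before the numeric ones.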

Archived:	6 years, 11 months ago
Views:	23049
Last activity:	6 years, 10 months ago