SciKit-Learn Label Encoder resulting in error "argument must be a string or number"

Question

mik*_*wry 9 python machine-learning feature-selection scikit-learn one-hot-encoding

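I'm a bit confused here; I'm building an ML model.

I'm at the step where I take the categorical features from a "large" DataFrame (180 columns) and one-hot encode them, so that I can find correlations between the features and select the "best" ones.

Here is my code: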

# import labelencoder
from sklearn.preprocessing import LabelEncoder

# instantiate labelencoder object
le = LabelEncoder()

# apply le on categorical feature columns
df = df.apply(lambda col: le.fit_transform(col))
df.head(10)

When I run it, I get the following error:

TypeError: ('argument must be a string or number', 'occurred at index LockTenor')

So I went to the LockTenor field and looked at all of its distinct values:

df.LockTenor.unique()

The result is:

array([60.0, 45.0, 'z', 90.0, 75.0, 30.0], dtype=object)

To me that looks like all strings and numbers. Is the error caused by the values being floats rather than ints?

Answer 1

Art*_*uro 13

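You get this error because the column really does contain a mix of floats and strings. Floats on their own are fine; the problem is the mix. LabelEncoder has to sort the values to find the distinct classes, and Python cannot order a float against a str. Take a look at this example: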

# Preliminaries
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Create DataFrames

# df1 has all floats
d1 = {'LockTenor':[60.0, 45.0, 15.0, 90.0, 75.0, 30.0]}
df1 = pd.DataFrame(data=d1)
print("DataFrame 1")
print(df1)

# df2 has a string in the mix
d2 = {'LockTenor':[60.0, 45.0, 'z', 90.0, 75.0, 30.0]}
df2 = pd.DataFrame(data=d2)
print("DataFrame 2")
print(df2)

# Create encoder
le = LabelEncoder()

# Encode DataFrame 1 (all values are floats)
df1 = df1.apply(lambda col: le.fit_transform(col), axis=0, result_type='expand')
print("DataFrame 1 encoded")
print(df1)

# Encode DataFrame 2 (mix of floats and strings); this call raises the TypeError
df2 = df2.apply(lambda col: le.fit_transform(col), axis=0, result_type='expand')
print("DataFrame 2 encoded")
print(df2)

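If you run this code, you will see that df1 is encoded without any problem, because all of its values are floats. For df2, however, you get the error you reported.

A simple fix is to cast the column to strings. You can do that inside the lambda function: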

df2 = df2.apply(lambda col: le.fit_transform(col.astype(str)), axis=0, result_type='expand')

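As an additional suggestion, I recommend checking whether your data is actually correct. A mix of floats and strings in the same column looks a bit odd to me.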
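One way to do that check, as a rough sketch rather than a definitive recipe: list the Python types found in each column and pull out the rows that are not numeric. The sample frame, the `Rate` column and the helper name `mixed_type_columns` are made up for illustration; only `LockTenor` and its values come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the LockTenor values come from the question
df = pd.DataFrame({
    "LockTenor": [60.0, 45.0, "z", 90.0, 75.0, 30.0],
    "Rate": [3.5, 3.7, 3.6, 3.9, 3.8, 3.4],
})

def mixed_type_columns(frame):
    """Map each column that holds more than one Python type to the set of type names."""
    report = {}
    for name in frame.columns:
        kinds = set(frame[name].map(lambda v: type(v).__name__))
        if len(kinds) > 1:
            report[name] = kinds
    return report

mixed = mixed_type_columns(df)
print(mixed)

# Rows whose LockTenor cannot be parsed as a number
# (missing values would also show up here, since they become NaN as well)
bad_rows = df[pd.to_numeric(df["LockTenor"], errors="coerce").isna()]
print(bad_rows)
```

On a 180-column frame this should narrow things down to the handful of columns that need cleaning before any encoding is attempted.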

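Finally, I just want to point out that scikit-learn's LabelEncoder performs a simple integer encoding of a variable; it does not perform one-hot encoding. If you want that, I suggest you take a look at OneHotEncoder.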
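To make the difference concrete, here is a small sketch that runs both encoders on the same column after casting it to strings. The sample frame is an assumption based on the values shown in the question, and the default OneHotEncoder settings are used, so the result is converted from a sparse matrix with `toarray()`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Sample column taken from the values shown in the question
df = pd.DataFrame({"LockTenor": [60.0, 45.0, "z", 90.0, 75.0, 30.0]})
col = df["LockTenor"].astype(str)  # cast first so all values share one type

# LabelEncoder: one integer code per row
labels = LabelEncoder().fit_transform(col)
print(labels.shape)

# OneHotEncoder: one 0/1 column per distinct category; it expects 2-D input
ohe = OneHotEncoder()
onehot = ohe.fit_transform(col.to_numpy().reshape(-1, 1)).toarray()
print(onehot.shape)
print(ohe.categories_[0])
```

The integer codes imply an ordering between categories that usually is not meaningful, which is one reason one-hot encoding tends to be preferred for input features.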

Answer 2

小智 5

Try this:

df[cat] = le.fit_transform(df[cat].astype(str))
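A possible extension of that line, offered as a sketch: loop over the categorical columns and keep one fitted encoder per column. A single shared `le` only remembers the classes of the last column it was fitted on, so the codes of the earlier columns could not be mapped back. The `Channel` column and the `cat_columns` list are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data; only the LockTenor values come from the question
df = pd.DataFrame({
    "LockTenor": [60.0, 45.0, "z", 90.0, 75.0, 30.0],
    "Channel": ["web", "branch", "web", "phone", "branch", "web"],
})
cat_columns = ["LockTenor", "Channel"]  # assumed list of categorical columns

encoders = {}          # column name -> fitted LabelEncoder
encoded = df.copy()
for cat in cat_columns:
    le = LabelEncoder()
    encoded[cat] = le.fit_transform(df[cat].astype(str))
    encoders[cat] = le

print(encoded)

# Each stored encoder can translate its own codes back to the original labels
restored = encoders["Channel"].inverse_transform(encoded["Channel"])
print(list(restored))
```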
