Peg*_*s18 5 python machine-learning scikit-learn
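I'm trying to use the 1988 UCI Breast Cancer dataset (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer) to solve a classification machine learning problem. I keep getting the error below, but not consistently. Sometimes the code runs straight through to training the model and predicting test accuracy, and sometimes it fails at the OneHotEncoder step with the following error: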
ohe = OneHotEncoder()
ohe.fit(X_train)
X_train_encoded = ohe.transform(X_train)
X_test_encoded = ohe.transform(X_test)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-2cfd638a5b4d> in <module>()
2 ohe.fit(X_train)
3 X_train_encoded = ohe.transform(X_train)
----> 4 X_test_encoded = ohe.transform(X_test)
1 frames
/usr/local/lib/python3.6/dist-packages/sklearn/preprocessing/_encoders.py in _transform(self, X, handle_unknown)
122 msg = ("Found unknown categories {0} in column {1}"
123 " during transform".format(diff, i))
--> 124 raise ValueError(msg)
125 else:
126 # Set the problematic rows to an acceptable value and
ValueError: Found unknown categories ['?'] in column 7 during transform
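I tried running it in both Colab and Spyder and hit the same problem, and I can't tell what is going wrong. I impute missing values before splitting the dataset and encoding it, but I still get the error even if I remove the SimpleImputer.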
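import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
dataset = pd.read_csv('breast-cancer.csv')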
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
imputer.fit(X)
X_imputed = imputer.transform(X)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_imputed, y, test_size = 0.25)
ohe = OneHotEncoder()
ohe.fit(X_train)
X_train_encoded = ohe.transform(X_train)
X_test_encoded = ohe.transform(X_test)
# <-- Code stops running here -->
le = LabelEncoder()
le.fit(y_train)
y_train_encoded = le.transform(y_train)
y_test_encoded = le.transform(y_test)
小智 10
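The test data probably contains entries that do not appear in the training data.
Can you try this?
ohe = OneHotEncoder(handle_unknown="ignore")
About this parameter: it controls whether to raise an error or ignore the value when an unknown categorical feature is present during transform (the default is to raise). When this parameter is set to 'ignore' and an unknown category is encountered during transform, the one-hot encoded columns for that feature will be all zeros.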
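To make the all-zeros behaviour concrete, here is a minimal sketch on toy data. The column values below are made up for illustration and are not taken from the UCI file; only `OneHotEncoder` and its `handle_unknown` parameter come from the discussion above.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical): a single categorical column.
X_train = np.array([["left"], ["right"], ["left"]], dtype=object)
# "?" never appears in the training data.
X_test = np.array([["right"], ["?"]], dtype=object)

# With handle_unknown="ignore", unseen categories do not raise at transform time.
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(X_train)

# transform returns a sparse matrix by default; densify it for inspection.
encoded = ohe.transform(X_test).toarray()
print(ohe.categories_)  # categories learned from X_train only
print(encoded)          # the row for "?" is all zeros
```

Note that this silences the error rather than handling the `?` value itself: the affected row simply carries no information for that feature.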
More here:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
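A possible complementary fix, assuming the `?` in the error message is the dataset's marker for a missing value: the imputer in the question looks for `np.nan`, so a literal `"?"` string would pass through it untouched. If only a few rows contain `?`, a random split will sometimes put all of them in the test set, which would explain why the error appears only on some runs. The sketch below uses a small made-up frame (the column names and values are hypothetical stand-ins for the CSV) and converts `?` to `NaN` before imputing.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for the CSV; "?" marks a missing entry.
df = pd.DataFrame({
    "breast_quad": ["left_low", "?", "left_low", "right_up"],
    "irradiat": ["no", "yes", "no", "no"],
})

# Turn the "?" marker into NaN so the imputer can see it.
# (When reading a file, pd.read_csv(..., na_values="?") has the same effect.)
X = df.replace("?", np.nan).to_numpy(dtype=object)

# Same imputer settings as in the question.
imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```

After this step no `?` category remains, so the encoder sees the same set of categories regardless of how the rows are split.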