I'm trying to run the following code. By the way, I'm new to both Python and sklearn.
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
# data import and preparation
trainData = pd.read_csv('train.csv')
train = trainData.values
testData = pd.read_csv('test.csv')
test = testData.values
X = np.c_[train[:, 0], train[:, 2], train[:, 6:7], train[:, 9]]
X = np.nan_to_num(X)
y = train[:, 1]
Xtest = np.c_[test[:, 0:1], test[:, 5:6], test[:, 8]]
Xtest = np.nan_to_num(Xtest)
# model
lr = LogisticRegression()
lr.fit(X, y)
where y is an np.ndarray of 0s and 1s.
I get the following:
File "C:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py", line 1174, in fit
    check_classification_targets(y)
File "C:\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'unknown'
From the sklearn documentation: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit
y : array-like, shape (n_samples,) Target values (class labels in classification, real numbers in regression)
What is my mistake? …
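One thing that may be worth checking here (a guess based on the code above, not a confirmed diagnosis): if train.csv mixes numeric and text columns, `trainData.values` will probably be an array of dtype `object`, and so will the slice `train[:, 1]`, even when every entry is 0 or 1. The sketch below uses a small made-up array as a stand-in for `trainData.values` (the data and the name `y_int` are hypothetical) to show how to inspect the dtype of the labels and cast them to integers:

```python
import numpy as np

# Hypothetical stand-in for `trainData.values`: a table that mixes
# numeric and string columns is stored with dtype=object.
train = np.array(
    [[1, 0, 3, "a"],
     [2, 1, 1, "b"],
     [3, 1, 3, "c"]],
    dtype=object,
)

# Same slice as in the question: the label column.
y = train[:, 1]
print(y.dtype)  # object, even though every entry is 0 or 1

# Cast the labels to a numeric dtype before passing them to fit().
y_int = y.astype(int)
print(y_int.dtype.kind)  # 'i' (a signed integer dtype)
```

If `y.dtype` does turn out to be `object` on the real data, passing `y.astype(int)` to `lr.fit` instead of `y` would be the first thing to try.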