Related questions and solutions (0)

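Scikit-Learn: Label not x is present in all training examples

I am trying to do multi-label classification with an SVM. I have nearly 8k features and y vectors of length close to 400. My Y vectors are already binarized, so I did not use MultiLabelBinarizer(), but when I do use it on the raw form of my Y data, it still gives the same result.

I am running this code: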

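import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Load the feature matrix and the binarized label matrix.
X = np.genfromtxt('data_X', delimiter=";")
Y = np.genfromtxt('data_y', delimiter=";")

# Use the first 2600 rows for training.
training_X = X[:2600, :]
training_y = Y[:2600, :]

# Hold out a single row to check the prediction.
test_sample = X[2600:2601, :]
test_result = Y[2600:2601, :]

# Fit one RBF-kernel SVM per label.
classif = OneVsRestClassifier(SVC(kernel='rbf'))
classif.fit(training_X, training_y)
print(classif.predict(test_sample))
print(test_result)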


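After the whole fitting process finishes, at the prediction step it says "Label not x is present in all training examples" (where x takes several different values within the range of my y vector length, which is 400). After that it outputs the predicted y vector, which is always a zero vector of length 400 (the vector length). I am new to scikit-learn and to machine learning, and I cannot figure out what the problem is. What is wrong, and what should I do to fix it? Thanks.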
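One way to investigate the warning: as far as I understand, OneVsRestClassifier emits it for a label whose column holds a single value across all training rows, for example a label that never occurs in the first 2600 rows. The helper below is a minimal sketch that lists such columns. The name constant_label_columns and the toy matrix are made up for illustration and do not come from the question.

```python
def constant_label_columns(Y):
    """Return (column_index, value) for every label column that holds a single value.

    Y is any sequence of equal-length rows, e.g. a list of lists or a 2-D NumPy array.
    """
    constant = []
    for index, column in enumerate(zip(*Y)):
        values = set(column)
        if len(values) == 1:
            constant.append((index, values.pop()))
    return constant


# Toy indicator matrix (made up for illustration): 4 samples, 3 labels.
toy_Y = [
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 1],
]
print(constant_label_columns(toy_Y))  # [(1, 0)]: label 1 is never present
```

Calling constant_label_columns(training_y) before fitting should show which label indices are affected. Whether this also explains the all-zero predictions is not something the check alone can confirm.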

python machine-learning scikit-learn

mal*_*sit

lucky-day

8
votes

1
answer

3418
views

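Python sklearn multi-label classification: UserWarning: Label not 226 is present in all training examples

I am working on a multi-label classification problem. My data looks like this: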

DocID   Content             Tags           
1       some text here...   [70]
2       some text here...   [59]
3       some text here...  [183]
4       some text here...  [173]
5       some text here...   [71]
6       some text here...   [98]
7       some text here...  [211]
8       some text here...  [188]
.       .............      .....
.       .............      .....
.       .............      .....


Here is my code:

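import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the training data and show it.
traindf = pd.read_csv("mul.csv")
print("This is what our training data looks like:")
print(traindf)

t = TfidfVectorizer()

# Document text is the input; the tag lists are the targets.
X = traindf["Content"]

y = traindf["Tags"]

print("Original Content")
print(X)
# Convert the raw text into TF-IDF features.
X = t.fit_transform(X)
print("Content After …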

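A possible first diagnostic step, sketched under assumptions the question does not confirm: that the Tags column is read from mul.csv as strings such as "[70]", and that some tags occur in very few rows. A label seen in only a handful of rows can easily be missing from a training split, which is, as far as I know, a situation in which scikit-learn emits this warning. The helpers parse_tags and rare_labels and the sample rows are hypothetical.

```python
import ast
from collections import Counter


def parse_tags(raw):
    """Turn a string such as "[70]" or "[59, 183]" into a list of ints."""
    return [int(tag) for tag in ast.literal_eval(raw)]


def rare_labels(tag_lists, min_count=2):
    """Return labels that occur in fewer than min_count rows, sorted."""
    counts = Counter(tag for tags in tag_lists for tag in set(tags))
    return sorted(label for label, n in counts.items() if n < min_count)


# Hypothetical rows, shaped like the Tags column shown above.
raw_tags = ["[70]", "[59]", "[183]", "[70, 59]", "[226]"]
tag_lists = [parse_tags(raw) for raw in raw_tags]
print(tag_lists)               # [[70], [59], [183], [70, 59], [226]]
print(rare_labels(tag_lists))  # [183, 226]
```

The parsed lists are in the list-of-labels form that MultiLabelBinarizer accepts, and the rare-label list gives a quick view of which tags are at risk of being absent from the training rows.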

python machine-learning scikit-learn logistic-regression multilabel-classification

Abt*_*Pst

2015-12-18

4
votes

1
answer

1710
views