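I am trying to do multi-label classification with an SVM. I have nearly 8k features, and my y vectors have a length close to 400. My Y vectors are already binarized, so I did not use MultiLabelBinarizer(), but when I do use it on the raw form of my Y data, I still get the same result.
I am running this code: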
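import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Load the feature matrix and the already-binarized label matrix
X = np.genfromtxt('data_X', delimiter=";")
Y = np.genfromtxt('data_y', delimiter=";")
# Train on the first 2600 rows
training_X = X[:2600,:]
training_y = Y[:2600,:]
# Hold out a single row as the test sample
test_sample = X[2600:2601,:]
test_result = Y[2600:2601,:]
# One binary RBF-kernel SVC per label column
classif = OneVsRestClassifier(SVC(kernel='rbf'))
classif.fit(training_X, training_y)
print(classif.predict(test_sample))
print(test_result)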
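After the fitting finishes and it reaches the prediction part, it says Label not x is present in all training examples (x is one of several different numbers within the range of my y vector length, which is 400). After that it prints the predicted y vector, which is always a zero vector of length 400. I am new to scikit-learn and to machine learning, and I cannot figure out the problem here. What is wrong, and what should I do to fix it? Thanks.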
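One way to narrow this down is to check which label columns never occur in the training slice. As far as I know, OneVsRestClassifier emits "Label not x is present in all training examples" when column x of the training labels is all zeros, and it then falls back to a constant prediction for that label. The sketch below uses a small made-up matrix in place of training_y; the helper name constant_label_columns is hypothetical and not part of scikit-learn.

```python
import numpy as np

def constant_label_columns(Y):
    """Return (never_present, always_present) column indices of a 0/1 label matrix.

    Hypothetical helper for diagnosis only; it is not a scikit-learn function.
    """
    Y = np.asarray(Y)
    positives = Y.sum(axis=0)  # number of positive samples per label column
    never_present = np.flatnonzero(positives == 0)
    always_present = np.flatnonzero(positives == Y.shape[0])
    return never_present, always_present

# Made-up stand-in for training_y: 4 samples, 5 labels
toy_training_y = np.array([
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
])

never, always = constant_label_columns(toy_training_y)
print("never present:", never.tolist())    # columns 1 and 4 have no positives
print("always present:", always.tolist())  # column 3 is positive in every row
```

Running the same helper on training_y = Y[:2600,:] would show how many of the 400 labels have no positive example among the first 2600 rows. Those labels cannot be learned from that slice, so they would need more data, a different split, or to be dropped.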
I am working on a multi-label classification problem. My data looks like this:
DocID Content Tags
1 some text here... [70]
2 some text here... [59]
3 some text here... [183]
4 some text here... [173]
5 some text here... [71]
6 some text here... [98]
7 some text here... [211]
8 some text here... [188]
. ............. .....
. ............. .....
. ............. .....
Here is my code:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

traindf = pd.read_csv("mul.csv")
print("This is what our training data looks like:")
print(traindf)
t = TfidfVectorizer()
X = traindf["Content"]
y = traindf["Tags"]
print("Original Content")
print(X)
# Turn the raw text into a sparse TF-IDF feature matrix
X = t.fit_transform(X)
print("Content After …
Tags: python, machine-learning, scikit-learn, logistic-regression, multilabel-classification
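Since the code above is cut off, here is only a rough sketch of how a Tags column like the one shown could be turned into a binary indicator matrix for multi-label training. It assumes the tags arrive from the CSV as strings such as "[70]" (so they need parsing first) and uses a tiny made-up DataFrame instead of mul.csv. The toy texts, the variable names, and the choice of LogisticRegression (taken from the question's tags) are assumptions, not the asker's actual code.

```python
import ast

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up stand-in for mul.csv; Tags are strings, as read_csv would return them
toy_df = pd.DataFrame({
    "Content": [
        "cats purr and meow",
        "dogs bark loudly",
        "cats and dogs play",
        "dogs fetch sticks",
    ],
    "Tags": ["[70]", "[59]", "[70, 59]", "[59]"],
})

# Parse each "[70, 59]" string into a real Python list of ints
tag_lists = toy_df["Tags"].apply(ast.literal_eval)

# One 0/1 column per distinct tag
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tag_lists)

# TF-IDF features, then one binary classifier per tag
X = TfidfVectorizer().fit_transform(toy_df["Content"])
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)

pred = clf.predict(X)
# Map predicted indicator rows back to tuples of tag ids
print(mlb.inverse_transform(pred))
```

Passing the raw Tags column directly as y would treat each distinct string as a single class, so parsing and binarizing first is what makes the problem multi-label rather than multi-class.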