bev*_*age 5 machine-learning python-3.x scikit-learn
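I am trying to get metrics for a classifier built with scikit-learn's OneVsRestClassifier for a multi-label classification problem. However, I can't get the metrics library to work, because the binary indicator matrices for the true and predicted labels that I am trying to compare have different sizes. The code is below; most of it is taken from "Use scikit-learn to classify into multiple categories".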
import numpy as np
import collections
import csv
import os
import sys
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
import sklearn.metrics as metrics
np.set_printoptions(threshold=sys.maxsize)
csv_read_args = ({'mode': 'rb'} if sys.version_info[0] < 3 else
                 {'mode': 'rt', 'newline': '', 'encoding': 'latin1'})
with open(os.path.abspath('somefilepath'), **csv_read_args) as myfile:
    reader = csv.reader(myfile)
    next(reader)
    a, b = [], []
    # feed generator expression into a zero-length deque to consume it
    generator = ((a.append(row[2]), b.append(row[1].split(";"))) for row in reader)
    collections.deque(generator, maxlen=0)
X_train = np.array(a)
y_train_text = b
with open(os.path.abspath('some filepath'), **csv_read_args) as myfile:
    reader = csv.reader(myfile)
    next(reader)
    c, d = [], []
    generator = ((c.append(row[2]), d.append(row[1].split(";"))) for row in reader)
    collections.deque(generator, maxlen=0)
X_test = np.array(c)
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_train_text)
classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)
all_labels = mlb.inverse_transform(predicted)
mlb = MultiLabelBinarizer()
true = mlb.fit_transform(d)
print(true.shape)
print(predicted.shape)
print(metrics.f1_score(true, predicted, average="micro"))
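On the last line, I get this error: ValueError: Multi-label binary indicator input with different number of labels
Why do my true and predicted indicator matrices have different numbers of labels? Is it because my training dataset may contain labels that do not appear in the test dataset, or vice versa? If so, how do I account for that?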
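The mismatch asked about above can be reproduced on toy data. The sketch below is not the asker's data: `train_labels`, `test_labels` and `predicted` are hypothetical stand-ins for `y_train_text`, `d` and the classifier output. It assumes the cause is the second `MultiLabelBinarizer()` being fitted on the test labels alone, which gives it its own column set, and it shows one possible remedy: reusing the binarizer that was fitted on the training labels.

```python
import warnings

import numpy as np
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy labels standing in for y_train_text and d.
train_labels = [["a", "b"], ["b", "c"], ["a"]]
test_labels = [["a"], ["b", "d"], ["a", "e"]]  # "d" and "e" never occur in training

mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(train_labels)  # columns follow mlb.classes_: a, b, c

# Fitting a fresh binarizer on the test labels yields different columns
# (a, b, d, e), so its output cannot be compared with the predictions.
refit = MultiLabelBinarizer().fit_transform(test_labels)
print(Y_train.shape, refit.shape)

# Reusing the fitted binarizer keeps the training column layout.
# Labels unseen during fit are dropped with a UserWarning, silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    Y_test = mlb.transform(test_labels)

# Hypothetical predictions with the same column layout as Y_train.
predicted = np.array([[1, 0, 0],
                      [0, 1, 0],
                      [0, 0, 1]])
score = f1_score(Y_test, predicted, average="micro")
print(Y_test.shape, predicted.shape, score)
```

In the code from the question, this would correspond to replacing the second `fit_transform(d)` with `mlb.transform(d)` on the original `mlb`. Note that test labels the binarizer has never seen are ignored by `transform`, so they do not count against the score; whether that is acceptable depends on the evaluation you want.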