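I'm trying to build a multi-label out-of-core text classifier. As described here, the idea is to read (large) text datasets in batches and partially fit them to the classifiers. Furthermore, when you have multi-label instances, as described here, the idea is to build as many binary classifiers as there are classes in the dataset, in a one-vs-all manner.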
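For reference, the approach described above (stream the data in batches, keep one binary classifier per label, update each classifier with `partial_fit`) can be sketched without `OneVsRestClassifier` by looping over the label columns by hand. This is a minimal sketch under stated assumptions, not the code from the linked posts: the helpers `iter_batches`, `partial_fit_batch` and `predict` are hypothetical, the in-memory lists stand in for reading a large corpus from disk, and `alternate_sign=False` assumes scikit-learn 0.19 or newer, where it is used in place of `non_negative=True`.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

categories = ["a", "b", "c"]

# HashingVectorizer is stateless, so every batch can be transformed on its own.
# alternate_sign=False keeps all features non-negative, which MultinomialNB requires.
vectorizer = HashingVectorizer(decode_error="ignore", n_features=2 ** 18,
                               alternate_sign=False)

# Fixing the classes up front gives every batch the same column order.
mlb = MultiLabelBinarizer(classes=categories)
mlb.fit([categories])

# Hand-rolled one-vs-rest: one independent binary classifier per label.
classifiers = [MultinomialNB(alpha=0.01) for _ in categories]


def iter_batches(texts, labels, batch_size):
    """Hypothetical helper: yield (texts, labels) chunks, standing in for disk reads."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size], labels[start:start + batch_size]


def partial_fit_batch(texts, labels):
    """Hypothetical helper: update every per-label classifier with one batch."""
    X_batch = vectorizer.transform(texts)
    Y_batch = mlb.transform(labels)  # shape (n_samples, n_labels), values 0/1
    for j, clf in enumerate(classifiers):
        # classes=[0, 1] is passed because a single batch may contain only one value.
        clf.partial_fit(X_batch, Y_batch[:, j], classes=[0, 1])


def predict(texts):
    """Hypothetical helper: stack the per-label 0/1 predictions back into label tuples."""
    X_new = vectorizer.transform(texts)
    Y_pred = np.column_stack([clf.predict(X_new) for clf in classifiers])
    return mlb.inverse_transform(Y_pred)


X = ["This is a test", "This is another attempt", "And this is a test too!"]
Y = [["a", "b"], ["b"], ["a", "b"]]

for batch_texts, batch_labels in iter_batches(X, Y, batch_size=2):
    partial_fit_batch(batch_texts, batch_labels)

print(predict(["This is a test"]))
```

Looping manually keeps each binary problem explicit: every classifier only ever sees a 0/1 target, so the `classes` argument of its `partial_fit` is always `[0, 1]`, independent of the label names.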
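When I combine sklearn's MultiLabelBinarizer and OneVsRestClassifier classes with partial fitting, I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Here is the code: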
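The message itself comes from NumPy: it is raised whenever an array with more than one element is used where a single boolean is expected, such as in an `if` statement. The snippet below only reproduces that NumPy behaviour in isolation; where exactly `partial_fit` ends up evaluating such an array depends on the scikit-learn version and would show in the full traceback.

```python
import numpy as np

# Any array with more than one element triggers the same error.
diff = np.array([0, 1])

message = None
try:
    # `if` needs one boolean, but a multi-element array does not define one.
    if diff:
        pass
except ValueError as exc:
    message = str(exc)

print(message)

# Stating the intended question explicitly removes the ambiguity.
is_nonempty = len(diff) > 0      # "does it contain anything?"
any_nonzero = bool(diff.any())   # "is at least one element truthy?"
all_nonzero = bool(diff.all())   # "are all elements truthy?"
```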
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
categories = ['a', 'b', 'c']
X = ["This is a test", "This is another attempt", "And this is a test too!"]
Y = [['a', 'b'],['b'],['a','b']]
mlb = MultiLabelBinarizer(classes=categories)
vectorizer = HashingVectorizer(decode_error='ignore', n_features=2 ** 18, non_negative=True)  # scikit-learn 0.21+ removed non_negative; alternate_sign=False is used instead
clf = OneVsRestClassifier(MultinomialNB(alpha=0.01))
X_train = vectorizer.fit_transform(X)
Y_train = mlb.fit_transform(Y)
clf.partial_fit(X_train, Y_train, classes=categories) …
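It helps to look at what `MultiLabelBinarizer` actually hands to `partial_fit`. In the code above, `Y_train` is a 0/1 indicator matrix with one column per category, whereas `classes=categories` passes the strings `'a'`, `'b'` and `'c'`. The sketch below only inspects that matrix for the example data; it does not claim this mismatch is the sole cause of the error.

```python
from sklearn.preprocessing import MultiLabelBinarizer

categories = ['a', 'b', 'c']
Y = [['a', 'b'], ['b'], ['a', 'b']]

mlb = MultiLabelBinarizer(classes=categories)
Y_train = mlb.fit_transform(Y)

# Each row is one document, each column one label from `categories`.
print(Y_train)
# [[1 1 0]
#  [0 1 0]
#  [1 1 0]]

# The values inside the matrix are integers, not the label strings.
print(sorted(set(Y_train.ravel().tolist())))
# [0, 1]
```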