Multi-valued output classification with scikit-learn

Question

dem*_*lem 2 python machine-learning scikit-learn

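Let's say I have picked a training document from my training set and turned it into a feature vector X for my chosen features.

I would like to do: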

self.clf = LogisticRegression()
self.clf.fit(X, Y)

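My Y would look like this: [0 0 0 1 1 0 1 0 0 1 0]

I want to train a single model so that it fits all 11 output values as well as possible at the same time. This does not seem to work with fit: I get an unhashable type 'list' error, because it expects a single value per sample, either binary or multiclass, and does not allow multiple values.

Is there a way to do this with scikit-learn?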

Answer 1

Fre*_*Foo 7

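Multi-label classification has a slightly different API from ordinary classification. Your Y should be a sequence of sequences, e.g. a list of lists, like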

Y = [["foo", "bar"],          # the first sample is a foo and a bar
     ["foo"],                 # the second is only a foo
     ["bar", "baz"]]          # the third is a bar and a baz

Such a Y can then be fed to an estimator that handles multi-label classification. You can construct such an estimator with the OneVsRestClassifier wrapper:

from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
clf = OneVsRestClassifier(LogisticRegression())

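Then train with clf.fit(X, Y). clf.predict will now produce sequences of sequences as well.

UPDATE: as of scikit-learn 0.15, this API is deprecated because its input is ambiguous. You should convert the Y I gave above to a matrix with a MultiLabelBinarizer: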

>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform(Y)
array([[1, 0, 1],
       [0, 0, 1],
       [1, 1, 0]])
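
The column order of that matrix follows the sorted label names, which the binarizer exposes as classes_. A minimal sketch, reusing the Y from above, that prints the labels next to each row to make the encoding explicit:

```python
from sklearn.preprocessing import MultiLabelBinarizer

Y = [["foo", "bar"], ["foo"], ["bar", "baz"]]

mlb = MultiLabelBinarizer()
Y_indicator = mlb.fit_transform(Y)

# classes_ holds the label that each column stands for, in sorted order.
print(list(mlb.classes_))  # ['bar', 'baz', 'foo']

# Pair each row's 0/1 flags with the column labels.
for labels, row in zip(Y, Y_indicator):
    print(labels, dict(zip(mlb.classes_, row.tolist())))
```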

Then feed that to the estimator's fit method. Converting back is done with inverse_transform on the same binarizer:

>>> mlb.inverse_transform(mlb.transform(Y))
[('bar', 'foo'), ('foo',), ('bar', 'baz')]
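
Putting the pieces together, here is a rough end-to-end sketch. The feature matrix X and the label lists are made-up toy data (assumptions for illustration only, not from the question); the point is just the flow of binarize, fit, predict, and decode:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: 6 samples with 2 features each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0],
              [0.9, 1.1], [0.0, 1.0], [1.0, 0.0]])
Y = [["foo"], ["foo"], ["bar", "baz"],
     ["bar", "baz"], ["foo", "bar"], ["baz"]]

# Turn the label lists into a 0/1 indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
Y_indicator = mlb.fit_transform(Y)

# One binary LogisticRegression is trained per column.
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, Y_indicator)

# Predictions come back as an indicator matrix with the same columns.
pred = clf.predict(X[:2])
print(pred.shape)  # (2, 3)

# Decode the indicator rows back into tuples of label names.
print(mlb.inverse_transform(pred))
```

If Y is already a 0/1 matrix with one row per sample and one column per label, as with the 11-element vector in the question, it can be passed to fit directly and the MultiLabelBinarizer step can be skipped.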

Archived:	13 years, 2 months ago
Views:	2425
Last activity:	11 years, 5 months ago