partial_fit with Sklearn's MLPClassifier

Bit*_*ita 4 python classification neural-network scikit-learn

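I have been trying to use Sklearn's neural network MLPClassifier. I have a dataset of 1000 instances (with binary outputs) and I want to apply a basic neural net with 1 hidden layer to it.

The problem is that my data instances are not all available at the same time. At any point in time, I only have access to 1 data instance. I thought the partial_fit method of MLPClassifier could be used for this, so I simulated the problem with an imaginary dataset of 1000 inputs, looping over the inputs one at a time and calling partial_fit on each instance. But when I run the code, the neural net learns nothing and the predicted output is all zeros.

I have no clue what may be causing the problem. Any thoughts are greatly appreciated.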

from __future__ import division 
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

#Creating an imaginary dataset
input, output = make_classification(1000, 30, n_informative=10, n_classes=2)
input= input / input.max(axis=0)
N = input.shape[0]
# Use integer division: with "from __future__ import division", N/2 is a float
train_input = input[0:N//2,:]
train_target = output[0:N//2]

test_input= input[N//2:N,:]
test_target = output[N//2:N]

#Creating and training the Neural Net
clf = MLPClassifier(activation='tanh', algorithm='sgd', learning_rate='constant',
 alpha=1e-4, hidden_layer_sizes=(15,), random_state=1, batch_size=1,verbose= True,
 max_iter=1, warm_start=True)
classes=[0,1]
for j in xrange(0,100):
    for i in xrange(0,train_input.shape[0]):
        input_inst = [train_input[i,:]]
        input_inst = np.asarray(input_inst)
        target_inst= [train_target[i]]
        target_inst = np.asarray(target_inst)
        clf=clf.partial_fit(input_inst,target_inst,classes)

#Testing the Neural Net
y_pred = clf.predict(test_input)
print y_pred

Cur*_*ous 5

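Explaining the problem

The problem is with self.label_binarizer_.fit(y) at line 895 of multilayer_perceptron.py.

Whenever you call clf.partial_fit(input_inst,target_inst,classes), you call self.label_binarizer_.fit(y), where y in this case holds only one sample, belonging to a single class. The binarizer is therefore refit on that one class every time, so if the last sample is of class 0, your clf will classify everything as class 0.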
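The effect can be seen in isolation. The following is a minimal sketch that uses LabelBinarizer from sklearn.preprocessing directly, on the assumption that it behaves like the self.label_binarizer_ inside MLPClassifier; it illustrates the idea and is not the library's internal code. The names single and full are made up for the example.

```python
from sklearn.preprocessing import LabelBinarizer

# Fit on a one-sample batch, as happens on each partial_fit call above:
# the binarizer only ever sees the class of that single sample.
single = LabelBinarizer().fit([0])
print(single.classes_)  # only class 0 is known

# Fit on the full list of classes instead:
# both classes are known regardless of which sample arrives.
full = LabelBinarizer().fit([0, 1])
print(full.classes_)  # both class 0 and class 1 are known
```

Refitting on each one-sample batch throws away what was known about the other class, which is why the last sample seen determines every prediction.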

As a temporary fix, you can edit multilayer_perceptron.py at line 895. It is located in a directory similar to python2.7/site-packages/sklearn/neural_network/

At line 895, change

self.label_binarizer_.fit(y)

to

if not incremental:
    self.label_binarizer_.fit(y)

else:
    self.label_binarizer_.fit(self.classes_)

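This way, if you are using partial_fit, self.label_binarizer_ is fit on the classes rather than on the individual sample.

In addition, the code you posted can be changed to the following to make it work: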

from __future__ import division 
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

#Creating an imaginary dataset
input, output = make_classification(1000, 30, n_informative=10, n_classes=2)
input= input / input.max(axis=0)
N = input.shape[0]
# Use integer division: with "from __future__ import division", N/2 is a float
train_input = input[0:N//2,:]
train_target = output[0:N//2]

test_input= input[N//2:N,:]
test_target = output[N//2:N]

#Creating and training the Neural Net 
# 1. Disable verbose (verbose is annoying with partial_fit)

clf = MLPClassifier(activation='tanh', algorithm='sgd', learning_rate='constant',
 alpha=1e-4, hidden_layer_sizes=(15,), random_state=1, batch_size=1,verbose= False,
 max_iter=1, warm_start=True)

# 2. Set what the classes are
clf.classes_ = [0,1]

for j in xrange(0,100):
    for i in xrange(0,train_input.shape[0]):
        # Indexing with [[i]] keeps the 2-D (1, n_features) shape
        input_inst = train_input[[i]]
        target_inst= train_target[[i]]

        clf=clf.partial_fit(input_inst,target_inst)

    # 3. Monitor progress
    print "Score on training set: %0.8f" % clf.score(train_input, train_target)
#Testing the Neural Net
y_pred = clf.predict(test_input)
print y_pred

# 4. Compute score on testing set
print clf.score(test_input, test_target)

There are 4 main changes in the code. This should give you good predictions on both the training and the testing set!
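For readers on a more recent scikit-learn release, the same one-sample-at-a-time loop can be sketched without editing the library source. This is a minimal sketch under stated assumptions: Python 3, a release in which the constructor parameter is named solver rather than algorithm, and a partial_fit that uses the classes argument to set up the label binarizer (which, to my knowledge, recent releases do). The variable names and the number of passes are illustrative, not taken from the original code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Imaginary dataset, split in half as in the question.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           n_classes=2, random_state=0)
X = X / np.abs(X).max(axis=0)  # scale each feature into [-1, 1]
half = X.shape[0] // 2
X_train, y_train = X[:half], y[:half]
X_test, y_test = X[half:], y[half:]

# Assumption: "solver" is the parameter name in the installed release.
clf = MLPClassifier(activation="tanh", solver="sgd", learning_rate="constant",
                    alpha=1e-4, hidden_layer_sizes=(15,), random_state=1)

classes = np.array([0, 1])  # every class the stream can ever produce
for epoch in range(5):  # a few passes over the simulated stream
    for i in range(half):
        # X_train[[i]] has shape (1, n_features); y_train[[i]] has shape (1,)
        clf.partial_fit(X_train[[i]], y_train[[i]], classes=classes)

predictions = clf.predict(X_test)
test_accuracy = clf.score(X_test, y_test)
print("Score on testing set: %0.4f" % test_accuracy)
```

Passing classes on every call is harmless, and it means the classifier knows about both labels even when each batch contains only one of them.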

Cheers.