sklearn: Semi-supervised learning - LabelSpreadingModel memory error

Question

Eda*_*ame 1 machine-learning python-2.7 scikit-learn

I am using sklearn's LabelSpreading model as follows:

from sklearn.semi_supervised import LabelSpreading
label_spreading_model = LabelSpreading()
model_s = label_spreading_model.fit(my_inputs, labels)


But I get the following error:

    MemoryError                               Traceback (most recent call last)
    <ipython-input-17-73adbf1fc908> in <module>()
         11 
         12 label_spreading_model = LabelSpreading()
    ---> 13 model_s = label_spreading_model.fit(my_inputs, labels)

    /usr/local/lib/python2.7/dist-packages/sklearn/semi_supervised/label_propagation.pyc in fit(self, X, y)
        224 
        225         # actual graph construction (implementations should override this)
    --> 226         graph_matrix = self._build_graph()
        227 
        228         # label construction

    /usr/local/lib/python2.7/dist-packages/sklearn/semi_supervised/label_propagation.pyc in _build_graph(self)
        455         affinity_matrix = self._get_kernel(self.X_)
        456         laplacian = graph_laplacian(affinity_matrix, normed=True)
    --> 457         laplacian = -laplacian
        458         if sparse.isspmatrix(laplacian):
        459             diag_mask = (laplacian.row == laplacian.col)

    MemoryError:


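It looks like something goes wrong when the Laplacian of my input matrix is computed. Is there any parameter I can configure, or any change I can make, to avoid this error? Thanks!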

Answer 1

sas*_*cha 5

It's pretty clear: your PC is running out of memory.

Since you did not set any parameters, the RBF kernel is used (it is the default value of the kernel parameter).

Some excerpts from the scikit-learn docs:

The RBF kernel will produce a fully connected graph which is represented in
memory by a dense matrix. This matrix may be very large and combined with the 
cost of performing a full matrix multiplication calculation for each iteration
of the algorithm can lead to prohibitively long running times
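
To get a feel for how large that dense matrix becomes, here is a rough back-of-the-envelope sketch. It assumes 8-byte float64 entries and counts only a single n x n matrix; the helper name dense_affinity_bytes is made up for illustration and is not part of scikit-learn. The traceback fails at `laplacian = -laplacian`, which allocates a second array of the same size, so the real peak usage is likely a multiple of these numbers.

```python
def dense_affinity_bytes(n_samples, bytes_per_entry=8):
    """Bytes needed for one dense n_samples x n_samples matrix.

    Assumes float64 entries (8 bytes each); hypothetical helper,
    not a scikit-learn function.
    """
    return n_samples * n_samples * bytes_per_entry


# Memory grows quadratically with the number of samples.
for n in (1000, 10000, 100000):
    gigabytes = dense_affinity_bytes(n) / 1e9
    print("{} samples -> {:.3f} GB per dense matrix".format(n, gigabytes))
```

With 100,000 samples a single matrix already needs about 80 GB, before any intermediate copies.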


Maybe the following (the next sentence in the docs quoted above) will help:

On the other hand, the KNN kernel will produce a much more memory-friendly 
sparse matrix which can drastically reduce running times.
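
A minimal sketch of what switching kernels could look like, assuming a reasonably recent scikit-learn. The random X and y_partial below are stand-ins for my_inputs and labels (I don't have your data), and n_neighbors=7 is simply the library default written out, not a tuned value. Unlabeled samples are marked with -1.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Stand-in data (assumption): 200 samples, 5 features, two classes.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] > 0.5).astype(int)

# Keep the first 50 labels and mark the rest as unlabeled (-1).
y_partial = y.copy()
y_partial[50:] = -1

# kernel='knn' builds a sparse k-nearest-neighbors graph instead of
# the dense, fully connected RBF affinity matrix.
label_spreading_model = LabelSpreading(kernel='knn', n_neighbors=7)
model_s = label_spreading_model.fit(X, y_partial)

# Labels inferred for every sample, including the unlabeled ones.
predicted = model_s.transduction_
print(predicted[:10])
```

On the real data this would be label_spreading_model.fit(my_inputs, labels) with the same constructor arguments. Whether it fits in memory still depends on the number of samples.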


But I don't know your data, your PC configuration and so on, so I can only guess...
