scikit-learn RandomForest MemoryError

imp*_*ush 5 python python-2.7 scikit-learn

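I am trying to run the scikit-learn random forest algorithm on the MNIST handwritten digits dataset. During training, the system runs into a MemoryError. Please tell me what I should do to fix this.

Machine specs: Intel Core 2 Duo, 4 GB RAM

The dataset has shape (60000, 784). The full error from the Linux terminal is as follows: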

>   File "./reducer.py", line 53, in <module>
>     main()
>   File "./reducer.py", line 38, in main
>     clf = clf.fit(data,labels) #training the algorithm
>   File "/usr/lib/pymodules/python2.7/sklearn/ensemble/forest.py", line 202, in fit
>     for i in xrange(n_jobs))
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 409, in __call__
>     self.dispatch(function, args, kwargs)
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 295, in dispatch
>     job = ImmediateApply(func, args, kwargs)
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 101, in __init__
>     self.results = func(*args, **kwargs)
>   File "/usr/lib/pymodules/python2.7/sklearn/ensemble/forest.py", line 73, in _parallel_build_trees
>     sample_mask=sample_mask, X_argsorted=X_argsorted)
>   File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 476, in fit
>     X_argsorted=X_argsorted)
>   File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 357, in _build_tree
>     np.argsort(X.T, axis=1).astype(np.int32).T)
>   File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 680, in argsort
>     return argsort(axis, kind, order)
> MemoryError

Fre*_*Foo 4

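Set `n_jobs=1`, or upgrade to the bleeding-edge version of scikit-learn. The problem is that the currently released version uses multiple processes to fit the trees in parallel, which means the data (`X` and `y`) has to be copied to each of those processes. The next release will use threads instead of processes, so the tree learners will share memory.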
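As a rough illustration of the `n_jobs=1` suggestion, here is a minimal sketch. The small random arrays are only a stand-in for the real MNIST data, and `n_estimators=10` and `random_state=0` are arbitrary choices for the example, not values taken from the question. The arithmetic at the end estimates how much memory one copy of a (60000, 784) array takes, which gives a feel for why extra copies hurt on a 4 GB machine.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the MNIST arrays (assumption: the real data
# has shape (60000, 784); a small sample keeps this sketch quick).
rng = np.random.RandomState(0)
data = rng.randint(0, 256, size=(200, 784)).astype(np.float32)
labels = rng.randint(0, 10, size=200)

# n_jobs=1 fits the trees one after another in a single process,
# so the training data is not duplicated across worker processes.
clf = RandomForestClassifier(n_estimators=10, n_jobs=1, random_state=0)
clf = clf.fit(data, labels)

# Back-of-the-envelope size of one copy of the full-size array.
n_samples, n_features = 60000, 784
mb_float64 = n_samples * n_features * 8 / 1e6  # 8 bytes per float64
mb_float32 = n_samples * n_features * 4 / 1e6  # 4 bytes per float32
print("one float64 copy: %.2f MB" % mb_float64)
print("one float32 copy: %.2f MB" % mb_float32)
```

Each additional full copy of the data (for example one per worker process, or the index array that `np.argsort` builds in the traceback above) adds a block of roughly this size, so keeping everything in one process is the cheapest first thing to try.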