jul*_*lia 6 python machine-learning decision-tree scikit-learn
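I have a pandas DataFrame with 86k rows, 5 features, and 1 target column. I am trying to train a DecisionTreeClassifier using 70% of the DataFrame as training data, and I get a MemoryError from the fit method. I have tried changing some parameters, but I don't really know what is causing the error, so I don't know how to handle it. I am on Windows 10 with 8 GB of RAM.
Code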
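from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# data is the pandas DataFrame described above
train, test = train_test_split(data, test_size=0.3)
X_train = train.iloc[:, 1:-1]  # first column is not a feature
y_train = train.iloc[:, -1]
X_test = test.iloc[:, 1:-1]
y_test = test.iloc[:, -1]
DT = DecisionTreeClassifier()
DT.fit(X_train, y_train)  # the MemoryError is raised here
dt_predictions = DT.predict(X_test)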
Error
File (...), line 97, in <module>
DT.fit(X_train, y_train)
File "(...)\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\tree\tree.py", line 790, in fit
X_idx_sorted=X_idx_sorted)
File "(...)\AppData\Local\Programs\Python\Python36-32\lib\site-packages\sklearn\tree\tree.py", line 362, in fit
builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)
File "sklearn\tree\_tree.pyx", line 145, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn\tree\_tree.pyx", line 244, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn\tree\_tree.pyx", line 735, in sklearn.tree._tree.Tree._add_node
File "sklearn\tree\_tree.pyx", line 707, in sklearn.tree._tree.Tree._resize_c
File "sklearn\tree\_utils.pyx", line 39, in sklearn.tree._utils.safe_realloc
MemoryError: could not allocate 671612928 bytes
The same error occurs when I try a RandomForestClassifier, always on the line that performs the fit. How can I solve this?
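Two things in the question can be checked quickly before changing the model. The traceback path contains `Python36-32`, which suggests (an assumption, not something the question confirms) a 32-bit Python build, where a single process can address far less than 8 GB. The number of distinct values in the target also matters, because a classifier treats each one as a class. A minimal sketch of both checks, where `interpreter_bits`, `count_distinct`, and `y_demo` are hypothetical names used only for illustration:

```python
import struct

import numpy as np


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits."""
    return struct.calcsize("P") * 8


def count_distinct(values):
    """Return the number of distinct values in a 1-D array-like."""
    return int(np.unique(np.asarray(values)).size)


print("Python build:", interpreter_bits(), "bit")

# Hypothetical stand-in for y_train: a continuous target.
rng = np.random.default_rng(0)
y_demo = rng.normal(size=1000)
print("distinct target values:", count_distinct(y_demo))
```

If the distinct-value count is close to the number of rows, the target is probably continuous and the classifier is being asked to learn tens of thousands of classes.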
小智 2
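I ran into the same problem. Make sure you are dealing with a classification problem and not a regression problem. With a continuous target, a classifier treats every distinct value as a separate class, which can inflate the memory the tree needs. If your target column is continuous, you may want to use http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html instead of RandomForestClassifier.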
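One way to apply this advice is to let scikit-learn classify the target before picking the estimator. The sketch below uses `sklearn.utils.multiclass.type_of_target`; the helper `pick_forest` and the synthetic data are hypothetical and only illustrate the idea. Note that `type_of_target` reports float targets that happen to hold only whole numbers as `multiclass`, so it is a hint rather than a guarantee.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.utils.multiclass import type_of_target


def pick_forest(y, **kwargs):
    """Return an unfitted forest that matches the kind of target in y."""
    kind = type_of_target(y)
    if kind == "continuous":
        return RandomForestRegressor(**kwargs)
    if kind in ("binary", "multiclass"):
        return RandomForestClassifier(**kwargs)
    raise ValueError("unsupported target type: " + kind)


# Synthetic data standing in for the 5-feature DataFrame (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y_continuous = 2.0 * X[:, 0] + rng.normal(size=200)  # regression target
y_labels = (X[:, 0] > 0).astype(int)                 # classification target

reg = pick_forest(y_continuous, n_estimators=10, max_depth=5, random_state=0)
clf = pick_forest(y_labels, n_estimators=10, max_depth=5, random_state=0)
reg.fit(X, y_continuous)
clf.fit(X, y_labels)
print(type(reg).__name__, type(clf).__name__)
```

Passing `max_depth` here is an extra precaution that caps how large each tree can grow; it is not part of the original answer.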