pandas MemoryError on a machine with a lot of RAM, but not on a machine with less RAM: same code, same data

dum*_*dad 2 pandas

I am running the following on two machines:

import os, sqlite3
import pandas as pd
from feat_transform import filter_anevexp
db_path = r'C:\Users\timregan\Desktop\anondb_280718.sqlite3'
db = sqlite3.connect(db_path)
anevexp_df = filter_anevexp(db, 0)

On my laptop (8 GB of RAM) this runs without problems, although the call to filter_anevexp takes a few minutes. On my desktop (128 GB of RAM) it fails inside pandas with a memory error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\timregan\source\MentalHealth\code\preprocessing\feat_transform.py", line 171, in filter_anevexp
    anevexp_df = anevexp_df[anevexp_df["user_id"].isin(df)].copy()
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 2724, in _getitem_array
    return self._take(indexer, axis=0)
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 2789, in _take
    verify=True)
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals.py", line 4539, in take
    axis=axis, allow_dups=True)
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals.py", line 4425, in reindex_indexer
    for blk in self.blocks]
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals.py", line 4425, in <listcomp>
    for blk in self.blocks]
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals.py", line 1258, in take_nd
    allow_fill=True, fill_value=fill_value)
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 1655, in take_nd
    out = np.empty(out_shape, dtype=dtype)
MemoryError

Is there something special I need to do to prevent errors (e.g. addressing errors) on a machine with a lot of memory?

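Note that I have not included the code of the filter_anevexp function, because I am not interested in suggestions on how to reduce the memory footprint. I want to understand why the same code, run on the same data, fails with a memory error on a machine with 128 GB of RAM but succeeds on a machine with 8 GB.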

Jai*_*tas 6

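You are running a 32-bit build of Python on the machine that fails (note `Python37-32` in the traceback paths), which means the Python executable can address at most 4 GB of memory, no matter how much RAM is installed. Try reinstalling Python 3.7 as a 64-bit build in place of the current 32-bit one.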
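
To confirm which build each machine is running, something like the following can be executed on both the laptop and the desktop. It is only a minimal sketch: the helper names `interpreter_bits` and `is_64bit` are made up for illustration, and it relies solely on the standard-library `struct` and `sys` modules.

```python
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer width of the running interpreter, in bits."""
    # "P" is the struct format code for a native pointer (void *);
    # it is 4 bytes on a 32-bit build and 8 bytes on a 64-bit build.
    return struct.calcsize("P") * 8


def is_64bit() -> bool:
    """Return True if the running interpreter is a 64-bit build."""
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print(f"{interpreter_bits()}-bit Python (64-bit build: {is_64bit()})")
    print(sys.version)
```

If the desktop reports 32 bits while the laptop reports 64, that would account for the difference: the limit comes from the interpreter's address space, not from the amount of RAM installed.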