Trouble compressing large data in Python

Question

I have a Python script that compresses a large string:

import zlib

def processFiles():
  ...
  s = """Large string more than 2Gb"""
  data = zlib.compress(s)
  ...

When I run this script, I get an error:

ERROR: Traceback (most recent call last):
  File "./../commands/sce.py", line 438, in processFiles
    data = zlib.compress(s)
OverflowError: size does not fit in an int

Some information:

zlib.__version__ = '1.0'

zlib.ZLIB_VERSION = '1.2.7'

# python -V
Python 2.7.3

# uname -a
Linux app2 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux

# free
             total       used       free     shared    buffers     cached
Mem:      65997404    8096588   57900816          0     184260    7212252
-/+ buffers/cache:     700076   65297328
Swap:     35562236          0   35562236

# ldconfig -p | grep python
libpython2.7.so.1.0 (libc6,x86-64) => /usr/lib/libpython2.7.so.1.0
libpython2.7.so (libc6,x86-64) => /usr/lib/libpython2.7.so

How can I compress large data (more than 2GB) in Python?

Answer 1

JBe*_*rdo 2

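This is not a RAM problem. Put simply, neither zlib nor the Python bindings can handle data larger than 4GB.

Split your data into chunks of 4GB or smaller and process each chunk separately.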
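
As a rough illustration of the chunking advice above, the sketch below feeds the data to a `zlib.compressobj()` in slices, so that no single call receives an oversized buffer. This produces one continuous zlib stream rather than independently compressed chunks, which is a variation on the advice and not necessarily what the answer's author had in mind. The function name `compress_in_chunks` and the 64 MiB chunk size are assumptions chosen for illustration. The overflow message in the question suggests the chunk size should stay below 2 GiB.

```python
import zlib

# Hypothetical chunk size for illustration; any value comfortably
# below the failing size reported in the question should do.
CHUNK_SIZE = 64 * 1024 * 1024


def compress_in_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress a large bytes object slice by slice into one zlib stream."""
    compressor = zlib.compressobj()
    parts = []
    for start in range(0, len(data), chunk_size):
        # Each call sees at most chunk_size bytes of input.
        parts.append(compressor.compress(data[start:start + chunk_size]))
    # flush() emits whatever the compressor is still buffering
    # and terminates the stream.
    parts.append(compressor.flush())
    return b"".join(parts)
```

The result can be read back with `zlib.decompressobj()`, fed in similar slices. For data of this size, a single `zlib.decompress()` call may run into the same limit as `zlib.compress()`.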
