Python 3 - Can pickle handle bytes objects larger than 4 GB?

Ran*_*its 55 python size pickle python-3.x

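Based on this comment and the referenced documentation, pickle protocol 4.0+ in Python 3.4+ should be able to pickle bytes objects larger than 4 GB.

However, using python 3.4.3 or python 3.5.0b2 on Mac OS X 10.10.4, I get an error when I try to pickle a large byte array: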

>>> import pickle
>>> x = bytearray(8 * 1000 * 1000 * 1000)
>>> fp = open("x.dat", "wb")
>>> pickle.dump(x, fp, protocol = 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument

Is there a bug in my code, or am I misunderstanding the documentation?

lun*_*ini 32

Here is a simple workaround for issue 24658. Use pickle.loads or pickle.dumps and break the bytes object into chunks of size 2**31 - 1 to get it into or out of the file.

import pickle
import os.path

file_path = "pkl.pkl"
n_bytes = 2**31
max_bytes = 2**31 - 1
data = bytearray(n_bytes)

## write
bytes_out = pickle.dumps(data)
with open(file_path, 'wb') as f_out:
    for idx in range(0, len(bytes_out), max_bytes):
        f_out.write(bytes_out[idx:idx+max_bytes])

## read
bytes_in = bytearray(0)
input_size = os.path.getsize(file_path)
with open(file_path, 'rb') as f_in:
    for _ in range(0, input_size, max_bytes):
        bytes_in += f_in.read(max_bytes)
data2 = pickle.loads(bytes_in)

assert(data == data2)
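
The same idea can be wrapped into two helpers. This is a minimal sketch, not part of the original answer: the names `dump_chunked` and `load_chunked` and the `chunk_size` parameter are hypothetical, and the chunk size is configurable only so the logic can be tried on small data.

```python
import os
import pickle
import tempfile

MAX_BYTES = 2**31 - 1  # chunk size used by the workaround above


def dump_chunked(obj, file_path, chunk_size=MAX_BYTES):
    """Pickle obj in memory, then write it to file_path in chunk_size pieces."""
    payload = pickle.dumps(obj, protocol=4)
    view = memoryview(payload)  # slicing a memoryview avoids copying each chunk
    with open(file_path, "wb") as f_out:
        for idx in range(0, len(payload), chunk_size):
            f_out.write(view[idx:idx + chunk_size])


def load_chunked(file_path, chunk_size=MAX_BYTES):
    """Read file_path in chunk_size pieces and unpickle the joined bytes."""
    payload = bytearray()
    with open(file_path, "rb") as f_in:
        while True:
            chunk = f_in.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            payload += chunk
    return pickle.loads(payload)


if __name__ == "__main__":
    # Small demonstration: a 1000-byte object written in 64-byte chunks.
    path = os.path.join(tempfile.mkdtemp(), "pkl.pkl")
    data = bytearray(1000)
    dump_chunked(data, path, chunk_size=64)
    assert load_chunked(path, chunk_size=64) == data
```

Reading until an empty chunk comes back, rather than looping over the file size, means the loader does not depend on `os.path.getsize`.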

  • Thanks. This helps a lot. One thing: in the `write` part, `for idx in range(0, n_bytes, max_bytes):` should be `for idx in range(0, len(bytes_out), max_bytes):` (4 upvotes)

Mar*_*oma 18

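Summarizing what was answered in the comments:

Yes, Python can pickle bytes objects larger than 4 GB. The observed error is caused by a bug in the implementation (see issue 24658).

  • How is this still not fixed? Insane. (18 upvotes)
  • It's 2018 and the bug is still there. Does anyone know why? (14 upvotes)
  • It was fixed for [3.6.8](https://github.com/python/cpython/commit/a5ebc205beea2bf1501e4ac33ed6e81732dd0604), [3.7.2](https://github.com/python/cpython/commit/178d1c07778553bf66e09fe0bb13796be3fb9abf) and [3.8](https://github.com/python/cpython/commit/74a8b6ea7e0a8508b13a1c75ec9b91febd8b5557) in October 2018; the issue is still open because the author wanted to backport the fix to 2.7. In 6 weeks, with Python 2.x reaching end of life, that will all be moot. (2 upvotes)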
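
Going by the versions named in the comment above, a script could decide at run time whether it still needs a chunked workaround. This is a rough sketch: the helper name `needs_chunked_io` is hypothetical, the version thresholds are taken from that comment rather than verified independently, and limiting the check to macOS is an assumption based on the platform reported in the question.

```python
import sys


def needs_chunked_io(version_info=sys.version_info, platform=sys.platform):
    """Return True if this interpreter likely predates the fix for issue 24658.

    Assumptions: the bug only matters on macOS (platform == "darwin"), and the
    fix landed in 3.6.8, 3.7.2 and 3.8, as stated in the comment above.
    """
    if platform != "darwin":
        return False
    version = tuple(version_info[:3])
    if version >= (3, 8, 0):
        return False
    if version[:2] == (3, 7):
        return version < (3, 7, 2)
    if version[:2] == (3, 6):
        return version < (3, 6, 8)
    return True  # 3.5 and earlier are not listed as fixed in the comment


print(needs_chunked_io())
```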

Sam*_*han 13

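Here is the full workaround, though it seems pickle.load no longer tries to read a huge file in a single call (I'm on Python 3.5.2), so strictly speaking only the write side (pickle.dump) needs this to work properly.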

import pickle

class MacOSFile(object):

    def __init__(self, f):
        self.f = f

    def __getattr__(self, item):
        return getattr(self.f, item)

    def read(self, n):
        # print("reading total_bytes=%s" % n, flush=True)
        if n >= (1 << 31):
            buffer = bytearray(n)
            idx = 0
            while idx < n:
                # Parentheses matter: `1 << 31 - 1` parses as `1 << 30`.
                batch_size = min(n - idx, (1 << 31) - 1)
                # print("reading bytes [%s,%s)..." % (idx, idx + batch_size), end="", flush=True)
                buffer[idx:idx + batch_size] = self.f.read(batch_size)
                # print("done.", flush=True)
                idx += batch_size
            return buffer
        return self.f.read(n)

    def write(self, buffer):
        n = len(buffer)
        print("writing total_bytes=%s..." % n, flush=True)
        idx = 0
        while idx < n:
            # Parentheses matter: `1 << 31 - 1` parses as `1 << 30`.
            batch_size = min(n - idx, (1 << 31) - 1)
            print("writing bytes [%s, %s)... " % (idx, idx + batch_size), end="", flush=True)
            self.f.write(buffer[idx:idx + batch_size])
            print("done.", flush=True)
            idx += batch_size


def pickle_dump(obj, file_path):
    with open(file_path, "wb") as f:
        return pickle.dump(obj, MacOSFile(f), protocol=pickle.HIGHEST_PROTOCOL)


def pickle_load(file_path):
    with open(file_path, "rb") as f:
        return pickle.load(MacOSFile(f))


Yoh*_*dia 8

You can specify the protocol for the dump. If you call pickle.dump(obj,file,protocol=4), it should work.

  • What I did was: pickle.dump(data,w,protocol=pickle.HIGHEST_PROTOCOL). It worked! (3 upvotes)
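
To confirm which protocol a dump actually used, the first two bytes of the pickle can be inspected: from protocol 2 onward a pickle begins with the PROTO opcode (`0x80`) followed by the protocol number. A small sketch with a tiny in-memory payload (the variable names are illustrative only):

```python
import pickle

data = bytearray(16)  # tiny stand-in for a large bytes object

payload = pickle.dumps(data, protocol=4)

# Protocol 2+ pickles start with the PROTO opcode (0x80) and the version byte.
assert payload[0] == 0x80
print("protocol used:", payload[1])

# HIGHEST_PROTOCOL is at least 4 on Python 3.4+.
assert pickle.HIGHEST_PROTOCOL >= 4
assert pickle.loads(payload) == data
```

Note that the question already passed `protocol=4` and still hit the OSError on macOS, so choosing the protocol addresses the 4 GB format limit rather than the write bug described in issue 24658.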