Does Python's gzip library try to open the entire file in memory?


import gzip
import shutil

with gzip.open("/tar/access.tar.gz", 'rb') as f_in:
    with open("/tar/access.tar", 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

My input file is 150 GB. I bumped the server up to 432 GB of memory after realizing it only had 50 GB when I first attempted this. Does gzip try to open the entire file in memory first? Why is 432 GB not enough?
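For what it's worth, my understanding is that shutil.copyfileobj streams in fixed-size chunks (16 KiB by default on Python 3.5), roughly like the loop below, so I don't see where hundreds of gigabytes would go:

with gzip.open("/tar/access.tar.gz", 'rb') as f_in:
    with open("/tar/access.tar", 'wb') as f_out:
        # roughly what shutil.copyfileobj(f_in, f_out) does internally
        while True:
            buf = f_in.read(16 * 1024)  # 16 KiB per read (copyfileobj's default length)
            if not buf:
                break
            f_out.write(buf)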

The exact error is OSError: [Errno 14] Bad address: '/tar/access.tar.gz', but this error is also thrown when there is a memory problem.

Stack trace:

/usr/lib/python3.5/gzip.py in open(filename, mode, compresslevel, encoding, errors, newline)
     51     gz_mode = mode.replace("t", "")
     52     if isinstance(filename, (str, bytes)):
---> 53         binary_file = GzipFile(filename, gz_mode, compresslevel)
     54     elif hasattr(filename, "read") or hasattr(filename, "write"):
     55         binary_file = GzipFile(None, gz_mode, compresslevel, filename)

/usr/lib/python3.5/gzip.py in __init__(self, filename, mode, compresslevel, fileobj, mtime)
    161             mode += 'b'
    162         if fileobj is None:
--> 163             fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
    164         if filename is None:
    165             filename = getattr(fileobj, 'name', '')

OSError: [Errno 14] Bad address: '/tar/access.tar.gz'

For anyone who doesn't believe this is a memory problem and thinks, based on the poorly worded error message, that the file simply doesn't exist: with 16 GB it fails within seconds, with 64 GB it lasts under a minute, and with 432 GB it lasts five minutes before failing.

What is a solution for decompressing a gzip file without holding it all in memory?
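Something like the following streaming approach is what I have in mind, in case gzip.open itself is the problem (an untested sketch; the 64 MiB chunk size is arbitrary, and it assumes a single-member gzip file, which is the usual case for a .tar.gz):

import zlib

CHUNK = 64 * 1024 * 1024  # arbitrary read size; memory use stays bounded by the chunk

decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS tells zlib to expect a gzip header
with open("/tar/access.tar.gz", 'rb') as f_in, open("/tar/access.tar", 'wb') as f_out:
    while True:
        chunk = f_in.read(CHUNK)
        if not chunk:
            break
        f_out.write(decomp.decompress(chunk))
    f_out.write(decomp.flush())  # write any remaining buffered output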

Related question - Python OSError: Bad address when reading from a large file

It also fails with the following:

import gzip

with gzip.open("/dbfs/tmp/tar/access.tar.gz", 'rb') as f_in:
    print("here")