import gzip
import shutil

with gzip.open("/tar/access.tar.gz", 'rb') as f_in:
    with open("/tar/access.tar", 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
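My input file is 150GB. Once I realized the server only had 50GB of memory when I attempted this, I bumped it up to 432GB. Does gzip first try to load the entire file into memory? Why isn't 432GB enough?
The exact error is OSError: [Errno 14] Bad address: '/tar/access.tar.gz', but this error is thrown when there is a memory error.
Stack trace: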
/usr/lib/python3.5/gzip.py in open(filename, mode, compresslevel, encoding, errors, newline)
     51     gz_mode = mode.replace("t", "")
     52     if isinstance(filename, (str, bytes)):
---> 53         binary_file = GzipFile(filename, gz_mode, compresslevel)
     54     elif hasattr(filename, "read") or hasattr(filename, "write"):
     55         binary_file = GzipFile(None, gz_mode, compresslevel, filename)

/usr/lib/python3.5/gzip.py in __init__(self, filename, mode, compresslevel, fileobj, mtime)
    161             mode += 'b'
    162         if fileobj is None:
--> 163             fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
    164         if filename is None:
    165             filename = getattr(fileobj, 'name', '')

OSError: [Errno 14] Bad address: '/tar/access.tar.gz'
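The traceback ends inside builtins.open, before any compressed data is read, so one way to narrow this down might be to open the file without gzip at all. Below is a minimal sketch of such a check. The helper name probe_file and the shape of the returned dict are hypothetical, made up for illustration; only open(), os.path.getsize() and the two-byte gzip magic number are standard.

```python
import os

# The first two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def probe_file(path):
    """Open `path` with the built-in open() only, no gzip involved.

    Hypothetical diagnostic helper: returns a dict saying whether the
    plain open succeeded, whether the file starts with the gzip magic
    number, its size in bytes, and the error text if an OSError occurred.
    """
    result = {
        "path": path,
        "opened": False,
        "is_gzip": None,
        "size": None,
        "error": None,
    }
    try:
        with open(path, "rb") as fh:
            result["opened"] = True
            # Read only two bytes, so memory use stays negligible.
            result["is_gzip"] = fh.read(2) == GZIP_MAGIC
        result["size"] = os.path.getsize(path)
    except OSError as exc:
        # FileNotFoundError, PermissionError and "Bad address" are all OSError.
        result["error"] = "{}: {}".format(type(exc).__name__, exc)
    return result


if __name__ == "__main__":
    print(probe_file("/tar/access.tar.gz"))
```

If this plain open() already fails with the same OSError, the gzip module is probably not the culprit, and the file system the path lives on would be the next thing to look at.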
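For anyone who doesn't believe this is a memory problem and, going by the poorly worded error message, thinks the file simply doesn't exist: with 16GB it fails within seconds, with 64GB it lasts less than a minute, and with 432GB it runs for about 5 minutes before failing.
What is a solution for decompressing a gzip file without holding all of it in memory?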
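For reference, here is a minimal sketch of decompression with an explicit, fixed-size buffer. The function name gunzip_in_chunks and the 16 MiB chunk size are arbitrary choices for illustration. Note that shutil.copyfileobj already copies in fixed-size chunks, so this is not expected to behave very differently from the original snippet, and it may not avoid the OSError, since the traceback shows the failure happening in open(), before any data is read.

```python
import gzip

# Assumed buffer size: 16 MiB of decompressed data per read.
CHUNK_SIZE = 16 * 1024 * 1024


def gunzip_in_chunks(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Decompress src_path into dst_path one chunk at a time.

    Each read() returns at most chunk_size decompressed bytes, so the
    whole file is never held in memory at once.
    Returns the number of decompressed bytes written.
    """
    total = 0
    with gzip.open(src_path, "rb") as f_in, open(dst_path, "wb") as f_out:
        while True:
            chunk = f_in.read(chunk_size)
            if not chunk:  # empty bytes object means end of stream
                break
            f_out.write(chunk)
            total += len(chunk)
    return total


# Usage with the paths from the question:
# gunzip_in_chunks("/tar/access.tar.gz", "/tar/access.tar")
```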
Related question: Python OSError: Bad address when reading from a large file
The following also fails:
import gzip

with gzip.open("/dbfs/tmp/tar/access.tar.gz", 'rb') as f_in:
    print("here")