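I need to parse a huge gz file (about 10 GB compressed, about 100 GB uncompressed). The code builds a data structure ('data_struct') in memory. I am running it on a machine with 16 Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz CPUs and plenty of RAM (200+ GB), running CentOS-6.9. I implemented this as a class in Python3.6.3 (CPython), as shown below: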
import subprocess

class my_class():
    def __init__(self):
        # -c makes gunzip write the decompressed data to stdout instead of to disk
        cmd = 'gunzip -c huge-file.gz'
        self.process = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
        self.data_struct = dict()

    def populate_struct(self):
        # stdout yields bytes lines, streamed from the gunzip child process
        for line in self.process.stdout:
            <populate the self.data_struct dictionary>

    def __del__(self):
        self.process.wait()
        #del self.data_struct # presence/absence of this statement decreases/increases runtime respectively
#================End of my_class===================
def main():
    my_object = my_class()
    my_object.populate_struct()
    print(f'~~~~ Finished populate_struct() ~~~~') # last …