Python - Opening and modifying a large text file

GSh*_*ked 5 python replace out-of-memory large-files

I have a ~600 MB Roblox-type .mesh file, which reads like a text file in any text editor. I have the following code:

mesh = open("file.mesh", "r").read()
mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{")
mesh = "{"+mesh+"}"
f = open("p2t.txt", "w")
f.write(mesh)

It returns:

Traceback (most recent call last):
  File "C:\TheDirectoryToMyFile\p2t2.py", line 2, in <module>
    mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{")
MemoryError

Here is a sample of my file:

[-0.00599, 0.001466, 0.006][0.16903, 0.84515, 0.50709][0.00000, 0.00000, 0][-0.00598, 0.001472, 0.00599][0.09943, 0.79220, 0.60211][0.00000, 0.00000, 0]

What can I do?

Edit:

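I'm not sure what the head, follow, and tail commands are in the other thread this was marked as a duplicate of. I tried to use them but couldn't get them to work. The file is also one giant line; it isn't split into multiple lines.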

Pav*_*kov 5

You need to read a small piece on each iteration, process it, and then write it to another file or to sys.stdout. Try this code:

mesh = open("file.mesh", "r")
mesh_out = open("file-1.mesh", "w")

# The first character is the opening "[" of the first group.
c = mesh.read(1)

if c:
    mesh_out.write("{")
else:
    exit(0)
while True:
    c = mesh.read(1)
    if c == "":
        break

    if c == "[":
        # Every later "[" starts a new group, so it also needs a comma.
        mesh_out.write(",{")
    elif c == "]":
        mesh_out.write("}")
    else:
        mesh_out.write(c)

mesh.close()
mesh_out.close()

Update:


It runs very slowly (thanks, jamylak). So I changed it:

import sys
import re


def process_char(c, stream, is_first=False):
    if c == '':
        return False
    if c == '[':
        stream.write('{' if is_first else ',{')
        return True
    if c == ']':
        stream.write('}')
        return True


def process_file(fname):
    with open(fname, "r") as mesh:
        c = mesh.read(1)
        if c == '':
            return
        sys.stdout.write('{')

        while True:
            c = mesh.read(8192)
            if c == '':
                return

            c = re.sub(r'\[', ',{', c)
            c = re.sub(r'\]', '}', c)
            sys.stdout.write(c)


if __name__ == '__main__':
    process_file(sys.argv[1])

Now it takes about 15 seconds to process a 1.4 GB file. Run it:

$ python mesh.py file.mesh > file-1.mesh
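For comparison, here is a minimal sketch of the same chunked idea that writes straight to a file and also adds the outer "{" ... "}" wrapper from the question's code. It is only one possible variant, not a tested drop-in: the names convert_stream, convert_file, and CHUNK_SIZE are made up for illustration, and it assumes the file starts with "[" and that groups sit directly next to each other ("][") as in the sample. It uses str.translate instead of re.sub, since each replacement is keyed on a single character.

```python
import io

CHUNK_SIZE = 1 << 20  # hypothetical default: read about 1 MiB of text at a time


def convert_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst, turning "[a][b]" into "{{a},{b}}"."""
    # Every "[" becomes ",{" and every "]" becomes "}". Both rules depend on
    # one character only, so a chunk boundary can never split a match.
    table = str.maketrans({"[": ",{", "]": "}"})
    first = True
    dst.write("{")  # outer wrapper, as in the question's code
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out = chunk.translate(table)
        if first:
            # Assumption: the file starts with "[", so the first chunk
            # begins with a comma that has to be dropped.
            if out.startswith(","):
                out = out[1:]
            first = False
        dst.write(out)
    dst.write("}")


def convert_file(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Hypothetical wrapper that streams one file into another."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        convert_stream(src, dst, chunk_size)


# Tiny in-memory check with a deliberately small chunk size.
demo_out = io.StringIO()
convert_stream(io.StringIO("[1, 2][3, 4]"), demo_out, chunk_size=5)
print(demo_out.getvalue())  # prints {{1, 2},{3, 4}}
```

With the file names from the question, the call would be convert_file("file.mesh", "p2t.txt"). Memory use stays at roughly one chunk, no matter how large the input is.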