Changing a Python file in place

Mau*_*lin 8 python file

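I have a large XML file (40 GB) that I need to split into smaller chunks. I am working with limited disk space, so is there a way to delete lines from the original file as I write them to new files?

Thanks!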

Tor*_*rek 7

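Say you want to split the file into N pieces. Then simply start reading from the back of the file (more or less) and repeatedly call truncate:

Truncate the file's size. If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position. The current file position is not changed. ...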
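To make the quoted behaviour concrete, here is a minimal sketch on a throwaway temporary file. The helper name `truncate_demo` is made up for this illustration; it assumes Python 3 and a file opened in binary mode.

```python
import os
import tempfile


def truncate_demo():
    """Return (size after truncate(4), position afterwards, size after truncate())."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "r+b") as f:
            f.write(b"0123456789")   # the file now holds 10 bytes
            f.seek(2)                # move the position to offset 2
            f.truncate(4)            # explicit size: keep only the first 4 bytes
            size_explicit = os.path.getsize(path)
            pos = f.tell()           # the position is not moved by truncate
            f.truncate()             # no argument: cut at the current position
            size_default = os.path.getsize(path)
        return size_explicit, pos, size_default
    finally:
        os.remove(path)
```

The property that matters for splitting is that truncating only drops the tail of the file, so the data has to be peeled off from the end.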

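import os

BUF_SIZE = 4096
N = 10  # number of chunks; placeholder value, choose your own
size = os.path.getsize("large_file")
chunk_size = max(1, size // N)
# or simply set a fixed chunk size based on your free disk space
c = 0

# Binary mode: in Python 3, text-mode files cannot seek relative to the end.
in_ = open("large_file", "r+b")

while size > 0:
    in_.seek(-min(size, chunk_size), os.SEEK_END)
    # now you have to find a safe place to split the file at somehow
    # just read forward until you found one
    ...
    old_pos = in_.tell()
    # Chunks are written back to front: chunk 00 holds the end of the file.
    with open("small_chunk%02d" % (c, ), "wb") as out:
        b = in_.read(BUF_SIZE)
        while len(b) > 0:
            out.write(b)
            b = in_.read(BUF_SIZE)
    # Drop the part that was just copied from the original file.
    in_.truncate(old_pos)
    size = old_pos
    c += 1

in_.close()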

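Be careful, as I haven't tested any of this. It might be necessary to call flush after the truncate call, and I don't know how quickly the file system actually frees up the space.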
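For reference, here is one way the whole idea could look as a self-contained function. This is only a sketch under assumptions that are not part of the answer: the function name `split_in_place` and its parameters are made up, a newline is treated as a "safe place" to split (which does not by itself give well-formed XML fragments), and `flush` plus `os.fsync` are called after each truncate as a precaution. How quickly space is really reclaimed still depends on the file system.

```python
import os

BUF_SIZE = 64 * 1024


def split_in_place(path, chunk_size, prefix="small_chunk"):
    """Move the contents of `path` into chunk files, shrinking `path` as it goes.

    Chunks are cut from the end of the file, so the first file written holds
    the tail. The returned list is ordered from the start of the original file.
    """
    names = []
    with open(path, "r+b") as src:
        size = src.seek(0, os.SEEK_END)
        while size > 0:
            start = max(0, size - chunk_size)
            if start > 0:
                # Assumption: a line boundary is a safe split point.
                src.seek(start)
                src.readline()          # skip the rest of a partial line
                if src.tell() < size:
                    start = src.tell()
                # otherwise no newline was found; fall back to a raw byte split
            src.seek(start)
            name = "%s%04d" % (prefix, len(names))
            with open(name, "wb") as out:
                remaining = size - start
                while remaining > 0:
                    block = src.read(min(BUF_SIZE, remaining))
                    if not block:
                        break
                    out.write(block)
                    remaining -= len(block)
                out.flush()
                os.fsync(out.fileno())  # make sure the copy is on disk first
            src.truncate(start)         # only now drop the copied tail
            src.flush()
            os.fsync(src.fileno())
            size = start
            names.append(name)
    names.reverse()
    return names
```

Calling `split_in_place("large_file", 512 * 1024 * 1024)` would aim for chunks of roughly 512 MB, and concatenating the returned files in order should reproduce the original bytes. Try it on a copy of a small file before pointing it at the real data, since the original is emptied as it runs.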

  • Good luck :) (2 upvotes)