Writing a huge string in Python

Question

Joã*_*ias 3 python performance file-io python-3.x

I have a very long string, almost a megabyte long, that I need to write to a text file. The usual

file = open("file.txt","w")
file.write(string)
file.close()

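works, but it is too slow. Is there any way to write it faster?

I am trying to write a number with several million digits to a text file; the number is on the order of math.factorial(67867957).

This is what the profiler shows: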

    203 function calls (198 primitive calls) in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 re.py:217(compile)
        1    0.000    0.000    0.000    0.000 re.py:273(_compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:172(_compile_charset)
        1    0.000    0.000    0.000    0.000 sre_compile.py:201(_optimize_charset)
        4    0.000    0.000    0.000    0.000 sre_compile.py:25(_identityfunction)
      3/1    0.000    0.000    0.000    0.000 sre_compile.py:33(_compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:341(_compile_info)
        2    0.000    0.000    0.000    0.000 sre_compile.py:442(isstring)
        1    0.000    0.000    0.000    0.000 sre_compile.py:445(_code)
        1    0.000    0.000    0.000    0.000 sre_compile.py:460(compile)
        5    0.000    0.000    0.000    0.000 sre_parse.py:126(__len__)
       12    0.000    0.000    0.000    0.000 sre_parse.py:130(__getitem__)
        7    0.000    0.000    0.000    0.000 sre_parse.py:138(append)
      3/1    0.000    0.000    0.000    0.000 sre_parse.py:140(getwidth)
        1    0.000    0.000    0.000    0.000 sre_parse.py:178(__init__)
       10    0.000    0.000    0.000    0.000 sre_parse.py:183(__next)
        2    0.000    0.000    0.000    0.000 sre_parse.py:202(match)
        8    0.000    0.000    0.000    0.000 sre_parse.py:208(get)
        1    0.000    0.000    0.000    0.000 sre_parse.py:351(_parse_sub)
        2    0.000    0.000    0.000    0.000 sre_parse.py:429(_parse)
        1    0.000    0.000    0.000    0.000 sre_parse.py:67(__init__)
        1    0.000    0.000    0.000    0.000 sre_parse.py:726(fix_flags)
        1    0.000    0.000    0.000    0.000 sre_parse.py:738(parse)
        3    0.000    0.000    0.000    0.000 sre_parse.py:90(__init__)
        1    0.000    0.000    0.000    0.000 {built-in method compile}
        1    0.001    0.001    0.001    0.001 {built-in method exec}
       17    0.000    0.000    0.000    0.000 {built-in method isinstance}
    39/38    0.000    0.000    0.000    0.000 {built-in method len}
        2    0.000    0.000    0.000    0.000 {built-in method max}
        8    0.000    0.000    0.000    0.000 {built-in method min}
        6    0.000    0.000    0.000    0.000 {built-in method ord}
       48    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        5    0.000    0.000    0.000    0.000 {method 'find' of 'bytearray' objects}
        1    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}

Answer 1

jfs*_*jfs 5

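Your problem is that str(long) is very slow for large integers (millions of digits) in Python. The conversion is quadratic in the number of digits, i.e., for a number with ~1e8 digits it may take ~1e16 operations to convert the integer to a decimal string.

Writing a 500 MB file should not take hours. For example: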
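
The scaling claim above can be checked on a small scale. The following is a minimal sketch, not a rigorous benchmark: the helper name `time_str_conversion` and the digit counts are illustrative choices, and the timings depend on the machine and the Python version. Note that Python 3.11+ refuses to convert integers longer than 4300 digits by default, so the sketch lifts that limit when the setting exists.

```python
import sys
import time

# Python 3.11+ caps int -> str conversion at 4300 digits by default.
# Passing 0 disables the cap; older versions do not have this function.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def time_str_conversion(num_digits):
    """Return (length of the decimal string, seconds spent in str())."""
    n = 10 ** num_digits - 1  # exactly num_digits decimal digits, all nines
    start = time.perf_counter()
    s = str(n)
    elapsed = time.perf_counter() - start
    return len(s), elapsed


if __name__ == "__main__":
    # If the conversion is quadratic, doubling the digit count should
    # roughly quadruple the time.
    for digits in (10_000, 20_000, 40_000):
        length, seconds = time_str_conversion(digits)
        print(f"{digits:>7} digits -> {seconds:.4f} s")
```

If the time grows by roughly a factor of four each time the digit count doubles, that is consistent with the quadratic behaviour described above.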

$ python3 -c 'open("file", "w").write("a"*500*1000000)'

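returns almost immediately. ls -l file confirms that the file has been created and has the expected size.

Computing math.factorial(67867957) (the result has about 500 million digits) may take several hours, but saving it with pickle is instantaneous: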
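
To see that the write itself is not the bottleneck, one can time it directly. This is a rough sketch under a few assumptions: the helper `timed_write` is a hypothetical name, the payload is 50 MB rather than 500 MB to keep the run light, and the file goes into a temporary directory.

```python
import os
import tempfile
import time


def timed_write(path, text):
    """Write text to path and return (size on disk in bytes, seconds)."""
    start = time.perf_counter()
    with open(path, "w", encoding="ascii") as f:
        f.write(text)
    elapsed = time.perf_counter() - start
    return os.path.getsize(path), elapsed


if __name__ == "__main__":
    payload = "a" * (50 * 1_000_000)  # 50 MB of ASCII text
    with tempfile.TemporaryDirectory() as tmp:
        size, seconds = timed_write(os.path.join(tmp, "file"), payload)
        print(f"wrote {size} bytes in {seconds:.3f} s")
```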

import math
import pickle

n = math.factorial(67867957) # takes a long time
with open("file.pickle", "wb") as file:
    pickle.dump(n, file) # very fast (comparatively)

Loading it back with n = pickle.load(open('file.pickle', 'rb')) takes less than a second.

str(n) is still running on my machine (after 50 hours).

To get the decimal representation quickly, you can use gmpy2:
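
The save and load steps can be combined into a small round-trip check. This is only a sketch: `pickle_roundtrip` is a hypothetical helper, and it uses a much smaller factorial so that it finishes quickly. It also closes the file explicitly with `with`, which the one-liner above leaves to the garbage collector.

```python
import math
import os
import pickle
import tempfile


def pickle_roundtrip(n, path):
    """Dump the integer n to path, then load it back and return it."""
    with open(path, "wb") as f:
        pickle.dump(n, f)
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    small = math.factorial(10_000)  # stand-in for the huge factorial
    with tempfile.TemporaryDirectory() as tmp:
        restored = pickle_roundtrip(small, os.path.join(tmp, "file.pickle"))
        print(restored == small)
```

Pickle stores the integer in a binary form, so no decimal conversion happens on either side; the slow step only appears once a decimal string is requested.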

$ python -c'import gmpy2;open("file.gmpy2", "w").write(str(gmpy2.fac(67867957)))'

It takes less than 10 minutes on my machine.
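
The one-liner can also be written as a short script. The sketch below assumes gmpy2 is installed (it is a third-party package); the helper names `factorial_decimal` and `write_factorial` are illustrative, and the fallback to math.factorial is included only so the script still runs without gmpy2, at the cost of the slow conversion discussed above.

```python
import math

try:
    import gmpy2  # third-party package, assumed to be installed
except ImportError:
    gmpy2 = None


def factorial_decimal(n):
    """Return n! as a decimal string, using gmpy2 when it is available."""
    if gmpy2 is not None:
        return str(gmpy2.fac(n))
    # Fallback: plain CPython. Slow for huge n, and on Python 3.11+ limited
    # to 4300 digits unless sys.set_int_max_str_digits is adjusted.
    return str(math.factorial(n))


def write_factorial(n, path):
    """Write the decimal digits of n! to path; return the digit count."""
    digits = factorial_decimal(n)
    with open(path, "w") as f:
        f.write(digits)
    return len(digits)


if __name__ == "__main__":
    # Small n for illustration; the answer above uses 67867957.
    print(write_factorial(1000, "file.gmpy2"))
```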
