Why does this Python script that writes to a file suddenly stop?

Question

This small script reads a file, tries to match each line against a regular expression, and appends the matching lines to another file:

regex = re.compile(r"<http://dbtropes.org/resource/Film/.*?> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbtropes.org/resource/Main/.*?> \.")

with open("dbtropes-v2.nt", "a") as output, open("dbtropes.nt", "rb") as input:
    for line in input.readlines():
        if re.findall(regex,line):
            output.write(line)

input.close()
output.close()

However, the script suddenly stops after about 5 minutes. The terminal shows "Process stopped", and the output file remains empty.

The input file can be downloaded here: http://dbtropes.org/static/dbtropes.zip. It is a 4.3 GB N-Triples file.

Is something wrong with my code, or is it something else? Any hint would be appreciated!

Answer 1

Rob*_*obᵩ 7

It stopped because it ran out of memory. input.readlines() reads the entire file into memory before returning the list of lines.

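Instead, use input as an iterator. Iterating over the file object reads only a small buffered chunk at a time and hands back each line as soon as it is available.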
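
As a small illustration of that difference, here is a sketch on a tiny temporary file (the file name and its contents are made up for the demo). It assumes nothing beyond the standard library: readlines() builds a list holding every line, while the file object is its own iterator and hands out one line per step.

```python
import os
import tempfile

# Hypothetical three-line sample file, created only for this demo.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("one\ntwo\nthree\n")

with open(path) as f:
    all_lines = f.readlines()        # the whole file, materialized as a list

with open(path) as f:
    is_own_iterator = iter(f) is f   # a file object is its own iterator
    first = next(f)                  # pulls a single line
    rest = [line for line in f]      # continues where next() stopped
```

With a three-line file the list is harmless; with a 4.3 GB file the same list has to hold every line at once, which is where the memory goes.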

Don't do this:

for line in input.readlines():

Do this:

for line in input:

Taking everyone's suggestions into account, your program becomes:

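import re

# Bytes pattern (br"...") so it matches lines read from a file opened in binary mode.
regex = re.compile(br"<http://dbtropes.org/resource/Film/.*?> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbtropes.org/resource/Main/.*?> \.")

with open("dbtropes.nt", "rb") as input:
    # Append in binary mode so the matched bytes are written back unchanged.
    with open("dbtropes-v2.nt", "ab") as output:
        for line in input:  # iterate lazily instead of calling readlines()
            if regex.search(line):
                output.write(line)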
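
To try the streaming approach without downloading the full dump, here is a minimal sketch that wraps the same loop in a helper and runs it on a two-line sample. The helper name filter_lines, the sample file names, and the sample triples are all hypothetical and are not part of the original answer.

```python
import os
import re
import tempfile

# Same pattern as in the answer, as bytes because the files are opened in binary mode.
FILM_TYPE = re.compile(
    br"<http://dbtropes.org/resource/Film/.*?> "
    br"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    br"<http://dbtropes.org/resource/Main/.*?> \."
)


def filter_lines(src_path, dst_path, pattern):
    """Append lines of src_path that match pattern to dst_path.

    Hypothetical helper: reads one line at a time and returns
    the number of lines written.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        for line in src:  # streams; the whole file is never held in memory
            if pattern.search(line):
                dst.write(line)
                written += 1
    return written


# Demo on made-up data: one matching triple, one non-matching triple.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "sample.nt")
dst = os.path.join(workdir, "sample-v2.nt")
with open(src, "wb") as f:
    f.write(b"<http://dbtropes.org/resource/Film/Alien> "
            b"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
            b"<http://dbtropes.org/resource/Main/Film> .\n")
    f.write(b"<http://example.org/a> <http://example.org/b> <http://example.org/c> .\n")

count = filter_lines(src, dst, FILM_TYPE)
with open(dst, "rb") as f:
    kept = f.read()
```

Returning the count is only a convenience for checking the result; memory use stays roughly constant regardless of input size because each line is discarded after it is tested.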
