Opening and closing files in a loop

Question

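Suppose I have a list with tens of thousands of entries that I want to write to files. When an item in the list meets some criterion, I want to close the current file and start a new one.

I have a few problems, which I think stem from the fact that I want to name each file after the first entry in it. In addition, the signal to start a new file is whether an entry has the same field as the previous one. So, for example, suppose I have the following list: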

l = [('name1', 10), ('name1', 30), ('name2', 5), ('name2', 7), ('name2', 3), ('name3', 10)]

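I want to end up with 3 files: name1.txt should contain 10 and 30, name2.txt should contain 5, 7 and 3, and name3.txt should contain 10. The list is already sorted by the first element, so all I need to do is check whether the first element is the same as the previous one and, if not, start a new file.

At first I tried: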

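name = None
for entry in l:
    if entry[0] != name:
        # First pass: out_file does not exist yet, so this raises NameError
        out_file.close()
        name = entry[0]
        # "w" is needed here; the default mode is read-only
        out_file = open("{}.txt".format(name), "w")
        out_file.write("{}\n".format(entry[1]))
    else:
        out_file.write("{}\n".format(entry[1]))

out_file.close()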

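As far as I can tell, this has a couple of problems. First, on the first pass through the loop there is no out_file to close. Second, I can't close the last out_file that was created, because it is defined inside the loop. The following solves the first problem, but seems clunky: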

name = None
for entry in l:
    if name:
        if entry[0] != name:
            # A new name starts: close the previous file, open the next one
            out_file.close()
            name = entry[0]
            out_file = open("{}.txt".format(name), "w")
            out_file.write("{}\n".format(entry[1]))
        else:
            out_file.write("{}\n".format(entry[1]))
    else:
        # Very first entry: nothing to close yet
        name = entry[0]
        out_file = open("{}.txt".format(name), "w")
        out_file.write("{}\n".format(entry[1]))

out_file.close()

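Is there a better way?

Also, it seems like this shouldn't solve the problem of closing the last file, yet this code runs fine. Am I misunderstanding the scope of out_file? I thought it would be restricted to the inside of the for loop.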
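On the scope point: a Python for loop does not introduce a scope of its own, so a name bound inside the loop body belongs to the enclosing function or module and is still visible after the loop ends. The small sketch below illustrates this; the function name last_bound_value is made up for the example. It also shows the one catch, which is that the name is never bound if the loop body never runs.

```python
def last_bound_value(entries):
    """Return a name that is bound only inside the loop body."""
    for entry in entries:
        # Bound in the function's scope; there is no separate "loop scope"
        current = entry[0]
    # Still visible here, unless the loop never ran
    return current


print(last_bound_value([("name1", 10), ("name2", 5)]))

try:
    last_bound_value([])  # empty input: 'current' was never assigned
except UnboundLocalError as exc:
    print("empty input:", exc)
```

This is why the final out_file.close() in the question works as long as the list has at least one entry.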

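Edit: I should probably have mentioned that my data is much more complicated than what is shown here... it is not actually in a list, it is SeqRecord objects from BioPython.

Edit 2: OK, I thought I was simplifying to avoid distraction. Apparently it had the opposite effect, my bad. Below is the equivalent of the second code block above: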

from re import sub
from Bio import SeqIO

def gbk_to_faa(some_genbank):
    source = None
    for record in SeqIO.parse(some_genbank, 'gb'):
        if source:
            if record.annotations['source'] != source:
                out_file.close()
                source = sub(r'\W+', "_", sub(r'\W$', "", record.annotations['source']))
                out_file = open("{}.faa".format(source), "a+")
                write_all_record(out_file, record)
            else:
                write_all_record(out_file, record)
        else:
            source = sub(r'\W+', "_", sub(r'\W$', "", record.annotations['source']))
            out_file = open("{}.faa".format(source), "a+")
            write_all_record(out_file, record)

    out_file.close()


def write_all_record(file_handle, gbk_record):
    # Does more stuff, I don't think this is important
    # If it is, it's in this gist: https://gist.github.com/kescobo/49ab9f4b08d8a2691a40
    pass  # body omitted in the question

Answer 1

Hug*_*ell 5

This is easier with the tools Python already provides:

from itertools import groupby
from operator import itemgetter

items = [
    ('name1', 10), ('name1', 30),
    ('name2', 5), ('name2', 7), ('name2', 3),
    ('name3', 10)
]

for name, rows in groupby(items, itemgetter(0)):
    with open(name + ".txt", "w") as outf:
        outf.write("\n".join(str(row[1]) for row in rows))
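One thing to keep in mind is that itertools.groupby only merges adjacent items with equal keys, so this relies on the list already being sorted by name, as the question states. Below is a minimal self-contained sketch of the same idea that writes into a temporary directory so it can be run safely. The helper name write_groups and its out_dir parameter are made up for illustration, and unlike the join version it ends every line with a newline, as the question's write calls do.

```python
import tempfile
from itertools import groupby
from operator import itemgetter
from pathlib import Path


def write_groups(items, out_dir):
    """Write one <name>.txt per run of consecutive items sharing a name."""
    written = []
    for name, rows in groupby(items, key=itemgetter(0)):
        path = Path(out_dir) / f"{name}.txt"
        # "with" closes the file when the block exits, even on an error
        with open(path, "w") as outf:
            for _, value in rows:
                outf.write(f"{value}\n")
        written.append(path)
    return written


items = [
    ("name1", 10), ("name1", 30),
    ("name2", 5), ("name2", 7), ("name2", 3),
    ("name3", 10),
]

with tempfile.TemporaryDirectory() as tmp:
    for path in write_groups(items, tmp):
        print(path.name, path.read_text().split())
```

Because each file is opened in its own with block, there is no "previous file" to close on the first pass and no dangling handle after the last one.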

Edit: to match the updated question, here is the updated solution ;-)

for name, records in groupby(SeqIO.parse(some_genbank, 'gb'), lambda record:record.annotations['source']):
    with open(name + ".faa", "w+") as outf:
        for record in records:
            write_all_record(outf, record)
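The question also cleans the source string before using it as a file name, which the snippet above leaves out. One possible way to keep that step is to group on the cleaned name directly. The sketch below assumes that approach; clean_source is a hypothetical helper, and the SimpleNamespace objects (with invented source strings) are stand-ins that mimic only the .annotations attribute of a SeqRecord, so the example runs without BioPython.

```python
from itertools import groupby
from re import sub
from types import SimpleNamespace


def clean_source(record):
    """Turn a record's source annotation into a filename-safe key."""
    # Same two substitutions as in the question: drop one trailing
    # non-word character, then replace remaining runs with "_".
    return sub(r"\W+", "_", sub(r"\W$", "", record.annotations["source"]))


# Hypothetical stand-ins for SeqRecord objects (illustration only)
records = [
    SimpleNamespace(annotations={"source": "Escherichia coli."}),
    SimpleNamespace(annotations={"source": "Escherichia coli."}),
    SimpleNamespace(annotations={"source": "Bacillus subtilis (strain 168)"}),
]

for name, group in groupby(records, key=clean_source):
    # In real use, open(name + ".faa", "w") here and write each record
    print(f"{name}.faa", len(list(group)))
```

Grouping on the cleaned key also avoids comparing a raw annotation against an already-cleaned name, which is what the loop in the question does and which would never match whenever the annotation contains non-word characters such as spaces.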
