Python: How to loop through blocks of lines

Adi*_*dia 6 python text-processing

How do I go through blocks of lines separated by an empty line? The file looks like this:

ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23

ID: 3
Name: S
FamilyN: Y
Age: 13

ID: 4
Name: M
FamilyN: Z
Age: 25

I want to loop through the blocks and collect the Name, FamilyN and Age fields in a list of 3 columns:

Y X 20
F H 23
Y S 13
Z M 25

unu*_*tbu 12

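Here's another way, using itertools.groupby. The function groupby iterates through the lines of the file and calls isa_group_separator(line) for each line. isa_group_separator returns either True or False (called the key), and itertools.groupby then groups all consecutive lines that yielded the same True or False result.

This is a very convenient way to collect lines into groups.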

import itertools

def isa_group_separator(line):
    # Treat whitespace-only lines as separators too, not just a bare '\n'.
    return not line.strip()

with open('data_file') as f:
    for key, group in itertools.groupby(f, isa_group_separator):
        # print(key, list(group))  # uncomment to see what itertools.groupby does.
        if not key:
            data = {}
            for item in group:
                # Split on the first colon only, so values may contain ':'.
                field, value = item.split(':', 1)
                data[field.strip()] = value.strip()
            print('{FamilyN} {Name} {Age}'.format(**data))

# Y X 20
# F H 23
# Y S 13
# Z M 25
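
To make the grouping concrete, here is a minimal sketch that runs the same idea on in-memory lines instead of a file and collects the rows into a list rather than printing them. The names SAMPLE, is_separator and parse_blocks are illustrative and not part of the answer; treating any whitespace-only line as a separator is an assumption.

```python
import itertools

# Illustrative sample data: two records separated by a blank line.
SAMPLE = """ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23
"""

def is_separator(line):
    # Assumption: any whitespace-only line separates two blocks.
    return not line.strip()

def parse_blocks(lines):
    """Return one [FamilyN, Name, Age] row per block of lines."""
    rows = []
    for key, group in itertools.groupby(lines, is_separator):
        if key:
            continue  # this group is a run of separator lines
        data = {}
        for item in group:
            field, value = item.split(':', 1)
            data[field.strip()] = value.strip()
        rows.append([data['FamilyN'], data['Name'], data['Age']])
    return rows

rows = parse_blocks(SAMPLE.splitlines(keepends=True))
print(rows)  # [['Y', 'X', '20'], ['F', 'H', '23']]
```

Returning a list keeps the parsing separate from the output format, so the caller can print, sort or filter the rows as needed.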


S.L*_*ott 5

Use a generator.

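def start_pattern(line):
    # Assumed definition (the answer leaves this function undefined):
    # a blank line marks the boundary between blocks.
    return not line.strip()

def blocks(iterable):
    accumulator = []
    for line in iterable:
        if start_pattern(line):
            # Boundary reached: emit the block collected so far, if any.
            if accumulator:
                yield accumulator
                accumulator = []
        # elif other significant patterns
        else:
            accumulator.append(line)
    # Emit the final block, which has no boundary line after it.
    if accumulator:
        yield accumulator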
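
A usage sketch for the generator approach, showing how the yielded blocks could be turned into the three columns the question asks for. Since the answer does not define start_pattern, this sketch assumes a blank line is the block boundary and inlines that test; io.StringIO stands in for an open file, and the sample text and the names table and record are illustrative.

```python
import io

def blocks(lines):
    """Yield lists of consecutive non-blank lines."""
    accumulator = []
    for line in lines:
        if not line.strip():  # assumed boundary: a blank line
            if accumulator:
                yield accumulator
                accumulator = []
        else:
            accumulator.append(line)
    # Emit the last block, which has no blank line after it.
    if accumulator:
        yield accumulator

# io.StringIO behaves like a file opened for reading in this sketch.
sample = io.StringIO(
    "ID: 3\nName: S\nFamilyN: Y\nAge: 13\n"
    "\n"
    "ID: 4\nName: M\nFamilyN: Z\nAge: 25\n"
)

table = []
for block in blocks(sample):
    # Turn the "Field: value" lines of one block into a dict.
    record = {}
    for line in block:
        field, value = line.split(':', 1)
        record[field.strip()] = value.strip()
    table.append((record['FamilyN'], record['Name'], record['Age']))

print(table)  # [('Y', 'S', '13'), ('Z', 'M', '25')]
```

Because blocks is a generator, only one block is held in memory at a time, which may matter for large files.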


Tim*_*ker 5

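import re

# Assumption: subject holds the whole file as one string; the filename is illustrative.
with open('data_file') as f:
    subject = f.read()

result = re.findall(
    r"""(?mx)           # multiline, verbose regex
    ^ID:.*\s*           # Match ID: and anything else on that line
    Name:\s*(.*)\s*     # Match name, capture all characters on this line
    FamilyN:\s*(.*)\s*  # etc. for family name
    Age:\s*(.*)$        # and age""",
    subject)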

result will then be

[('X', 'Y', '20'), ('H', 'F', '23'), ('S', 'Y', '13'), ('M', 'Z', '25')]

This can be trivially changed into any string representation you want.
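
As one possible string representation, here is a sketch that reorders each captured tuple into the "FamilyN Name Age" layout from the question. The inline subject text is illustrative sample data, and the name lines is not part of the answer.

```python
import re

# Illustrative input; in practice this would be the file contents.
subject = "ID: 1\nName: X\nFamilyN: Y\nAge: 20\n\nID: 2\nName: H\nFamilyN: F\nAge: 23\n"

pattern = re.compile(
    r"""(?mx)           # multiline, verbose regex
    ^ID:.*\s*           # ID line, ignored
    Name:\s*(.*)\s*     # capture the name
    FamilyN:\s*(.*)\s*  # capture the family name
    Age:\s*(.*)$        # capture the age""")

result = pattern.findall(subject)

# Each tuple is (name, family, age); emit family name first, as in the question.
lines = ['{} {} {}'.format(family, name, age) for name, family, age in result]
print('\n'.join(lines))
# Y X 20
# F H 23
```

Note that the regex relies on the fields appearing in a fixed order within each block; blocks with missing or reordered fields would simply not match.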