Adi*_*dia 6 python text-processing
How can I read blocks of lines that are separated by blank lines? The file looks like this:
ID: 1
Name: X
FamilyN: Y
Age: 20

ID: 2
Name: H
FamilyN: F
Age: 23

ID: 3
Name: S
FamilyN: Y
Age: 13

ID: 4
Name: M
FamilyN: Z
Age: 25
I want to loop over the blocks and collect the Name, FamilyN, and Age fields in a three-column list:
Y X 20
F H 23
Y S 13
Z M 25
unu*_*tbu 12
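Here is another approach, using itertools.groupby. groupby iterates over the lines of the file and calls isa_group_separator(line) for each line. isa_group_separator returns True or False (called the key), and itertools.groupby then groups all consecutive lines that produce the same True or False result.
This is a very convenient way to collect lines into groups.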
import itertools

def isa_group_separator(line):
    # A blank line separates one block from the next.
    return line == '\n'

with open('data_file') as f:
    for key, group in itertools.groupby(f, isa_group_separator):
        # print(key, list(group))  # uncomment to see what itertools.groupby does.
        if not key:
            data = {}
            for item in group:
                # Split on the first colon only, so a value may itself contain a colon.
                field, value = item.split(':', 1)
                value = value.strip()
                data[field] = value
            print('{FamilyN} {Name} {Age}'.format(**data))

# Y X 20
# F H 23
# Y S 13
# Z M 25
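If you want the rows in a list rather than printed, the same grouping idea can be wrapped in a function. This is only a sketch: the helper name `parse_blocks` and the in-memory `lines` sample are made up for illustration, and it treats whitespace-only lines as separators too, which the code above does not.

```python
import itertools

def parse_blocks(lines):
    """Return one [FamilyN, Name, Age] row per blank-line-separated block."""
    rows = []
    # Group consecutive lines by whether they are blank (hypothetical key function).
    for is_blank, group in itertools.groupby(lines, lambda line: line.strip() == ''):
        if is_blank:
            continue  # skip the separator groups
        data = {}
        for item in group:
            field, value = item.split(':', 1)
            data[field.strip()] = value.strip()
        rows.append([data['FamilyN'], data['Name'], data['Age']])
    return rows

# Illustrative sample standing in for the lines of the file.
lines = ['ID: 1\n', 'Name: X\n', 'FamilyN: Y\n', 'Age: 20\n', '\n',
         'ID: 2\n', 'Name: H\n', 'FamilyN: F\n', 'Age: 23\n']
print(parse_blocks(lines))
# [['Y', 'X', '20'], ['F', 'H', '23']]
```

Since an open file object is itself an iterable of lines, passing it in place of `lines` should work the same way.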
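Use a generator.

def blocks(iterable):
    # start_pattern is a predicate you supply; it returns True for a line
    # that marks the boundary between two blocks (for example, a blank line).
    accumulator = []
    for line in iterable:
        if start_pattern(line):
            if accumulator:
                yield accumulator
                accumulator = []
        # elif other significant patterns
        else:
            accumulator.append(line)
    if accumulator:
        yield accumulator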
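For the file in the question, the boundary test could simply be "is this line blank?". Here is a runnable sketch of the generator under that assumption; the `sample` list is made up, and stripping the trailing newline from each line is my own addition, not something the answer does.

```python
def blocks(lines):
    """Yield lists of non-blank lines, split wherever blank lines occur."""
    accumulator = []
    for line in lines:
        if line.strip() == '':          # assumed boundary test: a blank line
            if accumulator:
                yield accumulator
                accumulator = []
        else:
            accumulator.append(line.rstrip('\n'))
    if accumulator:                     # flush the last block
        yield accumulator

# Illustrative input with two consecutive blank lines between the blocks.
sample = ['ID: 1\n', 'Name: X\n', '\n', '\n', 'ID: 2\n', 'Name: H\n']
for block in blocks(sample):
    print(block)
# ['ID: 1', 'Name: X']
# ['ID: 2', 'Name: H']
```

Because an empty accumulator is never yielded, runs of several blank lines do not produce empty blocks.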
import re

# subject is assumed to hold the whole file contents as a single string.
result = re.findall(
    r"""(?mx)           # multiline, verbose regex
    ^ID:.*\s*           # Match ID: and anything else on that line
    Name:\s*(.*)\s*     # Match name, capture all characters on this line
    FamilyN:\s*(.*)\s*  # etc. for family name
    Age:\s*(.*)$        # and age""",
    subject)
The result will be
[('X', 'Y', '20'), ('H', 'F', '23'), ('S', 'Y', '13'), ('M', 'Z', '25')]
This can easily be converted into whatever string representation you want.
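As one possible conversion, here is a sketch that turns the tuples into the "FamilyN Name Age" lines asked for in the question. The helper name `format_rows` is made up for illustration, and `result` is copied from the output shown above so the snippet runs on its own.

```python
# Tuples in (Name, FamilyN, Age) order, as produced by the regex above.
result = [('X', 'Y', '20'), ('H', 'F', '23'), ('S', 'Y', '13'), ('M', 'Z', '25')]

def format_rows(rows):
    """Join each (name, family, age) tuple as 'family name age', one per line."""
    return '\n'.join('{} {} {}'.format(family, name, age)
                     for name, family, age in rows)

print(format_rows(result))
# Y X 20
# F H 23
# Y S 13
# Z M 25
```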