I want to parse a file that looks like this:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
HEADER
body
body
body
FOOTER
BLABLABLABLA
BLABLABLABLA
BLABLABLABLA
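I want to extract the content that sits between HEADER and FOOTER. The number of lines between each HEADER and FOOTER can vary, and so can the content itself. I wrote the following code to extract it: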
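import re

body = ""
start_flag = False
fd = open(file, "r")
for line in fd:
    if not start_flag:
        # Skip everything until the HEADER line is found
        match = re.search(r'.*HEADER.*', line)
        if not match:
            continue
        else:
            body = body + line + "\n"
            start_flag = True
    else:
        # Collect lines until the FOOTER line is reached
        match_end = re.search(r'.*FOOTER.*', line)
        if not match_end:
            body = body + line + "\n"
            continue
        else:
            body = body + line + "\n\n"
            break
fd.close()
print(body)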
Is this the best way to extract content from a file with Python? Are there other ways to approach this problem?
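from itertools import groupby

# f is the path to the input file
with open(f, "r") as fin:
    # Group consecutive lines by whether they are a HEADER/FOOTER marker
    groups = groupby(fin, key=lambda k: k.strip() in ("HEADER", "FOOTER"))
    # Advance past the first marker group (the HEADER line)
    any(k for k, g in groups)
    # The next group holds the lines between HEADER and FOOTER
    content = list(next(groups)[1])
print(content)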
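Since the question mentions that the line count between each HEADER and FOOTER can vary, here is one possible sketch for the case where the file holds several such blocks. It rests on a few assumptions not stated in the question: the markers sit alone on their own lines (compared after stripping whitespace), the marker lines themselves are not wanted in the output, and a block with no closing FOOTER is dropped. The name `extract_blocks` and the `sample` data are made up for illustration.

```python
def extract_blocks(lines, start="HEADER", end="FOOTER"):
    """Yield one list of body lines per start/end pair, markers excluded."""
    block = None  # None means we are currently outside any block
    for line in lines:
        text = line.rstrip("\n")
        if block is None:
            # Outside a block: only a start marker changes state
            if text.strip() == start:
                block = []
        elif text.strip() == end:
            # End marker closes the current block and hands it back
            yield block
            block = None
        else:
            block.append(text)


# Hypothetical sample input with two blocks of different lengths
sample = """AAAA
HEADER
body 1
body 2
FOOTER
BLABLA
HEADER
body 3
FOOTER
""".splitlines(True)

blocks = list(extract_blocks(sample))
print(blocks)  # [['body 1', 'body 2'], ['body 3']]
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, e.g. `with open(f) as fin: blocks = list(extract_blocks(fin))`, and the file is read one line at a time rather than loaded whole.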