Bri*_*lip 11 python string file
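Using Python, I want to read into a dictionary all of the lines in a text file that come after a particular string. I want to do this across thousands of text files.
I am able to identify and print the particular string ('Abstract') using the following code (obtained from this Stack Overflow answer):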
    for files in filepath:
        with open(files, 'r') as f:
            for line in f:
                if 'Abstract' in line:
                    print(line)
But how do I tell Python to start reading only the lines that come after the string?
Pad*_*ham 19
When you reach the line you want to start from, just start another loop:
    for files in filepath:
        with open(files, 'r') as f:
            for line in f:
                if 'Abstract' in line:
                    for line in f:  # now you are at the lines you want
                        # do work
                        pass
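A file object is its own iterator, so once we reach the line containing 'Abstract', the inner loop continues from the next line and keeps going until the iterator is exhausted.
A simple example: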
    gen = (n for n in range(8))
    for x in gen:
        if x == 3:
            print("starting second loop")
            for x in gen:
                print("In second loop", x)
        else:
            print("In first loop", x)

Output:

    In first loop 0
    In first loop 1
    In first loop 2
    starting second loop
    In second loop 4
    In second loop 5
    In second loop 6
    In second loop 7
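Since the question asks for the result in a dictionary, the nested-loop idea can be wrapped in a small helper. This is only a minimal sketch, not code from the answer: the names `lines_after` and `collect`, and the choice to key the dictionary by file path, are assumptions made for illustration.

```python
def lines_after(lines, marker='Abstract'):
    """Return the lines that follow the first line containing `marker`."""
    it = iter(lines)  # works for open files, lists and generators alike
    for line in it:
        if marker in line:
            # `it` has already moved past the marker line, so the
            # comprehension below only sees the lines after it.
            return [rest.rstrip('\n') for rest in it]
    return []  # marker never found


def collect(paths, marker='Abstract'):
    """Map each path to the lines after the marker (hypothetical layout)."""
    result = {}
    for path in paths:
        with open(path, 'r') as f:
            result[path] = lines_after(f, marker)
    return result
```

Calling `collect(filepath)` with the iterable of paths from the question would then give one list of lines per file, with an empty list for files that never mention the marker.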
You can also use itertools.dropwhile to consume the lines up to the point you want.
    from itertools import dropwhile

    for files in filepath:
        with open(files, 'r') as f:
            # skip every line until one contains "Abstract"
            dropped = dropwhile(lambda _line: "Abstract" not in _line, f)
            next(dropped, "")  # discard the "Abstract" line itself
            for line in dropped:
                print(line)
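To see what `dropwhile` yields first, here is a small self-contained sketch that uses `io.StringIO` as a stand-in for an open file (the sample text is made up for illustration). It shows why the extra `next(dropped, "")` call is there: the first item `dropwhile` produces is the marker line itself.

```python
from io import StringIO
from itertools import dropwhile

# Stand-in for an open file; the contents are invented for this demo.
text = StringIO("Title\nAbstract\nfirst\nsecond\n")

dropped = dropwhile(lambda line: "Abstract" not in line, text)
marker_line = next(dropped, "")  # first yielded item is the marker line
remaining = [line.rstrip("\n") for line in dropped]

print(repr(marker_line))  # 'Abstract\n'
print(remaining)          # ['first', 'second']
```

The default `""` passed to `next` keeps the call from raising `StopIteration` when no line contains the marker; in that case `remaining` is simply empty.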
Use a boolean to ignore the lines up to that point:
    for files in filepath:
        found_abstract = False  # reset for every file
        with open(files, 'r') as f:
            for line in f:
                if 'Abstract' in line:
                    found_abstract = True
                if found_abstract:
                    # do whatever you want
                    # (note: this also runs for the 'Abstract' line itself)
                    pass
You can use itertools.dropwhile and itertools.islice here; a pseudo-example:
    from itertools import dropwhile, islice

    for fname in filepaths:
        with open(fname) as fin:
            start_at = dropwhile(lambda L: 'Abstract' not in L.split(), fin)
            for line in islice(start_at, 1, None):  # skip the line that still has Abstract in it
                print(line)
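One detail worth noticing is that this version tests `'Abstract' not in L.split()` rather than `'Abstract' not in L`. The two are not equivalent: the first looks for a whole whitespace-separated token, the second for a substring. A short sketch, with sample lines invented for illustration:

```python
plural = "Abstracts of the 2014 meeting\n"
heading = "Abstract\n"
with_colon = "Abstract: we study iterators\n"

# Substring test: matches anywhere inside the line.
substring_hits = ['Abstract' in s for s in (plural, heading, with_colon)]

# Token test: matches only when a whole whitespace-separated word
# equals 'Abstract' exactly, so 'Abstracts' and 'Abstract:' do not count.
token_hits = ['Abstract' in s.split() for s in (plural, heading, with_colon)]

print(substring_hits)  # [True, True, True]
print(token_hits)      # [False, True, False]
```

Which test is appropriate depends on how the marker appears in the actual files.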
To me, the following code is easier to understand.
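    with open(file_name, 'r') as f:
        # skip lines up to and including the one that contains 'Abstract'
        # (next(f) raises StopIteration if no line contains it)
        while 'Abstract' not in next(f):
            pass
        for line in f:
            # line is now a line after the one that contains 'Abstract'
            pass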
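With thousands of files, some may not contain the marker at all, in which case a bare `next(f)` runs off the end of the file and raises `StopIteration`. One possible way to avoid that, sketched here with a hypothetical helper named `skip_past` and `io.StringIO` objects standing in for open files:

```python
from io import StringIO


def skip_past(f, marker='Abstract'):
    """Advance iterator `f` past the first line containing `marker`.

    Returns True if the marker was found, False if `f` ran out first.
    """
    for line in f:
        if marker in line:
            return True
    return False


# Stand-ins for open files; the contents are invented for this demo.
without_marker = StringIO("Title\nnothing to see\n")
with_marker = StringIO("Title\nAbstract\nbody\n")

found_missing = skip_past(without_marker)  # False, and no exception
found_present = skip_past(with_marker)     # True
rest = list(with_marker)                   # lines after the marker

print(found_missing, found_present, rest)  # False True ['body\n']
```

The caller can then check the return value and skip or log files where the marker is absent.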