Repeatedly extract the lines between two delimiters in a text file with Python

Ren*_*auf 11 python regex

I have a text file in the following format:

DELIMITER1
extract me
extract me
extract me
DELIMITER2

I want to extract every block of `extract me` lines between DELIMITER1 and DELIMITER2 in the .txt file.

This is my current, non-working code:

import re
def GetTheSentences(file):
     fileContents =  open(file)
     start_rx = re.compile('DELIMITER')
     end_rx = re.compile('DELIMITER2')

     line_iterator = iter(fileContents)
     start = False
     for line in line_iterator:
           if re.findall(start_rx, line):

                start = True
                break
      while start:
           next_line = next(line_iterator)
           if re.findall(end_rx, next_line):
                break

           print next_line

           continue
      line_iterator.next()

Any ideas?

Bre*_*wey 21

You can simplify this to a single regular expression by using the re.S (DOTALL) flag, which lets `.` match newlines as well.

import re
def GetTheSentences(infile):
     with open(infile) as fp:
         for result in re.findall('DELIMITER1(.*?)DELIMITER2', fp.read(), re.S):
             print(result)
# extract me
# extract me
# extract me

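This also uses the non-greedy operator .*?, so each match stops at the nearest DELIMITER2 and multiple non-overlapping DELIMITER1-DELIMITER2 blocks are found.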
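To make the difference concrete, here is a minimal sketch comparing the non-greedy and greedy forms on an in-memory string. The sample text and the variable names (`text`, `lazy`, `greedy`, `blocks`) are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical sample: two delimited blocks with an unrelated line between them.
text = (
    "DELIMITER1\nfirst block\nDELIMITER2\n"
    "ignored line\n"
    "DELIMITER1\nsecond block\nDELIMITER2\n"
)

# Non-greedy: each match stops at the nearest DELIMITER2.
lazy = re.findall(r"DELIMITER1(.*?)DELIMITER2", text, re.S)

# Greedy: a single match runs on to the last DELIMITER2 in the text.
greedy = re.findall(r"DELIMITER1(.*)DELIMITER2", text, re.S)

# Strip the newlines that surround each captured block.
blocks = [block.strip() for block in lazy]
print(blocks)
print(len(greedy))
```

With the greedy pattern, the single match also swallows the line between the two blocks, which is usually not what you want here.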

  • Tip: if your file is too large to read in one go, use this with a memory-mapped file object (via the `mmap` module). (3 upvotes)
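One way the `mmap` suggestion might look in Python 3 is sketched below. The function name `get_blocks_mmap` and the `encoding` parameter are assumptions made for this example. Note that the pattern must be a bytes pattern, because a memory-mapped file exposes bytes rather than text.

```python
import mmap
import re

# Bytes pattern: a memory-mapped file exposes bytes, not str.
BLOCK_RX = re.compile(rb"DELIMITER1(.*?)DELIMITER2", re.S)


def get_blocks_mmap(path, encoding="utf-8"):
    """Return the text between each DELIMITER1/DELIMITER2 pair in `path`.

    Hypothetical helper; the encoding is an assumption about the file.
    """
    with open(path, "rb") as fp:
        # Length 0 maps the whole file; mmap raises ValueError for an empty file.
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            matches = BLOCK_RX.findall(mm)
    # Decode after matching and drop the newlines around each block.
    return [bytes(m).decode(encoding).strip() for m in matches]
```

The operating system pages the file in on demand, so the whole file does not have to be read into a Python string first. Each matched block is still copied into memory.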

agf*_*agf 5

If the delimiters are within a single line:

def get_sentences(filename):
    with open(filename) as file_contents:
        d1, d2 = '.', ',' # just example delimiters
        for line in file_contents:
            i1, i2 = line.find(d1), line.find(d2)
            if -1 < i1 < i2:
                yield line[i1+1:i2]


sentences = list(get_sentences('path/to/my/file'))

If they are on lines of their own:

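def get_sentences(filename):
    with open(filename) as file_contents:
        d1, d2 = '.', ',' # just example delimiters
        results = None  # None means we are outside a block
        for line in file_contents:
            if d1 in line:
                results = []  # start a new block
            elif d2 in line:
                if results is not None:
                    yield results
                results = None  # stop collecting until the next d1
            elif results is not None:
                results.append(line)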

sentences = list(get_sentences('path/to/my/file'))
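For the delimiters from the question, the line-by-line approach can be sketched as follows. The generator `iter_blocks` and the `sample` data are hypothetical. It accepts any iterable of lines, so an open file object works as well as a list. It compares whole lines for equality, which avoids treating `DELIMITER2` as a start marker just because it contains the text of a shorter pattern such as `DELIMITER`.

```python
def iter_blocks(lines, start="DELIMITER1", end="DELIMITER2"):
    """Yield one list of lines for each start/end pair found in `lines`."""
    block = None  # None while we are outside a block
    for raw in lines:
        line = raw.rstrip("\n")
        if line == start:
            block = []              # open a new block
        elif line == end:
            if block is not None:
                yield block         # close and emit the current block
            block = None
        elif block is not None:
            block.append(line)      # collect a line inside the block


# Hypothetical input: two blocks with an unrelated line between them.
sample = [
    "DELIMITER1\n", "extract me\n", "extract me\n", "DELIMITER2\n",
    "skip me\n",
    "DELIMITER1\n", "extract me too\n", "DELIMITER2\n",
]
print(list(iter_blocks(sample)))
```

Because only the current block is held in memory, this also works for files too large to read in one go.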