How do I extract the lines that follow a specific word?

eli*_*isa 6 python regex string findall python-3.x

I want to use a regular expression in Python 3 to get the date and specific items from a text. Here is an example:

text = '''
190219 7:05:30 line1 fail
               line1 this is the 1st fail
               line2 fail
               line2 this is the 2nd fail
               line3 success 
               line3 this is the 1st success process
               line3 this process need 3sec
200219 9:10:10 line1 fail
               line1 this is the 1st fail
               line2 success 
               line2 this is the 1st success process
               line2 this process need 4sec
               line3 success 
               line3 this is the 2st success process
               line3 this process need 2sec

'''

In the example above, I want to get all the lines that follow a "success" line. The desired output is:

[('190219','7:05:30','line3 this is the 1st success process', 'line3 this process need 3sec'),
('200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process','line3 this process need 2sec')]

Here is what I tried:

>>> import re
>>> newLine = re.sub(r'\t|\n|\r|\s{2,}',' ', text)
>>> newLine
' 190219 7:05:30 line1 fail  line1 this is the 1st fail  line2 fail  line2 this is the 2nd fail  line3 success line3 this is the 1st success process  line3 this process need 3sec 200219 9:10:10 line1 fail  line1 this is the 1st fail  line2 success line2 this is the 1st success process  line2 this process need 4sec  line3 success line3 this is the 2st success process  line3 this process need 2sec  '

I'm not sure what the right way to get this result is. So far I have tried this:

(\b\d{6}\b \d{1,}:\d{2}:\d{2})...

How can I solve this?

kos*_*oda 1

Here is a solution that uses a regular expression to get the date and time, and plain Python for everything else.
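
As a small illustration of the date-matching step on its own, the sketch below applies the same pattern used in the solution to one record line and one continuation line. The helper name `split_date` is made up for this example and is not part of the answer's code.

```python
import re

# Same pattern as in the solution: six digits, one whitespace character, then H:MM:SS
DATE_RE = re.compile(r"(\d{6})\s(\d{1,}:\d{2}:\d{2})")

def split_date(line):
    """Return (date, time) if the line starts a new record, otherwise None."""
    match = DATE_RE.match(line)
    return match.groups() if match else None

print(split_date("190219 7:05:30 line1 fail"))   # ('190219', '7:05:30')
print(split_date("line1 this is the 1st fail"))  # None
```

Because `re.match` only matches at the start of the string, an already stripped continuation line never looks like a new record.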

Prepare the input:

text = '''
190219 7:05:30 line1 fail
               line1 this is the 1st fail
               line2 fail
               line2 this is the 2nd fail
               line3 success
               line3 this is the 1st success process
               line3 this process need 3sec
200219 9:10:10 line1 fail
               line1 this is the 1st fail
               line2 success
               line2 this is the 1st success process
               line2 this process need 4sec
               line3 success
               line3 this is the 2st success process
               line3 this process need 2sec
'''

# Strip the multiline string, split into lines, then strip each line
lines = [line.strip() for line in text.strip().splitlines()]
# parse() is defined in the solution below; define it before running this line
result = parse(lines)

Solution:

import re

def parse(lines):
    result = []
    buffer = []

    success = False
    for line in lines:
        date = re.match(r"(\d{6})\s(\d{1,}:\d{2}:\d{2})", line)
        if date:
            # Store previous match and reset buffer
            if buffer:
                result.append(tuple(buffer))
                buffer.clear()
            # Split the date and time and add to buffer
            buffer.extend(date.groups())
        # A line ending in "success" or "fail" is a status line: update the flag
        if line.endswith(("success", "fail")):
            success = line.endswith("success")
        # Otherwise keep the line only while inside a succeeded process
        elif success:
            buffer.append(line)
    # Store the last record, if there is one
    if buffer:
        result.append(tuple(buffer))
    return result

Output:

result = [('190219', '7:05:30', 'line3 this is the 1st success process', 'line3 this process need 3sec'), ('200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process', 'line3 this process need 2sec')]