Split a text string by keyword in Python

cur*_*smo 2 python string split list

I have a string of text like this:

'tx cycle up.... down
rx cycle up.... down
phase:...
rx on scan: 123456
tx cycle up.... down
rx cycle up.... down
phase:...
rx on scan: 789012
setup
tx cycle up.... down
rx cycle up.... down
tx cycle up.... down
rx cycle up.... down'

I need to split this string into a list of strings, broken into these chunks:

['tx cycle up.... down rx cycle up.... down phase:.... rx on scan: 123456', 
 'tx cycle up.... down rx cycle up.... down phase:.... rx on scan: 789012',
 'tx cycle up... down rx cycle up.... down',
 'tx cycle up... down rx cycle up.... down']

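Sometimes the chunks have a "phase" line and a "scan" number, and sometimes they don't. I need the solution to be general enough to handle either case, and it has to run on a large amount of data.

Basically, I want to split the text into a list of strings where each element runs from one occurrence of 'tx' up to the next 'tx' (including the first 'tx' but not the next one). How can I do that?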

Edit: Suppose that, in addition to the text string above, I also have other text strings that look like this:

'closeloop start
closeloop ..up:677 down:098
closeloop start
closeloop ..up:568 down:123'

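My code loops over each text string and uses the splitting code to break it into a list. When it reaches this text string, though, it finds nothing to split on. How can I make it split on the 'closeloop start' lines when those appear, and on the tx lines as before when those appear? I tried this code, but I got a TypeError: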

data = re.split(r'\n((?=tx)|(?=closeloop\sstart))', data)

Mar*_*ers 8

You can split on the newlines that are followed by tx:

import re

re.split(r'\n(?=tx)', inputtext)

Demo:

>>> import re
>>> inputtext = '''tx cycle up.... down
... rx cycle up.... down
... phase:...
... rx on scan: 123456
... tx cycle up.... down
... rx cycle up.... down
... phase:...
... rx on scan: 789012
... setup
... tx cycle up.... down
... rx cycle up.... down
... tx cycle up.... down
... rx cycle up.... down'''
>>> re.split(r'\n(?=tx)', inputtext)
['tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 123456', 'tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 789012\nsetup', 'tx cycle up.... down\nrx cycle up.... down', 'tx cycle up.... down\nrx cycle up.... down']
>>> from pprint import pprint
>>> pprint(_)
['tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 123456',
 'tx cycle up.... down\nrx cycle up.... down\nphase:...\nrx on scan: 789012\nsetup',
 'tx cycle up.... down\nrx cycle up.... down',
 'tx cycle up.... down\nrx cycle up.... down']
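The chunks in the demo still contain newlines, while the desired output in the question shows each chunk on one line. A minimal sketch of one way to flatten them, assuming the lines of a chunk should simply be joined with single spaces. The helper name split_chunks and the shortened sample text are made up for illustration, and lines such as 'setup' are kept because the question does not say how to treat them:

```python
import re

def split_chunks(text, keyword='tx'):
    """Split text before every line that starts with keyword,
    then join each chunk's lines with single spaces."""
    # re.escape guards against keywords containing regex metacharacters
    pattern = r'\n(?=' + re.escape(keyword) + ')'
    return [' '.join(chunk.splitlines()) for chunk in re.split(pattern, text)]

# Shortened, hypothetical sample in the same shape as the question's data
sample = 'tx cycle up\nrx cycle up\nphase: 1\ntx cycle up\nrx cycle up'
chunks = split_chunks(sample)
print(chunks)
```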

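However, if you are simply looping over an input file object (reading it line by line), you can process each block as you collect its lines: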

section = []
for line in open_file_object:
    if line.startswith('tx'):
        # new section
        if section:
            process_section(section)
        section = [line]
    else:
        section.append(line)
if section:
    process_section(section)
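The same loop can also be packaged as a generator, so the caller decides what to do with each section. This is only a sketch: iter_sections and its starts parameter are hypothetical names, and io.StringIO stands in for a real open file:

```python
import io

def iter_sections(lines, starts=('tx',)):
    """Yield lists of lines, beginning a new list at every line
    that starts with one of the prefixes in starts."""
    section = []
    for line in lines:
        # str.startswith accepts a tuple of prefixes
        if line.startswith(starts) and section:
            yield section
            section = []
        section.append(line)
    if section:
        yield section

# io.StringIO behaves like a file opened in text mode
fake_file = io.StringIO('tx a\nrx b\ntx c\nrx d\n')
sections = list(iter_sections(fake_file))
print(sections)
```

Because sections are yielded one at a time, only the current section is held in memory, which may matter for large inputs.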

If you need to match more than one kind of starting line, put each one inside the lookahead as an alternative, separated by |:

data = re.split(r'\n(?=tx|closeloop\sstart)', data)
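As a quick check, the combined pattern can be run against shortened versions of both kinds of input from the question. The variable names below are illustrative only. The last line shows what the capturing-group version from the question returns: re.split adds the text of every capturing group to the result, so empty strings appear between the chunks. Whether that is what led to the TypeError depends on code the question does not show.

```python
import re

pattern = r'\n(?=tx|closeloop\sstart)'

tx_text = 'tx cycle up\nrx cycle up\ntx cycle up\nrx cycle up'
loop_text = ('closeloop start\ncloseloop ..up:677 down:098\n'
             'closeloop start\ncloseloop ..up:568 down:123')

tx_parts = re.split(pattern, tx_text)
loop_parts = re.split(pattern, loop_text)
print(tx_parts)
print(loop_parts)

# The question's pattern wraps the lookaheads in a capturing group,
# so the (empty) captured text is inserted between the chunks.
with_group = re.split(r'\n((?=tx)|(?=closeloop\sstart))', tx_text)
print(with_group)
```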

  • @RickTeachey: No, that is not a non-capturing group. It is a lookahead assertion. Lookarounds are never part of the match; they only anchor it. (4 upvotes)
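The difference described in this comment can be seen on a small made-up string: a non-capturing group (?:tx) consumes the tx as part of the separator, while the lookahead (?=tx) only checks that it is there.

```python
import re

text = 'tx one\nrx two\ntx three'

# Lookahead: only the newline is matched, so 'tx' stays in the next chunk
lookahead = re.split(r'\n(?=tx)', text)

# Non-capturing group: 'tx' is part of the match and is removed with it
non_capturing = re.split(r'\n(?:tx)', text)

print(lookahead)
print(non_capturing)
```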