How to capture the text of both lookarounds in a Python regex

Aer*_*rin 5 python regex regex-lookarounds

Here is a string:

str = "Academy \nADDITIONAL\nAwards and Recognition: Greek Man of the Year 2011 Stanford PanHellenic Community, American Delegate 2010 Global\nEngagement Summit, Honorary Speaker 2010 SELA Convention, Semi-Finalist 2010 Strauss Foundation Scholarship Program\nComputer Skills: Competency: MATLAB, MySQL/PHP, JavaScript, Objective-C, Git Proficiency: Adobe Creative Suite, Excel\n(highly advanced), PowerPoint, HTML5/CSS3\nLanguages: Fluent English, Advanced Spanish\n\x0c"

I want to capture everything from "ADDITIONAL" through "Languages", so I wrote this regex:

regex = r'(?<=\n(ADDITIONAL|Additional)\n)[\s\S]+?(?=\n(Languages|LANGUAGES)\n*)'

However, it only captures what lies between the two (the [\s\S]+? part). It does not capture ADDITIONAL and Languages themselves. What am I missing here?

roc*_*987 3

Your regex is

regex = r'(?<=\n(ADDITIONAL|Additional)\n)[\s\S]+?(?=\n(Languages|LANGUAGES)\n*)'

Your string is

Academy \nADDITIONAL\nAwards and Recognition: ... \nLanguages:
                     ^^                          ^^
                     ||                          ||
Match Position:-(?<=\n(ADDITIONAL|Additional)\n)(?=\n(Languages|LANGUAGES)\n*)

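Lookbehind and lookahead are zero-width assertions: they check the surrounding text but consume none of it. So [\s\S]+? contains only what lies between these two positions, excluding ADDITIONAL and LANGUAGES.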
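
The zero-width behavior can be checked directly. The sketch below uses a shortened stand-in for the string from the question (the `sample` text is abbreviated for illustration and is not the original data) and shows that the match begins after ADDITIONAL and stops before Languages.

```python
import re

# Shortened stand-in for the question's string (an assumption for brevity).
sample = (
    "Academy \nADDITIONAL\n"
    "Awards and Recognition: Greek Man of the Year 2011\n"
    "Computer Skills: MATLAB, Git\n"
    "Languages: Fluent English, Advanced Spanish\n"
)

# The regex from the question: both delimiters sit inside lookarounds.
original = re.compile(
    r'(?<=\n(ADDITIONAL|Additional)\n)[\s\S]+?(?=\n(Languages|LANGUAGES)\n*)'
)

between = original.search(sample).group(0)

# Lookarounds only assert; they consume no characters, so neither
# delimiter is part of the overall match.
print(repr(between))
```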

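You just need to match from the start position of ADDITIONAL to the end position of LANGUAGES, which means swapping the two assertions: a lookahead at the start and a lookbehind at the end. This can be done with the following regex: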

(?=\n(ADDITIONAL|Additional)\n)([\s\S]+?)(?<=\n(Languages|LANGUAGES)\b)
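
As a quick check of this pattern, here is a minimal sketch on a shortened, made-up version of the string (`sample` is an assumption for brevity). The lookahead is positioned at the newline before ADDITIONAL, so the overall match starts with that newline; calling `.strip()` is one way to drop it.

```python
import re

# Shortened stand-in for the question's string (an assumption for brevity).
sample = (
    "Academy \nADDITIONAL\n"
    "Awards and Recognition: Greek Man of the Year 2011\n"
    "Computer Skills: MATLAB, Git\n"
    "Languages: Fluent English, Advanced Spanish\n"
)

# Lookahead at the start, lookbehind at the end: both delimiters now
# fall inside the text consumed by [\s\S]+?.
swapped = re.compile(
    r'(?=\n(ADDITIONAL|Additional)\n)([\s\S]+?)(?<=\n(Languages|LANGUAGES)\b)'
)

m = swapped.search(sample)
full = m.group(0)        # whole match, leading "\n" included
trimmed = full.strip()   # same text without the surrounding whitespace

print(repr(full))
print(repr(trimmed))
```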

Further, if you only want to capture what [\s\S]+? matches, you can make the Additional and Languages groups non-capturing:

(?=\n(?:ADDITIONAL|Additional)\n)[\s\S]+?(?<=\n(?:Languages|LANGUAGES)\b)

Academy \nADDITIONAL\nAwards and Recognition: ... \nLanguages:
        ^^                                                  ^^
        ||                                                  ||
(?=\n(ADDITIONAL|Additional)\n)             (?<=\n(Languages|LANGUAGES))
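
The choice between capturing and non-capturing groups matters most with `re.findall`, which returns tuples of groups when the pattern has capturing groups and whole matches when it has none. A minimal sketch on a shortened, made-up sample (the names `sample`, `with_groups`, and `without_groups` are illustrative):

```python
import re

# Shortened stand-in for the question's string (an assumption for brevity).
sample = (
    "Academy \nADDITIONAL\n"
    "Awards and Recognition: Greek Man of the Year 2011\n"
    "Computer Skills: MATLAB, Git\n"
    "Languages: Fluent English, Advanced Spanish\n"
)

with_groups = r'(?=\n(ADDITIONAL|Additional)\n)([\s\S]+?)(?<=\n(Languages|LANGUAGES)\b)'
without_groups = r'(?=\n(?:ADDITIONAL|Additional)\n)[\s\S]+?(?<=\n(?:Languages|LANGUAGES)\b)'

# Three capturing groups: findall yields one 3-tuple per match.
grouped = re.findall(with_groups, sample)

# No capturing groups: findall yields the full matched strings.
plain = re.findall(without_groups, sample)

print(type(grouped[0]).__name__, len(grouped[0]))
print(type(plain[0]).__name__)
```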

Python code

import re

# The lookahead starts the match at the newline before ADDITIONAL;
# the lookbehind ends it right after Languages.
p = re.compile(r'(?=\n(?:ADDITIONAL|Additional)\n)[\s\S]+?(?<=\n(?:Languages|LANGUAGES)\b)', re.MULTILINE)
test_str = "Academy \nADDITIONAL\nAwards and Recognition: Greek Man of the Year 2011 Stanford PanHellenic Community, American Delegate 2010 Global\nEngagement Summit, Honorary Speaker 2010 SELA Convention, Semi-Finalist 2010 Strauss Foundation Scholarship Program\nComputer Skills: Competency: MATLAB, MySQL/PHP, JavaScript, Objective-C, Git Proficiency: Adobe Creative Suite, Excel\n(highly advanced), PowerPoint, HTML5/CSS3\nLanguages: Fluent English, Advanced Spanish\n\x0c"
# With no capturing groups, findall returns the full matched text.
print(p.findall(test_str))

IDEONE demo
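
As an alternative that is not part of the answer above: since the goal is to include both delimiters in the result, the lookarounds can be dropped and the delimiters matched directly. A minimal sketch, again on a shortened made-up sample, comparing the two approaches (the names `lookaround` and `consuming` are illustrative):

```python
import re

# Shortened stand-in for the question's string (an assumption for brevity).
sample = (
    "Academy \nADDITIONAL\n"
    "Awards and Recognition: Greek Man of the Year 2011\n"
    "Computer Skills: MATLAB, Git\n"
    "Languages: Fluent English, Advanced Spanish\n"
)

# Pattern from the answer: delimiters checked by lookarounds.
lookaround = r'(?=\n(?:ADDITIONAL|Additional)\n)[\s\S]+?(?<=\n(?:Languages|LANGUAGES)\b)'

# Alternative: delimiters consumed as ordinary parts of the match.
consuming = r'\n(?:ADDITIONAL|Additional)\n[\s\S]+?\n(?:Languages|LANGUAGES)\b'

via_lookaround = re.search(lookaround, sample).group(0)
via_consuming = re.search(consuming, sample).group(0)

print(via_lookaround == via_consuming)
```

On this sample the two patterns select the same text. The consuming form requires at least one character between the two delimiter lines, so the two are not guaranteed to agree on every input.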