import re

text = "This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE."
pattern = r'[A-Z]+[A-Z]+[A-Z]*[\s]+'
re.findall(pattern, text) gives this output -->
['TEXT ', 'CONTAINING ', 'UPPER ', 'CASE ', 'WORDS ', 'SECOND ', 'SENTENCE ']
However, I want output like this -->
['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
anu*_*ava 13
You can use this regex:
\b[A-Z]+(?:\s+[A-Z]+)*\b
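Regex details:
\b: word boundary
[A-Z]+: matches a word made up only of uppercase letters
(?:\s+[A-Z]+)*: matches one or more whitespace characters followed by another all-uppercase word; this group is matched 0 or more times
\b: word boundary

Code: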
>>> import re
>>> s = 'This is a TEXT CONTAINING UPPER CASE WORDS and lower case words. This is a SECOND SENTENCE'
>>> print(re.findall(r'\b[A-Z]+(?:\s+[A-Z]+)*\b', s))
['TEXT CONTAINING UPPER CASE WORDS', 'SECOND SENTENCE']
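One thing to watch for: [A-Z]+ also accepts single-letter words such as "I" or "A", whereas the pattern in the question required at least two uppercase letters per word. If that restriction matters, a minimal sketch along these lines might help. The helper name find_upper_runs and its min_len parameter are made up for illustration and are not part of the answer above; it also assumes ASCII letters only.

```python
import re

def find_upper_runs(text, min_len=1):
    """Return runs of consecutive all-uppercase words (hypothetical helper).

    min_len is an illustrative parameter: the minimum number of
    uppercase ASCII letters each word in a run must have.
    """
    # Same shape as \b[A-Z]+(?:\s+[A-Z]+)*\b, with {min_len,} instead of +
    word = r'[A-Z]{%d,}' % min_len
    pattern = r'\b' + word + r'(?:\s+' + word + r')*\b'
    return re.findall(pattern, text)

s = 'I saw a TEXT CONTAINING UPPER CASE WORDS. This is a SECOND SENTENCE'
print(find_upper_runs(s))             # the lone "I" is reported as a match
print(find_upper_runs(s, min_len=2))  # single-letter words are skipped
```

With min_len=1 this behaves like the regex in the answer; with min_len=2 a capitalized word like "This" is still rejected because the trailing \b cannot match between "T" and "h".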