pbo*_*bou 5 python regex string
Given a dialogue string like the one below, I need to find the sentences that belong to each user.
text = '''CHRIS: Hello, how are you...
PETER: Great, you? PAM: He is resting.
[PAM SHOWS THE COUCH]
[PETER IS NODDING HIS HEAD]
CHRIS: Are you ok?'''
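For the dialogue above, I would like to return tuples with three elements:
1) the name of the person,
2) the sentence in lowercase, and
3) the sentences within brackets.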
Something like this:
('CHRIS', 'Hello, how are you...', None)
('PETER', 'Great, you?', None)
('PAM', 'He is resting', 'PAM SHOWS THE COUCH. PETER IS NODDING HIS HEAD')
('CHRIS', 'Are you ok?', None)
etc...
I am trying to use regex to achieve the above. So far I have been able to get the names of the users, but I am struggling to identify the sentences between two users.
Any help is greatly appreciated.
cs9*_*s95 23
You can do this with re.findall:
>>> re.findall(r'\b(\S+):([^:\[\]]+?)\n?(\[[^:]+?\]\n?)?(?=\b\S+:|$)', text)
[('CHRIS', ' Hello, how are you...', ''),
('PETER', ' Great, you? ', ''),
('PAM',
' He is resting.',
'[PAM SHOWS THE COUCH]\n[PETER IS NODDING HIS HEAD]\n'),
('CHRIS', ' Are you ok?', '')]
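You will have to figure out how to remove the square brackets yourself; that cannot be done with a regex while still attempting to match everything.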
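One way to do that clean-up is a second pass over the findall results. The following is a minimal sketch, not part of the original answer: the helper name parse_dialogue is hypothetical, and it assumes that multiple bracketed lines should be joined with '. ' as in the question's sample output. It strips surrounding whitespace but leaves case and trailing punctuation alone; add .lower() on the sentence if lowercase output is wanted.

```python
import re

text = '''CHRIS: Hello, how are you...
PETER: Great, you? PAM: He is resting.
[PAM SHOWS THE COUCH]
[PETER IS NODDING HIS HEAD]
CHRIS: Are you ok?'''

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r'\b(\S+):([^:\[\]]+?)\n?(\[[^:]+?\]\n?)?(?=\b\S+:|$)')

def parse_dialogue(text):
    """Return (name, sentence, bracketed_text_or_None) tuples (hypothetical helper)."""
    results = []
    for name, sentence, bracketed in PATTERN.findall(text):
        # Second pass: pull the text out of each [...] pair in the third group.
        directions = re.findall(r'\[([^\]]*)\]', bracketed)
        # An empty join yields '', which is replaced by None.
        results.append((name, sentence.strip(), '. '.join(directions) or None))
    return results

for row in parse_dialogue(text):
    print(row)
```

Splitting the work into two passes keeps the main pattern unchanged and leaves the bracket handling in a small expression that is easy to adjust.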
Regex breakdown
\b # Word boundary
(\S+) # First capture group, string of characters not having a space
: # Colon
( # Second capture group
[^ # Match anything that is not...
: # a colon
\[\] # or square braces
]+? # Non-greedy match
)
\n? # Optional newline
( # Third capture group
\[ # Literal opening brace
[^:]+? # Similar to above - exclude colon from match
\]
\n? # Optional newlines
)? # Third capture group is optional
(?= # Lookahead for...
\b # a word boundary, followed by
\S+ # one or more non-space chars, and
: # a colon
| # Or,
$ # End of string (or just before a trailing newline)
)
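The breakdown above can also be kept next to the pattern itself by compiling with re.VERBOSE. This is a sketch under a few assumptions: the name DIALOGUE is hypothetical, and the character classes are written on a single line because whitespace and # inside a character class are still taken literally in verbose mode.

```python
import re

text = '''CHRIS: Hello, how are you...
PETER: Great, you? PAM: He is resting.
[PAM SHOWS THE COUCH]
[PETER IS NODDING HIS HEAD]
CHRIS: Are you ok?'''

# Verbose form: whitespace outside character classes is ignored and
# everything after an unescaped # on a line is treated as a comment.
DIALOGUE = re.compile(r'''
    \b (\S+)              # first group: speaker, a run of non-space characters
    :                     # colon after the speaker name
    ([^:\[\]]+?)          # second group: no colons or square brackets, non-greedy
    \n?                   # optional newline
    (\[ [^:]+? \] \n?)?   # third group (optional): bracketed text
    (?= \b \S+ : | $ )    # lookahead: next speaker, or end of string
''', re.VERBOSE)

compact_matches = re.findall(
    r'\b(\S+):([^:\[\]]+?)\n?(\[[^:]+?\]\n?)?(?=\b\S+:|$)', text)
verbose_matches = DIALOGUE.findall(text)

# Both forms should describe the same pattern.
print(verbose_matches == compact_matches)
```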