Python regex: how to select the lines between two patterns

Question


sci*_*ci9 4 python regex nlp dataframe pandas

Consider typical live chat data like the following:

Peter (08:16): 
Hi 
What's up? 
;-D

Anji Juo (09:13): 
Hey, I'm using WhatsApp!

Peter (11:17):
Could you please tell me where is the feedback?

Anji Juo (19:13): 
I don't know where it is. 

Anji Juo (19:14): 
Do you by any chance know where I can catch a taxi ?

To convert this raw text file into a DataFrame, I need to write regular expressions that identify the column values and then extract them.

See https://regex101.com/r/X3ubqF/1. The target DataFrame looks like this:

Index(time)     Name        Message
08:16           Peter       Hi 
                            What's up? 
                            ;-D
09:13           Anji Juo    Hey, I'm using WhatsApp!
11:17           Peter       Could you please tell me where is the feedback?
19:13           Anji Juo    I don't know where it is. 
19:14           Anji Juo    Do you by any chance know where I can catch a taxi ?

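The regex r"(?P<Name>.*?)\s*\((?P<Index>(?:\d|[01]\d|2[0-3]):[0-5]\d)\)" extracts the time and name values perfectly, but I don't know how to match and extract, for each time index, the message sent by that particular sender (i.e. the lines between one header and the next).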
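
Since the header regex already finds where each message starts, one way to "select the lines between two patterns" is to take the text between the end of one header match and the start of the next. The sketch below is one possible approach, not the accepted answer's method. It assumes every header line ends in "):", so it adds a "^" anchor with re.M and consumes the trailing ":"; the names HEADER and split_chat are hypothetical, and trailing spaces on each message line are stripped.

```python
import re

# Header pattern from the question, adapted (assumption): "^" with re.M anchors it
# to the start of a line, and the trailing "):" is consumed so the message body
# starts right after the header.
HEADER = re.compile(
    r"^(?P<Name>.*?)\s*\((?P<Index>(?:\d|[01]\d|2[0-3]):[0-5]\d)\):",
    flags=re.M,
)


def split_chat(text):
    """Hypothetical helper: return (time, name, message) tuples, where each
    message is the text between one header match and the next."""
    headers = list(HEADER.finditer(text))
    rows = []
    for i, m in enumerate(headers):
        # The message ends where the next header starts, or at the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[m.end():end]
        # Drop the surrounding blank lines and each line's trailing spaces.
        lines = [line.rstrip() for line in body.strip().splitlines()]
        rows.append((m.group("Index"), m.group("Name"), "\n".join(lines)))
    return rows


chat = (
    "Peter (08:16): \n"
    "Hi \n"
    "What's up? \n"
    ";-D\n"
    "\n"
    "Anji Juo (09:13): \n"
    "Hey, I'm using WhatsApp!\n"
)

for row in split_chat(chat):
    print(row)
```

With pandas installed, pd.DataFrame(split_chat(chat), columns=["Index(time)", "Name", "Message"]) should give the layout shown above. Slicing between header positions means the message text itself never has to be described by the regex.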

Answer 1

And*_*ely 5

You can use the re module to parse the string (regex101):

import re

import pandas as pd

s = """
Peter (08:16): 
Hi 
What's up? 
;-D

Anji Juo (09:13): 
Hey, I'm using WhatsApp!

Peter (11:17):
Could you please tell me where is the feedback?

Anji Juo (19:13): 
I don't know where it is. 

Anji Juo (19:14): 
Do you by any chance know where I can catch a taxi ?

"""


all_data = []
# re.findall returns each match's groups in pattern order: (name, time, message)
for name, time, message in re.findall(
    r"^\s*(.*?)\s+\(([^)]+)\):\s*(.*?)(?:\n\n|\Z)", s, flags=re.M | re.S
):
    # Reorder so every value lands under the matching column label
    all_data.append((time, name, message))

df = pd.DataFrame(all_data, columns=["Index(time)", "Name", "Message"])
print(df)

Prints:

  Index(time)      Name                                                      Message
0       08:16     Peter                                        Hi \nWhat's up? \n;-D
1       09:13  Anji Juo                                     Hey, I'm using WhatsApp!
2       11:17     Peter              Could you please tell me where is the feedback?
3       19:13  Anji Juo                                   I don't know where it is. 
4       19:14  Anji Juo  Do you by any chance know where I can catch a taxi ?\n\n
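
A variant of the same single-pattern idea, sketched with named groups (the names MESSAGE and parse_chat are hypothetical, and the time group is called Index because group names cannot contain parentheses). Match.groupdict() returns each match keyed by group name, so a value cannot end up under the wrong column. The lookahead (?=\n\n|\Z) ends the message at the first blank line or at the end of the string without consuming it, and the \s* before it keeps trailing whitespace out of the Message group.

```python
import re

# Adapted from the pattern above (assumption: messages are separated by a blank line).
# Name is restricted to one line; Message may span lines because of re.S.
MESSAGE = re.compile(
    r"^\s*(?P<Name>[^\n]+?)\s+\((?P<Index>[^)]+)\):\s*(?P<Message>.*?)\s*(?=\n\n|\Z)",
    flags=re.M | re.S,
)


def parse_chat(text):
    """Hypothetical helper: one dict per message, keyed by group name."""
    return [m.groupdict() for m in MESSAGE.finditer(text)]


chat = (
    "Peter (08:16): \n"
    "Hi \n"
    "What's up? \n"
    ";-D\n"
    "\n"
    "Anji Juo (19:13): \n"
    "I don't know where it is. \n"
)

for record in parse_chat(chat):
    print(record)
```

A list of dicts like this can be passed straight to pd.DataFrame, and .set_index("Index") should then index the frame by time.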

Archived:	4 years, 9 months ago
Views:	966
Last activity:	4 years, 9 months ago