Capturing emails with regex in Python

Ewo*_*ugz 0 python regex string

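I'm going to be collecting emails scattered throughout a larger CSV file, and I'm just now learning regex. I'm trying to extract the emails from this example sentence. However, emails ends up populated with only the @ symbol and the single character immediately before it. Can you help me see what's going wrong?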

import re

String = "'Jessica's email is jessica@gmail.com, and Daniel's email is daniel123@gmail.com. Edward's is edwardfountain@gmail.com, and his grandfather, Oscar's, is odawg@gmail.com.'"

emails = re.findall(r'.[@]', String)
names = re.findall(r'[A-Z][a-z]*',String)

print(emails)
print(names)

Jea*_*bre 5

Your email regex cannot work as written: emails = re.findall(r'.[@]', String) matches any single character followed by @, and nothing more.
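To make the diagnosis concrete, here is a minimal sketch that runs the question's pattern on a shortened version of the sample sentence. The variable names text and partial are only illustrative and do not come from the original post.

```python
import re

# Shortened version of the sample sentence (illustrative only).
text = "Jessica's email is jessica@gmail.com, and Daniel's email is daniel123@gmail.com."

# '.' matches exactly one character and '[@]' matches a literal '@',
# so every match is just two characters long.
partial = re.findall(r'.[@]', text)
print(partial)  # ['a@', '3@']
```

Nothing in the pattern repeats, so it can never grow beyond the @ and the one character before it.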

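I would try a different approach: match the sentence and extract (name, email) pairs, relying on the following empirical assumptions (if your text changes too much, this logic will break):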

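  • every name is followed by 's, with " is " appearing somewhere after it (a non-greedy .*? matches whatever lies in between)
  • \w matches any alphanumeric character (or underscore), and the domain is matched with exactly one dot (otherwise the pattern would also swallow the period that ends the sentence)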
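The second assumption can be checked with a small sketch. The names text, loose_domain and single_dot are hypothetical, and the looser pattern is only there for contrast; it is not something the question or the answer uses.

```python
import re

# One fragment of the sample sentence, ending with a period.
text = "Oscar's, is odawg@gmail.com."

# Letting dots appear freely in the domain also captures the
# period that ends the sentence.
loose_domain = re.findall(r"\w+@[\w.]+", text)

# Requiring exactly one dot between two runs of \w stops before it.
single_dot = re.findall(r"\w+@\w+\.\w+", text)

print(loose_domain)  # ['odawg@gmail.com.']
print(single_dot)    # ['odawg@gmail.com']
```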

Code:

import re

String = "'Jessica's email is jessica@gmail.com, and Daniel's email is daniel123@gmail.com. Edward's is edwardfountain@gmail.com, and his grandfather, Oscar's, is odawg@gmail.com.'"

print(re.findall(r"(\w+)'s.*? is (\w+@\w+\.\w+)",String))

Result:

[('Jessica', 'jessica@gmail.com'), ('Daniel', 'daniel123@gmail.com'), ('Edward', 'edwardfountain@gmail.com'), ('Oscar', 'odawg@gmail.com')]

Converting the result to a dict even gives you a name => address dictionary:

{'Oscar': 'odawg@gmail.com', 'Jessica': 'jessica@gmail.com', 'Daniel': 'daniel123@gmail.com', 'Edward': 'edwardfountain@gmail.com'}
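The conversion itself is not shown above; a minimal sketch, assuming the list of (name, email) tuples from findall is passed straight to dict(), could look like this. The names text, pairs and addresses are illustrative.

```python
import re

text = "'Jessica's email is jessica@gmail.com, and Daniel's email is daniel123@gmail.com. Edward's is edwardfountain@gmail.com, and his grandfather, Oscar's, is odawg@gmail.com.'"

# findall returns one (name, email) tuple per match because the
# pattern has two capture groups.
pairs = re.findall(r"(\w+)'s.*? is (\w+@\w+\.\w+)", text)

# dict() accepts an iterable of 2-tuples as key/value pairs.
addresses = dict(pairs)

print(addresses["Oscar"])  # odawg@gmail.com
```

Note that if the same name appeared twice, the later address would overwrite the earlier one.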

The general case needs more characters (I'm not sure this list is exhaustive):

String = "'Jessica's email is jessica_123@gmail.com, and Daniel's email is daniel-123@gmail.com. Edward's is edward.fountain@gmail.com, and his grandfather, Oscar's, is odawg@gmail.com.'"

print(re.findall(r"(\w+)'s.*? is ([\w\-.]+@[\w\-.]+\.[\w\-]+)",String))

Result:

[('Jessica', 'jessica_123@gmail.com'), ('Daniel', 'daniel-123@gmail.com'), ('Edward', 'edward.fountain@gmail.com'), ('Oscar', 'odawg@gmail.com')]