Regular expressions: using re.findall() to get all the alphabetic words

Question

I am trying to use the re.findall() function to extract all the alphabetic words from a sentence. Here is my code:

import re
s = 'Hello from the other side'
lst = re.findall('[:alpha:]', s)
print (lst)

Any suggestions on how to change the code?

Answer 1

Python's re module does not support the POSIX character class [:alpha:]. Write this instead:

re.findall(r'[A-Za-z]+', s)

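Avoid \w+, which accepts underscores and digits in addition to alphabetic characters. The only real advantage of \w+ is that it works with the re.LOCALE flag. (In Python 3, re.LOCALE applies only to bytes patterns; for str patterns, \w already matches non-ASCII letters by default.)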
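To make the difference between these patterns concrete, here is a minimal sketch that runs the original attempt, the suggested fix, and \w+ side by side. The second sample string (`mixed`) and the variable names are made up for illustration; they are not from the question.

```python
import re

s = 'Hello from the other side'

# Without POSIX class support, '[:alpha:]' is just a character set holding
# ':', 'a', 'l', 'p' and 'h', so it matches those single characters.
posix_attempt = re.findall('[:alpha:]', s)
print(posix_attempt)   # ['l', 'l', 'h', 'h']

# One or more ASCII letters in a row: whole words.
letters_only = re.findall(r'[A-Za-z]+', s)
print(letters_only)    # ['Hello', 'from', 'the', 'other', 'side']

# Hypothetical sample containing an underscore and a digit.
mixed = 'user_name has 2 cats'
strict = re.findall(r'[A-Za-z]+', mixed)
loose = re.findall(r'\w+', mixed)
print(strict)          # ['user', 'name', 'has', 'cats']
print(loose)           # ['user_name', 'has', '2', 'cats']
```

Note that the letters-only pattern splits `user_name` at the underscore and drops the digit entirely, while \w+ keeps both.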

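When I parse natural-language sentences to extract whole words, I usually extend the set of allowed characters to include hyphens and apostrophes: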

re.findall(r"[A-Za-z\-\']+", s)

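This accepts words such as "don't", "re-invent", and "dead-end", but rejects digits, underscores, spaces, double quotation marks, and other punctuation.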
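As a rough illustration of that behavior, the sketch below applies the same pattern to a sample sentence. The sentence and the variable names are assumptions chosen for the example, not part of the original answer.

```python
import re

# Same character set as above: ASCII letters, hyphen and apostrophe.
word_pattern = re.compile(r"[A-Za-z\-\']+")

# Hypothetical sample with an apostrophe, a hyphen, an underscore,
# a digit, double quotes and other punctuation.
text = 'Don\'t re-invent the wheel_2, said "Bob".'
words = word_pattern.findall(text)
print(words)   # ["Don't", 're-invent', 'the', 'wheel', 'said', 'Bob']

# Caveat: a hyphen standing alone between spaces is also matched,
# because the set accepts it as a character in its own right.
stray = word_pattern.findall('a - b')
print(stray)   # ['a', '-', 'b']
```

If stray hyphens or apostrophes matter for your input, one option is to filter the result afterwards, for example by keeping only matches that contain at least one letter.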