How to get all words of a given length that contain no digits?

Question

Ha *_*Bom 6 python regex

I have an input string (it includes Unicode characters):

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

I want to get all words that contain no digits and are at least 2 characters long. The expected output is:

['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']

I tried

re.compile('[\w]{2,}').findall(s)

and got

['Question1', 'a12', 'is', 'the', 'number', 'of', 'b1', 'is', 'the', 'number', 'of', 'cầu', 'thủ']

Is there a way to get only the words that contain no digits?

Answer 1

Wik*_*żew 4

You can use

import re
s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
print(re.compile(r'\b[^\W\d_]{2,}\b').findall(s))
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']

Or, if you want to restrict the matches to words made up only of ASCII letters (at least 2 of them):

print(re.compile(r'\b[a-zA-Z]{2,}\b').findall(s))
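To make the difference between the two patterns concrete, here is a small side-by-side sketch on the sample string from the question. The variable names `unicode_words` and `ascii_words` are only illustrative, and it assumes Python 3, where `str` patterns are Unicode-aware by default.

```python
import re

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

# Unicode-aware variant: any letter counts, including ầ and ủ.
unicode_words = re.findall(r'\b[^\W\d_]{2,}\b', s)

# ASCII-only variant: "cầu" and "thủ" are dropped entirely, because
# ầ and ủ are still word characters, so no \b falls next to the ASCII letters.
ascii_words = re.findall(r'\b[a-zA-Z]{2,}\b', s)

print(unicode_words)
# ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']
print(ascii_words)
# ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of']
```

Note that the ASCII variant does not return partial pieces such as 'th': the word boundaries prevent a match from stopping in the middle of a word.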

See the Python demo

Details

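To match only letters, use [^\W\d_] (or [a-zA-Z] for the ASCII-only variant)
To match whole words, you need word boundaries, \b
To make sure you define word boundaries in the regex pattern rather than backspace characters, use a raw string literal, r'...'.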
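The raw-string point is easy to check directly. The following is a minimal sketch with a made-up test string (`text`, `non_raw` and `raw` are illustrative names, not part of the original answer): in a regular string literal Python turns `\b` into the backspace character before the regex engine ever sees it.

```python
import re

text = "is the number"

# Regular string: '\b' is a single backspace character (U+0008),
# so the pattern looks for literal backspaces around the letters.
non_raw = re.findall('\b[a-z]{2,}\b', text)

# Raw string: the backslash reaches the regex engine, which reads \b
# as a word boundary.
raw = re.findall(r'\b[a-z]{2,}\b', text)

print(len('\b'), len(r'\b'))  # 1 2
print(non_raw)                # []
print(raw)                    # ['is', 'the', 'number']
```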

So, r'\b[^\W\d_]{2,}\b' defines a regex that matches a word boundary, then two or more letters, and then asserts that no word character follows the last of those letters.
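If the minimum length needs to vary, the same pattern can be wrapped in a small helper. This is only a sketch: the function name `words_without_digits` and its `min_len` parameter are hypothetical and do not appear in the original answer.

```python
import re

def words_without_digits(text, min_len=2):
    """Return whole words made only of letters, at least min_len long."""
    # [^\W\d_] is "a word character that is neither a digit nor an
    # underscore", i.e. a letter. The doubled braces in the raw f-string
    # produce a literal {min_len,} quantifier.
    pattern = re.compile(rf'\b[^\W\d_]{{{min_len},}}\b')
    return pattern.findall(text)

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
print(words_without_digits(s))
# ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']
print(words_without_digits(s, min_len=3))
# ['the', 'number', 'the', 'number', 'cầu', 'thủ']
```

Words such as 'Question1' or 'a12' are skipped as a whole rather than trimmed, because the trailing \b cannot match between a letter and a digit.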

Archived:	6 years, 6 months ago
Views:	81
Last recorded:	6 years, 6 months ago