How can I make Python's "in" operator produce only whole-word matches, not just substring matches?

Tom*_*Tom 1 python regex substring pattern-matching string-matching

This is the desired output:

"bacillus thurungensis" in "bacillus thurungensis"
TRUE

"bacillus thurungensis" in "Sentence containing bacillus thurungensis."
TRUE

"bacillus thurungensis" in "Subspecies bacillus thurungensis34"
FALSE

"bacillus thurungensis" in "bacillus thurungensis, bacillus genus"
TRUE

"bacillus thurungensis" in "Notbacillus thurungensis, must match word"
FALSE

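Python normally treats any substring match as a hit, but that is not what I am looking for. I want a regex or matching operator that yields true if and only if the query appears in the subject as separate words, not merely as a substring. How can this be done?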

Dek*_*kel 5

You can use a regular expression instead:

import re
# re.search scans the whole string; re.match only matches at the start of it
re.search(r"\bbacillus thurungensis\b", "bacillus thurungensis")
re.search(r"\bbacillus thurungensis\b", "Sentence containing bacillus thurungensis.")
re.search(r"\bbacillus thurungensis\b", "Subspecies bacillus thurungensis34")

And so on.

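\b is a word boundary: it matches the position between a word character (letter, digit, or underscore) and a non-word character or the start or end of the string. Also note the r prefix on the string, r"...", which makes it a raw string so that \b reaches the regex engine unchanged.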
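To see why the r prefix matters, here is a small sketch. The sample text and the variable names plain and raw are made up for illustration. Without the prefix, Python turns "\b" into a single backspace character before the regex engine ever sees it:

```python
import re

# Without the r prefix, "\b" is the backspace character (one character),
# not the backslash + b pair that the regex engine reads as a word boundary.
plain = "\bword\b"
raw = r"\bword\b"

print(len("\b"), len(r"\b"))                   # 1 2
print(bool(re.search(plain, "a word here")))   # False: looks for literal backspaces
print(bool(re.search(raw, "a word here")))     # True: word boundaries on both sides
```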

If you are going to use the regex over and over, you can also use compile:

import re
# Compile once and reuse; search() scans the whole string, match() only checks the start
matcher = re.compile(r'\bbacillus thurungensis\b')
matcher.search("bacillus thurungensis")
# and so on
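Putting it together, one possible way to wrap this up as a drop-in replacement for "in" is sketched below. The helper name contains_phrase is hypothetical, and re.escape is an addition of mine so that a query containing regex metacharacters is still treated literally. It assumes the query starts and ends with a word character, since \b only makes sense next to one:

```python
import re

def contains_phrase(query, text):
    """Return True if query occurs in text bounded by word boundaries."""
    # re.escape keeps characters such as "." or "+" in the query literal
    pattern = r"\b" + re.escape(query) + r"\b"
    return re.search(pattern, text) is not None

query = "bacillus thurungensis"
print(contains_phrase(query, "bacillus thurungensis"))                       # True
print(contains_phrase(query, "Sentence containing bacillus thurungensis."))  # True
print(contains_phrase(query, "Subspecies bacillus thurungensis34"))          # False
print(contains_phrase(query, "bacillus thurungensis, bacillus genus"))       # True
print(contains_phrase(query, "Notbacillus thurungensis, must match word"))   # False
```

These are the five cases from the question. Matching is case-sensitive; passing flags=re.IGNORECASE to re.search would relax that if needed.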