Finding several different strings in BeautifulSoup and returning the containing tag

Lau*_*Mat 2 python beautifulsoup

Say I have the following HTML:

<p>
If everybody minded their own business, the world would go around a great deal faster than it does.
</p>

<p>
Who in the world am I? Ah, that's the great puzzle.
</p>

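I want to find every tag whose text contains all of the keywords I am looking for. For example, this is the behavior I would like (examples 2 and 3 don't work as written):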

>>> len(soup.find_all(text="world"))
2

>>> len(soup.find_all(text="world puzzle"))
1

>>> len(soup.find_all(text="world puzzle book"))
0

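I have been trying to come up with a regular expression that lets me search for all of the keywords, but ANDing them seems to be impossible (only ORing works).

Thanks in advance!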

Leo*_*son 5

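The easiest way to do a complex match like this is to write a function that performs the match and pass that function as the value of the text argument.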

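def must_contain_all(*strings):
    # Build a predicate that accepts a string only if it contains every keyword.
    def must_contain(markup):
        # Guard against None, then require each keyword as a substring.
        return markup is not None and all(s in markup for s in strings)
    return must_contain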
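For reference, here is a minimal self-contained sketch of the setup this answer assumes. The question never shows how `soup` is built, so the `html` variable and the choice of the built-in `html.parser` are assumptions. The sketch also uses `string=`, the newer name for the `text` argument in Beautiful Soup 4.4.0 and later. It reproduces the three counts the question asks for.

```python
from bs4 import BeautifulSoup

# The HTML from the question (assumed to be parsed with the built-in parser).
html = """
<p>
If everybody minded their own business, the world would go around a great deal faster than it does.
</p>

<p>
Who in the world am I? Ah, that's the great puzzle.
</p>
"""

soup = BeautifulSoup(html, "html.parser")


def must_contain_all(*strings):
    # Build a predicate that accepts a string only if it contains every keyword.
    def must_contain(markup):
        return markup is not None and all(s in markup for s in strings)
    return must_contain


# Count the text nodes that contain every keyword in each set.
counts = [
    len(soup.find_all(string=must_contain_all("world"))),
    len(soup.find_all(string=must_contain_all("world", "puzzle"))),
    len(soup.find_all(string=must_contain_all("world", "puzzle", "book"))),
]
print(counts)
```

Note that the match is a case-sensitive substring test, so "world" would also match inside a longer word such as "worldwide".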

Now you can get the matching strings:

print(soup.find_all(text=must_contain_all("world", "puzzle")))
# ["\nWho in the world am I? Ah, that's the great puzzle.\n"]

To get the tags that contain the strings, use the .parent attribute:

print([text.parent for text in soup.find_all(text=must_contain_all("world", "puzzle"))])
# [<p>Who in the world am I? Ah, that's the great puzzle.</p>]  (newlines inside the tag omitted)
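As a side note on the regular-expression attempt in the question: ANDing keywords is possible with lookahead assertions, one per keyword. The sketch below is an alternative to the function approach above, not part of it. The helper name `all_keywords_pattern` is hypothetical, and the sample strings are just the two paragraph texts from the question.

```python
import re


def all_keywords_pattern(*keywords):
    # One lookahead per keyword: each must match somewhere in the string.
    # re.DOTALL lets "." cross the newlines that surround the paragraph text.
    lookaheads = "".join("(?=.*{})".format(re.escape(k)) for k in keywords)
    return re.compile(lookaheads, re.DOTALL)


pattern = all_keywords_pattern("world", "puzzle")

# The two text nodes from the question's HTML, as plain strings.
texts = [
    "\nIf everybody minded their own business, the world would go around "
    "a great deal faster than it does.\n",
    "\nWho in the world am I? Ah, that's the great puzzle.\n",
]

# Keep only the strings in which every keyword occurs.
matches = [t for t in texts if pattern.search(t)]
print(matches)
```

Because Beautiful Soup applies a compiled pattern with its search() method, the same pattern should also work as `soup.find_all(text=pattern)`, with `.parent` used as above to reach the enclosing tags.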