Python BeautifulSoup find_all with an exclusion

blo*_*tdj 5 python beautifulsoup html-parsing python-3.x

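I'm struggling to find a simple solution to this problem and hope you can help.

I have been using BeautifulSoup's find_all and trying some regular expressions to find every item except the "emptyItem" line in the following HTML: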

<div class="product_item0 ">...</div>
<div class="product_item1 ">...</div>
<div class="product_item2 ">...</div>
<div class="product_item0 ">...</div>
<div class="product_item1 ">...</div>
<div class="product_item2 ">...</div>
<div class="product_item0 ">...</div>
<div class="product_item1 last">...</div>
<div class="product_item2 emptyItem">...</div>

Is there a simple way to find all of the items except the one with "emptyItem"?

ale*_*cxe 6

Just skip the elements that have the emptyItem class. Working sample:

from bs4 import BeautifulSoup

data = """
<div>
    <div class="product_item0">test0</div>
    <div class="product_item1">test1</div>
    <div class="product_item2">test2</div>
    <div class="product_item2 emptyItem">empty</div>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

for elm in soup.select("div[class^=product_item]"):
    if "emptyItem" in elm["class"]:  # skip elements having emptyItem class
        continue

    print(elm.get_text())

Prints:

test0
test1
test2

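Note that div[class^=product_item] is a CSS selector that matches every div element whose class attribute value starts with product_item. The match is against the whole attribute string, so a div whose first listed class is something else would not be selected.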
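
If you would rather do the filtering in the query itself instead of inside the loop, here is a minimal sketch of two alternatives. It is not part of the answer above: the first variant assumes a BeautifulSoup version (4.7 or later) whose select() supports the :not() pseudo-class, and the second passes a callable to find_all. The helper name is_wanted and the shortened sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened sample markup (illustrative, modeled on the question's HTML).
data = """
<div>
    <div class="product_item0 ">test0</div>
    <div class="product_item1 last">test1</div>
    <div class="product_item2 emptyItem">empty</div>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

# Variant 1: let the CSS selector exclude divs carrying the emptyItem class.
# Assumes select() supports :not() (BeautifulSoup 4.7+).
via_select = [
    el.get_text()
    for el in soup.select("div[class^=product_item]:not(.emptyItem)")
]


def is_wanted(tag):
    """Hypothetical filter: a div with a product_item* class and no emptyItem class."""
    classes = tag.get("class", [])  # list of class names, or [] if there is no class
    return (
        tag.name == "div"
        and any(c.startswith("product_item") for c in classes)
        and "emptyItem" not in classes
    )


# Variant 2: find_all accepts a function that receives each tag and returns True/False.
via_find_all = [el.get_text() for el in soup.find_all(is_wanted)]

print(via_select)    # ['test0', 'test1']
print(via_find_all)  # ['test0', 'test1']
```

The find_all variant checks each class name separately, so it also picks up divs where product_item is not the first class listed; the selector variant keeps the same attribute-prefix behavior as the original answer.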