Targeting an <a> with a specific attribute using BeautifulSoup

Question

Jam*_*ite 2 python beautifulsoup web-scraping

I'm trying to scrape a page that has a section like this:

<a name="id_631"></a>

<hr>

<div class="store-class">
    <div>
        <span><strong>Store City</strong></span>
    </div>

    <div class="store-class-content">
        <p>Event listing</p>
        <p>Event listing2</p>
        <p>Event listing3</p>
    </div>

    <div>
        Stuff about contact info
    </div>
</div>


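The page is a list of sections like this, and the only way to tell them apart is the name attribute on the <a> tag.

So I think I want to target that <a>, move to its next_sibling (the <hr>), and then move to the next sibling again to get that <div class="store-class"> section. All I want is the information inside that div tag.

I'm not sure how to target that <a> tag and then move down two siblings. When I try print(soup.find_all('a', {"name":"id_631"})), it only gives me what's inside the tag, which is nothing.

Here's my script: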

import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.tandyleather.com/en/leathercraft-classes")

soup = BeautifulSoup(r.text, 'html.parser')

print(soup.find("a", id="id_631").find_next_sibling("div", class_="store-class"))


But I get this error:

Traceback (most recent call last):
  File "tandy.py", line 8, in <module>
    print(soup.find("a", id="id_631").find_next_sibling("div", class_="store-class"))
AttributeError: 'NoneType' object has no attribute 'find_next_sibling'
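
The traceback is consistent with find() returning None: the script filters on id="id_631", but the anchor in the sample markup only carries a name attribute. A minimal sketch of that difference, assuming a trimmed-down copy of the markup from the question rather than the live page:

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup from the question (not the live page).
html = '<a name="id_631"></a><hr><div class="store-class">Store City</div>'
soup = BeautifulSoup(html, "html.parser")

# id="id_631" filters on the id attribute, which this anchor does not have,
# so find() returns None and any chained call raises AttributeError.
by_id = soup.find("a", id="id_631")
print(by_id)

# Filtering on the name attribute through attrs matches the anchor.
by_name = soup.find("a", attrs={"name": "id_631"})
print(by_name)
```

Checking the result of find() for None before chaining further calls avoids the AttributeError when nothing matches.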


Answer 1

ale*_*cxe 5

find_next_sibling() to the rescue:

soup.find("a", attrs={"name": "id_631"}).find_next_sibling("div", class_="store-class")
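
The attrs dictionary is needed here because name is also the first parameter of find() (the tag name), so writing soup.find("a", name="id_631") raises a TypeError instead of filtering on the attribute. As a rough end-to-end sketch, the one-liner can be run against the sample markup from the question; the variable names (anchor, store, city, events) are illustrative only, and the real page may be structured differently:

```python
from bs4 import BeautifulSoup

# Sample markup from the question, with the <strong> tag closed properly.
html = """
<a name="id_631"></a>
<hr>
<div class="store-class">
    <div><span><strong>Store City</strong></span></div>
    <div class="store-class-content">
        <p>Event listing</p>
        <p>Event listing2</p>
        <p>Event listing3</p>
    </div>
    <div>Stuff about contact info</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Locate the anchor by its name attribute, then skip ahead to the first
# following sibling that is a <div class="store-class">. Unlike .next_sibling,
# find_next_sibling() ignores the whitespace text nodes and the <hr> in between.
anchor = soup.find("a", attrs={"name": "id_631"})
store = anchor.find_next_sibling("div", class_="store-class") if anchor else None

if store is not None:
    city = store.find("strong").get_text(strip=True)
    content = store.find("div", class_="store-class-content")
    events = [p.get_text(strip=True) for p in content.find_all("p")]
else:
    city, events = None, []

print(city)
print(events)
```

Because find_next_sibling() filters by tag and class, there is no need to count siblings by hand.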


Also, html.parser has to be replaced with lxml or html5lib.
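
Both lxml and html5lib are separate packages, and BeautifulSoup raises FeatureNotFound when the requested parser is not installed. One possible way to handle that is sketched below; make_soup is a hypothetical helper (not part of bs4), and the fallback order is an assumption:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, parsers=("lxml", "html5lib", "html.parser")):
    """Hypothetical helper: build a soup with the first parser that is installed."""
    for parser in parsers:
        try:
            return BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # This parser library is not installed; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is available")


html = '<a name="id_631"></a><hr><div class="store-class">Store City</div>'
soup = make_soup(html)
store = soup.find("a", attrs={"name": "id_631"}).find_next_sibling(
    "div", class_="store-class"
)
print(store.get_text())
```

The parsers can build different trees from malformed markup, so the same find_next_sibling() call may succeed with one parser and return None with another.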

See also:

Differences between parsers
