Locating a table that has no id or class attribute

Question

Pat*_*P76 5 python beautifulsoup web-scraping python-3.x

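I am trying to scrape a website that contains a couple of tables. Neither table has a class or an id, and the site does not really use either attribute anywhere, so I am not sure whether there is a way to get at the data. Here is the link to the site. I would post the HTML, but it is too long.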

http://epi.hbsna.com/products/dept.asp?msi=0&sid=6076533CE8C648AE9883BDDBED795B29&dept_id=315&parent_id=0

The table I am trying to extract starts at line 310 of the page source.

Answer 1

ale*_*cxe 8

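Since this is a BeautifulSoup-specific question, here is a working BeautifulSoup-specific solution. The idea is to find the element whose text is SKU# and then take its first table parent: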

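import requests
from bs4 import BeautifulSoup


# Download the page (the URL is the one from the question).
url = 'http://epi.hbsna.com/products/dept.asp?msi=0&sid=6076533CE8C648AE9883BDDBED795B29&dept_id=315&parent_id=0'
data = requests.get(url).content
soup = BeautifulSoup(data, "html.parser")

# Find the text node that is exactly "SKU#", then climb to its enclosing table.
# `string=` is the current name of the older `text=` argument.
table = soup.find(string="SKU#").find_parent("table")

# Skip the header row and print the cell text of every remaining row.
for row in table.find_all("tr")[1:]:
    print([cell.get_text(strip=True) for cell in row.find_all("td")])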

This prints the contents of the table:

['40010001', 'ABA Service Kit', '-', '1-1/4" 10', 'None', '5-1/2"', '0.63', 'Clamp', '42710566']
['40010002', 'ABA Service Kit', '-', '1-1/4" 10', '5/8" RH', '5-1/2"', '0.63', 'Clamp', '42710566']
...
['40010649', 'ABA Service Kit', '-', '1 1/2 - 10', '1.5', '6"', '0.50', 'Strap', '427-10517']
['40050604', 'ABA Service Kit', 'none', '1 1/2" - 10"', '1 1/2" LH', '6"', '0.50', 'Strap', '427-10601']
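
The same idea can be tried offline, without hitting the site. The following is only a minimal sketch under stated assumptions: `SAMPLE_HTML` is made-up markup (two tables with no id or class, and column names other than SKU# are invented for illustration), `extract_table_by_header` is a hypothetical helper name, and the header row is assumed to be the first row of the matched table. It matches the header text after stripping whitespace and returns each data row as a dict keyed by the header cells.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two tables, neither has an id or a class.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><td>Home</td><td>Products</td></tr>
</table>
<table>
  <tr><td> SKU# </td><td>Description</td><td>Type</td></tr>
  <tr><td>40010001</td><td>ABA Service Kit</td><td>Clamp</td></tr>
  <tr><td>40010649</td><td>ABA Service Kit</td><td>Strap</td></tr>
</table>
</body></html>
"""


def extract_table_by_header(html, header_text):
    """Return the data rows of the table containing header_text as dicts.

    Assumes the header row is the first <tr> of that table.
    Returns an empty list when no matching table is found.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Compare stripped text so padding around the header does not matter.
    marker = soup.find(
        string=lambda s: s is not None and s.strip() == header_text
    )
    if marker is None:
        return []

    # Climb from the text node to the nearest enclosing <table>.
    table = marker.find_parent("table")
    if table is None:
        return []

    rows = table.find_all("tr")
    headers = [c.get_text(strip=True) for c in rows[0].find_all(["td", "th"])]

    records = []
    for row in rows[1:]:
        cells = [c.get_text(strip=True) for c in row.find_all(["td", "th"])]
        # Keep only rows that line up with the header row.
        if len(cells) == len(headers):
            records.append(dict(zip(headers, cells)))
    return records


if __name__ == "__main__":
    for record in extract_table_by_header(SAMPLE_HTML, "SKU#"):
        print(record)
```

Returning an empty list instead of raising is a design choice for this sketch. The one-liner in the answer raises `AttributeError` when the SKU# text is missing, which may be preferable if a missing table should stop the scrape.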

