Extracting a URL from a td element with Beautiful Soup

Question


I'm trying to extract a URL from an HTML table. The URL is in an anchor tag inside a td cell. The HTML looks like this:

<table width="100%" border="0" cellspacing="0" cellpadding="0" name="TabName" id="Tab" class="common-table">
    <tr>
        <td>Acme Company</a><br/><span class="f-10">07-11-2016</span></td>
        <td><span>Vendor</span><br>
        <td><a href="http://URL" title="Report Details">Details</a></td>
    </tr>
</table>


Here is the Python code I wrote:

from bs4 import BeautifulSoup
import requests

r = requests.get('http://SourceURL')
soup = BeautifulSoup(r.content, "html.parser")
# Find the table
table = soup.find("table", {"class": "common-table"})
# Find all tr rows
tr = table.find_all("tr")

for each_tr in tr:
    # In each tr row, find each td cell
    td = each_tr.find_all('td')
    for each_td in td:
        print(each_td.text)
        if each_td.text == "Details":
            pass  # How do I get the URL out of this cell?


I've iterated down to the last td tag, the one containing the URL. How do I extract the URL now?

Thanks in advance for your time.

Answer 1

Ale*_*all 5

Like this:

url = each_td.a['href']
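A minimal, self-contained sketch of that one-liner applied inside the question's loop, assuming bs4 is installed. The inline HTML string stands in for the `requests.get('http://SourceURL')` response, and the helper name `extract_detail_urls` is hypothetical. It adds a guard: `each_td.a` is the first `<a>` inside the cell, or `None` if there is none, so `each_td.a['href']` on a cell without a link raises `TypeError`.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; in real use this would come from
# requests.get(...).content as in the asker's code.
HTML = """
<table width="100%" border="0" cellspacing="0" cellpadding="0" name="TabName" id="Tab" class="common-table">
    <tr>
        <td>Acme Company</a><br/><span class="f-10">07-11-2016</span></td>
        <td><span>Vendor</span><br>
        <td><a href="http://URL" title="Report Details">Details</a></td>
    </tr>
</table>
"""


def extract_detail_urls(html):
    """Hypothetical helper: return the href of the link in every 'Details' cell."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "common-table"})
    urls = []
    for each_tr in table.find_all("tr"):
        # find_all is recursive, so a td nested inside an unclosed td is still found
        for each_td in each_tr.find_all("td"):
            # strip=True ignores surrounding whitespace in the cell text
            if each_td.get_text(strip=True) == "Details":
                link = each_td.a  # first <a> descendant, or None
                if link is not None and link.has_attr("href"):
                    urls.append(link["href"])
    return urls


print(extract_detail_urls(HTML))  # → ['http://URL']
```

Using `get_text(strip=True)` instead of `.text` is a design choice for robustness: real pages often put whitespace or newlines around the link text, which would make an exact `== "Details"` comparison fail.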

Archived:	9 years ago
Views:	1305
Last activity:	9 years ago