How to get the third link from BeautifulSoup results

Question

mar*_*shp 3 python beautifulsoup python-2.7

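I'm using the following code to retrieve a bunch of links with BeautifulSoup. It returns all of the links, but I want to get the third link, fetch and parse the page it points to, then get the third link from that page, and so on. How can I modify the code below to do this?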

import urllib
from BeautifulSoup import *

url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print tag.get('href', None)
    print tag.contents[0]

Answer 1

ale*_*cxe 5

First of all, you should stop using BeautifulSoup version 3: it is very old and no longer maintained. Switch to BeautifulSoup version 4, installing it with:

pip install beautifulsoup4

and change your import to:

from bs4 import BeautifulSoup

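Then you need to call find_all() repeatedly in a loop, taking the third link by index each time, until a page no longer has a third link. Here is one way to do it: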

import urllib
from urlparse import urljoin
from bs4 import BeautifulSoup

url = raw_input('Enter - ')

while True:
    html = urllib.urlopen(url)
    soup = BeautifulSoup(html, "html.parser")

    try:
        # index 2 is the 3rd <a> tag on the page; urljoin() turns a
        # relative href into an absolute URL based on the current page
        url = urljoin(url, soup.find_all('a')[2]["href"])
    except (IndexError, KeyError):
        break  # no 3rd link, or it has no href - exit the loop
    print url
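
The `urljoin()` referred to in the answer's code lives in the `urlparse` module in Python 2 (`urllib.parse` in Python 3). It resolves an href against the URL of the page it was found on, which matters because `urlopen()` needs an absolute URL: a bare `/about` will not fetch the intended page. A small Python 3 illustration, where the page URL and hrefs are made up for the example:

```python
from urllib.parse import urljoin

# Hypothetical page URL and href values, as they might appear in <a href="...">
page = "http://example.com/docs/index.html"
hrefs = ["intro.html", "/about", "../img/logo.png", "https://other.org/page"]

# Resolve every href relative to the page it was found on;
# already-absolute URLs come back unchanged
resolved = [urljoin(page, href) for href in hrefs]
for href, absolute in zip(hrefs, resolved):
    print(href, "->", absolute)
```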


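For comparison, here is a minimal Python 3 sketch of the same "follow the third link" idea that can be run offline. It is an illustration under stated assumptions, not the answer's method: it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so it needs no third-party packages, and the names `AnchorCollector`, `nth_link`, `follow_nth_links`, the `fetch` callback and the `max_hops` limit are all hypothetical. Unlike `soup.find_all('a')[2]`, it only counts `<a>` tags that actually have an `href` (the bs4 equivalent would be `soup.find_all('a', href=True)`), and `max_hops` stops the loop if pages link back to each other.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; skip <a> tags without href
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def nth_link(html, base_url, n=3):
    """Return the absolute URL of the n-th <a href> (1-based), or None."""
    if n < 1:
        raise ValueError("n must be >= 1")
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    if len(parser.hrefs) < n:
        return None
    return urljoin(base_url, parser.hrefs[n - 1])


def follow_nth_links(start_url, fetch, n=3, max_hops=10):
    """Repeatedly follow the n-th link; return the list of visited URLs.

    fetch is an assumed callable mapping a URL to that page's HTML text.
    """
    visited = [start_url]
    url = start_url
    for _ in range(max_hops):
        next_url = nth_link(fetch(url), url, n)
        if next_url is None:
            break  # the page has fewer than n links
        visited.append(next_url)
        url = next_url
    return visited


# Usage with hypothetical in-memory pages instead of real HTTP requests
pages = {
    "http://example.com/a": '<a href="x1">1</a><a href="x2">2</a><a href="/b">3</a>',
    "http://example.com/b": '<a href="1">1</a><a href="2">2</a><a href="c">3</a>',
    "http://example.com/c": '<a href="only">1</a>',
}
print(follow_nth_links("http://example.com/a", lambda u: pages[u]))
# -> ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']
```

Passing the page-fetching function in as `fetch` keeps the link-following logic testable with a dict of fake pages. Against real sites you could pass something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")` (after `import urllib.request`), assuming the pages are UTF-8 encoded.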
Archived:	9 years, 7 months ago
Views:	2862
Last activity:	9 years, 7 months ago