How to access the first five Google result links using BeautifulSoup

Question

Log*_*ogs 5 python url beautifulsoup hyperlink google-search

I want to access the first five (or any specified number of) result links from Google. After some research, I found and modified the following code.

import requests
from bs4 import BeautifulSoup
import re    
search = raw_input("Search:")
page = requests.get("https://www.google.com/search?q=" + search)
soup = BeautifulSoup(page.content, "lxml")
links = soup.find("a")
print links.get('href')

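This returns only the first link on the page, which seems to be the Google Images tab every time.

That is not quite what I want. First, I don't want links to any Google sites, only the actual results. Second, I want the first three, or five, or any specified number of results.

How can I do this with Python?

Thanks in advance!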

Answer 1

Ped*_*ito 7

You can use:

import requests
from bs4 import BeautifulSoup

search = input("Search:")
results = 100  # valid options 10, 20, 30, 40, 50, and 100
# Passing params lets requests URL-encode the query string.
page = requests.get("https://www.google.com/search", params={"q": search, "num": results})
soup = BeautifulSoup(page.content, "html5lib")
# href=True skips anchors without an href, which would otherwise give None below.
links = soup.find_all("a", href=True)
for link in links:
    link_href = link.get("href")
    if "url?q=" in link_href and "webcache" not in link_href:
        print(link_href.split("?q=")[1].split("&sa=U")[0])
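
The loop above prints every match, while the question asks for a fixed number of non-Google results. A minimal sketch of that last step follows, assuming the hrefs have already been collected and still use the `/url?q=...` wrapper that the filter above relies on. The helper names `extract_target` and `first_results` are hypothetical, not part of any library. Unlike the string splitting above, `parse_qs` also percent-decodes the target URL.

```python
from itertools import islice
from urllib.parse import parse_qs, urlparse


def extract_target(href):
    """Return the destination URL wrapped in a '/url?q=...' href, or None."""
    parsed = urlparse(href)
    if parsed.path != "/url":
        return None
    targets = parse_qs(parsed.query).get("q")
    return targets[0] if targets else None


def first_results(hrefs, limit=5):
    """Return up to `limit` unwrapped result URLs, skipping Google-hosted ones."""
    targets = (extract_target(h) for h in hrefs if h)
    external = (
        t for t in targets
        # Assumption: any host containing "google" counts as a Google site.
        if t and t.startswith("http") and "google" not in urlparse(t).netloc
    )
    return list(islice(external, limit))


hrefs = [
    "/search?q=x&tbm=isch",
    None,
    "/url?q=https://example.com/a%3Fb%3D1&sa=U&ved=abc",
    "/url?q=https://accounts.google.com/x&sa=U",
    "/url?q=https://example.org/&sa=U",
]
print(first_results(hrefs, limit=5))
```

With the scraper above, the input would be something like `[a.get("href") for a in soup.find_all("a")]`, and `limit` is the number of results wanted.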

For duckduckgo.com, use:

import requests
from bs4 import BeautifulSoup

search = input("Search:")
h = {"Host":"duckduckgo.com", "Origin": "https://duckduckgo.com", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
d = {"q": search}
page = requests.post("https://duckduckgo.com/html/", data=d, headers=h)
soup = BeautifulSoup(page.content, "html5lib")
# Result titles carry the class "result__a"; href=True skips anchors without a link.
links = soup.find_all("a", {"class": "result__a"}, href=True)
for link in links:
    link_href = link.get("href")
    if "https://duckduckgo.com" not in link_href:
        print(link_href)

Answer 2

myf*_*hub 1

Make your selector more specific. Notice that each result div has the class "_NId", so select the first link inside each of those divs.

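# [:5] keeps the first five result divs; change the slice for a different count.
result_divs = soup.find_all('div', {'class': '_NId'})[:5]
links = [div.find('a') for div in result_divs]
# Skip any result div that contains no link.
hrefs = [link.get('href') for link in links if link is not None]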
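
The same idea can be wrapped in a small function so that the class name and the number of results become parameters. This is only a sketch: `first_links` is a hypothetical helper, and the sample HTML with its "result" class is made up for illustration, since the class name on the live page has to be read from the page itself.

```python
from bs4 import BeautifulSoup


def first_links(html, result_class, limit=5):
    """Return the first href from each result div, for up to `limit` divs that have one."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for div in soup.find_all("div", class_=result_class):
        if len(hrefs) >= limit:
            break
        link = div.find("a", href=True)  # None when the div holds no link
        if link is not None:
            hrefs.append(link["href"])
    return hrefs


# Made-up markup: one navigation div and three result divs, one of them empty.
sample = """
<div class="nav"><a href="/images">Images</a></div>
<div class="result"><a href="https://example.com/1">One</a></div>
<div class="result"><span>no link here</span></div>
<div class="result"><a href="https://example.com/2">Two</a></div>
"""
print(first_links(sample, "result", limit=5))
```

Counting collected links rather than slicing the list of divs means a div without a link does not use up one of the requested results.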
