How to scrape the first link of a Google search with BeautifulSoup

Question

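I'm trying to write a script that scrapes the first link of a Google search and returns only that link. That way I can run a search from the terminal and look at the link later alongside the search term. I'm struggling to get just the first result. This is the closest I've got so far: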

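import requests
from bs4 import BeautifulSoup

research_later = "hiya"
goog_search = "https://www.google.co.uk/search?sclient=psy-ab&client=ubuntu&hs=k5b&channel=fs&biw=1366&bih=648&noj=1&q=" + research_later

r = requests.get(goog_search)
# Name the parser explicitly; bs4 warns when none is given.
soup = BeautifulSoup(r.text, "html.parser")

# This prints every <a> on the page, not only the first search result.
for link in soup.find_all('a'):
    href = link.get('href')
    # Some <a> tags have no href; skip them instead of concatenating None.
    if href:
        print(research_later + " :" + href)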

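The loop above prints every link on the page (navigation, image tab, and so on), which is why the first result is hard to isolate. It also appends the raw search term to the URL, which breaks for terms containing characters such as & or spaces. Below is a minimal, stdlib-only sketch of both pieces. It assumes that the non-JavaScript results page wraps each organic result as /url?q=<target>&sa=..., which is an observation about Google's markup at the time and not a documented format. The names build_search_url, result_target, first_result and first_link.py are hypothetical, and the extra layout parameters from the original URL are dropped.

```python
import sys
from urllib.parse import parse_qs, urlencode, urlparse

# Base URL from the question; the extra layout parameters are dropped here.
SEARCH_BASE = "https://www.google.co.uk/search"


def build_search_url(term):
    """Percent-encode the search term into the q parameter."""
    return SEARCH_BASE + "?" + urlencode({"q": term})


def result_target(href):
    """Return the destination of an (assumed) /url?q=<target>&... link, else None."""
    if not href:
        return None
    parsed = urlparse(href)
    if parsed.path != "/url":
        return None
    targets = parse_qs(parsed.query).get("q")
    # Ignore redirects that point back into Google rather than to a site.
    if not targets or not targets[0].startswith(("http://", "https://")):
        return None
    return targets[0]


def first_result(hrefs):
    """Return the first result destination among an iterable of href values."""
    for href in hrefs:
        target = result_target(href)
        if target is not None:
            return target
    return None


if __name__ == "__main__":
    # Search term from the command line, e.g. `python first_link.py hiya`.
    term = " ".join(sys.argv[1:]) or "hiya"
    print(build_search_url(term))

    # Hypothetical hrefs standing in for link.get('href') over soup.find_all('a').
    sample_hrefs = [
        "/search?q=hiya&tbm=isch",
        None,
        "/url?q=https://www.urbandictionary.com/define.php%3Fterm%3Dhiya&sa=U",
        "/url?q=https://example.com/&sa=U",
    ]
    print(first_result(sample_hrefs))
```

In the real script the hrefs would come from first_result(link.get('href') for link in soup.find_all('a')). Passing params={"q": term} to requests.get performs equivalent encoding, so build_search_url is only needed if you want the URL as a string.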

Answer 1

Kev*_*uan 10

It looks like Google uses the cite tag to hold the link, so we can use soup.find('cite').text like this:

import requests
from bs4 import BeautifulSoup

research_later = "hiya"
goog_search = "https://www.google.co.uk/search?sclient=psy-ab&client=ubuntu&hs=k5b&channel=fs&biw=1366&bih=648&noj=1&q=" + research_later

r = requests.get(goog_search)

soup = BeautifulSoup(r.text, "html.parser")
# find() returns only the first matching tag; this relies on the first
# <cite> on the page holding the top result's URL.
print(soup.find('cite').text)


The output is:

www.urbandictionary.com/define.php?term=hiya


This doesn't work anymore. Why? The line print(soup.find('cite').text) fails with: AttributeError: 'NoneType' object has no attribute 'text' (4 upvotes)
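The AttributeError in that comment means soup.find('cite') returned None: find() returns None when no tag matches, and reading .text on None raises. Google's result markup is not a stable interface, so the page that requests receives may simply no longer contain a cite tag. Below is a minimal sketch that checks for None before reading the text. The helper name first_cite_text is hypothetical, and the two HTML strings are made-up stand-ins for a results page, not real Google markup.

```python
from bs4 import BeautifulSoup


def first_cite_text(html):
    """Return the text of the first <cite> element, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    cite = soup.find("cite")
    # find() gives None when nothing matches; calling .text on it would raise
    # AttributeError: 'NoneType' object has no attribute 'text'.
    if cite is None:
        return None
    return cite.get_text(strip=True)


# Hypothetical markup: one page that has a <cite>, one that does not.
with_cite = "<div><cite>www.urbandictionary.com/define.php?term=hiya</cite></div>"
without_cite = "<div><a href='/url?q=https://example.com/'>Example</a></div>"

print(first_cite_text(with_cite))
print(first_cite_text(without_cite))
```

Returning None lets the caller decide what to do (print a message, or fall back to another selector) instead of crashing. It does not make the cite approach work again if the tag is gone from the page.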

Archived: 10 years, 2 months ago
Views: 4,522
Last activity: 10 years, 2 months ago