Extracting image titles and image URLs with BeautifulSoup

Question


Bil*_*ton 2 html python parsing beautifulsoup

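I'm trying to use BeautifulSoup to extract the image URL and the image title from an article. I've managed to isolate the article's image URL and image title from the HTML around them, but I can't figure out how to separate those two values from their HTML tags. Here is my code: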

from bs4 import BeautifulSoup
import requests

url = 'http://www.prnewswire.com/news-releases/dutch-philosopher-koert-van-mensvoort-founder-of-the-next-nature-network-writes-a-letter-to-humanity-619925063.html'
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, 'lxml')
# Collect every <div class="image"> wrapper on the page
links = soup.find_all('div', {'class': 'image'})


The two pieces I'm trying to extract are the src= and title= attributes. Any ideas on how to pull out these two values would be greatly appreciated.
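
For context: in BeautifulSoup a Tag object gives dict-style access to its HTML attributes, so once you have hold of the <img> tag there is no markup left to strip. A minimal offline sketch, assuming each image block looks roughly like the hypothetical SAMPLE_HTML below (the real article's markup may differ). It uses Python's built-in "html.parser" so lxml is not required:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble one image block on the article page
SAMPLE_HTML = """
<div class="image">
  <img src="https://example.com/photo.jpg" title="Example title">
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
img = soup.find("div", {"class": "image"}).find("img")

# Attribute access on a Tag works like a dict lookup
src = img["src"]          # raises KeyError if the attribute is missing
title = img.get("title")  # returns None if the attribute is missing
all_attrs = img.attrs     # every attribute of the tag as a plain dict

print(src)
print(title)
print(all_attrs)
```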

Answer 1

Him*_*dua 5

from bs4 import BeautifulSoup
import requests

url = 'http://www.prnewswire.com/news-releases/dutch-philosopher-koert-van-mensvoort-founder-of-the-next-nature-network-writes-a-letter-to-humanity-619925063.html'
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, 'lxml')
links = soup.find_all('div', {'class': 'image'})

# Find the <img> inside each wrapper and read its attributes like dict keys.
# tag['attr'] raises KeyError if the attribute is absent; tag.get('attr') returns None instead.
print([i.find('img')['src'] for i in links])
print([i.find('img')['title'] for i in links])
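
The two list comprehensions above assume every div.image contains an <img> carrying both attributes: find('img') returns None when a wrapper has no image, and indexing None then raises a TypeError. A hedged variant follows, assuming the same div.image wrapper structure. It uses a CSS selector via select() to go straight to the images, uses .get() to tolerate a missing title, and keeps each URL paired with its title. The helper name extract_images and the sample markup are hypothetical:

```python
from bs4 import BeautifulSoup


def extract_images(html):
    """Return (src, title) pairs for every <img> inside a <div class="image">.

    Hypothetical helper: images without a src are skipped, and a missing
    title comes back as None instead of raising KeyError.
    """
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # "div.image img" matches <img> descendants of any div whose classes include "image"
    for img in soup.select("div.image img"):
        src = img.get("src")
        if src is None:
            continue
        pairs.append((src, img.get("title")))
    return pairs


# Hypothetical markup: one complete image, one without a title, one empty wrapper
SAMPLE_HTML = """
<div class="image"><img src="a.jpg" title="First"></div>
<div class="image"><img src="b.jpg"></div>
<div class="image"></div>
"""

print(extract_images(SAMPLE_HTML))
# → [('a.jpg', 'First'), ('b.jpg', None)]
```

Against the live article the same helper would be called as extract_images(requests.get(url).text), with url as in the question. Returning pairs rather than two separate lists keeps each title aligned with its image even when some titles are missing.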


Archived:	8 years, 7 months ago
Views:	9581
Last activity:	8 years, 7 months ago