How do I scrape a website using Python, Requests, and XPath?

Question

Nic*_*806 4 python lxml web-scraping python-requests

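I am trying to scrape the first and last names of the people listed on this page (https://www.meleenumerique.com/scientist_comite) with the code below, but it does not work. How can I figure out what is going wrong?

Here is the code I wrote: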

from lxml import html
import requests

url = "https://www.meleenumerique.com/scientist_comite"
r = requests.get(url)
t = html.fromstring(r.content)

# Page title
title = t.xpath('/html/head/title/text()')
# Build the list of speakers
speaker = t.xpath('//span[contains(@class,"speaker-name")]//text()')

print(title)
print("Speakers:", speaker)

Answer 1

SIM*_*SIM 5

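You could try the Requests-HTML library, which should let you scrape the content from that page. It supports XPath and can handle dynamic content. The speaker names are presumably inserted by JavaScript after the page loads, which would explain why the HTML returned by a plain requests.get() call does not contain them.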

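import requests_html

session = requests_html.HTMLSession()
r = session.get('https://www.meleenumerique.com/scientist_comite')
# Execute the page's JavaScript in a headless browser;
# sleep=5 waits 5 seconds after the initial render so the dynamic content can load
r.html.render(sleep=5, timeout=8)
# Select every element whose class attribute contains "speaker-name"
for item in r.html.xpath("//*[contains(@class,'speaker-name')]"):
    print(item.text)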
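
To answer the "how can I figure out what is going wrong" part: one way to diagnose this kind of failure is to check whether the class you are targeting appears at all in the HTML that the server sends back, before any JavaScript runs. The sketch below uses only the standard library. The helper name `count_class_in_static_html` and both sample strings are made up for illustration; in practice you would pass in `r.text` from your own `requests.get()` call. If the count is 0 on the raw response, the XPath is probably fine and the data is simply not in the static page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        class_value = dict(attrs).get("class") or ""
        if self.needle in class_value:
            self.count += 1


def count_class_in_static_html(html_text, needle):
    """Return how many elements in html_text have `needle` in their class."""
    parser = ClassCounter(needle)
    parser.feed(html_text)
    return parser.count


# Hypothetical samples, NOT the real page content:
# what a JavaScript-driven page might send before rendering...
static_sample = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
# ...and what the DOM might look like after rendering.
rendered_sample = '<html><body><span class="speaker-name big">Example Speaker</span></body></html>'

print("static:", count_class_in_static_html(static_sample, "speaker-name"))
print("rendered:", count_class_in_static_html(rendered_sample, "speaker-name"))
```

On a real response you would call `count_class_in_static_html(r.text, "speaker-name")`. A non-zero count would instead point at the XPath expression itself (for example, the names not being inside a `span`).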

It only supports Python 3.6... I'll go back to Selenium... (2 upvotes)

Archived:	7 years ago
Views:	15,709
Last activity:	7 years ago