Unable to get page source code in Python

Question


I am trying to get the source code of a page using the following:

import urllib2
url="http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560"
page =urllib2.urlopen(url)
data=page.read()
print data


Even when setting a user agent (headers), I did not manage to get the page's source code!

Do you have any ideas about what can be done? Thanks in advance.

Answer 1

Mar*_*ard 7

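I tried it and the request works, but the content you receive says (in French) that your browser must accept cookies. You can work around that with urllib2, but I think the easiest way is to use the requests lib (if you don't mind an extra dependency).

To install requests: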

pip install requests


Then in your script:

import requests

url = 'http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560'

response = requests.get(url)
print(response.content)


I'm pretty sure the page's source will then be what you expect.
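
For anyone who would rather avoid the extra dependency, the urllib2 route mentioned above could look roughly like the sketch below. It is written for Python 3, where urllib2 became urllib.request and cookielib became http.cookiejar. The helper names build_cookie_opener and fetch, the "Mozilla/5.0" user agent string, and the 10 second timeout are all illustrative assumptions, and whether accepting cookies is enough for this particular site has not been verified here.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(user_agent="Mozilla/5.0"):
    """Return (opener, jar): an opener that stores cookies and resends them.

    Hypothetical helper for illustration; the user agent value is an assumption.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Replace the default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(url, opener, timeout=10):
    """Fetch url through the given opener and return the body as bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Calling opener, jar = build_cookie_opener() and then fetch(url, opener) with the URL from the question should send back any cookies the server sets during redirects or on later requests made with the same opener. The result is bytes, so it still needs decoding before being treated as text.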
