Related Solutions (0)

import urllib2

user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
link = "http://www.abc.com"
req = urllib2.Request(link, headers=headers)
page = urllib2.urlopen(req).read()  # ERROR 402 generated here!

If the page does not exist (error 402, or any other error), what can I do on the page = ... line to make sure the page I'm reading exists?
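
One possible way to guard the page = ... line, sketched for Python 3, where urllib2 was split into urllib.request and urllib.error. The helper name fetch_or_none and its opener parameter are hypothetical; they are introduced here only so the error path can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def fetch_or_none(url, headers=None, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if the request fails."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with opener(req) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (402, 404, 500, ...).
        # HTTPError must be caught first because it subclasses URLError.
        print("HTTP error %d for %s" % (err.code, url))
    except urllib.error.URLError as err:
        # No usable answer at all (DNS failure, refused connection, ...).
        print("Failed to reach %s: %s" % (url, err.reason))
    return None
```

A caller would then test the result, for example `page = fetch_or_none(link, headers)` followed by `if page is None: ...`, instead of letting the exception propagate.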

html python urlopen

Jam*_*len

2013 05-28

50
votes

7
answers

90k
views

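Python: Get HTTP headers from a urllib2.urlopen call?

Does urllib2 fetch the whole page when a urlopen call is made?

I'd like to read only the HTTP response headers without getting the page. It looks like urllib2 opens the HTTP connection and then fetches the actual HTML page... or does it just start buffering the page with the urlopen call?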

import urllib2
myurl = 'http://www.kidsidebyside.org/2009/05/come-and-draw-the-circle-of-unity-with-us/'
page = urllib2.urlopen(myurl)  # open connection, get headers

html = page.readlines()  # stream page
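
A rough sketch of reading only the headers, written for Python 3 (urllib.request rather than urllib2). It relies on the response object exposing the parsed headers before read() is called; the operating system may still have received part of the body by then. The function name read_headers_only and the opener parameter are hypothetical and exist only to make the sketch testable offline.

```python
import urllib.request


def read_headers_only(url, opener=urllib.request.urlopen):
    """Open the URL, return its response headers as a dict, skip the body."""
    response = opener(url)
    try:
        # The status line and headers are already parsed at this point.
        # The body is only pulled from the connection when read() is called.
        return dict(response.headers.items())
    finally:
        # Closing without read() abandons the rest of the page.
        response.close()
```

In urllib2 itself the equivalent header access is `page.info()`.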

python forwarding urllib

shi*_*eta

2016 12-04

47
votes

5
answers

100k
views

Making an HTTP HEAD request with urllib2 from Python 2

I'm trying to do a HEAD request of a page using Python 2.

I am trying

import misc_urllib2
.....
opender = urllib2.build_opener([misc_urllib2.MyHTTPRedirectHandler(), misc_urllib2.HeadRequest()])

with misc_urllib2.py containing

import urllib2


class HeadRequest(urllib2.Request):
    def get_method(self):
        return "HEAD"


class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def __init__ (self):
        self.redirects = []

    def http_error_301(self, req, fp, code, msg, headers):  
        result = urllib2.HTTPRedirectHandler.http_error_301(
                self, req, fp, code, msg, headers)
        result.redirect_code = code
        return result

    http_error_302 = http_error_303 = http_error_307 = http_error_301

But I get

TypeError: __init__() takes at least 2 arguments (1 given)

If I do this

opender = urllib2.build_opener(misc_urllib2.MyHTTPRedirectHandler())

then it works fine
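
A likely reading of the TypeError is that HeadRequest() is instantiated without a URL, and that it is a request rather than a handler, so it does not belong in build_opener at all. The sketch below shows one way the pieces could fit together in Python 3 (urllib.request instead of urllib2). The URL http://example.com/ is a placeholder, and the redirect handler is reduced to its constructor for brevity.

```python
import urllib.request


class HeadRequest(urllib.request.Request):
    """A Request whose HTTP method is HEAD instead of GET."""

    def get_method(self):
        return "HEAD"


class MyHTTPRedirectHandler(urllib.request.HTTPRedirectHandler):
    def __init__(self):
        self.redirects = []


# Handlers are passed as separate positional arguments, not as a list.
opener = urllib.request.build_opener(MyHTTPRedirectHandler())

# The request needs a URL, and it is handed to open(), not to build_opener().
request = HeadRequest("http://example.com/")

# response = opener.open(request)  # uncomment to perform the network call
```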

python urllib2 head python-2.7

Wiz*_*ard

2016 06-27

23
votes

1
answers

20k
views

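Checking whether a link is dead using Python without downloading the webpage

For those who know wget, it has an option --spider, which lets you check whether a link is broken without actually downloading the webpage. I would like to do the same thing in Python. My problem is that I have a list of 100'000 links that I want to check at most once a day and at least once a week. Either way, this will generate a lot of unnecessary traffic.

As far as I understand from the urllib2.urlopen() documentation, it does not download the page but only the meta-information. Is this correct? Or is there some other way to do this in a nice manner?

Best,
Troels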
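
A minimal sketch of a spider-style check for a large list, assuming Python 3 and that HEAD requests are acceptable to the servers involved (some reject HEAD, in which case a GET fallback would be needed). The names head_status and find_dead_links, the worker count, and the rule "None or status >= 400 means dead" are all assumptions made for illustration.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def head_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if unreachable."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server replied with an error status such as 404 or 500.
        return err.code
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return None


def find_dead_links(urls, check=head_status, workers=20):
    """Return the URLs whose check() result is None or a 4xx/5xx status."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so statuses line up with urls.
        statuses = list(pool.map(check, urls))
    return [u for u, s in zip(urls, statuses) if s is None or s >= 400]
```

Because only headers travel over the wire for a HEAD request, the traffic per link stays small even for large pages.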

python urllib2

Tro*_*els

2010 07-12

6
votes

1
answers

6507
views

How do you check the HTTP status code of an object without downloading it?

>>> a=urllib.urlopen('http://www.domain.com/bigvideo.avi')
>>> a.getcode()
404
>>> a=urllib.urlopen('http://www.google.com/')
>>> a.getcode()
200

My question is... bigvideo.avi is 500MB. Does my script first download the file and then check it? Or can it check the status code immediately without saving the file?
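
One way to be certain no body is transferred is to send a HEAD request directly. The sketch below uses Python 3's http.client; the function name status_via_head and the connection_factory parameter are hypothetical (the latter only allows a stand-in connection during testing). Unlike urlopen, this does not follow redirects, so a 301 or 302 is returned as-is.

```python
import http.client
from urllib.parse import urlsplit


def status_via_head(url, connection_factory=None, timeout=10):
    """Return the status code of a HEAD request for url."""
    parts = urlsplit(url)
    if connection_factory is None:
        if parts.scheme == "https":
            connection_factory = http.client.HTTPSConnection
        else:
            connection_factory = http.client.HTTPConnection
    conn = connection_factory(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for the status line and headers only, never the body.
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()
```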

python http

TIM*_*MEX

2009 11-14

4
votes

1
answers

3848
views

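How do I check whether a file on HTTP exists, using Python/Django?

How do I check whether a file on HTTP exists, using Python/Django?

I am trying to check whether the file at http://hostname/directory/file.jpg exists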

python

Nip*_*ips

lucky-day

4
votes

2
answers

9940
views

In Python, how do I check whether 2 different links actually point to the same page?

For example, these two links point to the same place:

http://www.independent.co.uk/life-style/gadgets-and-tech/news/chinese-blamed-for-gmail-hacking-2292113.html

http://www.independent.co.uk/life-style/gadgets-and-tech/news/2292113.html

How do I check this in Python?
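
A possible approach, assuming the site answers the shorter address with an HTTP redirect to the longer one (this is an assumption about the server, not something the question confirms): follow the redirects and compare the final URLs. The sketch is for Python 3; final_url, same_page, and the opener parameter are hypothetical names. If the server serves identical content at both addresses without redirecting, comparing the response bodies would be needed instead.

```python
import urllib.request


def final_url(url, opener=urllib.request.urlopen):
    """Follow redirects and return the URL that was finally served."""
    response = opener(url)
    try:
        return response.geturl()
    finally:
        response.close()


def same_page(url_a, url_b, opener=urllib.request.urlopen):
    """True if both URLs end up at the same address after redirects."""
    return final_url(url_a, opener) == final_url(url_b, opener)
```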

python urllib2

tap*_*pan

lucky-day

3
votes

1
answers

1398
views

How to download an image from the web using urllib

I'm trying to download an image using the following code:

from urllib import urlretrieve
urlretrieve('http://gdimitriou.eu/wp-content/uploads/2008/04/google-image-search.jpg', 
            'google-image-search.jpg')

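It worked. The image was downloaded and can be opened by any image viewer software.

However, the code below does not work. The downloaded image is only 2KB and cannot be opened by any image viewer.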

from urllib import urlretrieve
urlretrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 
            'Zindagi1976.jpg')

This is the result, in HTML format.

    ERROR

The requested URL could not be retrieved

While trying to retrieve the URL: http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg

The following error was encountered:

Access Denied.
Access control configuration prevents your request from being allowed at this time. Please contact your service provider if you feel this is incorrect.

Your cache administrator is nobody. 
Generated Mon, 05 Dec 2011 17:19:53 GMT by sq56.wikimedia.org (squid/2.7.STABLE9)
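
The Squid page above indicates the request was refused rather than the file being missing. A common cause is a server rejecting urllib's default User-Agent; assuming that is what happens here, sending an explicit User-Agent header may help. The sketch is for Python 3, and the function name download, the default "Mozilla/5.0" value, and the opener parameter are placeholders chosen for illustration.

```python
import shutil
import urllib.request


def download(url, filename, user_agent="Mozilla/5.0",
             opener=urllib.request.urlopen):
    """Save url to filename, sending an explicit User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener(req) as response, open(filename, "wb") as out:
        # Stream the body to disk in chunks instead of loading it at once.
        shutil.copyfileobj(response, out)
    return filename
```

It would be called in the same way as urlretrieve, for example `download(image_url, 'Zindagi1976.jpg')`.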

python urllib

Author

2016 01-09

3
votes

1
answers

8440
views