Using Python's httplib to get the status code... but is it too tricky?

Question

TIM*_*MEX 0 python regex http http-headers

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK

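This code gets the HTTP status code. But notice that I had to split the URL into "www.google.com" and "/index.html" across two lines.

That is confusing.

What if I want to find the status code of an arbitrary URL?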

http://mydomain.com/sunny/boo.avi
http://anotherdomain.com/podcast.mp3
http://anotherdomain.com/rss/fee.xml

Can't I just paste the URL into it and have it work?

Edit: I can't use urllib, because I don't want to download the file.

Answer 1

Tho*_*mas 6

Maybe you would be better off using a URL library?

In Python 2, use urllib2:

>>> import urllib2
>>> url = urllib2.urlopen("http://www.google.com/index.html")
>>> url.getcode()
200

In Python 3, use urllib.request:

>>> import urllib.request
>>> url = urllib.request.urlopen("http://www.google.com/index.html")
>>> url.getcode()
200
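
The calls above issue a GET, which is what the asker wanted to avoid. A minimal sketch of the same idea with a HEAD request, assuming Python 3.3 or later (where `Request` accepts a `method` argument); the function name `head_status` and the `timeout` default are illustrative and not part of the original answer:

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Return the HTTP status code for url, using HEAD so no body is downloaded."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is still on the exception.
        return error.code
```

Because `urlopen` follows redirects, the value returned is the status of the final response rather than of the first hop.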

Answer 2

Tho*_*mas 6

Alternatively, if actually downloading the data would be a problem and you really need the HEAD method, you can parse the URL with urlparse:

>>> import httplib
>>> import urlparse
>>> url = "http://www.google.com/index.html"
>>> (scheme, netloc, path, params, query, fragment) = urlparse.urlparse(url)
>>> conn = httplib.HTTPConnection(netloc)
>>> conn.request("HEAD", urlparse.urlunparse(('', '', path, params, query, fragment)))
>>> res = conn.getresponse()
>>> print res.status, res.reason
302 Found

Then wrap this in a function that takes the URL as its argument.
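
One way such a wrapper might look, sketched for Python 3, where `httplib` and `urlparse` became `http.client` and `urllib.parse`. The names `split_target` and `get_status`, the `timeout` default, and the HTTPS branch are assumptions added for illustration, not part of the original answer:

```python
import http.client
from urllib.parse import urlsplit


def split_target(url):
    """Split url into (scheme, host[:port], request target) for http.client."""
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path must be requested as "/"
    if parts.query:
        target += "?" + parts.query
    return parts.scheme, parts.netloc, target


def get_status(url, timeout=10):
    """Send a single HEAD request and return (status, reason)."""
    scheme, netloc, target = split_target(url)
    if scheme == "https":
        conn = http.client.HTTPSConnection(netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(netloc, timeout=timeout)
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()
```

Calling `get_status("http://mydomain.com/sunny/boo.avi")` then takes the whole URL in one piece. Unlike the urllib approach, this sends exactly one request and does not follow redirects, so a 301 or 302 is reported as is.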
