What's the best way to get an HTTP response code from a URL?

Question

I'm looking for a quick way to get the HTTP response code (e.g. 200, 404) from a URL. I'm not sure which library to use.

Answer 1

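Updated to use the wonderful requests library. Note that we are using a HEAD request, which should be faster than a full GET or POST request because the server sends back only the headers, not the body.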

import requests
try:
    r = requests.head("https://stackoverflow.com")
    print(r.status_code)
    # prints the int of the status code. Find more at httpstatusrappers.com :)
except requests.ConnectionError:
    print("failed to connect")

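httpstatusrappers.com ... awesome! My code is on that Lil Jon status, son! (5 upvotes)
requests is much better than urllib2. For a link like http://www.dianping.com/promo/208721#mod=4, urllib2 gives me a 404 while requests gives me a 200, the same as I get from the browser. (2 upvotes)
@Gourneau Ha! That's not what I meant by my comment. I think it's perfectly fine; in a case like this, people should try to understand why it "just works" in the browser but returns 403 in code, when the same thing is actually happening in both places. (2 upvotes)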

Answer 2

Eva*_*ark 64

Here is a solution that uses httplib.

import httplib

def get_status_code(host, path="/"):
    """ This function retrieves the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        None instead.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        return conn.getresponse().status
    except StandardError:
        return None


print get_status_code("stackoverflow.com") # prints 200
print get_status_code("stackoverflow.com", "/nonexistant") # prints 404

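+1 for the HEAD request - no need to retrieve the entire entity for a status check. (13 upvotes)
You really should restrict that `except` block to at least `StandardError`, though, so that you don't accidentally catch things like `KeyboardInterrupt`. (7 upvotes)
I wonder whether HEAD requests are reliable. A website might not have implemented the HEAD method (correctly), which could lead to status codes such as 404, 501 or 500. Or am I being paranoid? (3 upvotes)
How would you make this follow 301s? (2 upvotes)
@Blaise If a website doesn't allow HEAD requests, then performing a HEAD request *should* result in a 405 error. For an example, try running `curl -I http://www.amazon.com/`. (2 upvotes)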
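
On the question of following 301s: the function above reports whatever the first response says. Below is a minimal Python 3 sketch (httplib was renamed http.client in Python 3) that repeats the HEAD request against the Location header. The name head_status, the max_redirects and timeout parameters, and the set of redirect codes are assumptions made for illustration; they are not part of the original answer.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_status(url, max_redirects=5, timeout=10):
    """Return the status code for url, following redirects with HEAD requests.

    Returns None if the host cannot be reached. If the redirect limit is
    exhausted, the last 3xx status seen is returned.
    """
    status = None
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        try:
            conn = conn_cls(parts.netloc, timeout=timeout)
            try:
                conn.request("HEAD", path)
                response = conn.getresponse()
                status = response.status
                location = response.getheader("Location")
            finally:
                conn.close()
        except (OSError, http.client.HTTPException):
            return None
        if status not in REDIRECT_CODES or not location:
            return status
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return status
```

Calling head_status(url, max_redirects=0) gives the un-followed behaviour of the original function, so a 301 is reported as 301.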

Answer 3

Ric*_*dle 24

You should use urllib2, like this:

import urllib2
for url in ["http://entrian.com/", "http://entrian.com/does-not-exist/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

# Prints:
# 200 [from the try block]
# 404 [from the except block]

This is not a valid solution, because urllib2 will follow redirects, so you won't get any 3xx responses. (3 upvotes)
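
If the 3xx codes themselves are what you need, one possible workaround is to install a redirect handler that declines every redirect. The sketch below uses Python 3's urllib.request (the successor to urllib2); the names NoRedirect and get_status_no_redirect and the timeout parameter are assumptions for illustration, not something from the answer above.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means no follow-up request is built, so urllib
        # falls through to its default handler and raises HTTPError.
        return None


def get_status_no_redirect(url, timeout=10):
    """Return the status code of the first response, including 3xx codes."""
    # Passing a subclass replaces the default HTTPRedirectHandler.
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as e:
        # 3xx (not followed), 4xx and 5xx responses all end up here.
        return e.code
```

Note that this still issues a GET, so the body of a 200 response is transferred; connection failures propagate as urllib.error.URLError.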

Answer 4

nic*_*nor 7

For anyone reading this in the future who is using Python 3 or later, here's another piece of code to find the response code.

import urllib.request

def getResponseCode(url):
    conn = urllib.request.urlopen(url)
    return conn.getcode()

This will raise an HTTPError for status codes such as 404, 500, etc. (2 upvotes)
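
To get the code back instead of an exception, the HTTPError can be caught and its code attribute returned. This is a minimal sketch; the name get_response_code, the timeout parameter, and the choice to return None when no response arrives are assumptions for illustration.

```python
import urllib.error
import urllib.request


def get_response_code(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode()
    except urllib.error.HTTPError as e:
        # Raised for 4xx and 5xx responses; the status is on the exception.
        return e.code
    except urllib.error.URLError:
        # DNS failure, refused connection, and similar: there is no status.
        return None
```

HTTPError is a subclass of URLError, so it has to be caught first. Redirects are still followed here, so 3xx codes will not normally be reported.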

Asked:	16 years, 6 months ago
Viewed:	110995 times
Last active:	6 years, 5 months ago