Correct way to try/except using the Python requests module?

Joh*_*ith 345 python request python-requests

try:
    r = requests.get(url, params={'s': thing})
except requests.ConnectionError, e:
    print e #should I also sys.exit(1) after this?

Is this correct? Is there a better way to structure this? Will this cover all my bases?

Jon*_*art 667

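Have a look at the Requests exceptions docs. In short:

In the event of a network problem (e.g. DNS failure, refused connection, etc.), Requests will raise a ConnectionError exception.

In the event of the rare invalid HTTP response, Requests will raise an HTTPError exception.

If a request times out, a Timeout exception is raised.

If a request exceeds the configured number of maximum redirections, a TooManyRedirects exception is raised.

All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.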
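
The hierarchy quoted above can be checked without making any network call. The sketch below is only an illustration and assumes a reasonably recent requests release; the names `documented` and `covered_by_base` are made up for this example.

```python
import requests

# The four exceptions quoted from the docs above.
documented = [
    requests.exceptions.ConnectionError,
    requests.exceptions.HTTPError,
    requests.exceptions.Timeout,
    requests.exceptions.TooManyRedirects,
]

# True for each class that an `except RequestException` clause would catch.
covered_by_base = {
    exc.__name__: issubclass(exc, requests.exceptions.RequestException)
    for exc in documented
}
print(covered_by_base)

# A read timeout is a Timeout but not a ConnectionError, so a handler that
# only names ConnectionError would let it escape.
read_timeout_is_connection_error = issubclass(
    requests.exceptions.ReadTimeout, requests.exceptions.ConnectionError
)
print(read_timeout_is_connection_error)
```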

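To answer your question, what you show will not cover all of your bases. You'll only catch connection-related errors, not, for instance, requests that time out while waiting for the response.

What to do when you catch the exception is really up to the design of your script/program. Is it acceptable to exit? Can you go on and try again? If the error is catastrophic and you can't go on, then yes, a call to sys.exit() is in order.

You can either catch the base-class exception, which will handle all cases: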

import sys

import requests

# `url` and `thing` are the values from the question
try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.RequestException as e:  # This is the correct syntax
    print(e)
    sys.exit(1)

Or you can catch them separately and do different things.

try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.Timeout:
    # Maybe set up for a retry, or continue in a retry loop
    pass
except requests.exceptions.TooManyRedirects:
    # Tell the user their URL was bad and try a different one
    pass
except requests.exceptions.RequestException as e:
    # catastrophic error. bail.
    print(e)
    sys.exit(1)
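
The "retry loop" idea in the Timeout handler could be sketched roughly as follows. This is one possible shape, not part of the original answer: the helper name `get_with_retries`, its `attempts` and `backoff` parameters, and the injectable `get` argument (there so the loop can be exercised without a network) are all assumptions made for this example.

```python
import time

import requests


def get_with_retries(url, attempts=3, backoff=0.5, get=requests.get, **kwargs):
    """Retry on Timeout only; any other RequestException propagates at once.

    Hypothetical helper: `attempts`, `backoff` and `get` are illustrative.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    kwargs.setdefault("timeout", 5)  # without a timeout, requests can wait forever
    for attempt in range(1, attempts + 1):
        try:
            return get(url, **kwargs)
        except requests.exceptions.Timeout:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(backoff * attempt)  # simple linear back-off between tries
```

A caller would still wrap `get_with_retries(...)` in `except requests.exceptions.RequestException` to handle the final failure, exactly as in the snippets above.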

As Christian pointed out:

If you want HTTP errors (e.g. 401 Unauthorized) to raise exceptions, you can call Response.raise_for_status. That will raise an HTTPError if the response was an HTTP error.

An example:

try:
    r = requests.get('http://www.google.com/nothere')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    print(err)
    sys.exit(1)

Will print:

404 Client Error: Not Found for url: http://www.google.com/nothere

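  • Note to future comment readers: this (the `socket.timeout` issue mentioned in another comment here) was fixed in Requests 2.9 (which bundles urllib3 1.13) (13 upvotes)
  • If you want HTTP errors (e.g. 401 Unauthorized) to raise exceptions, you can call [Response.raise_for_status](http://docs.python-requests.org/en/latest/api/#requests.Response.raise_for_status). That will raise an HTTPError if the response was an HTTP error. (13 upvotes)
  • Note that due to a bug in the underlying urllib3 library, you also need to catch `socket.timeout` exceptions if you are using a timeout: https://github.com/kennethreitz/requests/issues/1236 (10 upvotes)
  • Very good answer for dealing with the specifics of the requests library, and also for exception catching in general. (7 upvotes)
  • The list of exceptions on the [Requests website](http://docs.python-requests.org/en/latest/api/#exceptions) isn't complete. You can read the full list [here](https://github.com/kennethreitz/requests/blob/master/requests/exceptions.py). (5 upvotes)
  • Thanks @ChristianLong. I've added that excellent information to my answer. (2 upvotes)
  • Great answer! If you use the `urllib3` feature that retries on certain return codes, another possibility is `requests.exceptions.RetryError`. To get retries but avoid having to catch the retry failure, see /sf/ask/3043681581/ (2 upvotes)
  • @PirateApp Good question. It's up to you, but I would wrap and re-raise the requests exceptions. Libraries shouldn't write diagnostic messages to stdout/stderr. Take a look at some other API wrappers, such as python-gitlab. (2 upvotes)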
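
For the `urllib3` retry feature mentioned in the comments, a minimal sketch might look like the following. The helper name `make_retrying_session` and its default values are assumptions for illustration; `Retry` and `HTTPAdapter` are the urllib3 and requests classes. With this setup, a request that keeps receiving one of the listed status codes should eventually surface as `requests.exceptions.RetryError`, so the calling code still needs an `except` clause for it (or for `RequestException`).

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=0.5,
                          status_forcelist=(500, 502, 503, 504)):
    """Build a Session that retries the listed status codes (hypothetical helper)."""
    retry = Retry(
        total=total,                        # maximum number of retries per request
        backoff_factor=backoff_factor,      # controls the sleep between retries
        status_forcelist=status_forcelist,  # HTTP status codes that trigger a retry
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

Usage would then be `session = make_retrying_session()` followed by `session.get(url, timeout=...)` inside the same try/except structure shown in the answer.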

jou*_*ell 60

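One additional suggestion, to be explicit. It seems best to go from specific to general down the stack of except clauses so that the desired error gets caught and the specific ones are not masked by the general one.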

url='http://www.google.com/blahblah'

try:
    r = requests.get(url,timeout=3)
    r.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
    print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
    print ("Timeout Error:",errt)
except requests.exceptions.RequestException as err:
    print ("OOps: Something Else",err)

Http Error: 404 Client Error: Not Found for url: http://www.google.com/blahblah

VS

url='http://www.google.com/blahblah'

try:
    r = requests.get(url,timeout=3)
    r.raise_for_status()
except requests.exceptions.RequestException as err:
    print ("OOps: Something Else",err)
except requests.exceptions.HTTPError as errh:
    print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
    print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
    print ("Timeout Error:",errt)     

OOps: Something Else 404 Client Error: Not Found for url: http://www.google.com/blahblah
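
The same ordering effect can be reproduced without any network access by raising the exceptions directly. The `classify` helper below is purely illustrative. It also shows a subtlety: in requests, `ConnectTimeout` derives from both `ConnectionError` and `Timeout`, so with the clause order used above a connect timeout lands in the ConnectionError handler, while a read timeout reaches the Timeout handler.

```python
import requests


def classify(exc):
    """Return the label of the first except clause that matches (illustrative helper)."""
    try:
        raise exc
    except requests.exceptions.HTTPError:
        return "http"
    except requests.exceptions.ConnectionError:
        return "connection"
    except requests.exceptions.Timeout:
        return "timeout"
    except requests.exceptions.RequestException:
        return "other"


print(classify(requests.exceptions.ReadTimeout()))       # timeout
print(classify(requests.exceptions.ConnectTimeout()))    # connection
print(classify(requests.exceptions.TooManyRedirects()))  # other
```

If timeouts should always be handled together, putting the `Timeout` clause before the `ConnectionError` clause would be one way to arrange that.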

  • Is this valid syntax for POST requests too? (2 upvotes)

tsh*_*tsh 29

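The exception object also contains the original response as e.response, which can be useful if you need to see the error body in the server's response. For example: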

try:
    # the URL needs a scheme, otherwise requests raises MissingSchema before sending anything
    r = requests.post('https://somerestapi.com/post-here', data={'birthday': '9/9/3999'})
    r.raise_for_status()
except requests.exceptions.HTTPError as e:
    print(e.response.text)
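
One caveat when generalising this: for errors where no response was ever received (a refused connection, for example), `e.response` is None. A small guard such as the sketch below keeps a shared handler from failing on `e.response.text`. The helper name `error_details` is an assumption for illustration.

```python
import requests


def error_details(exc):
    """Return (status_code, body) for a requests exception, or (None, None).

    Hypothetical helper: covers exceptions that carry no response at all.
    """
    response = getattr(exc, "response", None)
    if response is None:
        return (None, None)
    return (response.status_code, response.text)
```

Inside `except requests.exceptions.RequestException as e:` one could then write `status, body = error_details(e)` and log both values.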


mik*_*ent 8

Here is a generic way to do things, which at least means that you don't have to surround each and every requests call with try ... except:

Basic version

import logging
import traceback

import requests

logger = logging.getLogger(__name__)  # assumes logging is configured elsewhere

def requests_call(method, url, **kwargs):
    # see the docs: if you set no timeout the call never times out! A tuple means
    # "max connect time" and "max read time"
    DEFAULT_REQUESTS_TIMEOUT = (5, 15)  # for example
    # but the coder can specify their own of course:
    if 'timeout' not in kwargs:
        kwargs['timeout'] = DEFAULT_REQUESTS_TIMEOUT
    try:
        response = requests.request(method, url, **kwargs)
    except Exception as exception:  # not BaseException: let Ctrl-C / SystemExit through
        # anticipate giant data string: curtail for logging purposes
        # (str() so that dict or bytes payloads can be sliced too)
        if 'data' in kwargs and len(str(kwargs['data'])) > 500:
            kwargs['data'] = f'{str(kwargs["data"])[:500]}...'
        logger.exception(f'method |{method}|\nurl {url}\nkwargs {kwargs}')
        # stack leading up to this function call (the last frame is this function)
        raw_tb = traceback.extract_stack()
        msg = 'Stack trace:\n' + ''.join(traceback.format_list(raw_tb[:-1]))
        logger.error(msg)
        return (False, exception)
    return (True, response)

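NB

  1. Beware of ConnectionError, which is a builtin and has nothing to do with the class requests.ConnectionError*. I assume the latter is more common in this context, but I have no real idea...
  2. When examining a non-None returned exception: according to the docs, the superclass of all the requests exceptions (including requests.ConnectionError) is requests.RequestException, not "requests.exceptions.RequestException". Maybe it has changed since the accepted answer.**
  3. Obviously this assumes a logger has been configured.
  4. It is necessary to include both logger.exception() and logger.error() in the except block; the latter prints the call stack leading up to this function. logger.exception() shows the call stack from here to the point of failure, but tells you nothing about the call stack that led to this function being called.

*I looked at the source code: requests.ConnectionError subclasses the single class requests.RequestException, which subclasses the single class IOError (builtin).

**However, at the time of writing (2022-02), you can find "requests.exceptions.RequestException" at the bottom of this page... but it links to the above page: confusing.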
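
Regarding notes 1 and 2, a quick check along the following lines (assuming a recent requests release on Python 3) suggests that the two spellings refer to the same class object, since the top-level `requests` package re-exports the classes defined in `requests.exceptions`, and that the builtin ConnectionError is indeed a separate class. The variable names are made up for this example.

```python
import requests

# The top-level names are re-exports of the classes in requests.exceptions.
same_base = requests.RequestException is requests.exceptions.RequestException
same_conn = requests.ConnectionError is requests.exceptions.ConnectionError

# The builtin ConnectionError is a different class; the two only share
# OSError as a common ancestor (IOError is an alias of OSError in Python 3).
unrelated = not issubclass(requests.ConnectionError, ConnectionError)
shares_oserror = (issubclass(requests.ConnectionError, OSError)
                  and issubclass(ConnectionError, OSError))

print(same_base, same_conn, unrelated, shares_oserror)
```

In practice this means `except ConnectionError:` (the builtin) will not catch a requests connection failure; the clause has to name `requests.ConnectionError` or `requests.exceptions.ConnectionError`.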


Usage is very simple:

success, deliverable = requests_call('get',
    f'http://localhost:9200/my_index/_search?q={search_string}')

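First you check success: if it is False, something funny has happened, and deliverable will be an exception, which has to be acted upon in some way depending on the context. If success is True, then deliverable will be a Response object.

Advanced version, when a JSON object is expected to come back

(... potentially saving a lot of boilerplate!)

A couple of things are cumbersome about the above: 1) how do you stipulate that only one or more specific status codes are acceptable? 2) how do you specify that the returned JSON dictionary must contain a certain structure of keys, perhaps with subdictionaries? With Elasticsearch, for example, you often get JSON objects back, and it is tedious to check that all the keys are actually present before getting their values.

So this builds on the simple function above.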

import json

def process_json_request(url, method='get', ok_status_codes=200, required_dict=None, **kwargs):
    if required_dict is not None and not isinstance(required_dict, dict):
        raise TypeError(f'required_dict must be None or dict, not {type(required_dict)} ({required_dict})')
    # NB `ok_status_codes` can either be a list or an int (or None: "any status code acceptable"!)
    ok_codes_int = isinstance(ok_status_codes, int)
    ok_codes_list = isinstance(ok_status_codes, list)
    if ok_status_codes is not None and not ok_codes_int and not ok_codes_list:
        raise TypeError('ok_status_codes must be None, list or int, '
                        f'not {type(ok_status_codes)} ({ok_status_codes})')
    success, deliverable = requests_call(method, url, **kwargs)
    if not success:
        deliverable.failure_reason = 'requests_call returned False: deliverable is Exception'
        deliverable.failure_code = 1
        return (False, deliverable)
    response = deliverable
    if ok_status_codes is not None and (
            (ok_codes_list and response.status_code not in ok_status_codes)
            or (ok_codes_int and response.status_code != ok_status_codes)):
        response.failure_reason = f'unacceptable status code: {response.status_code}'
        response.failure_code = 2
        return (False, response)
    try:
        delivered_json_dict = response.json()
    except requests.exceptions.JSONDecodeError:  # needs requests 2.27 or later
        response.failure_reason = 'Response body did not contain valid json'
        response.failure_code = 3
        return (False, response)
    def check_dictionary_key_values(required_dict, comparison_dict):
        all_checks_passed = True
        for key, value in required_dict.items():
            if key not in comparison_dict:
                logger.error(f'key absent: {key}')
                all_checks_passed = False
                break
            if isinstance(value, dict):
                sub_comparison_dict = comparison_dict[key]
                if not isinstance(sub_comparison_dict, dict):
                    logger.error(f'key which yields subdictionary in required does not in comparison: {sub_comparison_dict}')
                    all_checks_passed = False
                    break
                if not check_dictionary_key_values(value, sub_comparison_dict):
                    all_checks_passed = False
                    break
            # if a value of "None" is given for a key this means "can be anything"
            elif value is not None and comparison_dict[key] != value:
                logger.error(f'key {key} was found as expected but value {value} was not found, instead: {comparison_dict[key]}')
                all_checks_passed = False
                break
        return all_checks_passed
    # with required_dict=None there is nothing to check (calling .items() on None would fail)
    if required_dict is not None and not check_dictionary_key_values(required_dict, delivered_json_dict):
        response.failure_reason = (f'delivered JSON\n{json.dumps(delivered_json_dict, indent=2)}\n'
                                   f' did not satisfy required_dict\n{json.dumps(required_dict, indent=2)}')
        response.failure_code = 4
        return (False, response)
    return (True, response)

Usage example 1:

required_dict = {
    'found': True,
    '_source': {
        'es_version': None,
    },
    # 'cats_and_dogs': 'shanty town',
}
success, deliverable = process_json_request(f'{url_for_specific_index}/_doc/1', required_dict=required_dict)    
if not success:    
    logger.error(f'failed to get Ldoc 1 from index... deliverable.failure_reason:\n{deliverable.failure_reason}')
    ...
returned_dict = deliverable.json()
es_version_from_status_doc = returned_dict['_source']['es_version']

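If I uncomment the "cats_and_dogs" line, it returns success == False, because this supposedly required key is missing. Conversely, if the required_dict checks pass, you can be sure that "_source" and "es_version" will not produce nasty KeyErrors. You also know that the key "found" has the value True.

NB the keys and values stipulated by required_dict can be nested dicts to any depth; a value of None means "the value can be anything, but at least check that the key is present at this location in the delivered dictionary".
It would also be possible to make the code handle lists of dicts in the recursive function check_dictionary_key_values, but that would make things a bit too complicated to include here, and it is arguably a very marginal requirement.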
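
For readers who do need the list case, one possible extension is sketched below as a standalone function. It is not part of the answer's code: the name `matches_structure` and the convention that a one-element list acts as a template for every item of the delivered list are assumptions made for this example, and the logging of the original is left out for brevity.

```python
def matches_structure(required, delivered):
    """Return True if `delivered` satisfies `required` (hypothetical helper).

    None in `required` means "any value"; a dict is checked key by key; a
    one-element list is a template that every delivered list item must satisfy.
    """
    if required is None:
        return True
    if isinstance(required, dict):
        return (isinstance(delivered, dict)
                and all(key in delivered and matches_structure(value, delivered[key])
                        for key, value in required.items()))
    if isinstance(required, list):
        if not isinstance(delivered, list):
            return False
        if not required:
            return True  # empty template: any list is acceptable
        return all(matches_structure(required[0], item) for item in delivered)
    return required == delivered


required = {'found': True, 'hits': [{'_source': {'es_version': None}}]}
good = {'found': True, 'hits': [{'_source': {'es_version': '8.1'}}], 'extra': 1}
bad = {'found': True, 'hits': [{'_source': {}}]}
print(matches_structure(required, good), matches_structure(required, bad))
```

Extra keys in the delivered data are ignored, which mirrors the behaviour of the dict-only check above.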

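Usage example 2:

This uses a requests method other than "get", and because it creates a new resource, the status code should be 201, not 200. NB the parameter ok_status_codes can be an int or a list (or in fact None, meaning "any status code").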

data = {...}
headers = {'Content-type': 'application/json'}
success, deliverable = process_json_request(f'{ES_URL}{INDEX_NAME}/_doc/1', 
    'put', data=json.dumps(data), headers=headers, ok_status_codes=201)
if not success:    
    logger.error(f'... {deliverable.failure_reason}')
    ....