Log all requests from the python-requests module

dan*_*ast 78 python logging python-requests

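I am using Python Requests. I need to debug some OAuth activity, and for that I would like it to log all requests being performed. I could get this information with ngrep, but unfortunately it is not possible to grep HTTPS connections (which OAuth requires).

How can I activate logging of all URLs (+ parameters) that Requests is accessing?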

Yoh*_*ann 103

You need to enable debugging at the httplib level (requests → urllib3 → httplib).

Here are some functions to either toggle it (..._on() and ..._off()) or have it on temporarily:

import logging
import contextlib
try:
    from http.client import HTTPConnection # py3
except ImportError:
    from httplib import HTTPConnection # py2

def debug_requests_on():
    '''Switches on logging of the requests module.'''
    HTTPConnection.debuglevel = 1

    logging.basicConfig()
    logging.getLogger().setLevel(logging.DEBUG)
    requests_log = logging.getLogger("requests.packages.urllib3")
    requests_log.setLevel(logging.DEBUG)
    requests_log.propagate = True

def debug_requests_off():
    '''Switches off logging of the requests module, might be some side-effects'''
    HTTPConnection.debuglevel = 0

    root_logger = logging.getLogger()
    root_logger.setLevel(logging.WARNING)
    root_logger.handlers = []
    requests_log = logging.getLogger("requests.packages.urllib3")
    requests_log.setLevel(logging.WARNING)
    requests_log.propagate = False

@contextlib.contextmanager
def debug_requests():
    '''Use with 'with'!'''
    debug_requests_on()
    try:
        yield
    finally:
        # Switch debugging off again even if the body of the with block raises
        debug_requests_off()

Demo use:

>>> requests.get('http://httpbin.org/')
<Response [200]>

>>> debug_requests_on()
>>> requests.get('http://httpbin.org/')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 12150
send: 'GET / HTTP/1.1\r\nHost: httpbin.org\r\nConnection: keep-alive\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nUser-Agent: python-requests/2.11.1\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx
...
<Response [200]>

>>> debug_requests_off()
>>> requests.get('http://httpbin.org/')
<Response [200]>

>>> with debug_requests():
...     requests.get('http://httpbin.org/')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org
...
<Response [200]>

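You will see the REQUEST, including HEADERS and DATA, and the RESPONSE with HEADERS but without DATA. The only thing missing is the response.body, which is not logged.

Source

  • `httplib.HTTPConnection.debuglevel = 2` will also print the POST body. (7 upvotes)
  • Is there any way to prevent the logged content from being sent to standard output? (2 upvotes)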

Mar*_*ers 74

The underlying urllib3 library logs all new connections and URLs with the logging module, but not POST bodies. For GET requests this should be enough:

import logging

logging.basicConfig(level=logging.DEBUG)

This gives you the most verbose logging option; see the logging HOWTO for more details on how to configure logging levels and destinations.

Short demo:

>>> import requests
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> r = requests.get('http://httpbin.org/get?foo=bar&baz=python')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): httpbin.org
DEBUG:requests.packages.urllib3.connectionpool:"GET /get?foo=bar&baz=python HTTP/1.1" 200 353

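The following messages are logged:

  • INFO: New connections (HTTP or HTTPS)
  • INFO: Dropped connections
  • INFO: Redirects
  • WARN: Connection pool full (if this happens often, increase the connection pool size)
  • WARN: Retrying the connection
  • DEBUG: Connection details: method, path, HTTP version, status code and response length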
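
If root-level DEBUG is too noisy, the same records can be enabled for urllib3 alone. The following is a minimal sketch rather than part of the original answer. It assumes a recent Requests release in which the logger is named `urllib3` (the demo output above comes from an older release that used `requests.packages.urllib3`), and the helper name `enable_urllib3_debug` is hypothetical.

```python
import logging


def enable_urllib3_debug(logger_name="urllib3"):
    """Show DEBUG records from urllib3 only, leaving other loggers untouched.

    `logger_name` is an assumption: older Requests releases vendored urllib3
    under "requests.packages.urllib3".
    """
    log = logging.getLogger(logger_name)
    log.setLevel(logging.DEBUG)

    # Attach a dedicated stderr handler once, so that repeated calls do not
    # duplicate every line.
    if not any(isinstance(h, logging.StreamHandler) for h in log.handlers):
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
        log.addHandler(handler)

    # Do not hand the records to the root logger's handlers as well.
    log.propagate = False
    return log


urllib3_log = enable_urllib3_debug()
```

With this in place the root logger can stay at its default WARNING level while the connection details listed above are still printed.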

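  • Strangely enough, I do not see the `access_token` in the OAuth request. Linkedin is complaining about an unauthorized request, and I want to verify whether the library I am using (`rauth` on top of `requests`) is sending that token with the request. I was expecting to see it as a query parameter, but maybe it is in the request headers? How can I force `urllib3` to show the headers too? And the request body? To keep it simple: how can I see the **FULL** request? (2 upvotes)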

for*_*stj 41

For those using Python 3+:

import requests
import logging
import http.client

http.client.HTTPConnection.debuglevel = 1

logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True


saa*_*aaj 7

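When you have a script, or even a subsystem of an application, for network protocol debugging, you want to see exactly what the request-response pairs are, including effective URLs, headers, payloads and the status. It is typically impractical to instrument individual requests all over the place. At the same time, performance considerations suggest using a single (or a few specialised) requests.Session, so the following assumes that this suggestion is followed.

requests supports so-called event hooks (as of 2.23 there is actually only the response hook). A hook is basically an event listener, and the event is emitted before control returns from requests.request. At that point both the request and the response are fully defined and can therefore be logged.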

import logging

import requests


logger = logging.getLogger('httplogger')

def logRoundtrip(response, *args, **kwargs):
    extra = {'req': response.request, 'res': response}
    logger.debug('HTTP roundtrip', extra=extra)

session = requests.Session()
session.hooks['response'].append(logRoundtrip)

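That is basically how to log all HTTP round-trips of a session.

Formatting HTTP round-trip log records

For the logging above to be useful, a specialised logging formatter can be used that understands the req and res extras on log records. It can look like this: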

import textwrap

class HttpFormatter(logging.Formatter):   

    def _formatHeaders(self, d):
        return '\n'.join(f'{k}: {v}' for k, v in d.items())

    def formatMessage(self, record):
        result = super().formatMessage(record)
        if record.name == 'httplogger':
            result += textwrap.dedent('''
                ---------------- request ----------------
                {req.method} {req.url}
                {reqhdrs}

                {req.body}
                ---------------- response ----------------
                {res.status_code} {res.reason} {res.url}
                {reshdrs}

                {res.text}
            ''').format(
                req=record.req,
                res=record.res,
                reqhdrs=self._formatHeaders(record.req.headers),
                reshdrs=self._formatHeaders(record.res.headers),
            )

        return result

formatter = HttpFormatter('{asctime} {levelname} {name} {message}', style='{')
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logging.basicConfig(level=logging.DEBUG, handlers=[handler])

Now, if you make some requests using the session, like:

session.get('https://httpbin.org/user-agent')
session.get('https://httpbin.org/status/200')

The output to stderr will look as follows.

2020-05-14 22:10:13,224 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): httpbin.org:443
2020-05-14 22:10:13,695 DEBUG urllib3.connectionpool https://httpbin.org:443 "GET /user-agent HTTP/1.1" 200 45
2020-05-14 22:10:13,698 DEBUG httplogger HTTP roundtrip
---------------- request ----------------
GET https://httpbin.org/user-agent
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive

None
---------------- response ----------------
200 OK https://httpbin.org/user-agent
Date: Thu, 14 May 2020 20:10:13 GMT
Content-Type: application/json
Content-Length: 45
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

{
  "user-agent": "python-requests/2.23.0"
}


2020-05-14 22:10:13,814 DEBUG urllib3.connectionpool https://httpbin.org:443 "GET /status/200 HTTP/1.1" 200 0
2020-05-14 22:10:13,818 DEBUG httplogger HTTP roundtrip
---------------- request ----------------
GET https://httpbin.org/status/200
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive

None
---------------- response ----------------
200 OK https://httpbin.org/status/200
Date: Thu, 14 May 2020 20:10:13 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

The GUI way

When you have a lot of queries, a simple UI and a way to filter records come in handy. I will show how to use Chronologer for that (I am its author).

First, the hook has to be rewritten to produce records that logging can serialise when sending them over the wire.
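
As a rough sketch of what such a serialisable hook could look like (an illustration under assumptions, not necessarily the author's exact code), the version below copies the interesting fields into plain dicts, so the record's extras no longer hold live request and response objects. The key names (`method`, `url`, `headers`, `body`, `code`, `reason`) are chosen for illustration only.

```python
import logging

logger = logging.getLogger('httplogger')


def logRoundtrip(response, *args, **kwargs):
    # Copy plain values out of the objects; the key names are illustrative.
    extra = {
        'req': {
            'method': response.request.method,
            'url': response.request.url,
            'headers': dict(response.request.headers),
            'body': response.request.body,
        },
        'res': {
            'code': response.status_code,
            'reason': response.reason,
            'url': response.url,
            'headers': dict(response.headers),
            'body': response.text,
        },
    }
    logger.debug('HTTP roundtrip', extra=extra)


# Registration is the same as before (requires `import requests`):
# session = requests.Session()
# session.hooks['response'].append(logRoundtrip)
```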

Second, the logging configuration has to be adapted to use logging.handlers.HTTPHandler (which Chronologer understands).
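
A hedged sketch of such a configuration follows. `logging.handlers.HTTPHandler` is part of the standard library, but the host `localhost:8080`, the endpoint path `/api/v1/record` and the `logger` user with an empty password are assumptions about a locally running Chronologer instance and should be checked against its documentation.

```python
import logging
import logging.handlers

# Assumed address and endpoint of a locally running Chronologer instance.
chrono = logging.handlers.HTTPHandler(
    'localhost:8080', '/api/v1/record', 'POST', credentials=('logger', ''))

# Keep the console output and additionally ship every record over HTTP.
handlers = [logging.StreamHandler(), chrono]
logging.basicConfig(level=logging.DEBUG, handlers=handlers)
```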

Finally, run a Chronologer instance, for example using Docker:

docker run --rm -it -p 8080:8080 -v /tmp/db \
    -e CHRONOLOGER_STORAGE_DSN=sqlite:////tmp/db/chrono.sqlite \
    -e CHRONOLOGER_SECRET=example \
    -e CHRONOLOGER_ROLES="basic-reader query-reader writer" \
    saaj/chronologer \
    python -m chronologer -e production serve -u www-data -g www-data -m

And run the requests again:

session.get('https://httpbin.org/user-agent')
session.get('https://httpbin.org/status/200')

The stream handler will produce:

DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): httpbin.org:443
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /user-agent HTTP/1.1" 200 45
DEBUG:httplogger:HTTP roundtrip
DEBUG:urllib3.connectionpool:https://httpbin.org:443 "GET /status/200 HTTP/1.1" 200 0
DEBUG:httplogger:HTTP roundtrip

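Now, if you open http://localhost:8080/ (use "logger" as the username and an empty password in the basic auth popup) and click the "Open" button, you should see something like:

[Screenshot of Chronologer]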


kla*_*hin 7

Just improving this answer.

This is how it worked for me:

import logging
import sys    
import requests
import textwrap
    
root = logging.getLogger('httplogger')


def logRoundtrip(response, *args, **kwargs):
    extra = {'req': response.request, 'res': response}
    root.debug('HTTP roundtrip', extra=extra)
    

class HttpFormatter(logging.Formatter):

    def _formatHeaders(self, d):
        return '\n'.join(f'{k}: {v}' for k, v in d.items())

    def formatMessage(self, record):
        result = super().formatMessage(record)
        if record.name == 'httplogger':
            result += textwrap.dedent('''
                ---------------- request ----------------
                {req.method} {req.url}
                {reqhdrs}

                {req.body}
                ---------------- response ----------------
                {res.status_code} {res.reason} {res.url}
                {reshdrs}

                {res.text}
            ''').format(
                req=record.req,
                res=record.res,
                reqhdrs=self._formatHeaders(record.req.headers),
                reshdrs=self._formatHeaders(record.res.headers),
            )

        return result

formatter = HttpFormatter('{asctime} {levelname} {name} {message}', style='{')
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)
root.addHandler(handler)
root.setLevel(logging.DEBUG)


session = requests.Session()
session.hooks['response'].append(logRoundtrip)
session.get('http://httpbin.org')


abu*_*lka 6

When trying to get the Python logging system (import logging) to emit low-level debug log messages, it surprised me to discover that, given:

requests --> urllib3 --> http.client.HTTPConnection

only urllib3 actually uses the Python logging system:

  • requests: no
  • http.client.HTTPConnection: no
  • urllib3: yes

Sure, you can extract debug messages from HTTPConnection by setting:

HTTPConnection.debuglevel = 1

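but these outputs are merely emitted via print statements. To prove this, simply grep the Python 3.7 client.py source code and view the print statements yourself (thanks @Yohann):

curl https://raw.githubusercontent.com/python/cpython/3.7/Lib/http/client.py | grep -A1 debuglevel

Presumably, redirecting stdout in some way might work to shoehorn stdout into the logging system and potentially capture it to, for example, a log file.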

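Choose the 'urllib3' logger, not 'requests.packages.urllib3'

To capture urllib3 debug information through the Python 3 logging system, contrary to much advice on the internet, and as @MikeSmith points out, you won't have much luck intercepting: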

log = logging.getLogger('requests.packages.urllib3')

Instead you need to use:

log = logging.getLogger('urllib3')

Debugging urllib3 to a log file

urllib3's workings can be written to a log file with the Python logging system by setting the 'urllib3' logger to DEBUG and attaching a logging.FileHandler to it.
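
A minimal sketch of that setup, using only the standard library, could look as follows. The log file location is an arbitrary choice made for this illustration, and the final `log.debug(...)` call merely stands in for the records urllib3 would emit during a real `requests.get(...)`.

```python
import logging
import os
import tempfile

log = logging.getLogger('urllib3')   # not 'requests.packages.urllib3'
log.setLevel(logging.DEBUG)          # urllib3 logs its details at DEBUG

# Write the records to a file; the path is an arbitrary choice.
log_path = os.path.join(tempfile.gettempdir(), 'requests.log')
fh = logging.FileHandler(log_path)
fh.setLevel(logging.DEBUG)
log.addHandler(fh)

# Any request made from here on is logged by urllib3, for example:
#   import requests
#   requests.get('http://httpbin.org/')
# The line below only simulates such a record so the sketch runs offline.
log.debug('Starting new HTTP connection (%d): %s:%s', 1, 'httpbin.org', 80)
```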

Result:

Starting new HTTP connection (1): httpbin.org:80
http://httpbin.org:80 "GET / HTTP/1.1" 200 3168

Enabling the HTTPConnection.debuglevel print() statements

If you set:

HTTPConnection.debuglevel = 1

you will get the print statement output with additional juicy low-level information:

send: b'GET / HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Access-Control-Allow-Credentials header: Access-Control-Allow-Origin 
header: Content-Encoding header: Content-Type header: Date header: ...

Remember this output uses print and not the Python logging system, and thus cannot be captured using a traditional logging stream or file handler (though it may be possible to capture output to a file by redirecting stdout).
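
That said, the print calls themselves can be intercepted. One workaround is to shadow `print` inside the `http.client` module so that its debug output is handed to a logger instead of stdout. This is a sketch of a monkey-patching trick, not an official API: the logger name `http.client` and the helper names are assumptions made for illustration, and the patch relies on `http.client` calling the plain built-in `print`.

```python
import http.client
import logging

# The logger name is an arbitrary choice for this sketch.
httpclient_log = logging.getLogger('http.client')


def _print_to_log(*args, **kwargs):
    # http.client calls print("send:", repr(data)) and similar; join the
    # pieces into one message and ignore print-only keywords such as `end`.
    httpclient_log.debug(' '.join(str(a) for a in args))


def httpclient_logging_on(level=logging.DEBUG):
    """Route http.client's debug prints through the logging system."""
    httpclient_log.setLevel(level)
    # A module-level name `print` takes precedence over the built-in,
    # but only for code that lives inside http.client.
    http.client.print = _print_to_log
    http.client.HTTPConnection.debuglevel = 1


def httpclient_logging_off():
    """Undo the patch and silence the debug output again."""
    http.client.HTTPConnection.debuglevel = 0
    if 'print' in vars(http.client):
        del http.client.print
```

After `httpclient_logging_on()`, the `send:`, `reply:` and `header:` lines arrive as DEBUG records on the `http.client` logger, so any stream or file handler attached to it (or to the root logger) can capture them.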

Combine the two above - maximise all possible logging to console

To maximise all possible logging, you must settle for console/stdout output with this:

import requests
import logging
from http.client import HTTPConnection  # py3

log = logging.getLogger('urllib3')
log.setLevel(logging.DEBUG)

# logging from urllib3 to console
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
log.addHandler(ch)

# print statements from `http.client.HTTPConnection` to console/stdout
HTTPConnection.debuglevel = 1

requests.get('http://httpbin.org/')

giving the full range of output:

Starting new HTTP connection (1): httpbin.org:80
send: b'GET / HTTP/1.1\r\nHost: httpbin.org\r\nUser-Agent: python-requests/2.22.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
http://httpbin.org:80 "GET / HTTP/1.1" 200 3168
header: Access-Control-Allow-Credentials header: Access-Control-Allow-Origin 
header: Content-Encoding header: ...

  • So what about redirecting the print details to a logger? (4 upvotes)