Python requests - print entire HTTP request (raw)?

hug*_*gie 170 python http python-requests

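While using the requests module, is there any way to print the raw HTTP request?

I don't want just the headers; I want to see the request line, the headers, and the content printed out. Is it possible to see what is ultimately constructed from the HTTP request?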

小智 183

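Since v1.2.3, Requests has had the PreparedRequest object. According to the documentation, "it contains the exact bytes that will be sent to the server".

One can use it to pretty-print a request, like so: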

import requests

req = requests.Request('POST','http://stackoverflow.com',headers={'X-Custom':'Test'},data='a=1&b=2')
prepared = req.prepare()

def pretty_print_POST(req):
    """
    At this point it is completely built and ready
    to be fired; it is "prepared".

    However pay attention at the formatting used in 
    this function because it is programmed to be pretty 
    printed and may differ from the actual request.
    """
    print('{}\n{}\r\n{}\r\n\r\n{}'.format(
        '-----------START-----------',
        req.method + ' ' + req.url,
        '\r\n'.join('{}: {}'.format(k, v) for k, v in req.headers.items()),
        req.body,
    ))

pretty_print_POST(prepared)

which produces:

-----------START-----------
POST http://stackoverflow.com/
Content-Length: 7
X-Custom: Test

a=1&b=2

Then you can send the actual request with this:

s = requests.Session()
s.send(prepared)

These links point to the latest documentation available, so their content may change: Advanced - Prepared requests and API - Lower level classes
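
One detail worth knowing when printing a prepared request: `req.prepare()` only sees what was passed to `Request`, while `Session.prepare_request()` also merges in session-level state such as the default headers. The sketch below compares the two for the request used above; the variable names `bare`, `merged`, and `extra` are only illustrative, and nothing is sent over the network.

```python
import requests

req = requests.Request(
    'POST',
    'http://stackoverflow.com',
    headers={'X-Custom': 'Test'},
    data='a=1&b=2',
)

session = requests.Session()

# Prepared in isolation: only the headers given above plus Content-Length.
bare = req.prepare()

# Prepared through the session: session defaults (User-Agent, Accept, ...)
# and any session cookies are merged in as well.
merged = session.prepare_request(req)

# Header names that only appear once the session is involved.
extra = sorted(set(merged.headers) - set(bare.headers))
print(extra)
```

If the request will eventually go out through a session, printing the session-prepared object is probably closer to what actually gets sent.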

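  • If you use the simple `response = requests.post(...)` (or `requests.get`, `requests.put`, etc.) methods, you can actually get the `PreparedRequest` through `response.request`. That saves the work of manually manipulating `requests.Request` and `requests.Session`, if you don't need to access the raw HTTP data before you receive a response. (47 upvotes)
  • What about the HTTP protocol version part that comes right after the URL, like "HTTP/1.1"? It is not there when printing with your pretty printer. (4 upvotes)
  • This is much more robust than my monkey-patching method. Upgrading `requests` is straightforward, so I think this should become the accepted answer. (2 upvotes)
  • Good answer. One thing you may want to update, though, is that line breaks in HTTP should be \r\n, not just \n. (2 upvotes)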

gon*_*opp 42

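Note: this answer is outdated. Newer versions of requests support getting the request content directly, as AntonioHerraizS's answer documents.

It is not possible to get the true raw content of a request out of requests, since it only deals with higher-level objects such as headers and method type. requests uses urllib3 to send requests, but urllib3 does not deal with raw data either; it uses httplib. Here is a representative stack trace of a request: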

-> r= requests.get("http://google.com")
  /usr/local/lib/python2.7/dist-packages/requests/api.py(55)get()
-> return request('get', url, **kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/api.py(44)request()
-> return session.request(method=method, url=url, **kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/sessions.py(382)request()
-> resp = self.send(prep, **send_kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/sessions.py(485)send()
-> r = adapter.send(request, **kwargs)
  /usr/local/lib/python2.7/dist-packages/requests/adapters.py(324)send()
-> timeout=timeout
  /usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py(478)urlopen()
-> body=body, headers=headers)
  /usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py(285)_make_request()
-> conn.request(method, url, **httplib_request_kw)
  /usr/lib/python2.7/httplib.py(958)request()
-> self._send_request(method, url, body, headers)

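Inside the httplib machinery, we can see that HTTPConnection._send_request indirectly uses HTTPConnection._send_output, which finally creates the raw request and body (if it exists) and uses HTTPConnection.send to send them separately. send finally reaches the socket.

Since there are no hooks for doing what you want, as a last resort you can monkey-patch httplib to get the content. It is a fragile solution, and you may need to adapt it if httplib changes. If you intend to distribute software using this solution, you may want to consider packaging httplib instead of using the system's copy, which is easy, since it is a pure Python module.

Alas, without further ado, the solution: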

import requests
import httplib  # Python 2 module; it was renamed http.client in Python 3

def patch_send():
    old_send = httplib.HTTPConnection.send
    def new_send(self, data):
        print data  # Python 2 print statement
        # return is not necessary, but never hurts, in case the library is changed
        return old_send(self, data)
    httplib.HTTPConnection.send = new_send

patch_send()
requests.get("http://www.python.org")

which yields the output:

GET / HTTP/1.1
Host: www.python.org
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/2.1.0 CPython/2.7.3 Linux/3.2.0-23-generic-pae
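
On Python 3 the same module is called http.client, so the patch above would need adapting. What follows is a minimal sketch of that adaptation, not a tested drop-in: the `log` parameter and the returned `restore` function are additions made here for illustration, and it assumes the installed urllib3 still sends through `http.client.HTTPConnection.send`.

```python
import http.client


def patch_send(log=print):
    """Wrap HTTPConnection.send so every outgoing chunk is passed to `log`.

    Returns a function that puts the original method back.
    """
    old_send = http.client.HTTPConnection.send

    def new_send(self, data):
        # `data` is usually bytes (request line plus headers, then the body),
        # but http.client also accepts file-like objects and iterables here.
        log(data)
        return old_send(self, data)

    http.client.HTTPConnection.send = new_send

    def restore():
        http.client.HTTPConnection.send = old_send

    return restore
```

Calling `restore = patch_send()` before `requests.get(...)` and `restore()` afterwards keeps the patch from leaking into the rest of the program. The same fragility warning applies as for the Python 2 version.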


Emi*_*röm 36

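A much better idea is to use the requests_toolbelt library, which can dump both requests and responses as strings for you to print to the console. It handles all the tricky cases with files and encodings that the above solution does not handle well.

It's as easy as this: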

import requests
from requests_toolbelt.utils import dump

resp = requests.get('https://httpbin.org/redirect/5')
data = dump.dump_all(resp)
print(data.decode('utf-8'))

Source: https://toolbelt.readthedocs.org/en/latest/dumputils.html

You can install it simply by typing:

pip install requests_toolbelt

  • This doesn't seem to be able to dump the request without sending it, though. (2 upvotes)

Pay*_*man 33

import requests
response = requests.post('http://httpbin.org/post', data={'key1':'value1'})
print(response.request.body)
print(response.request.headers)

I am using requests version 2.18.4 and Python 3

  • This will not work if the request raises an exception. (3 upvotes)
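
One possible way around the exception problem raised in the comment is to prepare the request first and send it inside a `try` block, so the prepared request is still at hand when sending fails. This is a rough sketch; `send_and_keep_request` is a hypothetical helper name, not part of requests.

```python
import requests


def send_and_keep_request(session, request, **send_kwargs):
    """Return (prepared_request, response, error); exactly one of the last two is None."""
    prepared = session.prepare_request(request)
    try:
        response = session.send(prepared, **send_kwargs)
    except requests.RequestException as exc:
        # Sending failed, but the prepared request can still be inspected.
        return prepared, None, exc
    return prepared, response, None
```

After calling it, `prepared.body` and `prepared.headers` can be printed whether or not a response came back.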

saa*_*aaj 9

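requests supports so-called event hooks (as of 2.23 there is actually only the response hook). The hook can be used on a request to print the full request-response pair's data, including the effective URL, headers, and bodies, like: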

import textwrap
import requests

def print_roundtrip(response, *args, **kwargs):
    format_headers = lambda d: '\n'.join(f'{k}: {v}' for k, v in d.items())
    print(textwrap.dedent('''
        ---------------- request ----------------
        {req.method} {req.url}
        {reqhdrs}

        {req.body}
        ---------------- response ----------------
        {res.status_code} {res.reason} {res.url}
        {reshdrs}

        {res.text}
    ''').format(
        req=response.request, 
        res=response, 
        reqhdrs=format_headers(response.request.headers), 
        reshdrs=format_headers(response.headers), 
    ))

requests.get('https://httpbin.org/', hooks={'response': print_roundtrip})

Running it prints:

---------------- request ----------------
GET https://httpbin.org/
User-Agent: python-requests/2.23.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive

None
---------------- response ----------------
200 OK https://httpbin.org/
Date: Thu, 14 May 2020 17:16:13 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 9593
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

<!DOCTYPE html>
<html lang="en">
...
</html>

You may want to change res.text to res.content if the response is binary.
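
If every request made through a session should be printed, the hook can presumably be registered once on the session instead of per call, since `Session.hooks` holds a list under the `'response'` key. A small sketch follows; `print_request_line` and `make_logging_session` are illustrative names chosen here.

```python
import requests


def print_request_line(response, *args, **kwargs):
    """Response hook: print a one-line summary of the request that was sent."""
    req = response.request
    print(f'{req.method} {req.url} -> {response.status_code}')


def make_logging_session(hook=print_request_line):
    """Create a Session whose responses all pass through `hook`."""
    session = requests.Session()
    # Session-level hooks are merged into each request prepared by the session.
    session.hooks['response'].append(hook)
    return session
```

A fuller hook function could be passed as `hook` to get complete request and response dumps for each call made with the returned session.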


小智 6

Here is some code that does the same as above, but with the response headers:

import socket

# Python 2 only: socket._fileobject does not exist in Python 3
def patch_requests():
    old_readline = socket._fileobject.readline
    if not hasattr(old_readline, 'patched'):
        def new_readline(self, size=-1):
            res = old_readline(self, size)
            print res,  # Python 2 print statement; trailing comma suppresses the newline
            return res
        new_readline.patched = True
        socket._fileobject.readline = new_readline
patch_requests()

I spent a lot of time searching for this, so I'm leaving it here in case someone needs it.
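
On Python 3 a similar effect may be available without patching anything, through the debug level built into http.client: with a level above zero it prints `send:` lines for outgoing data and `reply:` / `header:` lines for the response status line and headers. The sketch below assumes the connections requests ends up using inherit this class attribute; `set_http_debug` is just a convenience name chosen here.

```python
import http.client


def set_http_debug(level=1):
    """Set the class-wide debug level read by http.client connections.

    Any level above 0 makes http.client print what it sends and the
    response headers it receives to stdout; 0 turns the output off again.
    """
    http.client.HTTPConnection.debuglevel = level
```

Call `set_http_debug(1)` before making requests and `set_http_debug(0)` to silence it. The output uses `repr()`, so bytes appear as `b'...'` with escaped line breaks.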


Jos*_*ush 6

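A fork of @AntonioHerraizS's answer (the HTTP version is missing, as stated in the comments)

Use this code to get a string representing the raw HTTP packet without sending it: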

import requests


def get_raw_request(request):
    request = request.prepare() if isinstance(request, requests.Request) else request
    headers = '\r\n'.join(f'{k}: {v}' for k, v in request.headers.items())
    body = '' if request.body is None else request.body.decode() if isinstance(request.body, bytes) else request.body
    return f'{request.method} {request.path_url} HTTP/1.1\r\n{headers}\r\n\r\n{body}'


headers = {'User-Agent': 'Test'}
request = requests.Request('POST', 'https://stackoverflow.com', headers=headers, json={"hello": "world"})
raw_request = get_raw_request(request)
print(raw_request)

Result:

POST / HTTP/1.1
User-Agent: Test
Content-Length: 18
Content-Type: application/json

{"hello": "world"}

You can also print the request from the response object:

r = requests.get('https://stackoverflow.com')
raw_request = get_raw_request(r.request)
print(raw_request)
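
Note that the string built this way has no Host header, because requests leaves that header to the lower layers unless it is set explicitly. A variation that fills it in from the URL might look like the sketch below. `get_raw_request_with_host` is a hypothetical name, the HTTP/1.1 version is still hard-coded, and the result remains an approximation of the bytes on the wire.

```python
from urllib.parse import urlsplit

import requests


def get_raw_request_with_host(request):
    """Approximate the raw HTTP request text, adding a Host header if absent."""
    prepared = request.prepare() if isinstance(request, requests.Request) else request

    header_lines = [f'{k}: {v}' for k, v in prepared.headers.items()]
    if 'Host' not in prepared.headers:
        # Derive the Host value from the URL (simplified: no IPv6 bracket handling).
        parts = urlsplit(prepared.url)
        host = parts.hostname or ''
        if parts.port:
            host = f'{host}:{parts.port}'
        header_lines.insert(0, f'Host: {host}')

    body = prepared.body
    if body is None:
        body = ''
    elif isinstance(body, bytes):
        # Binary bodies are decoded leniently so the dump never raises.
        body = body.decode('utf-8', errors='replace')

    headers = '\r\n'.join(header_lines)
    return f'{prepared.method} {prepared.path_url} HTTP/1.1\r\n{headers}\r\n\r\n{body}'
```

It accepts either a `requests.Request` or an already prepared request such as `r.request`.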