What is the meaning of pool_connections in requests.adapters.HTTPAdapter?

Question

lai*_*e9m 11 python urllib3 python-requests

When a requests Session is initialized, two HTTPAdapter instances are created and mounted to http and https.

This is how HTTPAdapter is defined:

class requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10,
                                    max_retries=0, pool_block=False)

While I understand the meaning of pool_maxsize (the number of connections a pool can save), I don't understand what pool_connections means or what it does. The doc says:

Parameters: 
pool_connections – The number of urllib3 connection pools to cache.

But what does "cache" mean here? And what is the point of using multiple connection pools?

Answer 1

lai*_*e9m 35

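I wrote an article about this. Pasting it here:

Requests' secret: pool_connections and pool_maxsize

Requests is one of the best-known third-party libraries among Python programmers, if not the best-known. With its simple API and high performance, people tend to use requests rather than the urllib2 that the standard library provides for HTTP requests. However, people who use requests every day may not know its internals, and today I want to introduce two of them: pool_connections and pool_maxsize.

Let's start with Session: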

import requests

s = requests.Session()
s.get('https://www.google.com')

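Pretty simple. You probably know that requests' Session can persist cookies. Cool. But do you know that Session has a mount method?

mount(prefix, adapter)
Registers a connection adapter to a prefix.
Adapters are sorted in descending order by key length.

No? Well, in fact you have already used this method, because it runs when you initialize a Session object: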

class Session(SessionRedirectMixin):

    def __init__(self):
        ...
        # Default connection adapters.
        self.adapters = OrderedDict()
        self.mount('https://', HTTPAdapter())
        self.mount('http://', HTTPAdapter())

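Now comes the interesting part. If you have read Ian Cordasco's article Retries in Requests, you should know that HTTPAdapter can be used to provide retry functionality. But what is an HTTPAdapter really? Quoted from the doc: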

class requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False)

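The built-in HTTP Adapter for urllib3.

Provides a general-case interface for Requests sessions to contact HTTP and HTTPS urls by implementing the Transport Adapter interface. This class will usually be created by the Session class under the covers.

Parameters:
* pool_connections - The number of urllib3 connection pools to cache.
* pool_maxsize - The maximum number of connections to save in the pool.
* max_retries (int) - The maximum number of retries each connection should attempt. Note, this applies only to failed DNS lookups, socket connections and connection timeouts, never to requests where data has made it to the server. By default, Requests does not retry failed connections. If you need granular control over the conditions under which we retry a request, import urllib3's Retry class and pass that instead.
* pool_block - Whether the connection pool should block for connections.

Usage: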

>>> import requests
>>> s = requests.Session()
>>> a = requests.adapters.HTTPAdapter(max_retries=3)
>>> s.mount('http://', a)

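If the above documentation confuses you, here is my explanation: all an HTTPAdapter does is provide different configurations for different requests, chosen by target URL. Remember the code above?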

self.mount('https://', HTTPAdapter())
self.mount('http://', HTTPAdapter())

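It creates two HTTPAdapter objects with the default arguments pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False, and mounts them to https:// and http:// respectively. That means the configuration of the first HTTPAdapter() is used when you send a request to https://xxx, and the second HTTPAdapter() is used for requests to http://xxx. Although the two configurations are identical here, requests to http and https are still handled separately. We'll see what that means later.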
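
To see this routing for yourself, here is a small sketch that sends nothing over the network. It assumes only that Session.get_adapter(url) returns the adapter whose prefix matches the URL; the pool sizes and the example.com URLs are arbitrary values picked for illustration.

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()

# Two deliberately different configurations, one per scheme.
https_adapter = HTTPAdapter(pool_connections=5, pool_maxsize=5)
http_adapter = HTTPAdapter(pool_connections=1, pool_maxsize=1)
s.mount('https://', https_adapter)
s.mount('http://', http_adapter)

# get_adapter() selects an adapter by URL prefix; no request is sent here.
picked_for_https = s.get_adapter('https://www.example.com/a')
picked_for_http = s.get_adapter('http://www.example.com/a')

print(picked_for_https is https_adapter)
print(picked_for_http is http_adapter)
```

Each adapter owns its own pools, which is why http and https traffic never share connections even when the two adapters are configured identically.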

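As I said, the main purpose of this article is to explain pool_connections and pool_maxsize.

First let's look at pool_connections. Yesterday I raised a question on stackoverflow because I wasn't sure my understanding was correct, and the answer removed my uncertainty. HTTP, as we all know, is based on the TCP protocol. An HTTP connection is also a TCP connection, which is identified by a tuple of five values: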

(<protocol>, <src addr>, <src port>, <dest addr>, <dest port>)

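Say you have established an HTTP/TCP connection with www.example.com. Assuming the server supports Keep-Alive, the next time you send a request to www.example.com/a or www.example.com/b you can use the same connection, because none of the five values change. In fact, requests' Session does this for you automatically and will reuse connections whenever it can.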
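
As a rough illustration of why /a and /b can share a connection, the sketch below derives the destination side of that tuple from a URL. endpoint_key and DEFAULT_PORTS are hypothetical helpers written for this example, not part of requests or urllib3, and the key urllib3 really uses for its pools contains more fields than these three.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URL does not name one.
DEFAULT_PORTS = {'http': 80, 'https': 443}


def endpoint_key(url):
    """Return (scheme, host, port): the part of a URL that decides reuse."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


a = endpoint_key('https://www.example.com/a')
b = endpoint_key('https://www.example.com/b')
c = endpoint_key('http://www.example.com/a')

print(a == b)  # the path differs, the endpoint does not
print(a == c)  # a different scheme means a different port and endpoint
```

The path never enters the key, so any number of URLs on one host can travel over one kept-alive connection.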

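The question is, what determines whether you can reuse an old connection? Yes, pool_connections!

pool_connections - The number of urllib3 connection pools to cache.

I know, I know, I don't want to bring in so much terminology either. This is the last one, I promise. For easy understanding: one connection pool corresponds to one host, that's all it is.

Here's an example (irrelevant lines are omitted):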

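import logging

import requests
from requests.adapters import HTTPAdapter

logging.basicConfig(level=logging.DEBUG)  # show urllib3's connection log lines

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1))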
s.get('https://www.baidu.com')
s.get('https://www.zhihu.com')
s.get('https://www.baidu.com')

"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2621
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
"""

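HTTPAdapter(pool_connections=1) is mounted to https://, which means only one connection pool is kept at a time. After calling s.get('https://www.baidu.com'), the cached connection pool is connectionpool('https://www.baidu.com'). Then s.get('https://www.zhihu.com') comes along, and the session finds that it cannot use the previously cached connection because it is not the same host (one connection pool corresponds to one host, remember?). So the session has to create a new connection pool, or a new connection if you prefer to think of it that way. Since pool_connections=1, the session cannot hold two connection pools at the same time, so it drops the old one, connectionpool('https://www.baidu.com'), and keeps the new one, connectionpool('https://www.zhihu.com'). The next get goes the same way. This is why we see Starting new HTTPS connection three times in the log.

What if we set pool_connections to 2: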

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=2))
s.get('https://www.baidu.com')
s.get('https://www.zhihu.com')
s.get('https://www.baidu.com')
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.baidu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2623
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 None
"""

Great, now we only created two connections and saved the time of establishing a third.
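
The two runs above can be imitated with a toy least-recently-used cache. This is a simplified model for intuition only, not urllib3's actual code: PoolCacheModel and count_new_connections are made-up names, and each "pool" here is reduced to a flag meaning "a connection to this host is open".

```python
from collections import OrderedDict


class PoolCacheModel:
    """Keeps at most `pool_connections` per-host pools, evicting the oldest."""

    def __init__(self, pool_connections):
        self.pool_connections = pool_connections
        self.pools = OrderedDict()  # host -> True while its pool is cached
        self.new_connections = 0

    def get(self, host):
        if host in self.pools:
            # Cached pool: reuse its open connection, mark it recently used.
            self.pools.move_to_end(host)
            return
        if len(self.pools) >= self.pool_connections:
            # Cache full: drop the least recently used pool and its connection.
            self.pools.popitem(last=False)
        self.pools[host] = True
        self.new_connections += 1


def count_new_connections(pool_connections, hosts):
    cache = PoolCacheModel(pool_connections)
    for host in hosts:
        cache.get(host)
    return cache.new_connections


hosts = ['www.baidu.com', 'www.zhihu.com', 'www.baidu.com']
print(count_new_connections(1, hosts))  # 3
print(count_new_connections(2, hosts))  # 2
```

The counts line up with the number of Starting new HTTPS connection lines in the two logs.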

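Finally, pool_maxsize.

First and foremost, you only need to care about pool_maxsize if you use Session in a multithreaded environment, for example making concurrent requests from multiple threads with the same Session.

Actually, pool_maxsize is an argument for initializing urllib3's HTTPConnectionPool, which is exactly the connection pool we mentioned above. HTTPConnectionPool is a container for a collection of connections to a specific host, and pool_maxsize is the number of connections to save for reuse. If you run your code in a single thread, it is neither possible nor necessary to create multiple connections to the same host, because the requests library is blocking, so HTTP requests are always sent one after another.

Things are different if there are multiple threads.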

from threading import Thread

import requests
from requests.adapters import HTTPAdapter

def thread_get(url):
    s.get(url)

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=2))
t1 = Thread(target=thread_get, args=('https://www.zhihu.com',))
t2 = Thread(target=thread_get, args=('https://www.zhihu.com/question/36612174',))
t1.start();t2.start()
t1.join();t2.join()
t3 = Thread(target=thread_get, args=('https://www.zhihu.com/question/39420364',))
t4 = Thread(target=thread_get, args=('https://www.zhihu.com/question/21362402',))
t3.start();t4.start()
t3.join();t4.join()
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (2): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/36612174 HTTP/1.1" 200 21906
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2606
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/21362402 HTTP/1.1" 200 57556
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/39420364 HTTP/1.1" 200 28739
"""

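See? It established two connections to the same host, www.zhihu.com. Like I said, this can only happen in a multithreaded environment. In this case we create a connection pool with pool_maxsize=2, and there are never more than two connections at the same time, so that is enough. We can see that the requests from t3 and t4 did not create new connections; they reused the old ones.

What if the pool is not big enough?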

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=1))
t1 = Thread(target=thread_get, args=('https://www.zhihu.com',))
t2 = Thread(target=thread_get, args=('https://www.zhihu.com/question/36612174',))
t1.start()
t2.start()
t1.join();t2.join()
t3 = Thread(target=thread_get, args=('https://www.zhihu.com/question/39420364',))
t4 = Thread(target=thread_get, args=('https://www.zhihu.com/question/21362402',))
t3.start();t4.start()
t3.join();t4.join()
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (2): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/36612174 HTTP/1.1" 200 21906
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2606
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (3): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/39420364 HTTP/1.1" 200 28739
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/21362402 HTTP/1.1" 200 57556
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: www.zhihu.com
"""

Now, with pool_maxsize=1, the warning shows up as expected:

Connection pool is full, discarding connection: www.zhihu.com

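We can also notice that, since only one connection can be saved in this pool, a new connection is created again for t3 or t4. Obviously this is very inefficient. That is why urllib3's documentation says:

If you're planning on using such a pool in a multithreaded environment, you should set the maxsize of the pool to a higher number, such as the number of threads.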
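
The bookkeeping behind those two logs can be imitated with a bounded queue. The sketch below is a toy model of the non-blocking case (pool_block=False) only; ConnectionPoolModel and simulate are invented for this illustration, connections are plain strings, and the two "threads" are simulated by checking out two connections before returning either.

```python
import queue


class ConnectionPoolModel:
    """Holds at most `maxsize` idle connections for one host (non-blocking)."""

    def __init__(self, maxsize):
        self.idle = queue.LifoQueue(maxsize)
        self.created = 0
        self.discarded = 0

    def checkout(self):
        try:
            return self.idle.get(block=False)  # reuse an idle connection
        except queue.Empty:
            self.created += 1                  # "Starting new HTTPS connection"
            return f'conn-{self.created}'

    def checkin(self, conn):
        try:
            self.idle.put(conn, block=False)   # keep it for later reuse
        except queue.Full:
            self.discarded += 1                # "Connection pool is full"


def simulate(maxsize):
    pool = ConnectionPoolModel(maxsize)
    for _ in range(2):                         # two rounds: t1/t2, then t3/t4
        first, second = pool.checkout(), pool.checkout()
        pool.checkin(first)
        pool.checkin(second)
    return pool.created, pool.discarded


print(simulate(2))  # (2, 0)
print(simulate(1))  # (3, 2)
```

With maxsize=1 the model opens three connections and discards two, the same counts as the log above.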

Last but not least, HTTPAdapter instances mounted to different prefixes are independent.

s = requests.Session()
s.mount('https://', HTTPAdapter(pool_connections=1, pool_maxsize=2))
s.mount('https://baidu.com', HTTPAdapter(pool_connections=1, pool_maxsize=1))
t1 = Thread(target=thread_get, args=('https://www.zhihu.com',))
t2 = Thread(target=thread_get, args=('https://www.zhihu.com/question/36612174',))
t1.start();t2.start()
t1.join();t2.join()
t3 = Thread(target=thread_get, args=('https://www.zhihu.com/question/39420364',))
t4 = Thread(target=thread_get, args=('https://www.zhihu.com/question/21362402',))
t3.start();t4.start()
t3.join();t4.join()
"""output
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): www.zhihu.com
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (2): www.zhihu.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/36612174 HTTP/1.1" 200 21906
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 2623
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/39420364 HTTP/1.1" 200 28739
DEBUG:requests.packages.urllib3.connectionpool:"GET /question/21362402 HTTP/1.1" 200 57669
"""

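The above code is easy to understand, so I won't explain it.

I guess that's all. I hope this article helps you understand requests better. BTW, I created a gist here which contains all of the testing code used in this article. Just download it and play with it :)

Appendix

For https, requests uses urllib3's HTTPSConnectionPool, but it is pretty much the same as HTTPConnectionPool, so I don't distinguish between them in this article.

Session's mount method ensures that the longest prefix gets matched first. Its implementation is pretty interesting, so I'm posting it here.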

def mount(self, prefix, adapter):
    """Registers a connection adapter to a prefix.
    Adapters are sorted in descending order by key length."""
    self.adapters[prefix] = adapter
    keys_to_move = [k for k in self.adapters if len(k) < len(prefix)]
    for key in keys_to_move:
        self.adapters[key] = self.adapters.pop(key)

Note that self.adapters is an OrderedDict.
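
Here is a standalone sketch of that reordering, using a module-level OrderedDict and strings in place of real adapters. The mount function copies the logic quoted above; get_adapter is my assumption about how the lookup walks the dict (first prefix that matches wins), written only to show why the ordering matters.

```python
from collections import OrderedDict

adapters = OrderedDict()


def mount(prefix, adapter):
    """Same logic as Session.mount above: longer prefixes end up first."""
    adapters[prefix] = adapter
    keys_to_move = [k for k in adapters if len(k) < len(prefix)]
    for key in keys_to_move:
        # pop + reinsert moves every shorter prefix behind the new one
        adapters[key] = adapters.pop(key)


def get_adapter(url):
    """Assumed lookup: return the adapter of the first matching prefix."""
    for prefix, adapter in adapters.items():
        if url.lower().startswith(prefix.lower()):
            return adapter
    raise ValueError(f'no adapter mounted for {url!r}')


mount('https://', 'default-https')
mount('http://', 'default-http')
mount('https://baidu.com', 'baidu-only')

print(list(adapters))  # ['https://baidu.com', 'https://', 'http://']
print(get_adapter('https://baidu.com/s'))    # baidu-only
print(get_adapter('https://www.zhihu.com'))  # default-https
print(get_adapter('https://www.baidu.com'))  # default-https
```

The last line is worth a second look: a prefix of https://baidu.com does not cover https://www.baidu.com, because the match is a plain string prefix and not a domain match.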

You should not use the same session across multiple threads, because it is not thread-safe, as outlined by the requests maintainers - https://github.com/requests/requests/issues/1871 (3 upvotes)

Answer 2

sha*_*zow 12

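Requests uses urllib3 to manage its connections and other features.

Re-using connections is an important factor in keeping recurring HTTP requests performant. The urllib3 README explains:

Why do I want to reuse connections?

Performance. When you normally do a urllib call, a separate socket connection is created with each request. By reusing existing sockets (supported since HTTP 1.1), the requests will take up less resources on the server's end, and also provide a faster response time at the client's end. [...]

To answer your question, "pool_maxsize" is the number of connections to keep around per host (this is useful for multi-threaded applications), whereas "pool_connections" is the number of host-pools to keep around. For example, if you're connecting to 100 different hosts and pool_connections=10, then only the latest 10 hosts' connections will be re-used.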
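
As a hedged sketch of how that might translate into configuration: the numbers below (100 hosts, 8 connections kept per host) are hypothetical and should be replaced by whatever your workload looks like. Nothing is sent over the network here; the code only builds the session.

```python
import requests
from requests.adapters import HTTPAdapter

EXPECTED_HOSTS = 100   # hypothetical: distinct hosts you keep returning to
KEPT_PER_HOST = 8      # hypothetical: idle connections worth keeping per host

adapter = HTTPAdapter(pool_connections=EXPECTED_HOSTS,
                      pool_maxsize=KEPT_PER_HOST)

s = requests.Session()
# Mounting the same adapter on both prefixes makes them share one set of pools.
s.mount('http://', adapter)
s.mount('https://', adapter)

print(s.get_adapter('https://example.org/') is adapter)
```

If pool_connections stays at the default of 10 while you rotate through 100 hosts, most requests will find their host's pool already evicted and will have to reconnect.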

Answer 3

Bra*_*mon 8

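Thanks to @laike9m for the existing Q&A and article, but the existing answers fail to mention the subtleties of pool_maxsize and its relation to multithreaded code.

Summary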

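pool_connections is the number of (host, port, scheme) endpoints whose connections can be kept alive in the pool at a given time. If you want to keep open TCP connections to at most n distinct endpoints around for reuse with a Session, you want pool_connections=n.
pool_maxsize is effectively irrelevant as a limit for users of requests, because the default value of pool_block (in requests.adapters.HTTPAdapter) is False rather than True.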

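Detail

As correctly pointed out here, pool_connections caps how many endpoints keep an open connection under a given adapter prefix. This is best illustrated with an example: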

>>> import requests
>>> from requests.adapters import HTTPAdapter
>>> 
>>> from urllib3 import add_stderr_logger
>>> 
>>> add_stderr_logger()  # Turn on requests.packages.urllib3 logging
2018-12-21 20:44:03,979 DEBUG Added a stderr logging handler to logger: urllib3
<StreamHandler <stderr> (NOTSET)>
>>> 
>>> s = requests.Session()
>>> s.mount('https://', HTTPAdapter(pool_connections=1))
>>> 
>>> # 4 consecutive requests to (github.com, 443, https)
... # A new HTTPS (TCP) connection will be established only on the first conn.
... s.get('https://github.com/requests/requests/blob/master/requests/adapters.py')
2018-12-21 20:44:03,982 DEBUG Starting new HTTPS connection (1): github.com:443
2018-12-21 20:44:04,381 DEBUG https://github.com:443 "GET /requests/requests/blob/master/requests/adapters.py HTTP/1.1" 200 None
<Response [200]>
>>> s.get('https://github.com/requests/requests/blob/master/requests/packages.py')
2018-12-21 20:44:04,548 DEBUG https://github.com:443 "GET /requests/requests/blob/master/requests/packages.py HTTP/1.1" 200 None
<Response [200]>
>>> s.get('https://github.com/urllib3/urllib3/blob/master/src/urllib3/__init__.py')
2018-12-21 20:44:04,881 DEBUG https://github.com:443 "GET /urllib3/urllib3/blob/master/src/urllib3/__init__.py HTTP/1.1" 200 None
<Response [200]>
>>> s.get('https://github.com/python/cpython/blob/master/Lib/logging/__init__.py')
2018-12-21 20:44:06,533 DEBUG https://github.com:443 "GET /python/cpython/blob/master/Lib/logging/__init__.py HTTP/1.1" 200 None
<Response [200]>

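Above, the maximum number of kept connections is 1; it is the one to (github.com, 443, https). If you then request a resource from a new (host, port, scheme) triple, the Session internally dumps the existing connection to make room for a new one: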

>>> s.get('https://www.rfc-editor.org/info/rfc4045')
2018-12-21 20:46:11,340 DEBUG Starting new HTTPS connection (1): www.rfc-editor.org:443
2018-12-21 20:46:12,185 DEBUG https://www.rfc-editor.org:443 "GET /info/rfc4045 HTTP/1.1" 200 6707
<Response [200]>
>>> s.get('https://www.rfc-editor.org/info/rfc4046')
2018-12-21 20:46:12,667 DEBUG https://www.rfc-editor.org:443 "GET /info/rfc4046 HTTP/1.1" 200 6862
<Response [200]>
>>> s.get('https://www.rfc-editor.org/info/rfc4047')
2018-12-21 20:46:13,837 DEBUG https://www.rfc-editor.org:443 "GET /info/rfc4047 HTTP/1.1" 200 6762
<Response [200]>

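You can raise the number to pool_connections=2 and then cycle between 3 unique host combinations, and you'll see the same thing in play. (One other thing to note is that the session will retain and send back cookies in this same way.)

Now for pool_maxsize, which is passed on to urllib3.poolmanager.PoolManager and ultimately to urllib3.connectionpool.HTTPSConnectionPool. The docstring for maxsize is:

Number of connections to save that can be reused. More than 1 is useful in multithreaded situations. If block is set to False, more connections will be created but they will not be saved once they've been used.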

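Incidentally, block=False is the default for HTTPAdapter (its pool_block parameter). This implies that pool_maxsize does not cap the number of connections opened through HTTPAdapter; it only limits how many are kept for reuse.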
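
If you do want pool_maxsize to act as a hard limit, one option is to pass pool_block=True. The sketch below only builds the adapter and inspects the urllib3 pool it would hand out; no request is sent. The host example.org and the size 4 are placeholders, and reaching into adapter.poolmanager is done here purely for inspection, not something everyday code needs.

```python
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=4, pool_block=True)
s.mount('https://', adapter)

# Ask the adapter's PoolManager for the pool that would serve this URL.
# Creating the pool object does not open a socket.
pool = adapter.poolmanager.connection_from_url('https://example.org/')

print(pool.block)         # whether callers wait for a free connection
print(pool.pool.maxsize)  # capacity of the queue of reusable connections
```

With blocking enabled, a thread that needs a connection while all four are in use waits for one to be returned instead of opening a fifth.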

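Further, requests.Session() is not thread safe; you should not use the same session instance from multiple threads. (See here and here.) If you really want to, the safer way to go is to give each thread its own local session instance, and then use that session to make requests to multiple URLs, via threading.local():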

import threading
import requests

local = threading.local()  # values will be different for separate threads.

vars(local)  # initially empty; a blank class with no attrs.


def get_or_make_session(**adapter_kwargs):
    # `local` will effectively vary based on the thread that is calling it
    print('get_or_make_session() called from id:', threading.get_ident())

    if not hasattr(local, 'session'):
        session = requests.Session()
        adapter = requests.adapters.HTTPAdapter(**adapter_kwargs)
        session.mount('http://', adapter)
        session.mount('https://', adapter)
        local.session = session
    return local.session
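
A usage sketch for the per-thread approach, kept self-contained by defining its own helper along the same lines. session_for_this_thread, worker, and the worker names are hypothetical, and no HTTP request is made; the point is only to show that each thread gets exactly one session of its own.

```python
import threading

import requests
from requests.adapters import HTTPAdapter

_local = threading.local()  # attributes set here are visible per thread only


def session_for_this_thread(**adapter_kwargs):
    """Create the calling thread's session on first use, then reuse it."""
    if not hasattr(_local, 'session'):
        session = requests.Session()
        adapter = HTTPAdapter(**adapter_kwargs)
        session.mount('http://', adapter)
        session.mount('https://', adapter)
        _local.session = session
    return _local.session


seen = {}                   # worker name -> (first call, second call)
lock = threading.Lock()


def worker(name):
    first = session_for_this_thread(pool_maxsize=1)
    second = session_for_this_thread(pool_maxsize=1)
    with lock:
        seen[name] = (first, second)


threads = [threading.Thread(target=worker, args=(f'worker-{i}',))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

same_within_thread = all(a is b for a, b in seen.values())
distinct_sessions = len({id(a) for a, _ in seen.values()})
print(same_within_thread, distinct_sessions)
```

Because a session is only ever touched by the thread that created it, pool_maxsize=1 is sufficient in this arrangement.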

Archived:	10 years ago
Views:	7467
Last activity:	7 years, 1 month ago