Increase connection pool size

Mic*_*ael 7 google-cloud-python

We are running the following code to upload to GCP buckets in parallel. Judging by the warnings we see, we seem to be exhausting all the connections in the pool very quickly. Is there any way to configure the connection pool that the library uses?

def upload_string_to_bucket(content: str):
    blob = bucket.blob(cloud_path)
    blob.upload_from_string(content)

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(upload_string_to_bucket, content_list)
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
(the same warning is repeated for every discarded connection)
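For context, the warning comes from urllib3's per-host connection pool, which `requests` creates with a default size of 10; extra connections are simply discarded with the warning shown above. The general `requests`-level mechanism for enlarging the pool is mounting a larger `HTTPAdapter` on the session. A minimal sketch of that mechanism (the pool sizes here are arbitrary; reaching the session inside the google-cloud-storage client involves private attributes such as `client._http`, which is an internal detail and may change between versions):

```python
import requests
from requests.adapters import HTTPAdapter

# urllib3 keeps a pool of reusable connections per host (default 10).
# Mounting an HTTPAdapter with a larger pool on the "https://" prefix
# raises that limit for every HTTPS host this session talks to.
session = requests.Session()
adapter = HTTPAdapter(pool_connections=64, pool_maxsize=64)
session.mount("https://", adapter)
```

With enough pool slots for every worker thread, connections are reused instead of discarded, and the warning disappears.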

小智 1

I ran into a similar problem when downloading blobs in parallel.

This article may shed some light: https://laike9m.com/blog/requests-secret-pool_connections-and-pool_maxsize,89/

Personally, I don't think increasing the connection pool is the best solution; I prefer chunking the "downloads" by pool_maxsize.

from typing import Iterable
import concurrent.futures

def chunker(it: Iterable, chunk_size: int):
    """Yield successive lists of at most chunk_size items from it."""
    chunk = []
    for index, item in enumerate(it):
        chunk.append(item)
        if not (index + 1) % chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk

for chunk in chunker(content_list, 10):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(upload_string_to_bucket, chunk)

Ideally, of course, we would start each new download as soon as a slot frees up, rather than waiting for a whole chunk to finish.
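One way to approximate that without chunking is to cap the executor's worker count at the pool size, so at most that many connections are ever in flight and each item starts as soon as a worker frees up. A minimal sketch, with a hypothetical `process` function standing in for `upload_string_to_bucket`:

```python
import concurrent.futures

def process(item: str) -> str:
    # Placeholder for the real upload; uppercases the input for demonstration.
    return item.upper()

items = ["alpha", "beta", "gamma"]

# urllib3's default per-host pool holds 10 connections, so with at most
# 10 worker threads no connection is ever discarded, and the executor
# hands each thread the next item the moment it finishes the last one.
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(process, items))
```

This keeps all workers busy continuously, whereas chunking makes every chunk wait for its slowest member.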