Boto3 - Disable automatic multipart upload

RuB*_*iCK 3 python amazon-s3 python-2.7 boto3

I am using an S3-compatible backend that does not support MultipartUpload.

I have a strange situation where on some servers my uploads complete fine, but on other servers boto3 automatically tries to upload the file using MultipartUpload. I am uploading exactly the same file, testing against the same backend, region/tenant, bucket, etc...

As the documentation states, MultipartUpload is enabled automatically when needed:

  • Automatically switching to multipart transfers when a file is over a specific size threshold (the default threshold is shown in the sketch below)

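For reference, that threshold lives on boto3's TransferConfig (in boto3.s3.transfer); as far as I know it defaults to 8 MB, so any file larger than that takes the multipart path. A minimal sketch to check what your installed version actually uses:

from boto3.s3.transfer import TransferConfig

# Size in bytes above which uploads switch to multipart.
# Recent boto3 releases default this to 8 MB; verify for your version.
print(TransferConfig().multipart_threshold)
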
Here are the logs from when it automatically switches to MultipartUpload:

DEBUG:botocore.hooks:Event request-created.s3.CreateMultipartUpload: calling handler <function enable_upload_callbacks at 0x2b001b8>
DEBUG:botocore.endpoint:Sending http request: <PreparedRequest [POST]>
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): mytenant.mys3backend.cloud.corp
DEBUG:botocore.vendored.requests.packages.urllib3.connectionpool:"POST /cassandra/samplefile.tgz?uploads HTTP/1.1" 501 None
DEBUG:botocore.parsers:Response headers: {'date': 'Fri, 18 Dec 2015 09:12:48 GMT', 'transfer-encoding': 'chunked', 'content-type': 'application/xml;charset=UTF-8', 'server': 'HCP V7.2.0.26'}
DEBUG:botocore.parsers:Response body:
<?xml version='1.0' encoding='UTF-8'?>
<Error>
  <Code>NotImplemented</Code>
  <Message>The request requires functionality that is not implemented in the current release</Message>
  <RequestId>1450429968948</RequestId>
  <HostId>aGRpLmJvc3RoY3AuY2xvdWQuY29ycDoyNg==</HostId>
</Error>     
DEBUG:botocore.hooks:Event needs-retry.s3.CreateMultipartUpload: calling handler <botocore.retryhandler.RetryHandler object at 0x2a490d0>

Logs from the other servers, which do not switch to multipart for the same file:

DEBUG:botocore.hooks:Event request-created.s3.PutObject: calling handler <function enable_upload_callbacks at 0x7f436c025500>
DEBUG:botocore.endpoint:Sending http request: <PreparedRequest [PUT]>
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): mytenant.mys3backend.cloud.corp
DEBUG:botocore.awsrequest:Waiting for 100 Continue response.
DEBUG:botocore.awsrequest:100 Continue response seen, now sending request body.
DEBUG:botocore.vendored.requests.packages.urllib3.connectionpool:"PUT /cassandra/samplefile.tgz HTTP/1.1" 200 0
DEBUG:botocore.parsers:Response headers: {'date': 'Fri, 18 Dec 2015 10:05:25 GMT', 'content-length': '0', 'etag': '"b407e71de028fe62fd9f2f799e606855"', 'server': 'HCP V7.2.0.26'}
DEBUG:botocore.parsers:Response body:

DEBUG:botocore.hooks:Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f436be1ecd0>
DEBUG:botocore.retryhandler:No retry needed.

I am uploading the file as follows:

import boto3

connection = boto3.client(service_name='s3',
        region_name='',
        api_version=None,
        use_ssl=True,
        verify=True,
        endpoint_url=url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=None,
        config=None)
connection.upload_file('/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz')

My questions are:

  • How can I disable multipart upload by default, or raise the threshold, to avoid the automatic switch to multipart?
  • Is there any reason why one server switches to automatic multipart while the others do not, for the same file?

RuB*_*iCK 5

I found a workaround: increase the threshold size using S3Transfer and TransferConfig, as follows:

import boto3
from boto3.s3.transfer import S3Transfer, TransferConfig

myconfig = TransferConfig(
    multipart_threshold=9999999999999999,  # absurdly large threshold: workaround to 'disable' auto multipart upload
    max_concurrency=10,
    num_download_attempts=10,
)

connection = boto3.client(service_name='s3',
        region_name='',
        api_version=None,
        use_ssl=True,
        verify=True,
        endpoint_url=url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=None,
        config=None)

transfer = S3Transfer(connection, myconfig)
transfer.upload_file('/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz')
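
As a side note, newer boto3 releases also accept the TransferConfig directly through upload_file's Config parameter, so the explicit S3Transfer object may not be needed. A minimal sketch under that assumption (url, access_key and secret_key are the same placeholders as above):

import boto3
from boto3.s3.transfer import TransferConfig

# Same workaround: raise the threshold so multipart is never triggered.
myconfig = TransferConfig(multipart_threshold=9999999999999999)

connection = boto3.client('s3',
        endpoint_url=url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key)

# Newer boto3 exposes a Config= keyword on client.upload_file.
connection.upload_file('/tmp/samplefile.tgz', 'mybucket', 'remotefile.tgz',
        Config=myconfig)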

I hope this helps someone.