I have the following code that uploads to S3 using MultipartUpload.

    import logging
    import boto3

    class UploadS3:
        def __init__(self, bucket, prefix):
            self.s3 = boto3.resource('s3')
            self.bucket = bucket
            self.prefix = prefix

        def start(self, key):
            '''Start to upload a new file'''
            self.part_no = 1
            self.parts = []
            key_path = f'{self.prefix}/{key}'
            self.s3obj = self.s3.Object(self.bucket, key_path)
            self.mpu = self.s3obj.initiate_multipart_upload()
            self.buffer = bytearray()

        def upload(self, chunk):
            '''Upload a chunk'''
            if len(self.buffer) >= 5_000_000:
                self._upload_buffer()
            self.buffer += chunk

        def end(self, part_info={}):
            if len(self.buffer):
                self._upload_buffer()
            part_info['Parts'] = self.parts
            mpu_result = self.mpu.complete(MultipartUpload=part_info)
            logging.info(f'Upload result: {mpu_result}')

        def _upload_buffer(self):
            self.part = self.mpu.Part(self.part_no)
            print(f'buffer len: {len(self.buffer)}')
            resp = self.part.upload(Body=self.buffer)
            print({'PartNumber': self.part_no, 'ETag': resp['ETag']})
            self.parts.append({'PartNumber': self.part_no, 'ETag': resp['ETag']})
            self.part_no += 1
            self.buffer = bytearray()
I created the following test code:

    upload_s3 = UploadS3(BUCKET, PREFIX)
    key = 'key2'
    upload_s3.start(key)
    upload_s3.upload(b'0' * 1_000_000)
    upload_s3.upload(b'1' * 1_000_000)
    upload_s3.upload(b'2' * 1_000_000)
    upload_s3.upload(b'3' * 1_000_000)
    upload_s3.upload(b'4' * 999_999)
    upload_s3.upload(b'abcde')
    upload_s3.upload(b'12345')
    upload_s3.end({})
However, it fails with the following error. The first part is 5000004 bytes long and the second (last) part is 5 bytes long. The last part is not required to be larger than 5 MB, is it?

    buffer len: 5000004
    {'PartNumber': 1, 'ETag': '"e616f253def9510e3be2af0854e4c992"'}
    buffer len: 5
    {'PartNumber': 2, 'ETag': '"db44331bface5c8678770426baf73bc2"'}
    Traceback (most recent call last):
      File "test1.py", line 35, in <module>
        main()
      File "test1.py", line 31, in main
        upload_s3.end({})
      File "/home/x/upload_s3.py", line 31, in end
        mpu_result = self.mpu.complete(MultipartUpload=part_info)
      File "/apps/external/4/anaconda3/lib/python3.6/site-packages/boto3/resources/factory.py", line 520, in do_action
        response = action(self, *args, **kwargs)
      File "/apps/external/4/anaconda3/lib/python3.6/site-packages/boto3/resources/action.py", line 83, in __call__
        response = getattr(parent.meta.client, operation_name)(*args, **params)
      File "/apps/external/4/anaconda3/lib/python3.6/site-packages/botocore/client.py", line 386, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/apps/external/4/anaconda3/lib/python3.6/site-packages/botocore/client.py", line 705, in _make_api_call
        raise error_class(parsed_response, operation_name)
    botocore.exceptions.ClientError: An error occurred (EntityTooSmall) when calling the CompleteMultipartUpload operation: Your proposed upload is smaller than the minimum allowed size
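As of the time of writing this answer, the S3 multipart upload limits page has the following table:

| Item | Specification |
|---|---|
| Maximum object size | 5 TB |
| Maximum number of parts per upload | 10,000 |
| Part numbers | 1 to 10,000 (inclusive) |
| Part size | 5 MB to 5 GB. There is no minimum size limit on the last part of your multipart upload. |
| Maximum number of parts returned for a list parts request | 1000 |
| Maximum number of multipart uploads returned in a list multipart uploads request | 1000 |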
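
However, there is a subtle mistake in it. It says 5 MB where it should say 5 MiB (and the 5 GB should presumably be 5 GiB).

Since you split the parts every 5 000 000 bytes (which is 5 MB, but "only" about 4.77 MiB), both the first and the second part are smaller than the minimum size. Only the last part is exempt from that minimum, so it is the first part (5 000 004 bytes) that triggers `EntityTooSmall`.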
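
The gap between the two units can be checked with a few lines of arithmetic. This is only an illustrative sketch: the constant names are made up for this example, and the part size is the one printed as `buffer len` in the question's output.

```python
MB = 1000 ** 2    # decimal megabyte (assumed name, for illustration)
MIB = 1024 ** 2   # binary mebibyte (assumed name, for illustration)

# Minimum size of every part except the last, according to this answer.
MIN_PART_SIZE = 5 * MIB    # 5 242 880 bytes

first_part = 5_000_004     # "buffer len" of part 1 in the question

print(round(5 * MB / MIB, 2))       # 5 MB expressed in MiB, about 4.77
print(first_part >= MIN_PART_SIZE)  # False: part 1 is too small
print(MIN_PART_SIZE - first_part)   # 242876 bytes short
```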
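
Instead, you should split the parts every 5 242 880 (`5 * 1024 ** 2`) bytes (or even a bit [no pun intended] more, to be safe).

I have submitted a pull request on the S3 documentation page.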
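
As a rough sketch of that buffering rule, the snippet below groups chunks into parts of at least 5 MiB and lets only the final part be smaller. It deliberately leaves out boto3 so that it runs without AWS credentials. `split_into_parts` and `MIN_PART_SIZE` are hypothetical names introduced here, not part of the question's class or of any library. Note that it checks the buffer after appending each chunk, whereas the question's `upload` method checks before appending.

```python
MIN_PART_SIZE = 5 * 1024 ** 2  # 5 MiB = 5 242 880 bytes


def split_into_parts(chunks, part_size=MIN_PART_SIZE):
    """Group incoming chunks into parts of at least part_size bytes.

    Only the final part may be smaller than part_size.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer += chunk
        if len(buffer) >= part_size:
            yield bytes(buffer)
            buffer = bytearray()
    if buffer:
        yield bytes(buffer)  # last part: no minimum size applies


# The same chunks as in the question's test code.
chunks = [b'0' * 1_000_000, b'1' * 1_000_000, b'2' * 1_000_000,
          b'3' * 1_000_000, b'4' * 999_999, b'abcde', b'12345']
part_sizes = [len(part) for part in split_into_parts(chunks)]
print(part_sizes)  # [5000009]: everything fits into a single (last) part
```

In the question's class, the equivalent change would be replacing the `5_000_000` threshold in `upload` with `5 * 1024 ** 2`. With the test data above, the total of 5 000 009 bytes stays below 5 MiB, so the whole payload is sent as one final part.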