I know that the AWS S3 API has a 5 GB limit for single-object uploads, and that in boto3 I should use a multipart upload for anything larger. I am trying to configure the S3File object in s3fs to do the same, but I can't figure out how. I am using (as a failing example) some very basic code:
import s3fs

s3 = s3fs.S3FileSystem()
with s3.open("s3://bucket/huge_file.csv", "w") as s3_obj:
    with open("huge_file.csv") as local_file:
        s3_obj.write(local_file.read())
where huge_file.csv is larger than 5 GB.
The error I get is:
...
botocore.exceptions.ClientError: An error occurred (EntityTooLarge) when calling the PutObject operation: Your proposed upload exceeds the maximum allowed size
...
  File ".../s3fs/core.py", line 1487, in __exit__
    self.close()
  File ".../s3fs/core.py", line 1454, in close
So the question is: how (if it is possible at all) can I get s3fs to upload files larger than 5 GB? In other words, how should I configure it to do a multipart upload?
Answer from 小智 (5 votes):
I think this GitHub thread should clear up most of the problems you're hitting, and, to make your life easier, I think the snippet below is what you're looking for.
import boto3
from boto3.s3.transfer import TransferConfig
# Get the service client
s3 = boto3.client('s3')
GB = 1024 ** 3
# Ensure that multipart uploads only happen if the size of a transfer
# is larger than S3's size limit for nonmultipart uploads, which is 5 GB.
config = TransferConfig(multipart_threshold=5 * GB)
# Upload tmp.txt to bucket-name at key-name
s3.upload_file("tmp.txt", "bucket-name", "key-name", Config=config)
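
For completeness, since the question was about s3fs rather than boto3: here is a minimal sketch of the s3fs-side approach, assuming a reasonably recent s3fs/fsspec (the bucket name, key, and chunk sizes below are placeholders chosen for illustration). Instead of reading the whole file into memory and issuing one giant write(), open both files in binary mode and copy in chunks; s3fs buffers the stream and switches to a multipart upload on its own once the buffer exceeds the block size, so the 5 GB PutObject limit never comes into play.

import shutil
import s3fs

s3 = s3fs.S3FileSystem()

# Stream the local file to S3 in bounded chunks. s3fs accumulates the
# writes in an internal buffer and, once the buffer passes block_size,
# continues the transfer as a multipart upload automatically.
# "bucket" and "huge_file.csv" are placeholder names.
with open("huge_file.csv", "rb") as local_file:
    with s3.open("s3://bucket/huge_file.csv", "wb",
                 block_size=64 * 1024 * 1024) as s3_obj:
        shutil.copyfileobj(local_file, s3_obj, length=16 * 1024 * 1024)

Note also that boto3's upload_file does multipart transfers out of the box (its default multipart_threshold is 8 MB), so the TransferConfig above mostly documents the 5 GB boundary rather than enabling anything new.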