OverflowError when reading from S3 - signed integer is greater than maximum

tra*_*ant 5 python amazon-s3 aws-lambda

I am using the following code to read a large file (>5GB) from S3 into a Lambda function:

import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    
    response = s3.get_object(
        Bucket="my-bucket",
        Key="my-key"
    )
    
    text_bytes = response['Body'].read()

    ...
    
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

However, I get the following error:

"errorMessage": "signed integer is greater than maximum"
"errorType": "OverflowError"
"stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 13, in lambda_handler\n    text_bytes = response['Body'].read()\n"
    "  File \"/var/runtime/botocore/response.py\", line 77, in read\n    chunk = self._raw_stream.read(amt)\n"
    "  File \"/var/runtime/urllib3/response.py\", line 515, in read\n    data = self._fp.read() if not fp_closed else b\"\"\n"
    "  File \"/var/lang/lib/python3.8/http/client.py\", line 472, in read\n    s = self._safe_read(self.length)\n"
    "  File \"/var/lang/lib/python3.8/http/client.py\", line 613, in _safe_read\n    data = self.fp.read(amt)\n"
    "  File \"/var/lang/lib/python3.8/socket.py\", line 669, in readinto\n    return self._sock.recv_into(b)\n"
    "  File \"/var/lang/lib/python3.8/ssl.py\", line 1241, in recv_into\n    return self.read(nbytes, buffer)\n"
    "  File \"/var/lang/lib/python3.8/ssl.py\", line 1099, in read\n    return self._sslobj.read(len, buffer)\n"
  ]

I am using Python 3.8, and I found an issue affecting Python 3.8/3.9 that may be the cause: https://bugs.python.org/issue42853

Is there a way to work around this?

Ano*_*ard 6

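As mentioned in the bug you linked to, the core issue in Python 3.8 is that a single read of more than 2 GB over SSL fails, because the requested length overflows a signed 32-bit integer. You can use a variant of the workaround suggested in the bug report and read the file in chunks.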

import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    response = s3.get_object(
        Bucket="-example-bucket-",
        Key="path/to/key.dat"
    )
    # Preallocate a buffer for the whole object and fill it piece by piece.
    buf = bytearray(response['ContentLength'])
    view = memoryview(buf)
    pos = 0
    while True:
        # Read at most 64 MiB per call so no single read exceeds the limit.
        chunk = response['Body'].read(67108864)
        if len(chunk) == 0:
            break
        view[pos:pos+len(chunk)] = chunk
        pos += len(chunk)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
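
If it helps, the chunked loop above can be factored into a small helper that works on any file-like object, which makes it easy to check locally without S3. This is only a sketch: `read_fully` and `CHUNK_SIZE` are names made up for illustration, and the in-memory stream stands in for `response['Body']`.

```python
import io

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB, i.e. 67108864 bytes


def read_fully(stream, total_size, chunk_size=CHUNK_SIZE):
    """Read exactly total_size bytes from stream using bounded reads."""
    buf = bytearray(total_size)
    view = memoryview(buf)
    pos = 0
    while pos < total_size:
        # Never ask for more than chunk_size bytes in a single call.
        chunk = stream.read(min(chunk_size, total_size - pos))
        if not chunk:
            # The stream ended before the expected number of bytes arrived.
            raise EOFError(f"stream ended after {pos} of {total_size} bytes")
        view[pos:pos + len(chunk)] = chunk
        pos += len(chunk)
    return buf


# Local check: an in-memory stream read with a deliberately small chunk size.
data = bytes(range(256)) * 40
result = read_fully(io.BytesIO(data), len(data), chunk_size=1000)
assert bytes(result) == data
```

In the handler this would presumably be called as `read_fully(response['Body'], response['ContentLength'])`. Unlike the original loop, the sketch raises if the body turns out shorter than `ContentLength` instead of returning a partly filled buffer.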

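However, even in the best case you will spend a minute or more of every Lambda run just reading the data from S3. It would be better to store the file in EFS and read it from Lambda there, or to use another solution such as ECS, so that you avoid reading from a remote data source.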
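
For the EFS route, a file on a mounted file system is read with ordinary file I/O, so it can be processed piece by piece without holding all of it in memory. The sketch below rests on assumptions: the function name `sha256_of_file` is made up, hashing is only a stand-in for whatever processing you actually need, and a temporary file plays the role of a file under the EFS mount path (which depends on how the Lambda function is configured).

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=8 * 1024 * 1024):
    """Hash a file incrementally, reading at most chunk_size bytes at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Local check: a temporary file stands in for a file on the EFS mount.
payload = b"example data " * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
try:
    assert sha256_of_file(tmp.name, chunk_size=4096) == hashlib.sha256(payload).hexdigest()
finally:
    os.remove(tmp.name)
```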