How to read a large JSON file from Amazon S3 using Boto3

Question


I'm trying to read a JSON file of about 2 GB from Amazon S3. When I call .read() on it, I get a MemoryError.

Is there any way around this? Any help would be appreciated, thanks a lot!
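The MemoryError comes from calling .read() with no size argument, which pulls the entire 2 GB body into a single bytes object (and decoding it afterwards creates a second copy as a str). A minimal sketch of an alternative, assuming the file is one top-level JSON array whose elements are objects and that it is UTF-8 encoded: read the body in fixed-size pieces with read(chunk_size) and decode one array element at a time with the standard library's json.JSONDecoder.raw_decode. The function name iter_array_objects and the 1 MiB default chunk size are illustrative choices, not part of boto3. For production use, a dedicated streaming JSON parser such as the third-party ijson package is likely more robust.

```python
import codecs
import io
import json
from typing import BinaryIO, Iterator


def iter_array_objects(body: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[dict]:
    """Yield each element of a top-level JSON array, reading `body` in chunks.

    Assumption: the document is UTF-8 and shaped like [{...}, {...}, ...].
    Top-level scalars or nested arrays as elements are NOT supported.
    """
    text_decoder = codecs.getincrementaldecoder("utf-8")()  # handles multi-byte chars split across chunks
    json_decoder = json.JSONDecoder()
    buffer = ""
    pos = 0
    eof = False
    while True:
        # Skip whitespace, the opening "[", and the "," separators between elements.
        while pos < len(buffer) and buffer[pos] in " \t\r\n[,":
            pos += 1
        if pos < len(buffer):
            if buffer[pos] == "]":
                return  # end of the top-level array
            try:
                obj, end = json_decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                if eof:
                    raise  # no more data is coming: the input is truncated or malformed
            else:
                pos = end
                yield obj
                continue
        elif eof:
            return  # empty input
        # The next element is incomplete: keep only the unparsed tail and append one more chunk.
        chunk = body.read(chunk_size)
        eof = not chunk
        buffer = buffer[pos:] + text_decoder.decode(chunk, final=eof)
        pos = 0


if __name__ == "__main__":
    # Offline demo: io.BytesIO stands in for the S3 StreamingBody.
    # With boto3 you would pass s3.Object(bucket_name, filename).get()['Body'] instead.
    sample = io.BytesIO(b'[{"id": 1}, {"id": 2}]')
    for item in iter_array_objects(sample, chunk_size=4):
        print(item)
```

Memory use stays at roughly one chunk plus the largest single element instead of the whole file. One trade-off of this design: malformed input is only reported once the end of the stream is reached, because the parser keeps reading on the assumption that the current element is merely incomplete.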

Answer 1

So, I found an approach that works for me. I had a 1.60 GB file that I needed to load for processing.

import io
import json

import boto3

# Object() belongs to the S3 resource API, so use boto3.resource rather than boto3.client.
s3 = boto3.resource('s3', aws_access_key_id='<aws_access_key_id>', aws_secret_access_key='<aws_secret_access_key>')

# Now we have the data as a bytes object (the whole file is read into memory here).
# bucket_name and filename hold your bucket name and object key.
data_in_bytes = s3.Object(bucket_name, filename).get()['Body'].read()

# Decode it as UTF-8
decoded_data = data_in_bytes.decode('utf-8')

# Use the io module to create a StringIO object.
stringio_data = io.StringIO(decoded_data)

# Now just read the StringIO object line by line.
data = stringio_data.readlines()

# Parse each line with the json module (this assumes one JSON document per line).
json_data = list(map(json.loads, data))


json_data now holds the parsed contents of the file. I know there are a lot of intermediate variables, but it works for me.
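The approach above still calls .read() on the whole object, and at the end the bytes, the decoded string, the StringIO buffer, the list of lines and the parsed list are all alive at the same time. If, as the json.loads-per-line step implies, the file holds one JSON document per line (JSON Lines), botocore's StreamingBody.iter_lines() can feed the lines one at a time instead. A minimal sketch, where iter_json_lines is a hypothetical helper name and the file is assumed to be UTF-8:

```python
import json
from typing import Any, Iterable, Iterator


def iter_json_lines(lines: Iterable[bytes]) -> Iterator[Any]:
    """Yield one parsed JSON document per non-blank line of UTF-8 bytes."""
    for raw_line in lines:
        line = raw_line.strip()  # drop surrounding whitespace, including "\n" or "\r\n"
        if not line:
            continue  # skip blank lines between records
        yield json.loads(line.decode("utf-8"))


# Usage with boto3 (not executed here). bucket_name and filename are your own values,
# credentials come from the default credential chain, and process() is a hypothetical
# placeholder for your per-record logic:
#
#   import boto3
#   s3 = boto3.resource('s3')
#   body = s3.Object(bucket_name, filename).get()['Body']
#   for record in iter_json_lines(body.iter_lines()):
#       process(record)
```

Only one line and one parsed record need to be in memory at a time, so the peak footprint no longer grows with the size of the file.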

Asked: 7 years, 3 months ago
Viewed: 8,941 times
Last activity: 4 years, 5 months ago