Read .pptx file from s3

pir*_*ing 5 python amazon-s3 boto3 python-pptx

I am trying to open a .pptx file from Amazon S3 and read it with the python-pptx library. This is the code:

from pptx import Presentation
import boto3
s3 = boto3.resource('s3')

obj=s3.Object('bucket','key')
body = obj.get()['Body']
prs=Presentation((body))

It raises "AttributeError: 'StreamingBody' object has no attribute 'seek'". Shouldn't this work? How can I fix it? I also tried calling read() on body first. Is there a solution that doesn't require downloading the file?

bco*_*a12 8

To load a file from S3, you should download it (or use a streaming strategy) and use io.BytesIO to turn the data into something pptx.Presentation can handle:

import io
import boto3

from pptx import Presentation

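s3 = boto3.client('s3')
# get_object returns a dict; 'Body' is a botocore StreamingBody (read-only, not seekable)
s3_response_object = s3.get_object(Bucket='bucket', Key='file.pptx')
object_content = s3_response_object['Body'].read()  # the whole object as bytes

# BytesIO wraps the bytes in a seekable, in-memory file object
prs = Presentation(io.BytesIO(object_content))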
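Why wrapping the bytes in io.BytesIO fixes the error: a .pptx file is a ZIP package, and python-pptx opens it with Python's standard zipfile module, which seeks around the file (the ZIP central directory sits at the end). A StreamingBody only supports sequential reads. The stdlib-only sketch below reproduces this without S3; NonSeekableStream is a hypothetical stand-in for StreamingBody, and a tiny in-memory ZIP stands in for the .pptx.

```python
import io
import zipfile


class NonSeekableStream:
    """Hypothetical stand-in for botocore's StreamingBody: read() only, no seek()."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)


def make_zip_bytes() -> bytes:
    """Build a minimal ZIP archive in memory (a .pptx is also a ZIP archive)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
    return buf.getvalue()


def open_zip_from_stream(stream) -> zipfile.ZipFile:
    """Read the whole stream into memory, then give zipfile a seekable buffer."""
    return zipfile.ZipFile(io.BytesIO(stream.read()))


if __name__ == "__main__":
    data = make_zip_bytes()
    try:
        # Fails the same way as Presentation(body): zipfile calls seek()
        zipfile.ZipFile(NonSeekableStream(data))
    except AttributeError as exc:
        print("direct:", exc)
    with open_zip_from_stream(NonSeekableStream(data)) as zf:
        print("buffered:", zf.namelist())
```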

Reference:

"Just like what we do with variables, data can be kept as bytes in an in-memory buffer when we use the io module's Byte IO operations." (JournalDev)
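Keeping the bytes in an in-memory buffer is also the practical answer to "without downloading": the object's bytes still have to travel over the network, but nothing needs to be written to disk. A minimal sketch, assuming a boto3 S3 client is passed in; fetch_to_buffer is a hypothetical helper name, and it fills the buffer with boto3's download_fileobj instead of get_object.

```python
import io
from typing import BinaryIO


def fetch_to_buffer(s3_client, bucket: str, key: str) -> BinaryIO:
    """Copy an S3 object into a seekable in-memory buffer (no temp file on disk).

    Assumes `s3_client` is a boto3 S3 client, e.g. boto3.client("s3").
    """
    buffer = io.BytesIO()
    # download_fileobj writes the object into any writable binary file object
    s3_client.download_fileobj(bucket, key, buffer)
    # Rewind so the consumer starts reading from the first byte
    buffer.seek(0)
    return buffer
```

Usage would then be prs = Presentation(fetch_to_buffer(boto3.client('s3'), 'bucket', 'file.pptx')). Compared with get_object(...)['Body'].read(), download_fileobj goes through boto3's transfer manager, which may fetch large objects in concurrent ranged parts; either way the whole file ends up in memory.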