Reading an Excel file from S3 into a Pandas DataFrame

Raj*_*Raj 4 python lambda amazon-s3 pandas

I have an SNS notification set up that triggers a Lambda function whenever an .xlsx file is uploaded to an S3 bucket.

The Lambda function reads the .xlsx file into a Pandas DataFrame.

import os 
import pandas as pd
import json
import xlrd
import boto3

def main(event, context):
    message = event['Records'][0]['Sns']['Message']
    parsed_message = json.loads(message)
    src_bucket = parsed_message['Records'][0]['s3']['bucket']['name']
    filepath = parsed_message['Records'][0]['s3']['object']['key']

    s3 = boto3.resource('s3')
    s3_client = boto3.client('s3')

    obj = s3_client.get_object(Bucket=src_bucket, Key=filepath)
    print(obj['Body'])

    df = pd.read_excel(obj, header=2)
    print(df.head(2))

I get the following error:

Invalid file path or buffer object type: <type 'dict'>: ValueError
Traceback (most recent call last):
File "/var/task/handler.py", line 26, in main
df = pd.read_excel(obj, header=2)
File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "/var/task/pandas/util/_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "/var/task/pandas/io/excel.py", line 307, in read_excel
io = ExcelFile(io, engine=engine)
File "/var/task/pandas/io/excel.py", line 376, in __init__
io, _, _, _ = get_filepath_or_buffer(self._io)
File "/var/task/pandas/io/common.py", line 218, in get_filepath_or_buffer
raise ValueError(msg.format(_type=type(filepath_or_buffer)))
ValueError: Invalid file path or buffer object type: <type 'dict'>

How can I fix this?

Tar*_*lai 5

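That is expected: obj is the dict returned by get_object, not a file-like object. The file contents are under its 'Body' key (note the capital B). Have you tried this?

df = pd.read_excel(obj['Body'], header=2)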
Run Code Online (Sandbox Code Playgroud)
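If passing the body directly still fails, a slightly more defensive variant may help. As far as I know, the 'Body' entry is a botocore StreamingBody, which cannot be rewound, and some pandas versions try to seek on the buffer they are given, so copying the bytes into an io.BytesIO first is a safe choice. The sketch below is only one way to arrange the handler: the helper names s3_location_from_sns and body_to_buffer are made up for illustration, and it assumes the same SNS-wrapped S3 event as in the question. It also decodes the object key, since keys in S3 event notifications arrive URL-encoded.

```python
import io
import json
from urllib.parse import unquote_plus


def s3_location_from_sns(event):
    """Return (bucket, key) for the first S3 record wrapped in an SNS event.

    Hypothetical helper; it assumes the event shape used in the question.
    """
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    s3_info = message["Records"][0]["s3"]
    # Keys in S3 event notifications are URL-encoded (a space arrives as '+').
    return s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def body_to_buffer(response):
    """Copy the 'Body' stream of a get_object response into a seekable buffer.

    Hypothetical helper; 'response' is the dict returned by get_object.
    """
    return io.BytesIO(response["Body"].read())


def main(event, context):
    # Imported here so the two helpers above can be tried without these packages.
    import boto3
    import pandas as pd

    bucket, key = s3_location_from_sns(event)
    response = boto3.client("s3").get_object(Bucket=bucket, Key=key)

    # header=2 is kept from the question: the third row holds the column names.
    df = pd.read_excel(body_to_buffer(response), header=2)
    print(df.head(2))
```

This reads the whole file into memory, which should be fine for typical spreadsheets but is worth keeping in mind given Lambda memory limits. Recent pandas versions also need an engine such as openpyxl installed to read .xlsx files, since xlrd 2.0 dropped .xlsx support.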