Reading the file that triggered an S3 event

ner*_*ner 12 python csv amazon-s3 aws-lambda serverless-framework

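Here is what I want to do:

  1. A user uploads a csv file to an AWS S3 bucket.
  2. Once the file is uploaded, the S3 bucket invokes the lambda function I created.
  3. My lambda function reads the csv file content, then sends an email containing the file content and information about the file.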
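To make step 2 concrete, here is a minimal sketch of pulling the fields needed for the email out of one S3 event record. The `SAMPLE_EVENT` dict is a hypothetical, trimmed-down event (real events carry more fields), `describe_upload` is an illustrative helper name, and Python 3 is assumed. Object keys arrive URL-encoded in the event, which is why the key is passed through `unquote_plus`.

```python
from urllib.parse import unquote_plus

# Hypothetical, trimmed-down S3 "ObjectCreated" event; real events carry more fields.
SAMPLE_EVENT = {
    "Records": [
        {
            "eventTime": "2018-01-01T12:00:00.000Z",
            "requestParameters": {"sourceIPAddress": "203.0.113.10"},
            "s3": {
                "bucket": {"name": "mine2"},
                "object": {"key": "reports/my+file.csv", "size": 42},
            },
        }
    ]
}


def describe_upload(record):
    """Pull the fields the email needs out of one S3 event record."""
    s3_info = record["s3"]
    return {
        "bucket": s3_info["bucket"]["name"],
        # Keys are URL-encoded in the event (a space arrives as '+').
        "key": unquote_plus(s3_info["object"]["key"]),
        "size": s3_info["object"]["size"],
        "event_time": record["eventTime"],
        "source_ip": record["requestParameters"]["sourceIPAddress"],
    }


info = describe_upload(SAMPLE_EVENT["Records"][0])
```

Reading the bucket name from the event, rather than hard-coding it, keeps the handler usable if the bucket is renamed.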

Local environment

Serverless Framework version 1.22.0

Python 2.7

Here is my serverless.yml file:

service: aws-python # NOTE: update this with your service name

provider:
  name: aws
  runtime: python2.7
  stage: dev
  region: us-east-1
  iamRoleStatements:
        - Effect: "Allow"
          Action:
              - s3:*
              - "ses:SendEmail"
              - "ses:SendRawEmail"
              - "s3:PutBucketNotification"
          Resource: "*"

functions:
  csvfile:
    handler: handler.csvfile
    description: send mail whenever a csv file is uploaded on S3 
    events:
      - s3:
          bucket: mine2
          event: s3:ObjectCreated:*
          rules:
            - suffix: .csv

Here is my lambda function:

import json
import boto3
import botocore
import logging
import sys
import traceback
import csv

from botocore.exceptions import ClientError
from pprint import pprint
from time import strftime, gmtime
from json import dumps, loads, JSONEncoder, JSONDecoder


#setup simple logging for INFO
logger = logging.getLogger()
logger.setLevel(logging.INFO)

from botocore.exceptions import ClientError

def csvfile(event, context):
    """Send email whenever a csvfile is uploaded to S3 """
    body = {}
    emailcontent = ''
    status_code = 200
    #set email information
    email_from = '****@*****.com'
    email_to = '****@****.com'
    email_subject = 'new file is uploaded'
    try:
        s3 = boto3.resource(u's3')
        s3 = boto3.client('s3')
        for record in event['Records']:
            filename = record['s3']['object']['key']
            filesize = record['s3']['object']['size']
            source = record['requestParameters']['sourceIPAddress']
            eventTime = record['eventTime']
        # get a handle on the bucket that holds your file
        bucket = s3.Bucket(u'mine2')
        # get a handle on the object you want (i.e. your file)
        obj = bucket.Object(key= event[u'Records'][0][u's3'][u'object'][u'key'])
        # get the object
        response = obj.get()
        # read the contents of the file and split it into a list of lines
        lines = response[u'Body'].read().split()
        # now iterate over those lines
        for row in csv.DictReader(lines):    
            print(row)
            emailcontent = emailcontent + '\n' + row 
    except Exception as e:
        print(traceback.format_exc())
        status_code = 500
        body["message"] = json.dumps(e)

    email_body = "File Name: " + filename + "\n" + "File Size: " + str(filesize) + "\n" +  "Upload Time: " + eventTime + "\n" + "User Details: " + source + "\n" + "content of the csv file :" + emailcontent
    ses = boto3.client('ses')
    ses.send_email(Source = email_from,
        Destination = {'ToAddresses': [email_to,],}, 
            Message = {'Subject': {'Data': email_subject}, 'Body':{'Text' : {'Data': email_body}}}
            )
    print('Function execution Completed')

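I don't know what I am doing wrong. The part where I only get information about the file works fine; it is when I add the reading part that the lambda function does not return anything.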

nic*_*r88 17

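I suggest adding CloudWatch access to your IAM policy. Your lambda function indeed does not return anything, but you can see your log output in CloudWatch. Since you have already set up a logger, I really recommend using logger.info(message) instead of print.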

I hope this helps you debug your function.
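As a rough sketch of the logging advice, assuming Python 3 and a role that is allowed to write to CloudWatch Logs: the handler below logs through the root logger and uses `logger.exception` so the traceback ends up in the log output. The `process` function is a hypothetical stand-in for the real work, not part of the original code.

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def process(event):
    """Hypothetical stand-in for the real work; raises on a malformed event."""
    return event["Records"][0]["s3"]["object"]["key"]


def lambda_handler(event, context):
    logger.info("Received event with %d record(s)", len(event.get("Records", [])))
    try:
        key = process(event)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Failed to process event")
        return {"statusCode": 500}
    logger.info("Processed %s", key)
    return {"statusCode": 200, "key": key}
```

Returning a value from the handler also makes it easier to tell a failed run from a successful one when testing in the console.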

Apart from the sending part, I would rewrite it like this (only tested in the AWS console):

import logging
import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client('s3')

def lambda_handler(event, context):
    email_content = ''

    # retrieve bucket name and file_key from the S3 event
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    file_key = event['Records'][0]['s3']['object']['key']
    logger.info('Reading {} from {}'.format(file_key, bucket_name))
    # get the object
    obj = s3.get_object(Bucket=bucket_name, Key=file_key)
    # get lines inside the csv
    lines = obj['Body'].read().split(b'\n')
    for r in lines:
        logger.info(r.decode())
        email_content = email_content + '\n' + r.decode()
    logger.info(email_content)
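If you want column-aware rows rather than raw lines, one possible approach is to decode the body and hand it to `csv.DictReader`. The question's handler splits the body on whitespace and then concatenates each row (a dict) onto a string, both of which are likely to go wrong. The sketch below assumes Python 3 and UTF-8 content; `csv_bytes_to_text` and the sample bytes are illustrative, not part of the original code.

```python
import csv
import io


def csv_bytes_to_text(raw_bytes, encoding="utf-8"):
    """Turn the raw bytes of a CSV object into one text line per data row."""
    # DictReader needs text lines; StringIO supplies them, and newline=""
    # leaves line-break handling to the csv module.
    reader = csv.DictReader(io.StringIO(raw_bytes.decode(encoding), newline=""))
    lines = []
    for row in reader:
        # Each row is a dict, so format it explicitly before joining.
        lines.append(", ".join("{}={}".format(k, v) for k, v in row.items()))
    return "\n".join(lines)


# Hypothetical file content; note the space inside "New York".
sample = b"name,city\nAlice,New York\nBob,Paris\n"
email_content = csv_bytes_to_text(sample)
```

Inside the handler this would be called as `csv_bytes_to_text(obj['Body'].read())`.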

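  • Just a small improvement: it is better to initialize database connections, SDKs, etc. outside the handler function (at the global level). The Lambda service keeps the function's execution context between invocations, so subsequent invocations run much faster. More information: https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html (2 upvotes)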
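A minimal sketch of that pattern, with a hypothetical `_create_client` factory standing in for something like `boto3.client('s3')` so the example runs without AWS access. Module-level code runs once when the execution environment starts, and each later invocation in the same environment reuses the result.

```python
# Counts how often the (hypothetical) expensive setup runs.
INIT_COUNT = 0


def _create_client():
    """Stand-in for an expensive setup step such as boto3.client('s3')."""
    global INIT_COUNT
    INIT_COUNT += 1
    return object()


# Module scope: executed once per execution environment, not per invocation.
CLIENT = _create_client()


def lambda_handler(event, context):
    # Reuses the client created above instead of building a new one.
    return {"client_id": id(CLIENT), "init_count": INIT_COUNT}


first = lambda_handler({}, None)
second = lambda_handler({}, None)
```

Both calls report the same client and an init count of 1, which is the effect the comment describes for warm invocations.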