Jon*_*Jon 10 python csv amazon-s3 amazon-web-services
I have code that fetches an AWS S3 object. How can I read this StreamingBody with Python's csv.DictReader?
import boto3, csv
session = boto3.session.Session(aws_access_key_id=<>, aws_secret_access_key=<>, region_name=<>)
s3_resource = session.resource('s3')
s3_object = s3_resource.Object(<bucket>, <key>)
streaming_body = s3_object.get()['Body']
#csv.DictReader(???)
gar*_*ary 20
The code would look something like this:
import boto3
import csv
# get a handle on s3
s3 = boto3.resource(u's3')
# get a handle on the bucket that holds your file
bucket = s3.Bucket(u'bucket-name')
# get a handle on the object you want (i.e. your file)
obj = bucket.Object(key=u'test.csv')
# get the object
response = obj.get()
# read the contents of the file and split it into a list of lines
# (splitlines() rather than split(), so fields containing spaces stay intact;
# the body can only be read once, so use only one of the two variants)
# for python 2:
# lines = response[u'Body'].read().splitlines()
# for python 3 you need to decode the incoming bytes:
lines = response['Body'].read().decode('utf-8').splitlines()
# now iterate over those lines
for row in csv.DictReader(lines):
    # here you get a sequence of dicts
    # do whatever you want with each line here
    print(row)
You could condense this in real code, but I tried to go step by step to show the object hierarchy with boto3.
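As a rough illustration of what a condensed version might look like, here is a minimal sketch. The helper names `parse_csv_bytes` and `read_csv_from_s3` are made up for this example, and it assumes the file is UTF-8 and small enough to hold in memory. It passes the decoded text through `io.StringIO(..., newline="")` instead of a list of lines, which should also cope with quoted fields that contain line breaks.

```python
import csv
import io


def parse_csv_bytes(data, encoding="utf-8"):
    """Decode the raw bytes of a CSV file and return its rows as a list of dicts."""
    # newline="" lets the csv module handle line endings itself
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.DictReader(text))


def read_csv_from_s3(bucket_name, key):
    """Hypothetical helper: fetch the whole object and parse it in memory."""
    import boto3  # imported here so parse_csv_bytes can be used without boto3

    body = boto3.resource("s3").Object(bucket_name, key).get()["Body"]
    return parse_csv_bytes(body.read())


# Local check with sample bytes (illustrative data, no S3 access needed)
sample_rows = parse_csv_bytes(b"name,city\r\nAda,New York\r\n")
print(sample_rows)
```

Keeping the parsing step separate from the S3 call makes it possible to try the parsing on local bytes before pointing it at a real bucket.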
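Edit, following your comment about avoiding reading the whole file into memory: I have not run into that requirement, so I cannot speak authoritatively, but I would try wrapping the stream so that I get an iterator that behaves like a text file. For example, you could use the codecs library to replace the csv parsing section above with something like: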
import codecs
for row in csv.DictReader(codecs.getreader('utf-8')(response[u'Body'])):
    print(row)
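The wrapping idea can be tried out without touching S3. In the sketch below an `io.BytesIO` object stands in for `response['Body']`, on the assumption that the real StreamingBody exposes a compatible `read()` method. The helper name `iter_csv_rows` and the sample data are invented for illustration.

```python
import codecs
import csv
import io


def iter_csv_rows(byte_stream, encoding="utf-8"):
    """Yield one dict per CSV row, decoding the byte stream incrementally."""
    # codecs.getreader returns a StreamReader class; instantiating it wraps
    # the byte stream in a text stream that can be iterated line by line
    text_stream = codecs.getreader(encoding)(byte_stream)
    for row in csv.DictReader(text_stream):
        yield row


# Stand-in for response['Body'] (sample data, not fetched from S3)
fake_body = io.BytesIO(b"name,city\nAda,New York\nLin,Sao Paulo\n")
rows = list(iter_csv_rows(fake_body))
print(rows)
```

Because `iter_csv_rows` is a generator, rows are produced as the underlying stream is read, so a caller that processes them one at a time does not need the whole file in memory. The `list(...)` call here is only for the demonstration.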