S3 Select: retrieving the header of a CSV

Question


Sum*_*pal 5 python csv amazon-s3 export-to-csv boto3

I'm trying to fetch a subset of records from a CSV stored in an S3 bucket using the following code:

import boto3
import pandas as pd
from io import StringIO

s3 = boto3.client('s3')
bucket = '<bucket-name>'   # placeholder: name of the S3 bucket
file_name = '<object-key>'  # placeholder: key of the CSV object

sql_stmt = """SELECT S.* FROM s3object S LIMIT 10"""

req = s3.select_object_content(
    Bucket=bucket,
    Key=file_name,
    ExpressionType='SQL',
    Expression=sql_stmt,
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'CSV': {}},
)

# Collect the raw byte chunks from the event stream
records = []
for event in req['Payload']:
    if 'Records' in event:
        records.append(event['Records']['Payload'])
    elif 'Stats' in event:
        stats = event['Stats']['Details']

# Join before decoding so a multi-byte character split across chunks survives
file_str = b''.join(records).decode('utf-8')

df = pd.read_csv(StringIO(file_str))
print(df)


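This successfully returns the records, but the header is missing.

I read here, in "S3 Select CSV Headers", that S3 Select does not produce headers at all. So is there any other way to retrieve the header of a CSV file in S3?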

Answer 1

Red*_*Boy 2

Change InputSerialization={'CSV': {"FileHeaderInfo": "USE"}},

to InputSerialization={'CSV': {"FileHeaderInfo": "NONE"}},

It will then print the full content, including the header.
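A minimal sketch of that change applied end to end, assuming the caller passes in a boto3 S3 client (for example `boto3.client('s3')`). The function name `select_rows_with_header` and its `limit` parameter are hypothetical and only serve the illustration. Because the header line comes back as the first returned row, the standard library's `csv.DictReader` can use it for the column names:

```python
import csv
import io


def select_rows_with_header(s3_client, bucket, key, limit=10):
    """Run S3 Select with FileHeaderInfo=NONE and return the rows as dicts.

    `s3_client` is assumed to be a boto3 S3 client; the function name and
    `limit` parameter are hypothetical, chosen for this sketch.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        # Under NONE the header line counts as a row, so ask for limit + 1.
        Expression=f"SELECT * FROM s3object s LIMIT {int(limit) + 1}",
        InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
        OutputSerialization={"CSV": {}},
    )
    # Collect raw bytes first; a multi-byte character may span two chunks.
    chunks = [
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    ]
    text = b"".join(chunks).decode("utf-8")
    # DictReader consumes the first returned line as the column names.
    return list(csv.DictReader(io.StringIO(text)))
```

The same text could instead be handed to `pd.read_csv(StringIO(text))` as in the question, since pandas also takes the first line as the header by default.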

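Explanation:

FileHeaderInfo accepts one of "NONE", "USE", or "IGNORE".

With the NONE option instead of USE, the header is printed as well, because NONE tells S3 Select that the first line is not a header, so it is processed and returned like any other record. Note that with NONE, columns can only be referenced by position (_1, _2, ...) in the SQL expression, not by name.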
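If only the header is needed, the same behaviour can be combined with LIMIT 1: since NONE treats the first line as an ordinary record, a one-row query should return just that line. The following is a sketch under that assumption, not a definitive implementation. The function name `fetch_csv_header` is hypothetical, and `s3_client` is assumed to be a boto3 S3 client supplied by the caller:

```python
import csv
import io


def fetch_csv_header(s3_client, bucket, key):
    """Return the first line of a CSV object as a list of column names.

    `s3_client` is assumed to be a boto3 S3 client; the function name is
    hypothetical, chosen for this sketch.
    """
    response = s3_client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        # LIMIT 1 returns only the first line, which NONE treats as data.
        Expression="SELECT * FROM s3object s LIMIT 1",
        InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    text = b"".join(chunks).decode("utf-8")
    # Parse with the csv module so quoted column names are handled.
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0] if rows else []
```

The returned list could then be passed as `names=` to `pd.read_csv` when the data rows themselves are fetched with FileHeaderInfo set to USE.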

Here is the reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.select_object_content

I hope it helps.

Archived:	6 years, 10 months ago
Viewed:	2432 times
Last active:	5 years, 1 month ago