use*_*874 4 amazon-s3 amazon-web-services airflow
I am trying to use S3ListOperator from airflow.providers.amazon.aws.operators.s3_list to list the files in an S3 bucket in my AWS account, with the following operator in my DAG:
from airflow.providers.amazon.aws.operators.s3_list import S3ListOperator

list_bucket = S3ListOperator(
    task_id='list_files_in_bucket',
    bucket='<MY_BUCKET>',
    aws_conn_id='s3_default'
)
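I have configured my connection details in the connection's Extra field in the following form: {"aws_access_key_id": "<MY_ACCESS_KEY>", "aws_secret_access_key": "<MY_SECRET_KEY>"}
When I run the Airflow job, it appears to execute fine and my task status is Success. Here is the log output: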
[2021-04-27 11:44:50,009] {base_aws.py:368} INFO - Airflow Connection: aws_conn_id=s3_default
[2021-04-27 11:44:50,013] {base_aws.py:170} INFO - Credentials retrieved from extra_config
[2021-04-27 11:44:50,013] {base_aws.py:84} INFO - Creating session with aws_access_key_id=<MY_ACCESS_KEY> region_name=None
[2021-04-27 11:44:50,027] {base_aws.py:157} INFO - role_arn is None
[2021-04-27 11:44:50,661] {taskinstance.py:1185} INFO - Marking task as SUCCESS. dag_id=two_step, task_id=list_files_in_bucket, execution_date=20210427T184422, start_date=20210427T184439, end_date=20210427T184450
[2021-04-27 11:44:50,676] {taskinstance.py:1246} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2021-04-27 11:44:50,700] {local_task_job.py:146} INFO - Task exited with return code 0
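Is there anything I can do to print the files in the bucket to the log? TIA
This code is enough; you don't need a print function. The operator returns the list of keys instead of logging it, so open the task instance for list_files_in_bucket and go to its XCom tab: the returned list is there.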
list_bucket = S3ListOperator(
    task_id='list_files_in_bucket',
    bucket='ob-air-pre',
    prefix='data/',
    delimiter='/',
    aws_conn_id='aws'
)
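If you do want the keys written to the task log as well, one option is a small downstream task that pulls the list from XCom and logs it. The following is only a minimal sketch, assuming Airflow 2.x (where PythonOperator passes `ti` to the callable based on its signature); the function name `log_s3_keys` and the task id `print_keys` are made up for illustration and are not part of the original question or answer.

```python
import logging

logger = logging.getLogger(__name__)


def log_s3_keys(ti, upstream_task_id="list_files_in_bucket", **_):
    """Pull the key list returned by the upstream list task and log each key.

    `ti` is the task instance that Airflow supplies from the task context.
    `upstream_task_id` is an assumed default matching the task_id used above.
    """
    # xcom_pull returns None if the upstream task pushed nothing.
    keys = ti.xcom_pull(task_ids=upstream_task_id) or []
    logger.info("Found %d key(s) in the bucket listing", len(keys))
    for key in keys:
        logger.info("S3 key: %s", key)
    # Return only the count so the full list is not stored in XCom twice.
    return len(keys)


# Hypothetical wiring inside the DAG definition (assumes Airflow 2.x):
#
#   from airflow.operators.python import PythonOperator
#
#   print_keys = PythonOperator(
#       task_id="print_keys",
#       python_callable=log_s3_keys,
#   )
#   list_bucket >> print_keys
```

With this in place, the log of the print_keys task should show one line per key, while the XCom of list_files_in_bucket still holds the original list.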