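I have a scenario like the one below:
Task 1 and Task 2 should be triggered when a new data partition arrives during the day. Task 3 should be triggered when Task 1 and Task 2 complete. Task 4 should be triggered when Task 3 completes. My code: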
from airflow import DAG
from airflow.contrib.sensors.aws_glue_catalog_partition_sensor import AwsGlueCatalogPartitionSensor
from datetime import datetime, timedelta
from airflow.operators.postgres_operator import PostgresOperator
from utils import FAILURE_EMAILS
yesterday = datetime.combine(datetime.today() - timedelta(1), datetime.min.time())
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': yesterday,
    'email': FAILURE_EMAILS,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}
dag = DAG('Trigger_Job', default_args=default_args, …
Tags: python directed-acyclic-graphs airflow amazon-athena airflow-scheduler
I have a dataframe like the one below.
Input
df
A B C
1 2 1
NaN 4 2
3 NaN NaN
NaN NaN NaN
4 2 NaN
NaN NaN NaN
Output
A B C
1 2 1
NaN 4 2
3 NaN NaN
4 2 NaN
How can this be done in Python?
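Judging from the input and output above, the goal appears to be dropping only the rows in which every column is NaN. A minimal sketch of one way to do that with pandas follows; the frame is rebuilt by hand from the sample, and the name `result` is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's input
df = pd.DataFrame({
    "A": [1, np.nan, 3, np.nan, 4, np.nan],
    "B": [2, 4, np.nan, np.nan, 2, np.nan],
    "C": [1, 2, np.nan, np.nan, np.nan, np.nan],
})

# how="all" drops a row only when every value in it is NaN;
# rows with at least one non-NaN value are kept
result = df.dropna(how="all").reset_index(drop=True)
print(result)
```

The `reset_index(drop=True)` call is optional; it only renumbers the remaining rows.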
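I have a dataframe as shown below. I want to split the values in the Zip column into separate rows, as shown below. The values can be separated by any of these delimiters: _ , . How can this be done in Python?
Input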
df.head(5)
Date Item_Code Type Zip
1/1/2020 A Long 07_08_09
12/4/2020 B Small AB_CD_EF_GF
13/4/2020 A Long 08_14
1/5/2020 A Long
21/5/2020 B Small 09,07,16
22/5/2020 B Small AB,07
Expected output
Date Item_Code Type Zip
1/1/2020 A Long 07
1/1/2020 A Long 08
1/1/2020 A Long 09
12/4/2020 B Small AB
12/4/2020 B Small CD
12/4/2020 B Small EF
12/4/2020 B Small GF
13/4/2020 A Long 08
13/4/2020 A Long 14
1/5/2020 A …
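One possible approach, sketched under assumptions: split the Zip column on any of the three delimiters with a regex character class, then use `explode` to give each piece its own row. The frame below is rebuilt by hand from the sample, and the empty Zip on 1/5/2020 is assumed to be a missing value; since the expected output is cut off at that row, how it should be handled is a guess, and here it is simply kept as a single row with a missing Zip.

```python
import pandas as pd

# Sample frame rebuilt from the question's input; the blank Zip is assumed to be missing
df = pd.DataFrame({
    "Date": ["1/1/2020", "12/4/2020", "13/4/2020", "1/5/2020", "21/5/2020", "22/5/2020"],
    "Item_Code": ["A", "B", "A", "A", "B", "B"],
    "Type": ["Long", "Small", "Long", "Long", "Small", "Small"],
    "Zip": ["07_08_09", "AB_CD_EF_GF", "08_14", None, "09,07,16", "AB,07"],
})

# The character class [_,.] matches "_", "," or "." as a delimiter.
# str.split turns each Zip into a list; explode emits one row per list element
# and repeats the other columns. Missing Zip values pass through unchanged.
out = (
    df.assign(Zip=df["Zip"].str.split(r"[_,.]"))
      .explode("Zip")
      .reset_index(drop=True)
)
print(out)
```

If rows with a missing Zip should be dropped instead, appending `.dropna(subset=["Zip"])` before `reset_index` would do that.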