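I have a Python DAG, Parent Job, and a second DAG, Child Job. The tasks in Child Job should be triggered once the daily run of Parent Job completes successfully. How can I add an external job trigger?
My code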
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.postgres_operator import PostgresOperator
from utils import FAILURE_EMAILS
yesterday = datetime.combine(datetime.today() - timedelta(1), datetime.min.time())
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': yesterday,
    'email': FAILURE_EMAILS,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}
dag = DAG('Child Job', default_args=default_args, schedule_interval='@daily')
execute_notebook = PostgresOperator(
    task_id='data_sql',
    postgres_conn_id='REDSHIFT_CONN',
    sql="SELECT * FROM athena_rs.shipments limit 5",
    dag=dag
)
python directed-acyclic-graphs python-3.x airflow airflow-scheduler
Removing punctuation and numbers, and lowercasing, do not work when using nltk.
My code
stopwords=nltk.corpus.stopwords.words('english')+ list(string.punctuation)
user_defined_stop_words=['st','rd','hong','kong']
new_stop_words=stopwords+user_defined_stop_words
def preprocess(text):
    return [word for word in word_tokenize(text) if word.lower() not in new_stop_words and not word.isdigit()]
miss_data['Clean_addr'] = miss_data['Adj_Addr'].apply(preprocess)
Sample input
23FLOOR 9 DES VOEUX RD WEST HONG KONG
PAG CONSULTING FLAT 15 AIA CENTRAL 1 CONNAUGHT RD CENTRAL
C/O CITY LOST STUDIOS AND FLAT 4F 13-15 HILLIER ST SHEUNG HONG KONG
Expected output
floor des voeux west
pag consulting flat aia central connaught central
co city lost studios flat f hillier sheung
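One possible reason the code above falls short of the expected output: `word.lower()` is used only for the stop-word comparison, so the returned tokens keep their original case; `isdigit()` drops only tokens made up entirely of digits, so `23FLOOR` and `4F` survive; and adding `string.punctuation` to the stop list removes only tokens that are a single punctuation character. A minimal sketch that normalizes the text before filtering follows. To stay self-contained it splits on whitespace instead of calling `word_tokenize`, and `ENGLISH_STOP_WORDS` is a tiny hypothetical stand-in for `nltk.corpus.stopwords.words('english')`.

```python
import re
import string

# Hypothetical stand-in for nltk.corpus.stopwords.words('english');
# only a handful of entries, chosen for illustration.
ENGLISH_STOP_WORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "to"}
USER_DEFINED_STOP_WORDS = {"st", "rd", "hong", "kong"}
STOP_WORDS = ENGLISH_STOP_WORDS | USER_DEFINED_STOP_WORDS

# Translation table that deletes every ASCII punctuation character.
PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip punctuation and digits, then drop stop words."""
    text = text.lower()                        # output tokens are lowercase
    text = text.translate(PUNCTUATION_TABLE)   # "c/o" -> "co", "13-15" -> "1315"
    text = re.sub(r"\d+", "", text)            # "23floor" -> "floor", "4f" -> "f"
    return [word for word in text.split() if word not in STOP_WORDS]


cleaned = preprocess("C/O CITY LOST STUDIOS AND FLAT 4F 13-15 HILLIER ST SHEUNG HONG KONG")
```

With pandas, the same function could be applied as in the question, for example `miss_data['Adj_Addr'].apply(preprocess)`; joining each list with `" ".join(...)` gives a single string per row.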
How do I convert count-vectorized text data back to text form? I have text data that I turned into a sparse matrix with CountVectorizer for classification. Now I want to convert that sparse matrix back into text data.
My code
cv = CountVectorizer( max_features = 500,analyzer='word')
cv_addr = cv.fit_transform(data.pop('Clean_addr'))
for i, col in enumerate(cv.get_feature_names()):
    data[col] = pd.SparseSeries(cv_addr[:, i].toarray().ravel(), fill_value=0)
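If only the set of words per row is needed, `CountVectorizer.inverse_transform` maps each row of the count matrix back to the vocabulary terms with non-zero counts. Word order is not stored in the matrix, so the original sentences cannot be restored exactly, and this call also drops how often each word occurred. A minimal sketch, assuming the cleaned addresses are plain strings (the `docs` list is made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample documents standing in for the 'Clean_addr' column.
docs = [
    "floor des voeux west",
    "pag consulting flat aia central connaught central",
]

cv = CountVectorizer(max_features=500, analyzer="word")
counts = cv.fit_transform(docs)  # sparse document-term matrix

# One array of terms per document; only terms with a non-zero count appear.
terms_per_doc = cv.inverse_transform(counts)

# Sort the terms so the result is deterministic, then join into one string.
recovered = [" ".join(sorted(terms)) for terms in terms_per_doc]
```

Note that "central" appears twice in the second document but only once in the recovered text; the counts themselves remain available in `counts` if they are needed.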
I have a dataframe df as shown below:
SL.No Invoice
1 A2345
2 B1624
3 C1234
I need to create another column, Status, holding a value for each row such as 'Approved', 'Rejected', or 'Partially Approved', and then write the result to an Excel file.
How can I do this using Python?
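A minimal sketch of one way to add the column with pandas. The question does not say which row gets which status, so the `statuses` list below is a hypothetical assignment in row order:

```python
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({
    "SL.No": [1, 2, 3],
    "Invoice": ["A2345", "B1624", "C1234"],
})

# Hypothetical statuses, one per row, in the same order as the rows.
statuses = ["Approved", "Rejected", "Partially Approved"]

# Assigning a list of the same length as the dataframe creates the new column.
df["Status"] = statuses
```

From there, `df.to_excel("invoices.xlsx", index=False)` would write the three columns to a workbook. The filename is a placeholder, and pandas needs an Excel engine such as openpyxl installed for that call.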