I am trying to implement a Prefect flow with different parameters:
from prefect import Flow, Parameter
from prefect.schedules import Schedule
from prefect.schedules.clocks import CronClock
a = Parameter('a', default=None, required=False)
b = Parameter('b', default=None, required=False)
schedule = Schedule(clocks=[
    CronClock(' 0 18 * * 6', parameter_defaults={'a': 'a', 'b': 'b'}),
    CronClock(' 0 12 * * 0', parameter_defaults={'a': 'a', 'b': 'b'})
])
flow = Flow(
    name='test flow', schedule=schedule
)
flow.register()
But I get the following error:
Result check: OK
Traceback (most recent call last):
File "/home/psimakis/.config/JetBrains/PyCharm2020.2/scratches/scratch.py", line 18, in <module>
flow.register()
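File "/home/psimakis/.local/share/virtualenvs/data-workflows-GfPV92cZ/lib/python3.6/site-packages/prefect/core/flow.py", line 1443, in register …

In my case, a postgres database serves as the main Django backend database, and it needs additional initialization. The problem is that the postgres service becomes ready before the additional database initialization has run, so the dependent Django application starts before the database is initialized.
Is there a way to configure the postgres service so that it becomes ready only after the additional initialization?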
docker-compose.yml:
version: "3.3"
services:
  postgres:
    image: library/postgres:11
    volumes:
      - some_folder:/docker-entrypoint-initdb.d
  django_app:
    image: custom_django_image:latest
    volumes:
      - $PWD:/app
    ports:
      - 8000:8000
    depends_on:
      - postgres
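One possible workaround on the Django side, sketched here under assumptions rather than taken from the question, is to have the application wait for a readiness probe before it starts. The helper names `wait_until` and `port_open` are hypothetical, and so are the host name `postgres` and port 5432 in the usage comment. An open port alone does not prove that the init scripts have finished, so a stricter probe would query for an object those scripts create.

```python
import socket
import time


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(probe, timeout=60.0, interval=1.0):
    """Call probe() repeatedly until it returns True or the timeout expires.

    Returns True if the probe succeeded, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage before starting Django (host and port are assumptions):
#   if not wait_until(lambda: port_open("postgres", 5432)):
#       raise SystemExit("database did not become ready in time")
```

Because the probe is just a callable, it can be swapped for a stricter check without changing the retry loop.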
I am trying to install Apache Airflow in a virtualenv.
First I created and activated a new Python virtual environment, then I installed apache-airflow via pip.
$ virtualenv $HOME/.p2env -p /usr/bin/python
Running virtualenv with interpreter /usr/bin/python
New python executable in /home/cass/.p2env/bin/python
Installing setuptools, pip, wheel...done.
$ source $HOME/.p2env/bin/activate
(.p2env) $ pip install apache-airflow
Then, when I try to initialize the Airflow metadata database, I get these errors:
(.p2env) $ airflow initdb
[2018-03-14 16:50:22,924] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2018-03-14 16:50:22,944] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2018-03-14 16:50:23,043] {__init__.py:45} INFO - Using executor SequentialExecutor
DB: sqlite:////home/cass/airflow/airflow.db
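[2018-03-14 16:50:23,128] {db.py:312} INFO - Creating tables …

I have developed a crawler that performs many actions. Many XPaths are involved, so I store them in a JSON file. When the crawler starts, I want to run a basic syntax check on the XPaths (before they are used) and raise an error for any invalid XPath.
For example: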
xpath1 = '//*[@id="react-root"]/section'
xpath2 = '//*[[@id="react-root"]/section'
xpath3 = '//*[@id="react-root"]\section'
Of these XPaths, only xpath1 is valid.
Is there any module or regular expression that can do this kind of validation?
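As a rough illustration of what a basic check could look like, the sketch below uses only the standard library and a few heuristics: balanced brackets and parentheses, terminated string literals, and no backslash outside a string literal. The function name `check_xpath` is hypothetical, and this is not a full XPath parser, so it can accept expressions that a real engine rejects. Compiling each expression with an actual XPath implementation (for example `lxml.etree.XPath`, which raises `XPathSyntaxError`) would be the stricter option.

```python
def check_xpath(xpath):
    """Raise ValueError if the XPath fails a few basic syntax heuristics."""
    closers = {")": "(", "]": "["}
    stack = []    # currently open brackets and parentheses
    quote = None  # the quote character of the string literal we are inside

    for pos, ch in enumerate(xpath):
        if quote is not None:
            # Inside a string literal: only the matching quote ends it.
            if ch == quote:
                quote = None
            continue
        if ch in ('"', "'"):
            quote = ch
        elif ch == "\\":
            raise ValueError(f"backslash at position {pos}: {xpath!r}")
        elif ch in "([":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                raise ValueError(f"unmatched {ch!r} at position {pos}: {xpath!r}")

    if quote is not None:
        raise ValueError(f"unterminated string literal: {xpath!r}")
    if stack:
        raise ValueError(f"unclosed {stack[-1]!r}: {xpath!r}")


def is_valid_xpath(xpath):
    """Return True if check_xpath accepts the expression, False otherwise."""
    try:
        check_xpath(xpath)
    except ValueError:
        return False
    return True


# The three expressions from the question.
candidates = {
    "xpath1": '//*[@id="react-root"]/section',
    "xpath2": '//*[[@id="react-root"]/section',
    "xpath3": '//*[@id="react-root"]\\section',
}
invalid = [name for name, expr in candidates.items() if not is_valid_xpath(expr)]
```

Running the check once over every entry loaded from the JSON file, before the crawler starts, would surface all malformed expressions up front.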
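I am trying to create multiple types in a single index. For example, I am trying to create two types (host, post) in the ytb index, so that I can set up a parent-child relationship between them.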
PUT /ytb
{
  "mappings": {
    "post": {
      "_parent": {
        "type": "host"
      },
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "indexed": {
          "type": "date"
        },
        "n_comments": {
          "type": "long"
        },
        "n_harvested": {
          "type": "long"
        },
        "n_likes": {
          "type": "long"
        },
        "network": {
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          },
          "type": "text"
        },
        "parent_id": {
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          },
          "type": "text"
        },
        "post_dbid": {
          "type": "long"
        }, …