Posts by Tom*_*nič

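Neo4j: how to model a time-versioned graph

Part of my graph has the following schema:

The main part of the graph is the domain, which has some persons linked to it. Person has a unique constraint on the email property, which works well because I also have data from other sources.

In my case a person can be an admin who has some devices/calendars linked to him. I get this data from a SQL database, where I import several tables to put together the whole picture. I start with a table that has two columns: the admin's email and his user ID. This user ID only applies to the production database and is not used globally across other sources, which is why I use the email as the global ID for a person. I currently use the following query to import the user ID that all production tables link to. I always get a current snapshot of user settings and information. This query runs 4 times a day: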

CALL apoc.load.jdbc(url, import_query) yield row
MERGE (p:Person{email:row.email})
SET p.user_id = row.id


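Then I import all the data linked to this user ID from the other tables.

The problem is that a user in the production database can change his email. With the way I import data now, I end up with two persons that have the same user_id, and consequently all devices/calendars are linked to both persons, since both share that user_id. So this is not an accurate representation of reality. We also need to capture when a device is connected to or disconnected from a specific user_id, because a device can be disconnected and lent to a friend who has a different admin (user_id).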
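To make the duplication concrete, here is a minimal in-memory sketch in Python of what merging on email does across two snapshots. It only imitates the MERGE/SET pattern of the import query with a dictionary keyed by email; the helper names and the example addresses are hypothetical and nothing here talks to Neo4j.

```python
def merge_person(people, email, user_id):
    """Imitate MERGE (p:Person {email: ...}) followed by SET p.user_id = ...

    `people` maps email -> node properties, which mirrors the unique
    constraint on Person.email.
    """
    node = people.setdefault(email, {"email": email})
    node["user_id"] = user_id
    return node


def import_snapshot(people, rows):
    """Apply one snapshot of (email, user_id) rows from the source table."""
    for email, user_id in rows:
        merge_person(people, email, user_id)


people = {}
# First import: user 42 is known under the old address (made-up data).
import_snapshot(people, [("old@example.com", 42)])
# Later import: the same user has changed the email in production.
import_snapshot(people, [("new@example.com", 42)])

# Collect every Person entry that carries user_id 42.
duplicates = [p["email"] for p in people.values() if p["user_id"] == 42]
print(duplicates)
```

After the second snapshot both entries carry the same user_id, so anything matched by user_id afterwards attaches to both of them.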

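How should I change my graph model (and import query) so that:

querying who the current admin is does not require a complex query
querying who currently has a device connected does not require a complex query
querying the history can be a bit more complex.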
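One way to read these requirements, sketched in plain Python rather than as a Neo4j schema: each device connection is a record with a start and an optional end, so the current state is a simple filter on open-ended records while history needs an interval comparison. The class, field, and function names and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Connection:
    """One period during which a device was linked to a user_id."""
    device: str
    user_id: int
    connected_at: datetime
    disconnected_at: Optional[datetime] = None  # None means still connected


def current_owner(history: List[Connection], device: str) -> Optional[int]:
    """Simple lookup: the open-ended record is the current one."""
    for c in history:
        if c.device == device and c.disconnected_at is None:
            return c.user_id
    return None


def owner_at(history: List[Connection], device: str, when: datetime) -> Optional[int]:
    """Historical lookup: requires comparing against each interval."""
    for c in history:
        if c.device != device:
            continue
        ended = c.disconnected_at
        if c.connected_at <= when and (ended is None or when < ended):
            return c.user_id
    return None


# Made-up history: the device moves from user 42 to user 7.
history = [
    Connection("thermostat", 42, datetime(2019, 1, 1), datetime(2019, 6, 1)),
    Connection("thermostat", 7, datetime(2019, 6, 1)),
]
```

Here `current_owner(history, "thermostat")` only looks for the record without an end date, while `owner_at` has to walk the intervals, which matches the idea that history queries may be more involved than current-state queries.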

graph neo4j cypher

Tom*_*nič

11 votes, 1 answer, 1881 views

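Airflow: pulling a Docker image from a private Google Container Registry

I am using the https://github.com/puckle/docker-airflow image to run Airflow. I had to add pip install docker for it to support DockerOperator.

Everything seems fine, but I cannot figure out how to pull an image from a private Google Container Registry.

I tried adding a connection of type Google Cloud in the Admin section and running the Docker operator: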

    t2 = DockerOperator(
        task_id='docker_command',
        image='eu.gcr.io/project/image',
        api_version='2.3',
        auto_remove=True,
        command="/bin/sleep 30",
        docker_url="unix://var/run/docker.sock",
        network_mode="bridge",
        docker_conn_id="google_con"
    )


But it always fails with an error...

[2019-11-05 14:12:51,162] {{taskinstance.py:1047}} ERROR - No Docker registry URL provided

I also tried the Docker config option, dockercfg_path:

    t2 = DockerOperator(
        task_id='docker_command',
        image='eu.gcr.io/project/image',
        api_version='2.3',
        auto_remove=True,
        command="/bin/sleep 30",
        docker_url="unix://var/run/docker.sock",
        network_mode="bridge",
        dockercfg_path="/usr/local/airflow/config.json",
    )


I get the following error:

[2019-11-06 13:59:40,522] {{docker_operator.py:194}} INFO - Starting docker container from image eu.gcr.io/project/image
[2019-11-06 13:59:40,524] {{taskinstance.py:1047}} ERROR - ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
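The error does not say which file is missing. A small diagnostic sketch, meant to be run inside the Airflow container, can at least show whether the two paths passed to the operator exist there. The helper name is hypothetical, and it is an assumption that docker_url="unix://var/run/docker.sock" refers to the socket at /var/run/docker.sock.

```python
import os


def missing_paths(paths):
    """Return the subset of `paths` that do not exist on this filesystem."""
    return [p for p in paths if not os.path.exists(p)]


# Paths taken from the operator arguments in the question. Run this inside
# the Airflow container, since that is where the operator resolves them.
candidates = [
    "/var/run/docker.sock",            # assumed target of docker_url
    "/usr/local/airflow/config.json",  # value of dockercfg_path
]
for path in missing_paths(candidates):
    print("missing:", path)
```

If the socket is reported as missing, the FileNotFoundError may come from the Docker connection itself rather than from the config file, but that is only a guess to be checked against the actual setup.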

I also tried using just dockercfg_path="config.json" …

docker google-container-registry airflow

Tom*_*nič

5 votes, 1 answer, 2445 views

How to load a fine-tuned SciBERT model in AllenNLP?

I have fine-tuned a SciBERT model on the SciIE dataset. The repository uses AllenNLP to fine-tune the model. Training is run as follows:

python -m allennlp.run train $CONFIG_FILE  --include-package scibert -s "$@"


After successful training I have a model.tar.gz file as output, which contains weights.th, config.json, and a vocabulary folder. I tried to load it into an AllenNLP predictor:

from allennlp.predictors.predictor import Predictor
predictor = Predictor.from_path("model.tar.gz")


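But I get the following error:

ConfigurationError: bert-pretrained not in acceptable choices for dataset_reader.token_indexers.bert.type: ['single_id', 'characters', 'elmo_characters', 'spacy', 'pretrained_transformer', 'pretrained_transformer_mismatched']. You should either use the --include-package flag to make sure the correct module is loaded, or use a fully qualified class name in your config file like {"model": "my_module.models.MyModel"} to have it imported automatically.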
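The message suggests that type names such as bert-pretrained are resolved through a registry that is only filled in once the module defining the type has been imported. The toy sketch below imitates that mechanism in plain Python. It is not AllenNLP code, and every name in it (REGISTRY, register, by_name, the indexer classes) is hypothetical.

```python
REGISTRY = {}  # name -> class; stands in for a by-name type registry


def register(name):
    """Class decorator that records a class under a string name."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator


def by_name(name):
    """Look up a registered class, failing if the name is unknown."""
    if name not in REGISTRY:
        raise KeyError(f"{name} not in acceptable choices: {sorted(REGISTRY)}")
    return REGISTRY[name]


@register("single_id")
class SingleIdIndexer:
    """A type that is registered as soon as this file is loaded."""


def import_extension_package():
    """Stands in for importing the package that defines the custom type."""
    @register("bert-pretrained")
    class BertPretrainedIndexer:
        """Only registered once this function (the 'import') has run."""


# Before the extension is loaded, the lookup fails.
try:
    by_name("bert-pretrained")
    found_before = True
except KeyError:
    found_before = False

# After loading it, the same lookup succeeds.
import_extension_package()
found_after = by_name("bert-pretrained") is not None
```

In this toy version the lookup fails until the defining code has run, which is the situation the --include-package hint in the error message appears to address.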

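I have never worked with AllenNLP, so I am quite lost about what to do.

For reference, this is the part of the config that describes the token indexer: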

"token_indexers": {
            "bert": {
                "type": "bert-pretrained",
                "do_lowercase": "false",
                "pretrained_model": "/home/tomaz/neo4j/scibert/model/vocab.txt",
                "use_starting_offsets": true
            }
        }
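To check which type names an archive expects without loading the model, the config can be read straight out of model.tar.gz with the standard library. This is a minimal sketch that assumes the archive holds a JSON file named config.json, as described above, and that the indexers sit under dataset_reader.token_indexers, as the error path indicates. Both function names are hypothetical.

```python
import json
import tarfile


def read_archive_config(archive_path, member_name="config.json"):
    """Return the parsed JSON config stored inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Accept both "config.json" and "./config.json" style names.
            if member.isfile() and member.name.split("/")[-1] == member_name:
                with archive.extractfile(member) as handle:
                    return json.load(handle)
    raise FileNotFoundError(f"{member_name} not found in {archive_path}")


def indexer_types(config):
    """Map each token indexer name to its declared "type" value."""
    indexers = config.get("dataset_reader", {}).get("token_indexers", {})
    return {name: spec.get("type") for name, spec in indexers.items()}
```

Calling `indexer_types(read_archive_config("model.tar.gz"))` on an archive with the config shown above should list bert with the type bert-pretrained, which is the name the installed AllenNLP version then has to know about.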


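I am using this allennlp version:

Name: allennlp
Version: 1.2.1

Edit:

I think I have made good progress. I have to use the same version that was used to train the model, and then I can import the modules like this: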

from allennlp.predictors.predictor import Predictor
from scibert.models.bert_crf_tagger import *
from scibert.models.bert_text_classifier …


python allennlp

Tom*_*nič

2020-11-19

5 votes, 1 answer, 851 views

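How to truncate with the BERT tokenizer in the Transformers library

I am using the SciBERT pretrained model to get embeddings for various texts. The code is as follows: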

from transformers import AutoModel, AutoTokenizer

# Load the SciBERT tokenizer and model by their hub name.
tokenizer = AutoTokenizer.from_pretrained('allenai/scibert_scivocab_uncased', model_max_length=512, truncation=True)
model = AutoModel.from_pretrained('allenai/scibert_scivocab_uncased')


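I added the maximum length and truncation arguments to the tokenizer, but unfortunately they do not truncate the results. If I run a longer text through the tokenizer: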