sha*_*han 3 python spacy spacy-transformers
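I am trying to upgrade my spaCy version to the nightly release, specifically to use spaCy transformers.
So I converted spaCy's simple training dataset, which has the following format:
td = [["Who is Shaka Khan?", {"entities": [(7, 17, "FRIENDS")]}],["I like London.", {"entities": [(7, 13, "LOC")]}],]
into the following: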
[[{"head": 0, "dep": "", "tag": "", "orth": "Who", "ner": "O", "id": 0}, {"head": 0, "dep": "", "tag": "", "orth": "is", "ner": "O", "id": 1}, {"head": 0, "dep": "", "tag": "", "orth": "Shaka", "ner": "B-FRIENDS", "id": 2}, {"head": 0, "dep": "", "tag": "", "orth": "Khan", "ner": "L-FRIENDS", "id": 3}, {"head": 0, "dep": "", "tag": "", "orth": "?", "ner": "O", "id": 4}], [{"head": 0, "dep": "", "tag": "", "orth": "I", "ner": "O", "id": 0}, {"head": 0, "dep": "", "tag": "", "orth": "like", "ner": "O", "id": 1}, {"head": 0, "dep": "", "tag": "", "orth": "London", "ner": "U-LOC", "id": 2}, {"head": 0, "dep": "", "tag": "", "orth": ".", "ner": "O", "id": 3}]]
using the following script:
```
sentences = []
for t in td:
    doc = nlp(t[0])
    tags = offsets_to_biluo_tags(doc, t[1]['entities'])
    ner_info = list(zip(doc, tags))
    tokens = []
    for n, i in enumerate(ner_info):
        token = {"head" : 0,
                 "dep" : "",
                 "tag" : "",
                 "orth" : i[0].orth_,
                 "ner" : i[1],
                 "id" : n}
        tokens.append(token)
    sentences.append(tokens)

with open("train_data.json","w") as js:
    json.dump(sentences,js)
```

Then I tried to convert this train_data.json using spaCy's convert command:

```
python -m spacy convert train_data.json converted/
```

But the result in the converted folder is:

```
✔ Generated output file (0 documents): converted/train_data.spacy
```

which means it didn't create the dataset.

Can anybody help with what I am missing?

I am trying to do this with spacy-nightly.

You can skip the intermediate JSON step and convert the annotations directly to a DocBin.
import spacy
from spacy.training import Example
from spacy.tokens import DocBin
td = [["Who is Shaka Khan?", {"entities": [(7, 17, "FRIENDS")]}],["I like London.", {"entities": [(7, 13, "LOC")]}],]
nlp = spacy.blank("en")
db = DocBin()
for text, annotations in td:
    example = Example.from_dict(nlp.make_doc(text), annotations)
    db.add(example.reference)
db.to_disk("td.spacy")
See: https://nightly.spacy.io/usage/v3#migration-training-python
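(If you do want to use the intermediate JSON format, here is the specification: https://spacy.io/api/annotation#json-input. You can include just `orth` and `ner` in the `tokens` and leave the other features out, but you need this structure with `paragraphs`, `raw`, and `sentences`. An example is here: https://github.com/explosion/spaCy/blob/45c9a688285081cd69faa0627d9bcaf1f5e799a1/examples/training/training-data.json)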
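
To illustrate the nesting described above, here is a minimal sketch that wraps the per-sentence token lists from the question in the `paragraphs` / `raw` / `sentences` / `tokens` structure. The helper name `wrap_for_convert` is hypothetical and not part of spaCy, and the exact set of keys that `spacy convert` accepts should be checked against the linked specification; this only shows the shape, under the assumption of one sentence per paragraph and one paragraph per document.

```python
import json

# Raw texts and the token lists produced by the script in the question
# (only "id", "orth" and "ner" are kept here).
texts = ["Who is Shaka Khan?", "I like London."]
sentences = [
    [
        {"id": 0, "orth": "Who", "ner": "O"},
        {"id": 1, "orth": "is", "ner": "O"},
        {"id": 2, "orth": "Shaka", "ner": "B-FRIENDS"},
        {"id": 3, "orth": "Khan", "ner": "L-FRIENDS"},
        {"id": 4, "orth": "?", "ner": "O"},
    ],
    [
        {"id": 0, "orth": "I", "ner": "O"},
        {"id": 1, "orth": "like", "ner": "O"},
        {"id": 2, "orth": "London", "ner": "U-LOC"},
        {"id": 3, "orth": ".", "ner": "O"},
    ],
]


def wrap_for_convert(texts, sentences):
    """Hypothetical helper: nest each token list as
    document -> paragraphs -> sentences -> tokens.

    Assumes one sentence per paragraph and one paragraph per document.
    """
    return [
        {
            "id": i,
            "paragraphs": [
                {"raw": text, "sentences": [{"tokens": tokens}]}
            ],
        }
        for i, (text, tokens) in enumerate(zip(texts, sentences))
    ]


docs = wrap_for_convert(texts, sentences)

# Write the nested structure instead of the bare list of token lists.
with open("train_data.json", "w") as js:
    json.dump(docs, js, indent=2)
```

The file written in the question was a bare list of token lists with none of this nesting, which is the likely reason the converter reported 0 documents.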