Does spaCy accept a list of tokens as input?

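I want to use spaCy's POS tagging, NER, and dependency parsing without using its word tokenization. My input is a list of tokens representing a sentence, and I want to respect the user's tokenization. Is this possible at all, with spaCy or any other NLP package?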

Right now, I am using the following spaCy-based function to put a sentence (a unicode string) into CoNLL format:

import spacy

nlp = spacy.load('en')

def toConll(string_doc, nlp):
    # Note: nlp() runs spaCy's own tokenizer on the raw string.
    doc = nlp(string_doc)
    block = []
    for i, word in enumerate(doc):
        # The root token is its own head; it gets head index 0.
        if word.head == word:
            head_idx = 0
        else:
            # 1-based position of the head within the sentence.
            head_idx = word.head.i - doc[0].i + 1
        head_idx = str(head_idx)
        line = [str(i+1), str(word), word.lemma_, word.tag_,
                word.ent_type_, head_idx, word.dep_]
        block.append(line)
    return block

conll_format = toConll(u"Donald Trump is the new president of the United States of America", nlp)

Output:
[['1', 'Donald', u'donald', u'NNP', u'PERSON', '2', u'compound'],
 ['2', 'Trump', u'trump', u'NNP', u'PERSON', '3', u'nsubj'], …
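
For illustration, here is a minimal sketch of one way to keep an existing tokenization: build a `Doc` directly from the word list and then apply each pipeline component to it, so the tokenizer is never run. This is not taken from the question or from the answer on this page. The helper name `parse_pretokenized` is hypothetical, and the sketch assumes that `Doc(vocab, words=...)` and iteration over `nlp.pipeline` behave as they do in the spaCy versions I am familiar with.

```python
import spacy
from spacy.tokens import Doc


def parse_pretokenized(nlp, words):
    """Run the pipeline components of `nlp` on an existing token list.

    `parse_pretokenized` is a hypothetical helper name, not part of spaCy.
    """
    # Build the Doc from the caller's tokens, bypassing nlp's tokenizer.
    doc = Doc(nlp.vocab, words=words)
    # nlp.pipeline is a list of (name, component) pairs; each component
    # takes a Doc and returns the annotated Doc.
    for _name, component in nlp.pipeline:
        doc = component(doc)
    return doc
```

With a trained pipeline loaded (for example `nlp = spacy.load('en')` as in the question), `parse_pretokenized(nlp, [u"Donald", u"Trump", u"is", u"president"])` should return a `Doc` whose tokens are exactly the supplied words, and attributes such as `tag_`, `dep_`, and `ent_type_` can then be read off as in `toConll`. Whether the annotations are as accurate as with spaCy's own tokenization depends on how closely the supplied tokens match what the model was trained on.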

tokenize python-2.7 spacy dependency-parsing

dad*_*ada

2018 01-09

7 votes

1 answer

2146 views
