I'm using spaCy 2.0 and passing strings that contain quoted text as input.
Example string:
"The quoted text 'AA XX' should be tokenized"
I expect it to be tokenized as:
[The, quoted, text, 'AA XX', should, be, tokenized]
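However, I get some strange results when I try it. The quote is split off as a separate token, and the noun chunks and entities each lose one of the two quotes.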
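import spacy
# 'en' is the spaCy 2.0 shortcut link to the default English model
nlp = spacy.load('en')
s = "The quoted text 'AA XX' should be tokenized"
doc = nlp(s)
print([t for t in doc])              # tokens
print([t for t in doc.noun_chunks])  # noun chunks (from the dependency parse)
print([t for t in doc.ents])         # named entities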
Output:
[The, quoted, text, ', AA, XX, ', should, be, tokenized]
[The quoted text 'AA XX]
[AA XX']
What is the best way to get the tokenization I need?
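One possible approach, shown here as a minimal sketch rather than a definitive answer, is to keep spaCy's default tokenizer and then merge each quoted span back into a single token. The sketch assumes spaCy 2.1 or later, where `Doc.retokenize()` is available; on spaCy 2.0 itself, the older `Span.merge()` method (deprecated in 2.1) played this role. It uses `spacy.blank("en")` so that only the tokenizer runs and no model download is needed. The regex `QUOTED` and the helper `merge_quoted` are hypothetical names introduced for illustration. The pattern is deliberately simple: it only matches `'...'` pairs and would misfire on apostrophes such as "don't".

```python
import re

import spacy

# Blank English pipeline: tokenizer only, no trained model required.
# Assumes spaCy >= 2.1 for Doc.retokenize().
nlp = spacy.blank("en")

# Hypothetical, simplified pattern: an opening single quote, any run of
# non-quote characters, then a closing single quote. Apostrophes inside
# words (e.g. "don't") are NOT handled.
QUOTED = re.compile(r"'[^']*'")


def merge_quoted(doc):
    """Hypothetical helper: merge each single-quoted span in doc into one token, in place."""
    spans = []
    for match in QUOTED.finditer(doc.text):
        # char_span returns None if the character offsets do not line up
        # with token boundaries; such matches are skipped.
        span = doc.char_span(match.start(), match.end())
        if span is not None:
            spans.append(span)
    # All merges registered in this block are applied when it exits.
    with doc.retokenize() as retokenizer:
        for span in spans:
            retokenizer.merge(span)
    return doc


doc = merge_quoted(nlp("The quoted text 'AA XX' should be tokenized"))
print([t.text for t in doc])
# ['The', 'quoted', 'text', "'AA XX'", 'should', 'be', 'tokenized']
```

This works because the default tokenizer splits the quotes off as separate tokens at character offsets 16 and 22, so the span from the opening quote to the closing quote always falls on token boundaries and `char_span` can find it. To have noun chunks and entities see the merged token as well, the merge would need to run before the parser and entity recognizer in a full pipeline, for example as a custom component. That wiring is not shown here.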