ema*_*tru 4 python dictionary linguistics spacy
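I am trying to parse the verbs in a corpus, list them in a dictionary, and count how many times each verb appears as transitive, intransitive, and ditransitive. How can I use spaCy to parse the verbs and tag them as transitive, intransitive, or ditransitive?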
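Here I summarize the approach from Mirith/Verb-categorizer. Basically, you loop over the VERB tokens and look at their children in the dependency parse to classify each verb as transitive, intransitive, or ditransitive. An example follows.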
First, import spacy:
import spacy
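# The 'en' shortcut was removed in spaCy v3; load a named model instead
# (install it with: python -m spacy download en_core_web_sm).
nlp = spacy.load('en_core_web_sm')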
Suppose you have the following example tokens:
tokens = nlp('I like this dog. It is pretty good. I saw a bird. We arrived at the classroom door with only seven seconds to spare.')
You can then write a function like the following to relabel each VERB with a more specific type:
def check_verb(token):
    """Check verb type given spacy token"""
    if token.pos_ == 'VERB':
        indirect_object = False
        direct_object = False
        # Inspect the verb's immediate children in the dependency parse.
        for item in token.children:
            if item.dep_ == "iobj" or item.dep_ == "pobj":
                indirect_object = True
            if item.dep_ == "dobj" or item.dep_ == "dative":
                direct_object = True
        if indirect_object and direct_object:
            return 'DITRANVERB'
        elif direct_object and not indirect_object:
            return 'TRANVERB'
        elif not direct_object and not indirect_object:
            return 'INTRANVERB'
        else:
            # Indirect object without a direct object: leave the label generic.
            return 'VERB'
    else:
        # Non-verbs keep their coarse part-of-speech tag.
        return token.pos_
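One caveat worth checking against your own model: as far as I can tell, spaCy's English pipelines label the indirect object in "gave him a book" as `dative` and do not emit `iobj`, while `pobj` hangs off the preposition rather than the verb. If that holds for your model, a variant that treats `dative` as the indirect object may behave better. The sketch below is only that, a variant under this assumption; `classify_verb` and `stub` are hypothetical names, and the stub objects stand in for real spaCy tokens (only `pos_`, `dep_` and `children` are used).

```python
from types import SimpleNamespace

def classify_verb(token):
    """Variant of check_verb that counts `dative` as an indirect object (assumption)."""
    if token.pos_ != 'VERB':
        return token.pos_
    # Collect the dependency labels of the verb's immediate children.
    deps = {child.dep_ for child in token.children}
    has_direct = 'dobj' in deps
    has_indirect = bool(deps & {'iobj', 'dative'})
    if has_direct and has_indirect:
        return 'DITRANVERB'
    if has_direct:
        return 'TRANVERB'
    if has_indirect:
        return 'VERB'  # indirect object only: keep the generic label
    return 'INTRANVERB'

def stub(pos, deps=()):
    """Hypothetical stand-in for a spaCy token with the given child labels."""
    children = [SimpleNamespace(dep_=d) for d in deps]
    return SimpleNamespace(pos_=pos, dep_='', children=children)

gave = stub('VERB', ['nsubj', 'dative', 'dobj'])
saw = stub('VERB', ['nsubj', 'dobj'])
arrived = stub('VERB', ['nsubj', 'prep'])
print(classify_verb(gave), classify_verb(saw), classify_verb(arrived))
```

It is worth printing `[(c.text, c.dep_) for c in token.children]` for a few verbs from your corpus before settling on either set of labels.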
Example:
[check_verb(t) for t in tokens] # ['PRON', 'TRANVERB', 'DET', 'NOUN', 'PUNCT', ...]
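To get the per-verb counts the question asks for, the labels can be tallied in a nested dictionary keyed by lemma. This is a rough sketch rather than part of the original repo: `count_verb_types` is a hypothetical helper that takes the classifier as a parameter, `demo_classify` is a simplified stand-in so the snippet is self-contained, and the stub tokens replace a parsed document so it runs without a model.

```python
from collections import Counter, defaultdict
from types import SimpleNamespace

def count_verb_types(tokens, classify):
    """Map each verb lemma to a dict of how often `classify` gave it each label."""
    counts = defaultdict(Counter)
    for token in tokens:
        if token.pos_ == 'VERB':
            counts[token.lemma_][classify(token)] += 1
    return {lemma: dict(c) for lemma, c in counts.items()}

def demo_classify(token):
    # Simplified stand-in classifier (assumption): transitive iff a dobj child exists.
    has_dobj = any(child.dep_ == 'dobj' for child in token.children)
    return 'TRANVERB' if has_dobj else 'INTRANVERB'

def stub(lemma, pos, deps=()):
    """Hypothetical stand-in for a spaCy token."""
    children = [SimpleNamespace(dep_=d) for d in deps]
    return SimpleNamespace(lemma_=lemma, pos_=pos, children=children)

stub_tokens = [
    stub('see', 'VERB', ['nsubj', 'dobj']),
    stub('bird', 'NOUN'),
    stub('see', 'VERB', ['nsubj']),
    stub('arrive', 'VERB', ['nsubj', 'prep']),
]
verb_counts = count_verb_types(stub_tokens, demo_classify)
print(verb_counts)
# {'see': {'TRANVERB': 1, 'INTRANVERB': 1}, 'arrive': {'INTRANVERB': 1}}
```

With a real document, `count_verb_types(nlp(text), check_verb)` should produce a dictionary of the same shape, since spaCy tokens expose `lemma_`, `pos_` and `children`.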