alv*_*vas 5 python nlp nltk wordnet wsd
According to the documentation, I can load a sense-tagged corpus in NLTK like this:
>>> from nltk.corpus import wordnet_ic
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')
I can also get the definition, pos, offset, and examples of a synset like this:
>>> from nltk.corpus import wordnet as wn
>>> wn.synset('dog.n.01').examples()    # a method call in NLTK 3; an attribute in NLTK 2
>>> wn.synset('dog.n.01').definition()
But how do I get the frequency of a synset from a corpus? To break down the question:
I managed to do it like this:
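from nltk.corpus import wordnet as wn

word = "dog"
synsets = wn.synsets(word)

# Map each sense, keyed as "<offset>-<pos>", to the summed count of its lemmas.
sense2freq = {}
for s in synsets:
    freq = 0
    # In NLTK 3, lemmas(), offset() and pos() are methods, not attributes.
    for lemma in s.lemmas():
        freq += lemma.count()
    # offset() returns an int, so convert it before concatenating.
    sense2freq[str(s.offset()) + "-" + s.pos()] = freq

for s in sense2freq:
    print(s, sense2freq[s])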
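The loop above can also be wrapped in small reusable helpers. This is only a sketch, assuming NLTK 3 (where `name()`, `lemmas()` and `count()` are methods); the names `sense_frequencies` and `most_frequent_sense` are made up for illustration and are not part of NLTK. Note that, as far as I can tell, `lemma.count()` reads the sense frequency data bundled with WordNet, not the `ic-brown.dat` or `ic-semcor.dat` files loaded earlier.

```python
def sense_frequencies(synsets):
    """Map each synset's name to the summed count() of its lemmas."""
    return {
        s.name(): sum(lemma.count() for lemma in s.lemmas())
        for s in synsets
    }


def most_frequent_sense(word, pos=None):
    """Return (synset_name, count) for the most frequent sense of word.

    Hypothetical helper: returns None when WordNet has no synset for the word.
    """
    # Imported here so sense_frequencies() stays usable without the WordNet data.
    from nltk.corpus import wordnet as wn

    freqs = sense_frequencies(wn.synsets(word, pos=pos))
    if not freqs:
        return None
    # Pick the (name, count) pair with the highest summed lemma count.
    return max(freqs.items(), key=lambda item: item[1])
```

With the WordNet corpus downloaded, `most_frequent_sense("dog", pos="n")` should return the noun sense of "dog" with the highest count. Keying by `s.name()` instead of offset plus pos makes the output easier to read, at the cost of not matching the key format used above.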