Word sense disambiguation algorithm in Python

gee*_*oid 1 python nlp nltk

I am working on a simple NLP project. Given a text and a word, I want to find the most likely sense of that word in the text.

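Is there any implementation of WSD algorithms in Python? It is not clear to me whether NLTK has something that can help. I would be happy even with a naive implementation such as the Lesk algorithm.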

I have already read similar questions, such as "Word sense disambiguation in NLTK Python", but they only point to the NLTK book, which does not really deal with the WSD problem.

alv*_*vas 10

In short: https://github.com/alvations/pywsd

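In long: there are endless techniques used for WSD, ranging from mind-blasting machine learning techniques that need a lot of GPU power to simply using the information in WordNet, or even just using frequencies; see http://dl.acm.org/citation.cfm?id=1459355.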
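The frequency approach is the simplest baseline: ignore the context entirely and always pick the sense that was seen most often for the word. A minimal sketch, assuming a small hypothetical sense-tagged sample (the TAGGED list and labels like bank%financial are invented for illustration; real counts would come from a sense-annotated corpus):

```python
from collections import Counter

# Hypothetical sense-tagged data: (word, sense_label) pairs.
# The labels are made up; a real system would use counts from an annotated corpus.
TAGGED = [
    ("bank", "bank%financial"), ("bank", "bank%financial"),
    ("bank", "bank%river"),
    ("plant", "plant%factory"), ("plant", "plant%flora"),
    ("plant", "plant%flora"), ("plant", "plant%flora"),
]

def train_mfs(tagged_pairs):
    """Count how often each sense was observed for each word."""
    counts = {}
    for word, sense in tagged_pairs:
        counts.setdefault(word, Counter())[sense] += 1
    return counts

def most_frequent_sense(word, counts):
    """Return the word's most frequent sense, ignoring context; None if the word is unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

counts = train_mfs(TAGGED)
print(most_frequent_sense("bank", counts))   # bank%financial
print(most_frequent_sense("plant", counts))  # plant%flora
print(most_frequent_sense("tree", counts))   # None
```

With WordNet, the same idea is often approximated by taking the first synset returned by `wn.synsets(word)`, since WordNet lists a word's senses roughly in order of frequency.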

Let's start with a simple Lesk algorithm that allows optional stemming, see http://en.wikipedia.org/wiki/Lesk_algorithm:

# Requires the WordNet data: run nltk.download('wordnet') once beforehand.
from nltk.corpus import wordnet as wn
from nltk.stem import PorterStemmer
from itertools import chain

bank_sents = ['I went to the bank to deposit my money',
              'The river bank was full of dead fishes']

plant_sents = ['The workers at the industrial plant were overworked',
               'The plant was no longer bearing flowers']

ps = PorterStemmer()

def lesk(context_sentence, ambiguous_word, pos=None, stem=True, hyperhypo=True):
    """Simplified Lesk: return the WordNet synset of ambiguous_word whose
    signature (definition + lemma names) overlaps most with the context."""
    max_overlaps = 0
    lesk_sense = None
    context_words = context_sentence.lower().split()
    # Stem the context once, before the loop; re-stemming it on every
    # iteration would stem already-stemmed words again.
    if stem:
        context_words = [ps.stem(w) for w in context_words]

    for ss in wn.synsets(ambiguous_word):
        # If POS is specified, skip synsets with a different POS
        # (compare strings with !=, not with `is not`).
        if pos and ss.pos() != pos:
            continue

        # Includes definition.
        lesk_dictionary = ss.definition().lower().split()
        # Includes lemma names.
        lesk_dictionary += ss.lemma_names()

        # Optional: includes lemma names of hypernyms and hyponyms.
        if hyperhypo:
            lesk_dictionary += list(chain.from_iterable(
                i.lemma_names() for i in ss.hypernyms() + ss.hyponyms()))

        # Matching exact words causes sparsity, so match stems instead.
        if stem:
            lesk_dictionary = [ps.stem(i) for i in lesk_dictionary]

        overlaps = set(lesk_dictionary).intersection(context_words)

        if len(overlaps) > max_overlaps:
            lesk_sense = ss
            max_overlaps = len(overlaps)
    return lesk_sense

for context, word, pos in [(bank_sents[0], 'bank', None),
                           (bank_sents[1], 'bank', 'n'),
                           (plant_sents[0], 'plant', 'n')]:
    answer = lesk(context, word, pos)
    print("Context:", context)
    print("Sense:", answer)
    # lesk() returns None if no sense shares any word with the context.
    print("Definition:", answer.definition() if answer else None)
    print()
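The NLTK version above needs the WordNet data to be downloaded. To see the overlap idea in isolation, here is a self-contained sketch of simplified Lesk over a made-up mini sense inventory (SENSES, STOPWORDS and the sense labels are hypothetical, not WordNet). Unlike the code above, it also drops stopwords, since words like "the" and "to" appear in many glosses and can tip the overlap count without saying anything about the sense.

```python
import re

# Hypothetical mini sense inventory: sense label -> gloss.
# The glosses are invented for illustration; a real system would use WordNet.
SENSES = {
    "bank": {
        "bank%financial": "a financial institution that accepts deposits of money and lends money",
        "bank%river": "sloping land beside a body of water such as a river",
    },
    "plant": {
        "plant%factory": "an industrial building where workers manufacture goods",
        "plant%flora": "a living organism such as a tree or flower that grows in soil",
    },
}

# A tiny hypothetical stopword list; function words would otherwise inflate overlaps.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "or", "my", "i", "was",
             "were", "at", "in", "on", "that", "such", "as", "where", "no"}

def content_words(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(context, word, inventory=SENSES):
    """Return the sense whose gloss shares the most content words with the
    context; ties keep the earlier sense, and no overlap at all gives None."""
    context_set = content_words(context) - {word}
    best_sense, best_overlap = None, 0
    for sense, gloss in inventory.get(word, {}).items():
        overlap = len(content_words(gloss) & context_set)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("I went to the bank to deposit my money", "bank"))                 # bank%financial
print(simplified_lesk("The river bank was full of dead fishes", "bank"))                 # bank%river
print(simplified_lesk("The workers at the industrial plant were overworked", "plant"))   # plant%factory
print(simplified_lesk("The plant was no longer bearing flowers", "plant"))               # None
```

The last sentence returns None because "flowers" does not exactly match "flower" in the gloss, which is the kind of sparsity the stemming option in the code above is meant to reduce.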

Other than Lesk-like algorithms, there are different similarity measures that people have tried; a good, if outdated, but still useful survey: http://acl.ldc.upenn.edu/P/P97/P97-1008.pdf
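One common family of similarity measures compares two words by the distributions of context words they co-occur with. As a rough illustration (not taken from the survey), here is a sketch of L1 distance and Jensen-Shannon divergence over hypothetical co-occurrence counts; the words and counts are invented.

```python
import math
from collections import Counter

def to_distribution(counts):
    """Normalise raw co-occurrence counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def l1_distance(p, q):
    """Sum of absolute differences over all outcomes (0 = identical, 2 = disjoint)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their average
    distribution (0 = identical, 1 bit = disjoint supports)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical counts of how often each word co-occurs with a few context words.
money = Counter({"deposit": 4, "account": 3, "loan": 2, "cash": 1})
credit = Counter({"deposit": 1, "account": 2, "loan": 3, "card": 2})
river = Counter({"water": 5, "fish": 3, "boat": 2})

p_money, p_credit, p_river = (to_distribution(c) for c in (money, credit, river))
print(f"L1  money-credit: {l1_distance(p_money, p_credit):.3f}  money-river: {l1_distance(p_money, p_river):.3f}")
print(f"JSD money-credit: {js_divergence(p_money, p_credit):.3f}  money-river: {js_divergence(p_money, p_river):.3f}")
```

Lower values mean more similar context distributions, so "money" comes out closer to "credit" than to "river" under both measures. Jensen-Shannon is often preferred over plain KL divergence because it is symmetric and stays finite when one distribution gives zero probability to a context word.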