Determining Hypernym or Hyponym using wordnet nltk

ank*_*ngh 6 python sparql nltk rdflib

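I want to check the hypernym/hyponym relation between two words given by the user. Either word can be a hypernym of the other, or there may be no hypernym relation between the two at all. Can I use path_similarity for this? That is what I am currently trying. If you can suggest a better method, please do. I would also like to know whether checking the same thing with a SPARQL query would be better.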

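from nltk.corpus import wordnet as wn

# 'automobile.n.01' is looked up as the first noun sense of 'automobile',
# which is the same synset as car.n.01, so this returns 1.0
first = wn.synset('automobile.n.01')
second = wn.synset('car.n.01')
first.path_similarity(second)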
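Why path_similarity alone can't answer this: it is, roughly as NLTK documents it, 1 / (1 + length of the shortest path between the two senses in the is-a hierarchy). That number is symmetric and says nothing about direction, so a hyponym pair and a pair of siblings can get the same score. A minimal sketch over a hypothetical, heavily simplified taxonomy (the names below are illustrative, not real synset IDs):

```python
from collections import deque

# Hypothetical toy taxonomy: child -> list of direct hypernyms.
# It only loosely mirrors WordNet; the names are illustrative.
HYPERNYMS = {
    "apple": ["edible_fruit", "pome"],
    "edible_fruit": ["fruit"],
    "pome": ["fruit"],
    "fruit": ["plant_organ"],
    "plant_organ": [],
}


def up_distances(node):
    """Breadth-first walk up the hierarchy: ancestor -> fewest hypernym links from node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in HYPERNYMS.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def path_similarity(a, b):
    """1 / (1 + shortest path through a shared ancestor), or None if there is none."""
    da, db = up_distances(a), up_distances(b)
    common = da.keys() & db.keys()
    if not common:
        return None
    return 1 / (1 + min(da[c] + db[c] for c in common))


print(path_similarity("apple", "fruit"))        # hyponym pair
print(path_similarity("fruit", "apple"))        # same pair, reversed
print(path_similarity("edible_fruit", "pome"))  # siblings, no hyponym relation
```

All three calls print the same value, so the score by itself cannot tell you which word is the hypernym, or whether there is a hypernym relation at all.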

alv*_*vas 15

First, in WordNet there is a difference between a word and a synset/concept.

Here we see that one word can have several senses (i.e. it can link to several concepts):

>>> from nltk.corpus import wordnet as wn
>>> car = 'car'
>>> auto = 'automobile'
>>> wn.synsets(auto)
[Synset('car.n.01'), Synset('automobile.v.01')]
>>> wn.synsets(car)
[Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]

In this case, 'car' and 'automobile' can both refer to the same Synset('car.n.01'), and if they do, there is no hyponym/hypernym relation between them.
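Checking whether two words can denote the same concept is just a set intersection of their synsets. A minimal sketch; the synset names are copied from the wn.synsets() output above, and a plain dict stands in for the WordNet lookup so it runs without the corpus:

```python
# Synset names copied from the wn.synsets() output above; the dict is a
# stand-in for the real WordNet lookup.
SYNSETS = {
    "automobile": ["car.n.01", "automobile.v.01"],
    "car": ["car.n.01", "car.n.02", "car.n.03", "car.n.04", "cable_car.n.01"],
}


def shared_synsets(word1, word2):
    """Synsets both words can denote, i.e. the senses in which they are synonyms."""
    return set(SYNSETS[word1]) & set(SYNSETS[word2])


print(shared_synsets("automobile", "car"))  # {'car.n.01'}
```

With NLTK itself the same idea is `set(wn.synsets('automobile')) & set(wn.synsets('car'))`, since Synset objects are hashable (the set membership tests further down rely on that too).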

There is also the notion of a lemma, which complicates things further, so we'll leave it aside for now.

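Assuming you are comparing synsets rather than words, you can simply collect all the hyponyms of one synset and check whether the other synset occurs among them.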

If you are comparing plain words, see How to get all the hyponyms of a word/synset in python nltk and wordnet?
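At the word level the idea is to try every pairing of the two words' senses. A minimal sketch with hypothetical toy data (the `#n` sense names are made up); real code would take the senses from `wn.synsets(word, pos=wn.NOUN)` and the links from `Synset.hypernyms()`:

```python
from collections import deque
from itertools import product

# Hypothetical toy data: word -> candidate senses, sense -> direct hypernyms.
SENSES = {
    "fruit": ["fruit#1", "fruit#2"],
    "apple": ["apple#1", "apple#2"],
}
HYPERNYMS = {
    "apple#1": ["edible_fruit#1"],
    "edible_fruit#1": ["fruit#1"],
    "apple#2": ["tree#1"],
    "fruit#2": ["consequence#1"],
}


def ancestors(sense):
    """All transitive hypernyms of sense (breadth-first, sense itself excluded)."""
    seen = set()
    queue = deque(HYPERNYMS.get(sense, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(HYPERNYMS.get(current, []))
    return seen


def hypernym_pairs(general, specific):
    """(general_sense, specific_sense) pairs where general_sense is a hypernym of specific_sense."""
    return [(g, s) for g, s in product(SENSES[general], SENSES[specific])
            if g in ancestors(s)]


print(hypernym_pairs("fruit", "apple"))  # [('fruit#1', 'apple#1')]
print(hypernym_pairs("apple", "fruit"))  # []
```

An empty list in both directions means no sense of either word is a hypernym of a sense of the other.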

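Below I show how to compare synsets. As an example I'll use 'fruit' and 'apple', which makes more sense than 'automobile' and 'car', because 'automobile' has only one noun synset and it is the same one as car.n.01.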

>>> from nltk.corpus import wordnet as wn
>>>
>>> fruit = 'fruit'
>>> wn.synsets(fruit)
[Synset('fruit.n.01'), Synset('yield.n.03'), Synset('fruit.n.03'), Synset('fruit.v.01'), Synset('fruit.v.02')]
>>> wn.synsets(fruit)[0].definition()
u'the ripened reproductive body of a seed plant'
>>> fruit = wn.synsets(fruit)[0]
>>> 
>>> apple = 'apple'
>>> wn.synsets(apple)
[Synset('apple.n.01'), Synset('apple.n.02')]
>>> wn.synsets(apple)[0].definition()
u'fruit with red or yellow or green skin and sweet to tart crisp whitish flesh'
>>> apple = wn.synsets(apple)[0]
>>>

Below we see that apple is not a direct hyponym of fruit:

>>> fruit.hyponyms()
[Synset('accessory_fruit.n.01'), Synset('achene.n.01'), Synset('acorn.n.01'), Synset('aggregate_fruit.n.01'), Synset('berry.n.02'), Synset('buckthorn_berry.n.01'), Synset('buffalo_nut.n.01'), Synset('chokecherry.n.01'), Synset('cubeb.n.01'), Synset('drupe.n.01'), Synset('ear.n.05'), Synset('edible_fruit.n.01'), Synset('fruitlet.n.01'), Synset('gourd.n.02'), Synset('hagberry.n.01'), Synset('hip.n.05'), Synset('juniper_berry.n.01'), Synset('marasca.n.01'), Synset('may_apple.n.01'), Synset('olive.n.01'), Synset('pod.n.02'), Synset('pome.n.01'), Synset('prairie_gourd.n.01'), Synset('pyxidium.n.01'), Synset('quandong.n.02'), Synset('rowanberry.n.01'), Synset('schizocarp.n.01'), Synset('seed.n.01'), Synset('wild_cherry.n.01')]
>>> 
>>> apple in fruit.hyponyms()
False

So we have to go through all the hyponyms, transitively, and check whether apple is among them:

>>> hypofruits = set(fruit.closure(lambda s: s.hyponyms()))
>>> apple in hypofruits
True
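For reference, `Synset.closure(rel)` keeps applying `rel` and yields every synset it can reach, breadth-first, each once, without the starting synset. A from-scratch equivalent over a hypothetical toy graph (a sketch of the idea, not NLTK's actual source) shows why apple only turns up in the transitive closure:

```python
from collections import deque

# Hypothetical toy hyponym graph: node -> direct hyponyms.
HYPONYMS = {
    "fruit": ["edible_fruit", "pome", "berry"],
    "edible_fruit": ["apple", "banana"],
    "pome": ["apple", "pear"],
}


def closure(start, rel):
    """Yield every node reachable from start via rel, breadth-first, each once, start excluded."""
    seen = {start}
    queue = deque(rel(start))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # reached again via another path (apple is under two parents)
        seen.add(node)
        yield node
        queue.extend(rel(node))


def direct_hyponyms(node):
    return HYPONYMS.get(node, [])


print("apple" in direct_hyponyms("fruit"))                # False
print("apple" in set(closure("fruit", direct_hyponyms)))  # True
```

The `seen` set matters because WordNet is not a strict tree: a synset can have more than one hypernym, so the same node can be reached along several paths.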

There you have it! For completeness:

>>> hyperapple = set(apple.closure(lambda s: s.hypernyms()))
>>> fruit in hyperapple
True
>>> hypoapple = set(apple.closure(lambda s: s.hyponyms()))
>>> fruit in hypoapple
False
>>> hyperfruit = set(fruit.closure(lambda s: s.hypernyms()))
>>> apple in hyperfruit
False
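These checks can be folded into one helper that answers the original question in both directions. A minimal sketch: it only relies on a `hypernyms()` method, so it should accept NLTK Synset objects as they are; the small `Node` class is a hypothetical stand-in so the example runs without the WordNet corpus. It walks up the hierarchy from each side instead of enumerating hyponyms, which usually visits far fewer synsets, because the ancestor chain of a synset is short compared with the hyponym tree under a general concept like fruit.

```python
from collections import deque


def ancestors(synset):
    """All transitive hypernyms of synset (breadth-first over .hypernyms(), start excluded)."""
    seen = set()
    queue = deque(synset.hypernyms())
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(node.hypernyms())
    return seen


def relation(a, b):
    """Describe how a relates to b: 'same', 'hyponym', 'hypernym' or 'none'."""
    if a == b:
        return "same"
    if b in ancestors(a):
        return "hyponym"   # a is a (transitive) hyponym of b
    if a in ancestors(b):
        return "hypernym"  # a is a (transitive) hypernym of b
    return "none"


class Node:
    """Hypothetical stand-in for a Synset: a name plus its direct hypernyms."""

    def __init__(self, name, *parents):
        self.name = name
        self._parents = list(parents)

    def hypernyms(self):
        return self._parents


fruit = Node("fruit")
edible_fruit = Node("edible_fruit", fruit)
apple = Node("apple", edible_fruit)
car = Node("car")

print(relation(apple, fruit))  # hyponym
print(relation(fruit, apple))  # hypernym
print(relation(apple, car))    # none
```

With the corpus installed, `relation(wn.synset('apple.n.01'), wn.synset('fruit.n.01'))` should give `'hyponym'`, matching the closure checks above. Note that WordNet links instances (specific people, places and so on) through `instance_hypernyms()` rather than `hypernyms()`, which this sketch ignores.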