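We have models for converting words into vectors (for example, the word2vec model). Is there a similar model that converts sentences/documents into vectors, perhaps using the vectors learned for individual words?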
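How can I find a word's domain using the nltk Python module and WordNet?
Suppose I have words such as (transaction, demand draft, cheque, passbook), and the domain for all of them is "BANK". How can I get this in Python using nltk and WordNet?
I am trying to do this through the hypernym and hyponym relations:
For example: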
from nltk.corpus import wordnet as wn
sports = wn.synset('sport.n.01')
sports.hyponyms()
[Synset('judo.n.01'), Synset('athletic_game.n.01'), Synset('spectator_sport.n.01'), Synset('contact_sport.n.01'), Synset('cycling.n.01'), Synset('funambulism.n.01'), Synset('water_sport.n.01'), Synset('riding.n.01'), Synset('gymnastics.n.01'), Synset('sledding.n.01'), Synset('skating.n.01'), Synset('skiing.n.01'), Synset('outdoor_sport.n.01'), Synset('rowing.n.01'), Synset('track_and_field.n.01'), Synset('archery.n.01'), Synset('team_sport.n.01'), Synset('rock_climbing.n.01'), Synset('racing.n.01'), Synset('blood_sport.n.01')]
and
bark = wn.synset('bark.n.02')
bark.hypernyms()
[Synset('noise.n.01')]
I know that WordNet has a domain hierarchy, for example sport -> football.
1) Is it possible to list all the words related to a domain, for example the "sport -> football" subdomain?
Response: goalkeeper, forward, penalty, ball, field, stadium, referee and so on.
2) Is it possible to get the domain of a given word, for example "goalkeeper"?
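To make question 1 concrete, here is a minimal sketch of the hyponym-based approach, assuming that walking the hyponym relation a few levels down from a root synset is an acceptable stand-in for "all words in a domain". The names `related_words`, `max_depth` and `demo` are hypothetical helpers, not part of nltk; only `hyponyms()`, `lemma_names()` and `wn.synset()` are nltk calls.

```python
from collections import deque


def related_words(root, max_depth=3):
    """Collect lemma names from `root` and its hyponyms, breadth-first.

    `root` is expected to behave like an nltk Synset, i.e. to provide
    hyponyms() and lemma_names(). `max_depth` (an assumed parameter)
    limits how many hyponym levels below the root are visited.
    """
    seen = {root}
    queue = deque([(root, 0)])
    words = set()
    while queue:
        synset, depth = queue.popleft()
        # WordNet joins multi-word lemmas with underscores.
        words.update(name.replace('_', ' ') for name in synset.lemma_names())
        if depth < max_depth:
            for child in synset.hyponyms():
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return words


def demo():
    # Requires nltk and the WordNet corpus (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn
    print(sorted(related_words(wn.synset('sport.n.01'), max_depth=2)))
```

Because hyponyms are "is-a" links, this yields kinds of sport rather than associated words such as referee or stadium, so it covers only part of what question 1 asks for.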
Need something like [sport->football; sport->hockey] or [football;hockey] or just 'football'.
This is for a document classification task.
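For question 2, one possible starting point is the topic-domain pointers stored in WordNet itself, which nltk exposes as `Synset.topic_domains()`. The sketch below assumes those pointers are what "domain" means here; `word_domains` and `demo` are hypothetical names, and the `wn` argument is expected to be the nltk WordNet reader. As far as I can tell, many synsets carry no topic-domain pointer at all, so an empty result is likely for a lot of words.

```python
def word_domains(word, wn):
    """Return the sorted names of topic-domain synsets for all senses of `word`.

    `wn` is assumed to behave like nltk.corpus.wordnet: synsets(word)
    returns synsets, and each synset provides topic_domains().
    """
    domains = set()
    for synset in wn.synsets(word):
        for domain in synset.topic_domains():
            domains.add(domain.name())
    return sorted(domains)


def demo():
    # Requires nltk and the WordNet corpus (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn
    print(word_domains('goalkeeper', wn))
```

The result is a flat list of synset names per word, closer to the [football;hockey] form than to the sport->football form; building the second form would additionally need the hypernyms of each domain synset.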
nlp semantic-web cluster-analysis wordnet document-classification