aks*_*aks 13 python nltk wordnet
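I need to take an input text file containing a single word, and then use WordNet to find the lemma_names, definition and examples of each synset of that word. I have gone through the books "Python Text Processing with NLTK 2.0 Cookbook" and "Natural Language Processing using NLTK" to help me with this. Although I understand how to do this in the terminal (the interactive interpreter), I am not able to do the same from a script written in a text editor.
For example, if the input text contains the word "flabbergasted", the output needs to look like this:
flabbergasted
(verb) flabbergast, boggle, bowl over - overcome with amazement; "This boggles the mind!"
(adjective) dumbfounded, dumfounded, flabbergasted, stupefied, thunderstruck, dumbstruck, dumbstricken - as if struck dumb with astonishment and surprise; "a circle of policement stood dumbfounded by her denial of having seen the accident"; "the flabbergasted aldermen were speechless"; "was thunderstruck by the news of his promotion"
The synonyms, definitions and example sentences are obtained directly from WordNet!
I have the following code: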
from __future__ import division
import nltk
from nltk.corpus import wordnet as wn

tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = open("inpsyn.txt")
data = fp.read()

# split the input text into sentences and print them
print '\n-----\n'.join(tokenizer.tokenize(data))

# split the input text into word tokens
tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]
print words  # print the tokens

for a in words:
    print a
    syns = wn.synsets(a)
    print "synsets:", syns
    for s in syns:
        for l in s.lemmas:
            print l.name
        print s.definition
        print s.examples
I get the following output:
flabbergasted
['flabbergasted']
flabbergasted
synsets: [Synset('flabbergast.v.01'), Synset('dumbfounded.s.01')]
flabbergast
boggle
bowl_over
overcome with amazement
['This boggles the mind!']
dumbfounded
dumfounded
flabbergasted
stupefied
thunderstruck
dumbstruck
dumbstricken
as if struck dumb with astonishment and surprise
['a circle of policement stood dumbfounded by her denial of having seen the accident', 'the flabbergasted aldermen were speechless', 'was thunderstruck by the news of his promotion']
Is there a way to retrieve the part of speech along with the group of lemma names?
And*_*oev 22
def synset(word):
    wn.synsets(word)
doesn't return anything, so by default you get None.
You should write
def synset(word):
    return wn.synsets(word)
To extract lemma names:
# NLTK 2.x API, where lemmas and name are attributes.
# In NLTK 3.x they are methods: syns[0].lemmas()[0].name()
>>> from nltk.corpus import wordnet
>>> syns = wordnet.synsets('car')
>>> syns[0].lemmas[0].name
'car'
>>> [s.lemmas[0].name for s in syns]
['car', 'car', 'car', 'car', 'cable_car']
>>> [l.name for s in syns for l in s.lemmas]
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car']
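The question also asks for the part of speech next to each group of lemma names. Here is a minimal sketch, assuming NLTK 3.x, where `pos()`, `lemma_names()`, `definition()` and `examples()` are Synset methods rather than attributes. The names `POS_LABELS`, `format_entry` and `describe` are made up for this illustration and are not part of NLTK.

```python
# Readable labels for WordNet POS tags ('s' marks a satellite adjective).
POS_LABELS = {'n': 'noun', 'v': 'verb', 'a': 'adjective',
              's': 'adjective', 'r': 'adverb'}


def format_entry(pos, lemma_names, definition, examples):
    """Build one '(pos) lemmas - definition; "example"' line (hypothetical helper)."""
    label = POS_LABELS.get(pos, pos)  # fall back to the raw tag if unknown
    lemmas = ', '.join(name.replace('_', ' ') for name in lemma_names)
    parts = ['(%s) %s - %s' % (label, lemmas, definition)]
    parts.extend('"%s"' % example for example in examples)
    return '; '.join(parts)


def describe(word):
    """Return one formatted line per synset of `word` (assumes NLTK 3.x)."""
    # Imported lazily: this needs NLTK and the WordNet corpus to be installed.
    from nltk.corpus import wordnet as wn
    return [format_entry(s.pos(), s.lemma_names(), s.definition(), s.examples())
            for s in wn.synsets(word)]


# Using the values from the question's own output:
line = format_entry('v', ['flabbergast', 'boggle', 'bowl_over'],
                    'overcome with amazement', ['This boggles the mind!'])
print(line)
# (verb) flabbergast, boggle, bowl over - overcome with amazement; "This boggles the mind!"
```

With the WordNet corpus installed (for example via `nltk.download('wordnet')`), `describe('flabbergasted')` should return one such line per synset. The formatting is kept in a separate function so it can be checked without the corpus.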
Here I have made a module that can easily be imported and used: pass it a string and it returns all the lemma names for that string.
Module:
#!/usr/bin/python2.7
'''Pass a string to lemmalist (e.g. 'car') and it returns a list of the
lemma names of every synset of that word.'''
from nltk.corpus import wordnet as wn

# Collect the lemma names of all synsets of a word.
def lemmalist(word):
    syn_set = []
    for synset in wn.synsets(word):
        # NLTK 2.x attribute; in NLTK 3.x call synset.lemma_names()
        for item in synset.lemma_names:
            syn_set.append(item)
    return syn_set
Usage:
Note: the module file is named lemma.py, hence "from lemma import lemmalist".
>>> from lemma import lemmalist
>>> lemmalist('car')
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car']
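The list above repeats 'car' once per synset. If only distinct lemma names are wanted, one option is to drop duplicates while keeping the first-seen order. A minimal sketch, where `unique_in_order` is a hypothetical helper and not part of NLTK or of the module above:

```python
def unique_in_order(names):
    """Return `names` without duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


# The list returned by lemmalist('car') in the usage example above.
lemmas = ['car', 'auto', 'automobile', 'machine', 'motorcar', 'car',
          'railcar', 'railway_car', 'railroad_car', 'car', 'gondola',
          'car', 'elevator_car', 'cable_car', 'car']
print(unique_in_order(lemmas))
# ['car', 'auto', 'automobile', 'machine', 'motorcar', 'railcar', 'railway_car', 'railroad_car', 'gondola', 'elevator_car', 'cable_car']
```

A plain `set(lemmas)` would also remove the duplicates, but it loses the synset order.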
Cheers!