sam*_*lis 25 python nlp nltk wordnet
我想要一个python库函数,它可以翻译/转换不同的语音部分.有时它应该输出多个单词(例如"编码器"和"代码"都是动词"代码"中的名词,一个是主题,另一个是对象)
# :: String => List of String
print verbify('writer') # => ['write']
print nounize('written') # => ['writer']
print adjectivate('write') # => ['written']
Run Code Online (Sandbox Code Playgroud)
我主要关心动词<=>名词,我想写的笔记程序.即我可以写"咖啡因拮抗A1"或"咖啡因是A1拮抗剂"和一些NLP,它可以发现它们的意思相同.(我知道这并不容易,并且它将需要NLP解析并且不仅仅是标记,但我想破解原型).
类似的问题... 将形容词和副词转换为名词形式 (这个答案仅限于根POS.我想在POS之间进行.)
ps称为语言学转换http://en.wikipedia.org/wiki/Conversion_%28linguistics%29
bog*_*ogs 15
这更像是一种启发式方法.我刚刚编写了这样的风格的应用程序.它使用来自wordnet的derivationally_related_forms().我已经实现了nounify.我猜verbify的作用类似.从我测试过的工作做得很好:
from nltk.corpus import wordnet as wn
def nounify(verb_word):
""" Transform a verb to the closest noun: die -> death """
verb_synsets = wn.synsets(verb_word, pos="v")
# Word not found
if not verb_synsets:
return []
# Get all verb lemmas of the word
verb_lemmas = [l for s in verb_synsets \
for l in s.lemmas if s.name.split('.')[1] == 'v']
# Get related forms
derivationally_related_forms = [(l, l.derivationally_related_forms()) \
for l in verb_lemmas]
# filter only the nouns
related_noun_lemmas = [l for drf in derivationally_related_forms \
for l in drf[1] if l.synset.name.split('.')[1] == 'n']
# Extract the words from the lemmas
words = [l.name for l in related_noun_lemmas]
len_words = len(words)
# Build the result in the form of a list containing tuples (word, probability)
result = [(w, float(words.count(w))/len_words) for w in set(words)]
result.sort(key=lambda w: -w[1])
# return all the possibilities sorted by probability
return result
Run Code Online (Sandbox Code Playgroud)
stu*_*art 11
这里是一个函数,在理论上能够名词/动词/形容词/副词形式,我是从更新的言语间转换这里(由最初写沼泽,我相信)是符合NLTK 3.2.5现在synset.lemmas和sysnset.name有功能。
from nltk.corpus import wordnet as wn
# Just to make it a bit more readable
WN_NOUN = 'n'
WN_VERB = 'v'
WN_ADJECTIVE = 'a'
WN_ADJECTIVE_SATELLITE = 's'
WN_ADVERB = 'r'
def convert(word, from_pos, to_pos):
""" Transform words given from/to POS tags """
synsets = wn.synsets(word, pos=from_pos)
# Word not found
if not synsets:
return []
# Get all lemmas of the word (consider 'a'and 's' equivalent)
lemmas = []
for s in synsets:
for l in s.lemmas():
if s.name().split('.')[1] == from_pos or from_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and s.name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
lemmas += [l]
# Get related forms
derivationally_related_forms = [(l, l.derivationally_related_forms()) for l in lemmas]
# filter only the desired pos (consider 'a' and 's' equivalent)
related_noun_lemmas = []
for drf in derivationally_related_forms:
for l in drf[1]:
if l.synset().name().split('.')[1] == to_pos or to_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and l.synset().name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
related_noun_lemmas += [l]
# Extract the words from the lemmas
words = [l.name() for l in related_noun_lemmas]
len_words = len(words)
# Build the result in the form of a list containing tuples (word, probability)
result = [(w, float(words.count(w)) / len_words) for w in set(words)]
result.sort(key=lambda w:-w[1])
# return all the possibilities sorted by probability
return result
convert('direct', 'a', 'r')
convert('direct', 'a', 'n')
convert('quick', 'a', 'r')
convert('quickly', 'r', 'a')
convert('hunger', 'n', 'v')
convert('run', 'v', 'a')
convert('tired', 'a', 'r')
convert('tired', 'a', 'v')
convert('tired', 'a', 'n')
convert('tired', 'a', 's')
convert('wonder', 'v', 'n')
convert('wonder', 'n', 'a')
Run Code Online (Sandbox Code Playgroud)
正如您在下面看到的,它的效果不是很好。它无法在形容词和副词形式之间切换(我的特定目标),但它在其他情况下确实给出了一些有趣的结果。
>>> convert('direct', 'a', 'r')
[]
>>> convert('direct', 'a', 'n')
[('directness', 0.6666666666666666), ('line', 0.3333333333333333)]
>>> convert('quick', 'a', 'r')
[]
>>> convert('quickly', 'r', 'a')
[]
>>> convert('hunger', 'n', 'v')
[('hunger', 0.75), ('thirst', 0.25)]
>>> convert('run', 'v', 'a')
[('persistent', 0.16666666666666666), ('executive', 0.16666666666666666), ('operative', 0.16666666666666666), ('prevalent', 0.16666666666666666), ('meltable', 0.16666666666666666), ('operant', 0.16666666666666666)]
>>> convert('tired', 'a', 'r')
[]
>>> convert('tired', 'a', 'v')
[]
>>> convert('tired', 'a', 'n')
[('triteness', 0.25), ('banality', 0.25), ('tiredness', 0.25), ('commonplace', 0.25)]
>>> convert('tired', 'a', 's')
[]
>>> convert('wonder', 'v', 'n')
[('wonder', 0.3333333333333333), ('wonderer', 0.2222222222222222), ('marveller', 0.1111111111111111), ('marvel', 0.1111111111111111), ('wonderment', 0.1111111111111111), ('question', 0.1111111111111111)]
>>> convert('wonder', 'n', 'a')
[('curious', 0.4), ('wondrous', 0.2), ('marvelous', 0.2), ('marvellous', 0.2)]
Run Code Online (Sandbox Code Playgroud)
希望这可以为某人省点麻烦