Gun*_*her · 7 · python, nltk, sentiment-analysis
I want to use NLTK to get the similarity between a single word and each word in a sentence.
NLTK can compute the similarity between two specific words, as shown below. This approach requires a specific reference to each word; in this case it is 'dog.n.01', where dog is a noun and we want the first (01) WordNet sense.
from nltk.corpus import wordnet

dog = wordnet.synset('dog.n.01')
cat = wordnet.synset('cat.n.01')
print(dog.path_similarity(cat))
>> 0.2
The problem is that I need the part-of-speech information for each word in the sentence. The NLTK package can tag the part of speech of every word in a sentence, as shown below. However, these part-of-speech tags ('NN', 'VB', 'PRP', ...) do not match the format that synset() takes as an argument.
from nltk import pos_tag, word_tokenize

text = word_tokenize("They refuse to permit us to obtain the refuse permit")
pos_tag(text)
>> [('They', 'PRP'), ('refuse', 'VBP'), ('to', 'TO'), ('permit', 'VB'), ('us', 'PRP'), ('to', 'TO'), ('obtain', 'VB'), ('the', 'DT'), ('refuse', 'NN'), ('permit', 'NN')]
Is it possible to get synset-formatted data from the pos_tag() results in NLTK? By synset format I mean a format like dog.n.01.
bog*_*ogs · 10
You can use a simple conversion function:
from nltk.corpus import wordnet as wn

def penn_to_wn(tag):
    """Map a Penn Treebank POS tag to the corresponding WordNet POS constant."""
    if tag.startswith('J'):
        return wn.ADJ
    elif tag.startswith('N'):
        return wn.NOUN
    elif tag.startswith('R'):
        return wn.ADV
    elif tag.startswith('V'):
        return wn.VERB
    return None
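As a quick check of the mapping (NLTK's WordNet POS constants are single letters, e.g. wn.NOUN is 'n' and wn.VERB is 'v'; tags with no WordNet counterpart come back as None):

penn_to_wn('VBP')   # -> 'v'  (wn.VERB)
penn_to_wn('NN')    # -> 'n'  (wn.NOUN)
penn_to_wn('DT')    # -> None (determiners have no WordNet POS)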
After tagging the sentence, you can use this function to bind each word in the sentence to a synset. Here is an example:
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag, word_tokenize

sentence = "I am going to buy some gifts"
tagged = pos_tag(word_tokenize(sentence))

synsets = []
lemmatzr = WordNetLemmatizer()

for token in tagged:
    wn_tag = penn_to_wn(token[1])
    if not wn_tag:              # skip tags with no WordNet counterpart (DT, TO, ...)
        continue

    lemma = lemmatzr.lemmatize(token[0], pos=wn_tag)
    candidates = wn.synsets(lemma, pos=wn_tag)
    if candidates:              # guard against words WordNet does not know
        synsets.append(candidates[0])   # take the first (most common) sense

print(synsets)
Result: [Synset('be.v.01'), Synset('travel.v.01'), Synset('buy.v.01'), Synset('gift.n.01')]
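From here, getting back to the original goal (similarity between a single word and every word of the sentence) is just a loop over path_similarity. A minimal sketch, reusing the synsets list built above and the 'dog.n.01' synset from the question as the target word; note that path_similarity may return None when no path connects the two synsets (for example across parts of speech):

from nltk.corpus import wordnet as wn

target = wn.synset('dog.n.01')         # the single word to compare against

for syn in synsets:                    # `synsets` is the list built above
    sim = target.path_similarity(syn)  # None if no path connects the two
    print(syn.name(), sim)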