Mig*_*hty · 5 · Tags: python, nlp, nltk, python-3.x, sentiment-analysis
I am using VADER in NLTK to find the sentiment of each line in a file. I have two questions:
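1. I want to add words to vader_lexicon.txt, but its entries look like this: `attack -2.5 0.92195 [-1, -3, -3, -3, -4, -3, -1, -2, -2, -3]`. What do -2.5 and 0.92195 [-1, -3, -3, -3, -4, -3, -1, -2, -2, -3] represent? How should I encode a new word? Say I have to add something like '100%' or 'A1'.
2. I can also see .txt files in nltk_data\corpora\opinion_lexicon. How are these used? Can I add my words to these .txt files too?

I believe VADER uses only the word and the first value when classifying text. If you want to add new words, you can simply create a dictionary of words and their sentiment values and add it with the update method: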
```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer

Analyzer = SentimentIntensityAnalyzer()
your_dictionary = {"a1": 1.5}  # hypothetical example: word -> sentiment value
Analyzer.lexicon.update(your_dictionary)
```
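On the first question: in the VADER lexicon format, each line holds the token, the mean of the raw human ratings, the standard deviation of those ratings, and the list of raw ratings (ten raters, on a scale from -4 to +4). The numbers in the question check out: the ten ratings sum to -25, so the mean is -2.5, and their population standard deviation (dividing by N) is sqrt(0.85) ≈ 0.92195. As far as I can tell from NLTK's source, the loader keeps only the first two tab-separated fields, which fits the belief above that only the word and the first value matter. The sketch below is a small, hypothetical helper (not part of NLTK) that parses such a line and formats a new one with all four fields, assuming tab separators and lowercase tokens (NLTK appears to lowercase tokens before lexicon lookup); the ratings for 'a1' are made up.

```python
import ast
import math


def parse_entry(line):
    """Split a lexicon line into (token, mean, std, raw_ratings).

    Assumes the token contains no whitespace; works with tabs or spaces.
    """
    token, mean, std, raw = line.strip().split(None, 3)
    return token, float(mean), float(std), ast.literal_eval(raw)


def population_std(ratings):
    """Standard deviation that divides by N (not N - 1)."""
    m = sum(ratings) / len(ratings)
    return math.sqrt(sum((r - m) ** 2 for r in ratings) / len(ratings))


def make_entry(token, ratings):
    """Format a new entry as token<TAB>mean<TAB>std<TAB>[ratings]."""
    m = sum(ratings) / len(ratings)
    return f"{token}\t{round(m, 4)}\t{round(population_std(ratings), 5)}\t{list(ratings)}"


# The entry quoted in the question: mean and std are recomputable from the raw list
token, mean, std, raw = parse_entry("attack -2.5 0.92195 [-1, -3, -3, -3, -4, -3, -1, -2, -2, -3]")
print(sum(raw) / len(raw), round(population_std(raw), 5))  # -2.5 0.92195

# Hypothetical ratings for a new, lowercased token
print(make_entry("a1", [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]))
```

If you only add words in memory with lexicon.update, the mean is the value to put in the dictionary; the other fields just keep a hand-edited file consistent with the rest of the lexicon.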
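You can assign sentiment values to words manually, based on the perceived intensity of their sentiment, or, if that isn't practical, you can assign one broad value per category (e.g. -1.5 for negative words and 1.5 for positive ones).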
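The broad-value approach can be sketched as a small helper that maps two word lists to a single value each. The function name and defaults are hypothetical; keys are lowercased on the assumption that VADER lowercases tokens before lookup. The commented lines show one possible way to feed it NLTK's opinion_lexicon word lists (from the second question), assuming the 'opinion_lexicon' and 'vader_lexicon' data are downloaded.

```python
def build_broad_lexicon(positive_words, negative_words, pos_value=1.5, neg_value=-1.5):
    """Give every word in each category list the same broad sentiment value.

    Keys are lowercased on the assumption that VADER lowercases tokens
    before looking them up.
    """
    lexicon = {}
    for word in positive_words:
        lexicon[word.lower()] = pos_value
    for word in negative_words:
        # A word listed in both categories keeps the negative value
        lexicon[word.lower()] = neg_value
    return lexicon


print(build_broad_lexicon(["bullish"], ["volatile", "calamities"]))
# {'bullish': 1.5, 'volatile': -1.5, 'calamities': -1.5}

# Possible use with NLTK (requires the downloaded data mentioned above):
# from nltk.corpus import opinion_lexicon
# from nltk.sentiment.vader import SentimentIntensityAnalyzer
# Analyzer = SentimentIntensityAnalyzer()
# new = build_broad_lexicon(opinion_lexicon.positive(), opinion_lexicon.negative())
# # dict.update overwrites existing VADER scores; filter first to keep them
# Analyzer.lexicon.update({w: v for w, v in new.items() if w not in Analyzer.lexicon})
```

Filtering before the update is a design choice: VADER's own values are graded by intensity, so replacing them with a flat ±1.5 would usually lose information.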
You can use this script (not mine) to check whether your updates have been included:
```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# One-time setup may be needed: nltk.download('vader_lexicon')
# plus the tokenizer models that word_tokenize relies on
Analyzer = SentimentIntensityAnalyzer()

sentence = 'enter your text to test'
tokenized_sentence = nltk.word_tokenize(sentence)

pos_word_list = []
neu_word_list = []
neg_word_list = []

# Score each token on its own and bucket it by its compound score
for word in tokenized_sentence:
    compound = Analyzer.polarity_scores(word)['compound']
    if compound >= 0.1:
        pos_word_list.append(word)
    elif compound <= -0.1:
        neg_word_list.append(word)
    else:
        neu_word_list.append(word)

print('Positive:', pos_word_list)
print('Neutral:', neu_word_list)
print('Negative:', neg_word_list)

# Score the whole sentence
score = Analyzer.polarity_scores(sentence)
print('\nScores:', score)
```
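Why do ±0.1 cut-offs work for single words? As far as I can tell from NLTK's source, for a single lowercase token with no punctuation, polarity_scores(word)['compound'] is the token's lexicon value v passed through normalize(v) = v / sqrt(v² + α) with α = 15, rounded to four decimals. The sketch below re-implements only that formula (it is not the library code) to show that a word set to the broad ±1.5 lands well past the thresholds, and that any |v| above roughly 0.39 is enough.

```python
import math


def normalize(score, alpha=15):
    """VADER-style squashing of a summed valence into [-1, 1]."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def min_valence_for(threshold, alpha=15):
    """Smallest |valence| whose normalized score reaches the threshold.

    Solves v / sqrt(v^2 + alpha) = t for v, i.e. v = sqrt(alpha * t^2 / (1 - t^2)).
    """
    return math.sqrt(alpha * threshold ** 2 / (1 - threshold ** 2))


print(round(normalize(1.5), 4))        # 0.3612
print(round(normalize(-1.5), 4))       # -0.3612
print(round(min_valence_for(0.1), 4))  # 0.3892
```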
Before updating VADER:

```python
sentence = 'stocks were volatile on Tuesday due to the recent calamities in the Chinese market'
```

Output:

```
Positive: []
Neutral: ['stocks', 'were', 'volatile', 'on', 'Tuesday', 'due', 'to', 'the', 'recent', 'calamities', 'in', 'the', 'Chinese', 'markets']
Negative: []

Scores: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
```
After updating VADER with a finance-based lexicon:

```python
# Financial_Lexicon: the answerer's own dict of finance terms -> values (not shown)
Analyzer.lexicon.update(Financial_Lexicon)
sentence = 'stocks were volatile on Tuesday due to the recent calamities in the Chinese market'
```

Output:

```
Positive: []
Neutral: ['stocks', 'were', 'on', 'Tuesday', 'due', 'to', 'the', 'recent', 'in', 'the', 'Chinese', 'markets']
Negative: ['volatile', 'calamities']

Scores: {'neg': 0.294, 'neu': 0.706, 'pos': 0.0, 'compound': -0.6124}
```
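The sentence-level scores above can be reproduced by hand. As far as I can tell from NLTK's source, VADER normalizes the sum of the per-token valences for compound, and for neg/neu/pos it shifts each non-zero valence one unit away from zero before taking proportions (neutral tokens count 1 each). The answer does not show Financial_Lexicon, so the -1.5 values below are hypothetical (the broad value suggested earlier); any two negative values summing to -3.0 give the same output. This is a sketch of the final aggregation step only, and it omits the '!'/'?' emphasis boost, which is zero for this sentence.

```python
import math


def normalize(score, alpha=15):
    """VADER-style squashing of a summed valence into [-1, 1]."""
    return max(-1.0, min(1.0, score / math.sqrt(score * score + alpha)))


def aggregate(sentiments):
    """Approximate VADER's last step: per-token valences -> polarity dict."""
    if not sentiments:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}
    pos_sum = sum(s + 1 for s in sentiments if s > 0)
    neg_sum = sum(s - 1 for s in sentiments if s < 0)
    neu_count = sum(1 for s in sentiments if s == 0)
    total = pos_sum + abs(neg_sum) + neu_count
    return {
        "neg": round(abs(neg_sum / total), 3),
        "neu": round(abs(neu_count / total), 3),
        "pos": round(abs(pos_sum / total), 3),
        "compound": round(normalize(sum(sentiments)), 4),
    }


# 14 whitespace tokens in the example sentence; hypothetical values:
# 'volatile' (3rd token) and 'calamities' (10th token) at -1.5, the rest 0
valences = [0, 0, -1.5, 0, 0, 0, 0, 0, 0, -1.5, 0, 0, 0, 0]
print(aggregate(valences))
# {'neg': 0.294, 'neu': 0.706, 'pos': 0.0, 'compound': -0.6124}
```

Here neg = 5/17 and neu = 12/17, because the two shifted negatives contribute |(-1.5 - 1) + (-1.5 - 1)| = 5 against 12 neutral tokens, and compound = -3 / sqrt(24).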