Tag: sentiment-analysis

from nltk.corpus import twitter_samples

documents = [(list(twitter_samples.strings(fileid)), category)
             for category in twitter_samples.categories()
             for fileid in twitter_samples.fileids(category)]


But it gives me this error:

Traceback (most recent call last):
  File "C:/Users/neptun/PycharmProjects/Thesis/First_sentimental.py", line 6, in <module>
    for category in twitter_samples.categories()
  File "C:\Users\neptun\AppData\Local\Programs\Python\Python36-32\lib\site-packages\nltk\corpus\util.py", line 119, in __getattr__
    return getattr(self, attr)
AttributeError: 'TwitterCorpusReader' object has no attribute 'categories'


I don't know which available attributes to use instead so that my list contains both positive and negative sentiment.
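
The traceback says the reader has no `categories()` method, so the label has to come from somewhere else. A minimal sketch, assuming the two labeled files of the NLTK `twitter_samples` corpus (`positive_tweets.json` and `negative_tweets.json`) and hypothetical label names `pos` and `neg`, maps each file id to a label explicitly. The helper names `build_documents` and `load_twitter_documents` are illustrative, not part of NLTK.

```python
# Map each labeled file of the corpus to a sentiment label (label names are an assumption).
LABEL_BY_FILEID = {
    "positive_tweets.json": "pos",
    "negative_tweets.json": "neg",
}


def build_documents(corpus, label_by_fileid=LABEL_BY_FILEID):
    """Return one (tweet_text, label) pair per tweet in the labeled files."""
    return [
        (text, label)
        for fileid, label in label_by_fileid.items()
        for text in corpus.strings(fileid)
    ]


def load_twitter_documents():
    """Build the documents from the real corpus (requires nltk and the downloaded data)."""
    from nltk.corpus import twitter_samples  # imported lazily on purpose
    return build_documents(twitter_samples)
```

Calling `load_twitter_documents()` yields one pair per tweet rather than one pair per file, which is usually the shape a classifier needs.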

python twitter nltk sentiment-analysis

Cav*_*ier

2017 05-12

2 votes

1 answer

1855 views

Sentiment analysis on an imbalanced dataset with LightGBM

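I am trying to perform sentiment analysis on a dataset with 2 classes (binary classification). The dataset is heavily imbalanced, roughly 70% to 30%. I am using LightGBM and Python 3.6 to build the model and predict the output.

I think the imbalance in the dataset is affecting my model's performance. I get about 90% accuracy, but it does not increase any further even though I have fine-tuned the parameters. I don't think this is the maximum achievable accuracy, because others have scored better than this.

I cleaned the dataset with Textacy and nltk. I use CountVectorizer to encode the text.

I have tried up-sampling the dataset, but it resulted in a poor model (I have not tuned that model).

I have tried using the is_unbalance parameter of LightGBM, but it does not give me a better model.

Is there any approach for handling a dataset this imbalanced? How can I improve my model further? Should I try down-sampling? Or is this the maximum possible accuracy? How can I be sure of it?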
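
One way to judge whether 90% is good is to compare it with the majority-class baseline and to look at metrics that focus on the minority class. The following is a minimal sketch in plain Python, assuming labels are encoded as 0 and 1 with 1 as the minority class; `binary_report` is a hypothetical helper name, and the toy labels are made up for illustration.

```python
def binary_report(y_true, y_pred, positive=1):
    """Accuracy, majority-class baseline, and precision/recall/F1 for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    n = len(pairs)
    positives = tp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / n,
        # Accuracy of always predicting the larger class.
        "majority_baseline": max(positives, n - positives) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data with a 70/30 split: predicting the majority class everywhere
# already reaches 70% accuracy while finding none of the minority class.
y_true = [0] * 7 + [1] * 3
print(binary_report(y_true, [0] * 10))
```

On a 70/30 split, an accuracy figure only becomes meaningful relative to that 70% baseline, so tracking recall and F1 for the minority class while tuning is likely more informative. LightGBM also documents a `scale_pos_weight` parameter as an alternative to `is_unbalance`.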

nlp machine-learning python-3.x sentiment-analysis lightgbm

Sre*_* TP

2017 11-19

2 votes

1 answer

1408 views

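Reading a large pre-trained fastText word embedding file in Python

I am doing sentiment analysis and want to use pre-trained fastText embeddings, but the file is very large (6.7 GB) and the program takes a very long time to run.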

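import os
import numpy as np
from tqdm import tqdm

fasttext_dir = '/Fasttext'
embeddings_index = {}
# Stream the file line by line; the first line of a .vec file is a "count dim" header.
with open(os.path.join(fasttext_dir, 'wiki.en.vec'), 'r', encoding='utf-8') as f:
    next(f)  # skip the header line
    for line in tqdm(f):
        values = line.rstrip().rsplit(' ')
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs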

print('found %s word vectors' % len(embeddings_index))

embedding_dim = 300

embedding_matrix = np.zeros((max_words, embedding_dim))
for word, i in word_index.items():
    if i < max_words:
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector

Is there any way to speed up this process?
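
Most of the time in the loop above goes into converting 300 floats for every word in the file, even though only the words in `word_index` end up in the matrix. A minimal sketch, assuming the standard `.vec` text layout (a `count dim` header followed by one `word v1 v2 ...` line per word), parses the numbers only for words that are actually needed. `load_needed_vectors` is a hypothetical helper name.

```python
def load_needed_vectors(lines, vocabulary):
    """Return {word: [float, ...]} for the words in `vocabulary` only."""
    vocabulary = set(vocabulary)  # iterating a dict such as word_index yields its keys
    vectors = {}
    for line_number, line in enumerate(lines):
        # Split off the word first so the floats are parsed only when needed.
        word, _, rest = line.rstrip().partition(" ")
        if line_number == 0 and len(rest.split()) == 1:
            continue  # "count dim" header line
        if word in vocabulary:
            vectors[word] = [float(x) for x in rest.split(" ")]
            if len(vectors) == len(vocabulary):
                break  # every requested word was found, stop reading
    return vectors


# Usage with the names from the question (path and word_index come from there):
# with open(os.path.join(fasttext_dir, 'wiki.en.vec'), encoding='utf-8') as f:
#     embeddings_index = load_needed_vectors(f, word_index)
```

The returned lists can be assigned to rows of `embedding_matrix` exactly as before, since NumPy converts them on assignment.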

python sentiment-analysis keras fasttext

Blu*_*ngo


2 votes

1 answer

3904 views

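Cannot import process_tweet from utils

Thanks for looking into this. I have a Python program that needs process_tweet and build_freqs for some NLP tasks. nltk is already installed, but utils was not, so I installed it with pip install utils. However, the two functions mentioned above apparently did not come with it. The error I get is the standard one: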

ImportError: cannot import name 'process_tweet' from
'utils' (C:\Python\lib\site-packages\utils\__init__.py)

What am I doing wrong, or is something missing? I also looked at this Stack Overflow answer, but it did not help.
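
The path in the error message shows that Python resolved `utils` to the package installed in `site-packages`, which evidently does not define `process_tweet`. A small diagnostic sketch, assuming the two functions are meant to come from a `utils.py` file shipped with the tutorial material (an assumption, since the question does not say where they are defined), prints which file an import name resolves to. `module_origin` is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module name resolves to, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# If this prints a path under site-packages, the pip-installed package is
# shadowing any local utils.py that was meant to provide the functions.
print(module_origin("utils"))
```

If that assumption holds, placing the tutorial's `utils.py` next to the script and uninstalling the unrelated pip package should let the import find the intended file.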

python nlp nltk sentiment-analysis

Paw*_*pal

2020 12-19

2 votes

1 answer

5588 views

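Extracting neutral sentiment from a Hugging Face model

I am using the Hugging Face pipeline for a sentiment analysis task, which gives me a positive/negative sentiment along with a confidence score. In my case I need three outputs (positive/neutral/negative). The problem is that even for neutral sentences (for example: "He has she has"), Hugging Face gives me a high confidence score. Any suggestions?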

from transformers import pipeline
model = pipeline(task = 'sentiment-analysis')
sentence = 'some text to evaluate'
predicted = model(sentence)
print(predicted)

Here are some sample outputs:

----------------------------------------------
sentence = 'I love you'
predicted = model(sentence)
predicted
[{'label': 'POSITIVE', 'score': 0.9998656511306763}]
----------------------------------------------
sentence = 'I hate you'
predicted = model(sentence)
predicted
[{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
----------------------------------------------
sentence = 'I have she had'
predicted = model(sentence)
predicted
[{'label': 'POSITIVE', 'score': 0.9821817874908447}]
----------------------------------------------
sentence = 'I go to work'
predicted = model(sentence)
predicted
[{'label': 'POSITIVE', 'score': …
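
The outputs above only ever carry a POSITIVE or NEGATIVE label, which suggests the checkpoint behind the pipeline was trained on two classes. One common workaround is to treat low-confidence predictions as neutral. The sketch below assumes a hypothetical helper `to_three_way` and an arbitrary threshold of 0.99, and uses the scores copied from the outputs in the question.

```python
def to_three_way(prediction, threshold=0.99):
    """Map one pipeline result dict to POSITIVE, NEGATIVE or NEUTRAL."""
    if prediction["score"] < threshold:
        return "NEUTRAL"  # not confident enough in either polarity
    return prediction["label"]


# Scores copied from the outputs shown in the question.
examples = {
    "I love you": {"label": "POSITIVE", "score": 0.9998656511306763},
    "I hate you": {"label": "NEGATIVE", "score": 0.9991129040718079},
    "I have she had": {"label": "POSITIVE", "score": 0.9821817874908447},
}
for sentence, prediction in examples.items():
    print(sentence, "->", to_three_way(prediction))
```

The threshold is fragile: the neutral example still scores above 0.98, so a cut-off that separates it here may not generalize. A checkpoint trained with an explicit neutral class, passed through the `model` argument of `pipeline`, is likely the more robust option.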


python nlp sentiment-analysis huggingface-transformers

You*_*cef


2 votes

1 answer

1555 views

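Unable to download Niek Sanders' Twitter sentiment corpus

I am following a tutorial on Twitter sentiment analysis. I downloaded the code from http://www.sananalytics.com/lab/twitter-sentiment/. I followed the steps and ran install.py from the cmd prompt. It does create json files in the 'rawdata' folder, but when I open those json files, they say: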

{
    "errors": [
        {
            "message": "SSL is required",
            "code": 92
        }
    ]
}

The install.py code is as follows:

#
# Sanders-Twitter Sentiment Corpus Install Script
# Version 0.1
#
# Pulls tweet data from Twitter because ToS prevents distributing it directly.
#
# Right now we use unauthenticated requests, which are rate-limited to 150/hr.
# We use 125/hr to stay safe.  
#
# We could more than double the download speed by using authentication with
# OAuth logins.  But for now, …
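
The "SSL is required" message suggests the script requests the API over plain HTTP. A minimal sketch, assuming the script builds its request URL as a string (the URL in the example is hypothetical, since the relevant part of install.py is not shown), upgrades the scheme before the request is made. `force_https` is an illustrative helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return the same URL with an http scheme replaced by https."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


# Hypothetical request URL, for illustration only.
print(force_https("http://api.example.com/statuses/show.json?id=123"))
```

Switching the scheme may not be sufficient on its own if the endpoint also rejects unauthenticated requests, which the script's own comments say it relies on.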


twitter python-2.7 sentiment-analysis

Kub*_*888


1 vote

1 answer

2879 views

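TextBlob sentiment analysis on a CSV file

I have a csv file with about 50 rows of sentences. I am using the textblob sentiment analysis tool. To test the polarity of a sentence, the example shows that you write a sentence and the polarity and subjectivity are displayed. However, it only works on a single sentence, and I want it to work on my csv file, because testing each row individually would take too long. How would I go about doing this?

TextBlob shows the example below: when I type in a sentence, the polarity is displayed, but it does not let you enter two sentences at once. How can I feed my csv file into the example below so that I get the polarity for all rows?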

>>> testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")
>>> testimonial.sentiment
Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)
>>> testimonial.sentiment.polarity
0.39166666666666666

Edit: chishaku's solution worked for me. Solution:

import csv
from textblob import TextBlob

infile = 'xxx.csv'

with open(infile, 'r') as csvfile:
    rows = csv.reader(csvfile)
    for row in rows:
        sentence = row[0]
        blob = TextBlob(sentence)
        print(blob.sentiment)


python sentiment-analysis textblob

u.1*_*234

2016 02-23

1 vote

1 answer

10k views

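Adding multiple hidden layers in Keras

I have a simple sentiment analyzer built with keras. Here is my code, which is based on the keras example on GitHub: https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py

The initial working model is: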

from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Embedding, Activation
from keras.layers import GRU, LeakyReLU
from keras.datasets import imdb

max_features = 2000
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 256
hidden_layer_size = 32
dropout = 0.2
num_epochs = 1
activation_func = LeakyReLU(alpha=0.5)

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test …
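
The rest of the model is cut off above, so the following is only a sketch under assumptions: that the hidden layers are GRU layers (as the imports suggest) stacked on an Embedding layer, with a single sigmoid output. When recurrent layers are stacked in Keras, every recurrent layer except the last needs `return_sequences=True` so that the next layer receives a full sequence. `gru_layer_kwargs` and `build_stacked_gru` are hypothetical helper names.

```python
def gru_layer_kwargs(num_layers, units, dropout):
    """Keyword arguments for each stacked GRU layer, first to last."""
    return [
        {
            "units": units,
            "dropout": dropout,
            # Only the last recurrent layer collapses the sequence to one vector.
            "return_sequences": index < num_layers - 1,
        }
        for index in range(num_layers)
    ]


def build_stacked_gru(max_features, num_layers=2, units=32, dropout=0.2):
    """Build a binary classifier with `num_layers` stacked GRU layers."""
    # Imported lazily so the helper above can be used without Keras installed.
    from keras.models import Sequential
    from keras.layers import Dense, Embedding, GRU

    model = Sequential()
    model.add(Embedding(max_features, units))
    for kwargs in gru_layer_kwargs(num_layers, units, dropout):
        model.add(GRU(**kwargs))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
```

With the values from the question this would be called as `build_stacked_gru(max_features=2000, num_layers=2, units=32, dropout=0.2)`.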


nlp python-3.x sentiment-analysis deep-learning keras

ira*_*v94


1 vote

1 answer

2441 views