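Related Solutions (0)

nltk NaiveBayesClassifier training for sentiment analysis

I am training the NaiveBayesClassifier in Python using sentences, and it gives me the error below. I do not understand what the error is, and any help would be appreciated.

I have tried many other input formats, but the error persists. The code is given below: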

from text.classifiers import NaiveBayesClassifier
from text.blob import TextBlob
train = [('I love this sandwich.', 'pos'),
         ('This is an amazing place!', 'pos'),
         ('I feel very good about these beers.', 'pos'),
         ('This is my best work.', 'pos'),
         ("What an awesome view", 'pos'),
         ('I do not like this restaurant', 'neg'),
         ('I am tired of this stuff.', 'neg'),
         ("I can't deal with this", 'neg'),
         ('He is my sworn enemy!', 'neg'),
         ('My boss is horrible.', 'neg') ]

test = [('The beer …

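For comparison, if data shaped like the `train` list above were passed to NLTK's own `nltk.NaiveBayesClassifier.train` rather than the TextBlob wrapper imported in the question, each sentence would first have to become a feature dictionary paired with its label. The following is a minimal sketch of that conversion only. The helper name `sentence_features` and the regex tokenizer are assumptions made for illustration, not part of the original question or of either library.

```python
import re

# Hypothetical helper (not from the question): turn one sentence into a
# bag-of-words feature dict of the kind NLTK classifiers train on.
def sentence_features(sentence):
    # Crude tokenizer, assumed for illustration: lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return {"contains(%s)" % tok: True for tok in tokens}

# A few of the labelled sentences from the question.
train = [
    ("I love this sandwich.", "pos"),
    ("This is an amazing place!", "pos"),
    ("I do not like this restaurant", "neg"),
    ("My boss is horrible.", "neg"),
]

# Each item becomes a (feature dict, label) pair.
train_featuresets = [(sentence_features(text), label) for text, label in train]

print(train_featuresets[0])
```

The resulting list has the same (features, label) shape as the `featuresets` built in the movie_reviews examples further down this page.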

python nlp nltk sentiment-analysis textblob

stu*_*001

2014 11-16

22
votes

3
answers

30k
views

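Classification using the movie review corpus in NLTK/Python

I want to do some classification along the lines of NLTK Chapter 6. The book seems to skip a step in creating the categories, and I am not sure what I am doing wrong. My script is here, with the response below. My problem stems mainly from the first part: creating categories based on directory names. Some other questions here have used file names (i.e. pos_1.txt and neg_1.txt), but I would prefer to create directories that I can dump files into.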

from nltk.corpus import movie_reviews

reviews = CategorizedPlaintextCorpusReader('./nltk_data/corpora/movie_reviews', r'(\w+)/*.txt', cat_pattern=r'/(\w+)/.txt')
reviews.categories()
['pos', 'neg']

documents = [(list(movie_reviews.words(fileid)), category)
            for category in movie_reviews.categories()
            for fileid in movie_reviews.fileids(category)]

all_words=nltk.FreqDist(
    w.lower() 
    for w in movie_reviews.words() 
    if w.lower() not in nltk.corpus.stopwords.words('english') and w.lower() not in  string.punctuation)
word_features = all_words.keys()[:100]

def document_features(document): 
    document_words = set(document) 
    features = {}
    for word in word_features:
        features['contains(%s)' % word] = (word in document_words)
    return features
print document_features(movie_reviews.words('pos/11.txt'))

featuresets = [(document_features(d), c) for …

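The directory-based labelling described in this question can be sketched without NLTK at all. The example below assumes a layout of one sub-directory per category (`pos/`, `neg/`) and uses a regular expression in which `.*` stands for "any characters"; by contrast, in the `r'(\w+)/*.txt'` pattern quoted above, `/*` means "zero or more slashes" and the unescaped `.` means "any single character". The names `CAT_PATTERN`, `category_of`, and `labelled_fileids` are hypothetical, and the demo runs on a throwaway temporary directory rather than the real corpus.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern: the first path component of a file id is its category.
CAT_PATTERN = re.compile(r"(\w+)/.*\.txt")

def category_of(fileid):
    """Return the directory name of a file id such as 'pos/11.txt', or None."""
    match = CAT_PATTERN.match(fileid)
    return match.group(1) if match else None

def labelled_fileids(root):
    """List (fileid, category) pairs for every .txt file one level below root."""
    root = Path(root)
    pairs = []
    for path in sorted(root.glob("*/*.txt")):
        fileid = path.relative_to(root).as_posix()
        pairs.append((fileid, category_of(fileid)))
    return pairs

# Demo on a temporary directory tree with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("pos/11.txt", "neg/3.txt"):
        target = Path(tmp) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder review text")
    demo_pairs = labelled_fileids(tmp)

print(demo_pairs)
```

With this layout, new files only need to be dropped into the right directory to receive their label.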

python nlp corpus nltk sentiment-analysis

use*_*184

2014 01-20

13
votes

1
answer

20k
views

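Classification in NLTK using my own corpus instead of the movie_reviews corpus

I am using the following code, which I took from "Classification using the movie review corpus in NLTK/Python":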

import string
from itertools import chain
from nltk.corpus import movie_reviews as mr
from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.classify import NaiveBayesClassifier as nbc
import nltk

stop = stopwords.words('english')
documents = [([w for w in mr.words(i) if w.lower() not in stop and w.lower() not in string.punctuation], i.split('/')[0]) for i in mr.fileids()]

word_features = FreqDist(chain(*[i for i,j in documents]))
word_features = word_features.keys()[:100]

numtrain = int(len(documents) * 90 / 100)
train_set = [({i:(i in tokens) for i in …

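One detail in the snippet above is worth a note: `word_features.keys()[:100]` relies on Python 2 list-returning `keys()`, and on Python 3 a keys view cannot be sliced. In NLTK 3, `FreqDist` subclasses `collections.Counter`, so the most frequent words can be requested explicitly with `most_common`. The sketch below uses a plain `Counter` and a made-up token list as stand-ins, so it runs without NLTK or the corpus.

```python
from collections import Counter

# Made-up tokens standing in for the corpus words (assumption for illustration).
tokens = ["good", "film", "good", "plot", "film", "good", "bad"]

# Counter plays the role of FreqDist here.
counts = Counter(tokens)

# Take the N most frequent words explicitly instead of slicing the keys.
top_words = [word for word, _ in counts.most_common(2)]

print(top_words)  # ['good', 'film']
```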

nlp classification corpus nltk python-2.7

ZaM*_*ZaM

2017 05-23

5
votes

1
answer

4240
views

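NLTK training and testing with a dataset

I am trying to use the Naive Bayes algorithm for sentiment analysis and have been reading a few articles. As almost every article mentions, I need to train my Naive Bayes algorithm with some pre-computed sentiments.

Now, I have a piece of code that uses the movie_reviews module provided with NLTK. The code is: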

import nltk
import random
from nltk.corpus import movie_reviews

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

random.shuffle(documents)

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())
all_words = nltk.FreqDist(all_words)
word_features = list(all_words.keys())[:3000]

def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)

    return features

featuresets = [(find_features(rev), category) for (rev, category) in documents]


training_set = featuresets[:1900]
testing_set = featuresets[1900:]

classifier = nltk.NaiveBayesClassifier.train(training_set)
print("Classifier accuracy percent:",(nltk.classify.accuracy(classifier, testing_set))*100)


So, in the code above I have a training_set and a testing_set. I looked at the movie_reviews module, and in it we have many small text files containing reviews. …
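The training_set / testing_set split used in the code above is a simple holdout split: shuffle the labelled items, then cut the list at a fixed point. A minimal, NLTK-free sketch of that step follows. The function name `holdout_split`, the fixed seed, and the toy `labelled` data are assumptions made for illustration; any list of (features, label) pairs built from one's own files could take their place.

```python
import random

def holdout_split(items, train_percent=90, seed=0):
    """Shuffle a copy of items and split it into (training, testing) lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]

# Toy (features, label) pairs standing in for real featuresets.
labelled = [({"w%d" % i: True}, "pos" if i % 2 else "neg") for i in range(20)]

training_set, testing_set = holdout_split(labelled, train_percent=90)

print(len(training_set), len(testing_set))
```

Keeping the test items out of training is what makes the accuracy figure printed by the original code meaningful.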

python nlp nltk

arq*_*qam

2016 02-09

5
votes

0
answers

2114
views