我正在NaiveBayesClassifier使用句子训练Python,它给出了下面的错误.我不明白错误是什么,任何帮助都会很好.
我尝试了很多其他输入格式,但错误仍然存在.代码如下:
from text.classifiers import NaiveBayesClassifier
from text.blob import TextBlob
train = [('I love this sandwich.', 'pos'),
('This is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('This is my best work.', 'pos'),
("What an awesome view", 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('He is my sworn enemy!', 'neg'),
('My boss is horrible.', 'neg') ]
test = [('The beer …Run Code Online (Sandbox Code Playgroud) 我想在NLTK第6章中进行一些分类.这本书似乎跳过了创建类别的一步,我不确定我做错了什么.我的脚本在这里,响应如下.我的问题主要源于第一部分 - 基于目录名称的类别创建.这里的一些其他问题使用了文件名(即pos_1.txt和neg_1.txt),但我更喜欢创建可以将文件转储到的目录.
from nltk.corpus import movie_reviews
reviews = CategorizedPlaintextCorpusReader('./nltk_data/corpora/movie_reviews', r'(\w+)/*.txt', cat_pattern=r'/(\w+)/.txt')
reviews.categories()
['pos', 'neg']
documents = [(list(movie_reviews.words(fileid)), category)
for category in movie_reviews.categories()
for fileid in movie_reviews.fileids(category)]
all_words=nltk.FreqDist(
w.lower()
for w in movie_reviews.words()
if w.lower() not in nltk.corpus.stopwords.words('english') and w.lower() not in string.punctuation)
word_features = all_words.keys()[:100]
def document_features(document):
document_words = set(document)
features = {}
for word in word_features:
features['contains(%s)' % word] = (word in document_words)
return features
print document_features(movie_reviews.words('pos/11.txt'))
featuresets = [(document_features(d), c) for …Run Code Online (Sandbox Code Playgroud) 我使用以下代码,并使用NLTK/Python中的电影评论语料库进行分类
import string
from itertools import chain
from nltk.corpus import movie_reviews as mr
from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.classify import NaiveBayesClassifier as nbc
import nltk
stop = stopwords.words('english')
documents = [([w for w in mr.words(i) if w.lower() not in stop and w.lower() not in string.punctuation], i.split('/')[0]) for i in mr.fileids()]
word_features = FreqDist(chain(*[i for i,j in documents]))
word_features = word_features.keys()[:100]
numtrain = int(len(documents) * 90 / 100)
train_set = [({i:(i in tokens) for i in …Run Code Online (Sandbox Code Playgroud) 我正在尝试使用Naive Bayes算法进行情感分析,并且正在阅读一些文章.正如几乎每篇文章中都提到的,我需要用一些预先计算的情绪来训练我的朴素贝叶斯算法.
现在,我有一段使用随NLTK提供的movie_review模块的代码.代码是:
import nltk
import random
from nltk.corpus import movie_reviews
documents = [(list(movie_reviews.words(fileid)), category)
for category in movie_reviews.categories()
for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)
all_words = []
for w in movie_reviews.words():
all_words.append(w.lower())
all_words = nltk.FreqDist(all_words)
word_features = list(all_words.keys())[:3000]
def find_features(document):
words = set(document)
features = {}
for w in word_features:
features[w] = (w in words)
return features
featuresets = [(find_features(rev), category) for (rev, category) in documents]
training_set = featuresets[:1900]
testing_set = featuresets[1900:]
classifier = nltk.NaiveBayesClassifier.train(training_set)
print("Classifier accuracy percent:",(nltk.classify.accuracy(classifier, testing_set))*100)
Run Code Online (Sandbox Code Playgroud)
所以,在上面的代码中我有一个training_set和一个testing_set.我查看了movie_review模块,在电影评论模块中,我们有许多包含评论的小文本文件. …