Python Maxent classifier

cjd*_*jds 2 python nltk maxent

I have been using the maxent classifier in Python. It fails, and I don't understand why.

I am using the movie reviews corpus. (Total newbie.)

import nltk.classify.util
from nltk.classify import MaxentClassifier
from nltk.corpus import movie_reviews

def word_feats(words):
    return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

negcutoff = len(negfeats) * 3 // 4  # integer division so the cutoff is a valid slice index
poscutoff = len(posfeats) * 3 // 4

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
classifier = MaxentClassifier.train(trainfeats)

Here is the error (I know I'm doing something wrong; please link to an explanation of how Maxent works):

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1334
    sum1 = numpy.sum(exp_nf_delta * A, axis=0)
RuntimeWarning: invalid value encountered in multiply

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1335
    sum2 = numpy.sum(nf_exp_nf_delta * A, axis=0)
RuntimeWarning: invalid value encountered in multiply

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\nltk\classify\maxent.py", line 1341
    deltas -= (ffreq_empirical - sum1) / -sum2
RuntimeWarning: invalid value encountered in divide

J4c*_*4cK 6

I changed the code and updated it slightly.

import nltk
from nltk.classify import MaxentClassifier
from nltk.corpus import movie_reviews

def word_feats(words):
    return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

negcutoff = len(negfeats) * 3 // 4  # integer division so the cutoff is a valid slice index
poscutoff = len(posfeats) * 3 // 4

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
#classifier = nltk.MaxentClassifier.train(trainfeats)

# Pick the first algorithm NLTK lists (GIS in NLTK 3) and cap training at 3 iterations.
algorithm = nltk.classify.MaxentClassifier.ALGORITHMS[0]
classifier = nltk.MaxentClassifier.train(trainfeats, algorithm, max_iter=3)

classifier.show_most_informative_features(10)

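all_words = nltk.FreqDist(word for word in movie_reviews.words())
# keys() cannot be sliced on Python 3; most_common() returns the most frequent words first.
top_words = set(word for word, _ in all_words.most_common(300))

def word_feats(words):
    return {word: True for word in words if word in top_words}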
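One thing to watch: redefining word_feats after the classifier has been trained does not change that classifier. For the smaller vocabulary to take effect, the features have to be extracted again before calling train. Below is a minimal sketch of that ordering. It is not the exact code I ran: it uses a toy list of token lists in place of movie_reviews and collections.Counter in place of nltk.FreqDist, so that it runs without NLTK or its corpus data. The names docs, N_TOP and feats are only illustrative.

```python
from collections import Counter

# Toy stand-in for the movie_reviews corpus (hypothetical data, for illustration only).
docs = [
    (["a", "great", "fun", "movie"], "pos"),
    (["a", "great", "cast"], "pos"),
    (["a", "dull", "boring", "movie"], "neg"),
    (["a", "boring", "plot"], "neg"),
]

N_TOP = 4  # the code above keeps 300 words; kept tiny here to suit the toy data

# Count every token across all documents, then keep the N_TOP most frequent ones.
all_words = Counter(word for words, _ in docs for word in words)
top_words = {word for word, _ in all_words.most_common(N_TOP)}

def word_feats(words):
    """Bag-of-words features restricted to the most frequent words."""
    return {word: True for word in words if word in top_words}

# Build the feature sets AFTER top_words exists, so the restriction is applied.
feats = [(word_feats(words), label) for words, label in docs]

# With NLTK installed, feats could then be handed to the trainer, for example:
# classifier = nltk.MaxentClassifier.train(feats, 'GIS', max_iter=3)
```

With the real corpus, the same order applies: compute top_words first, then build negfeats and posfeats with the restricted word_feats, then train.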