我正在NaiveBayesClassifier
使用句子训练Python,它给出了下面的错误.我不明白错误是什么,任何帮助都会很好.
我尝试了很多其他输入格式,但错误仍然存在.代码如下:
from text.classifiers import NaiveBayesClassifier
from text.blob import TextBlob
train = [('I love this sandwich.', 'pos'),
('This is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('This is my best work.', 'pos'),
("What an awesome view", 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('He is my sworn enemy!', 'neg'),
('My boss is horrible.', 'neg') ]
test = [('The beer …
Run Code Online (Sandbox Code Playgroud) 如何使用sklearn CountVectorizer同时使用'word'和'char'分析器? http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
我可以通过单词或字符分别提取文本功能,但我如何创建charword_vectorizer
?有没有办法结合矢量化器?或使用多个分析仪?
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> word_vectorizer = CountVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1)
>>> char_vectorizer = CountVectorizer(analyzer='char', ngram_range=(1, 2), min_df=1)
>>> x = ['this is a foo bar', 'you are a foo bar black sheep']
>>> word_vectorizer.fit_transform(x)
<2x15 sparse matrix of type '<type 'numpy.int64'>'
with 18 stored elements in Compressed Sparse Column format>
>>> char_vectorizer.fit_transform(x)
<2x47 sparse matrix of type '<type 'numpy.int64'>'
with 64 stored elements in Compressed Sparse Column format>
>>> char_vectorizer.get_feature_names()
[u' ', …
Run Code Online (Sandbox Code Playgroud)