如何使用sklearn CountVectorizer同时使用'word'和'char'分析器? http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html
我可以通过单词或字符分别提取文本功能,但我如何创建charword_vectorizer?有没有办法结合矢量化器?或使用多个分析仪?
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> word_vectorizer = CountVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1)
>>> char_vectorizer = CountVectorizer(analyzer='char', ngram_range=(1, 2), min_df=1)
>>> x = ['this is a foo bar', 'you are a foo bar black sheep']
>>> word_vectorizer.fit_transform(x)
<2x15 sparse matrix of type '<type 'numpy.int64'>'
with 18 stored elements in Compressed Sparse Column format>
>>> char_vectorizer.fit_transform(x)
<2x47 sparse matrix of type '<type 'numpy.int64'>'
with 64 stored elements in Compressed Sparse Column format>
>>> char_vectorizer.get_feature_names()
[u' ', …Run Code Online (Sandbox Code Playgroud)