
How to use sklearn CountVectorizer with both the 'word' and 'char' analyzers? - Python

How do I use sklearn's CountVectorizer with both the 'word' and 'char' analyzers at once? http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html

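I can extract text features by word or by character separately, but how do I create a charword_vectorizer? Is there a way to combine the vectorizers, or to use more than one analyzer?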

>>> from sklearn.feature_extraction.text import CountVectorizer
>>> word_vectorizer = CountVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1)
>>> char_vectorizer = CountVectorizer(analyzer='char', ngram_range=(1, 2), min_df=1)
>>> x = ['this is a foo bar', 'you are a foo bar black sheep']
>>> word_vectorizer.fit_transform(x)
<2x15 sparse matrix of type '<type 'numpy.int64'>'
    with 18 stored elements in Compressed Sparse Column format>
>>> char_vectorizer.fit_transform(x)
<2x47 sparse matrix of type '<type 'numpy.int64'>'
    with 64 stored elements in Compressed Sparse Column format>
>>> char_vectorizer.get_feature_names()
[u' ', …
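
One possible way to get a combined vectorizer, offered as a minimal sketch rather than as the accepted answer, is to wrap the two CountVectorizer instances in sklearn.pipeline.FeatureUnion, which fits each one and stacks their sparse outputs side by side. The name charword_vectorizer comes from the question; the step labels 'word' and 'char' are arbitrary choices made for this illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion

x = ['this is a foo bar', 'you are a foo bar black sheep']

# Same settings as the two vectorizers in the question.
word_vectorizer = CountVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1)
char_vectorizer = CountVectorizer(analyzer='char', ngram_range=(1, 2), min_df=1)

# FeatureUnion fits every transformer on the same input and concatenates
# the resulting feature matrices column-wise. The labels 'word' and 'char'
# are illustrative names, not anything required by sklearn.
charword_vectorizer = FeatureUnion([
    ('word', word_vectorizer),
    ('char', char_vectorizer),
])

combined = charword_vectorizer.fit_transform(x)

# Look up the fitted transformers by label to count each vocabulary.
fitted = dict(charword_vectorizer.transformer_list)
n_word = len(fitted['word'].vocabulary_)
n_char = len(fitted['char'].vocabulary_)

# The combined matrix has one row per document and one column per
# word n-gram followed by one column per character n-gram.
print(combined.shape, n_word, n_char)
```

The word columns come first and the character columns second, in the order the transformers are listed, so a downstream model sees a single feature matrix while each vectorizer keeps its own vocabulary.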


python machine-learning text-analysis analyzer scikit-learn

alv*_*vas

lucky-day

7
votes

1
answer

9137
views