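How can I use the TF-IDF vectorizer from the scikit-learn library to extract unigrams and bigrams from tweets? I want to use the output to train a classifier.
This is the code from scikit-learn: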
from sklearn.feature_extraction.text import TfidfVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
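One possible approach, sketched here under the assumption that the default tokenizer is acceptable for tweets, is to pass `ngram_range=(1, 2)` to `TfidfVectorizer` so that it builds features for both unigrams and bigrams. The `unigrams` and `bigrams` names below are only illustrative helpers for inspecting the learned vocabulary; they are not part of scikit-learn.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

# ngram_range=(1, 2) requests n-grams of length 1 and 2,
# i.e. unigrams and bigrams, in a single feature matrix.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(corpus)

# vocabulary_ maps each learned n-gram to its column index in X.
vocab = vectorizer.vocabulary_

# Illustrative split: bigrams contain a space, unigrams do not.
unigrams = sorted(term for term in vocab if ' ' not in term)
bigrams = sorted(term for term in vocab if ' ' in term)

print(X.shape)
print(unigrams)
print(bigrams)
```

`X` is a sparse matrix with one row per document, so it could be passed together with a list of labels to the `fit` method of a scikit-learn classifier. New tweets would need to go through `vectorizer.transform` (not `fit_transform`) so that they share the same columns.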