Alo*_*sai (6) · Tags: python, nlp, numpy, ml, scikit-learn
I'm trying to use both counts and tf-idf as features for a multinomial Naive Bayes model. Here's my code:
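from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import FeatureUnion
from sklearn.naive_bayes import MultinomialNB

text = ["this is spam", "this isn't spam"]
labels = [0,1]
count_vectorizer = CountVectorizer(stop_words="english", min_df=3)
tf_transformer = TfidfTransformer(use_idf=True)
# TfidfTransformer is handed raw strings here; this is what raises the TypeError below
combined_features = FeatureUnion([("counts", count_vectorizer), ("tfidf", tf_transformer)]).fit(text)
classifier = MultinomialNB()
classifier.fit(combined_features, labels)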
But I get an error when using FeatureUnion with tf-idf:
TypeError: no supported conversion for types: (dtype('S18413'),)
Any idea why this happens? Is it not possible to use both counts and tf-idf as features?
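The error doesn't come from FeatureUnion; it comes from TfidfTransformer.
You should use TfidfVectorizer instead of TfidfTransformer. The transformer expects a numeric matrix as input (such as the sparse term-count matrix that CountVectorizer produces), not raw text, hence the TypeError.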
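A minimal sketch of that difference, using a toy corpus invented for illustration: TfidfVectorizer works directly on raw strings, while TfidfTransformer only works once a CountVectorizer runs in front of it (here chained with make_pipeline). With default settings the two routes should produce the same matrix, since scikit-learn documents TfidfVectorizer as equivalent to CountVectorizer followed by TfidfTransformer.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy corpus, for illustration only.
docs = ["this is spam", "this is not spam", "ham and eggs", "spam spam eggs"]

# TfidfVectorizer tokenizes, counts and applies tf-idf weighting in one step.
tfidf_direct = TfidfVectorizer().fit_transform(docs)

# TfidfTransformer needs a count matrix, so CountVectorizer goes first.
tfidf_two_step = make_pipeline(CountVectorizer(), TfidfTransformer()).fit_transform(docs)

same = np.allclose(tfidf_direct.toarray(), tfidf_two_step.toarray())
print(same)  # True

# Feeding raw strings straight into TfidfTransformer is rejected.
# Older scikit-learn versions raised TypeError, newer ones raise ValueError.
try:
    TfidfTransformer().fit(docs)
    print("unexpectedly succeeded")
except (TypeError, ValueError) as exc:
    print("TfidfTransformer rejected raw text:", type(exc).__name__)
```

In practice TfidfVectorizer is the simpler choice when the input is plain text; the two-step form is mainly useful when you already have a count matrix.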
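Your test data is also too small: with only two documents, min_df=3 (keep only terms that appear in at least three documents) can never be satisfied, so try a larger corpus. Here is an example: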
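To see why a two-document corpus breaks this setup, here is a small sketch (the documents are invented for illustration) of how CountVectorizer's min_df prunes the vocabulary, and what happens when the threshold exceeds the number of documents.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana", "apple cherry", "apple banana date"]

# min_df=2 keeps only terms that occur in at least 2 of the 3 documents:
# "cherry" and "date" appear once each, so they are dropped.
cv = CountVectorizer(min_df=2).fit(docs)
print(sorted(cv.vocabulary_))  # ['apple', 'banana']

# With only two documents, no term can appear in 3 of them, so fitting fails.
try:
    CountVectorizer(min_df=3).fit(["this is spam", "this isn't spam"])
except ValueError as exc:
    print("ValueError:", exc)
```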
from nltk.corpus import brown  # needs the corpus data: nltk.download("brown")
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion
from sklearn.naive_bayes import MultinomialNB

# Get more text from NLTK: the first 100 Brown sentences, each joined into a string
text = [" ".join(i) for i in brown.sents()[:100]]
# Arbitrary labels (first 50 'yes', last 50 'no'), just for demonstration
labels = ['yes']*50 + ['no']*50

count_vectorizer = CountVectorizer(stop_words="english", min_df=3)
# TfidfVectorizer accepts raw text, unlike TfidfTransformer
tfidf_vectorizer = TfidfVectorizer(use_idf=True)
# fit_transform returns the stacked feature matrix (fit alone returns the FeatureUnion object)
combined_features = FeatureUnion([("counts", count_vectorizer), ("tfidf", tfidf_vectorizer)]).fit_transform(text)
classifier = MultinomialNB()
classifier.fit(combined_features, labels)
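If you would rather keep TfidfTransformer, one option (a sketch, not the code from this answer; the corpus and labels are made up) is to give the tf-idf branch its own CountVectorizer inside a Pipeline, so each branch of the FeatureUnion receives the input it expects. Wrapping the union and the classifier in an outer Pipeline also means the same feature extraction is applied at prediction time.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline

# Toy corpus and labels (1 = spam, 0 = not spam), for illustration only.
docs = [
    "cheap pills buy now",
    "meeting moved to friday",
    "buy cheap watches now",
    "lunch on friday with the team",
]
labels = [1, 0, 1, 0]

features = FeatureUnion([
    # Raw term counts.
    ("counts", CountVectorizer()),
    # TfidfTransformer needs counts, so feed it from its own CountVectorizer.
    ("tfidf", Pipeline([("counts", CountVectorizer()), ("tfidf", TfidfTransformer())])),
])

# Feature extraction and classifier in one estimator: fit and predict take raw text.
model = Pipeline([("features", features), ("nb", MultinomialNB())])
model.fit(docs, labels)
print(model.predict(["buy cheap pills"]))
```

The two CountVectorizer instances are fitted independently; with identical settings on the same data they learn the same vocabulary, so the combined matrix has twice as many columns as that vocabulary.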