fas*_*oth 27 python tf-idf scikit-learn
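This page: http://scikit-learn.org/stable/modules/feature_extraction.html mentions:
As tf-idf is very often used for text features, there is also another class called TfidfVectorizer that combines all the options of CountVectorizer and TfidfTransformer in a single model.
I then followed the code and called fit_transform() on my corpus. How do I get the weight of each feature computed by fit_transform()?
I tried: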
In [39]: vectorizer.idf_
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-39-5475eefe04c0> in <module>()
----> 1 vectorizer.idf_
AttributeError: 'TfidfVectorizer' object has no attribute 'idf_'
But this attribute is missing.
Thanks
YS-*_*S-L 78
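Since version 0.15, the IDF weight of each feature (the global term weight learned during fitting, not a per-document tf-idf score) can be retrieved via the idf_ attribute of the TfidfVectorizer object: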
from sklearn.feature_extraction.text import TfidfVectorizer
corpus = ["This is very strange",
"This is very nice"]
vectorizer = TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(corpus)
idf = vectorizer.idf_
# On scikit-learn >= 1.0 use get_feature_names_out(); get_feature_names() was removed in 1.2
print(dict(zip(vectorizer.get_feature_names(), idf)))
Output:
{u'is': 1.0,
u'nice': 1.4054651081081644,
u'strange': 1.4054651081081644,
u'this': 1.0,
u'very': 1.0}
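To see where these numbers come from, here is a minimal sketch that recomputes them by hand. It assumes the vectorizer's default smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and uses a simplified lowercase/whitespace tokenizer that happens to be enough for this two-sentence corpus. The helper smoothed_idf is a hypothetical name for illustration, not part of scikit-learn.

```python
import math

# Hypothetical helper (not a scikit-learn function): smoothed IDF,
# idf(t) = ln((1 + n_docs) / (1 + doc_freq)) + 1
def smoothed_idf(n_docs, doc_freq):
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

corpus = ["This is very strange",
          "This is very nice"]

# Simplified tokenization: lowercase and split on whitespace (an assumption,
# the real vectorizer uses a regular-expression token pattern).
docs = [set(text.lower().split()) for text in corpus]
vocab = sorted(set().union(*docs))

# Document frequency = number of documents that contain the term.
idf = {term: smoothed_idf(len(docs), sum(term in doc for doc in docs))
       for term in vocab}
print(idf)
```

Terms that occur in both documents get ln(3/3) + 1 = 1.0, and terms that occur in only one get ln(3/2) + 1, which matches the values printed above.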
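As discussed in the comments, before version 0.15 a workaround was to access idf_ through the vectorizer's supposedly hidden _tfidf attribute (an instance of TfidfTransformer):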
idf = vectorizer._tfidf.idf_
print(dict(zip(vectorizer.get_feature_names(), idf)))
It should give the same output as above.
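Note that idf_ holds one global weight per term. The per-document weights that fit_transform() returns live in the matrix X itself. As a rough illustration of how the two relate, the sketch below rebuilds the rows of X in plain Python, assuming the TfidfVectorizer defaults (raw term counts, smoothed IDF, L2 row normalization) and the same simplified whitespace tokenizer. The function tfidf_row is a hypothetical name, not a scikit-learn API.

```python
import math
from collections import Counter

corpus = ["This is very strange",
          "This is very nice"]

# Simplified tokenization (assumption): lowercase + whitespace split.
tokenized = [text.lower().split() for text in corpus]
vocab = sorted({term for doc in tokenized for term in doc})
n_docs = len(tokenized)

# Document frequency and smoothed IDF per term.
doc_freq = Counter(term for doc in tokenized for term in set(doc))
idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}

def tfidf_row(tokens):
    """Hypothetical helper: count * idf per term, then L2-normalize the row."""
    counts = Counter(tokens)
    raw = {t: counts[t] * idf[t] for t in counts}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()}

rows = [tfidf_row(doc) for doc in tokenized]
for text, row in zip(corpus, rows):
    print(text, {t: round(w, 4) for t, w in sorted(row.items())})
```

Within each row, the term unique to that document ("strange" or "nice") gets a larger weight than the shared terms, because its IDF is higher while every term count is 1.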