use*_*890 · 3 · Tags: python, tf-idf, scikit-learn, tfidfvectorizer
I want to get the matrix out of a scikit-learn TfidfVectorizer object. Here is my code:
from sklearn.feature_extraction.text import TfidfVectorizer
text = ["The quick brown fox jumped over the lazy dog.",
"The dog.",
"The fox"]
vectorizer = TfidfVectorizer()
vectorizer.fit_transform(text)
Here is what I tried, which raises an error:
vectorizer.toarray()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-117-76146e626284> in <module>()
----> 1 vectorizer.toarray()

AttributeError: 'TfidfVectorizer' object has no attribute 'toarray'
Another attempt:
vectorizer.todense()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-118-6386ee121184> in <module>()
----> 1 vectorizer.todense()

AttributeError: 'TfidfVectorizer' object has no attribute 'todense'
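Note that vectorizer.fit_transform returns the document-term matrix you are after; the vectorizer object itself does not hold it. So save what fit_transform returns and call todense() on that, since it comes back as a sparse matrix (toarray() also works and gives a plain NumPy array). From the documentation: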
Returns: X : sparse matrix, [n_samples, n_features]. Tf-idf-weighted document-term matrix.
a = vectorizer.fit_transform(text)
a.todense()
matrix([[0.36388646, 0.27674503, 0.27674503, 0.36388646, 0.36388646,
0.36388646, 0.36388646, 0.42983441],
[0. , 0.78980693, 0. , 0. , 0. ,
0. , 0. , 0.61335554],
[0. , 0. , 0.78980693, 0. , 0. ,
0. , 0. , 0.61335554]])