Sha*_*eed 3 python-3.x scikit-learn
我正在尝试运行以下代码,但在执行 pipe['count'] 时出现“Pipeline”对象不可下标的错误。
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
import numpy as np
corpus = ['this is the first document',
'this document is the second document',
'and this is the third one',
'is this the first document']
vocabulary = ['this', 'document', 'first', 'is', 'second', 'the',
'and', 'one']
pipe = Pipeline([('count', CountVectorizer(vocabulary=vocabulary)),
('tfid', TfidfTransformer())]).fit(corpus)
pipe['count'].transform(corpus).toarray()
array([[1, 1, 1, 1, 0, 1, 0, 0],
[1, 2, 0, 1, 1, 1, 0, 0],
[1, 0, 0, 1, 0, 1, 1, 1],
[1, 1, 1, 1, 0, 1, 0, 0]])
pipe['tfid'].idf_
array([1. , 1.22314355, 1.51082562, 1. , 1.91629073,
1. , 1.91629073, 1.91629073])
pipe.transform(corpus).shape
(4, 8)```
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
1245 次 |
| 最近记录: |