thi*_*tbl 10 python scikit-learn
Given the following example:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.pipeline import Pipeline
import pandas as pd
pipe = Pipeline([
    ("tf_idf", TfidfVectorizer()),
    ("nmf", NMF())
])
data = pd.DataFrame([["Salut comment tu vas", "Hey how are you today", "I am okay and you ?"]]).T
data.columns = ["test"]
pipe.fit_transform(data.test)
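I would like to get the intermediate data state in the scikit-learn pipeline that corresponds to the tf_idf output (after fit_transform on tf_idf but before NMF), in other words the NMF input. Put another way, it should be the same as applying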
TfidfVectorizer().fit_transform(data.test)
I know pipe.named_steps["tf_idf"] gets the intermediate transformer, but with this method I can only get the transformer's parameters, not the data.
As @Vivek Kumar suggested in the comments, and as I answered here, I find a debug step that prints information or writes intermediate dataframes to CSV useful:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.pipeline import Pipeline
import pandas as pd
from sklearn.base import TransformerMixin, BaseEstimator
class Debug(BaseEstimator, TransformerMixin):
    def transform(self, X):
        print(X.shape)
        # store the shape so it can be inspected after the pipeline has run
        self.shape = X.shape
        # what other output you want
        return X

    def fit(self, X, y=None, **fit_params):
        return self
pipe = Pipeline([
    ("tf_idf", TfidfVectorizer()),
    ("debug", Debug()),
    ("nmf", NMF())
])
data = pd.DataFrame([["Salut comment tu vas", "Hey how are you today", "I am okay and you ?"]]).T
data.columns = ["test"]
pipe.fit_transform(data.test)
I have now added state to the debug transformer. You can access the shape, as in the answer by @datasailor, with:
pipe.named_steps["debug"].shape
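The debug step above only records the shape. If the intermediate data itself is wanted, the same pass-through idea can keep a reference to the matrix. The following is a minimal sketch, not part of the original answer: the class name Capture and the attribute last_X_ are hypothetical names chosen for illustration, a plain list stands in for data.test, and n_components=2 is an arbitrary choice to keep the example small.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


class Capture(BaseEstimator, TransformerMixin):
    """Hypothetical pass-through step that remembers the last data it saw."""

    def fit(self, X, y=None, **fit_params):
        # nothing to learn; the step only observes the data
        return self

    def transform(self, X):
        # keep a reference to the intermediate data (here: the tf-idf matrix)
        self.last_X_ = X
        return X


texts = ["Salut comment tu vas", "Hey how are you today", "I am okay and you ?"]

pipe = Pipeline([
    ("tf_idf", TfidfVectorizer()),
    ("capture", Capture()),
    ("nmf", NMF(n_components=2)),  # assumption: 2 components, for illustration only
])
pipe.fit_transform(texts)

# the data that was handed to NMF
tfidf_output = pipe.named_steps["capture"].last_X_
print(tfidf_output.shape)
```

Note that the step stores a reference, not a copy, so for large inputs this keeps the whole intermediate matrix alive for as long as the pipeline exists.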
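As I understand it, you want to get the transformed training data. The transformer in pipe.named_steps["tf_idf"] has already been fitted, so simply use this fitted model to transform the training data again: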
pipe.named_steps["tf_idf"].transform(data.test)
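As a quick check of this approach, the sketch below compares the re-transformed training data with a standalone TfidfVectorizer().fit_transform, which is what the question asks for. It also shows pipeline slicing (pipe[:-1]), which I believe is available from scikit-learn 0.21 onward; treat that version number as an assumption. The plain list texts stands in for data.test, and n_components=2 is an arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

texts = ["Salut comment tu vas", "Hey how are you today", "I am okay and you ?"]

pipe = Pipeline([
    ("tf_idf", TfidfVectorizer()),
    ("nmf", NMF(n_components=2)),  # assumption: 2 components, for illustration only
])
pipe.fit(texts)

# option 1: reuse the fitted step directly
via_step = pipe.named_steps["tf_idf"].transform(texts)

# option 2: slice off the last step and transform with everything before it
via_slice = pipe[:-1].transform(texts)

# reference: what the question wants to reproduce
standalone = TfidfVectorizer().fit_transform(texts)

same_as_standalone = np.allclose(via_step.toarray(), standalone.toarray())
same_as_slice = np.allclose(via_slice.toarray(), via_step.toarray())
print(same_as_standalone, same_as_slice)
```

Slicing is mainly convenient when several steps precede the one of interest, since it applies all of them in order without naming each step.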