Sklearn: Is there a way to debug a Pipeline?

Question


Ama*_*don 10 python python-2.7 scikit-learn

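I have built some pipelines for a classification task, and I want to see what information exists or is stored at each stage (e.g. text_stats, ngram_tfidf). How can I do this?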

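from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# TextStats, tokenize_bigram_stem and stopwords are custom definitions not shown here
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('text_stats', Pipeline([
            ('length', TextStats()),
            ('vect', DictVectorizer()),
        ])),
        ('ngram_tfidf', Pipeline([
            ('count_vect', CountVectorizer(tokenizer=tokenize_bigram_stem, stop_words=stopwords)),
            ('tfidf', TfidfTransformer()),
        ])),
    ])),
    ('classifier', MultinomialNB(alpha=0.1)),
])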


Answer 1

Mar*_* V. 6

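I sometimes find it useful to temporarily add a debugging step that prints out the information you are interested in. Building on the sklearn example [1], you can do the following to, for example, print the first 5 rows and the shape, or whatever else you need to look at before the classifier is called: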

import pandas as pd
from sklearn import svm
# sklearn.datasets.samples_generator no longer exists in recent releases
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
from sklearn.pipeline import Pipeline
from sklearn.base import TransformerMixin, BaseEstimator

class Debug(BaseEstimator, TransformerMixin):
    """Pass-through step that prints whatever reaches it."""

    def transform(self, X):
        print(pd.DataFrame(X).head())  # first 5 rows
        print(X.shape)
        return X  # unchanged, so the next step sees the same data

    def fit(self, X, y=None, **fit_params):
        return self  # nothing to learn

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)
anova_filter = SelectKBest(f_regression, k=5)
clf = svm.SVC(kernel='linear')
# The Debug step sits between feature selection and the classifier
anova_svm = Pipeline([('anova', anova_filter), ('dbg', Debug()), ('svc', clf)])
anova_svm.set_params(anova__k=10, svc__C=.1).fit(X, y)  # prints during fit

prediction = anova_svm.predict(X)  # prints again during predict
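The Debug step above passes X to pd.DataFrame, which assumes dense array input. In a text pipeline like the one in the question, CountVectorizer, TfidfTransformer and DictVectorizer produce SciPy sparse matrices by default. A minimal sketch of a variant, assuming that sparse case: the hypothetical DebugStore step densifies only the preview rows and keeps a reference to the last X it saw, so the data can also be inspected after fit or predict via named_steps. The toy documents and step names are made up for illustration.

```python
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


class DebugStore(BaseEstimator, TransformerMixin):
    """Hypothetical pass-through step: prints a preview and keeps the last X it saw."""

    def __init__(self, n_rows=5):
        self.n_rows = n_rows

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        head = X[:self.n_rows]
        # Densify only the preview rows, never the whole (possibly large) matrix.
        print(head.toarray() if sp.issparse(head) else head)
        print(X.shape)
        self.last_X_ = X  # kept so it can be inspected after fit/predict
        return X  # unchanged, so the next step sees the same data


# Toy data, made up for illustration.
docs = ["the cat sat", "the dog barked", "a cat and a dog", "birds sing"]
labels = [0, 1, 0, 1]

pipe = Pipeline([
    ("count_vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("dbg", DebugStore(n_rows=2)),
    ("classifier", MultinomialNB(alpha=0.1)),
])
pipe.fit(docs, labels)   # the debug step prints here (via fit_transform)
pipe.predict(["a cat"])  # and again here (via transform)

last = pipe.named_steps["dbg"].last_X_
print(type(last).__name__, last.shape)
```

Storing the data on the step is only meant for interactive debugging; remove the step (or drop last_X_) before using the pipeline for real work, since it keeps a reference to possibly large matrices.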


Answer 2

小智 0

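You can traverse the Pipeline() tree using the steps and named_steps attributes. The former is a list of ('step_name', Step()) tuples, while the latter gives you a dictionary built from that list.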
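A minimal sketch of that traversal on a small, made-up text pipeline (the documents and step names are invented for illustration). It also reads fitted attributes such as vocabulary_ and idf_, and uses pipeline slicing to get the matrix the classifier actually receives; slicing assumes scikit-learn 0.21 or later.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Small illustrative pipeline; documents and step names are made up.
docs = ["the cat sat", "the dog barked", "a cat and a dog", "birds sing"]
labels = [0, 1, 0, 1]

pipe = Pipeline([
    ("count_vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("classifier", MultinomialNB(alpha=0.1)),
])
pipe.fit(docs, labels)

# `steps`: an ordered list of (name, estimator) tuples.
for name, step in pipe.steps:
    print(name, type(step).__name__)

# `named_steps`: dictionary-style access to the fitted estimators by name.
vocab = pipe.named_steps["count_vect"].vocabulary_  # term -> column index
idf = pipe.named_steps["tfidf"].idf_                # one IDF weight per term

# Every step except the classifier, i.e. the matrix the classifier receives.
Xt = pipe[:-1].transform(docs)
print(Xt.shape)
```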

The contents of a FeatureUnion() can be explored in the same way through its transformer_list attribute.
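A sketch of walking a nested Pipeline/FeatureUnion tree shaped like the one in the question. Several pieces are assumptions: TextLength is a hypothetical stand-in for the question's TextStats, CountVectorizer uses its defaults instead of tokenize_bigram_stem and stopwords, walk is a hypothetical helper (not a scikit-learn function), and the code assumes Python 3 (yield from) even though the question is tagged python-2.7.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline


class TextLength(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for TextStats: one dict of stats per document."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [{"length": len(text)} for text in X]


def walk(est, path="pipeline"):
    """Hypothetical helper: yield (path, estimator) for every node of the tree."""
    yield path, est
    if isinstance(est, Pipeline):
        children = est.steps              # list of (name, estimator)
    elif isinstance(est, FeatureUnion):
        children = est.transformer_list   # list of (name, transformer)
    else:
        children = []
    for name, child in children:
        yield from walk(child, path + "/" + name)


pipeline = Pipeline([
    ("features", FeatureUnion([
        ("text_stats", Pipeline([
            ("length", TextLength()),
            ("vect", DictVectorizer()),
        ])),
        ("ngram_tfidf", Pipeline([
            ("count_vect", CountVectorizer()),
            ("tfidf", TfidfTransformer()),
        ])),
    ])),
    ("classifier", MultinomialNB(alpha=0.1)),
])

docs = ["the cat sat", "the dog barked", "a cat and a dog", "birds sing"]
pipeline.fit(docs, [0, 1, 0, 1])

for path, est in walk(pipeline):
    print(path, type(est).__name__)

# Fitted state of individual stages inside the FeatureUnion.
union = pipeline.named_steps["features"]
branches = dict(union.transformer_list)
stats_names = branches["text_stats"].named_steps["vect"].feature_names_
vocab_words = sorted(branches["ngram_tfidf"].named_steps["count_vect"].vocabulary_)
print(stats_names, vocab_words)
```

After fit, the entries in transformer_list are the fitted transformers, so fitted attributes such as feature_names_ and vocabulary_ can be read directly from them.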

Archived: 9 years, 10 months ago
Views: 1899
Last activity: 8 years, 1 month ago