
Pipeline: multiple classifiers?

I am reading the following example on Pipelines and GridSearchCV in Python: http://www.davidsbatista.net/blog/2017/04/01/document_classification/

Logistic regression:

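from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# stop_words is assumed to be defined earlier (e.g. a list of stop words)
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words=stop_words)),
    ('clf', OneVsRestClassifier(LogisticRegression(solver='sag'))),
])
# Keys follow the <step>__<param> convention; clf__estimator__* reaches
# the LogisticRegression wrapped by OneVsRestClassifier.
parameters = {
    'tfidf__max_df': (0.25, 0.5, 0.75),
    'tfidf__ngram_range': [(1, 1), (1, 2), (1, 3)],
    "clf__estimator__C": [0.01, 0.1, 1],
    "clf__estimator__class_weight": ['balanced', None],
}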


SVM:

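from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words=stop_words)),
    ('clf', OneVsRestClassifier(LinearSVC())),
])
parameters = {
    'tfidf__max_df': (0.25, 0.5, 0.75),
    'tfidf__ngram_range': [(1, 1), (1, 2), (1, 3)],
    "clf__estimator__C": [0.01, 0.1, 1],
    "clf__estimator__class_weight": ['balanced', None],
}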


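Is there a way to combine logistic regression and SVM in a single pipeline? Say I have one TfidfVectorizer and would like to test several classifiers, with each classifier then reporting its best model and parameters.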

python pipeline scikit-learn grid-search

Chr*_*her

2018-12-26

5
votes

3
answers

4223
views
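
One approach that may fit the question above, as a rough sketch rather than a definitive answer: GridSearchCV accepts a list of parameter grids, and a pipeline step name (here `clf`) can itself be a grid key, so each grid can swap in a different classifier and tune that classifier's own parameters. The toy corpus, the labels, the reduced grids, and the `best_per_clf` helper dict are all made up for illustration; stop words and `class_weight` from the original grids are left out to keep it short.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy corpus and labels, invented purely for illustration.
docs = [
    "the cat sat on the mat", "dogs chase cats",
    "my dog barks loudly", "a kitten purrs",
    "stocks fell sharply today", "the market rallied",
    "investors bought shares", "bond yields rose",
] * 3
labels = (["pets"] * 4 + ["finance"] * 4) * 3

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # Placeholder final step; each grid below replaces it.
    ("clf", OneVsRestClassifier(LogisticRegression())),
])

# One grid per classifier: the "clf" key swaps the whole step,
# and "clf__estimator__C" tunes the estimator wrapped inside it.
param_grid = [
    {
        "clf": [OneVsRestClassifier(LogisticRegression(solver="sag", max_iter=1000))],
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__estimator__C": [0.1, 1],
    },
    {
        "clf": [OneVsRestClassifier(LinearSVC())],
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__estimator__C": [0.1, 1],
    },
]

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(docs, labels)

# Collect the best-scoring parameter set for each classifier type.
best_per_clf = {}
results = search.cv_results_
for params, score in zip(results["params"], results["mean_test_score"]):
    name = type(params["clf"].estimator).__name__
    if name not in best_per_clf or score > best_per_clf[name][0]:
        best_per_clf[name] = (score, params)

for name, (score, params) in best_per_clf.items():
    print(name, round(score, 3), params["tfidf__ngram_range"], params["clf__estimator__C"])
```

`search.best_estimator_` holds the overall winner refit on all the data, while `best_per_clf` keeps the top candidate per classifier, which seems to be what the question is after.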

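Comparing multiple algorithms using an sklearn pipeline

I am trying to set up a scikit-learn pipeline to simplify my work. The problem is that I don't know which algorithm (random forest, naive Bayes, decision tree, etc.) fits best, so I need to try each one and compare the results. However, does a pipeline take only one algorithm at a time? For example, the pipeline below uses only SGDClassifier() as the algorithm.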

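from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
])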


What should I do if I want to compare different algorithms? Can I do something like this?

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
    ('classifier', MultinomialNB()),
])


I don't want to split this into two pipelines, because preprocessing the data is very time-consuming.

Thanks in advance!

python algorithm machine-learning scikit-learn

viv*_*704

2018-08-05

5
votes

2
answers

1232
views
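
For the question above: a Pipeline has a single final estimator, and every earlier step must be a transformer, so stacking SGDClassifier and MultinomialNB as consecutive steps would not compare them. A possible workaround, sketched under assumptions: treat the `clf` step as a grid parameter and pass `memory=` to the Pipeline so fitted transformers can be cached and reused across candidates on the same fold. The toy data, the candidate list, the `cache_dir` temp directory, and the `scores` dict are illustrative choices; how much time caching actually saves will depend on the data and setup.

```python
import shutil
import tempfile

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy corpus and labels, invented purely for illustration.
docs = [
    "the cat sat on the mat", "dogs chase cats",
    "my dog barks loudly", "a kitten purrs",
    "stocks fell sharply today", "the market rallied",
    "investors bought shares", "bond yields rose",
] * 3
labels = (["pets"] * 4 + ["finance"] * 4) * 3

# Temporary directory used as the transformer cache (an assumption;
# any writable path or joblib.Memory object could be used instead).
cache_dir = tempfile.mkdtemp()
try:
    pipeline = Pipeline(
        [
            ("vect", CountVectorizer()),
            ("tfidf", TfidfTransformer()),
            ("clf", SGDClassifier()),  # placeholder; the grid replaces it
        ],
        memory=cache_dir,  # cache fitted 'vect' and 'tfidf' steps
    )

    # The final step itself is the parameter being searched over.
    param_grid = {
        "clf": [
            SGDClassifier(random_state=0),
            MultinomialNB(),
            DecisionTreeClassifier(random_state=0),
        ]
    }

    search = GridSearchCV(pipeline, param_grid, cv=3)
    search.fit(docs, labels)
finally:
    # Remove the cache once the search is done.
    shutil.rmtree(cache_dir, ignore_errors=True)

# Mean cross-validated accuracy per candidate classifier.
scores = {
    type(params["clf"]).__name__: score
    for params, score in zip(
        search.cv_results_["params"], search.cv_results_["mean_test_score"]
    )
}
for name, score in scores.items():
    print(name, round(score, 3))
print("best:", type(search.best_estimator_.named_steps["clf"]).__name__)
```

Keeping the vectorizer inside the pipeline means it is fit only on each training fold, which avoids leaking vocabulary from the held-out fold into preprocessing.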

