sklearn GridSearchCV: how do I get a classification report?

Question

use*_*234 3 classification scikit-learn grid-search

I am using GridSearchCV like this:

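import joblib
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

corpus = load_files('corpus')

# Each stop word is listed twice: as written and in title case
with open('stopwords.txt', 'r') as f:
    stop_words = [y for x in f.read().split('\n') for y in (x, x.title())]

x = corpus.data
y = corpus.target

pipeline = Pipeline([
    ('vec', CountVectorizer(stop_words=stop_words)),
    ('classifier', MultinomialNB())])

parameters = {'vec__ngram_range': [(1, 1), (1, 2)],
              'classifier__alpha': [1e-2, 1e-3],
              'classifier__fit_prior': [True, False]}

# scoring="f1" assumes binary labels; use e.g. "f1_macro" for more than two classes
gs_clf = GridSearchCV(pipeline, parameters, n_jobs=-1, cv=5, scoring="f1", verbose=10)
gs_clf = gs_clf.fit(x, y)

# Persist only the refitted best pipeline, not the whole search object
joblib.dump(gs_clf.best_estimator_, 'MultinomialNB.pkl', compress=1)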

Then, in another file, to classify new documents (ones that are not from the corpus), I do this:

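import joblib

classifier = joblib.load(filepath)  # path to .pkl file
# The saved pipeline starts with CountVectorizer, so predict expects an iterable of raw document strings
result = classifier.predict(tokenlist)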

My question is: where can I get the values needed for classification_report?

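In many other examples I see people split the corpus into a training set and a test set. But since I am using GridSearchCV with k-fold cross-validation, I don't need to do that. So how can I get these values from GridSearchCV?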

Answer 1

Tri*_*ath 6

If you have the GridSearchCV object:

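from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV

# x_train/y_train and x_test/y_test come from a held-out split of your data
clf = GridSearchCV(....)
clf.fit(x_train, y_train)
# classification_report returns a string, so print it
print(classification_report(y_test, clf.best_estimator_.predict(x_test)))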
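For a self-contained illustration of the snippet above, here is a minimal sketch. The toy `docs`/`labels` corpus, the `train_test_split` settings, and the `f1_macro` scoring are assumptions made for the example and are not part of the original question. It holds out a test set, runs the grid search on the training part only, and prints the report for the held-out documents.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for load_files('corpus')
docs = ["great wonderful film", "awful boring film",
        "wonderful acting great plot", "boring plot awful acting"] * 10
labels = [1, 0, 1, 0] * 10

# Hold out 25% of the documents; the grid search never sees them
x_train, x_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.25, stratify=labels, random_state=0)

pipeline = Pipeline([
    ("vec", CountVectorizer()),
    ("classifier", MultinomialNB())])

parameters = {"vec__ngram_range": [(1, 1), (1, 2)],
              "classifier__alpha": [1e-2, 1e-3]}

clf = GridSearchCV(pipeline, parameters, cv=5, scoring="f1_macro")
clf.fit(x_train, y_train)

# best_estimator_ is the pipeline refitted on all of x_train with the best parameters
report = classification_report(y_test, clf.best_estimator_.predict(x_test))
print(report)
```

If you need the numbers rather than the formatted text, `classification_report(..., output_dict=True)` returns them as a nested dictionary.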

If you have saved the best estimator and loaded it back, then:

import joblib
from sklearn.metrics import classification_report

classifier = joblib.load(filepath)
print(classification_report(y_test, classifier.predict(x_test)))
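If you would rather not hold out a separate test set, one option (not from the original answer) is to build the report from out-of-fold predictions with `cross_val_predict`. The sketch below assumes a toy `docs`/`labels` corpus and `f1_macro` scoring, both hypothetical; each document is predicted by a clone of the best pipeline that was fitted on the other folds.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for load_files('corpus')
docs = ["great wonderful film", "awful boring film",
        "wonderful acting great plot", "boring plot awful acting"] * 10
labels = [1, 0, 1, 0] * 10

pipeline = Pipeline([
    ("vec", CountVectorizer()),
    ("classifier", MultinomialNB())])

parameters = {"vec__ngram_range": [(1, 1), (1, 2)],
              "classifier__alpha": [1e-2, 1e-3]}

gs_clf = GridSearchCV(pipeline, parameters, cv=5, scoring="f1_macro")
gs_clf.fit(docs, labels)

# Refit clones of the best pipeline fold by fold and collect the
# prediction made for each document while it was in the held-out fold
y_pred = cross_val_predict(gs_clf.best_estimator_, docs, labels, cv=5)

report = classification_report(labels, y_pred)
print(report)
```

Because the hyperparameters were selected on the same data, these numbers can be somewhat optimistic. Passing the `GridSearchCV` object itself to `cross_val_predict` instead of `best_estimator_` gives a nested cross-validation, at a higher computational cost.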

Archived:	9 years ago
Views:	5150
Last recorded:	4 years, 1 month ago