TypeError: get_params() missing 1 required positional argument: 'self'

Xia*_*ian 19 python scikit-learn

I am trying to run a grid search with the scikit-learn package on Python 3.4:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model.logistic import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score
from sklearn.preprocessing import LabelBinarizer
import numpy as np

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression)
])

parameters = {
    'vect__max_df': (0.25, 0.5, 0.75),
    'vect__stop_words': ('english', None),
    'vect__max_features': (2500, 5000, 10000, None),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'vect__norm': ('l1', 'l2'),
    'clf__penalty': ('l1', 'l2'),
    'clf__C': (0.01, 0.1, 1, 10)
}

if __name__ == '__main__':
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy', cv = 3)
    df = pd.read_csv('SMS Spam Collection/SMSSpamCollection', delimiter='\t', header=None)
    lb = LabelBinarizer()
    X, y = df[1], np.array([number[0] for number in lb.fit_transform(df[0])])
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    grid_search.fit(X_train, y_train)
    print('Best score: ', grid_search.best_score_)
    print('Best parameter set:')
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(best_parameters):
        print(param_name, best_parameters[param_name])

However, it does not run successfully. The error message is:

Fitting 3 folds for each of 1536 candidates, totalling 4608 fits
Traceback (most recent call last):
  File "/home/xiangru/PycharmProjects/machine_learning_note_with_sklearn/grid search.py", line 36, in <module>
    grid_search.fit(X_train, y_train)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 732, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 493, in _fit
    base_estimator = clone(self.estimator)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 47, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 45, in clone
    new_object_params = estimator.get_params(deep=False)
TypeError: get_params() missing 1 required positional argument: 'self'

I also tried running only:

if __name__ == '__main__':
    pipeline.get_params()

It gives the same error message. Does anyone know how to fix this?

aba*_*ert 30

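This error is almost always misleading. What it actually means is that you are calling an instance method on the class rather than on an instance (like calling dict.keys() instead of d.keys() on a dict named d).*

And that is exactly what is going on here. The docs imply that the best_estimator_ attribute, like the estimator parameter to the initializer, is not an estimator instance but an estimator type, and that "an object of that type is instantiated for each grid point".

So, if you want to call methods, you have to construct an object of that type for some particular grid point.

However, from a quick glance at the docs, if you are trying to get the parameters that were used for the particular instance of the best estimator, isn't that just going to be best_params_? (I apologize that this part is a bit of a guess...)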


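For the Pipeline call, you definitely have an instance. The only documentation for that method is a parameter spec, which shows that it takes one optional argument, deep. But under the covers, it is probably forwarding the get_params() call to one of its attributes. And with ('clf', LogisticRegression), it looks like you are constructing the pipeline with the class LogisticRegression rather than an instance of that class, so if that is what the call ends up being forwarded to, that would explain the problem.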
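The forwarding idea can be sketched without scikit-learn at all. TinyEstimator and TinyPipeline below are hypothetical stand-ins invented for illustration; they are not scikit-learn's actual implementation, only a rough model of a container that asks each step for its parameters.

```python
class TinyEstimator:
    """Hypothetical estimator with a scikit-learn-style get_params()."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self, deep=True):
        return {'C': self.C}


class TinyPipeline:
    """Hypothetical pipeline that forwards get_params() to each step."""

    def __init__(self, steps):
        self.steps = steps

    def get_params(self, deep=True):
        params = {'steps': self.steps}
        if deep:
            for name, step in self.steps:
                # If `step` is a class rather than an instance,
                # nothing gets bound to `self` in this call.
                for key, value in step.get_params(deep=True).items():
                    params[name + '__' + key] = value
        return params


good = TinyPipeline([('clf', TinyEstimator(C=10))])  # an instance
bad = TinyPipeline([('clf', TinyEstimator)])         # the class itself, no ()

good_params = good.get_params()
print(good_params['clf__C'])

try:
    bad.get_params()
    error = None
except TypeError as exc:
    error = str(exc)
print(error)
```

The pipeline object itself is a perfectly good instance in both cases; the failure only surfaces once the call reaches the step that was never instantiated.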


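* The reason the error says "missing 1 required positional argument: 'self'" instead of "must be called on an instance" or something similar is that in Python, d.keys() is effectively turned into dict.keys(d), and it is perfectly legal (and sometimes useful) to call it that way explicitly. So Python cannot really tell you that dict.keys() is illegal, only that it is missing the self argument.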
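A minimal sketch of that footnote, using a hypothetical Estimator class (a plain Python class, not a real scikit-learn estimator) so the message matches the one in the traceback:

```python
class Estimator:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self, deep=True):
        return {'C': self.C}


est = Estimator(C=10)

# The usual call: Python passes `est` as `self` automatically.
print(est.get_params())           # {'C': 10}

# The same call spelled out explicitly, which is perfectly legal.
print(Estimator.get_params(est))  # {'C': 10}

# Calling through the class with no instance leaves `self` unfilled.
try:
    Estimator.get_params()
    message = None
except TypeError as exc:
    message = str(exc)
print(message)
```

The exact wording of the message varies slightly between Python versions (newer versions prefix the class name), but the "missing 1 required positional argument: 'self'" part is the same.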


Xia*_*ian 18

I finally solved the problem. The cause is exactly what abarnert described.

First I tried:

pipeline = LogisticRegression()

parameters = {
    'penalty': ('l1', 'l2'),
    'C': (0.01, 0.1, 1, 10)
}

And that worked well.

With that intuition, I changed the pipeline to:

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])

Note the () after LogisticRegression. This time it works.

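  • `()` means you are _calling_ it. Calling a class constructs an instance of that class. As I explained in my answer, that is exactly what you have to do. (2 upvotes)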
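Since a missing () is easy to overlook, one option is to check the step list before building the pipeline. The helper below is only a sketch: uninstantiated_steps is a hypothetical name, not part of scikit-learn, and the dummy classes stand in for TfidfVectorizer and LogisticRegression so the snippet runs without any third-party package.

```python
import inspect


def uninstantiated_steps(steps):
    """Return the names of steps whose estimator is a class, not an instance."""
    return [name for name, estimator in steps if inspect.isclass(estimator)]


class DummyVectorizer:   # hypothetical stand-in for TfidfVectorizer
    pass


class DummyClassifier:   # hypothetical stand-in for LogisticRegression
    pass


broken = [('vect', DummyVectorizer()), ('clf', DummyClassifier)]
fixed = [('vect', DummyVectorizer()), ('clf', DummyClassifier())]

print(uninstantiated_steps(broken))  # ['clf']
print(uninstantiated_steps(fixed))   # []
```

The same list of (name, estimator) tuples can then be passed to Pipeline once the check comes back empty.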