Gia*_*ini 4 python numpy scikit-learn
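I am writing a small program that plots learning curves for an SVM and Naive Bayes on a cross-validated dataset. Here is the code for the plotting function: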
import numpy as np
import matplotlib.pyplot as plt
from sklearn import cross_validation
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.learning_curve import learning_curve

def plot_learning_curves(X, y, nb=GaussianNB, svc=SVC(kernel='linear'), ylim=None, cv=None, n_jobs=1,
                         train_sizes=np.linspace(.1, 1.0, 5)):
    plt.figure()
    plt.title('Learning Curves with NB and SVM')
    if ylim is not None:
        plt.ylim(*ylim)
    # learning_curve returns (train_sizes_abs, train_scores, test_scores)
    train_sizes_nb, _, test_scores_nb = learning_curve(
        nb, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    test_scores_mean_nb = np.mean(test_scores_nb, axis=1)
    train_sizes_svc, _, test_scores_svc = learning_curve(
        svc, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
    test_scores_mean_svc = np.mean(test_scores_svc, axis=1)
    plt.grid()
    plt.plot(train_sizes_nb, test_scores_mean_nb, 'o-', color="g",
             label="NB")
    plt.plot(train_sizes_svc, test_scores_mean_svc, 'o', color="r", label="SVM")
    return plt
Here is the function call:
digits = load_digits()
X, y = digits.data, digits.target
cv = cross_validation.ShuffleSplit(digits.data.shape[0], n_iter=100,
                                   test_size=0.2, random_state=0)
plot_learning_curves(X, y, ylim=(0.7, 1.01), cv=cv, n_jobs=1)
plt.show()
I don't know what the problem is, but I get this error:
Traceback (most recent call last):
  File "C:/Users/Gianmarco/PycharmProjects/Learning/plotLearningCurves.py", line 43, in <module>
    plot_learning_curves(X, y, ylim=(0.7, 1.01), cv=cv,n_jobs=1)
  File "C:/Users/Gianmarco/PycharmProjects/Learning/plotLearningCurves.py", line 19, in plot_learning_curves
    nb, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
  File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\learning_curve.py", line 136, in learning_curve
    for train, test in cv for n_train_samples in train_sizes_abs)
  File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py", line 652, in __call__
    for function, args, kwargs in iterable:
  File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\learning_curve.py", line 136, in <genexpr>
    for train, test in cv for n_train_samples in train_sizes_abs)
  File "C:\Users\Gianmarco\Anaconda\lib\site-packages\sklearn\base.py", line 45, in clone
    new_object_params = estimator.get_params(deep=False)
TypeError: unbound method get_params() must be called with GaussianNB instance as first argument (got nothing instead)
Process finished with exit code 1
I don't understand what the line "TypeError: unbound method get_params() must be called with GaussianNB instance as first argument (got nothing instead)" means.
What is a possible solution?
Gia*_*ini 16
The solution is very simple. Instead of
nb=GaussianNB
use
nb=GaussianNB()
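TypeError: unbound method get_params() must be called with GaussianNB instance as first argument (got nothing instead)
This error means that get_params() was called on the GaussianNB class itself, with no instance to bind to, instead of on a GaussianNB object.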
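To see where this kind of error comes from, here is a minimal sketch in plain Python 3. `ToyEstimator` and `toy_clone` are hypothetical stand-ins, not scikit-learn code; they only mimic the pattern visible in the traceback, where `clone` calls `estimator.get_params(...)`. Python 3 words the message differently (it complains about a missing `self` argument), but the cause is the same.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator such as GaussianNB."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self):
        return {"alpha": self.alpha}


def toy_clone(estimator):
    """Hypothetical, simplified clone: rebuild an object from its parameters."""
    params = estimator.get_params()  # needs an instance to supply `self`
    return type(estimator)(**params)


# Passing an instance works: get_params() is bound to that object.
copy = toy_clone(ToyEstimator(alpha=0.5))
print(copy.alpha)

# Passing the class fails: there is no instance to use as `self`.
try:
    toy_clone(ToyEstimator)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The only difference between the two calls is the pair of parentheses after the class name, which is exactly the difference between `nb=GaussianNB` and `nb=GaussianNB()`.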
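The error is raised several calls deep inside the sklearn module, so it is hard to pin down the exact cause without stepping into the code with a debugger and reading the sklearn source.
If you use IPython, the %debug magic command is very useful for investigating exceptions like this.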
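Outside IPython, the standard library offers similar help. The sketch below uses `traceback.extract_tb` to list the functions an exception passed through, which makes the innermost call easy to spot; `inner`, `outer`, and `frames_of` are hypothetical names used only for illustration. For an interactive session, `pdb.post_mortem()` plays a role comparable to `%debug`.

```python
import traceback


def inner(estimator):
    # Fails when `estimator` has no get_params attribute at all.
    return estimator.get_params()


def outer(estimator):
    return inner(estimator)


def frames_of(exc):
    """Return the function names the exception travelled through, outermost first."""
    return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]


try:
    outer(object())
except AttributeError as exc:
    names = frames_of(exc)
    print(" -> ".join(names))
```

The last name in the list is where the exception was raised, which is usually the first place worth inspecting.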
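Looking at the code, the problem appears to be that you are passing the class GaussianNB, rather than an instance of that class, to sklearn.learning_curve.learning_curve(). Its documentation says:
Parameters: estimator : object type that implements the "fit" and "predict" methods. An object of that type which is cloned for each validation.
I find this ambiguous. However, the example code uses a GaussianNB instance, not the type.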
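Apart from that, using a mutable object as a default argument is generally not a good idea, because the default is created once, when the function is defined, and is then shared by every call. Object instances are mutable. It also makes your code harder to read and debug.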
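The pitfall is easiest to see with a list. In this small sketch, the function names and the scores are made up for illustration; the second function shows the common `None` sentinel idiom that avoids the shared default.

```python
def add_score_shared(score, scores=[]):
    """Pitfall: the default list is created once and reused by every call."""
    scores.append(score)
    return scores


def add_score_fresh(score, scores=None):
    """Common idiom: use None as the default and build the list per call."""
    if scores is None:
        scores = []
    scores.append(score)
    return scores


# Copy the results so later calls cannot change what we captured.
first_shared = list(add_score_shared(0.9))
second_shared = list(add_score_shared(0.8))  # still contains 0.9

first_fresh = add_score_fresh(0.9)
second_fresh = add_score_fresh(0.8)  # starts from an empty list

print(first_shared, second_shared)
print(first_fresh, second_fresh)
```

The same idiom carries over to estimators: a signature with `nb=None` and a body that creates `GaussianNB()` when `nb is None` gives each call its own object.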
With this many optional keyword arguments, something like this may be more readable:
def plot_learning_curves(x, y, ylim=None, **kwargs):
    """ Plots learning curves with NB and SVM """
    nb = kwargs.get('nb', GaussianNB())
    svc = kwargs.get('svc', SVC(kernel='linear'))
    train_sizes = kwargs.get('train_sizes', np.linspace(.1, 1.0, 5))
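You may not need these keyword arguments at all. It looks like you started by copying some example code and then added your own pieces. It is better to simplify the example code first and make sure you understand what is happening.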
def plot_learning_curves(x, y, ylim=None):
    nb = GaussianNB()
    svc = SVC(kernel='linear')
    train_sizes = np.linspace(.1, 1.0, 5)
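Putting the pieces together, here is a sketch of a complete simplified version. It assumes a recent scikit-learn release, where `learning_curve` and `ShuffleSplit` are imported from `sklearn.model_selection` rather than the modules used in the question, and where `ShuffleSplit` takes `n_splits`. The helper name `mean_test_scores` and the choice of 10 splits are my own, picked to keep the run short.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import ShuffleSplit, learning_curve
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def mean_test_scores(estimator, X, y, cv, train_sizes):
    """Return absolute training sizes and the mean test score for each size."""
    # learning_curve returns (train_sizes_abs, train_scores, test_scores).
    sizes, _, test_scores = learning_curve(
        estimator, X, y, cv=cv, train_sizes=train_sizes)
    return sizes, np.mean(test_scores, axis=1)


def plot_learning_curves(X, y, ylim=None, cv=None):
    """Plot test-score learning curves for Gaussian NB and a linear SVM."""
    train_sizes = np.linspace(0.1, 1.0, 5)
    # Instances, not classes: each estimator is cloned for every fit.
    estimators = [("NB", GaussianNB(), "g"), ("SVM", SVC(kernel="linear"), "r")]

    plt.figure()
    plt.title("Learning Curves with NB and SVM")
    if ylim is not None:
        plt.ylim(*ylim)
    plt.grid()
    for label, estimator, color in estimators:
        sizes, means = mean_test_scores(estimator, X, y, cv, train_sizes)
        plt.plot(sizes, means, "o-", color=color, label=label)
    plt.legend(loc="best")
    return plt


if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)
    cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
    plot_learning_curves(X, y, ylim=(0.7, 1.01), cv=cv)
    plt.show()
```

Keeping the score computation in its own function means the numbers can be checked without opening a plot window.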