Bar*_*ich 7 python regression scikit-learn lars
I get a scikit-learn error when training this estimator:
my_estimator = LassoLarsCV(fit_intercept=False, normalize=False, positive=True, max_n_alphas=1e5)
Note that if I decrease max_n_alphas from 1e5 to 1e4, I no longer get this error.
Does anyone have an idea what is going on?
The error occurs when I call
my_estimator.fit(x, y)
I have 40k data points of dimension 40.
The full stack trace looks like this:
File "/usr/lib64/python2.7/site-packages/sklearn/linear_model/least_angle.py", line 1113, in fit
axis=0)(all_alphas)
File "/usr/lib64/python2.7/site-packages/scipy/interpolate/polyint.py", line 79, in __call__
y = self._evaluate(x)
File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 498, in _evaluate
out_of_bounds = self._check_bounds(x_new)
File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 525, in _check_bounds
raise ValueError("A value in x_new is below the interpolation "
ValueError: A value in x_new is below the interpolation range.
It must be something specific to your data. LassoLarsCV() seems to work fine on this synthetic example of fairly well-behaved data:
import numpy
import sklearn.linear_model
# create 40000 x 40 sample data from linear model with a bit of noise
npoints = 40000
ndims = 40
numpy.random.seed(1)
X = numpy.random.random((npoints, ndims))
w = numpy.random.random(ndims)
y = X.dot(w) + numpy.random.random(npoints) * 0.1
clf = sklearn.linear_model.LassoLarsCV(fit_intercept=False, normalize=False, max_n_alphas=1e6)
clf.fit(X, y)
# coefficients are almost exactly recovered, this prints 0.00377
print(max(abs(clf.coef_ - w)))
# alphas actually used are 41 or ndims+1
print(clf.alphas_.shape)
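This was with sklearn 0.16, where I don't have the positive=True option.
I'm not sure why you would want a very large max_n_alphas. I don't know why 1e+4 works and 1e+5 doesn't, but I suspect the paths you get with max_n_alphas=ndims+1 and with max_n_alphas=1e+4 (or any larger value) are identical for well-behaved data. The optimal alpha estimated by cross-validation, clf.alpha_, should be identical as well. Have a look at the "Lasso path using LARS" example to see what alpha is doing.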
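One way to check this on your own setup is to fit the same data with two values of max_n_alphas and compare the selected alpha. This is only a sketch on small synthetic data (the sizes 500 x 8 are my own choice, not the 40k x 40 case), and it leaves out normalize and positive because not every sklearn release accepts them:

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

# Small synthetic problem; sizes are illustrative assumptions only.
rng = np.random.RandomState(1)
npoints, ndims = 500, 8
X = rng.random_sample((npoints, ndims))
w = rng.random_sample(ndims)
y = X.dot(w) + rng.random_sample(npoints) * 0.1

results = {}
for max_n_alphas in (10**4, 10**5):
    # max_n_alphas is passed as an int; newer releases reject floats like 1e5
    clf = LassoLarsCV(fit_intercept=False, max_n_alphas=max_n_alphas)
    clf.fit(X, y)
    results[max_n_alphas] = (clf.alpha_, clf.alphas_.shape[0])
    print(max_n_alphas, clf.alpha_, clf.alphas_.shape)
```

If both rows print the same alpha_ and the same alphas_ length, the larger setting is not buying anything for that data.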
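Also, from the LassoLars documentation:
alphas_ : array, shape (n_alphas + 1,)
Maximum of covariances (in absolute value) at each iteration. n_alphas is either max_iter, n_features, or the number of nodes in the path with correlation greater than alpha, whichever is smaller.
So it makes sense that we end up with alphas_ of size ndims+1 (i.e. n_features+1) above.
P.S. I also tested with sklearn 0.17.1 and positive=True, and with a mix of positive and negative coefficients, with the same result: alphas_ has length ndims+1 or less.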
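As for the exception itself: the last frames of the trace are in scipy's interp1d, which by default refuses to evaluate points outside the x range it was built on. A minimal sketch with made-up arrays (not your data) that triggers the same kind of ValueError; the exact wording of the message varies between scipy versions:

```python
import numpy as np
from scipy.interpolate import interp1d

# Made-up sample points, purely for illustration.
x_known = np.array([1.0, 2.0, 3.0])
y_known = np.array([10.0, 20.0, 30.0])
f = interp1d(x_known, y_known)  # linear interpolation, bounds checking on by default

inside = float(f(1.5))  # within [1.0, 3.0], so this evaluates normally

try:
    f(np.array([0.5, 1.5]))  # 0.5 is below the smallest x_known
    message = None
except ValueError as exc:
    message = str(exc)

print(inside)
print(message)
```

So in the failing fit, at least one of the alphas being evaluated presumably falls below the smallest alpha on the path it is interpolated against. Why that happens with your data at 1e5 and not at 1e4 I can't say without seeing the data.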