Asked by zho*_*ing (score 11). Tags: python, machine-learning, gradient-descent
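I started learning machine learning last week. While writing a gradient descent script to estimate model parameters, I ran into a problem: how do I choose a suitable learning rate and variance (the convergence threshold called `variance` in the script below)? I found that different (learning rate, variance) pairs can lead to different results, and with some of them the algorithm does not converge at all. Moreover, a (learning rate, variance) pair that works well on one training set may stop working when I switch to another. For example, with the script below, when I set the learning rate to 0.001 and the variance to 0.00001, I get reasonable values of theta0_guess and theta1_guess for `data1`. For `data2`, however, the algorithm does not converge, and even after trying dozens of (learning rate, variance) pairs I still could not get it to converge.
So I would like to know: is there a criterion or method for choosing the (learning rate, variance) pair?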
```python
data1 = [(0.000000, 95.364693),
         (1.000000, 97.217205),
         (2.000000, 75.195834),
         (3.000000, 60.105519),
         (4.000000, 49.342380),
         (5.000000, 37.400286),
         (6.000000, 51.057128),
         (7.000000, 25.500619),
         (8.000000, 5.259608),
         (9.000000, 0.639151),
         (10.000000, -9.409936),
         (11.000000, -4.383926),
         (12.000000, -22.858197),
         (13.000000, -37.758333),
         (14.000000, -45.606221)]

data2 = [(2104., 400.),
         (1600., 330.),
         (2400., 369.),
         (1416., 232.),
         (3000., 540.)]


def create_hypothesis(theta1, theta0):
    # Straight-line model h(x) = theta1 * x + theta0
    return lambda x: theta1 * x + theta0


def linear_regression(data, learning_rate=0.001, variance=0.00001):
    theta0_guess = 1.
    theta1_guess = 1.
    theta0_last = 100.
    theta1_last = 100.
    m = len(data)
    # Stop once neither parameter changes by more than `variance` in one iteration
    while (abs(theta1_guess - theta1_last) > variance or abs(theta0_guess - theta0_last) > variance):
        theta1_last = theta1_guess
        theta0_last = theta0_guess
        # Both updates use this hypothesis, so theta0 and theta1 are updated simultaneously
        hypothesis = create_hypothesis(theta1_guess, theta0_guess)
        theta0_guess = theta0_guess - learning_rate * (1. / m) * sum([hypothesis(point[0]) - point[1] for point in data])
        theta1_guess = theta1_guess - learning_rate * (1. / m) * sum([(hypothesis(point[0]) - point[1]) * point[0] for point in data])
    return (theta0_guess, theta1_guess)


points = [(float(x), float(y)) for (x, y) in data1]
res = linear_regression(points)
print(res)
```
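The loop in the question has no iteration limit and never looks at the cost, so a diverging run keeps going until the parameters overflow (typically ending with nan), and a very slow run can take an enormous number of iterations. A minimal sketch, assuming a hypothetical helper `run_gradient_descent` with a `max_iters` cap (neither is part of the original script), keeps the question's stopping rule but reports divergence as soon as the cost goes up:

```python
# Hypothetical helper (not from the original script): the question's update rule,
# plus an iteration cap and a divergence check on the cost.
def run_gradient_descent(data, learning_rate, variance, max_iters=100000):
    """Fit y = theta1*x + theta0; return (theta0, theta1, iterations, status)."""
    m = len(data)
    theta0, theta1 = 1.0, 1.0
    prev_cost = float("inf")
    for it in range(1, max_iters + 1):
        errors = [theta1 * x + theta0 - y for x, y in data]
        cost = sum(e * e for e in errors) / (2 * m)
        # A cost that grows (or overflows to inf/nan) means the step is too large
        if not cost <= prev_cost:
            return theta0, theta1, it, "diverged"
        prev_cost = cost
        # Simultaneous update of both parameters, as in the question
        new_theta0 = theta0 - learning_rate * sum(errors) / m
        new_theta1 = theta1 - learning_rate * sum(e * x for e, (x, _) in zip(errors, data)) / m
        # The question's stopping rule: both parameters moved by at most `variance`
        if abs(new_theta0 - theta0) <= variance and abs(new_theta1 - theta1) <= variance:
            return new_theta0, new_theta1, it, "converged"
        theta0, theta1 = new_theta0, new_theta1
    return theta0, theta1, max_iters, "max_iters"


data1 = [(0.0, 95.364693), (1.0, 97.217205), (2.0, 75.195834), (3.0, 60.105519),
         (4.0, 49.342380), (5.0, 37.400286), (6.0, 51.057128), (7.0, 25.500619),
         (8.0, 5.259608), (9.0, 0.639151), (10.0, -9.409936), (11.0, -4.383926),
         (12.0, -22.858197), (13.0, -37.758333), (14.0, -45.606221)]
data2 = [(2104., 400.), (1600., 330.), (2400., 369.), (1416., 232.), (3000., 540.)]

print(run_gradient_descent(data1, 0.01, 0.00001))   # status "converged"
print(run_gradient_descent(data2, 0.001, 0.00001))  # status "diverged"
```

The cap means a bad (learning rate, variance) pair fails fast with a status instead of silently returning garbage, which makes it practical to try many pairs.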
Answer by jab*_*edo (score 14)
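Plotting is the best way to see how your algorithm is performing. To check whether you have reached convergence, plot the cost function after every iteration: after a certain number of iterations you will see that it hardly improves any more, and at that point you can assume convergence. Look at the following code: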
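```python
import matplotlib.pyplot as plt

# The question's linear_regression, extended to record the cost J at every iteration (create_hypothesis and points come from the question's script)
def linear_regression(data, learning_rate=0.001, variance=0.00001):
    theta0_guess, theta1_guess = 1., 1.
    theta0_last, theta1_last = 100., 100.
    m = len(data)
    cost_f = []
    while (abs(theta1_guess - theta1_last) > variance or abs(theta0_guess - theta0_last) > variance):
        theta1_last = theta1_guess
        theta0_last = theta0_guess
        hypothesis = create_hypothesis(theta1_guess, theta0_guess)
        # Cost J = 1/(2m) * sum((h(x) - y)^2) for the current parameters
        cost_f.append((1. / (2 * m)) * sum([pow(hypothesis(point[0]) - point[1], 2) for point in data]))
        theta0_guess = theta0_guess - learning_rate * (1. / m) * sum([hypothesis(point[0]) - point[1] for point in data])
        theta1_guess = theta1_guess - learning_rate * (1. / m) * sum([(hypothesis(point[0]) - point[1]) * point[0] for point in data])
    return (theta0_guess, theta1_guess), cost_f

res, cost_f = linear_regression(points, learning_rate=0.01, variance=0.00001)
plt.plot(range(len(cost_f)), cost_f)
plt.show()
```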
This plots the following graph (run with learning_rate = 0.01, variance = 0.00001):

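As you can see, after about a thousand iterations there is little further improvement. I usually declare convergence when the cost function decreases by less than 0.001 in one iteration, but that is just based on my own experience.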
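A minimal sketch of that rule of thumb, assuming a hypothetical function `gradient_descent_until_plateau` (not from the answer) that records the cost J = 1/(2m) Σ(h(x) − y)² at every iteration and stops once one step lowers it by less than 0.001. The threshold is absolute, so it depends on the scale of y; treat 0.001 as the answer's heuristic rather than a universal constant.

```python
def gradient_descent_until_plateau(data, learning_rate, cost_tol=0.001, max_iters=100000):
    """Batch gradient descent for y = theta1*x + theta0 (hypothetical helper).

    Stops when one iteration lowers the cost J = 1/(2m) * sum((h(x) - y)^2)
    by less than cost_tol; returns (theta0, theta1, cost_history).
    """
    m = len(data)
    theta0, theta1 = 1.0, 1.0
    costs = []
    for _ in range(max_iters):
        errors = [theta1 * x + theta0 - y for x, y in data]
        costs.append(sum(e * e for e in errors) / (2 * m))
        if len(costs) >= 2:
            decrease = costs[-2] - costs[-1]
            if decrease < 0:
                # Cost went up: the learning rate is too large for this data
                raise ValueError("cost increased at iteration %d" % len(costs))
            if decrease < cost_tol:
                break  # improvement per iteration has become negligible
        # Simultaneous update of both parameters
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, (x, _) in zip(errors, data)) / m
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1, costs


data1 = [(0.0, 95.364693), (1.0, 97.217205), (2.0, 75.195834), (3.0, 60.105519),
         (4.0, 49.342380), (5.0, 37.400286), (6.0, 51.057128), (7.0, 25.500619),
         (8.0, 5.259608), (9.0, 0.639151), (10.0, -9.409936), (11.0, -4.383926),
         (12.0, -22.858197), (13.0, -37.758333), (14.0, -45.606221)]

theta0, theta1, costs = gradient_descent_until_plateau(data1, learning_rate=0.01)
print(len(costs), theta0, theta1)
```

Because this criterion watches the cost rather than the parameters, it is looser than a small `variance`: the parameters can still be drifting slowly when the per-step cost improvement drops below the threshold.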
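To choose the learning rate, the best thing you can do is plot the cost function and see how it behaves, always keeping these two things in mind:
- if the learning rate is too small, convergence will be slow;
- if the learning rate is too large, the cost function may not decrease on every iteration, so gradient descent will not converge.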
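If you run your code on data1 with learning_rate > 0.029 and variance = 0.001, you will be in the second case: gradient descent does not converge. If instead you choose learning_rate < 0.0001 with variance = 0.001, you will see that the algorithm needs a very large number of iterations to converge.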
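The 0.029 threshold can be derived rather than found by trial and error. This is a standard result for quadratic objectives, brought in here to explain the observation, not part of the original answer: for J(θ0, θ1) = 1/(2m) Σ(θ0 + θ1·x − y)², the Hessian is the constant matrix [[1, mean(x)], [mean(x), mean(x²)]], and gradient descent with a fixed step converges only when learning_rate < 2/λ_max, where λ_max is the Hessian's largest eigenvalue. A minimal sketch with a hypothetical function `max_stable_learning_rate` that needs only the x values:

```python
import math


def max_stable_learning_rate(xs):
    """Largest fixed learning rate for which batch gradient descent on
    J = 1/(2m) * sum((theta0 + theta1*x - y)**2) still converges.

    Hypothetical helper: only the x values matter, because the Hessian of J
    depends neither on y nor on the current parameters.
    """
    m = len(xs)
    mean_x = sum(xs) / m
    mean_x2 = sum(x * x for x in xs) / m
    # Hessian H = [[1, mean_x], [mean_x, mean_x2]]; eigenvalues of a symmetric
    # 2x2 matrix from its trace and determinant.
    trace = 1.0 + mean_x2
    det = mean_x2 - mean_x ** 2
    lam_max = (trace + math.sqrt(trace ** 2 - 4.0 * det)) / 2.0
    # The component along the top eigenvector is scaled by (1 - lr * lam_max)
    # each step, so |1 - lr * lam_max| < 1 requires lr < 2 / lam_max.
    return 2.0 / lam_max


data1_x = [float(i) for i in range(15)]       # x values of data1: 0, 1, ..., 14
data2_x = [2104., 1600., 2400., 1416., 3000.]  # x values of data2

print(max_stable_learning_rate(data1_x))  # about 0.029
print(max_stable_learning_rate(data2_x))  # about 4.2e-07
```

For data2 the limit is about 4.2e-07 because the x values are in the thousands, so a learning rate such as 0.001 is several orders of magnitude too large for that data set.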
Example of non-convergence with learning_rate = 0.03:

Example of slow convergence with learning_rate = 0.0001:
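The three runs described above (learning_rate = 0.01, 0.03 and 0.0001) can also be compared without plots by recording the cost for a fixed number of iterations. A minimal sketch, assuming a hypothetical helper `cost_history` and a fixed budget of 1000 iterations (both are choices made here, not from the answer):

```python
def cost_history(data, learning_rate, n_iters=1000):
    """Run n_iters steps of batch gradient descent for y = theta1*x + theta0
    from theta0 = theta1 = 1 and return the cost J at every step."""
    m = len(data)
    theta0, theta1 = 1.0, 1.0
    costs = []
    for _ in range(n_iters):
        errors = [theta1 * x + theta0 - y for x, y in data]
        costs.append(sum(e * e for e in errors) / (2 * m))
        # Simultaneous update of both parameters
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, (x, _) in zip(errors, data)) / m
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return costs


data1 = [(0.0, 95.364693), (1.0, 97.217205), (2.0, 75.195834), (3.0, 60.105519),
         (4.0, 49.342380), (5.0, 37.400286), (6.0, 51.057128), (7.0, 25.500619),
         (8.0, 5.259608), (9.0, 0.639151), (10.0, -9.409936), (11.0, -4.383926),
         (12.0, -22.858197), (13.0, -37.758333), (14.0, -45.606221)]

for lr in (0.0001, 0.01, 0.03):
    costs = cost_history(data1, lr)
    # 0.03 is above the 0.029 limit mentioned in the answer, so its cost grows
    print(lr, costs[0], costs[-1])
```

With these histories, 0.01 should settle close to its minimum, 0.0001 should still be well above it, and 0.03 should grow without bound, matching the cases described in the answer.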
