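To simplify the problem: once a dimension (or feature) has already been updated n times, I want to set its learning rate to 1/n the next time I see that feature.
I came up with this code: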
import numpy as np
import theano
import theano.tensor as T

def test_adagrad():
    # 20 embedding rows, each of dimension 10
    embedding = theano.shared(value=np.random.randn(20, 10), borrow=True)
    # Per-row update counter, initialised to 1
    times = theano.shared(value=np.ones((20, 1)))
    lr = T.dscalar()
    index_a = T.lvector()
    hist = times[index_a]  # counters of the rows selected in this step
    cost = T.sum(theano.sparse_grad(embedding[index_a]))
    gradients = T.grad(cost, embedding)
    updates = [(embedding, embedding + lr * (1.0 / hist) * gradients)]
    ### The code that also updates times should go here; it is omitted ###
    train = theano.function(inputs=[index_a, lr], outputs=cost, updates=updates)
    for i in range(10):
        print(train([1, 2, 3], 0.05))
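Theano does not raise any error, but the training result is sometimes NaN. Does anybody know how to fix this?
Thank you for your help.
PS: I suspect that the operations on the sparse gradient are what cause the problem, so I tried replacing * with theano.sparse.mul. This gave the same results as described above.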
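For reference, the rule described at the top of the question (a row that has been updated n times gets learning rate 1/n on its next update) can be written without any framework. The following is only a minimal plain-Python sketch of the intended arithmetic, not a fix for the Theano code: the function name `sparse_step` and the names `counts`, `rows`, and `grad_rows` are hypothetical, the counters start at 1 like `times` above, and the update uses `+` only to mirror the sign in the question's `updates` line.

```python
def sparse_step(embedding, counts, rows, grad_rows, lr):
    """Update only the touched rows, scaling lr by 1/n for each row.

    embedding: list of rows (lists of floats), modified in place
    counts:    per-row update counters, assumed to start at 1
    rows:      indices of the rows touched in this step
    grad_rows: one gradient row per entry in `rows`
    """
    for row, grad in zip(rows, grad_rows):
        scale = lr / counts[row]  # counters start at 1, so no division by zero
        embedding[row] = [w + scale * g for w, g in zip(embedding[row], grad)]
        counts[row] += 1          # this row has now been updated once more
    return embedding, counts


# Toy data: 4 rows of dimension 2; rows 1 and 2 are touched three times.
embedding = [[0.0, 0.0] for _ in range(4)]
counts = [1, 1, 1, 1]
for _ in range(3):
    sparse_step(embedding, counts, [1, 2], [[1.0, 1.0], [1.0, 1.0]], 0.5)
```

Rows that never appear in `rows` keep both their values and their counters, so comparing a Theano run against this kind of reference on a tiny input may help to narrow down where the NaN first appears.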