如何更新keras中的权重以进行强化学习？

Question

如何更新keras中的权重以进行强化学习？

Eka*_*Eka 8 python reinforcement-learning theano keras

我正在加强学习计划,我正在使用这篇文章作为参考.我使用python与keras(theano)创建神经网络和我正在使用的伪代码

Do a feedforward pass for the current state s to get predicted Q-values for all actions.

Do a feedforward pass for the next state s’ and calculate maximum overall network outputs max a’ Q(s’, a’).

Set Q-value target for action to r + ?max a’ Q(s’, a’) (use the max calculated in step 2). For all other actions, set the Q-value target to the same as originally returned from step 1, making the error 0 for those outputs.

Update the weights using backpropagation.

Run Code Online (Sandbox Code Playgroud)

这里的损失函数方程就是这个

我的奖励是+1,maxQ(s',a')= 0.8375,Q(s,a)= 0.6892

我的L会 1/2*(1+0.8375-0.6892)^2=0.659296445

现在,如果我的模型结构是这样,我应该如何使用上面的损失函数值更新我的模型神经网络权重

model = Sequential()
model.add(Dense(150, input_dim=150))
model.add(Dense(10))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='mse', optimizer='adam')

Run Code Online (Sandbox Code Playgroud)

Answer 1

Ale*_*lex 0

假设 NN 正在对 Q 值函数进行建模，您只需将目标传递给网络即可。例如

model.train_on_batch(state_action_vector, target)

Run Code Online (Sandbox Code Playgroud)

其中 state_action_vector 是一些表示网络状态动作输入的预处理向量。由于您的网络使用 MSE 损失函数，因此它将使用前向传递的状态动作来计算预测项，然后根据您的目标更新权重。

归档时间：	9 年前
查看次数：	1380 次
最近记录：	8 年，9 月前