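I can't understand how to update the Q-values for a tic-tac-toe game. I have read everything I could find on this, but I still can't picture how to do it. I read that the Q-values are updated at the end of the game, but I don't understand how that works if every action has its own Q-value.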
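To make the question concrete, here is a minimal sketch of the standard tabular Q-learning update as I understand it, applied after every move rather than only at the end of the game. The names `q_update`, `ALPHA`, `GAMMA`, and the string states `"s0"`, `"s1"`, `"end"` are placeholders I made up for illustration, and the opponent's reply is assumed to be folded into the transition to the next state. The reward is nonzero only on the final move.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value, for illustration only)
GAMMA = 0.9  # discount factor (assumed value, for illustration only)

# Q[(state, action)] -> estimated value; unseen pairs default to 0.0
Q = defaultdict(float)

def q_update(state, action, reward, next_state, next_actions):
    """Apply one tabular Q-learning update for a single move.

    next_actions is empty when next_state is terminal, so the
    bootstrap term is 0 and only the reward contributes.
    """
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
    return Q[(state, action)]

# A hypothetical two-move episode: (state, action, reward, next_state, next_actions).
# Only the last move, which wins the game, carries a reward.
episode = [
    ("s0", 4, 0.0, "s1", [8]),
    ("s1", 8, 1.0, "end", []),
]

# Replay the same episode twice to see how the final reward propagates backward.
for _ in range(2):
    for state, action, reward, next_state, next_actions in episode:
        q_update(state, action, reward, next_state, next_actions)
```

If this sketch is right, then after the first game only the winning move's Q-value changes (to 0.5), and the earlier move picks up value (0.225) only in the second game, through the `max` over the next state's Q-values. That might be what "updated at the end of the game" refers to, but I am not sure.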
artificial-intelligence machine-learning reinforcement-learning tic-tac-toe q-learning