How do I use the "history heuristic" in alpha-beta minimax?

Question

use*_*270 6 algorithm tree hashmap minimax alpha-beta-pruning

I am building an AI for a chess game.

So far I have successfully implemented the alpha-beta pruning minimax algorithm, which looks like this (from Wikipedia):

(* Initial call *)
alphabeta(origin, depth, -∞, +∞, TRUE)

function alphabeta(node, depth, α, β, maximizingPlayer)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if maximizingPlayer
        for each child of node
            α := max(α, alphabeta(child, depth - 1, α, β, FALSE))
            if β ≤ α
                break (* β cut-off *)
        return α
    else
        for each child of node
            β := min(β, alphabeta(child, depth - 1, α, β, TRUE))
            if β ≤ α
                break (* α cut-off *)
        return β
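
For reference, the Wikipedia pseudocode translates fairly directly into Python. This is only a minimal sketch, assuming the game tree is a nested list whose leaves are plain numbers; `evaluate` and the sample `tree` are hypothetical stand-ins for a real chess evaluation and move generator.

```python
import math

def evaluate(node):
    # Hypothetical static evaluation: a leaf carries its own score,
    # an inner node reached at depth 0 is scored as 0.
    return node if isinstance(node, (int, float)) else 0

def alphabeta(node, depth, alpha, beta, maximizing_player):
    is_leaf = not isinstance(node, list)
    if depth == 0 or is_leaf:
        return evaluate(node)
    if maximizing_player:
        for child in node:
            alpha = max(alpha, alphabeta(child, depth - 1, alpha, beta, False))
            if beta <= alpha:
                break  # beta cut-off
        return alpha
    else:
        for child in node:
            beta = min(beta, alphabeta(child, depth - 1, alpha, beta, True))
            if beta <= alpha:
                break  # alpha cut-off
        return beta

# Toy two-ply tree: the maximizer picks a branch, the minimizer picks a leaf.
tree = [[3, 5], [6, 9], [1, 2]]
best = alphabeta(tree, 2, -math.inf, math.inf, True)
```

In the third branch the search stops after the leaf `1`, because the maximizer already has a better alternative; that skipped leaf is the pruning at work.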


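Because this takes too long (it still walks the tree node by node), I came across something called the "history heuristic".

The algorithm from the original paper: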

int AlphaBeta(pos, d, alpha, beta) 
{ 
    if (d=0 || game is over) 
        return Eval (pos);  // evaluate leaf position from current player’s standpoint 

    score = - INFINITY;     // preset return value 
    moves = Generate(pos);  // generate successor moves 

    for i=1 to sizeof(moves) do                // rating all moves 
        rating[i] = HistoryTable[ moves[i] ]; 
    Sort( moves, rating );                     // sorting moves according to their history scores 

    for i =1 to sizeof(moves) do { // look over all moves 
        Make(moves[i]); // execute current move 
        cur = - AlphaBeta(pos, d-1, -beta, -alpha); //call other player

        if (cur > score) {
            score = cur; 
            bestMove = moves[i];      // update best move if necessary 
        } 

        if (score > alpha) alpha = score;    //adjust the search window 
        Undo(moves[i]);                      // retract current move 

        if (alpha >= beta) goto done;        // cut off 
     } 

     done: 
     // update history score 
     HistoryTable[bestMove] = HistoryTable[bestMove] + Weight(d); 

     return score; 
}
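
To make the control flow concrete, here is a minimal runnable sketch of the same idea in Python. Everything specific in it is an assumption for illustration: the game is a toy take-away game (remove 1 to 3 stones, whoever takes the last stone wins) instead of chess, `Weight(d)` is taken to be `2 ** depth`, and `Make`/`Undo` are replaced by passing the new pile size to the recursive call.

```python
import math
from collections import defaultdict

# History table keyed by the move itself (here: how many stones are taken),
# not by the position in which the move was played.
history = defaultdict(int)

def moves_for(pile):
    return [take for take in (1, 2, 3) if take <= pile]

def negamax(pile, depth, alpha, beta):
    if pile == 0:
        return -1  # the opponent took the last stone, so the side to move lost
    if depth == 0:
        return 0   # search horizon reached: outcome unknown
    # Try moves with the highest history score first.
    moves = sorted(moves_for(pile), key=lambda m: history[m], reverse=True)
    score = -math.inf
    best_move = None
    for move in moves:
        cur = -negamax(pile - move, depth - 1, -beta, -alpha)
        if cur > score:
            score, best_move = cur, move
        alpha = max(alpha, score)  # adjust the search window
        if alpha >= beta:
            break                  # cut-off
    history[best_move] += 2 ** depth  # assumed Weight(d) = 2^d
    return score

losing = negamax(4, 10, -math.inf, math.inf)   # pile of 4: side to move loses
winning = negamax(5, 10, -math.inf, math.inf)  # pile of 5: side to move wins
```

Note that the table is shared across the whole search, so a move that was best or caused a cut-off in one part of the tree is tried earlier in other parts.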


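So basically, the idea is to keep a Hashtable or Dictionary that tracks previous "moves".

Now I am confused about what "move" means here. I am not sure whether it literally refers to a single move, or to the overall state of the board after each move.

In chess, for example, what should the "key" of this hash table be?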
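
One common reading, consistent with `HistoryTable[ moves[i] ]` in the pseudocode above, is that the key is the move itself (for example its from-square and to-square) rather than the board state. The following is only a sketch under that assumption; the square numbering, the helper names `square`, `record` and `rating`, and the `2 ** depth` weight are all hypothetical choices for illustration.

```python
# Squares are numbered 0..63 (a1 = 0, h8 = 63); an assumption of this sketch.
# The table is indexed by [from_square][to_square], independent of the position.
history_table = [[0] * 64 for _ in range(64)]

def square(name):
    # Convert algebraic notation such as "e2" to an index 0..63.
    file = ord(name[0]) - ord("a")
    rank = int(name[1]) - 1
    return rank * 8 + file

def record(move, depth):
    # Called for the best move of a node; assumed weight is 2^depth.
    from_sq, to_sq = move
    history_table[from_sq][to_sq] += 2 ** depth

def rating(move):
    from_sq, to_sq = move
    return history_table[from_sq][to_sq]

e2e4 = (square("e2"), square("e4"))
g1f3 = (square("g1"), square("f3"))

record(g1f3, 3)  # pretend g1f3 was the best move at a node with depth 3
ordered = sorted([e2e4, g1f3], key=rating, reverse=True)
```

With this layout `g1f3` is sorted ahead of `e2e4` wherever both are legal, regardless of which position is being searched.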