Lit*_*lin 10 c++ microcontroller artificial-intelligence tic-tac-toe minimax
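I created a Tic-Tac-Toe game on a microcontroller, including a perfect AI (perfect meaning that it never loses). I did not use a minimax algorithm for it, just a small state machine with all possible and optimal moves. My problem now is that I want to implement different difficulty levels (easy, medium and hard). The AI so far would be the hard one. So I thought about how best to do this and ended up wanting to use the minimax algorithm, but in a way that computes the scores for all game positions, so that I can sometimes pick the second-best score instead of the best one. Since I cannot always do all of these calculations on the microcontroller itself, I want to write a small program that runs on my computer and gives me arrays of all possible board states (taking symmetry etc. into account to minimize storage) and their corresponding scores. To that end, I first tried to implement the minimax algorithm itself, taking depth into account so that the scores of each state are computed correctly. It was then supposed to give me back all the best moves (for now) in an array. However, it does not seem to work that well. I tried to debug it with some printf lines. Here is the code so far for both the minimax function and my main function: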
static int minimax(int *board, int depth)
{
    int score;
    int move = -1;
    int scores[9];
    int nextDepth;
    printf("\n----- Called Minimax, Depth: %i -----\n\n", depth);
    if(depth%2 ==1){
        player = -1;
    } else {
        player = 1;
    }
    printf("Player: %i\n---\n", player);
    if(isWin(board) != 0){
        score = (10-depth)*winningPlayer;
        printf("Player %i won on depth %i\n", winningPlayer, depth);
        printf("Resulting score: (10-%i)*%i = %i\nScore returned to depth %i\n---\n", depth, winningPlayer, score, depth-1);
        return score;
    }
    score = -20;
    nextDepth = depth+1;
    printf("Next depth is %i\n---\n", nextDepth);
    int i;
    for(i=0; i<9; i++){
        if(board[i] == 0) {
            if(nextDepth%2 ==0) {
                player = -1;
            } else {
                player = 1;
            }
            printf("Found vacant space at position %i\n", i);
            printf("Set value of position %i to %i\n---\n", i, player);
            board[i] = player;
            int thisScore = minimax(board, nextDepth);
            printf("Value of the move at position %i on next depth %i is %i\n---\n", i, nextDepth, thisScore);
            scores[i] = thisScore;
            if(thisScore > score){
                printf("New score value is greater than the old one: %i < %i\n---\n", thisScore, score);
                score = thisScore;
                move = i;
                g_moves[nextDepth-1] = move;
                printf("Score was set to %i\n", thisScore);
                printf("Remembered move %i\n---\n", move);
            }
            board[i] = 0;
            printf("Value of position %i was reset to 0 on next depth %i\n---\n", i, nextDepth);
        }
    }
    if(move == -1) {
        printf("Game ended in a draw.\n Returned score: 0\n---\n");
        return 0;
    }
    printf("Move at position %i was selected on next depth %i\n", move, nextDepth);
    printf("Returning score of %i to depth %i\n---\n", score, depth);
    return score;
}
The main function is:
int main(int argc, char **argv)
{
    memcpy(board, initBoard, sizeof(board));
    int score = 0;
    int depth = getDepth(board);
    score = minimax(board, depth);
    printf("\n--- finished ---\n\n");
    printf("Moves with the highest score: ");
    int i;
    for(i=0; i<9; i++){
        printf("%i | ", g_moves[i]);
    }
    printf("\n");
    printf("The score is %i\n", score);
    printf("The best next board is: \n|----|----|----|\n");
    for(i=0; i<3; i++){
        printf("| %-2i ", board[i]);
    }
    printf("|\n|----|----|----|\n");
    for(i=3; i<6; i++){
        printf("| %-2i ", board[i]);
    }
    printf("|\n|----|----|----|\n");
    for(i=6; i<9; i++){
        printf("| %-2i ", board[i]);
    }
    printf("|\n|----|----|----|\n");
    return 0;
}
In addition, I have a few variables:
//1 = Beginning Player
//-1 = second Player
static int player;
static int winningPlayer = 0;
static int g_moves[9];
/* 0 1 2
 * 3 4 5
 * 6 7 8
 */
int initBoard[9] = {
    0, 0, 0,
    0, 0, 0,
    0, 0, 0,
};
int board[9];
And my win-checking function:
int isWin(int *board)
{
    unsigned winningBoards[8][3] = {
        {board[0], board[1], board[2],},
        {board[3], board[4], board[5],},
        {board[6], board[7], board[8],},
        {board[0], board[3], board[6],},
        {board[1], board[4], board[7],},
        {board[2], board[5], board[8],},
        {board[0], board[4], board[8],},
        {board[2], board[4], board[6],},
    };
    int i;
    for(i=0; i<8; i++){
        if( (winningBoards[i][0] != 0) &&
            (winningBoards[i][0] == winningBoards[i][1]) &&
            (winningBoards[i][0] == winningBoards[i][2])){
            winningPlayer = winningBoards[i][0];
            return winningPlayer;
        }
    }
    return 0;
}
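For some reason, the last time minimax returns step by step from depth 7 back to depth 1, it overwrites my g_moves array with all 0s, which produces the following lines in my printout (only the last 70 lines):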
...
----- Called Minimax, Depth: 7 -----
Player: -1
---
Player 1 won on depth 7
Resulting score: (10-7)*1 = 3
Score returned to depth 6
---
Value of the move at position 2 on next depth 7 is 3
---
Value of position 2 was reset to 0 on next depth 7
---
Move at position 0 was selected on next depth 7
Returning score of 3 to depth 6
---
Value of the move at position 3 on next depth 6 is 3
---
Value of position 3 was reset to 0 on next depth 6
---
Move at position 0 was selected on next depth 6
Returning score of 3 to depth 5
---
Value of the move at position 4 on next depth 5 is 3
---
Value of position 4 was reset to 0 on next depth 5
---
Move at position 0 was selected on next depth 5
Returning score of 3 to depth 4
---
Value of the move at position 5 on next depth 4 is 3
---
Value of position 5 was reset to 0 on next depth 4
---
Move at position 0 was selected on next depth 4
Returning score of 3 to depth 3
---
Value of the move at position 6 on next depth 3 is 3
---
Value of position 6 was reset to 0 on next depth 3
---
Move at position 0 was selected on next depth 3
Returning score of 5 to depth 2
---
Value of the move at position 7 on next depth 2 is 5
---
Value of position 7 was reset to 0 on next depth 2
---
Move at position 0 was selected on next depth 2
Returning score of 5 to depth 1
---
Value of the move at position 8 on next depth 1 is 5
---
Value of position 8 was reset to 0 on next depth 1
---
Move at position 0 was selected on next depth 1
Returning score of 5 to depth 0
---
--- finished ---
Moves with the highest score: 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The score is 5
The best next board is:
|----|----|----|
| 0 | 0 | 0 |
|----|----|----|
| 0 | 0 | 0 |
|----|----|----|
| 0 | 0 | 0 |
|----|----|----|
If you need any other information in order to help me, I will gladly provide it if I have it.
Thanks in advance.
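A possible reading of the log above, offered as a sketch rather than a confirmed diagnosis: g_moves[nextDepth-1] is written by every call at that depth, so after the search it holds whatever the last-explored branch stored there, not the best line of play. The loop also takes the maximum for both players, so the opponent is never modeled as minimizing. The minimal negamax variant below assumes the same board encoding (1, -1, 0) and the same (10 - depth) scoring; the names winner and negamax are illustrative, not from the code above. It hands the best move for the root position back through a pointer instead of a shared global.

```c
#include <stddef.h>

/* Returns 1 or -1 if that player has three in a row, 0 otherwise. */
int winner(const int *b)
{
    static const int lines[8][3] = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   /* rows */
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},   /* columns */
        {0, 4, 8}, {2, 4, 6}               /* diagonals */
    };
    for (int i = 0; i < 8; i++) {
        int a = b[lines[i][0]];
        if (a != 0 && a == b[lines[i][1]] && a == b[lines[i][2]])
            return a;
    }
    return 0;
}

/* Negamax: the returned score is from the point of view of `player`,
 * the side to move. `depth` is the number of moves already played.
 * If bestMove is not NULL it receives the chosen cell, or -1 when the
 * position is already decided or the board is full. */
int negamax(int *b, int player, int depth, int *bestMove)
{
    int w = winner(b);
    if (bestMove != NULL)
        *bestMove = -1;
    if (w != 0)
        return (10 - depth) * w * player;  /* negative if the opponent won */

    int best = -100;   /* below any reachable score */
    int move = -1;
    for (int i = 0; i < 9; i++) {
        if (b[i] != 0)
            continue;
        b[i] = player;
        /* The opponent's best result is our worst, hence the negation. */
        int s = -negamax(b, -player, depth + 1, NULL);
        b[i] = 0;
        if (s > best) {
            best = s;
            move = i;
        }
    }
    if (move == -1)
        return 0;      /* board full without a winner: draw */
    if (bestMove != NULL)
        *bestMove = move;
    return best;
}
```

Called as negamax(board, 1, 0, &move) on the empty board, this returns 0, because tic-tac-toe is a draw when both sides play their best moves.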
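Edit:
So I rewrote my minimax function so that it now prints all possible board states, which I redirect from the console into a .txt file (command in the corresponding folder: ./NAME_OF_FILE > DEST_NAME.txt). The code is as follows: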
int minimax(int *board, int depth)
{
    g_node++;
    int player;
    int move = -1;
    int score = -20;
    int thisScore = -20;
    int i;
    if(isWin(board) != 0){
        printf("\nNode: %i\n", g_node);
        printf("Board state:");
        for(i=0;i<9;i++) {
            if((i%3) == 0)
                printf("\n");
            printf("%2i ", board[i]);
        }
        printf("\n");
        printf("has a score of %i\n", (10-depth)*winningPlayer);
        return (10-depth)*winningPlayer;
    }
    if(depth%2 ==1){
        player = -1;
    } else {
        player = 1;
    }
    for(i=0; i<9; i++){
        if(board[i] == 0){
            board[i] = player;
            thisScore = minimax(board, depth+1);
            if(thisScore > score){
                score = thisScore;
                move = i;
            }
            board[i] = 0;
        }
    }
    printf("\nNode: %i\n", g_node);
    printf("Board state:");
    for(i=0;i<9;i++) {
        if((i%3) == 0)
            printf("\n");
        printf("%2i ", board[i]);
    }
    printf("\n");
    if(move == -1){
        printf("has a score of 0\n");
        return 0;
    }
    printf("has a score of %i\n", score);
    return score;
}
My next step is to print, for each board, the maximum score of every move at the corresponding position.
Example:
10 8 10
8 7 8
10 8 10
for the empty board at the beginning.
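The example board is unchanged by rotations and reflections, which is the same symmetry the question proposes to exploit to shrink the stored table. As a rough sketch (the names encodeBoard, canonicalCode and SYM are made up for illustration), each board can be encoded as a base-3 number and mapped to the smallest code among its 8 symmetric variants, so that equivalent boards share one table entry.

```c
/* Encode a board with cells in {-1, 0, 1} as a base-3 number
 * in the range [0, 19683). */
int encodeBoard(const int *b)
{
    int code = 0;
    for (int i = 0; i < 9; i++)
        code = code * 3 + (b[i] + 1);
    return code;
}

/* The 8 symmetries of the square as index permutations:
 * transformed[i] = original[SYM[k][i]]. */
static const int SYM[8][9] = {
    {0, 1, 2, 3, 4, 5, 6, 7, 8},   /* identity */
    {6, 3, 0, 7, 4, 1, 8, 5, 2},   /* rotate 90 degrees clockwise */
    {8, 7, 6, 5, 4, 3, 2, 1, 0},   /* rotate 180 degrees */
    {2, 5, 8, 1, 4, 7, 0, 3, 6},   /* rotate 270 degrees clockwise */
    {2, 1, 0, 5, 4, 3, 8, 7, 6},   /* mirror left-right */
    {6, 7, 8, 3, 4, 5, 0, 1, 2},   /* mirror top-bottom */
    {0, 3, 6, 1, 4, 7, 2, 5, 8},   /* mirror along the main diagonal */
    {8, 5, 2, 7, 4, 1, 6, 3, 0}    /* mirror along the anti-diagonal */
};

/* Smallest code among all 8 symmetric variants of the board. */
int canonicalCode(const int *b)
{
    int best = -1;
    for (int k = 0; k < 8; k++) {
        int t[9];
        for (int i = 0; i < 9; i++)
            t[i] = b[SYM[k][i]];
        int code = encodeBoard(t);
        if (best < 0 || code < best)
            best = code;
    }
    return best;
}
```

If a best move is stored per canonical board, the move index would also have to be mapped back through the symmetry that produced the canonical form; that step is left out here.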
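Edit 2:
I have now added another function, printScoredBoards, which should basically do what I described in my last edit, but there is a problem with it. It is always possible to win after the 5th move if your opponent is stupid enough, and minimax tries all possibilities, including those, so with the following code I get a scored board of all 15s for the empty board.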
void printScoredBoards(int *board, int depth)
{
    int player;
    int scoredBoard[9] = {0,0,0,0,0,0,0,0,0,};
    int i;
    if(isWin(board) == 0){
        if(depth%2 ==1){
            player = -1;
        } else {
            player = 1;
        }
        for(i=0; i<9; i++){
            if(board[i] == 0){
                board[i] = player;
                scoredBoard[i] = minimax(board, depth+1)+10;
                printScoredBoards(board, depth+1);
                board[i] = 0;
            }
        }
        printf("Scored board:");
        dumpTable(scoredBoard);
        printf("\n");
    }
}
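The corners should be more valuable, though, and the center should have the lowest value. Does anyone happen to know a solution to this?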
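One way to read the all-15 result: minimax as posted takes the maximum at every level, so the opponent is effectively assumed to cooperate, and every opening move looks like a win at depth 5 (10 - 5 + 10 = 15). Below is a hedged sketch of the scored board with an opponent who minimizes instead, written in negamax form. The names lineWinner, negamaxScore, scoreMoves and NOT_PLAYABLE are illustrative and not part of the code above.

```c
#define NOT_PLAYABLE (-100)   /* marker for occupied cells */

/* Returns 1 or -1 if that player has three in a row, 0 otherwise. */
int lineWinner(const int *b)
{
    static const int lines[8][3] = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };
    for (int i = 0; i < 8; i++) {
        int a = b[lines[i][0]];
        if (a != 0 && a == b[lines[i][1]] && a == b[lines[i][2]])
            return a;
    }
    return 0;
}

/* Score of the position for `player`, the side to move, when both
 * sides pick their best reply. `depth` counts the moves played so far. */
int negamaxScore(int *b, int player, int depth)
{
    int w = lineWinner(b);
    if (w != 0)
        return (10 - depth) * w * player;
    int best = NOT_PLAYABLE;
    for (int i = 0; i < 9; i++) {
        if (b[i] != 0)
            continue;
        b[i] = player;
        int s = -negamaxScore(b, -player, depth + 1);
        b[i] = 0;
        if (s > best)
            best = s;
    }
    return (best == NOT_PLAYABLE) ? 0 : best;   /* full board: draw */
}

/* Fills out[i] with the value, for `player`, of playing cell i.
 * Occupied cells are marked NOT_PLAYABLE. */
void scoreMoves(int *b, int player, int depth, int out[9])
{
    for (int i = 0; i < 9; i++) {
        if (b[i] != 0) {
            out[i] = NOT_PLAYABLE;
            continue;
        }
        b[i] = player;
        out[i] = -negamaxScore(b, -player, depth + 1);
        b[i] = 0;
    }
}
```

With this scoring, every opening move on the empty board evaluates to 0, because tic-tac-toe is a draw under best play from both sides. The scores only start to differ once one side has made a mistake.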
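Edit: I have set up a new minimax algorithm and posted it in another post. You can find that post in the "Linked" section on the right or here. All I am doing now is implementing it in the microcontroller code and writing a function that picks the best or second-best move out of all scored moves, randomizing the choice if several moves share the same score. This post can therefore be closed.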
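The selection function described here could look roughly like the following. This is only a sketch under assumptions: scores are stored per cell, occupied cells carry a hypothetical NOT_PLAYABLE marker, pickMove is an invented name, and rand() from the C standard library is available on the target.

```c
#include <stdlib.h>

#define NOT_PLAYABLE (-100)   /* assumed marker for occupied cells */

/* rank 0 selects the best distinct score, rank 1 the second best, etc.
 * Returns a random cell among those holding the selected score, or -1
 * if no cell is playable. If there are fewer distinct scores than
 * rank + 1, the lowest available score is used. */
int pickMove(const int scores[9], int rank)
{
    int target = NOT_PLAYABLE;
    int limit = 1000;   /* exclusive upper bound for the next score */

    for (int r = 0; r <= rank; r++) {
        int best = NOT_PLAYABLE;
        for (int i = 0; i < 9; i++) {
            if (scores[i] != NOT_PLAYABLE &&
                scores[i] < limit && scores[i] > best)
                best = scores[i];
        }
        if (best == NOT_PLAYABLE)
            break;      /* no further distinct score below the limit */
        target = best;
        limit = best;
    }
    if (target == NOT_PLAYABLE)
        return -1;

    /* Collect all cells with the selected score and pick one at random. */
    int candidates[9];
    int n = 0;
    for (int i = 0; i < 9; i++) {
        if (scores[i] == target)
            candidates[n++] = i;
    }
    return candidates[rand() % n];
}
```

A medium difficulty could then call pickMove(scores, 1) on some turns and pickMove(scores, 0) on others. How often to do which is a tuning choice that the post does not specify.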
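I think that trying to get the second-best move out of a full-depth analysis is overkill. Instead of exploring the entire tree, limit the minimax depth (with a lookahead of 2 moves it becomes possible to win, but the AI is still strong), or simply use random moves for a truly imperfect AI.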
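A minimal sketch of that idea, assuming the board encoding from the question (1, -1, 0). The names threeInRow, limitedNegamax and chooseMove are invented for illustration. A lookahead of 0 plies gives a random legal move, 1 ply takes immediate wins, and 2 plies additionally blocks immediate threats.

```c
#include <stdlib.h>

/* Returns 1 or -1 if that player has three in a row, 0 otherwise. */
int threeInRow(const int *b)
{
    static const int lines[8][3] = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };
    for (int i = 0; i < 8; i++) {
        int a = b[lines[i][0]];
        if (a != 0 && a == b[lines[i][1]] && a == b[lines[i][2]])
            return a;
    }
    return 0;
}

/* Negamax that looks at most maxPlies plies ahead. Positions that are
 * still undecided at the horizon count as 0. */
int limitedNegamax(int *b, int player, int depth, int maxPlies)
{
    int w = threeInRow(b);
    if (w != 0)
        return (10 - depth) * w * player;
    if (maxPlies == 0)
        return 0;
    int best = -100;
    int any = 0;
    for (int i = 0; i < 9; i++) {
        if (b[i] != 0)
            continue;
        any = 1;
        b[i] = player;
        int s = -limitedNegamax(b, -player, depth + 1, maxPlies - 1);
        b[i] = 0;
        if (s > best)
            best = s;
    }
    return any ? best : 0;   /* full board: draw */
}

/* Picks a move for `player` with the given lookahead. Ties are broken
 * at random. Returns -1 if the board is full. */
int chooseMove(int *b, int player, int depth, int maxPlies)
{
    int bestScore = -100;
    int candidates[9];
    int n = 0;
    for (int i = 0; i < 9; i++) {
        if (b[i] != 0)
            continue;
        int s = 0;           /* with maxPlies == 0 all moves tie */
        if (maxPlies > 0) {
            b[i] = player;
            s = -limitedNegamax(b, -player, depth + 1, maxPlies - 1);
            b[i] = 0;
        }
        if (s > bestScore) {
            bestScore = s;
            n = 0;
        }
        if (s == bestScore)
            candidates[n++] = i;
    }
    return (n == 0) ? -1 : candidates[rand() % n];
}
```

Mapping easy, medium and hard to lookaheads such as 0, 2 and 9 plies would be one option. Those particular numbers are an assumption here, not something the answer prescribes.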