程序员拼图:在整个游戏中编码国际象棋棋盘状态

And*_*ngs 92 language-agnostic puzzle algorithm chess

不是严格意义上的问题,更多的是谜题......

多年来,我参与了一些新员工的技术访谈.除了询问标准"你知道X技术"的问题之外,我还试图了解他们如何处理问题.通常情况下,我会在面试前一天通过电子邮件向他们发送问题,并希望他们在第二天提出解决方案.

通常结果会非常有趣 - 错误但有趣 - 如果他们能解释为什么采取特定的方法,那么这个人仍会得到我的建议.

所以我想我会向Stack Overflow的观众抛出我的一个问题.

问题:您可以想到最有效的方式来编码国际象棋游戏(或其子集)的状态是什么?也就是说,给定具有合法排列的棋盘的棋盘,编码该初始状态和游戏中的玩家所采取的所有后续合法移动.

答案不需要代码,只是您将使用的算法的描述.

编辑:正如其中一张海报所指出的,我没有考虑移动之间的时间间隔.随意作为一个可选的额外:)考虑到这一点:)

EDIT2:只是为了进一步说明......请记住,编码器/解码器是规则感知的.唯一真正需要存储的是玩家的选择 - 编码器/解码器可以假设其他任何东西.

EDIT3:在这里挑选一个胜利者很难:)很多很棒的答案!

cle*_*tus 132

更新:我非常喜欢这个主题,我编写了编程拼图,国际象棋位置和霍夫曼编码.如果您仔细阅读,我已经确定存储完整游戏状态的唯一方法是存储完整的移动列表.请继续阅读原因.因此,我使用问题的略微简化版本进行部分布局.

问题

此图像说明了起始国际象棋的位置.国际象棋发生在一个8x8的棋盘上,每个玩家以相同的16件棋子开始,包括8个棋子,2个骑士,2个骑士,2个主教,1个女王和1个国王,如图所示:

下棋位置http://img222.imageshack.us/img222/5970/chess.png

Positions are generally recorded as a letter for the column followed by the number for the row so White’s queen is at d1. Moves are most often stored in algebraic notation, which is unambiguous and generally only specifies the minimal information necessary. Consider this opening:

  1. e4 e5
  2. Nf3 Nc6

which translates to:

  1. White moves king’s pawn from e2 to e4 (it is the only piece that can get to e4 hence "e4");
  2. Black moves the king’s pawn from e7 to e5;
  3. White moves the knight (N) to f3;
  4. Black moves the knight to c6.

The board looks like this:

early opening http://img222.imageshack.us/img222/371/chessx.png

An important ability for any programmer is to be able to correctly and unambiguously specify the problem.

So what’s missing or ambiguous? A lot as it turns out.

Board State vs Game State

您需要确定的第一件事是您是存储游戏的状态还是棋盘上的棋子位置.简单地编码碎片的位置是一回事,但问题是"所有后续的合法动作".问题也没有说明了解到这一点的动作.这实际上是一个问题,我会解释.

易位

游戏进行如下:

  1. e4 e5
  2. Nf3 Nc6
  3. Bb5 a6
  4. Ba4 Bc5

该委员会看起来如下:

后来开放

怀特可以选择铸造.对此的部分要求是国王和相关车辆永远不会移动,因此无论是国王还是每一方的车辆都需要存放.显然,如果他们不在他们的起始位置,他们已经移动,否则需要指定.

There are several strategies that can be used for dealing with this problem.

Firstly, we could store an extra 6 bits of information (1 for each rook and king) to indicate whether that piece had moved. We could streamline this by only storing a bit for one of these six squares if the right piece happens to be in it. Alternatively we could treat each unmoved piece as another piece type so instead of 6 piece types on each side (pawn, rook, knight, bishop, queen and king) there are 8 (adding unmoved rook and unmoved king).

En Passant

Another peculiar and often-neglected rule in Chess is En Passant.

en passant http://img37.imageshack.us/img37/6535/chessa.png

The game has progressed.

  1. e4 e5
  2. Nf3 Nc6
  3. Bb5 a6
  4. Ba4 Bc5
  5. O-O b5
  6. Bb3 b4
  7. c4

Black’s pawn on b4 now has the option of moving his pawn on b4 to c3 taking the White pawn on c4. This only happens on the first opportunity meaning if Black passes on the option now he can’t take it next move. So we need to store this.

If we know the previous move we can definitely answer if En Passant is possible. Alternatively we can store whether each pawn on its 4th rank has just moved there with a double move forward. Or we can look at each possible En Passant position on the board and have a flag to indicate whether its possible or not.

Promotion

pawn promotion http://img689.imageshack.us/img689/5970/chess.png

It is White’s move. If White moves his pawn on h7 to h8 it can be promoted to any other piece (but not the king). 99% of the time it is promoted to a Queen but sometimes it isn’t, typically because that may force a stalemate when otherwise you’d win. This is written as:

  1. h8=Q

This is important in our problem because it means we can’t count on there being a fixed number of pieces on each side. It is entirely possible (but incredibly unlikely) for one side to end up with 9 queens, 10 rooks, 10 bishops or 10 knights if all 8 pawns get promoted.

Stalemate

When in a position from which you cannot win your best tactic is to try for a stalemate. The most likely variant is where you cannot make a legal move (usually because any move when put your king in check). In this case you can claim a draw. This one is easy to cater for.

The second variant is by threefold repetition. If the same board position occurs three times in a game (or will occur a third time on the next move), a draw can be claimed. The positions need not occur in any particular order (meaning it doesn’t have to the same sequence of moves repeated three times). This one greatly complicates the problem because you have to remember every previous board position. If this is a requirement of the problem the only possible solution to the problem is to store every previous move.

Lastly, there is the fifty move rule. A player can claim a draw if no pawn has moved and no piece has been taken in the previous fifty consecutive moves so we would need to store how many moves since a pawn was moved or a piece taken (the latest of the two. This requires 6 bits (0-63).

Whose Turn Is It?

Of course we also need to know whose turn it is and this is a single bit of information.

Two Problems

Because of the stalemate case, the only feasible or sensible way to store the game state is to store all the moves that led to this position. I’ll tackle that one problem. The board state problem will be simplified to this: store the current position of all pieces on the board ignoring castling, en passant, stalemate conditions and whose turn it is.

Piece layout can be broadly handled in one of two ways: by storing the contents of each square or by storing the position of each piece.

Simple Contents

There are six piece types (pawn, rook, knight, bishop, queen and king). Each piece can be White or Black so a square may contain one of 12 possible pieces or it may be empty so there are 13 possibilities. 13 can be stored in 4 bits (0-15) So the simplest solution is to store 4 bits for each square times 64 squares or 256 bits of information.

The advantage of this method is that manipulation is incredibly easy and fast. This could even be extended by adding 3 more possibilities without increasing the storage requirements: a pawn that has moved 2 spaces on the last turn, a king that hasn’t moved and a rook that hasn’t moved, which will cater for a lot of previously mentioned issues.

But we can do better.

Base 13 Encoding

It is often helpful to think of the board position as a very large number. This is often done in computer science. For example, the halting problem treats a computer program (rightly) as a large number.

The first solution treats the position as a 64 digit base 16 number but as demonstrated there is redundancy in this information (being the 3 unused possibilities per "digit") so we can reduce the number space to 64 base 13 digits. Of course this can’t be done as efficiently as base 16 can but it will save on storage requirements (and minimizing storage space is our goal).

In base 10 the number 234 is equivalent to 2 x 102 + 3 x 101 + 4 x 100.

In base 16 the number 0xA50 is equivalent to 10 x 162 + 5 x 161 + 0 x 160 = 2640 (decimal).

So we can encode our position as p0 x 1363 + p1 x 1362 + ... + p63 x 130 where pi represents the contents of square i.

2256 equals approximately 1.16e77. 1364 equals approximately 1.96e71, which requires 237 bits of storage space. That saving of a mere 7.5% comes at a cost of significantly increased manipulation costs.

Variable Base Encoding

In legal boards certain pieces can’t appear in certain squares. For example, pawns cannot occur at in the first or eighth ranks, reducing the possibilities for those squares to 11. That reduces the possible boards to 1116 x 1348 = 1.35e70 (approximately), requiring 233 bits of storage space.

Actually encoding and decoding such values to and from decimal (or binary) is a little more convoluted but it can be done reliably and is left as an exercise to the reader.

Variable Width Alphabets

The previous two methods can both be described as fixed-width alphabetic encoding. Each of the 11, 13 or 16 members of the alphabet is substituted for another value. Each "character" is the same width but the efficiency can be improved when you consider that each character is not equally likely.

摩尔斯电码

Consider Morse code (pictured above). Characters in a message are encoded as a sequence of dashes and dots. Those dashes and dots are transferred over radio (typically) with a pause between them to delimit them.

Notice how the letter E (the most common letter in English) is a single dot, the shortest possible sequence, whereas Z (the least frequent) is two dashes and two beeps.

Such a scheme can significantly reduce the size of an expected message but comes at the cost of increasing the size of a random character sequence.

It should be noted that Morse code has another inbuilt feature: dashes are as long as three dots so the above code is created with this in mind to minimize the use of dashes. Since 1s and 0s (our building blocks) don’t have this problem, it’s not a feature we need to replicate.

Lastly, there are two kinds of rests in Morse code. A short rest (the length of a dot) is used to distinguish between dots and dashes. A longer gap (the length of a dash) is used to delimit characters.

So how does this apply to our problem?

Huffman Coding

There is an algorithm for dealing with variable length codes called Huffman coding. Huffman coding creates a variable length code substitution, typically uses expected frequency of the symbols to assign shorter values to the more common symbols.

霍夫曼代码树

In the above tree, the letter E is encoded as 000 (or left-left-left) and S is 1011. It should be clear that this encoding scheme is unambiguous.

This is an important distinction from Morse code. Morse code has the character separator so it can do otherwise ambiguous substitution (eg 4 dots can be H or 2 Is) but we only have 1s and 0s so we choose an unambiguous substitution instead.

Below is a simple implementation:

private static class Node {
  private final Node left;
  private final Node right;
  private final String label;
  private final int weight;

  private Node(String label, int weight) {
    this.left = null;
    this.right = null;
    this.label = label;
    this.weight = weight;
  }

  public Node(Node left, Node right) {
    this.left = left;
    this.right = right;
    label = "";
    weight = left.weight + right.weight;
  }

  public boolean isLeaf() { return left == null && right == null; }

  public Node getLeft() { return left; }

  public Node getRight() { return right; }

  public String getLabel() { return label; }

  public int getWeight() { return weight; }
}
Run Code Online (Sandbox Code Playgroud)

with static data:

private final static List<string> COLOURS;
private final static Map<string, integer> WEIGHTS;

static {
  List<string> list = new ArrayList<string>();
  list.add("White");
  list.add("Black");
  COLOURS = Collections.unmodifiableList(list);
  Map<string, integer> map = new HashMap<string, integer>();
  for (String colour : COLOURS) {
    map.put(colour + " " + "King", 1);
    map.put(colour + " " + "Queen";, 1);
    map.put(colour + " " + "Rook", 2);
    map.put(colour + " " + "Knight", 2);
    map.put(colour + " " + "Bishop";, 2);
    map.put(colour + " " + "Pawn", 8);
  }
  map.put("Empty", 32);
  WEIGHTS = Collections.unmodifiableMap(map);
}
Run Code Online (Sandbox Code Playgroud)

and:

private static class WeightComparator implements Comparator<node> {
  @Override
  public int compare(Node o1, Node o2) {
    if (o1.getWeight() == o2.getWeight()) {
      return 0;
    } else {
      return o1.getWeight() < o2.getWeight() ? -1 : 1;
    }
  }
}

private static class PathComparator implements Comparator<string> {
  @Override
  public int compare(String o1, String o2) {
    if (o1 == null) {
      return o2 == null ? 0 : -1;
    } else if (o2 == null) {
      return 1;
    } else {
      int length1 = o1.length();
      int length2 = o2.length();
      if (length1 == length2) {
        return o1.compareTo(o2);
      } else {
        return length1 < length2 ? -1 : 1;
      }
    }
  }
}

public static void main(String args[]) {
  PriorityQueue<node> queue = new PriorityQueue<node>(WEIGHTS.size(),
      new WeightComparator());
  for (Map.Entry<string, integer> entry : WEIGHTS.entrySet()) {
    queue.add(new Node(entry.getKey(), entry.getValue()));
  }
  while (queue.size() > 1) {
    Node first = queue.poll();
    Node second = queue.poll();
    queue.add(new Node(first, second));
  }
  Map<string, node> nodes = new TreeMap<string, node>(new PathComparator());
  addLeaves(nodes, queue.peek(), &quot;&quot;);
  for (Map.Entry<string, node> entry : nodes.entrySet()) {
    System.out.printf("%s %s%n", entry.getKey(), entry.getValue().getLabel());
  }
}

public static void addLeaves(Map<string, node> nodes, Node node, String prefix) {
  if (node != null) {
    addLeaves(nodes, node.getLeft(), prefix + "0");
    addLeaves(nodes, node.getRight(), prefix + "1");
    if (node.isLeaf()) {
      nodes.put(prefix, node);
    }
  }
}
Run Code Online (Sandbox Code Playgroud)

One possible output is:

         White    Black
Empty          0 
Pawn       110      100
Rook     11111    11110
Knight   10110    10101
Bishop   10100    11100
Queen   111010   111011
King    101110   101111
Run Code Online (Sandbox Code Playgroud)

For a starting position this equates to 32 x 1 + 16 x 3 + 12 x 5 + 4 x 6 = 164 bits.

State Difference

Another possible approach is to combine the very first approach with Huffman coding. This is based on the assumption that most expected Chess boards (rather than randomly generated ones) are more likely than not to, at least in part, resemble a starting position.

So what you do is XOR the 256 bit current board position with a 256 bit starting position and then encode that (using Huffman coding or, say, some method of run length encoding). Obviously this will be very efficient to start with (64 0s probably corresponding to 64 bits) but increase in storage required as the game progresses.

Piece Position

As mentioned, another way of attacking this problem is to instead store the position of each piece a player has. This works particularly well with endgame positions where most squares will be empty (but in the Huffman coding approach empty squares only use 1 bit anyway).

Each side will have a king and 0-15 other pieces. Because of promotion the exact make up of those pieces can vary enough that you can’t assume the numbers based on the starting positions are maxima.

The logical way to divide this up is store a Position consisting of two Sides (White and Black). Each Side has:

  • A king: 6 bits for the location;
  • Has pawns: 1 (yes), 0 (no);
  • If yes, number of pawns: 3 bits (0-7+1 = 1-8);
  • If yes, the location of each pawn is encoded: 45 bits (see below);
  • Number of non-pawns: 4 bits (0-15);
  • For each piece: type (2 bits for queen, rook, knight, bishop) and location (6 bits)

As for the pawn location, the pawns can only be on 48 possible squares (not 64 like the others). As such, it is better not to waste the extra 16 values that using 6 bits per pawn would use. So if you have 8 pawns there are 488 possibilities, equalling 28,179,280,429,056. You need 45 bits to encode that many values.

That’s 105 bits per side or 210 bits total. The starting position is the worst case for this method however and it will get substantially better as you remove pieces.

It should be pointed out that there are less than 488 possibilities because the pawns can’t all be in the same square  The first has 48 possibilities, the second 47 and so on. 48 x 47 x … x 41 = 1.52e13 = 44 bits storage.

You can further improve this by eliminating the squares that are occupied by other pieces (including the other side) so you could first place the white non-pawns then the black non-pawns, then the white pawns and lastly the black pawns. On a starting position this reduces the storage requirements to 44 bits for White and 42 bits for Black.

Combined Approaches

Another possible optimization is that each of these approaches has its strength and weaknesses. You could, say, pick the best 4 and then encode a scheme selector in the first two bits and then the scheme-specific storage after that.

With the overhead that small, this will by far be the best approach.

Game State

I return to the problem of storing a game rather than a position. Because of the threefold repetition we have to store the list of moves that have occurred to this point.

Annotations

One thing you have to determine is are you simply storing a list of moves or are you annotating the game? Chess games are often annotated, for example:

  1. Bb5!! Nc4?

White’s move is marked by two exclamation points as brilliant whereas Black’s is viewed as a mistake. See Chess punctuation.

Additionally you could also need to store free text as the moves are described.

I am assuming that the moves are sufficient so there will be no annotations.

Algebraic Notation

We could simply store the the text of the move here ("e4", "Bxb5", etc). Including a terminating byte you’re looking at about 6 bytes (48 bits) per move (worst case). That’s not particularly efficient.

The second thing to try is to store the starting location (6 bits) and end location (6 bits) so 12 bits per move. That is significantly better.

Alternatively we can determine all the legal moves from the current position in a predictable and deterministic way and state which we’ve chosen. This then goes back to the variable base encoding mentioned above. White and Black have 20 possible moves each on their first move, more on the second and so on.

Conclusion

There is no absolutely right answer to this question. There are many possible approaches of which the above are just a few.

我喜欢这个和类似的问题是,它要求任何程序员重要的能力,如考虑使用模式,准确确定要求和思考角落案件.

国际象棋位置作为国际象棋职业训练师的截图.

  • 至于对骑士的推广,我曾经做过一次.真的很疯狂的情况 - 他是一个与我交配的举动,我无法阻止它.他忽略了我的棋子,因为虽然它会促进它,但是会迟到一步.当我升级为骑士并与他交配时,我希望我有相机! (9认同)
  • 好帖子.小的修正:王车易位需要4位,一个用于易位(白色和黑色,王翼和后翼)的每一个方法,因为乌鸦可能已经移动,然后搬回也.更重要的是:你应该包括它的举动.=) (5认同)
  • 然后gzip结果(如果标题不会增加结果); ^) (3认同)
  • 这是一个好的开始.你可以做得更好:) (2认同)
  • 我很惊讶你的文章没有提到[FEN] [1],它处理castling,en passant availability等等.[1] http://en.wikipedia.org/wiki/FEN (2认同)

Rob*_*ant 48

最好只以人类可读的标准格式存储国际象棋游戏.

便携式游戏符号假定一个标准的起始位置(尽管它不具有),只是列出了动作,轮流转.紧凑,人性化的标准格式.

例如

[Event "F/S Return Match"]
[Site "Belgrade, Serbia Yugoslavia|JUG"]
[Date "1992.11.04"]
[Round "29"]
[White "Fischer, Robert J."]
[Black "Spassky, Boris V."]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 {This opening is called the Ruy Lopez.} 3... a6
4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6 8. c3 O-O 9. h3 Nb8  10. d4 Nbd7
11. c4 c6 12. cxb5 axb5 13. Nc3 Bb7 14. Bg5 b4 15. Nb1 h6 16. Bh4 c5 17. dxe5
Nxe4 18. Bxe7 Qxe7 19. exd6 Qf6 20. Nbd2 Nxd6 21. Nc4 Nxc4 22. Bxc4 Nb6
23. Ne5 Rae8 24. Bxf7+ Rxf7 25. Nxf7 Rxe1+ 26. Qxe1 Kxf7 27. Qe3 Qg5 28. Qxg5
hxg5 29. b3 Ke6 30. a3 Kd6 31. axb4 cxb4 32. Ra5 Nd5 33. f3 Bc8 34. Kf2 Bf5
35. Ra7 g6 36. Ra6+ Kc5 37. Ke1 Nf4 38. g3 Nxh3 39. Kd2 Kb5 40. Rd6 Kc5 41. Ra6
Nf2 42. g4 Bd3 43. Re6 1/2-1/2
Run Code Online (Sandbox Code Playgroud)

如果你想让它变小,那就拉链吧.任务完成!

  • 在我对这2个downvotes的防御中得到了:1)它做你想要的2)它通过了http://thedailywtf.com/articles/riddle-me-an-interview.aspx测试:"......一些能够解决这些谜语的人正是你不想作为程序员的那种人.你想和那些建造水位移秤/驳船的人一起工作,乘坐出租车747到码头,然后加重重量喷气机使用它,而不是简单地首先调用波音?" 你不会聘请一个在面试中发明随机编码的人,因为他们也会在他们的代码中这样做. (23认同)
  • 这绝对是我在接受采访时对这个问题的第一个回答.您希望表明您的第一直觉是寻找现成的解决方案.如果面试官告诉你他们想要听到你自己想出的东西,那么你可以进入一个包装解决方案. (18认同)
  • @reinier:我不是说我对信息密度问题完全无能为力(你错误地认为我的回答是承认无能).当然,你想聘请编码现有数据存储标准的人,并且认识到使用适当的现有工具而不是自己动手可能是个好主意 - "我们发明了The Wheel 2.0!它现在甚至更圆!" 你绝对不想聘请那些思考的人 - 奇怪的是 - 使用图书馆功能是一种软弱的表现. (7认同)
  • 这不是练习的重点.:) (3认同)
  • 我和罗伯特在这一方面 - 现有的解决方案实用,人性可读且足够紧凑.与自定义超级解决方案相比,所有这些都是MAJOR专长,它们具有复杂的算法来解码它们.如果是关于采访我肯定也会考虑实际方面!你会惊讶于多少次真正聪明的人想出了超复杂的不切实际的解决方案.这通常归因于他们可以处理头脑中的复杂性这一事实,但那么 - 我们其他人呢...... (2认同)

Tho*_*mas 14

太棒了!

我看到大多数人都在存放每件作品的位置.如何采取更简单的方法,并存储每个方块的内容?这会自动处理促销和捕获的碎片.

它允许霍夫曼编码.实际上,棋盘上的棋子的初始频率几乎是完美的:一半的正方形是空的,剩下的正方形的一半是棋子,等等.

考虑到每件作品的频率,我在纸上构建了一张霍夫曼树,我在此不再赘述.结果,c代表颜色(白色= 0,黑色= 1):

  • 0表示空方块
  • 1c0为典当
  • 1c100 for rook
  • 1c101为骑士
  • 1c110为主教
  • 1c1110为女王
  • 1c1111为国王

对于整个董事会的初始情况,我们有

  • 空方块:32*1位= 32位
  • pawns:16*3位= 48位
  • rooks/knights/bishops:12*5 bits = 60 bits
  • 皇后/国王:4*6位= 24位

总计:初始板状态为164位.显着低于当前最高投票答案的235位.随着游戏的进行,它只会变小(除了促销之后).

我只看着棋盘上棋子的位置; 其他状态(轮到谁,谁已经阉割,通过,重复移动等)将必须单独编码.也许最多16位,因此整个游戏状态为180位.可能的优化:

  • 省去不太频繁的碎片,并分别存放它们的位置.但这无济于事......用一个空方格替换国王和王后可以节省5位,这正是你需要用另一种方式编码位置的5位.
  • "后排没有棋子"可以很容易地通过使用不同的霍夫曼表来编码后排,但我怀疑它有多大帮助.你可能仍然会得到同样的霍夫曼树.
  • "一个白色,一个黑色主教"可以通过引入没有c位的额外符号来编码,然后可以从主教所在的方格推导出.(典当晋升为主教,扰乱了这一计划......)
  • 空方块的重复可以通过引入额外的符号来进行行程编码,例如,"连续2个空方块"和"连续4个空方块".但估计这些频率并不容易,如果你弄错了,那将会伤害而不是帮助.

  • 你可以为64个方格中的每一个做一个单独的霍夫曼树,因为有些人可能比其他人更频繁地拥有一些棋子. (2认同)

Joh*_*ooy 9

非常大的查找表方法

位置 - 18个字节
合法位置的估计数量是 10 43
简单地枚举它们并且位置可以仅存储143位.需要多一位来指示下一个要播放的一侧

枚举当然不实用,但这表明至少需要144位.

移动 - 1个字节
每个位置通常有大约30-40个合法移动,但数量可能高达218个Lets枚举每个位置的所有合法移动.现在每个移动可以编码为一个字节.

我们仍有足够的空间进行特殊动作,例如0xFF来代表辞职.

  • 直接的要求的核心"你可以想到的最节省空间的方式来编码国际象棋游戏的状态" - 没有什么比压缩字典更好的了,包括苍蝇. (3认同)