 Rybka Chess Community Forum
Topic The Rybka Lounge / Chess / Using Monte Carlo Results to Choose Moves  By dalbertz Date 2009-09-16 19:27
I wanted to look at how well using a strategy of playing a large number of random games for each move candidate and then using the overall score of the games for each candidate as a means of selecting the optimal move.  I played in a tournament here recently where using this strategy led to some pretty poor results.  Two games isn’t much of a sample, though, so I thought I’d try to see if there was a way of showing that such a strategy should be expected to fail.  I think I’ve managed to get a handle on why this strategy didn’t work very well.  It’s a bit of a long post, but maybe you’ll find it interesting.

The efficacy of this Monte Carlo-type strategy depends upon two things: One, the number of random games played must be enough so that the difference between the random score and the actual score is minimal (actual score = total number of wins plus half the total number of draws in the game tree below the branch representing the chosen move).  Two, the actual score of a branch must be a valid indicator of the optimal move, i.e. that the optimal move has the best score often enough to be reliable.

We will assume that the first criterion is always satisfied, since if a random score bears no relationship to the actual score of a branch, it is unlikely to contain useful information about the game tree or the optimal move.  The second criterion is where I think we run into trouble.  I’ll try to show that the branch score is NOT a valid indicator of optimal moves no matter how many random games are sampled.

First of all, let’s settle the question of how often the optimal move should have the best score.  Suppose the optimal move scores best 90% of the time.  In that case, if you use this strategy over a period of 20 moves, there is an 88% chance that you will choose at least one sub-optimal move.  In fact, the expected number of sub-optimal moves would be about 2 out of the 20 moves.  Suppose the optimal move scores best 95% of the time.  Then there is a 64% chance that you will choose at least one sub-optimal move.  You could expect to make 1 sub-optimal move out of the 20 in that case.  At 99% accuracy, there is still an 18% chance that you’ll choose at least one sub-optimal move, but a this point the expected number of “mistakes” is close to zero for 20 moves.  In correspondence chess, it seems, two mistakes can be fatal.  One mistake, you might get by.  If the Monte Carlo strategy is to be useful, it seems that we need the branch with the best score to represent the optimal move at least 95% of the time.

So let’s look at the question of accuracy.  One thing we can show fairly easily is that it’s possible for chess’s game tree to contain configurations of branches and leaves where the best move does not have the best score. (A branch is a position in the tree where there are one or more possible moves that can be made; a leaf is a position in the tree where we’ve reached a final result – mate, stalemate, or draw.)  First we’ll show how it’s possible for a branch score to steer us wrong, and then we’ll try to get some idea of how often that might happen.

Because there are a finite number of legal chess positions, the rules of chess ensure that there is an upper limit on the depth of the game tree (maximum game length).  If there are P legal positions, then the upper limit for number of half-turns is 2P+1, because of the repetition rule. This is the absolute upper limit – the actual upper limit is less, since there are many positions which can only be entered once.  Therefore there exists some N (measured in number of half-moves) that marks the deepest level of the entire chess game tree.

Let the maximum depth of the tree be N. Let each leaf of the tree score 1 for a white win, .5 for a draw, or 0 for a black win.  The score of a branch is the sum of the scores of all the leaves on that branch divided by the number of leaves.  (Note that this produces the same score value as if a tournament were played where each game outcome occurred once.)

Assume, without loss of generality, that the player on move at depth N is white.

At depth N; there are only leaves that score either 1 or .5 (a leaf at depth N cannot score 0 because black cannot win on white’s move; black would have to have one more move and that would make the maximum tree depth N+1).

*  Monte Carlo strategy is not applicable (white chooses a leaf with score 1 if available).

At depth N-1; there are leaves that score 0 or .5 and branches that have scores ranging from .5 to 1 (white winning/drawing branches).

*  Monte Carlo strategy is again not applicable at branches with winning leaves. At any depth, if the branch contains a winning leaf, player on move just has to pick that leaf.  If the branch contains a drawing leaf (but not a winning one), the player on move should choose that leaf since a draw is the best result possible at this level.

*  Where there are only branches, a Monte Carlo strategy would give perfect results - black must choose a depth N branch that has a score of .5 in order to draw. Any other choice loses.

At depth N-2; there are leaves that score 1 or .5 and branches that contain some combination of leaves (scoring 0 or .5) and branches with scores ranging from .5 to 1.

*  For N-2 branches without leaves, white must avoid choosing an N-1 branch that contains a leaf (black will choose that move to force a win or draw).

*  For N-2 branches where the N-1 branches contain no leaves, the Monte Carlo strategy again gives valid results.  Any N-1 branch with a score greater than .5 leads to a white win.  Choosing the branch with the highest score is a winning strategy.

At depth N-3 (and greater); each branch contains some combination of leaves that score 0 or .5 and branches with scores ranging from 0 to 1.

*  For N-3 branches without leaves, black must avoid choosing an N-2 branch that contains a leaf (as with white at N-2).

*  For N-3 branches where the N-2 branches contain no leaves, the Monte Carlo strategy no longer reliably gives valid results.  To see this we need to look at how the scores of N-2 branches compare to each other.  (Recall that black wants to pick the branch with the lowest score.)  We’re looking only at N-2 branches that contain no leaves, but N-1 branches might contain leaves (scoring 0 or .5). In fact, black has to choose an N-2 branch that contains at least one zero scoring N-1 leaf to have a chance at winning.  Unfortunately, the presence of a leaf at N-1 is no guarantee that the N-2 branch will have a lower score than another N-2 branch without N-1 leaves.  The following diagram shows one way this strategy can fail:

N-3 position (black to move)
|
Branch with two sub-branches (black has two possible moves)
|
________________________________________________________
|                                                                                        |
N-2 position        <------ white to move ------>               N-2 position
(score = .8)                                                                        (score = .6)
|                                                                                        |
One sub-branch (forced move)                                       One sub-branch (forced move)
|                                                                                        |
N-1 position        <------ black to move ------>                N-1 position
(score = .8)                                                                        (score = .6)
|                                                                                        |
One leaf, one sub-branch                                             One sub-branch (forced move)
|                                                                                        |
______________________                                                                     |
|                                   |                                                                    |
leaf                     N position       <--- white to move --->          N position
(score = 0)               (score = 1)                                                       (score = .6)
Black win                           |                                                                    |
(leaf)                   four move choices                                           five move choices
(leaves)                                                        (leaves)
|                                                                   |
____________________________               _______________________________________
|            |                 |             |             |              |             |              |               |
White        White          White       White        White       Draw        Draw         Draw          Draw
Win          Win             Win          Win           Win     (score=.5) (score=.5)  (score=.5) (score=.5)
(score=1)  (score=1)    (score=1)  (score=1)   (score=1)

In this example, black is at an N-3 branch with two dependent N-2 branches (a choice of two moves).

The first N-2 branch contains one N-1 branch (white has only one possible move). This N-1 branch contains one 0-leaf (black win) and one N-branch with four 1-leaves (white wins).  If black chooses this first N-2 branch, he can force the win at turn N-1 but will lose if the alternate N-1 move is chosen.  The score for this branch is .8 (5 leaves, 1 black win, 4 white wins).

The second N-2 branch also contains just one N-1 branch but it contains only one N-branch with five leaves, one 1-leaf (white wins) and four .5-leaves (draw).  If black chooses this second N-2 branch, he will lose the game (unless white blunders and chooses one of the draws).  The score for this branch is .6 (5 leaves, 1 white win, 4 draws).

Obviously the first move is the optimal one, but it has a higher score than the losing move.  Using the Monte Carlo strategy at N-3, in this case, would fail.

So, we have a counter-example that shows it’s possible for the score of the winning branch to be worse than the score of the losing branch.  But one example doesn’t tell us much.  We need to get an idea of how likely it is that this kind of false result is returned.  In the following discussion I’m going to assume that the player with black pieces is trying to win.

General Case

Recall that the Monte Carlo method essentially runs down the game tree starting at a particular branch and gives us a random sample of the leaves below.  We don’t really care in what order the leaves of the sample are returned, so we can use combinations instead of permutations, which saves computation time.  Each combination of leaves will have a particular score (sum of leaf scores / number of leaves) which we are assuming is representative of the actual branch score.

Next, we have no idea which combinations represent an optimal branch when compared to another so we’ll have to make an assumption about that.  We’ll use the most restrictive, one that we know must be true. A branch that contains a black leaf is always optimal compared to a branch that contains no black leaves.  If the success rate in identifying the optimal branch under these circumstances is bad, it can only get worse if we have to compare branches where both have black leaves.

I wrote a program to look at the score of every possible combination of leaves for a given number of random games and then compare the scores of every combination with black leaves to every combination without black leaves.  If the score of the black-leafed branch is lower then count it as a success, if equal or greater then count it as a failure.  The ratio of successes to failures gives us the accuracy percentage.  Remember we’re shooting for 95% or better.

Another problem we have is that we don’t really know how leaves are distributed in the tree below the branches we’re examining.  We’ll try out four different assumptions and see what happens.

1.  All leaves are equally likely, i.e. same number of wins, losses and draws in the branches below. Assuming 300 random games played, the success rate was only 82%.  Expected number of sub-optimal choices made in 10 moves = 1.7, in 20 moves = 3.4, in 30 moves = 5.1, in 40 moves = 6.8.
2.   Draw is twice as likely as any other outcome, but same number of wins and loses.  Again, 300 random games give a success rate of 79%.  Expected number of sub-optimal choices in 10 moves = 2.1, in 20 moves = 4.1, in 30 moves = 6.2, in 40 moves = 8.2.
3.  Twice as many wins as draws or losses.  300 games give a success rate of 93%.  Not bad but still not up to the 95% we’d like.  Expected number of sub-optimal choices in 10 moves = .7, in 20 moves =1.3, in 30 moves = 2, in 40 moves = 2.6.
4.  Twice as many loses as draws or wins.  300 games give a success rate of 82% again.

We could try other distributions, but the pattern seems pretty clear.  Early in the middle game, when it’s reasonable to assume that the chances of winning, losing, or drawing are about equal, using the Monte Carlo strategy is likely to lead you astray 3 or more times in a game of typical length.  Later on in the game, even after you may have increased your chances of winning, you’re still likely to be led astray once or twice.  It seems pretty clear that the use of this particular Monte Carlo strategy for selecting moves is not the way to go, as we discovered to our misfortune in our recent tournament games.  By Uly Date 2009-09-16 20:58
Very interesting, though what I expected to see as a much more serious problem for Monte Carlo analysis is where you get a move which success rate is very high, say 95%, while others seem about 50%, so it's clear that the move is best and you play it, but to your surprise the opponent plays a move that required high depth to see that you didn't include in your MC, and when you run a new one you only get a 45% success rate, turning out that many of the other moves were better than what you played.

I'd have expected these holes to be critical, and the main reason you'd not like to use MC to choose moves.  By dalbertz Date 2009-09-16 22:16
I agree.  Choosing the list of candidate moves is a more serious problem.  If the optimum move is one you didn't consider, no amount mathematical analysis you do on the one's you did consider, no matter how accurate, is going to help.  The problem I see is that MC appears to fail even if you use it on ALL possible moves for your turn.

Actually, I see I may have misinterpreted what you're saying, i.e. that the MC runs might miss one of your opponent's responses that could skew the score results so that that real score of the branch is much less than the 95% you got from your MC run.  That's really a restatement of the first criteria I mentioned above.  If you don't run enough games to get a branch score that's close to the real score, then MC won't work at all.  But, again, what my results seem to indicate is that it doesn't matter how many games you run - even if you run enough so that every possible move is included in the run - the actual score of a branch still won't be indicative of the optimal move often enough to be useful. By Uly Date 2009-09-16 22:40
The problem I described not only happens at the root, it happens everywhere, the only way to avoid it is to turn it into some sort of "Brute force" mode where you eventually cover all critical choices and don't miss anything, but of course it's impractical since optimal moves are found by effective pruning, not the opposite (and it'd take unreasonable time.)

I was just saying that a MC method wouldn't work for picking the best moves even in principle, but I'm glad that it was tested at this level.

It could be used to detect e.g. fortress positions, perhaps if we used Monte Carlo in our last game we'd have avoided positions with a high drawing rate and the outcome would have been different.  By Neuromancer Date 2009-09-24 09:10
Hello,

of course pure monte carlo evaluation is not going to converge to optimal play, no matter how large your monte carlo samples. Monte Carlo Tree Search however, i.e. building a game tree from your monte-carlo playouts and using it to explore strong moves preferentially when near the root, will converge to optimal play. In chess there would still be practical problems due to the high drawing threshold, the fact that not all moves which keep a position won actually move closer to mate and the difficulty of getting a fast playout policy that gives meaningful playout results, but in principle such an algorithm will at least converge towards moves that keep won games won and drawn games at least drawn.
I suppose that MCTS methods could be productively used in chess if one were to use relatively strong playouts (supported by a shallow traditional alpha-beta searcher with some randomization) and if the values propagated back through the MCTS tree contained distance-to-win information (i.e., a playout that was won after 30 moves has to be viewed by the program as better than a playout that was won after 40), or if playouts are cut-off after a certain number of plies and an ordinary static evaluation is propagated back. One interesting experiment which one could conduct to test whether such an approach can be interesting in terms of eventually improving upon the playing strength of traditional chess programs could be to do a basic implementation of such a chess MCTS system and see how well playing strength scales with doubling of computing time. If scaling were found to be better than in pure alpha-beta search, one would know that with enough computing power an MCTS approach can be competitive.

Best regards,
Neuromancer By dalbertz Date 2009-09-24 14:08
Interesting.  I can see how using distance-to-win information from Monte Carlo analysis might be useful.  After all, chess is really a race down the game tree to see who can be the first to get to a branch with a winning leaf.  It doesn't matter how favorable or unfavorable the win/loss/draw percentages are in the remainder of the tree if you can choose a winning move before your opponent can.

It might be possible to develop a meaningful metric using MC that could compare distance-to-win probabilities and give extra weight to those moves where your probable distance-to-win is sooner than your opponent's.  It would indeed be an interesting experiment.
Topic The Rybka Lounge / Chess / Using Monte Carlo Results to Choose Moves