I wanted to look at how well a strategy works where you play a large number of random games for each candidate move and then use the overall score of those games to select the best move. I played in a tournament here recently where using this strategy led to some pretty poor results. Two games isn’t much of a sample, though, so I thought I’d try to see if there was a way of showing that such a strategy should be expected to fail. I think I’ve managed to get a handle on why this strategy didn’t work very well. It’s a bit of a long post, but maybe you’ll find it interesting.
The efficacy of this Monte Carlo-type strategy depends upon two things: One, the number of random games played must be enough that the difference between the random score and the actual score is minimal (the actual score being the total number of wins plus half the total number of draws in the game tree below the branch representing the chosen move, divided by the number of leaves). Two, the actual score of a branch must be a valid indicator of the optimal move, i.e. the optimal move must have the best score often enough to be reliable.
We will assume that the first criterion is always satisfied, since if a random score bears no relationship to the actual score of a branch, it is unlikely to contain useful information about the game tree or the optimal move. The second criterion is where I think we run into trouble. I’ll try to show that the branch score is NOT a valid indicator of optimal moves no matter how many random games are sampled.
First of all, let’s settle the question of how often the optimal move should have the best score. Suppose the optimal move scores best 90% of the time. In that case, if you use this strategy over a period of 20 moves, there is an 88% chance that you will choose at least one suboptimal move. In fact, the expected number of suboptimal moves would be about 2 out of the 20 moves. Suppose the optimal move scores best 95% of the time. Then there is a 64% chance that you will choose at least one suboptimal move. You could expect to make 1 suboptimal move out of the 20 in that case. At 99% accuracy, there is still an 18% chance that you’ll choose at least one suboptimal move, but at this point the expected number of “mistakes” is close to zero for 20 moves. In correspondence chess, it seems, two mistakes can be fatal. One mistake, you might get by. If the Monte Carlo strategy is to be useful, it seems that we need the branch with the best score to represent the optimal move at least 95% of the time.
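These figures follow from treating each move choice as an independent trial with the given per-move accuracy; a few lines of arithmetic reproduce them:

```python
# Chance of at least one suboptimal choice, and the expected number of them,
# over a sequence of independent move choices with a given per-move accuracy.
def p_any_mistake(accuracy, moves=20):
    return 1 - accuracy ** moves

def expected_mistakes(accuracy, moves=20):
    return moves * (1 - accuracy)

for acc in (0.90, 0.95, 0.99):
    print(f"{acc:.2f}: {p_any_mistake(acc):.0%} chance of a mistake, "
          f"{expected_mistakes(acc):.1f} expected mistakes")
```

This prints the 88%, 64%, and 18% figures quoted above for 20 moves.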
So let’s look at the question of accuracy. One thing we can show fairly easily is that it’s possible for chess’s game tree to contain configurations of branches and leaves where the best move does not have the best score. (A branch is a position in the tree where there are one or more possible moves that can be made; a leaf is a position in the tree where we’ve reached a final result – mate, stalemate, or draw.) First we’ll show how it’s possible for a branch score to steer us wrong, and then we’ll try to get some idea of how often that might happen.
An example of bad results:
Because there are a finite number of legal chess positions, the rules of chess ensure that there is an upper limit on the depth of the game tree (maximum game length). If there are P legal positions, then the upper limit for the number of half-turns is 2P+1, because of the repetition rule. This is the absolute upper limit – the actual limit is lower, since there are many positions which can only be entered once. Therefore there exists some N (measured in half-moves) that marks the deepest level of the entire chess game tree.
Let the maximum depth of the tree be N. Let each leaf of the tree score 1 for a white win, .5 for a draw, or 0 for a black win. The score of a branch is the sum of the scores of all the leaves on that branch divided by the number of leaves. (Note that this produces the same score value as if a tournament were played where each game outcome occurred once.)
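In other words, a branch’s score is just the mean of the leaf values below it. As a tiny illustration (my own, not part of the original argument):

```python
# Branch score = mean of leaf scores: 1 for a white win, .5 for a draw,
# 0 for a black win.
def branch_score(leaf_scores):
    return sum(leaf_scores) / len(leaf_scores)

print(branch_score([1, 1, 1, 1, 0]))      # four white wins, one black win
print(branch_score([1, .5, .5, .5, .5]))  # one white win, four draws
```

These two examples give .8 and .6 – the same two branch scores that appear in the counterexample further down.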
Assume, without loss of generality, that the player on move at depth N is white.
At depth N, there are only leaves that score either 1 or .5 (a leaf at depth N cannot score 0 because black cannot win on white’s move; black would have to have one more move, and that would make the maximum tree depth N+1).
* Monte Carlo strategy is not applicable (white chooses a leaf with score 1 if available).
At depth N-1, there are leaves that score 0 or .5 and branches that have scores ranging from .5 to 1 (white winning/drawing branches).
* Monte Carlo strategy is again not applicable at branches with winning leaves. At any depth, if the branch contains a winning leaf, player on move just has to pick that leaf. If the branch contains a drawing leaf (but not a winning one), the player on move should choose that leaf since a draw is the best result possible at this level.
* Where there are only branches, a Monte Carlo strategy would give perfect results – black must choose a depth-N branch that has a score of .5 in order to draw. Any other choice loses.
At depth N-2, there are leaves that score 1 or .5 and branches that contain some combination of leaves (scoring 0 or .5) and branches with scores ranging from .5 to 1.
* For N-2 branches without leaves, white must avoid choosing an N-1 branch that contains a leaf (black will choose that move to force a win or draw).
* For N-2 branches where the N-1 branches contain no leaves, the Monte Carlo strategy again gives valid results. Any N-1 branch with a score greater than .5 leads to a white win. Choosing the branch with the highest score is a winning strategy.
At depth N-3 (and greater), each branch contains some combination of leaves that score 0 or .5 and branches with scores ranging from 0 to 1.
* For N-3 branches without leaves, black must avoid choosing an N-2 branch that contains a leaf (as with white at N-2).
* For N-3 branches where the N-2 branches contain no leaves, the Monte Carlo strategy no longer reliably gives valid results. To see this we need to look at how the scores of N-2 branches compare to each other. (Recall that black wants to pick the branch with the lowest score.) We’re looking only at N-2 branches that contain no leaves, but N-1 branches might contain leaves (scoring 0 or .5). In fact, black has to choose an N-2 branch that contains at least one zero-scoring N-1 leaf to have a chance at winning. Unfortunately, the presence of a leaf at N-1 is no guarantee that the N-2 branch will have a lower score than another N-2 branch without N-1 leaves. The following diagram shows one way this strategy can fail:
N-3 position (black to move) – one branch with two sub-branches (black has two possible moves):

Move 1 -> N-2 position, white to move (score = .8)
          one sub-branch (forced move)
          -> N-1 position, black to move (score = .8)
             one leaf, one sub-branch:
             -> leaf: black win (score = 0)
             -> N position, white to move (score = 1)
                four move choices, all leaves:
                white win, white win, white win, white win (score = 1 each)

Move 2 -> N-2 position, white to move (score = .6)
          one sub-branch (forced move)
          -> N-1 position, black to move (score = .6)
             one sub-branch (forced move)
             -> N position, white to move (score = .6)
                five move choices, all leaves:
                white win (score = 1), draw, draw, draw, draw (score = .5 each)
In this example, black is at an N-3 branch with two dependent N-2 branches (a choice of two moves).
The first N-2 branch contains one N-1 branch (white has only one possible move). This N-1 branch contains one 0-leaf (black win) and one N-branch with four 1-leaves (white wins). If black chooses this first N-2 branch, he can force the win at turn N-1 but will lose if the alternate N-1 move is chosen. The score for this branch is .8 (5 leaves: 1 black win, 4 white wins).
The second N-2 branch also contains just one N-1 branch, but that branch contains only one N-branch with five leaves: one 1-leaf (white win) and four .5-leaves (draws). If black chooses this second N-2 branch, he will lose the game (unless white blunders and chooses one of the draws). The score for this branch is .6 (5 leaves: 1 white win, 4 draws).
Obviously the first move is the optimal one, but it has a higher score than the losing move. Using the Monte Carlo strategy at N-3, in this case, would fail.
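The counterexample is small enough to check mechanically. Here’s a quick sketch (my own illustration, not part of the argument) where a node is either a leaf score or a (player, children) pair:

```python
# Toy model of the counterexample: a node is a leaf score (1 = white win,
# .5 = draw, 0 = black win) or a ('white'|'black', children) pair.
def leaves(node):
    if isinstance(node, (int, float)):
        return [node]
    return [s for child in node[1] for s in leaves(child)]

def branch_score(node):
    ls = leaves(node)
    return sum(ls) / len(ls)

def game_value(node):
    # Value under best play: white maximizes the score, black minimizes it.
    if isinstance(node, (int, float)):
        return node
    player, children = node
    values = [game_value(c) for c in children]
    return max(values) if player == 'white' else min(values)

# First N-2 branch: white's move is forced, then black picks between a
# black-win leaf and a node where white has four winning leaves.
branch_a = ('white', [('black', [0, ('white', [1, 1, 1, 1])])])
# Second N-2 branch: forced down to a node with one white win, four draws.
branch_b = ('white', [('black', [('white', [1, .5, .5, .5, .5])])])

print(branch_score(branch_a), game_value(branch_a))  # 0.8 0 -> black wins
print(branch_score(branch_b), game_value(branch_b))  # 0.6 1 -> white wins
```

The branch black can actually win scores worse (higher) for black than the branch that loses, exactly as in the diagram.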
So, we have a counterexample that shows it’s possible for the score of the winning branch to be worse than the score of the losing branch. But one example doesn’t tell us much. We need to get an idea of how likely it is that this kind of false result is returned. In the following discussion I’m going to assume that the player with black pieces is trying to win.
General Case
Recall that the Monte Carlo method essentially runs down the game tree starting at a particular branch and gives us a random sample of the leaves below. We don’t really care in what order the leaves of the sample are returned, so we can use combinations instead of permutations, which saves computation time. Each combination of leaves will have a particular score (sum of leaf scores / number of leaves) which we are assuming is representative of the actual branch score.
Next, we have no idea which combinations represent an optimal branch when compared to another, so we’ll have to make an assumption about that. We’ll use the most restrictive assumption, one that we know must be true: a branch that contains a black leaf (a leaf scoring 0, i.e. a black win) is always optimal compared to a branch that contains no black leaves. If the success rate in identifying the optimal branch under these circumstances is bad, it can only get worse if we have to compare branches where both have black leaves.
I wrote a program to look at the score of every possible combination of leaves for a given number of random games and then compare the scores of every combination with black leaves to every combination without black leaves. If the score of the black-leafed branch is lower, count it as a success; if equal or greater, count it as a failure. The ratio of successes to total comparisons gives us the accuracy percentage. Remember we’re shooting for 95% or better.
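A simplified sketch of that kind of comparison (my reconstruction, not the original program; it treats every possible outcome combination as equally likely and uses a much smaller game count so it runs quickly):

```python
from fractions import Fraction

def success_rate(n_games):
    """Score every possible multiset of n_games outcomes, then compare each
    multiset containing a black win against each containing none.
    Success = the black-leafed multiset scores strictly lower."""
    with_black, without_black = [], []
    for wins in range(n_games + 1):
        for draws in range(n_games + 1 - wins):
            black_wins = n_games - wins - draws
            # Score = (wins + draws/2) / n_games, kept exact with Fraction.
            score = Fraction(2 * wins + draws, 2 * n_games)
            (with_black if black_wins else without_black).append(score)
    hits = sum(1 for a in with_black for b in without_black if a < b)
    return hits / (len(with_black) * len(without_black))

print(success_rate(30))  # roughly .84 under this weighting
```

Even with this crude equal-weighting assumption, the accuracy lands in the low 80s – the same neighborhood as the results below, and well short of the 95% target.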
Another problem we have is that we don’t really know how leaves are distributed in the tree below the branches we’re examining. We’ll try out four different assumptions and see what happens.
1. All leaves are equally likely, i.e. same number of wins, losses and draws in the branches below. Assuming 300 random games played, the success rate was only 82%. Expected number of suboptimal choices made in 10 moves = 1.7, in 20 moves = 3.4, in 30 moves = 5.1, in 40 moves = 6.8.
2. Draw is twice as likely as any other outcome, but the same number of wins and losses. Again, 300 random games give a success rate of 79%. Expected number of suboptimal choices in 10 moves = 2.1, in 20 moves = 4.1, in 30 moves = 6.2, in 40 moves = 8.2.
3. Twice as many wins as draws or losses. 300 games give a success rate of 93%. Not bad, but still not up to the 95% we’d like. Expected number of suboptimal choices in 10 moves = .7, in 20 moves = 1.3, in 30 moves = 2, in 40 moves = 2.6.
4. Twice as many losses as draws or wins. 300 games give a success rate of 82% again.
We could try other distributions, but the pattern seems pretty clear. Early in the middle game, when it’s reasonable to assume that the chances of winning, losing, or drawing are about equal, using the Monte Carlo strategy is likely to lead you astray 3 or more times in a game of typical length. Later on in the game, even after you may have increased your chances of winning, you’re still likely to be led astray once or twice. It seems pretty clear that the use of this particular Monte Carlo strategy for selecting moves is not the way to go, as we discovered to our misfortune in our recent tournament games.
Very interesting, though what I expected to be a much more serious problem for Monte Carlo analysis is where you get a move whose success rate is very high, say 95%, while the others are around 50%, so it seems clear that the move is best and you play it. But to your surprise the opponent plays a move that required high depth to see, one you didn't include in your MC run, and when you run a new one you only get a 45% success rate, revealing that many of the other moves were better than what you played.
I'd have expected these holes to be critical, and to be the main reason you'd not want to use MC to choose moves.
I agree. Choosing the list of candidate moves is a more serious problem. If the optimal move is one you didn't consider, no amount of mathematical analysis on the ones you did consider, no matter how accurate, is going to help. The problem I see is that MC appears to fail even if you use it on ALL possible moves for your turn.
Actually, I see I may have misinterpreted what you're saying, i.e. that the MC runs might miss one of your opponent's responses, which could skew the score results so that the real score of the branch is much less than the 95% you got from your MC run. That's really a restatement of the first criterion I mentioned above. If you don't run enough games to get a branch score that's close to the real score, then MC won't work at all. But, again, what my results seem to indicate is that it doesn't matter how many games you run – even if you run enough that every possible move is included in the run – the actual score of a branch still won't be indicative of the optimal move often enough to be useful.
The problem I described happens not only at the root but everywhere; the only way to avoid it is to turn it into some sort of "brute force" mode where you eventually cover all critical choices and don't miss anything. But of course that's impractical, since optimal moves are found by effective pruning, not the opposite (and it would take unreasonable time).
I was just saying that an MC method wouldn't work for picking the best moves even in principle, but I'm glad that it was tested at this level.
It could be used to detect e.g. fortress positions; perhaps if we had used Monte Carlo in our last game we'd have avoided positions with a high drawing rate and the outcome would have been different.
Hello,
Of course, pure Monte Carlo evaluation is not going to converge to optimal play, no matter how large your Monte Carlo samples. Monte Carlo Tree Search, however – i.e. building a game tree from your Monte Carlo playouts and using it to explore strong moves preferentially when near the root – will converge to optimal play. In chess there would still be practical problems due to the high drawing threshold, the fact that not all moves which keep a position won actually move closer to mate, and the difficulty of getting a fast playout policy that gives meaningful playout results, but in principle such an algorithm will at least converge towards moves that keep won games won and drawn games at least drawn.
I suppose that MCTS methods could be productively used in chess if one were to use relatively strong playouts (supported by a shallow traditional alpha-beta searcher with some randomization) and if the values propagated back through the MCTS tree contained distance-to-win information (i.e., a playout that was won after 30 moves has to be viewed by the program as better than a playout that was won after 40), or if playouts were cut off after a certain number of plies and an ordinary static evaluation propagated back. One interesting experiment to test whether such an approach could eventually improve upon the playing strength of traditional chess programs would be a basic implementation of such a chess MCTS system, measuring how playing strength scales with each doubling of computing time. If scaling were found to be better than in pure alpha-beta search, one would know that with enough computing power an MCTS approach can be competitive.
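To make the distinction concrete, here is a toy UCT-style MCTS sketch (my own illustration, nothing like a real chess program) run on the counterexample tree from earlier in the thread, where flat sampling favors the losing move:

```python
import math, random

# A game node is a leaf score (white's result) or a ('white'|'black',
# children) pair. The tree statistics bias playouts toward strong replies,
# which is what flat Monte Carlo sampling lacks.
class TNode:
    def __init__(self, game):
        self.game, self.children = game, None
        self.visits, self.total = 0, 0.0   # total accumulates white's score

def rollout(game):
    while not isinstance(game, (int, float)):
        game = random.choice(game[1])      # uniformly random playout
    return game

def select(parent, white_to_move, c=1.4):
    def ucb(ch):
        mean = ch.total / ch.visits
        if not white_to_move:
            mean = 1 - mean                # black wants a low white score
        return mean + c * math.sqrt(math.log(parent.visits) / ch.visits)
    return max(parent.children, key=ucb)

def iterate(root):
    path, node = [root], root
    # Walk down until we reach a leaf or a not-yet-visited node.
    while not isinstance(node.game, (int, float)) and node.visits > 0:
        if node.children is None:
            node.children = [TNode(g) for g in node.game[1]]
        fresh = [ch for ch in node.children if ch.visits == 0]
        node = random.choice(fresh) if fresh else \
               select(node, node.game[0] == 'white')
        path.append(node)
    result = node.game if isinstance(node.game, (int, float)) else rollout(node.game)
    for n in path:                         # back the playout result up
        n.visits += 1
        n.total += result

# Flat sampling scores move A at .8 and move B at .6, tempting black toward
# the losing move B; A actually wins for black with best play.
move_a = ('white', [('black', [0, ('white', [1, 1, 1, 1])])])
move_b = ('white', [('black', [('white', [1, .5, .5, .5, .5])])])
root = TNode(('black', [move_a, move_b]))

random.seed(1)
for _ in range(2000):
    iterate(root)
a, b = root.children
print(a.visits, b.visits)
```

The search quickly learns that black's reply inside A forces the win, so the visit count concentrates on A rather than on the branch with the superficially better average.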
Best regards,
Neuromancer
Interesting. I can see how using distance-to-win information from Monte Carlo analysis might be useful. After all, chess is really a race down the game tree to see who can be the first to get to a branch with a winning leaf. It doesn't matter how favorable or unfavorable the win/loss/draw percentages are in the remainder of the tree if you can choose a winning move before your opponent can.
It might be possible to develop a meaningful metric using MC that could compare distance-to-win probabilities and give extra weight to those moves where your probable distance to win is shorter than your opponent's. It would indeed be an interesting experiment.