Rybka Chess Community Forum
Up Topic The Rybka Lounge / Computer Chess / AlphaZero beats Stockfish 8 by 64-36
Parent - - By Kreuzfahrtschiff (***) [de] Date 2017-12-11 20:35
You both have never played real chess, right?
Parent - - By glennsamuel32 (*) Date 2017-12-11 20:55
Define so called "real chess" first...
Parent - By Chris Whittington (**) [fr] Date 2017-12-11 22:07
try him on 30 Qc8+ in his "operator" analysis
Parent - - By turbojuice1122 (Gold) [us] Date 2017-12-12 02:28
Maybe I'm missing something--I liked 22. Rf6, also, but why didn't Stockfish respond with 22...Nh5 ?
Parent - - By glennsamuel32 (*) Date 2017-12-12 04:22 Edited 2017-12-12 04:43
SF 8, AsmFish, Houdini and Komodo almost instantly chose 22...Nd5, with evals ranging from -0.25 to -0.75 at depth 30...

This is on my laptop...
Parent - By Chris Whittington (**) [fr] Date 2017-12-12 11:41 Edited 2017-12-12 13:03
22 .... Nh5 offers a draw by repetition

22 ... Nd5 Nxd5, Bxd5, what says Stockfish?

It's full of wild variations, but (no chess computer here) I can't find quite enough. The AZ theme tune is just to hold the bind, so I suppose Rd1, asking the Bd5 where it intends to go.
- By rocket (***) [se] Date 2017-12-12 01:48 Edited 2017-12-12 01:57
I will post it here as well in case anyone missed it: a patzer can block up the position and draw Stockfish easily as White. I am a very strong chess player, but I didn't even need that strength to show it; anyone rated 1400-1500 could play the game below.

[Event "Blitz:1'+2""]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "Adam"]
[Black "Stockfish 8 64 POPCNT"]
[Result "1/2-1/2"]
[PlyCount "248"]
[TimeControl "60+2"]

{32MB, Fritz11.ctg, DESKTOP-M7G326U} 1. Nf3 {0} d5 {4} 2. e3 {0} Nf6 {3} 3. d4
{2} e6 {2} 4. Bd3 {1} b6 {4} 5. Nbd2 {2} Bb7 {5} 6. O-O {1} Bd6 {3} 7. a3 {3}
Nbd7 {3} 8. Qe2 {2} O-O {5} 9. Rd1 {3} Ne4 {8} 10. Ba6 {10} Bxa6 {0} 11. Qxa6 {
1} Ndf6 {2} 12. Nxe4 {7} Nxe4 {5} 13. Qe2 {9} c6 {0} 14. Nd2 {25} f5 {0} 15. f3
{4} Nf6 {2} 16. c4 {4} Qc7 {3} 17. g3 {3} a5 {4} 18. b3 {3} e5 {3} 19. Bb2 {1}
e4 {6} 20. f4 {0} a4 {6} 21. b4 {1} Qd7 {2} 22. Rdc1 {5} Rab8 {4} 23. Rab1 {2}
b5 {2} 24. c5 {1} Be7 {9} 25. Kg2 {1} Kf7 {4} 26. Rh1 {2} h5 {6} 27. h4 {1} g6
{2} 28. Nf1 {3} Kg8 {8} 29. Rd1 {2} Rfe8 {3} 30. Re1 {1} Bf8 {2} 31. Rd1 {0}
Bh6 {3} 32. Nh2 {2} Rf8 {3} 33. Rhg1 {1} Bg7 {4} 34. Rh1 {0} Rbe8 {5} 35. Rhg1
{0} Qa7 {6} 36. Rh1 {1} Kh7 {2} 37. Rhg1 {0} Rh8 {2} 38. Rh1 {0} Re6 {3} 39.
Rc1 {0} Rb8 {2} 40. Rcd1 {0} Re7 {2} 41. Rc1 {0} Ree8 {2} 42. Rcd1 {0} Bf8 {2}
43. Rc1 {1} Qd7 {2} 44. Rcd1 {0} Re7 {3} 45. Rc1 {0} Rd8 {2} 46. Rcd1 {0} Qb7 {
2} 47. Rc1 {0} Rc7 {2} 48. Rcd1 {0} Bg7 {2} 49. Rc1 {1} Ra8 {2} 50. Rcd1 {0}
Ne8 {3} 51. Rhg1 {1} Bf6 {2} 52. Rh1 {1} Be7 {2} 53. Rc1 {2} Kg8 {2} 54. Rcd1 {
0} Nf6 {1} 55. Rc1 {0} Rf8 {2} 56. Rcd1 {0} Rd8 {3} 57. Rc1 {0} Kh7 {1} 58.
Rcd1 {0} Rcc8 {2} 59. Rc1 {0} Rh8 {2} 60. Rcd1 {0} Bd8 {2} 61. Rc1 {0} Bc7 {1}
62. Rcd1 {2} Rhf8 {2} 63. Rc1 {0} Ra8 {5} 64. Rcd1 {0} Kg8 {2} 65. Rc1 {0} Rfd8
{2} 66. Rcd1 {1} Rd7 {2} 67. Rc1 {1} Bd8 {2} 68. Rcd1 {1} Ra7 {2} 69. Rc1 {0}
Qc8 {1} 70. Rcd1 {0} Rf7 {3} 71. Rc1 {0} Ra8 {2} 72. Rcd1 {1} Ng4 {2} 73. Rc1 {
15} Nxh2 {2} 74. Rxh2 {1} Bf6 {2} 75. Rch1 {2} Raa7 {2} 76. Kf1 {1} Qd7 {2} 77.
Kf2 {0} Qe7 {2} 78. Kf1 {3} Rb7 {2} 79. Kf2 {0} Qc7 {2} 80. Kf1 {0} Qd8 {2} 81.
Kf2 {0} Rbd7 {2} 82. Kf1 {0} Rh7 {2} 83. Kf2 {0} Rdf7 {2} 84. Ke1 {1} Qe7 {2}
85. Kd2 {1} Qd7 {2} 86. Ke1 {1} Qe6 {1} 87. Kd2 {1} Rd7 {2} 88. Ke1 {1} Qe7 {1}
89. Kf2 {1} Bg7 {1} 90. Kf1 {1} Bh6 {2} 91. Ke1 {2} Ra7 {2} 92. Kd2 {0} Bg7 {2}
93. Ke1 {1} Bf6 {1} 94. Kd2 {1} Rf7 {2} 95. Ke1 {1} Rg7 {1} 96. Kd2 {1} Qd8 {3}
97. Ke1 {1} Kh7 {2} 98. Kd2 {1} Rg8 {3} 99. Ke1 {1} Re7 {1} 100. Kd2 {1} Qc7 {3
} 101. Ke1 {1} Rb8 {2} 102. Kd2 {1} Ree8 {2} 103. Ke1 {1} Kg7 {1} 104. Kd2 {1}
Rg8 {2} 105. Ke1 {0} Kf7 {0} 106. Kd2 {1} Qd7 {8} 107. Ke1 {1} Qe7 {1} 108. Kd2
{1} Qc7 {2} 109. Ke1 {1} Kg7 {0} 110. Kd2 {2} Qf7 {3} 111. Ke1 {1} Qe7 {5} 112.
Kd2 {1} Rbe8 {2} 113. Ke1 {1} Rgf8 {3} 114. Kd2 {1} Qc7 {2} 115. Ke1 {1} Kf7 {2
} 116. Kd2 {2} Rg8 {2} 117. Ke1 {1} Rg7 {2} 118. Kd2 {1} Kg8 {1} 119. Ke1 {2}
Rge7 {2} 120. Kd2 {1} Qd7 {2} 121. Ke1 {1} Bg7 {2} 122. Kd2 {1} Bh6 {2} 123.
Ke1 {1} Rf7 {2} 124. Kd2 {1} Ree7 {2} 1/2-1/2
- - By rocket (***) [se] Date 2017-12-12 14:44
There are some questions I want resolved: what was the maximum depth reached by AlphaZero, and how close to perfect chess does it actually play? Caruana mentioned that AlphaZero did not gain much in strength during the last section of the training stage...

I would be very surprised if Stockfish were close to perfect chess, but its results with White seem to suggest it is, since it only lost 3 out of 50 games.
Parent - - By zaarcis (***) [lv] Date 2017-12-12 15:36
The playouts (simulated games) are played until they end. The "maximum depth" could then easily be 400 plies.
(The statistics from all such simulations are collected and used to choose a move.)

If I said something wrong, please correct me. Thanks.
Parent - - By Lazy Frank (****) [gb] Date 2017-12-12 17:28
Really? 400 plies, 200 moves? :roll:
I'm very pessimistic ... Minimax engines may be fully blind after ~25 moves.
Alpha Zero - I guess around 35-40 moves (70-80 plies).
Parent - - By zaarcis (***) [lv] Date 2017-12-12 19:33 Edited 2017-12-12 19:39
The answer is weird because the question is, too.

AlphaZero uses a variation of Monte Carlo Tree Search (MCTS) - each playout (simulated game) continues until the end.
200 moves could happen easily - for example, just reach some complex endgame (say, exchange vs. more pawns), play it for some 50 moves and then continue into a KQPkq ending, even better if there are still some additional pawns. :) (Of course, it depends on how fast resignation would happen in this simulated game; maybe there's some maximum length, too.) And then the next simulated game ends after 50 moves. Etc.

MCTS and search depth (as we are used to it) don't go together, imo. There is still a visited-position or node count, and the number of playouts can be measured too.
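To make that point concrete, here is a toy Python sketch (illustrative only, not engine code): when every playout runs to a terminal state, individual playout lengths vary wildly, so the well-defined quantities are playout and node counts, not a single search depth.

```python
import random

random.seed(1)  # deterministic toy run

# Toy stand-in for one playout: the simulated game ends with ~1% probability
# per ply, so the *expected* length is ~100 plies, but any one playout can be
# far shorter or far longer.
def playout_length():
    plies = 0
    while random.random() > 0.01:
        plies += 1
    return plies

lengths = [playout_length() for _ in range(1000)]

# There is no single "depth" to report -- lengths are all over the place.
# What MCTS engines report instead is the playout (and node) count.
print("playouts:", len(lengths))
print("shortest/longest playout:", min(lengths), "/", max(lengths))
```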
Parent - - By zaarcis (***) [lv] Date 2017-12-12 19:47 Edited 2017-12-12 19:54
It seems that for programs that use MCTS, the number of playouts is the main thing that is measured.
Parent - - By Chris Whittington (**) [fr] Date 2017-12-12 20:01
are you sure the AZ search is MCTS?
Parent - - By zaarcis (***) [lv] Date 2017-12-12 23:07
Yes, it's in the AlphaZero paper. Search for MCTS and you will get many results in that pdf.

All the Alphas (AlphaGo, the Alpha that was in the middle (I forget the name) and now AlphaZero) use a variation of MCTS.
Parent - By Chris Whittington (**) [fr] Date 2017-12-13 00:34
Yes, I read it several times - there are explanation "gaps", unsurprisingly for a first paper.

They don't do full rollouts, they use the NN at some limiting depth (I don't understand how that depth is determined), and they don't select random rollout paths; the NN score is used to select the moves (or maybe I am extrapolating that unsoundly), together with some clever rules extracted from the tree so far (I may be extrapolating that too). In that sense it's a depth-limited guided search with in-game evaluations, not backpropagating by minimax but averaging as per Monte Carlo. Well, that's my alpha-beta old-paradigm way of trying to wrap my head around it.

If the search is being depth limited, that suggests iterating is viable, and some sort of time control possible?
Parent - - By Sesse (****) [no] Date 2017-12-12 23:58
AlphaZero's MCTS uses the neural network to evaluate the MCTS nodes, not playouts.
Parent - - By zaarcis (***) [lv] Date 2017-12-13 00:23
Yes, and playouts end in some result (win, draw or loss, in this case), which is used for evaluating the nodes.
Parent - - By zaarcis (***) [lv] Date 2017-12-13 00:36
Sleepy, scratching my head, but here's a quote from the article.

> AlphaZero learns these move probabilities and value estimates entirely from self-play; these are then used to guide its search.
> Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general-purpose Monte-Carlo tree search (MCTS) algorithm.  Each search consists of a series of simulated games of self-play that traverse a tree from root $s_root$ to leaf. Each simulation proceeds by selecting in each state $s$ a move $a$ with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected $a$ from $s$) according to the current neural network $f_θ$. The search returns a vector $π$ representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state.

I understood it like this: AlphaZero has knowledge (a probability distribution over moves before the search) and uses it for intelligent simulated games, getting an updated probability distribution that is better.

And the leaves, I believe, are reasonable ends of those simulated games.
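The selection rule in that quote - prefer moves with a low visit count, a high prior probability and a high averaged value - is conventionally written as a PUCT-style score. The sketch below is a generic version with an invented exploration constant, not DeepMind's exact formula:

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing Q + U: Q is the mean backed-up value,
    U favours a high prior and a low visit count."""
    total_visits = sum(ch["visits"] for ch in children)

    def score(ch):
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        u = c_puct * ch["prior"] * math.sqrt(total_visits) / (1 + ch["visits"])
        return q + u

    return max(children, key=score)

# A barely explored move with a high prior can outrank a well-visited one:
children = [
    {"prior": 0.6, "visits": 10, "value_sum": 5.0},  # Q = 0.50, well explored
    {"prior": 0.3, "visits": 1,  "value_sum": 0.4},  # Q = 0.40, barely explored
]
best = puct_select(children)
print(best)  # the barely explored child wins on its exploration bonus
```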
Parent - By Chris Whittington (**) [fr] Date 2017-12-13 01:09
likewise scratching head here ...
so, they say it runs at 80K nodes per second, and the games were played at a minute per move, so that's about 5 million nodes per move. Say a full game playout is 100 half-moves, then each one-minute thought period looks at 50,000 game playouts. Not sure if branching factor is a realistic concept for MCTS, but that implies a ridiculously low figure. Well, either ridiculous or utterly brilliant.
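That back-of-envelope arithmetic checks out; here it is written down (the 100-plies-per-playout figure is the assumption above, not from the paper):

```python
# Figures from the discussion: ~80K nodes/s, one minute per move,
# and an assumed ~100 half-moves per full playout.
nodes_per_second = 80_000
seconds_per_move = 60
plies_per_playout = 100

nodes_per_move = nodes_per_second * seconds_per_move
playouts_per_move = nodes_per_move // plies_per_playout

print(nodes_per_move)     # 4800000 -- "about 5 million nodes per move"
print(playouts_per_move)  # 48000   -- roughly the 50,000 playouts quoted
```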

It's clear that the learning cycles require full rollouts, but, is it so clear that the game play search engine does too?
Parent - - By Chris Whittington (**) [fr] Date 2017-12-13 00:37
they are playing each rollout to game end? that implies the NN probability evaluation is only for search guidance/move selection
Parent - - By zaarcis (***) [lv] Date 2017-12-13 09:12
As I understand - yes.
I didn't find anything about winning probability (from NN) before all the MCTS.
Parent - - By Chris Whittington (**) [fr] Date 2017-12-13 11:40
but why rollout the search in full, when you have a probability engine?

limiting the depth gains width, albeit at the cost of the absolute accuracy of 0,1,1/2, but is more and more doable with a well trained engine, surely? Maybe I am just not getting something.
Parent - - By zaarcis (***) [lv] Date 2017-12-13 12:07 Edited 2017-12-13 12:10
[As far as I understand, DeepMind just wanted a proof of concept and has no actual interest in further experiments and optimisations of chess engines (or shogi, or baduk). They are interested in machine learning in general. Bad for chess, good for everyone else. :D]
Rollouts are necessary because the current neural net doesn't return probability of win/draw/loss. Win percentage etc. are calculated using MCTS, which is guided by the neural net.

Of course, one could make some more advanced AlphaZero, which in addition to move probabilities would return win/draw/loss probabilities, too. And then use it to stop playouts faster - for example, if win probability is 95%, then it is won for this player, or if draw probability is 95%, then let's return that the position is drawn and stop current simulation. Probably Deepmind didn't find such approach necessary, that's all. And maybe it complicates things too much, not sure.

(I should get a free week and read all those DeepMind papers, and understand them at a good enough level...)
Parent - - By Chris Whittington (**) [fr] Date 2017-12-13 12:36
the paper says the network produces, from the board position, a scalar value v estimating the expected outcome (I'm guessing this is some sort of probability function between -1 and +1, matching win, loss, draw).

Are we just getting confused by something simple? I agree MCTS full rollouts to win/loss/draw are used in the training game phase. The paper describes it.
But, when the NN is trained (or part trained), and is playing tournaments, it is possible the algorithm is different, in that it may be depth limited and using the scalar value, v. The paper is silent on this. So I am guessing.

You may well be right (eg full rollouts all the time), because DeepMind philosophy is to have a generalised engine, and just how much depth limitation to use would be game dependent. But, I find it unclear. Hopefully their follow up paper will reveal ....
Parent - - By zaarcis (***) [lv] Date 2017-12-13 13:07
Nice! :) I missed that part. (Honestly, that paper makes my head hurt. I'm not used to reading research publications.)
Thank you for this paired reading and for increasing my understanding.

> The parameters θ of the deep neural network in AlphaZero are trained by self-play reinforcement learning, starting from randomly initialised parameters θ. Games are played by selecting moves for both players by MCTS, $a_t$ ∼ $π_t$. At the end of the game, the terminal position $s_T$ is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win. The neural network parameters θ are updated so as to minimise the error between the predicted outcome $v_t$ and the game outcome z, and to maximise the similarity of the policy vector $p_t$ to the search probabilities $π_t$.

It's trained, therefore probably used, too.
I believe now that it's used here, probably including some pruning (ending of playout) if 95% or similar probability is reached:

> Each simulation proceeds by selecting in each state $s$ a move $a$ with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected $a$ from $s$) according to the current neural network $f_θ$.

No idea how that's combined with move probabilities. The paper is too silent sometimes, I agree.
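The quoted training objective translates directly into a loss function: squared error between the predicted outcome v and the game outcome z, plus a cross-entropy term pulling the policy p towards the search probabilities π (the paper also adds L2 weight regularisation). A minimal sketch with invented example numbers:

```python
import math

def alphazero_loss(z, v, pi, p, l2_term=0.0):
    """Sketch of the quoted objective: (z - v)^2 plus the cross-entropy
    between the search probabilities pi and the network policy p, plus
    optional L2 regularisation."""
    value_loss = (z - v) ** 2
    policy_loss = -sum(pi_a * math.log(p_a) for pi_a, p_a in zip(pi, p))
    return value_loss + policy_loss + l2_term

# Example: a drawn game (z = 0) where the net predicted v = 0.3, the search
# settled on move probabilities 0.7/0.3, and the raw policy said 0.6/0.4.
loss = alphazero_loss(z=0.0, v=0.3, pi=[0.7, 0.3], p=[0.6, 0.4])
print(loss)
```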
Parent - - By Chris Whittington (**) [fr] Date 2017-12-13 14:24 Upvotes 1
I find I need some back-and-forth discussion to get to grips with this, so thanks to you and Sesse likewise.

If one considers that AZ always went to full rollouts, then ALL the intelligence would be in the search guidance, and NONE in the evaluation. So it would be the search guide alone that generated the playing style. I'm struggling a bit with the idea here, but I think this means AZ can't be said to have a global search style, because it then depends on the opponent's moves. Its style may be the NEGATION of the opponent's style. When AZ plays Stockfish, it plays as Not-Stockfish; when it plays an old-style materialist program, it plays as Not-materialist.

Come to think of it, even if AZ is using its learnt values, the same applies. It's a Not-Engine. I will ponder this point ....
Parent - - By zaarcis (***) [lv] Date 2017-12-13 19:53
We haven't seen how AZ plays against different opponents, sadly. :(
And we won't know until we make our own AlphaZero.

But AlphaGo (the baduk engine) had a very different playing style against different opponents - and, I believe, that is simply a consequence of playing very good moves (possibly the best ones). No style, just the best possible moves.
Parent - - By Chris Whittington (**) [fr] Date 2017-12-13 22:01 Upvotes 1
it would be very informative to see AZ self play games, and then run them past a Stockfish analysis. Maybe one day. I didn't know the opponent dependent play style of AlphaGo. Interesting.

Been thinking about MC and AZ all day, and all yesterday. Ok, I figured as soon as seeing the games a few days ago, that AZ was maximising on some sort of energy/opportunity/potential/mobility/cramping function and really was not too bothered, apparently, about material, or not in the sense that the old paradigm programs were bound by their necessarily mobility heavy evaluation functions. AZ is patient, it builds energy/options/potential/threats into its positions, and it waits, it doesn't actually execute this potential energy, it holds it. Patience, unless and until it's time to execute. Strong players can't execute flashy tactics against Stockfish, it sees too deep. Nor can AZ. What AZ does, is build, build, build, hold the advantages, and the opponent is slowly overwhelmed. Two adages from very strong players "stop looking all the time for tactics, just play positional, the tactics will come themselves" and "the threat is stronger than its execution".

So, I concluded that AZ's neural net evaluation was the Energy maximiser, and the search just found the lines that the evaluation wanted it to find to maintain the energy. Such a powerful evaluation that even at 1000 times slower it defeated Stockfish's deep search, without even going as deep, nor looking at as many nodes. I got this the wrong way round, enough to be completely wrong. Let me try to explain ....

ok, I spent some time looking at Game 3, earlier stages. I think Glennsamuel32 proposed 22...Nd5, and other people were suggesting other moves. So, I was looking at these moves, trying replies that the normal programs were rejecting, trying little threat manoeuvres, trying flashy moves, and I kept finding many, many lines of fireworks; if only a piece were on a different square, or one piece were not blocking the way, or white had a tempo, or whatever, they'd have worked, spectacularly, but they didn't, just: black always had a resource, one move that refuted. What worked for white was not a flashy move, but just to continue developing, just hold the advantage.

An example of this, after 22...Nd5, was to try Rxc6, giving up the exchange. If Black takes with the Q, then fireworks, white seems winning. But black takes with the pawn and game over: white is the exchange down and can't continue the attack.
Now, what does an alpha-beta search make of this situation? It doesn't see anything special statically, it's all too complex, but the search refutes Rxc6 with dxc6, cuts away Rxc6 and thinks no more of the situation. And MCTS? Well, it sees Rxc6 dxc6, good for black, but it also sees Rxc6 Qxc6 and a bunch of white winning playout lines, good for white. Then it averages. So the fact that there are lots of white winning playouts arising gets averaged into the backpropagated score. Alpha-beta says "refuted, end of story"; MCTS says "hmmmm, lots of possibilities for white in this position". MCTS plays towards positions with potential, and tries to maintain potential, even though it doesn't actually execute, because the potential is not yet viably resolvable. Which means that the "energy-maximiser" is in the MCTS search, not the evaluation function. Isn't that interesting!? And completely unlike minimax alpha-beta.
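The Rxc6 point can be shown with a toy calculation (the scores and the 60/40 playout split are invented for illustration): minimax backs up only the best defence, while Monte-Carlo averaging lets the winning Qxc6 playouts leak into the backed-up score.

```python
# After the hypothetical Rxc6 exchange sacrifice, Black has two replies
# (made-up values from White's point of view):
replies = {
    "dxc6": -1.0,   # the refutation: attack fizzles, White is the exchange down
    "Qxc6": +0.9,   # the fireworks lines: White seems winning
}

# Alpha-beta / minimax assumes Black finds the best defence:
minimax_value = min(replies.values())   # "refuted, end of story"

# MCTS-style averaging over playouts: suppose 60% of simulations tried dxc6
# and 40% tried Qxc6 -- the winning lines inflate the average.
mcts_value = 0.6 * replies["dxc6"] + 0.4 * replies["Qxc6"]

print(minimax_value)          # -1.0
print(round(mcts_value, 2))   # -0.24: "lots of possibilities for White here"
```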
Parent - By zaarcis (***) [lv] Date 2017-12-13 22:48 Edited 2017-12-13 22:59

> Choi Jung about the gender of Alphago: Alphago plays neither male nor female, but neutral. "Male" Baduk is profound and soft. Female Baduk is aggressive and full of fighting spirit. Alphago is very strong in both aspects. The games touched me very much. It has played a "natural" Baduk, as I always dreamed. An absolute harmony, as Go Seigen has always emphasized. At Alphago I had the feeling that I had when I first saw Takemiyas games. If I had tried something new, the people laughed at me. That intimidated me. Now comes Alphago, playing one strange move after the other and still winning.

(That "gender" is probably related to concepts of yin/yang.)

Also a nice quote from the same place that one could relate to AlphaZero (the chess engine), too:

> Concluding words of Park Young Hun 9p : "Alphago has broken some stereotypes, stereotypes of human baduk. We often thought, 'But this move, it's strange, you can not play like this!' Now we have the chance to try bravely and self-consciously such moves which once seemed forbidden. Man has no more fetters. Alphago has given us freedom.

I mean, comparison to Tal and similar.

I had read other impressions about AlphaGo (Master and others), but I can't find them now, as that was too long ago. I believe that when someone gets closer to perfection (at least in board games :D) it loses its "style".

One can compare AlphaGo's games against humans (for example, all the Master games) - I believe it can be said that AlphaGo confidently outplayed humans with relatively peaceful, confident moves.
And then one can see all those 50 (if I remember correctly) published AlphaGo self-play games - they were aggressive fighting craziness from outer space etc., because now AlphaGo had an equal opponent.
Parent - - By zaarcis (***) [lv] Date 2017-12-13 23:18
I would call it "positional advantage maximiser". :) AlphaZero (and previous versions, related to baduk, too) had incredible positional intuition, therefore my choice.
Parent - By Chris Whittington (**) [fr] Date 2017-12-14 11:43 Edited 2017-12-14 12:53 Upvotes 1
I think it is important to analyse the effects of the MCTS search compared to alpha-beta minimax.

Consider the chess tree as a mass of minefield, densely sown in some parts, lightly sown in others. AB is very good at picking out a path through the mines and coming out the other side unscathed. If a move in the tree steps on a mine, AB says "not going there" and takes another path. AB neither knows, nor cares, how dense the mine field is around his pathway, just as long as he can pick a way through. AB is a mine detector, but he doesn't map the minefields, he maps the pathway. AB assumes his opponent wants to do the same (avoid mines).

If you like, you can see the tree also as a deep, dark forest (Tal), and AB is quite happy just as long as there is a path through and out the other side. Again, AB neither knows nor cares about the dangers lurking in the forest all around him, just as long as he finds the pathway.

Compare MCTS. MCTS sends a robot straight (well, it is guided by the MCTS neural net) into the minefields, taking "AB robot" with him. They both emerge at the other side. MCTS might be dead, AB might be dead or they both might survive.
MCTS sends another robot pair in, and again, and again, and again. Each "path" is again guided by the neural net, combined with being in the general region of prior paths that tended to explode AB and leave MCTS unscathed, plus a bit of exploration variance.

So, think about this now, MCTS is building a memory map of part of the tree which appears to contain a high density of AB anti-personnel mines. And MCTS is going to play the move which leads to this part of the tree (or stays in it, if already there; or moves to even higher density anti-AB mine regions).

MCTS steers a path into the tree containing high density anti-AB mines, and conversely low-density anti-MCTS mines.
MCTS knows generally about, and tries to increase the general direction into dangerous regions. It has area knowledge.
AB has no area knowledge at all. It just knows how to find a path.

MCTS is building potential, creating risk for opponent.
AB relies on the tactical path finder.

Tal "You must take your opponent into a deep, dark forest where 2+2=5 and the path leading out is only wide enough for one."

And MCTS is playing Tal-style via search. I remain amazed; I thought this was going to be done, if ever, by evaluation. Ok, it is true that the NN has now encoded this "move towards the deep dark forest" into its evaluation function as well, so AlphaZero has a double-whammy Tal operation.

Highlights of five games, here's the proof of the MCTS in action ...


Bye-bye "chess is a draw" predictions, for the time being. Must see AZ self-play games  .....
Parent - - By Chris Whittington (**) [fr] Date 2017-12-13 16:14
Rechecked the paper. I think the text below confirms AZ is depth-limiting at leaf nodes and evaluating, not rolling out to completion (in game-play mode). How it decides to stop expansion and cut is unclear - well, the paper is silent. Maybe just an extension of the way it chooses which child to expand next? It can decide on no children at all, presumably.

> AlphaZero evaluates positions using non-linear function approximation based on a deep neural network, rather than the linear function approximation used in typical chess programs. This provides a much more powerful representation, but may also introduce spurious approximation errors. MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree.
Parent - - By zaarcis (***) [lv] Date 2017-12-13 17:56
Maybe they thought that it's irrelevant detail (probably it's so, as it can be done in multiple ways) or maybe they left it out by mistake.

I would do it in such way:
1) if side A believes that it wins with probability >95%, and at the next ply the other side agrees (that it loses with probability >95%), then the playout ends with the obvious result;
2) if both sides agree that it's a draw (in the same way), then the playout ends with a draw;
3) if the playout is too long (one has to specify what exactly that means), it ends with a draw; maybe that's delegated to the 50-move rule, which is included as a rule of the chess game (according to the paper).
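
Those three rules can be sketched as a small adjudication function (the 95% threshold and the 512-ply cap are, as said, guesses, not anything from the paper):

```python
def adjudicate(p_win, opp_p_loss, p_draw, opp_p_draw, plies, max_plies=512):
    """Decide whether to stop a playout early.

    p_win / p_draw: the side to move's estimated probabilities;
    opp_p_loss / opp_p_draw: the opponent's estimates one ply later.
    Returns "win", "draw", or None (keep playing).
    """
    if p_win > 0.95 and opp_p_loss > 0.95:    # rule 1: both sides agree on the result
        return "win"
    if p_draw > 0.95 and opp_p_draw > 0.95:   # rule 2: both sides agree it's a draw
        return "draw"
    if plies >= max_plies:                    # rule 3: playout too long, call it a draw
        return "draw"
    return None
```

Requiring agreement across a ply means one side's misjudged evaluation can't end the playout on its own.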

Sadly it can't be deduced from the paper, imho. Even from the quoted fragment.
Sitting on my thumbs and waiting for the promised next, more serious paper. :|
Parent - - By Chris Whittington (**) [fr] Date 2017-12-13 18:38 Edited 2017-12-13 18:43
I thought that instead of keeping a win count and a visit count at each node (as in the typical MCTS description), it would keep a visit count and a sum of backpropagated values.

so, normal MCTS, as per below, expands node 3/3, generates child, the child rollout game is a loss, so 3/3 becomes 3/4, backs up 5/7, 7/11 and 12/22 to the root.

but, instead of win count, we just add in the decimal score from the network (say 0.25 for example, a losing score), then we get 3/3 becomes 3.25/4, 5.25/8, 7.25/11 and 12.25/22.

so we can cut anywhere we want, and we can return the actual decimal probability score, we don't have to convert it to win/loss.

So, as you wrote, what remains is the cut-decision algorithm, which, in keeping with the game-independent DeepMind rule, can't contain chess concepts.
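The bookkeeping described above is a one-line change to standard MCTS backpropagation: add the network's decimal score to a value sum instead of incrementing a win counter. A sketch (illustrative, not from the paper):

```python
class Node:
    """MCTS node that stores a visit count and a sum of backed-up values."""
    def __init__(self):
        self.visits = 0
        self.value_sum = 0.0

    @property
    def q(self):
        # Mean backed-up value -- already a decimal, no win/loss conversion.
        return self.value_sum / self.visits if self.visits else 0.0

def backpropagate(path, leaf_value):
    """Back up a decimal network score (e.g. 0.25, a losing-ish value)
    along the path from leaf to root."""
    for node in path:
        node.visits += 1
        node.value_sum += leaf_value

# The example above: a node at 3 wins / 3 visits becomes 3.25 / 4 after
# backing up a network score of 0.25 instead of a hard 0-or-1 rollout result.
node = Node()
node.visits, node.value_sum = 3, 3.0
backpropagate([node], 0.25)
print(node.visits, node.value_sum)  # 4 3.25
```

Because the stored quantity is already an averageable decimal, the search can cut at any node and return `q` directly.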
Parent - - By zaarcis (***) [lv] Date 2017-12-13 22:57
Sounds like, at least, an interesting idea to think about.

I don't know whether DeepMind simply didn't care (because the usual MCTS worked perfectly for them, I believe), or maybe they knew some disadvantages of such an approach, or maybe it was even tried already and found unacceptable... I will try to read/skim the relevant literature to see if someone hasn't already had this idea.

(Reinventing the bicycle is fun, but it's better to check if someone hasn't already done it. I had a fun experience when I dreamed up an approach that was basically Temporal Difference Learning... :) and sadly didn't know the right search terms and wasted many hours until I found out what it is called, what has been achieved, etc.)
Parent - By zaarcis (***) [lv] Date 2017-12-17 18:46
Maybe I was too lazy or unskilled, but I didn't find anything about such a variation of MCTS. (Maybe it's because the standard MCTS works wonderfully.)
But it should be doable and, I expect, would play well enough.
Parent - By Lazy Frank (****) [gb] Date 2017-12-12 17:38
You must understand chess better!

If Alpha Zero is not able to find a winning probability within 120 plies, then 99% this game is a draw against another Alpha Zero player.
Defensive resources (outside of forced, lost positions) are always greater than attacking resources. That is why chess actually is a draw despite the first-move advantage.
Parent - - By zaarcis (***) [lv] Date 2017-12-12 15:39
According to this logic, AlphaZero then is much closer to perfect chess, as it won 50% of its White games. :)
Parent - - By rocket (***) [se] Date 2017-12-12 17:15
We don't know whether it plays perfect chess. Maybe it does.
Parent - By zaarcis (***) [lv] Date 2017-12-12 19:35
If Stockfish can't get the same result with White as AlphaZero does, it surely isn't close to perfect play.
- By rocket (***) [se] Date 2017-12-12 17:52
One problem with playing Stockfish, or any other PC program, with the opening book off is that it will repeat the same two or three lines ad infinitum - whatever it deems best against the first move.

So Stockfish is relegated to playing the French Defence in every game against 1.e4, because it thinks it's best.
- - By glennsamuel32 (*) Date 2017-12-15 02:49
Well, maybe there was some good to come out of this whole AZ experience.

The Komodo team has agreed to work on it, as soon as finances permit.

Developers have already started projects too.

And Nvidia has the hardware covered with the Titan V :grin:

AZ has shaken the chess world to action and I foresee only better things ahead.
Parent - By zaarcis (***) [lv] Date 2017-12-15 10:08
Also, from

> I'd be enormously interested in it - but I'm extremely busy with this project right here already! So it seems likely someone in the chess community will have something before we finish our run.
> I told the Stockfish people they can use the training, OpenCL etc code from this project but we'll see what they end up doing.

That's gcp of the LeelaZero project. "Sadly" he's busy with baduk. :)
Parent - - By MarshallArts (***) [us] Date 2017-12-15 23:11

> The Komodo team has agreed to work on it, as soon as finances permit.

Do you have an exact quote that shows that?

Parent - By glennsamuel32 (*) Date 2017-12-16 02:57 Edited 2017-12-16 03:22 Upvotes 1

> Do you have an exact quote that shows that?

"Larry and I often discuss Monte Carlo Tree Search, and are interested in trying this. We have also discussed uses for neural networks. Small nns could be useful in present PCs, but the massive nn used in AlphaGo Zero is currently beyond what we, and most chess engine users can afford."

"We listen, and try to add what we think people want. But we do not have endless resources. We can afford to buy roughly 1 new server each year."

If they can manage a server, they could manage to get a Titan V with 640 Tensor cores :grin:
- - By gsgs (***) [de] Date 2018-01-09 01:08
Those who should know best - the researchers who write the papers - won't tell us what they really think about how promising this is.
That is instead decided by the investors, who pay them for doing this [or that] research.
So the researchers are biased: they have an interest in making the research look promising to the investors.

the system sucks
> Since then major venture capital firms Horizons Ventures and Founders Fund have invested in the company,[19] as well as entrepreneurs Scott Banister[20] and Elon Musk.[21] Jaan Tallinn was an early investor and an adviser to the company.[22] The sale to Google took place after Facebook reportedly ended negotiations with DeepMind Technologies in 2013.[23]
Parent - - By Venator (Silver) [nl] Date 2018-01-09 16:32
> So the researchers are biased, they have an interest to make the research look promising to the investors.

They have proved with Go what this research is capable of.

Games like chess and go are just side shows, which are of minor interest to investors only. Investors are interested in the question "which possibilities does the neural network/learning method have to earn money".
Parent - By Chris Whittington (**) [fr] Date 2018-01-09 19:33
Chess and, more recently, Go were/are the propaganda poster children used to encourage investment from the monied classes. Expectations raised by chess in the past never led anywhere; indeed, there was the famous AI winter, when investments just dried up and the AI researchers found themselves unemployed. This time it is supposed to be different, because Deep Learning has industrial-grade applications. But it may have been overhyped. For example
Parent - By CSullivan (**) [us] Date 2018-01-10 21:31
As a side note, Fidelity Investments launched "Disciplined Equity Fund" in the early 1990's -- it made a big splash because it used an investing approach called "quantitative analysis" which was a neural network.  I was a fund investor for a few years, but it never really did better than an index fund.  It is still in existence and still slightly trails the S&P 500.  I don't know if they still use a neural network, but I bet a lot of investment managers are now paying attention to that approach.
- By gsgs (***) [de] Date 2018-01-11 00:29
I'd like to see such a paper by the DeepMind experts.
I'd like to see others contributing and discussing ideas
with them in an open forum like this one.
Silver, Hassabis etc. - at what price are they buying or selling DeepMind virtual stocks?
The Stockfish project has shown that an open mind and multiple people cooperating have advantages over a few people's secret research, however deep their mind.

Powered by mwForum 2.27.4 © 1999-2012 Markus Wichitill