Rybka Chess Community Forum
Up Topic The Rybka Lounge / Computer Chess / AlphaZero beats Stockfish 8 by 64-36
Parent - - By Chris Whittington (**) [fr] Date 2017-12-08 13:55 Upvotes 1
Yes, correct, the first game is random moves, and most of the earlier games will seem very random. Do you remember learning to walk? You got there in the end, right? With each blundering step, step by step, your motor neurons and sensory neurons gradually organize themselves.
One thing to note: as a network trains, the effect of the early learning on the finished network is diminished; it's the later learning cycles that dominate. So it doesn't really matter that early training was on dumb data, all that matters is that the early learning generated some progress to gradually improve on. And obviously, in A0's case, it did.
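A toy sketch of why later learning dominates (this is not AlphaZero's actual update rule, just an illustration): with a constant step size, each incremental update shrinks the influence of everything learned before it, so early (dumb) samples fade geometrically.

```python
# With constant step size lr, after n updates sample i carries weight
# lr * (1 - lr)**(n - 1 - i): the older the sample, the smaller its trace.

lr = 0.1
data = [0.0] * 50 + [1.0] * 50   # dumb early data, then better late data

est = 0.0
for z in data:
    est += lr * (z - est)        # one "learning step" per sample

# the same estimate written as an explicit geometrically weighted sum
n = len(data)
weighted = sum(lr * (1 - lr) ** (n - 1 - i) * z for i, z in enumerate(data))
```

After the run, `est` sits very close to 1.0: the 50 early zeros have almost no remaining influence.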
Parent - - By rocket (****) [se] Date 2017-12-08 16:03
But how is it organized? He must have implemented a complex algorithm. Playing millions of chess games does you no good if you don't "take notes". And it doesn't help guide you how to play if the positions aren't identical. Computers can't reason based on "similar patterns" unless you specifically program them: for example, pawn storms = dangerous, space advantage, etc...
Parent - - By Sesse (****) [no] Date 2017-12-08 16:08 Upvotes 1

> But how is it organized? He must have implemented a complex algorithm.

It's called “artificial neural networks” or “deep learning”. Look it up; there are many good tutorials and explanations on it these days, as they are wildly popular in modern AI.
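As one of those tutorials would show, a "learning" network is just weights nudged by gradient descent. Here is a toy two-layer network learning XOR, many orders of magnitude smaller than A0's network but resting on the same idea (all names and sizes here are illustrative, not from the paper):

```python
import numpy as np

np.random.seed(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1 = np.random.randn(2, 8)
b1 = np.zeros(8)
W2 = np.random.randn(8, 1)
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = np.tanh(X @ W1 + b1)          # hidden layer
    return h, sigmoid(h @ W2 + b2)    # output in (0, 1)

_, out = forward(X)
loss_before = float(np.mean((out - y) ** 2))

lr = 0.1
for _ in range(10000):
    h, out = forward(X)
    d_out = (out - y) * out * (1 - out)     # gradient through the sigmoid
    d_h = (d_out @ W2.T) * (1 - h ** 2)     # backpropagate through tanh
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

_, out = forward(X)
loss_after = float(np.mean((out - y) ** 2))
```

Nothing here "knows" XOR; the rule emerges from thousands of microscopic weight nudges, which is the same principle, at toy scale, behind learning chess patterns from games.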

> Computers can't reason based on "similar patterns" unless you specifically program them: for example, pawn storms = dangerous, space advantage, etc...

This used to be true twenty years ago, but it's no longer so.
Parent - - By rocket (****) [se] Date 2017-12-08 16:15
But why did it score only 3 wins as black out of 50? I don't believe Stockfish 8 would score that well as white against an engine two years from now.
Parent - By zaarcis (***) [lv] Date 2017-12-08 16:20
Probably it wasn't trained enough (and the first-move advantage was still enough for SF to almost always hold the draw).
Parent - - By Chris Whittington (**) [fr] Date 2017-12-08 16:21
Give it a chance! It only started to learn chess four hours beforehand.
Parent - - By Thomas A. Anderson (*) Date 2017-12-17 02:18 Edited 2017-12-18 07:17
Why we see this discrepancy between the white and black scores is indeed one of the most interesting outcomes of the match. It might be interpreted as saying that playing white gives one a huge advantage (>100 ELO handicap?).
P.S.: AZ was trained for 9 hours before the final 100-game match. After 4 hours of training it reached SF8 strength. Of course this doesn't change anything.
Parent - By Chris Whittington (**) [fr] Date 2017-12-17 10:32
Self-play games would be revealing on your question. At the moment we can only speculate. It may be that AZ-AZ games find the first move to be advantageous, hence AZ playing as black will perhaps avoid the chaos situations it likes as white (to some extent it won't even find advantageous black chaos, because the training MCTS tells it that it loses). AZ against a weaker opponent making "mistakes" would obviously race off into attack mode again; so it is to Stockfish's credit that it mostly held ok when playing white.
Does AZ know the game move number in half moves? Because in that case it would know who was black, and that might make a difference.

But ok, your real point: does white have a 100 ELO advantage? Answer: well, according to statistics, not if you're a materialist program (as they all are) incestuously playing each other in drawish regions of the tree. Or a human. If you're a Tal-style neural net, then consign all the old statistics to the dustbin of history; so it could be.
Parent - - By Carl Bicknell (*****) [gb] Date 2017-12-17 08:58

> It only started to learn chess four hours beforehand.

But then got saturated.

I think it needs a larger (more layered? whatever the right term is) NN to benefit from more hours of training. It's doable, but probably needs a redesign of some kind.
Parent - - By Chris Whittington (**) [fr] Date 2017-12-17 10:20
I don't think we know if it got saturated; the paper is silent, save for the graph of ELO progress which, on the scaling used, appears either to have flattened out or to be climbing at a slow rate. But it can happen, and often does, that a neural net flatlines during training and then makes a step improvement when the latest weight jigglings suddenly come together and find something. You can see steps in the ELO graphs in the paper, for Go, Shogi and Chess, for example. So we don't know what this particular version of AZ would do if trained for longer. We can posit, however, that tinkering with the network size, giving it some domain knowledge bla-bla and starting training all over again would quite probably produce even stronger versions.
Parent - - By Venator (Silver) [nl] Date 2017-12-18 19:15 Upvotes 1
You might enjoy this one! Here is OpenTal 1.0, a speculative engine that sacrifices material like Misha Tal:
Parent - - By InspectorGadget (*****) [za] Date 2017-12-20 11:29
Hi Jeroen.

Have you tested it against Thinker 5.2 C? :lol:
Parent - - By Venator (Silver) [nl] Date 2017-12-20 13:09
No, I don't test engines.

BTW, it doesn't run in the CB GUI, I cannot install it somehow. So I gave up :-).
Parent - By InspectorGadget (*****) [za] Date 2017-12-21 07:34
I managed to install it in the Shredder GUI by fluke. When I tried to install it to the other machine, I couldn't. I will try again either tonight or tomorrow night :(

I can install it in the Fritz GUI, but it doesn't run. I guess that's what you meant.
Parent - By zaarcis (***) [lv] Date 2017-12-08 16:10
That's called learning.
Nobody gave you a ready formalization/algorithm of walking when you were a baby, I believe. :D
Parent - By Chris Whittington (**) [fr] Date 2017-12-08 16:19
The same way you would recognise a "happy face" of someone you had never seen before as a "happy face". You learnt to extract the salient details despite all the component bits being a bit different and arranged a bit differently. How? Well, your brain is a network of neurons; it learnt. Neural net software models that idea.
Parent - - By Chris Whittington (**) [fr] Date 2017-12-11 23:49
do you mean this?

" ... of the tuning came after we added the ability to tune tables. In the DT evaluation function are a number of tables, for example the pawn advancement gradient, or the King centrality table. Instead of just making up a linear gradient, we allowed the tuning code to change the values of these tables and allow more complex gradients. This resulted in some strange 3D curves that upon inspection by Murray were found to contain some known chess heuristics, which were not originally part of DT's evaluation function (for example: the pawn advancement gradient became more pronounced near the center and tapered off towards the promotion squares). "
Parent - By Sesse (****) [se] Date 2017-12-12 11:15
Parent - - By Chris Whittington (**) [fr] Date 2017-12-08 12:26
No, there won't be a material table somewhere deep down in the network. Material is an entirely artificial construct, taught to children and used by old paradigm programmers. You may as well argue there's a phlogiston table deep down in the network. Material doesn't exist.
Parent - - By Sesse (****) [se] Date 2017-12-08 12:37
Well, let me put it another way: You can certainly embed a material table into a neural network if you wish.
Parent - By Chris Whittington (**) [fr] Date 2017-12-08 13:36 Upvotes 1
Well, yes, one could. But, based on what the A0 paper tells us, the neural net was trained on game result, not material. And since material is an artificial construct, used to teach absolute beginners, a blunt, inaccurate, non-constant term, there's no reason an integrated evaluation would even use it as a "building block" at the near-input network layers. Anyway, from the games seen so far, A0 encodes some sort of dynamic mobility; it's a romantic neural net, in effect.
Parent - - By Kappatoo (*****) [de] Date 2017-12-08 18:48
Material is clearly defined and - unlike phlogiston - it has instances. In this sense, it's real. I guess what you're suggesting is that talk of material is dispensable and that we can cash out everything regarding material in terms of mobility.
It isn't obvious to me that this is true, though. (I am not at all an expert on how mobility is implemented in an engine, so I may well be missing something.) Suppose you're a pawn up. Very often, this doesn't add much, if anything, to your mobility. But it's still really good to be a pawn up. So in these cases, mobility alone won't tell you how good your position is.
You might have the following objection: Pawns are important because they can become queens. And being a queen up really increases your mobility. If the engine is armed with a sufficiently good search, it will see that the pawn becomes a queen (or wins a piece by threatening to promote, or whatever).
My response: Sure, but if your search is good enough, you don't need any evaluation. Does that mean that all evaluation terms are 'artificial constructs'?
Parent - - By Chris Whittington (**) [fr] Date 2017-12-08 22:48
Does Stockfish's Q on h8 in game 3 exist? Stockfish thinks it does and awards it nine pawns of material points. Does this queen have any mobility? Stockfish sees none and might penalize it a little. You look at it, and at the rook on f6, and realize it's a dead queen. What do you think A0's evaluation thinks? Do you think A0 needs search to see this, or is it seen by the neural net?

Ok, I'll rephrase all the above; don't even answer it. When a strong player looks at this position, what does he say? You've probably heard the GM comment many times: "what can black do?". Nothing. What can white do? Stuff. White wins. It looks like AlphaZero's integrated evaluator sees this statically. Stockfish is still happy because material adds up in its polynomial evaluation. Plus, an added problem: the pruned search acts on a material base.

Btw, what I assume you are classically referring to as material is really a form of mobility (square control and movability) plus a stored-up potential for what a piece might be expected to do in future (e.g. a rook at the start of the game has little immediate use value, but it ought to come into use later, so take care of it). Splitting the position into artificial components was useful, until the evaluator arrived that uses the concept that everything depends on everything. The material/positional separation is disproved.
Parent - - By gsgs (***) [de] Date 2017-12-09 01:19
I cannot imagine it playing without some sort of material-comparisons
Parent - By Chris Whittington (**) [fr] Date 2017-12-09 10:40 Upvotes 1
That's neural nets for you: we can't imagine how or what they work out. So we try to draw parallels to our human way of thinking about the problem, or to our old-paradigm method of writing a chess program evaluation, and "we can't imagine how it plays without material comparisons". Since there is actually no such thing as "material", pieces of solid wood notwithstanding, I'm suggesting we junk the "material" concept and start seeing chess in another way. If you are a strong player, it ought not to be too difficult to see positions in terms of "what is to be done?". If the white answer to the question is positive and the black answer is zero, then white wins. Perhaps this can be seen as an energy concept: "do my pieces and position have energy, or not?". Whatever, AlphaZero tells us we need a new integrated concept with which to see chess; polynomial addition is proven to be comparatively dead.
Parent - - By Sesse (****) [gb] Date 2017-12-21 08:39
Well, certainly the queen exists; and it definitely has a nonzero value, since it can be exchanged for the white rook. But it's not a good piece by any means.

Given that we don't get to see AlphaZero's evaluations during the game, we can't even say for sure what it thought at this point. Did it think it was slightly better? Massively better? Note that after 49. Rf6, even Stockfish doesn't seem to think it's ahead (despite being the exchange and two pawns up), merely equal.

(PS: Stockfish's default value for the queen, with no adjustments for imbalance etc., is 10, not 9.)
Parent - - By Chris Whittington (**) [fr] Date 2017-12-21 10:20
Well, I suppose the neural net "knows" (in generalised terms) what rollouts from these sorts of generalised positions returned when it was being trained. I'm stating the obvious, but that's what it knows. And I guess we are referring to NN lookup evaluation, not the AZ search+NN evaluation game player.

So, what did the rollouts tell it? Many, many losing lines for black. The NN likes that, but we can't tell by how much. Do we know what the value function gets trained on? Rollout win rate? Or some combination of win rate and the mobility range and probability of opponent replies (e.g. seemingly forced singularities?). I don't know; I only have imagination until there's more of a paper. Meanwhile I've been busy writing a simple game, MCTS and NN, which has been a learning curve, since I basically haven't programmed since about 2005 or so. In order to understand better. I guess we all got "excited" back into software action.
Parent - - By Sesse (****) [gb] Date 2017-12-21 14:01
The value network gets trained on a combination of two things: Matching game result (win/loss/draw), and matching the result of the MCTS search. Especially the latter is quite interesting, since it means the evaluation is trained to predict its own outcome when search is added.
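The published training objective can be sketched like this (a simplification with no batching and illustrative function names; the two terms correspond to the loss in the AlphaZero preprint: squared error of the value head against the game outcome z, plus cross-entropy of the policy head against the MCTS visit distribution pi):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def alphazero_loss(v, z, policy_logits, pi, c=1e-4, weights=None):
    """Per-position loss sketch: (z - v)^2 - pi . log p,
    plus optional L2 regularization of the network weights."""
    value_loss = (z - v) ** 2                    # value head vs game outcome
    p = softmax(policy_logits)
    policy_loss = -np.dot(pi, np.log(p))         # policy head vs MCTS visits
    reg = c * np.sum(weights ** 2) if weights is not None else 0.0
    return value_loss + policy_loss + reg

# If the value is exact and the policy matches the search distribution,
# the remaining loss is just the entropy of the search distribution.
logits = np.array([2.0, 1.0, 0.1])
pi = softmax(logits)
loss = alphazero_loss(v=0.3, z=0.3, policy_logits=logits, pi=pi)
entropy = -np.sum(pi * np.log(pi))
```

The interesting part is exactly as described above: the value head targets the final game result, while the policy head targets what the search itself concluded, so the network is trained to anticipate its own searched play.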
Parent - By Chris Whittington (**) [fr] Date 2017-12-21 18:29
I might have misread the paper, assuming the value was trained on win rate, but I read you as saying it is trained against the game result: zero, half or one. On rereading the relevant bit of the paper, you seem to be correct.
Parent - By Christian Packi (****) [de] Date 2017-12-12 21:45
Speaking from a game-theoretic perspective, the right terminology would be strategic options. If you have more strategic options, your expected value increases.

It is more than just that two pawns can move to twice as many squares as one pawn can, which would be the mobility you're talking about. Rather, it is the option to move either pawn that is important here.

This is similar in poker: if you increase the number of bet sizes a player can choose from, their EV increases.
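One toy way to make "more options, higher EV" precise, under the purely illustrative assumption that each option's payoff is an independent uniform draw on [0, 1]: the expected payoff of the best of k options is k/(k+1), strictly increasing in k.

```python
def ev_best_of(k):
    """E[max of k independent U(0,1) payoffs] = k / (k + 1)."""
    return k / (k + 1)

# one option: 0.5; two: ~0.667; five: ~0.833 -- strictly increasing
evs = [ev_best_of(k) for k in range(1, 6)]
```

Real chess or poker payoffs are of course nothing like independent uniforms; the point is only that taking the best of more options can never lower, and generically raises, the expected value.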
- - By rocket (****) [se] Date 2017-12-08 18:35
Can somebody clear up whether the Google computer was more powerful or not? Stockfish had more cores, which the numbers also reflected, but was AlphaZero's computer still faster?
Parent - By billyraybar (***) [us] Date 2017-12-08 20:03 Upvotes 1
The question is a sort of red herring. The paper is a proof of concept for AI. It's also a cold reminder of 'no risk, no reward'.
Parent - - By Sesse (****) [no] Date 2017-12-08 20:05
Why do you think Stockfish had more cores?
Parent - By rocket (****) [se] Date 2017-12-09 00:15
OK, apparently it was the other way around. Apparently the node counts and other things are so astronomical for Stockfish that it doesn't matter that AlphaZero had the equivalent of 400-2000 cores, depending on how you count it.
- - By rocket (****) [se] Date 2017-12-09 00:25
How could the engine vary between 1.e4, 1.d4 and 1.c4 against Stockfish? Isn't its preference already coded after the self-play stage? Can a game prompt it to dislike its first move?

Or is the search route simply nondeterministic, like multiprocessor engines?
Parent - By zaarcis (***) [lv] Date 2017-12-09 02:11
Err, why not? Do you always react in the same way to your opponent's moves? Like, if he played the same moves, would you play the same game all 1000 times? :)

Yes, the search route is nondeterministic, according to its feelings about how much it wants to explore each of the possible options.
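A sketch of where that nondeterminism comes from (the constant, priors and counts below are made up, not from the paper): MCTS scores each move as exploitation plus an exploration bonus, and the final move can then be sampled from the visit-count distribution rather than picked deterministically.

```python
import math
import random

def puct(prior, q, visits, parent_visits, c_puct=1.5):
    """PUCT-style selection score: average value q plus an exploration
    bonus favouring high-prior, rarely visited moves."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

# An unvisited move with a good prior can outscore a well-explored one:
score_fresh = puct(prior=0.4, q=0.0, visits=0, parent_visits=100)
score_known = puct(prior=0.1, q=0.6, visits=50, parent_visits=100)

def sample_move(visit_counts, temperature=1.0):
    """Pick a move with probability proportional to N ** (1/T)."""
    moves = list(visit_counts)
    weights = [visit_counts[m] ** (1.0 / temperature) for m in moves]
    r = random.random() * sum(weights)
    for move, w in zip(moves, weights):
        r -= w
        if r <= 0:
            return move
    return moves[-1]

# Sampling from the same counts yields different first moves across runs,
# which is one plausible way 1.e4 / 1.d4 / 1.c4 could all appear.
random.seed(1)
counts = {"e4": 520, "d4": 410, "c4": 70}
picks = {m: 0 for m in counts}
for _ in range(1000):
    picks[sample_move(counts)] += 1
```

So even a fixed, fully trained network can open differently from game to game: the randomness lives in the search and the move sampling, not in the weights.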
- - By gsgs (***) [de] Date 2017-12-09 01:25
... waiting for the opinions of the chess programmers

they are strangely absent from the discussion so far
Parent - - By zaarcis (***) [lv] Date 2017-12-09 02:06
Parent - By gsgs (***) [de] Date 2017-12-09 08:56 Edited 2017-12-09 08:59
Tord Romstad:
> I don't really think I have the insight or knowledge to have anything particularly
> interesting to contribute to the discussion, ...


I remember Mark Lefler, probably in TCEC chat some weeks ago, being afraid he might lose
his "job" as a chess programmer due to the alpha movement.

At that time I still considered that unlikely.
Parent - By leavenfish (***) [us] Date 2017-12-10 04:38
They have only been 'tweakers' of things...blindly groping for 'more'. Their days belong to the old paradigm. :cry:
Parent - - By Chris Whittington (**) [fr] Date 2017-12-11 10:00
the chess programmers lost their jobs to robots.
Parent - - By Labyrinth (*****) [us] Date 2017-12-11 11:50
I so wish they would keep going, like put in another 24 hours, or 48, or a week and see the Elo difference, and show us the games of course. I guess you can't reasonably ask them to use their $25 million equipment this way, but it would be sooo cool! I'd be very interested to see if the draw ratio starts to climb, and if so by how much. Would the Elo start to stagnate at some point? Such a fascinating space to explore.
Parent - - By Chris Whittington (**) [fr] Date 2017-12-11 14:07
Draw rate and maximum ELO statistical guesswork was based on the materialistic tree subset. Old assumptions are no longer valid.
Parent - - By Labyrinth (*****) [us] Date 2017-12-11 23:53
I mean the draw rate in the games against itself as its playing strength increases. Same with Elo, as a basic calculation can be done from wins/losses/draws.
Parent - - By Chris Whittington (**) [fr] Date 2017-12-12 00:24
ok, right, sorry. I thought you meant it in the "chess is a draw" way. I would guess two programs maximising on the same features will tend to produce more draws with depth than two maximising on different features.
Parent - - By Venator (Silver) [nl] Date 2017-12-18 17:01
Michael Sherwin gives an interesting view on the Alpha Zero victory:
Parent - By Chris Whittington (**) [fr] Date 2017-12-18 21:56
yes, well, he is locked in old thought patterns which are not helping him. Much confused thinking.

I'm still wrapping my head around all this stuff, which is not so easy because there are many gaps in the paper. But ....
The learning process does build an enormous tree, game by game by game. And it stores every move of every game in a massive linked list; basically, this is the tree. Each node in the tree represents a position, and stored for each node are the number of visits and the number of wins from that position. We'll forget draws for the sake of simplicity.

The "wins" aren't what I think he imagines; they are the result of the game rollout, a branching factor of one wide, until the end of the game. These wins or losses or draws don't have the quality I think he is assuming. Ok, I don't need to explain to you the difference between a one-move-wide playout and a minimax search.
Further degrading the playout result quality is that some games were played when AZ was young and random, some when just young and dumb, and some when old and wise. But the tree node just contains a total, dumb and wise added together.

So, AZ uses this huge tree while learning, and when learning is done and AZ is turned into a playing engine, the tree is not used. Unsurprisingly, because it is full of crap: random playouts, and, later, guided playouts, but still one-move-wide playouts. And no clue as to which is which.

Well, I say the tree is not used, but it is used in the sense that the neural net has generalized from the moves in the tree.
He gives an example of a node: W 283, L 264, D 191. The NN gets trained every time that node gets a visit. First visit, let's say W1 L0 D0, and the NN weights get a microscopic nudge to bring its actual output microscopically towards the desired output (e.g. this position is a win). The next visit is maybe a loss, W1 L1 D0, so now the NN is given another microscopic training nudge to bring its actual output towards 0.5, and so on and so on, until it gets microscopically nudged towards the result 283/264/191, whatever that works out to.

Now, there's no good reason to assume the NN will actually now give the 283/264/191 probability. What it has done, however, is to have generalized. It is no longer a dumb lookup tree which falls over if one pawn is in another place. Because it learnt as the tree grew, it has the benefit that smarter later games tend to improve and overwrite the knowledge extracted from earlier dumber ones.
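The "microscopic nudge" idea can be sketched numerically with a toy scalar standing in for the network, using the node's 283/264/191 counts from the post as illustrative data: repeatedly moving an estimate a small step toward each individual result drives it toward the node's overall score fraction.

```python
import random

# Results for the example node: 283 wins (1.0), 264 losses (0.0),
# 191 draws (0.5), shuffled into the order they might have arrived in.
results = [1.0] * 283 + [0.0] * 264 + [0.5] * 191
random.seed(42)
random.shuffle(results)

estimate = 0.5   # start agnostic
lr = 0.01        # size of each microscopic nudge
for _ in range(20):          # several passes over the games
    for z in results:
        estimate += lr * (z - estimate)

# the score fraction the nudges pull toward (~0.513 for these counts)
target = sum(results) / len(results)
```

A real network does the same thing through many shared weights at once, which is where the generalization to similar-but-not-identical positions comes from; the scalar version only shows the target the nudging converges to.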

I guess the guy you referred to thinks AZ contains the tree data. It doesn't.
- - By glennsamuel32 (*) Date 2017-12-11 04:09
I felt this was the most humbling experience for SF that I've ever witnessed :cry:
AZ imprisons the queen with 49.Rf6 -- and SF sacs its queen out of sheer desperation with 56...Qxf6 !!
What a unique search algorithm AZ has...

1. Nf3 Nf6 2. c4 b6 3. d4 e6 4. g3 Ba6 5. Qc2 c5 6. d5 exd5 7. cxd5 Bb7 8. Bg2 Nxd5 9. 0-0 Nc6 10.
Rd1 Be7 11. Qf5 Nf6 12. e4 g6 13. Qf4 0-0 14. e5 Nh5 15. Qg4 Re8 16. Nc3 Qb8 17. Nd5 Bf8 18.
Bf4 Qc8 19. h3 Ne7 20. Ne3 Bc6 21. Rd6 Ng7 22. Rf6 Qb7 23. Bh6 Nd5 24. Nxd5 Bxd5 25. Rd1 Ne6
26. Bxf8 Rxf8 27. Qh4 Bc6 28. Qh6 Rae8 29. Rd6 Bxf3 30. Bxf3 Qa6 31. h4 Qa5 32. Rd1 c4 33. Rd5
Qe1+ 34. Kg2 c3 35. bxc3 Qxc3 36. h5 Re7 37. Bd1 Qe1 38. Bb3 Rd8 39. Rf3 Qe4 40. Qd2 Qg4 41.
Bd1 Qe4 42. h6 Nc7 43. Rd6 Ne6 44. Bb3 Qxe5 45. Rd5 Qh8 46. Qb4 Nc5 47. Rxc5 bxc5 48. Qh4
Rde8 49. Rf6 Rf8 50. Qf4 a5 51. g4 d5 52. Bxd5 Rd7 53. Bc4 a4 54. g5 a3 55. Qf3 Rc7 56. Qxa3 Qxf6
57. gxf6 Rfc8 58. Qd3 Rf8 59. Qd6 Rfc8 60. a4 1-0
Parent - - By Chris Whittington (**) [fr] Date 2017-12-11 10:26 Edited 2017-12-11 10:53
Magic game. A0 is not based on a material evaluation, so it can "think the unthinkable". Very much a game of mobility/immobility evaluation. I liked 22.Rf6 very much. It looks unnatural at first sight; you just don't do that with rooks, it will get trapped and you lose the exchange. But it fixes the long-term target f7, and after black makes a battery with Bc6 and Qb7, the rook back-defends the Nf3 (back defence by sliders is a common human oversight blunder, btw). An attacker doing double duty as a defender is a key theme. Then who would possibly have imagined that white would get from the kingside attack position (say at 29.Rd6) to the zugzwang/immobility crush at 54.g5 via Bd1-b3 (long-term pin target), the pawn sac e5 (going to tie the black queen to the defence of g7) and then the sac of rook for knight (removing the only useful black piece, zugzwang coming up). After this game everything is different.
Parent - - By glennsamuel32 (*) Date 2017-12-11 17:54
I completely agree with your observations.
Mobility and position are valued... material evaluation hardly matters.
Gives up pawns and rooks alike.
Yeah, AZ has buried the chess engines for good !!
