One thing to note: as a network trains, the effect of the early learning on the finished network diminishes; it's the later learning cycles that dominate. So it doesn't really matter that the early training was on dumb data. All that matters is that the early learning generated some progress to gradually improve on, and in AZ's case it obviously did.
> But how is it organized? He must have implemented a complex algorithm.
It's called “artificial neural networks” or “deep learning”. Look it up; there are many good tutorials and explanations on it these days, as they are wildly popular in modern AI.
> Computers can't reason based on "similar pattern" unless you specifically program: for example: pawn storms=dangerous, space advantage etc...
This used to be true twenty years ago, but it's no longer so.
P.s.: AZ was trained for 9 hours before the final 100 game match. After 4 hours of training it reached SF8 strength. Of course this doesn't change anything.
Does AZ know the game move number in half moves? Because in that case it would know which side was black, and that might make a difference.
But ok, your real point: does white have a 100 Elo advantage? Answer: well, according to statistics, not if you're a materialist program (as they all are) incestuously playing each other in drawish regions of the tree. Or a human. If you're a Tal-style neural net, then consign all the old statistics to the dustbin of history; so it could be.
> It only started to learn chess four hours beforehand.
But then got saturated.
I think it needs a larger (more layered? whatever the right term is) NN to benefit from more hours of training. It's doable, but probably needs a redesign of some kind.
Have you tested it against Thinker 5.2 C?
BTW, it doesn't run in the CB GUI; somehow I cannot install it. So I gave up :-).
I can install it in the Fritz GUI, but it doesn't run. I guess that's what you meant.
Nobody gave you a ready formalization/algorithm of walking when you were a baby, I believe. :D
" ... of the tuning came after we added the ability to tune tables. In the DT evaluation function are a number of tables, for example the pawn advancement gradient, or the King centrality table. Instead of just making up a linear gradient, we allowed the tuning code to change the values of these table and allow more complex gradients. This resulted in some strange 3D curves that upon inspection by Murray were found to contain some known chess heuristics, which were not originally part of DT's evaluation function (for example: the pawn advancement gradient became more pronounced near the center and tapered off towards the promotion squares). "
It isn't obvious to me that this is true, though. (I am not at all an expert on how mobility is implemented in an engine, so I may well be missing something.) Suppose you're a pawn up. Very often, this doesn't add much, if anything, to your mobility. But it's still really good to be a pawn up. So in these cases, mobility alone won't tell you how good your position is.
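As a concrete illustration of that point, here is a toy static evaluation (the piece values and the mobility weight are my own placeholders, not any engine's actual numbers) in which material and mobility are separate terms; with mobility alone, the pawn-up position would score as dead equal:

```python
# Hypothetical classical-style evaluator: material term + mobility term.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def evaluate(material, my_moves, opp_moves, mobility_weight=0.05):
    """material maps piece letter -> (my_count, their_count);
    my_moves/opp_moves are legal-move counts for each side."""
    mat = sum(value * (mine - theirs)
              for piece, value in PIECE_VALUES.items()
              for mine, theirs in [material.get(piece, (0, 0))])
    mob = mobility_weight * (my_moves - opp_moves)
    return mat + mob

even    = {"P": (8, 8)}   # equal pawns
pawn_up = {"P": (8, 7)}   # one extra pawn, same mobility
print(evaluate(even, 30, 30), evaluate(pawn_up, 30, 30))  # 0.0 1.0
```

The extra pawn scores better even with identical mobility, which is exactly the point: classical engines keep the two terms separate precisely because neither subsumes the other.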
You might have the following objection: Pawns are important because they can become queens. And being a queen up really increases your mobility. If the engine is armed with a sufficiently good search, it will see that the pawn becomes a queen (or wins a piece by threatening to promote, or whatever).
My response: Sure, but if your search is good enough, you don't need any evaluation. Does that mean that all evaluation terms are 'artificial constructs'?
Ok, I rephrase all the above; don't even answer it. When a strong player looks at this position, what does he say? You've probably heard the GM comment many times: "what can black do?" Nothing. What can white do? Stuff. White wins. It looks like AlphaZero's integrated evaluator sees this statically. Stockfish is still happy because material adds up in its polynomial evaluation. Plus, an added problem: the pruned search acts on a material basis.
Btw, what I assume you are classically referring to as material is really a form of mobility (square control and movability) plus a stored-up potential for what a piece might be expected to do in the future (e.g. a rook at the start of the game has little immediate use value, but it ought to come into use later, so take care of it). Splitting the position into artificial components was useful, until an evaluator arrived that works on the concept that everything depends on everything. The material/positional separation has been disproved.
Given that we don't get to see AlphaZero's evaluations during the game, we can't even say for sure what it thought at this point. Did it think it was slightly better? Massively better? Note that after 49. Rf6, even Stockfish doesn't seem to think it's ahead (despite being the exchange and two pawns up), merely equal.
(PS: Stockfish's default value for the queen, with no adjustments for imbalance etc., is 10, not 9.)
So, what did the rollouts tell it? Many, many losing lines for black. The NN likes that, but we can't tell by how much. Do we know what the value function gets trained on? Rollout win rate? Or some combination of win rate, the mobility range, and the probability of opponent replies (e.g. seemingly forced singularities)? I don't know; I only have imagination until there's a more detailed paper. Meanwhile I've been busy writing a simple game, MCTS and NN, which has been a learning curve, since I basically haven't programmed since about 2005 or so. In order to understand better. I guess we all got "excited" back into software action.
It is more than just two pawns can move to twice as many squares as one pawn can, which would be the mobility you're talking about. Rather you have the option to move either pawn that is important here.
This is similar in poker. If you increase the number of bet sizes a player can make they will increase their EV.
Or is the search route simply indeterministic, like multiprocessor engines?
Yes, the search route is indeterministic, according to its feelings about how much it wants to explore each of the possible options.
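For the record, the "feelings" are a concrete formula. AlphaZero uses a PUCT-style selection rule that mixes NN priors into the bonus; the sketch below uses the simpler classic UCB1 rule (constants and counts invented for illustration) just to show how a win-rate term plus an exploration bonus steers which branch the search takes next:

```python
import math

def ucb1(wins, visits, parent_visits, c=1.4):
    """UCB1 score for one child move: exploitation + exploration."""
    if visits == 0:
        return float("inf")  # unvisited moves get tried first
    exploit = wins / visits  # observed win rate
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

# three candidate moves as (wins, visits) under a parent with 100 visits
children = [(30, 50), (20, 30), (5, 20)]
scores = [ucb1(w, v, 100) for w, v in children]
best = scores.index(max(scores))  # the less-visited second move wins out here
```

Because the exploration bonus shrinks as a move gets visited, small shifts in the statistics redirect the whole search route, which is why two runs rarely look alike.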
they are strangely absent from the discussion so far
> I don't really think I have the insight or knowledge to have anything particularly
> interesting to contribute to the discussion, ...
I remember Mark Lefler, probably in TCEC chat some weeks ago, being afraid he might lose
his "job" as chess programmer due to the alpha movement.
At that time I still considered that unlikely.
I'm still wrapping my head around all this stuff, which is not so easy because there are many gaps in the paper. But ....
The learning process does build an enormous tree, game by game by game. And it stores every move in every game in a massive linked list; basically this is the tree. Each node in the tree represents a position, and stored for each node are the number of visits and the number of wins from that position. We'll forget draws for the sake of simplicity.
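A minimal sketch of that bookkeeping (names and structure are my own invention; the paper doesn't describe the implementation): one node per position, holding visit and win counts that every finished game increments:

```python
from collections import defaultdict

class Node:
    """One position in the tree: how often it was visited and how often
    the game was eventually won from it (draws ignored for simplicity)."""
    def __init__(self):
        self.visits = 0
        self.wins = 0

tree = defaultdict(Node)  # position key (e.g. a FEN string) -> Node

def record_game(positions, won):
    """After a game ends, bump the counters of every position it
    passed through; `won` is 1 for a win, 0 for a loss."""
    for pos in positions:
        node = tree[pos]
        node.visits += 1
        node.wins += won

record_game(["startpos", "pos_after_e4"], 1)
record_game(["startpos", "pos_after_e4"], 0)
print(tree["startpos"].visits, tree["startpos"].wins)  # 2 1
```

After enough games a node accumulates totals like the w/l/d counts quoted below; note the sketch ignores whose move it is at each node, which a real implementation would have to track.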
The "wins" aren't what I think he imagines, they are the result of the game rollout, branching factor one wide, until end of game. These wins or losses or draws don't have the quality I think he is assuming. Ok I don't need to explain to you the difference between a one move wide play out and a minimax search.
Further degrading the play out result quality is that some games were played when AZ was young and random, some when just young and dumb and some when old and wise. But the tree node just contains a total, dumb and wise added together.
So, AZ uses this huge tree while learning, and when learning is done and AZ is turned into a playing engine, the tree is not used. Unsurprisingly, because it is full of crap: random playouts and, later, guided playouts, but still one-move-wide playouts, with no clue as to which is which.
Well, I say the tree is not used, but it is used in the sense that the neural net has generalized from the moves in the tree.
He gives an example of a node: w 283, l 264, d 191. The NN gets trained every time that node gets a visit. First visit, let's say w1 l0 d0, and the NN weights get a microscopic nudge to bring its actual output microscopically towards the desired output (e.g. "this position is a win"). The next visit is maybe a loss, w1 l1 d0, so now the NN is given another microscopic training nudge to bring the actual output towards 0.5, and so on and so on until it gets microscopically nudged towards the result 283/264/191, whatever that works out to.
Now, there's no good reason to assume the NN will actually now give the 283/264/191 probability. What it has done, however, is generalize. It is no longer a dumb lookup tree which falls over if one pawn is in another place. Because it learnt as the tree grew, it has the benefit that smarter later games tend to improve and overwrite the knowledge extracted from earlier dumber ones.
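The "microscopic nudge" can be imitated with a single number standing in for the whole net (everything here, including the learning rate, is a toy assumption, not AZ's actual update rule): each game result pulls the stored estimate a tiny step toward itself, so the estimate drifts toward the empirical win rate, with recent results weighted more heavily than old ones:

```python
def nudge(estimate, result, lr=0.01):
    # move the current output a tiny step toward the desired output
    return estimate + lr * (result - estimate)

estimate = 0.5                   # initial guess for the node's value
results = [1, 0, 1, 1, 0] * 200  # a stream of game outcomes, 60% wins
for r in results:
    estimate = nudge(estimate, r)

print(round(estimate, 2))  # settles near the 0.6 win rate
```

This also illustrates the point above about later games dominating: each update exponentially decays the influence of everything before it, so the dumb early results fade away.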
I guess the guy you referred to thinks AZ contains the tree data. It doesn't.
AZ imprisons the queen with 49.Rf6 -- and SF sacs its queen out of sheer desperation at 56...Qxf6 !!
What a unique search algorithm AZ has...
1. Nf3 Nf6 2. c4 b6 3. d4 e6 4. g3 Ba6 5. Qc2 c5 6. d5 exd5 7. cxd5 Bb7 8. Bg2 Nxd5 9. 0-0 Nc6
10. Rd1 Be7 11. Qf5 Nf6 12. e4 g6 13. Qf4 0-0 14. e5 Nh5 15. Qg4 Re8 16. Nc3 Qb8 17. Nd5 Bf8
18. Bf4 Qc8 19. h3 Ne7 20. Ne3 Bc6 21. Rd6 Ng7 22. Rf6 Qb7 23. Bh6 Nd5 24. Nxd5 Bxd5 25. Rd1 Ne6
26. Bxf8 Rxf8 27. Qh4 Bc6 28. Qh6 Rae8 29. Rd6 Bxf3 30. Bxf3 Qa6 31. h4 Qa5 32. Rd1 c4
33. Rd5 Qe1+ 34. Kg2 c3 35. bxc3 Qxc3 36. h5 Re7 37. Bd1 Qe1 38. Bb3 Rd8 39. Rf3 Qe4 40. Qd2 Qg4
41. Bd1 Qe4 42. h6 Nc7 43. Rd6 Ne6 44. Bb3 Qxe5 45. Rd5 Qh8 46. Qb4 Nc5 47. Rxc5 bxc5
48. Qh4 Rde8 49. Rf6 Rf8 50. Qf4 a5 51. g4 d5 52. Bxd5 Rd7 53. Bc4 a4 54. g5 a3 55. Qf3 Rc7
56. Qxa3 Qxf6 57. gxf6 Rfc8 58. Qd3 Rf8 59. Qd6 Rfc8 60. a4 1-0
Mobility and position are valued... material evaluation hardly matters.
Gives up pawns and rooks alike.
Yeah, AZ has buried the chess engines for good !!