Not logged inRybka Chess Community Forum
Up Topic The Rybka Lounge / Computer Chess / News from AlphaZero
Parent - - By rocket (****) Date 2018-12-08 05:28 Edited 2018-12-08 05:35
Isn't Alpha zero a version of the Monte Carlo? Just a statistical database of self play (during the game)? It doesn't have an evaluation function, right?

I don't concider that a chess player of any sorts and not fair to compare with those that actually do have evaluation functions.

It's random self-play statistics algorithm.
Parent - - By user923005 (****) Date 2018-12-08 06:17
Kind of like none of that. 
It does not store information about positions it has played, though it does learn what patterns on the board are beneficial.
It is all evaluation function (very shallow searches yet still super strong)

Read the papers about it.  It is a really interesting approach.
Parent - - By Vegan (****) Date 2018-12-08 07:22
i did some work with markov chains for databases and chess could use that approach

lots of database ideas over hashes
Parent - - By user923005 (****) Date 2018-12-08 07:35
I have a giant database of chess positions, but that will never be more than a glorified opening book.
As soon as an unusual move happens, you fall out of book.

At any rate, the information collected by AlphaOne and Leela is not about individual positions but about factors that make a position strong or weak.
The amazing thing is that the latest approaches only need the rules of the game to begin learning.
It is a totally new approach.

You can say that it is similar to what we know about how humans play chess.
Which is to say, we do not even know what the constants for evaluation written out by the learning engine even mean.
It figures them out itself.  When you load them into memory, it plays good chess.

Similarly we know that people use patter matching to play chess and also envision future board events.  But we really can't say exactly how it works.
Parent - - By Vegan (****) Date 2018-12-12 00:57
With larger capacity storage and compression a few billion positions for an opening book is not so far fetched anymore.
Parent - - By user923005 (****) Date 2018-12-12 01:08
Billions is not enough.  Since the branching factor of chess is 36, by 6 plies you are looking at 2,176,782,336 positions (3 moves).
To be really useful for game playing, it is probably trillions that are needed.  However, technology might soon make this feasible.
A chess position can be encoded in about 184 bits or 23 bytes.  With the move to make (encoded in 1 byte) and the score (2 bytes) gives 25 bytes per position.
Hence, storage is not a problem.  It is the time needed to compute all those positions to a meaningful depth that makes that approach difficult.

Storing information about opening positions has been tried and it is (in fact) quite useful.  But it has literally nothing to do with the LC0 or Alpha0 approach.
Storing information about opening positions obviously has no value once you fall out of the book you have created through this method.

These new approaches examine zillions of chess boards and learn what makes a position good or bad.  This information can be applied to any arbitrary chess board.
It is radically different than all previous approaches that I have heard about.
That includes td-lamda and td-leaf.
Parent - By h.g.muller (****) Date 2018-12-12 13:13
Well, what you describe is not that different from tuning of a conventonal evaluation function through the 'Texel method'. The difference is that the function to be tuned is vastly more complex: instead of some 100 hand-picked evaluation terms it has millions of randomly generated terms, more or less covering every conceivable pattern (most of those of course entirely useless).

I think the main innovation is that AlphaZero also has an independent elaborate evaluation function for moves, to decide how much search they deserve. (In a conventional engine we would say: how much to reduce each move.) Conventional engines do not get much beyond testing whether the move is a capture or non-capture, or a check. And sometimes how high a history score it has.
Parent - - By rocket (****) Date 2018-12-08 09:22
It says in the article that it uses a Monte Carlo system, which is exactly what I wrote. Moves selected based on self-play statistics for each position. It's just one giant database of statistics for each position. It's not artificial intelligence at all.
Parent - By Sesse (****) Date 2018-12-09 22:32 Upvotes 2
There are many different techniques that are dubbed “Monte Carlo” (in the most general form, MC only means a random process run many times). AlphaZero is not merely a database of statistics, any more than Stockfish is.
Parent - - By Lazy Frank (****) Date 2018-12-09 09:21
Mainly, yes, Alpha Zero is just playing engine on good hardware and stored statistical good positions, far away from AI.
Parent - - By gsgs (***) Date 2018-12-11 04:52
it approximates chessposition outcome-probabilities by a function
with some million to-be-optimized parameters ("weights")
Parent - - By Lazy Frank (****) Date 2018-12-11 05:00
Doesn't matter how they call technique.
If you look somewhere for help (outside) it isn't your knowledge.
Parent - - By h.g.muller (****) Date 2018-12-11 12:59 Upvotes 1
You really have no clue at all how AlphaZero works, have you?
Parent - - By gsgs (***) Date 2018-12-11 13:09
reminds me, how Albert Silver agreed that it's a "black box" in the
TCEC-interview with Cato
"Nobody really knows, how it works."

Demis Hassabis (or Matthew Lai or such ?)  also announced recently,
that they are trying to figure it out now (how it works)

maybe I can find the link ...
Parent - - By h.g.muller (****) Date 2018-12-11 17:48 Upvotes 1
'How it works' is a relative notion. People know very well how the neural net works. There are layers of cells, and weights, and non-linear operations on the sum of the cell inputs... Just like we know how the human brain works, with neurons, axons, synapses and neuro-transmitters. On a higher level of organization, however, people have no clue why the weights must be what they are, or what patters they conspire to recognize for recommending certain moves over other moves. Reverse engineering the learned info in a neural network is much harder than for a program on a conventional  stored-program digital computer.
Parent - - By gsgs (***) Date 2018-12-13 04:24
she must learn to understand what we tell her.
E.g. that in Queen Endings the danger of perpetual checks is high
Or that shuffling, repitition, no-prograss also increases the probability of draw
Or that in endings with opposite color bishops one pawn more is rarely decisive
Or that rook+3 vs. rook+2 on one side is draw.
Or to compute eval-penalties for double pawns, isolated pawns, backward pawns

With normal engines you knew how to do that, how to improve such weaknesses,
when spotted.
With NN you can just only play more training games.
Well, maybe training games started from special designed positions
this hasn't been tested yet.
Parent - - By h.g.muller (****) Date 2018-12-13 09:30
One of the benefits of having it learn by self play is that it will generate examples of the errors it has not yet learned to avoid, which then are excellent training material for learning to avoid it. The danger of supervised learning is that it would never learn the stuff that 'everyone knows', because these errors will not occur in the training set.

Recognizing things that you mention should be well within the capabilities of a NN as large as used by AlphaZero. It will take a lot of training, though, because it first must develop a reliable procedure to count total material on the board in the early layers, before it can start drawing conclusions from the material balance in the layers that follow.

I think a 'hybrid' approach could be much less resource-consuming than the current 'zero' approach. It has been used in Shogi with impressive results. In this approach you would use a hand-crafted pre-processor for the NN inputs, which efficiently and reliably provides information we know to be important for chess-like games. Like the number of attackers and protectors on each square, the material composition. That would relieve the NN of learning how to extract that information, so that you could probably do with a much smaller NN.
Parent - - By gsgs (***) Date 2018-12-13 10:26
I haven't thought about that ...but including Stockfish evals in the handcrafted preprocessor
should be useful.

Shouldn't the NN have figured out something like attackers and defenders
for each square by it's own training ? Aren't there parameters in the weights file
corresponding to something like that ?
Parent - - By h.g.muller (****) Date 2018-12-13 13:02
It should find it by its own training. The only type of weights are how much the output of one cell contributes to the input of another cell in the next layer. In the first (input) layer each cell corresponds to the presence of a piece of a certain type on a certain square. So the only weights that correspond to something tangible and predetermined are those between the input layer and the next. The weight then determines how important it is that there is (or was, a few moves ago) a piece of a certain type in a certain location for whatever pattern the cell in the next layer will learn to recognize.

The structure of the NN is very simple and completely general: each layer is an 8x8 chess board, with 256 cells in each square, where each of these cells gets input from the 9*256 cells in the previous layer for that same square and the 8 adjacent ones. There are 20 or 40 of such layers, and only the last (output) layer has just a few cells, (like a single one for the evaluation) which each are connected to every cell of the previous layer.

In the input layer each square has only as many cells as needed to specify the board position and a few recent predecessors, i.e. 12 cells for the pieces (one for each colored piece type) for the current position and each of the considered previous ones, 4 cells for castling rights, and a cell to specify the value of the 50-move counter. (And I believe there also is a cell to indicate if the position already is a repeat.)

There might be cells that correspond to number of attackers or defenders, somewhere deep in the NN, but we wouldn't know where they will arise or if they are really arise at all. (The information could be distributed over a group of cells.) They cannot be in early layers, because each layer can only look to adjacent squares, and in Chess attacks can come from a distance of 7 squares. So the network will first have to learn that it is important to propagate the 'influence' of sliders along the entire length of the rays through them. Which takes at least 7 layers.
Parent - - By gsgs (***) Date 2018-12-14 08:48
thanks for the short and instructive explanation.

> each layer is an 8x8 chess board, with 256 cells in each square,
> .. each of these cells gets input from the 9*256 cells in the previous layer
> [ for that same square and the 8 adjacent ones.]
> There are 20 or 40 of such layers,

So, can we view this as a (mathematical) directed edge-weighted graph with
327680=20*64*256 vertices (layers*squares*cells) and
(36*9+24*6+4*4)*256*256 edges from layer(n) to layer(n+1)
{(squares,neighbors)*source-cells*target-cells }

> and only the last (output) layer has just a few cells,
> (like a single one for the evaluation) which each are connected to every cell of the previous layer.

plus <100 vertices of the output layer and edges from the last of the 20(40) layers to the output layer

> In the input layer each square has only as many cells as needed to specify
> the board position and a few recent predecessors,
> i.e. 12 cells for the pieces (one for each colored piece type) for the current
> position and each of the considered previous ones, 4 cells for castling rights,
> and a cell to specify the value of the 50-move counter.
> (And I believe there also is a cell to indicate if the position already is a repeat.)

plus 6192=(64*12+4+1+1)*8 vertices for the input layer , assuming the positions for the last 8 halfmoves

We could probably post in a few lines the actual code to set up the graph and
assign 1 to each of the edges from the 330000x330000~110M edge-array


it is nonintuitive to take these neighbors only from one layer to the next
when some pieces can move long distances

I would take the 64 cells times the 13 possible pieces on it and connect 2 cells when
they can be part of positions before and after a move


the optimal structure of that graph could maybe itself be improved by some learning/trainig process
Parent - - By gsgs (***) Date 2018-12-14 09:29
it seems to me that a0, lc0 is only the beginning and that training
would be 10,100 times faster, more efficient in some years or decades.
Do you agree ?
Parent - - By h.g.muller (****) Date 2018-12-14 12:26 Edited 2018-12-14 12:37
Agreed; the network they use now seems very ill-suited for Chess. Which of course makes it all the more amazing that AlphaZero works so well. Apparently the built-in inefficiency could be overcome by throwing massive resources at it.

I brought up the point about 8x1 views to be at least as important as 3x3 views with the developers of Leela earlier this year, but it was firmly rejected, based on the argument that there are zillions of alternative ways to organize the NN, so that it is best to just 'pick one and stick with it' than first do a pilot study of what seems the most promising and then use that. It also seems that applying 3x3 'convolutional filters' (called this way because all squares use the same set of weights for inputs of corresponding cells) can be done somewhat more efficiently than applying 8x1 filters, despite the fact that the latter have one less input.

Beware that the description I gave of the NN is not entirely precise; I vaguely recall that there were also some 'skip connections', meaning that a layer doesn't only get input from the previous layer, but also from the one before that. If you want to know the exact details, you should really read the Alpha Go Zere paper. What I wrote gives a good impression of the kind of NN they use.

And you are right; the code to 'run' the NN would be very short and simple (but very repetitive) code. Training it would be somewhat more complex, though. But even that is relatively straightforward.
Parent - By gsgs (***) Date 2018-12-18 16:01
can I use a Lc0-Stockfish hybrid , which at each move calls Lc0 on GPU
and then stockfish.exe multipv for x% of the Lc0 time and returns the Lc0-move
unless stockfish says blunder, in which case it returns the best stockfish move.
And in the ending it just plays stockfish.
So, basically a UCI-program that calls the 2 other executables , does some simple
comparisons and calculations and returns the bestmove

I would expect that such programs are very common already ?!?
Parent - - By Labyrinth (*****) Date 2018-12-13 22:29
I would think training on Fischer random would help, should encounter a lot more variety, but the idea doesn't seem to be that popular.
Parent - - By gsgs (***) Date 2018-12-14 09:30
why do you want variety, when most of those positions rarely -if ever- occur
I would think that training on positions from games of to-be-expected opponents
should help.
I would think that training on tactical positions should help
(if you don't want to make a NN-normal hybrid, which looks easier)

Someone should keep a database of positions where Lc0 made mistakes
and where it could be trained on
Parent - - By Labyrinth (*****) Date 2018-12-14 11:10 Upvotes 1

>why do you want variety, when most of those positions rarely -if ever- occur

Well it's not about individual positions, it's about 'understanding' the game. There is only so much variety from the starting position, and when you play strongly you inevitably cut out a great deal of the possible positions. Blind spots form that require 'inferior' play to reach. The way the AI will 'perceive' a stronger opponent is an opponent that appears to play inferior moves that mysteriously work quite well. Having greater variety from the starting position would help patch up these spots.
Parent - By billyraybar (***) Date 2019-01-04 14:08
I agree.  This is ingenious.  Uncovering a single serendipitous blind spot could have a profound impact on play.
Parent - - By Venator (Silver) Date 2018-12-16 07:16
After looking at a lot of LC0 games, there seem to be a few chess topics that are still difficult to detect by a NN engine. In my view AB engines are still superior when it comes to:

1. Detecting perpetual check (LC0 often takes a long time before realising this)
2. Evaluating drawn endings correctly (i.e. showing 0.00, where LC0 still thinks it is clearly better)
3. Tactics (f.e. LC0 made tactical errors vs SF in the TCEC 13 cup and in its loss in TCEC 14 division 3)

Of course both AB and NN engines also have problems in detecting a fortress correctly, which is a rather difficult problem to solve, because there are no clear patterns how to detect one and it takes the 50 move rule to understand that no progress can be made. A similar issue is 'blocked pawn structures', although I only saw one of them in the AlphaZero game samples. It is likely (although I am not 100% sure) that avoiding pawn exchanges increases the chances of winning, but this has a nasty side effect that engines like to completely block the pawn structure, from time to time turning a promising position into a draw. Humans are still better at detecting this phenomenon.

Do you think that more training can solve these issues?

It is a pity that DeepMind only gave the game scores of their training matches with Stockfish and no additional information. It would be interesting to know AlphaZero's 'evaluation' at every move.
Parent - - By gsgs (***) Date 2018-12-16 14:14 Edited 2018-12-16 14:17
for the fortess / no-progress problem  and maybe also for the perpetual/ending problems
it should help to replace the 50-move rule by a 10 or 15 move rule
and 1 repitition = draw.  [except tablebase]

That's a new game, easy to change , and should gain Elo for normal chess [?]
Useful for analysis, correspondence chess

is there any engine available already which can do that ?
Parent - By Lazy Frank (****) Date 2018-12-16 17:54
For the fortress positions humans use exclusion method, e.g. look only for:

1) can i force opponent to exact play? No ... Sad.
2) remaining pawn levers.
3) sacrifice piece to destroy fortress (usually pawn chain) or sacrifice piece for success invasion of other piece/s.
Parent - By h.g.muller (****) Date 2018-12-16 20:58
Actually many fortresses are rather easy to recognize, in a static evaluation. But of course doing so would bring no measurable Elo. So the real problem is to convince engine developers to implement something for that.
Parent - - By gsgs (***) Date 2018-12-11 13:04
what do you mean ?

who is "they"
my knowledge or their knowledge ?
what technique ?
you agree or disagree with my post ?

(replying tp Lazy Frank)
Parent - By Lazy Frank (****) Date 2018-12-11 19:24
I think really people stuck on theory they know how human brain work. :smile:
Up Topic The Rybka Lounge / Computer Chess / News from AlphaZero

Powered by mwForum 2.27.4 © 1999-2012 Markus Wichitill