Rybka Chess Community Forum
Up Topic The Rybka Lounge / Computer Chess / AZ vs SF - game 99
Parent - By Rebel (****) Date 2017-12-23 21:15
Hi there BFL, I will look into it. But perhaps it's an idea for Peter to run it as well; he has the better hardware and the better engines.
Parent - - By Peter Grayson (****) [gb] Date 2017-12-23 21:52
Komodo 11.2.2 preferred 21..Qc7 0.02/38

[Event "?"]
[Site "?"]
[Date "2017.12.01"]
[Round "99"]
[White "AlphaZero"]
[Black "Stockfish"]
[Result "1-0"]
[ECO "E17"]
[Annotator "Grayson,Peter"]
[SetUp "1"]
[FEN "rnb2r2/p3b2p/1ppq1p1k/6p1/Q6P/2P1B1P1/P4PB1/R3R1K1 b - - 0 21"]
[PlyCount "61"]
[EventDate "2017.??.??"]

{Komodo 11.2.2 64-bit:} 21... Qc7 22. hxg5+ fxg5 23. Qh4+ Kg6 24. Qh1 Bf5 25.
Bd4 h5 26. Be4 Qd6 27. Re3 Nd7 28. Rae1 Bxe4 29. Qxe4+ Rf5 30. Qxe7 Qxe7 31.
Rxe7 Nf6 32. R1e6 c5 33. Be3 Rd5 34. Kg2 Rad8 35. Rxa7 R5d6 36. Rxd6 Rxd6 37.
Ra6 Ne4 38. a4 Kf5 39. c4 h4 40. g4+ Kxg4 41. f3+ Kf5 42. a5 Nf6 43. Rxb6 Rd3
44. Bxc5 g4 45. fxg4+ Nxg4 46. Rb1 Rd2+ 47. Kh1 Rh2+ 48. Kg1 Ra2 49. Rf1+ Ke6
50. Bb6 Ne5 51. Rf4 h3 {[%eval 2,38]} 1-0

Komodo 11.2.2 also preferred 27..Bxe4 0.11/40

[Event "?"]
[Site "?"]
[Date "2017.12.01"]
[Round "99"]
[White "AlphaZero"]
[Black "Stockfish"]
[Result "1-0"]
[ECO "E17"]
[Annotator "Grayson,Peter"]
[SetUp "1"]
[FEN "rn3r2/p3b1kp/2p5/1p3bp1/4B3/q1P1B1P1/P4P2/3RR1KQ b - - 0 27"]
[PlyCount "63"]
[EventDate "2017.??.??"]

{Komodo 11.2.2 64-bit:} 27... Bxe4 28. Qxe4 Kg8 29. Qe6+ Rf7 30. Qg4 Na6 31.
Bxg5 Bxg5 32. Rd8+ Rxd8 33. Qxg5+ Kh8 34. Qxd8+ Rf8 35. Qd4+ Kg8 36. Re4 Qc1+
37. Kg2 h5 38. Re5 Nc7 39. Qd6 Ne8 40. Qe6+ Kg7 41. Qe7+ Rf7 42. Rg5+ Kh6 43.
Rxh5+ Kxh5 44. Qxf7+ Kh6 45. Qe6+ Kg7 46. Qxe8 Qxc3 47. Qe7+ Kg6 48. Qxa7 Qc4
49. Qd7 b4 50. Qd6+ Kf5 51. Qf8+ Kg6 52. Qe8+ Kf5 53. Qh5+ Ke6 54. Qh6+ Kf5 55.
Qh7+ Ke5 56. Qe7+ Kf5 57. Qe3 c5 58. Qd2 Kg6 {[%eval 11,40]} 1-0
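The closing `[%eval ...]` comments in the two PGN fragments above appear to hold the engine score in centipawns followed by the search depth, which would match the 0.02/38 and 0.11/40 quoted in the text. A minimal Python sketch, assuming that reading of the format (the name `parse_eval` is hypothetical):

```python
import re

# Assumed format: [%eval <centipawns>,<depth>], as seen in the PGN above.
EVAL_RE = re.compile(r"\[%eval\s+(-?\d+),(\d+)\]")

def parse_eval(comment):
    """Return (score_in_pawns, depth) from a PGN comment, or None if absent."""
    match = EVAL_RE.search(comment)
    if match is None:
        return None
    centipawns, depth = int(match.group(1)), int(match.group(2))
    return centipawns / 100.0, depth

print(parse_eval("{[%eval 2,38]}"))
print(parse_eval("{[%eval 11,40]}"))
```

Other tools may write this annotation differently, so the pattern should be checked against the actual file before relying on it.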
Parent - - By Chris Whittington (**) [fr] Date 2017-12-23 22:18
So what? Is this a Stockfish take back move feature and try something else?
Firstly Stockfish played what it played, and lost
Secondly, what sense is there in asking another AB program to play as AZ
Parent - - By Peter Grayson (****) [gb] Date 2017-12-24 02:47

> So what? Is this a Stockfish take back move feature and try something else?


Where does "take back" feature in the discussion? It seems to me you are trying to subdue discussion by abrasive comment and ridicule. I am interested to find out where Stockfish went wrong, and if Komodo and/or Houdini can provide additional information to help that understanding then it is worthwhile posting for further discussion. You also dismiss the fact that an engine may rate more than one move at roughly equal score, so that on another pass it may choose a completely different move.

> Secondly, what sense is there in asking another AB program to play as AZ


If current alpha-beta engines can reproduce AlphaZero's moves, then what gain has AlphaZero actually made over currently available software?

If you have nothing positive to add to the discussion please feel free to ignore it.
Parent - - By Chris Whittington (**) [gb] Date 2017-12-24 11:36
it only takes one move. or, rather, because there is no AZ move score, nor line, one only gets to see via the "one move".

you can carry on reading my posts, I have lots of insights and interesting things to say. entirely your choice. I'll read yours, maybe I see something of consequence, you never know, but there's an awful lot of wrong.

your comment about second move choice, position Re1 is not senseful. of course materialist AB programs select Ng4 as first move, it's the only way to hope to save the knight. of course, if you deny them Ng4, they then choose Re1 developing the Rook. in this case, these time controls, your comment of "moves of roughly equal score" is absurd and leads to a false conclusion.
Parent - By Peter Grayson (****) [gb] Date 2017-12-24 14:00

> you can carry on reading my posts


I probably will because I am aware of your chess background and I am interested in both sides of the argument, but an abrasive tone makes it far easier to ignore what you have to say.

> your comment about second move choice, position Re1 is not senseful ... your comment of "moves of roughly equal score" is absurd and leads to a false conclusion


The comment was based on my findings using the three top engines to analyse the position. Within the game time control criteria they all assessed Re1 as being 0.3 to 0.5 worse than Ng4, which explains why Re1 was not their first choice. Multiple move analysis also confirmed Re1 was their secondary choice.

19.Re1 does not look to be anything particularly special. Of more interest, perhaps, is move 21..Bf5: in analysis within game time constraints, Stockfish 8 changes to 21..Kg7, Houdini 6.03 prefers 21..b5 and Komodo 11.2.2 chooses 21..Qc7, with all engines scoring the position as equal. Stockfish 8 looked to be already caught up in its 0.00 draw issue that has been commented on in the past.
Parent - - By Lazy Frank (***) [gb] Date 2017-12-24 16:09
So nothing! :lol: If you wanna glorify A0 go ahead! I would say you can't win without a blunder ... Why did SF blunder? Ask the DeepMind people, they know the answer ... :roll:
Parent - By Lazy Frank (***) [gb] Date 2017-12-24 16:17
And sorry to say that ... I have a feeling DeepMind did not provide all 100 games (and all SF losses) just because SF's losing (uncommon) moves would be more visible there!
Parent - - By Chris Whittington (**) [fr] Date 2017-12-23 22:33
After Rf7 there is no point in wasting time with Qg4 and allowing black to return the knight and disentangle, even though it seems everybody's analysis suggests this move. Qc8+ I think works, ideas are Qb7xa8 or if knight disentangles Rd7 and a 7th rank battery on the Be7. It's complicated, but AZ likes complicated
Parent - - By Banned for Life (Gold) Date 2017-12-23 23:54
As you alluded to above, SF8 isn't a good stand in for AZ, but after 30. Qc8+, SF8 sees 30... Bf8 as giving a very drawish position. But that could be because SF8 is so enthralled with having AZ grab the pawn on g5... :lol:
Parent - - By Chris Whittington (**) [gb] Date 2017-12-24 11:21 Edited 2017-12-24 11:58
I looked at Bf8 in detail a week ago. Forgotten what now. But I think my idea was to stop the black Q coming back into defense. i think the problem for white is his own bishop. On e3 it blocks the Re1 entry, on d4 blocks Rd1 and if on g5 it no longer defends f2 after Qxa2. iirc, I found some quiet way to get round the B problem and just increase the pin pressure. it's a "what can black do" position.

If you try lines from here, you find many many almost wins. AZ maximises on almost wins. AB opponent remains happy because AB refutes almost wins. but too many of them and AB just falls apart. Hence the several astonishing quiet moves in AZ play. Our comp chess analysers (human ones) are always looking for tactical shots, always looking to do something. that's the level of their chess play in general, and this is kind of supported by using AB analysers. Hence they don't understand AZ, you need to go to IM and GM level thought patterns to get it.

Rule: you can't understand AZ by using AB.
Parent - - By Banned for Life (Gold) Date 2017-12-24 17:54
> Rule: you can't understand AZ by using AB.

I think this is very true for the reason you originally espoused, which was that AZ seems to be very comfortable working with positions where it is significantly behind in material, whereas any top AB engine will only go to approaches down in material after a deep search showing either a recovery of the material or a likely mate.

After Bf8, SF gives the 0.0 eval at high depths, but as we all know this really doesn't preclude anything. The main reason that I pointed this out was that Ed annotated in the OP: "19... Kxh6 is the game over already by taking the knight?", which clearly couldn't be the case if an easy draw (unconfirmed) was available many moves later without an intervening blunder by AZ...
Parent - - By Chris Whittington (**) [gb] Date 2017-12-24 18:12
Yup. And also all this taking back SF moves and trying something else. The "where did Stockfish go wrong" question. To be answered by AB analysis or playout on a different move.

The where go wrong question is a kind of nonsense. Stockfish didn't "go wrong", it got outplayed. There is a difference.

Why did Stockfish get eaten by the crocodile? Well, it got eaten by the crocodile because AZ took it into the swamp. How did AZ take it into the swamp? Well, because Stockfish can't tell the difference between the swamp, the sandy desert full of snakes, the deep dark dangerous forest, or the peaceful, calm, flat grasslands. MCTS understands dangerous places and seeks them out. AB has no idea of regions, all AB desires is one single pathway. This may be beyond the comprehension of many people, but it's the crux of the issue.
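One way to picture the "regions versus single pathway" contrast is a toy one-ply tree in which each candidate move is scored either by its worst reply (the alpha-beta view) or by the average over its replies (roughly how MCTS backs up values). This is only a sketch under strong assumptions: every number below is invented, and a real AlphaZero-style search is guided by a neural network rather than a plain average.

```python
# Hypothetical toy values (White's winning chances, 0..1) for each Black reply.
# All numbers are invented purely for illustration.
tree = {
    "sharp move": [0.9, 0.9, 0.9, 0.3],  # many near-wins, one accurate defence
    "safe move": [0.4, 0.4, 0.4, 0.4],   # quiet, nothing much happening
}

def minimax_value(replies):
    """Alpha-beta style backup: assume Black always finds the best reply."""
    return min(replies)

def mean_value(replies):
    """Averaging backup, in the spirit of MCTS: score the whole region."""
    return sum(replies) / len(replies)

def best_move(tree, backup):
    """Pick the move whose backed-up value is highest for White."""
    return max(tree, key=lambda move: backup(tree[move]))

print(best_move(tree, minimax_value))  # safe move
print(best_move(tree, mean_value))     # sharp move
```

With these made-up numbers the worst-case backup avoids the "swamp" because one defence exists, while the averaging backup walks into it because most replies lose.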
Parent - - By Banned for Life (Gold) Date 2017-12-24 22:00 Upvotes 2
There is another way to think about it that was discussed on this forum way back when Vas was a regular contributor regarding whether it was more important for an engine to make really great moves, or to not make really terrible moves. :lol: Of course everybody agreed that it was more important for Elo to not make terrible moves, and this was probably a contributing factor to AB based engines playing chess that is generally about as exciting as watching the grass grow.

Because AZ doesn't start off focusing on material, it is free to make really great moves that lose material without a clear path to regaining the material or mate. This leads to much more interesting games. Of course AZ also has to avoid blunders to play against top ranked AB engines.

The most exciting possibility with NN based engines is that if chess has different approaches to equally good results, we may end up with engines which are roughly equal in strength, yet play very different styles of chess. In any event, there is no reason to believe that AZ's performance is anywhere near what is possible with improved algorithms and hardware, and that's a good thing!
Parent - By MarshallArts (***) [us] Date 2017-12-25 00:13

> Of course everybody agreed that it was more important for Elo to not make terrible moves, and this was probably a contributing factor to AB based engines playing chess that is generally about as exciting as watching the grass grow.


Precisely what I've been thinking. Their gain, our loss. :yell:
Parent - By Carl Bicknell (*****) [gb] Date 2017-12-24 15:47

> I have been analyzing this (beautiful) game with Komodo 9.42 and the latest SF version running the annotated moves for hours. None of them plays the AZ moves.


That's interesting. One of my biggest gripes was they didn't allow SF to grow its tree to a decent size as it could have done by spending 10 minutes on a critical opening position. But if what you're saying is true it may not have made too much difference if a longer and more flexible TC was used.
Parent - - By CSullivan (**) [us] Date 2017-12-24 19:56
Please note that Black has a possible draw as late as move 34.  (Perhaps White can avoid the exchanges that lead to the drawn position, but this is the way it looks to me right now):
rn2r1q1/p5k1/2p3p1/1p4p1/2PR4/6PQ/5PK1/7R b - - 0 34

34...a6 (and after an overnight search of 65 plies, Stockfish8 rates White as +2.65 and gives) 35.Rd3 Ra7 36.Rf3 Qh8 37.Qg4 Qxh1+ 38.Kxh1 Rd7 39.Qxg5 Rd1+ 40.Kg2 Nd7 41.Ra3 bxc4 42.Rxa6 Ne5 43.Qe3 Kf7 44.Ra7+ Rd7 45.Qf4+ Kg7 46.Rxd7+ Nxd7 47.Qxc4 c5 48.Qd5 Re7 49.f4 c4 50.Qd4+ Kh6 51.Qxc4
8/3nr3/6pk/8/2Q2P2/6P1/6K1/8 b - - 0 51

This is almost certainly a draw.  Consider the following position that might eventually be reached:
8/2kn4/5r2/5PK1/4Q3/8/8/8 w - - 0 66

This 6-man endgame position is a draw, according to the endgame tables.  Yet even with the appropriate Syzygy tables loaded, Stockfish8 always gives White at least a +2.00 advantage.  It seems that Stockfish could work to improve its evaluation of unbalanced endgame positions.
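Whether a position falls within the range of a given tablebase set can be checked by counting the pieces in the board field of its FEN. A small sketch using only the Python standard library (the helper name `count_men` is hypothetical):

```python
def count_men(fen):
    """Count all pieces (kings included) in the board field of a FEN string."""
    board = fen.split()[0]
    # In the board field, letters are pieces; digits and '/' are empty squares
    # and rank separators.
    return sum(1 for ch in board if ch.isalpha())

print(count_men("8/2kn4/5r2/5PK1/4Q3/8/8/8 w - - 0 66"))     # 6
print(count_men("8/3nr3/6pk/8/2Q2P2/6P1/6K1/8 b - - 0 51"))  # 8
```

So the position after 51.Qxc4 still has 8 men, and only the later position is small enough for a 6-man probe.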
Parent - - By Chris Whittington (**) [gb] Date 2017-12-25 18:02
34. g4 might maintain the white pressure
Parent - - By Chris Whittington (**) [gb] Date 2017-12-26 11:42
haha!! I see some guy has now posted, several hours later, this g4 idea (generated by my head btw) into CCC asking for AB analysis. Looks like AB engines don't find g4, but it's the idea that holds the advantage. I think (without computer analysis). The response in CCC indicated g4 good.

unless anyone with AB engine can find a refutation line?
Parent - - By CSullivan (**) [us] Date 2017-12-26 23:41
Hi Chris,
I think I found the thread ("AlphaZero SF game 10 Does your engine fine 29Qh3?") you are talking about over at talkchess.com.  I see discussion about 35.g5 winning against 34...bxc4.  I also see that both F. Bluemers and Eelco de Groot assume (mistakenly, according to my analysis) that a quick analysis (only 42 seconds, I think) conducted by de Groot using Kaissa (24 0:42 +1.69 34...a6 35.Rd3 Ra7 36.Rf3 Qh8 37.Qg4 Qxh1+ 38.Kxh1 Rd7 39.Qxg5 Rf7 40.Rf4 Nd7 41.Rxf7+ Kxf7 42.Qf4+ Nf6 43.Qc7+ Re7 44.Qxc6 bxc4 45.Qxc4+ Re6 46.Kg2 Ke7 47.Qc7+ Nd7 (96.358.446)) was winning for White. 
I looked at your idea of 34...a6 35.g4 for a few hours, but I couldn't find anything other than a draw; one line is 35...Re7 36.Qh6+ Kf7 37.Qxg5 Nd7 38.Rf4+ Ke8 39.Rh6 Nf8 40.Qf6 Re6 41.Qa1 g5 42.Rxf8+ Kxf8 43.Rh8 Rd8 44.Rxg8+ Kxg8 45.Qxa6 bxc4 46.Qxc4 Rd5
6k1/8/2p1r3/3r2p1/2Q3P1/8/5PK1/8 w - - 0 47

and this looks to be a draw.
Charles Sullivan
Parent - By Chris Whittington (**) [gb] Date 2017-12-27 11:38
Hi Charles,

yes, thanks, g4 seems to nearly but not quite make it. this may be an interesting position to demonstrate the AB paradigm Stockfish ability to find a line to pick its way through the traps, and the MCTS ability to hold the tension and possibilities in place.

the problem for white is that "doing something" reduces or changes his potential to "do other somethings". for example, while g4 opens the third rank for the queen to swing over to c3, discover check threats and so on, it removes the Qh3c8 diagonal plays. Early Qh6 checks are too committing and remove many other possibilities.

ok, g4 commits too soon.
so, back to Rd3 .... Rd6, if Rd8 then back to the main line with pawn now a3 - is this still ok for white? Ra7 freedom now doesn't work.
I'm trying to find ways to quietly manoeuvre and in essence find a zugzwang ... threats are things now like g4 and Qd3, whilst still preventing black from developing
Parent - By Chris Whittington (**) [fr] Date 2017-12-27 16:02
Also, after a6 try Qh6+ Kf6 Rh5, which possibly continues Kf7 Rxg5 Re6 Rgg4 g5; then I am not too sure, but we have still prevented black developing. Maybe Qh1 and swing over to the other side, Rd1 threat double rooks d file or Qh1-d1 idea, probably not enough, but white has many many attack lines in this game. Nearly but not quite. So AB search with time enough perhaps can find the saving lines. Interesting.
Parent - - By CSullivan (**) [us] Date 2017-12-27 17:49
Chris & others,
For what it's worth, besides the drawing line 34...a6 35.Rd3 Ra7 36.Rf3 Qh8 37.Qg4 Qxh1+, etc., earlier in the game it looks like Black could have played a different 32nd move and still drawn: 32...a5 33.Rd6 bxc4 34.Qh6+ Kf7 35.Bxg5 Be5 36.Rhd1 Ra7 37.Qh4 Re8 38.Rd8 c5 39.Qxc4+ Kg7 40.Qh4 Rxd8 41.Qh6+ Kf7 42.Rxd8 Qxd8 43.Bxd8 Nd7 44.Qg5 Bf6 45.Bxf6 Nxf6 46.Qxc5 Rd7 47.Qxa5 Rd5
8/5k2/5np1/Q2r4/8/6P1/5PK1/8 w - - 0 48

and we have reached a likely draw.  (NOTE: Stockfish8 does not like this drawish variation and rates it at +2.59.  Houdini3 rates it as about +3.00).

Now for a few opinions...
(1) AlphaZero has shown us that there are a lot more interesting non-losing moves than we thought.  Trading material for a (sometimes small) long-term positional advantage seems worthwhile more often than we imagined.
(2) Looking at the performance curves in the AlphaZero paper, it appears that both AlphaZero and Stockfish have very little room for improvement (AlphaZero achieved Stockfish strength after 300,000 steps but did not gain much after another 400,000 steps).
(3) I think that it would be a very interesting match between AlphaZero & Houdini (or Stockfish or Komodo) if the alpha-beta program could (a) use a custom opening book, (b) use its time management so that in difficult positions it could use more than a fixed amount of time, (c) use bigger hardware for which it has been tuned [Tord Romstad pointed out that Stockfish has not been tuned for large configurations such as the 64 threads when it played AlphaZero].
(4) I suspect that AlphaZero might still win because the alpha-beta programs evaluate materially unbalanced endgames HORRIBLY.  As we have seen in this game, Stockfish avoids drawish endgames because it rates them lost.  This is a huge disadvantage which AlphaZero can exploit.
Charles Sullivan
Parent - - By Chris Whittington (**) [fr] Date 2017-12-27 20:50
hand coding fortress detection is not a task I would like to do. nor expect to be successful, just way too many tricky little exceptions to handle even when most of the cases in the main code worked. It's a lot of cases. I remember only too well the bugs and special cases in mate-at-a-glance coding. Fortress is an order of magnitude tougher a task, and without much Elo incentive. Anyway, the one below is only an 8-man EGTB, expect that to arrive first ;-)
Parent - - By Venator (Silver) [nl] Date 2017-12-28 06:13 Edited 2017-12-28 06:26
Chris,

Since the A0 paper came out, I have been wondering about the following: when I look closely at the games played in TCEC (or in other tournaments/matches posted on CCC), I notice 3 areas where the current top engines can still be improved quite a lot:

1. Pawn up endings
2. Fortresses
3. Pawn structures / levers, i.e. keep the pawn structure fluid and flexible instead of blocking all possible pawn moves

If current hardware is simply not powerful enough yet to mimic the A0 learning method, couldn't you just build a NN to tackle the above 3 partial patterns in chess? I.e. don't start with the complete game of chess, but only a very small subset of chess? Perhaps 2. Fortresses might be too big and too complicated, but the pawn down endings should be an interesting subject to explore. You can even make smaller subsets of this issue, e.g. by starting with "rook vs rook plus extra pawn, with all pawns on the same side", like rook+3 pawns vs rook+2 pawns.

Currently the top engines still overestimate their chances in pawn up endings by a large margin, saying e.g. "+0.70" while in fact any human sees instantly that the ending is drawn. BTW, I also wonder what negative impact the misevaluation of these endings has on middlegame search: engines will dismiss saving possibilities into such endings, favoring middlegames which 'only' show -0.50, because they think this is better than such a -0.70 ending, and so miss lots of drawing possibilities for themselves and the opponent. I am of the opinion that such easy pawn up endings should be either won or drawn, thus an eval like "+0.70" isn't very helpful: it should be 0.00 or some high number.
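The selection effect described here can be shown with a two-option toy example. Only the -0.50 and -0.70 figures come from the post; the "true" values are invented assumptions, with 0.00 standing for a drawn ending.

```python
# Evaluations from the defender's point of view, in pawns.
# "engine" uses the -0.50 / -0.70 pair from the post above; the "true" values
# are hypothetical and only illustrate the argument.
options = {
    "stay in the middlegame": {"engine": -0.50, "true": -1.50},
    "simplify to a pawn-down rook ending": {"engine": -0.70, "true": 0.00},
}

def pick(options, key):
    """Return the option with the highest score under the given evaluation."""
    return max(options, key=lambda name: options[name][key])

print(pick(options, "engine"))  # stay in the middlegame
print(pick(options, "true"))    # simplify to a pawn-down rook ending
```

Under these assumptions the engine's own numbers steer it away from the drawing simplification, which is the missed saving possibility the post is worried about.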

In the TCEC chat someone connected to the Stockfish project told me that they tried to incorporate several pawn down rook endings in the eval, simply evaluating them as drawn. But this cost Elo. Of course I think they didn't do it correctly, as the correct evaluation of such endings obviously should gain Elo. It would be interesting to know if you can give a NN the task of learning which patterns in such endings are drawn and which ones are not.
Parent - By Chris Whittington (**) [fr] Date 2017-12-28 12:08
I imagine that "we", now, could probably make an 8-man EGTB ANN, which could cover quite a lot of unbalanced endings and their associated fortresses. Well, imagine anyway. Trouble would be that a call would require the s l o w ANN calculation and slow up the general search of, say, Stockfish. There won't be any cache benefit either for subsequent calls for other positions. One idea might be to build a special hash for this, so only the first call on a position was slow. Feasible maybe.
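The "special hash" idea could look something like the sketch below: a dedicated table in front of the slow probe, so that each position pays for the network call only once. Everything here is hypothetical, including `slow_ann_probe`, which is just a stand-in that returns a fixed value, and the choice to ignore the move counters when building the key.

```python
def slow_ann_probe(fen):
    """Stand-in for an expensive network evaluation (hypothetical)."""
    return 0.0  # placeholder verdict: 0.0 = draw

class ProbeCache:
    """Dedicated hash table so each position triggers the slow call only once."""

    def __init__(self, probe):
        self.probe = probe
        self.table = {}
        self.slow_calls = 0

    def lookup(self, fen):
        # Key on the first four FEN fields (board, side to move, castling,
        # en passant) so that the move counters do not cause spurious misses.
        key = " ".join(fen.split()[:4])
        if key not in self.table:
            self.slow_calls += 1
            self.table[key] = self.probe(fen)
        return self.table[key]

cache = ProbeCache(slow_ann_probe)
cache.lookup("8/2kn4/5r2/5PK1/4Q3/8/8/8 w - - 0 66")
cache.lookup("8/2kn4/5r2/5PK1/4Q3/8/8/8 w - - 4 70")  # same position: cache hit
print(cache.slow_calls)  # 1
```

Dropping the halfmove clock from the key is a simplification; an engine that cares about the 50-move rule would have to handle that separately.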
Parent - By Banned for Life (Gold) Date 2017-12-27 22:14 Edited 2017-12-27 22:18
> (2) Looking at the performance curves in the AlphaZero paper, it appears that both AlphaZero and Stockfish have very little room for improvement (AlphaZero achieved Stockfish strength after 300,000 steps but did not gain much after another 400,000 steps).

You're probably ascribing too much significance to the learning curve, which always has the same characteristic shape asymptotically approaching its maximum Elo. The implication that AlphaZero won't improve much more in its current state might be accurate, but it's highly unlikely that the AlphaZero learning algorithms are anywhere close to optimal. Using DeepMind's Go engine example, their new program beat the original (the one that convincingly beat the world champion) 100-0 in a match. There is no reason to believe that a much better chess engine isn't possible as well...