Rybka Chess Community Forum
Up Topic The Rybka Lounge / Computer Chess / Marin's K+P Analysis Error & 5-Pawn Swings by Engines
- - By KWRegan (*) Date 2008-02-04 02:45
A week ago I found a major hole in GM Mihail Marin's big analysis of the Adams-Polgar Corus Rd. 12 King and Pawn ending on ChessBase at http://www.chessbase.com/newsdetail.asp?newsid=4416.  In this position:

8/7p/6p1/p1k2p2/1pp2P2/P1P2P1P/2P5/2K5 w - - 0 38


Marin gave only 38.axb4, but 38.Kb2! holds.  He may have missed this for two reasons: usually allowing an outside passer is suicide, and Fritz (my 9 & DF10 with 1024MB hash; he used 11) instantly sees this as a 2-pawn blunder and gives Black 3.20+ to depth 20 or so.  When my DF10 is run a long time, it eventually flatlines to 0.00.  HIARCS 11.2 and Shredder 9.1 give similar high evals, but Rybka 2.2n2, 2.3.1, and 2.3.1LK give only about -1.00 all the time, while Rybka 2.3.2a gives -2.09 or thereabouts---with most of my other engines in the mid-to-high -1.nn's.

My own analysis posted on Dennis Monokroussos' blog "The Chess Mind" at http://chessmind.powerblogs.com/posts/1201988305.shtml (game link http://chessmind.powerblogs.com/files/waz2008_adams_polgar_by_kwr.htm, my full PGN downloadable at http://www.cse.buffalo.edu/~regan/chess/analyses/AdamsPolgar108full.pgn) includes a position in which the same defensive idea causes far bigger swings by engines:

8/1k6/7p/ppp2p2/2pP1p1P/2P2P2/PKP5/8 w - - 0 51


I've observed DF10 give Black 4.70--5.22 past depth=30 many times, but White to move can hold by 51. dxc5 Kc6 52. a3!  Kd5 53. Kb1! Kxc5 54. Kc1 Kd5 55. Kd2 Kc6 56. Kd1! Kd6 57. Kd2 Kd5 58. Kd1! Ke6 59. Ke2 Kf6 60. Kf2 Kg6 61. Kg2 Kh5 62. Kh3 a4 63.Kh2!  As after 38.Kb2, White gets counterplay along the Rook's file and Black cannot win any queening races.  I have "proved" this by running DF10 at 63.Kh2 until it says 0.00, then backtracking this eval to the beginning.  However, HIARCS 11.2MP persists with 9.03 to Black even when I play some more moves after 63.Kh2!

I wonder if this is a kind of record for a drawn position that is "normal", i.e. not featuring extra material that is blockaded or subject to stalemate defenses, and not with winnable Queening races or one side a Queen down but holding.  What other examples are out there?

Thanks for replies, and enjoy!  ---Ken Regan (IM)
Parent - - By BB (****) Date 2008-02-04 14:46
One of the engine problems here is considered in a relatively recent paper: Blockage Detection in Pawn Endings [SpringerLink] by Tabibi, Felner, Netanyahu. This, and a companion paper (with titles jiggled a bit), are available from Tabibi's weblinks.
Parent - - By KWRegan (*) Date 2008-02-05 21:35
Thanks!  My situation is trickier than what they consider because (a) it's not a perfect blockage---there are some Queening races to evaluate, and (b) the eval swings I get are much higher.  They give a +5.54 from Shredder 7.04, but all the rest are below +4.00, which after much experience remains my minimum threshold for signing off as "winning".

I wonder if it's now just as efficient for engines, after sensing blockage or triggered by getting lots of transpositions early on, to enter a "Freezer"-style tablebasing mode, but using (say) an 8-ply eval at points where a capture or pawn move is made.  With just K vs. K the 'base is really small, and one could re-enter that mode after pawn moves and/or captures like ...Kxc5 here.  (Does "Freezer" have an option to use N-ply search results at end nodes?---a requisite for me, IMHO.)  This strikes me as properly more general than what they do in those papers.
Parent - - By Mark (****) Date 2008-02-06 01:48

> +4.00, which after much experience remains my minimum threshold for signing off as "winning".


I think you have to be very careful about declaring a victory with a +/- 4.00.  For example, in king and pawn endgames, Rybka could have a +10 and still not win.
Parent - - By KWRegan (*) Date 2008-02-06 02:58
Oh indeed---what I meant is that I never sign off winning if the engine is below +4.00, and then only if my brain agrees.  I do tend to trust DF10 on evals over +5.00.  The above is the first violation of that I've encountered (in a "normal" position) in a lot of endgame analysis.  

Rybka 2.3.2a has many cockamamie endgame evals, many more than Rybka 2.2n2, say.  Is this generally known to this group?  I've seen posts to that effect but not a consensus.
Parent - - By tano-urayoan (****) Date 2008-02-06 17:38
"Rybka 2.3.2a has many cockamamie endgame evals, many more than Rybka 2.2n2, say.  Is this generally known to this group?  I've seen posts to that effect but not a consensus."

I think this is due to the endgame module that was added to this version. The author (Mr. Rajlich) was not satisfied with it as an improvement tool.
Parent - - By Vempele (Silver) Date 2008-02-06 18:30

> I think this is due to the endgame module that was added to this version.


It was only for pawn endings.
Parent - By BB (****) Date 2008-02-06 21:22

> It was only for pawn endings.


I think that the UB explanation was that Rybka is counting pawns as unstoppable when the opponent might have a pawn formation which has no static passed pawn, but will create one. For instance:
k7/p5p1/8/6PP/8/8/8/7K w - - 0 1

The claim would be that some versions of Rybka might (statically) evaluate this as strongly favourable for Black due to the unstoppable pawn on a7, while ignoring White's chances. His quote: "I evaluate a pawn to be unstoppable in pawn endgame only if it is more advanced than all of the opponents pawns that seem to have chances to promote."
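
As an illustration only (this is not Rybka's code), here is a minimal Python sketch of the quoted rule applied to the position above. The function names, the `candidates` switch, and the choice of "static passed pawns" as the set of opposing pawns with chances to promote are all assumptions made for the example. King positions, the square of the pawn, and the double step from the starting rank are ignored.

```python
def parse_pawns(fen):
    """Return {'w': [(file, rank), ...], 'b': [...]} from a FEN's placement field."""
    pawns = {"w": [], "b": []}
    placement = fen.split()[0]
    for row, rank_str in enumerate(placement.split("/")):
        rank = 8 - row              # FEN lists rank 8 first
        file = 0                    # 0 = a-file, 7 = h-file
        for ch in rank_str:
            if ch.isdigit():
                file += int(ch)     # run of empty squares
            else:
                if ch == "P":
                    pawns["w"].append((file, rank))
                elif ch == "p":
                    pawns["b"].append((file, rank))
                file += 1
    return pawns


def steps_to_promote(pawn, color):
    """Single-step count to the promotion rank (double step ignored)."""
    _, rank = pawn
    return 8 - rank if color == "w" else rank - 1


def is_passed(pawn, color, enemy_pawns):
    """Static test: no enemy pawn ahead on the same or an adjacent file."""
    file, rank = pawn
    for ef, er in enemy_pawns:
        ahead = er > rank if color == "w" else er < rank
        if abs(ef - file) <= 1 and ahead:
            return False
    return True


def flagged_unstoppable(fen, color, candidates="static_passers"):
    """Hypothetical reading of the rule: flag a passed pawn only if it is more
    advanced than every opposing pawn in the chosen candidate set."""
    pawns = parse_pawns(fen)
    enemy = "b" if color == "w" else "w"
    own, opp = pawns[color], pawns[enemy]
    if candidates == "static_passers":
        rivals = [p for p in opp if is_passed(p, enemy, own)]
    else:                           # "all": treat every opposing pawn as a rival
        rivals = list(opp)
    best_rival = min((steps_to_promote(p, enemy) for p in rivals), default=None)
    return [p for p in own
            if is_passed(p, color, opp)
            and (best_rival is None or steps_to_promote(p, color) < best_rival)]


FEN = "k7/p5p1/8/6PP/8/8/8/7K w - - 0 1"
print(flagged_unstoppable(FEN, "b", "static_passers"))
print(flagged_unstoppable(FEN, "b", "all"))
```

With only static passed pawns counted as rivals, White's g5 and h5 pawns are both held back on paper by g7, so the a7 pawn gets flagged. Once every White pawn is counted as a rival, a7 is no longer the most advanced pawn and the flag disappears, which is the kind of discrepancy the claim above describes.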
Parent - By Mark (****) Date 2008-02-06 18:16
Thanks.  I should have realized that you were aware of this, given your extensive efforts to analyze some of these endgames.

What you really need is the full set of 6 man tablebases to help with endgames.  (I've put off downloading the 6 man tablebases in the hopes that someone will compute the 6 man bitbases...)
Parent - By BB (****) Date 2008-02-06 12:03
Two things I can think of:

* You might be interested in the forum discussion of the K+8P tournament that Werewolf ran 2 months ago. There were some bad misevals noted (cockamamie as you say).

* 0.00 need not be a safe score, due to repetitions and hashing [this could also presumably cause mis-evals in subtrees]. The Graph History Interaction problem is solved in theory it seems, but I have my doubts (to say the least) as to whether this is widely applied in a rigorous manner. I don't know of any examples of errors offhand, as most engines won't hash rep-scores, which ameliorates the problem to a large extent.

> Does "Freezer" have an option to use N-ply search results at end nodes?---a requisite for me, IMHO.


I don't know anything in particular about Freezer, other than its basic description. In any case, let me try to re-write your scheme.

* Make rules for a position. This determines a set S of positions of presumed wins.
* Enlarge S to T via retrograde analysis (with distance-to-convert info). The complement of T should be essentially non-wins.
* Using T as an oracle for searching (it gives best-play/ordering presumably), walk down the mainline until an S-position is reached. Get a better estimate of the score for the S-position. If it seems won, no worries. If it seems not-won, then update T accordingly. If it seems "sorta won", then maybe (just guessing here) use a technique similar to Enhanced Transposition Cutoff to see if a different path from this node is definitely won (if everything looks hairy, search deeper, ask the user, etc.). Keep doing this until a win is "proven" (in your metric).

I can see this as being feasible, and it is probably not that far beyond what Freezer is doing [the update of T requires some care if it is to be efficient]. This should be better than (say) originally searching (to 8 ply) all S-positions, as our idea is that T is rather small, and so the tree of interest is not too bushy.
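
To make the S-to-T step concrete, here is a toy Python sketch of retrograde analysis on an abstract game graph. It is not Freezer's algorithm and involves no chess: the node names, the `successors` and `attacker_to_move` tables, and the `retrograde` function are all hypothetical. An attacker-to-move node joins T when some successor is in T, a defender-to-move node when all of its successors are, and the stored number is the distance in plies to the nearest forced S-position.

```python
def retrograde(successors, attacker_to_move, seed_wins):
    """Grow the seed set S into T, returning {node: distance-to-convert}.

    successors: dict mapping each non-seed node to its list of successors.
    attacker_to_move: dict mapping each such node to True/False.
    seed_wins: the set S of positions presumed won for the attacker.
    """
    dist = {s: 0 for s in seed_wins}
    while True:
        new = {}
        # Each pass only reads distances found in earlier passes, so a node
        # added in pass k gets distance exactly k.
        for node, succs in successors.items():
            if node in dist or not succs:
                continue            # already in T, or no moves (not counted as a win)
            if attacker_to_move[node]:
                known = [dist[s] for s in succs if s in dist]
                if known:           # attacker picks the fastest conversion
                    new[node] = 1 + min(known)
            elif all(s in dist for s in succs):
                # defender cannot escape T and picks the slowest line
                new[node] = 1 + max(dist[s] for s in succs)
        if not new:
            return dist
        dist.update(new)


# Toy graph: A and D are attacker-to-move, B and C are defender-to-move.
successors = {
    "A": ["B", "C"],
    "B": ["S1", "D"],   # the defender can sidestep into the B <-> D loop
    "C": ["S1", "S2"],
    "D": ["B"],
}
attacker_to_move = {"A": True, "B": False, "C": False, "D": True}

T = retrograde(successors, attacker_to_move, {"S1", "S2"})
print(T)

# Suppose a deeper search shows S1 is not won after all: the crude
# "update T" here is simply to recompute from the reduced seed set.
T_updated = retrograde(successors, attacker_to_move, {"S2"})
print(T_updated)
```

In the first run A and C land in T, while B and D stay outside because of the loop. After S1 is dropped from S, only S2 remains. Recomputing from scratch is the inefficient way to update T; doing it incrementally is where the care mentioned above would come in.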

> My situation is trickier than what they consider because (a) it's not a perfect blockage---there are some Queening races to evaluate


FALCON is a general-purpose engine wrapped around the "blockage" code, so it would presumably be able to handle queening races, etc., with the blockage only kicking in when needed. I have no idea how good it would be.