Rybka Chess Community Forum
Rybka Support & Discussion / Rybka Discussion / Rybka eval scores vs winning % (actually Elo)
- - By Vasik Rajlich (Silver) Date 2008-08-10 13:21
Larry comments about this topic (also added to the FAQ):

> Quick question: what is in your view the best mapping between Rybka scores
> and winning percentages?
>


     It wasn't a quick question; I had to compile a lot of data to answer
it. The simplest reasonable assumption to make is that Rybka's score is
proportional to the Elo rating difference (which you know how to convert to
a percentage). I think it's a good enough assumption for most practical
purposes. Then, based on 8-ply Monte Carlo data I've got from a variety of
positions from opening, middlegame, and endgame (but with most of the pawns
on the board), I recorded the ratio of Elo points to advantage in pawns for
those positions (21 of them) that showed more than 150 Elo difference in the
results. The ratios ranged from 180 to 385 and averaged 280.7. However, in
general the ratio rises with depth, a one-pawn advantage being more certain
to win with more depth. Probably for real time games the figure would be at
least 300. So roughly we can say that each centipawn of Rybka eval is worth
3 Elo points on average. My earlier work on opening evals of the different
versions used 2.5 Elo per centipawn, but this value was pretty much just
based on the opening position, so I would have much more confidence in the 3
Elo figure.
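
A minimal sketch of the ratio described above, assuming the Monte Carlo score of each position is converted to an Elo difference with the standard logistic Elo formula and then divided by the eval in pawns. The function names and the three sample positions are hypothetical and are not Larry's actual data.

```python
import math

def elo_diff_from_score(score_pct: float) -> float:
    """Invert the standard Elo expectancy formula: score in percent -> Elo difference."""
    p = score_pct / 100.0
    if not 0.0 < p < 1.0:
        raise ValueError("score must be strictly between 0 and 100 percent")
    return 400.0 * math.log10(p / (1.0 - p))

def elo_per_pawn(score_pct: float, eval_pawns: float) -> float:
    """Ratio of the Elo difference to the engine's advantage in pawns."""
    return elo_diff_from_score(score_pct) / eval_pawns

# Hypothetical inputs (NOT from the thread): (Monte Carlo score in %, eval in pawns).
samples = [(75.0, 0.70), (85.0, 0.95), (90.0, 1.40)]

# Keep only positions showing more than 150 Elo difference, as in the post.
kept = [(s, e) for s, e in samples if elo_diff_from_score(s) > 150.0]
ratios = [elo_per_pawn(s, e) for s, e in kept]
average_ratio = sum(ratios) / len(ratios)

print([round(r, 1) for r in ratios], round(average_ratio, 1))
```

Dividing the averaged ratio by 100 gives the Elo-per-centipawn figure discussed in the post.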
Parent - - By George Tsavdaris (****) Date 2008-08-10 16:30 Edited 2008-08-10 16:32
Hmm this is interesting.

>I recorded the ratio of elo points to advantage in pawns for those positions (21 of them) that showed more than 150 Elo difference in the results.


I guess he means the ratio of the Elo difference (not of Elo points as such) to the advantage in pawns.

>My earlier work on opening evals of the different versions used 2.5 Elo per centipawn, but this value was pretty much just based on the opening  position, so I would have much more confidence in the 3 Elo figure.


Oh, I've made a table for that:


Expected score | Rybka's eval (pawns) | Expected score minus opponent's expected score
      55 %             0.12                  10 %
      60 %             0.23                  20 %
      65 %             0.36                  30 %
      70 %             0.49                  40 %
      75 %             0.64                  50 %
      80 %             0.80                  60 %
      85 %             1.00                  70 %
      90 %             1.27                  80 %
      95 %             1.71                  90 %
      98 %             2.25                  96 %


Also:
  
  Rybka's eval (pawns) | Expected score | Expected score minus opponent's expected score
       0.01                 50.43 %            0.86 %
       0.10                 54.3 %             8.6 %
       0.20                 58.5 %             17.1 %
       0.30                 62.7 %             25.3 %
       0.40                 66.6 %             33.2 %
       0.50                 70.3 %             40.7 %
       0.80                 79.9 %             59.8 %
       0.90                 82.6 %             65.1 %
       1.00                 84.9 %             69.8 %
       1.10                 87.0 %             74.0 %
       1.20                 88.8 %             77.6 %
       1.30                 90.4 %             80.8 %
       1.40                 91.8 %             83.6 %
       1.50                 93.0 %             86.0 %
       1.60                 94.1 %             88.1 %


The tables are based on these assumptions:
•Rybka's evaluation score is proportional to the Elo rating difference.

•A difference of 3 Elo corresponds to 0.01 (one centipawn) of Rybka's evaluation score.

•An Elo difference X is connected to the score expectancy P(X) by the formula:
P(X) = 100/(1+10^(-X/400))   where the result is given in %.

•Substituting X = 300·R, the score expectancy as a function of Rybka's evaluation score R (in pawns) is:
P(R) = 100/(1+10^(-3·R/4))   where the result is given in %.
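
The formulas above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (300 Elo per pawn, logistic Elo expectancy). The function names are mine, and the last column is computed as the expected score minus the opponent's expected score, i.e. 2·P(R) - 100.

```python
ELO_PER_PAWN = 300.0  # assumption from the thread: 3 Elo per centipawn

def expected_score_pct(eval_pawns: float) -> float:
    """P(R) = 100 / (1 + 10^(-3R/4)), with R given in pawns."""
    elo = ELO_PER_PAWN * eval_pawns
    return 100.0 / (1.0 + 10.0 ** (-elo / 400.0))

def score_margin_pct(eval_pawns: float) -> float:
    """Own expected score minus the opponent's expected score, in percent."""
    return 2.0 * expected_score_pct(eval_pawns) - 100.0

# Print a few rows in the same layout as the second table.
for r in (0.10, 0.50, 1.00, 2.25):
    print(f"{r:5.2f}  {expected_score_pct(r):6.2f} %  {score_margin_pct(r):6.2f} %")
```

Changing `ELO_PER_PAWN` to 250.0 gives the mapping for the earlier 2.5 Elo per centipawn figure.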
Parent - By lkaufman (*****) Date 2008-08-10 17:54
Yes, your assumptions are all correct and I believe you did everything perfectly. Thank you for producing these tables.
Parent - - By Vasik Rajlich (Silver) Date 2008-08-11 20:16
Thanks, this is linked from the FAQ.

FWIW - I suspect that the mapping from +2.25 to 98% is too optimistic. Different things are happening in that part of the curve.

Vas
Parent - - By lkaufman (*****) Date 2008-08-12 15:01
I don't agree. My results for knight odds for the starting position show an eval of -2.37 (after 18 ply), with MC results of 98.6% at 8 ply and 99.7% at 10 ply. If anything it seems that 98% may be too low for 2.25 eval. Maybe over all positions you might be right if we go by average rather than median results, because there are a few positions (mostly endgame) where the eval may be way off (i.e. drawn endgames with this eval).
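
As a rough check of the knight-odds numbers against the 3 Elo per centipawn mapping, here is a small sketch. It assumes the logistic Elo formula and uses the magnitude of the eval (2.37); the function names are illustrative only.

```python
import math

def expected_score_pct(eval_pawns: float, elo_per_pawn: float = 300.0) -> float:
    """Expected score in percent for a given eval, assuming a fixed Elo-per-pawn ratio."""
    return 100.0 / (1.0 + 10.0 ** (-elo_per_pawn * eval_pawns / 400.0))

def implied_elo_per_pawn(score_pct: float, eval_pawns: float) -> float:
    """Elo-per-pawn ratio implied by an observed score at a given eval."""
    p = score_pct / 100.0
    return 400.0 * math.log10(p / (1.0 - p)) / eval_pawns

EVAL = 2.37  # magnitude of the knight-odds eval quoted above

predicted = expected_score_pct(EVAL)          # what the 300 Elo/pawn mapping predicts
at_8_ply = implied_elo_per_pawn(98.6, EVAL)   # ratio implied by the 8-ply MC result
at_10_ply = implied_elo_per_pawn(99.7, EVAL)  # ratio implied by the 10-ply MC result

print(round(predicted, 2), round(at_8_ply, 1), round(at_10_ply, 1))
```

Under these assumptions the prediction comes out slightly below the 8-ply Monte Carlo result, and the implied ratio rises from 8 to 10 ply, in line with the earlier remark that the ratio grows with depth.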
Parent - - By KWRegan (*) Date 2008-08-12 16:57

>...for 2.25 eval...because there are a few positions (mostly endgame) where the eval may be way off (i.e. drawn endgames with this eval).


Indeed, I think that the winning % needs to depend also on the amount of "play" left in a position.  The value of +0.50 decreases sharply as one nears an endgame, especially with all Pawns on one wing (which the eval I guess already factors in).  Have efforts been made to regress the winning % against the material and/or total piece mobility on the board, or more crudely against the move number?
Parent - By Vasik Rajlich (Silver) Date 2008-08-12 17:57
In theory, the mapping from centipawns to winning percentage should remain constant throughout the game. Otherwise, Rybka will make irrational choices when comparing endgames to middlegames.

In practice, this effect is (probably) fairly mild.

Re. +225 cp, I suspect that in more than 2% of cases where this type of score is given, the position is a draw (or will end up a draw).

Vas
Parent - - By lkaufman (*****) Date 2008-08-12 18:49
     The general rule is to exchange pieces when ahead in material (as of course you know), which implies that a pawn advantage grows in value throughout the game. This is reflected in Rybka 3, which shows an increase in the advantage as pieces come off when one side is up a clear pawn in an otherwise equal position. Roughly speaking, we define +1.00 as being a pawn ahead with half the material still on the board. On a full board, we show a score of about 0.8 for this situation, and on a nearly empty board (say one minor piece each plus pawns) about 1.2. However, what you are saying is the exact opposite: that a lead of half a pawn becomes less valuable towards the endgame (forget the case of all pawns on one wing; we do handle this separately in the eval). Vas and I did discuss this issue some time ago, but it is not currently addressed in Rybka. I'm not sure if it is even true. Let's say you are up by the bishop pair (vs. bishop and knight), which is roughly speaking a half-pawn advantage in all phases of the game, because both a pawn and the bishop pair gain in value with exchanges. If you are correct, then the average winning percentage for the bishop pair should decline with each piece exchange past some unknown point. Do you believe this is so? It could be; it's certainly not obvious. Maybe I'll do Monte Carlo tests to see. It is more likely to be true of much smaller advantages like 0.1 pawn, because the draw becomes clear with simplification when the edge is tiny.
     I think that this is worth investigating now that most really obvious things are already in the Rybka eval.  
Parent - By KWRegan (*) Date 2008-08-13 04:32
Thanks---indeed I am generalizing off my narrow and particular (but deep and intense) experience with the Kramnik-Grischuk endgame, K+B+4P vs K+N+4P, e-h pawns for both.  This has been done mainly with Deep Fritz 10, with my Nalimov 3-4-5 off when exploring (for speed) and on while "proving".  When the material was B+3 vs N+2 or greater, I found that an eval of 2.00-2.20 or higher was pretty reliable, whether EGTs were on or off---indeed there are hundreds of such places which I've marked "Trouble" and haven't gotten to "proving" yet.  But less than that, especially B+2 vs N+1, needed 4.00 to be certain.  One position in particular, where I've proved a win to my satisfaction and believe it's largely unique (though one major dual 6 or so moves in dampens it as a study), shows many themes I've talked about.

8/8/4Bp1p/8/8/6P1/3nkPKP/8 w - - 0 60


Rybka 3 w32 with 1 CPU and no EGTs likes 60.g4, showing a green fail-high at reported depth=29 (analysis below), but I'm pretty sure this leads only to a draw---though I've gotten DF10 as high as 3.76.   The lines with 60.g4 have many transpositions that act as a huge "well", which I believe would suck in any "evolutionary" local-heuristic algorithm (simulated annealing, quantum adiabatic, like-that).  When I first ran it with Rybka 2.3.2a 4-CPUs, Rybka zoomed over 3.00 into a wrong-color-Bishop rabbit hole, but this unsoundness included an idea which I amended to find my claimed win ("proved" by propagating 4.16 from DF10 + 3-4-5 EGTs back to this position under my IM-watchful eyes).  Let's see if any of you can solve this---unlike other studies, computers and the full panoply of EGTs welcome!

1: Kramnik,V - Grischuk,A, WCh Mexico City MEX 2007
...
60.g4 Kd3 61.Bf5+ Ke2 62.Bc2 Nc4 63.Ba4 Nb2 64.Bb5+ Kd2 65.Kg3 Nd1 66.Bc6 Nb2 67.Be4 Nc4 68.h3 Ke1 69.h4 Kd2
  +/-  (1.05)   Depth: 24   00:00:59  3016kN
60.g4 Kd3 61.Kg3 Ke2 62.Bd5 Nf1+ 63.Kg2 Nd2 64.h4
  +/-  (1.21)   Depth: 25   00:02:52  8753kN
60.g4 Kd3 61.Kg3 Ke2 62.Bd5 Nf1+ 63.Kg2 Nd2 64.h4
  +/-  (1.21)   Depth: 26   00:03:46  11436kN
60.g4 Kd3 61.Kg3 Ne4+ 62.Kf4 Nd6 63.Bd5 Kd4 64.Bg2 Kd3 65.Bf1+ Kd4 66.Ba6 Ne4 67.f3 Nd6 68.Be2 Nb7 69.Bb5 Nd6 70.Ba6 Ne8 71.Bb7 Nd6 72.Ba6 Ne8 73.Bb7 Nd6 74.Ba6 Ne8 75.Bb7
  +/-  (1.21)   Depth: 27   00:05:46  16903kN
60.g4 Kd3 61.Kg3 Ne4+ 62.Kf4 Nd6 63.Bd5 Kd2 64.Kg3 Ke2 65.h3 f5 66.Bc6 fxg4 67.hxg4 Nf7 68.f4 Ke3 69.Bg2 Nd6 70.f5 Ne8 71.Bf3 Nf6 72.Bh1 Kd4 73.Kf4 Kc5 74.Bf3 Kd6
  +/-  (1.36)   Depth: 28   00:10:45  30937kN
60.g4
  +-  (1.56)   Depth: 29   00:33:32  92079kN
[and still going on depth 29 past 53 minutes]
Parent - - By Vasik Rajlich (Silver) Date 2008-08-12 17:59

> I don't agree. My results for knight odds for the starting position show an eval of -2.37 (after 18 ply), with MC results of 98.6% at 8 ply and 99.7% at 10 ply. If anything it seems that 98% may be too low for 2.25 eval. Maybe over all positions you might be right if we go by average rather than median results, because there are a few positions (mostly endgame) where the eval may be way off (i.e. drawn endgames with this eval).


Sure, quiescent middlegames are the best-case scenario. What about endgames, or sharp middlegames (where one side is up a piece but getting mated)? 98% is a high number!

Vas
Parent - - By lkaufman (*****) Date 2008-08-12 18:36
Yes, as I said I'm sure that the average score will be less than 98%, but if relatively calm positions are dominant and they average 98.1% when +2.25, then even if there are many positions where we only score 90% or so the median result may be 98% or more. I guess the answer depends on what set of positions we are talking about.
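
To illustrate the average-versus-median point with made-up numbers (these are not data from my tests), suppose 70 of 100 positions are calm and score 98.1 %, while the other 30 score only 90 %:

```python
import statistics

# Hypothetical mix (NOT data from the thread): 70 calm positions scoring 98.1 %
# and 30 difficult positions scoring only 90 %.
scores = [98.1] * 70 + [90.0] * 30

mean_score = statistics.mean(scores)      # pulled down by the difficult positions
median_score = statistics.median(scores)  # stays at the typical calm-position score

print(round(mean_score, 2), median_score)
```

The median stays at the calm-position value while the mean drops well below 98 %, so which figure is "right" depends on which of the two is meant.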
Parent - - By Vasik Rajlich (Silver) Date 2008-08-13 18:30
I understood your mappings to be from centipawn scores to winning percentages over the full set of possibly encountered positions, weighted by likelihood of occurrence. If you want to elaborate on this (i.e. maybe it's for middlegames only?!), I can add this to the FAQ.

Vas
Parent - - By lkaufman (*****) Date 2008-08-13 20:30
I did include positions from opening, middlegame, and endgame, but they are not a representative set of all positions, just of the three phases. It seems that the rule I gave is reasonably valid for all three phases. I did not even try to include tactical positions, partly because MC is not so good at that. Since I have to run each position manually, and MC takes some time to generate a few hundred positions, this was not a massive, scientific study but just the best estimate I could give with the data I had. I did not include any positions where there was a good reason to suspect that the eval was wrong; I was trying to get the figures for the usual positions where the eval is (roughly) right. So I would describe my numbers as estimates of the median, not estimates of the average, which would indeed be influenced by the frequency of bad evaluations.
Parent - By Vasik Rajlich (Silver) Date 2008-08-15 18:29
Aha, I understand. Yes, this is certainly a reasonable way to do it. In this case I agree that the 225 cp to 98% mapping should be completely solid.

I'll try to clear this up in the FAQ without using really complicated language :)

Vas
Parent - - By George Tsavdaris (****) Date 2008-08-11 20:23

>Larry comments about this topic (also added to the FAQ):


Where is this FAQ? I can't find in it Larry's words, which you quoted above, describing the relationship between Elo and Rybka's evaluations.
Parent - - By Felix Kling (Gold) Date 2008-08-11 21:14
pffff, I'll update the website tomorrow :)
Parent - By Vasik Rajlich (Silver) Date 2008-08-11 21:30
In the meantime the true masterpiece is here:

http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=5576

Vas