> Quick question: what is in your view the best mapping between Rybka scores
> and winning percentages?
It wasn't a quick question; I had to compile a lot of data to answer
it. The simplest reasonable assumption to make is that Rybka's score is
proportional to the Elo rating difference (which you know how to convert to
a percentage). I think it's a good enough assumption for most practical
purposes. Then, based on 8-ply Monte Carlo data I've got from a variety of
positions from opening, middlegame, and endgame (but with most of the pawns
on the board), I recorded the ratio of Elo points to advantage in pawns for
those positions (21 of them) that showed more than 150 Elo difference in the
results. The ratios ranged from 180 to 385 and averaged 280.7. However, in
general the ratio rises with depth, a one-pawn advantage being more certain
to win with more depth. Probably for real time games the figure would be at
least 300. So roughly we can say that each centipawn of Rybka eval is worth
3 Elo points on average. My earlier work on opening evals of the different
versions used 2.5 Elo per centipawn, but this value was pretty much just
based on the opening position, so I would have much more confidence in the 3
Elo figure.
>I recorded the ratio of elo points to advantage in pawns for those positions (21 of them) that showed more than 150 Elo difference in the results.
I guess he means the ratio of the difference in Elo points (to the advantage in pawns).
>My earlier work on opening evals of the different versions used 2.5 Elo per centipawn, but this value was pretty much just based on the opening position, so I would have much more confidence in the 3 Elo figure.
Oh, I've made a table for that:
Expected score | Rybka's eval | Difference from opponent's expected score
55 %           | 0.12         | 10 %
60 %           | 0.23         | 20 %
65 %           | 0.36         | 30 %
70 %           | 0.49         | 40 %
75 %           | 0.64         | 50 %
80 %           | 0.80         | 60 %
85 %           | 1.00         | 70 %
90 %           | 1.27         | 80 %
95 %           | 1.71         | 90 %
98 %           | 2.25         | 96 %

Rybka's eval | Expected score | Difference from opponent's expected score
0.01         | 50.43 %        | 0.86 %
0.10         | 54.3 %         | 8.6 %
0.20         | 58.5 %         | 17.1 %
0.30         | 62.7 %         | 25.3 %
0.40         | 66.6 %         | 33.2 %
0.50         | 70.3 %         | 40.7 %
0.80         | 79.9 %         | 59.8 %
0.90         | 82.6 %         | 65.1 %
1.00         | 84.9 %         | 69.8 %
1.10         | 87.0 %         | 74.0 %
1.20         | 88.8 %         | 77.6 %
1.30         | 90.4 %         | 80.8 %
1.40         | 91.8 %         | 83.6 %
1.50         | 93.0 %         | 86.0 %
1.60         | 94.1 %         | 88.1 %
Tables are based on these assumptions:
• Rybka's evaluation score is proportional to the Elo rating difference.
• A 3 Elo difference corresponds to 0.01 (one centipawn) of Rybka's evaluation score.
• An Elo difference X is related to the expected score P(X) by the formula:
P(X) = 100/(1+10^(-X/400)), where the result is given in %.
• So the expected score as a function of Rybka's evaluation score R (in pawns), with X = 300·R, is:
P(R) = 100/(1+10^(-3·R/4)), where the result is given in %.
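
For anyone who wants to recompute the tables or try a different Elo-per-pawn ratio, here is a minimal Python sketch of the formulas above. The function names and the ELO_PER_PAWN constant are my own, made up for illustration (they are not anything from Rybka), and the 300 Elo per pawn default is just the thread's working assumption, not a measured constant.

```python
import math

# Assumption from the thread: 3 Elo per centipawn, i.e. 300 Elo per pawn.
ELO_PER_PAWN = 300.0


def expected_score_pct(eval_pawns, elo_per_pawn=ELO_PER_PAWN):
    """Expected score in %, P(R) = 100 / (1 + 10^(-X/400)) with X = elo_per_pawn * R."""
    elo_diff = elo_per_pawn * eval_pawns
    return 100.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def eval_for_score_pct(score_pct, elo_per_pawn=ELO_PER_PAWN):
    """Inverse mapping: the eval (in pawns) that corresponds to a given expected score in %."""
    if not 0.0 < score_pct < 100.0:
        raise ValueError("score_pct must be strictly between 0 and 100")
    elo_diff = 400.0 * math.log10(score_pct / (100.0 - score_pct))
    return elo_diff / elo_per_pawn


def margin_pct(score_pct):
    """Third table column: own expected score minus the opponent's, in %."""
    return 2.0 * score_pct - 100.0


if __name__ == "__main__":
    # Eval -> expected score and margin (second table).
    for r in (0.10, 0.50, 1.00, 1.50):
        p = expected_score_pct(r)
        print(f"{r:4.2f} | {p:5.1f} % | {margin_pct(p):5.1f} %")
    # Expected score -> eval (first table).
    for p in (55, 75, 90, 98):
        print(f"{p:3d} % | {eval_for_score_pct(p):4.2f}")
```

Passing a different elo_per_pawn (say 250 for the older 2.5 Elo per centipawn figure) shows how much the mapping depends on that one ratio.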
FWIW - I suspect that the mapping from +2.25 to 98% is too optimistic. Different things are happening in that part of the curve.
>...for 2.25 eval...because there are a few positions (mostly endgame) where the eval may be way off (i.e. drawn endgames with this eval).
Indeed, I think that the winning % needs to depend also on the amount of "play" left in a position. The value of +0.50 decreases sharply as one nears an endgame, especially with all Pawns on one wing (which the eval I guess already factors in). Have efforts been made to regress the winning % against the material and/or total piece mobility on the board, or more crudely against the move number?
In practice, this effect is (probably) fairly mild.
Re. +225 cp, I suspect that in more than 2% of cases where this type of score is given, the position is a draw (or will end up a draw).
I think that this is worth investigating now that most really obvious things are already in the Rybka eval.
Rybka 3 w32 with 1 CPU and no EGTs likes 60.g4, showing a green fail-high at reported depth=29 (analysis below), but I'm pretty sure this leads only to a draw---though I've gotten DF10 as high as 3.76. The lines with 60.g4 have many transpositions that act as a huge "well", which I believe would suck in any "evolutionary" local-heuristic algorithm (simulated annealing, quantum adiabatic, like-that). When I first ran it with Rybka 2.3.2a 4-CPUs, Rybka zoomed over 3.00 into a wrong-color-Bishop rabbit hole, but this unsoundness included an idea which I amended to find my claimed win ("proved" by propagating 4.16 from DF10 + 3-4-5 EGTs back to this position under my IM-watchful eyes). Let's see if any of you can solve this---unlike other studies, computers and the full panoply of EGTs welcome!
1: Kramnik,V - Grischuk,A, WCh Mexico City MEX 2007
60.g4 Kd3 61.Bf5+ Ke2 62.Bc2 Nc4 63.Ba4 Nb2 64.Bb5+ Kd2 65.Kg3 Nd1 66.Bc6 Nb2 67.Be4 Nc4 68.h3 Ke1 69.h4 Kd2
+/- (1.05) Depth: 24 00:00:59 3016kN
60.g4 Kd3 61.Kg3 Ke2 62.Bd5 Nf1+ 63.Kg2 Nd2 64.h4
+/- (1.21) Depth: 25 00:02:52 8753kN
60.g4 Kd3 61.Kg3 Ke2 62.Bd5 Nf1+ 63.Kg2 Nd2 64.h4
+/- (1.21) Depth: 26 00:03:46 11436kN
60.g4 Kd3 61.Kg3 Ne4+ 62.Kf4 Nd6 63.Bd5 Kd4 64.Bg2 Kd3 65.Bf1+ Kd4 66.Ba6 Ne4 67.f3 Nd6 68.Be2 Nb7 69.Bb5 Nd6 70.Ba6 Ne8 71.Bb7 Nd6 72.Ba6 Ne8 73.Bb7 Nd6 74.Ba6 Ne8 75.Bb7
+/- (1.21) Depth: 27 00:05:46 16903kN
60.g4 Kd3 61.Kg3 Ne4+ 62.Kf4 Nd6 63.Bd5 Kd2 64.Kg3 Ke2 65.h3 f5 66.Bc6 fxg4 67.hxg4 Nf7 68.f4 Ke3 69.Bg2 Nd6 70.f5 Ne8 71.Bf3 Nf6 72.Bh1 Kd4 73.Kf4 Kc5 74.Bf3 Kd6
+/- (1.36) Depth: 28 00:10:45 30937kN
+- (1.56) Depth: 29 00:33:32 92079kN
[and still going on depth 29 past 53 minutes]
> I don't agree. My results for knight odds for the starting position show an eval of -2.37 (after 18 ply), with MC results of 98.6% at 8 ply and 99.7% at 10 ply. If anything it seems that 98% may be too low for 2.25 eval. Maybe over all positions you might be right if we go by average rather than median results, because there are a few positions (mostly endgame) where the eval may be way off (i.e. drawn endgames with this eval).
Sure, quiescent middlegames are the best-case scenario. What about endgames, or sharp middlegames (where one side is up a piece but getting mated)? 98% is a high number!
I'll try to clear this up in the FAQ without using really complicated language :)
>Larry comments about this topic (also added to the FAQ):
Where is this FAQ? I can't find in it the words of Larry's that you quoted above, describing the analogy between Elo and Rybka's evaluations....