Rybka Chess Community Forum
Up Topic Rybka Support & Discussion / Rybka Discussion / Rybka elo evaluation
- - By albitex (***) Date 2008-08-10 17:59 Edited 2008-08-10 18:09
Rybka elo evaluation
I ran a tournament with Rybka 3 and my other engines, to gauge the gain in strength compared to Rybka 2.3.2a. The result was excellent:
+148 Elo points.


The starting Elo of the engines was derived from the Elo of my Rybka 2.3.2a on the Internet: 2624 Elo points.
The starting Elo for Hiarcs is exaggerated, because it was obtained from slow time-control games, analyzing a single position.
Rybka 3 played with default parameters.
The "Persistent hash" function was probably not active.
Parent - - By lkaufman (*****) Date 2008-08-10 18:10
Although your sample size is small, your results agree remarkably well with the CEGT testing. What time limit did you use, and what testset or opening book?
Parent - - By albitex (***) Date 2008-08-10 18:54
Time control: blitz, 10m + 2'. Opening start-position database: NoomenTestsuite2006.
I did not load an opening book; instead I started the games from positions in the database. (I was not able to use all of it; I would have had to play at least 200 games.)
Parent - - By lkaufman (*****) Date 2008-08-10 19:50
Okay, thanks. I wonder why you don't just run the engine matches using the book option in any of the Fritz family GUIs; don't you have a GUI that supports doing this?
Parent - - By albitex (***) Date 2008-08-10 23:20 Edited 2008-08-10 23:28
Yes, I have the Shredder 9 GUI. But my opening book contains variations with tournament moves that go beyond move 40. A lot of the variations have been analyzed over thousands of games, with a random system. To avoid a series of games that almost all start from the endgame, I would have had to change a lot of settings in my book, and that would have taken a lot of time.
I tried to get some variety in the games by using the start-position database.
My personal impression is that the difference between Rybka 3 and 2 is all in the endgame play.
Parent - By lkaufman (*****) Date 2008-08-11 03:05
Because of issues like you mention, most testers just use opening testsets that only go out to move eight or so. Some are freely available, like Harry Schnapp's HS220.
Parent - - By George Tsavdaris (****) Date 2008-08-10 19:02
I never trusted the ELO ratings that Chessbase GUI gave....

Chessbase GUI gives according to the poster:
    Program                          Elo  
  1 Rybka 3                        : 2795 
  2 Rybka 2.3.2a                   : 2647
  3 Naum 3                         : 2522
  4 Hiarcs 12                      : 2481
  5 Glaurung 2.1                   : 2435


148 points difference between Rybka 3 and Rybka 2.3.2a according to Chessbase GUI.

Here is what BayesELO gives (with Rybka 3 anchored at a base ELO of 2795, and assuming every program played 2 games as White and 3 as Black against one opponent and the opposite against the next, for a total of 10 Black and 10 White games. I mention this because BayesELO takes into account the color each player had in each game.):
     Name           Elo    +    -  games score
   1 Rybka 3        2795  108   95    20   80%
   2 Rybka 2.3.2a   2702   95   90    20   63%
   3 Naum 3         2605   92   96    20   40%
   4 Hiarcs 12      2603   90   92    20   40%
   5 Glaurung 2.1   2535   95  105    20   28%


93 points difference between Rybka 3 and Rybka 2.3.2a according to BayesELO.

And here is what ELOstat gives (with Rybka 3 anchored at a base ELO of 2795):
    Program                          Elo    +   -   Games   Score  
  1 Rybka 3                        : 2795  142 106    20    80.0 % 
  2 Rybka 2.3.2a                   : 2674  108 100    20    62.5 % 
  3 Naum 3                         : 2546  109 113    20    40.0 % 
  4 Hiarcs 12                      : 2546   94 102    20    40.0 % 
  5 Glaurung 2.1                   : 2468  129 138    20    27.5 %


121 points difference between Rybka 3 and Rybka 2.3.2a according to ELOstat.
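
For reference, here is a rough sketch of how a score percentage maps to an Elo difference under the standard logistic Elo model. It is not what the Chessbase GUI, BayesELO, or ELOstat actually compute (each of them fits all the results together in its own way), and the function names are made up for illustration. It only shows why an 80% score and a 62.5% score against the same field imply a gap on the order of 100 to 150 points.

```python
import math

def expected_score(elo_diff: float) -> float:
    """Expected score of a player rated `elo_diff` points above the opponent
    (standard logistic Elo curve)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score: float) -> float:
    """Inverse of expected_score: the Elo difference implied by a score
    fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Score percentages taken from the tables above (20 games each).
scores = {
    "Rybka 3": 0.80,
    "Rybka 2.3.2a": 0.625,
    "Naum 3": 0.40,
    "Hiarcs 12": 0.40,
    "Glaurung 2.1": 0.275,
}

for name, score in scores.items():
    # Difference relative to the average opposition, ignoring that each
    # engine faced a slightly different field.
    print(f"{name:14s} {score:6.1%}  {elo_diff_from_score(score):+7.1f}")
```

The per-engine numbers are relative to each engine's own opposition, so they cannot be subtracted from one another to reproduce any of the three tables exactly.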
Parent - By lkaufman (*****) Date 2008-08-10 20:03
From my understanding, BayesElo is not very suitable for this data. It "assumes" that two engines are close in strength until there is enough data to prove otherwise. This may take 1000 or more games. I expect that CCRL will show smaller gains for R3 than CEGT for a while, because CCRL uses BayesElo which understates the gains when a new engine is massively superior to others. BayesElo basically says "I don't believe you are much better until you really prove it".
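
A toy illustration of the effect described above: if a rating tool mixes a few imaginary drawn games into the results before estimating strength, a lopsided score from a short match gets pulled toward zero difference, and the pull fades as real games accumulate. This is only a sketch of the general idea. BayesElo's actual prior and likelihood model are more involved, and `virtual_draws=2` is an arbitrary value chosen for the example.

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def shrunk_elo_diff(points: float, games: int, virtual_draws: float = 2.0) -> float:
    """Elo difference after adding `virtual_draws` imaginary drawn games.
    Hypothetical helper for illustration, not BayesElo's real algorithm."""
    score = (points + 0.5 * virtual_draws) / (games + virtual_draws)
    return elo_diff(score)

# Same 80% score, increasing number of real games.
for games in (20, 200, 2000):
    points = 0.8 * games
    raw = elo_diff(points / games)
    shrunk = shrunk_elo_diff(points, games)
    print(f"{games:5d} games: raw {raw:6.1f}, with prior {shrunk:6.1f}")
```

With only 20 games the estimate with the prior is visibly lower than the raw one; with 2000 games the two are nearly identical.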