Rybka elo evaluation
I ran a tournament with Rybka 3 and my other engines, to gauge the increase in strength compared with Rybka 2.3.2a. The result was excellent:
+148 Elo points:

The starting Elo of the engines was derived from the Elo of my Rybka 2.3.2a on the internet: 2624 Elo points.
The starting Elo for Hiarcs is exaggerated, because it was obtained from slow-time-control games used to analyze a single position.
Rybka 3 played with default parameters.
The "Persistent hash" function was probably not active.
Although your sample size is small, your results agree remarkably well with the CEGT testing. What time limit did you use, and what testset or opening book?
Time control: blitz, 10m + 2'. Starting-position database: NoomenTestsuite2006.
I did not load an opening book; instead I started the games from positions in the database. (I was not able to use all of them; I would have had to play at least 200 games.)
Okay, thanks. I wonder why you don't just run the engine matches using the book option in any of the Fritz family GUIs; don't you have a GUI that supports doing this?
Yes, I have the Shredder 9 GUI. But my opening book contains variations with tournament moves that run beyond move 40. Many of the variations were analyzed over thousands of games, using a random system. To avoid a series of games that almost all start from the endgame, I would have had to change a lot of settings in my book, and that would have taken a lot of time.
I tried to get some variety in the games by using the starting-position database.
My personal impression is that the difference between Rybka 3 and Rybka 2 lies entirely in endgame play.
Because of issues like you mention, most testers just use opening testsets that only go out to move eight or so. Some are freely available, like Harry Schnapp's HS220.
I never trusted the ELO ratings that Chessbase GUI gave....
According to the poster, the Chessbase GUI gives:
   Program          Elo
1  Rybka 3        : 2795
2  Rybka 2.3.2a   : 2647
3  Naum 3         : 2522
4  Hiarcs 12      : 2481
5  Glaurung 2.1   : 2435
148 points difference between Rybka 3 and Rybka 2.3.2a according to Chessbase GUI.
Here is what BayesELO gives (with Rybka 3 anchored at a base Elo of 2795, and assuming each program played 2 games as White and 3 as Black against one opponent and the opposite against the next, for a total of 10 White and 10 Black games; I mention this because BayesELO takes into account the color each player had in each game):
   Name           Elo    +    -  games  score
1  Rybka 3        2795  108   95     20    80%
2  Rybka 2.3.2a   2702   95   90     20    63%
3  Naum 3         2605   92   96     20    40%
4  Hiarcs 12      2603   90   92     20    40%
5  Glaurung 2.1   2535   95  105     20    28%
93 points difference between Rybka 3 and Rybka 2.3.2a according to BayesELO.
And here is what ELOstat gives (with Rybka 3 anchored at a base Elo of 2795):
   Program          Elo    +    -  Games  Score
1  Rybka 3        : 2795  142  106     20  80.0 %
2  Rybka 2.3.2a   : 2674  108  100     20  62.5 %
3  Naum 3         : 2546  109  113     20  40.0 %
4  Hiarcs 12      : 2546   94  102     20  40.0 %
5  Glaurung 2.1   : 2468  129  138     20  27.5 %
121 points difference between Rybka 3 and Rybka 2.3.2a according to ELOstat.
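For anyone who wants to relate these three rating gaps to expected results, here is a minimal sketch. It assumes the standard logistic Elo model (400-point scale); the function names are my own and do not come from the Chessbase GUI, BayesELO, or ELOstat, which each fit ratings in their own way.

```python
import math

def expected_score(elo_diff):
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_diff_from_score(score):
    """Inverse mapping: rating gap implied by a score fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Rybka 3 vs Rybka 2.3.2a gaps as reported in this thread
gaps = {"Chessbase GUI": 148, "ELOstat": 121, "BayesELO": 93}
expected = {tool: expected_score(diff) for tool, diff in gaps.items()}

for tool, score in expected.items():
    print(f"{tool}: {gaps[tool]} Elo -> expected score {score:.1%}")
```

Under that model the three tools disagree by several percentage points in the head-to-head score they imply, which is well within the error margins shown in the tables for a 20-game sample.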
From my understanding, BayesElo is not very suitable for this data. It "assumes" that two engines are close in strength until there is enough data to prove otherwise. This may take 1000 or more games. I expect that CCRL will show smaller gains for R3 than CEGT for a while, because CCRL uses BayesElo which understates the gains when a new engine is massively superior to others. BayesElo basically says "I don't believe you are much better until you really prove it".
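To illustrate the kind of effect described in the post above, here is a toy sketch. It is not BayesElo's actual algorithm: it simply assumes the prior behaves like a few extra drawn games (the `virtual_draws` parameter and the function names are hypothetical), and it measures the gap against average opposition using the logistic Elo model.

```python
import math

def elo_diff_from_score(score):
    """Rating gap implied by a score fraction under the logistic Elo model."""
    return 400.0 * math.log10(score / (1.0 - score))

def shrunk_elo_diff(points, games, virtual_draws=2):
    """Gap after adding hypothetical drawn games (0.5 points each) as a stand-in for a prior."""
    adj_points = points + 0.5 * virtual_draws
    adj_games = games + virtual_draws
    return elo_diff_from_score(adj_points / adj_games)

# Rybka 3 scored 80% over 20 games in this thread, i.e. 16 points
raw = elo_diff_from_score(16 / 20)
small = shrunk_elo_diff(16, 20)      # small sample: the assumed prior matters
large = shrunk_elo_diff(800, 1000)   # same 80% over a hypothetical 1000 games

print(f"raw: {raw:.0f}, 20 games with prior: {small:.0f}, 1000 games with prior: {large:.0f}")
```

With only 20 games the assumed prior pulls the estimate noticeably toward zero, while with 1000 games at the same score it hardly changes anything, which matches the "prove it first" behaviour described above.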