Rybka Chess Community Forum
Rybka Support & Discussion / Rybka Discussion / CCRL update (28th December 2007)
By Graham Banks (*****) Date 2007-12-30 00:09
The December 28th update of the CCRL Rating Lists and Statistics is now available for viewing at:
http://www.computerchess.org.uk/ccrl/4040/

The list gets updated periodically during the week and these updates can be viewed here:
http://www.computerchess.org.uk/ccrl/4040.live/
Please be aware that no game downloads are available from this live link.

The links to the various rating lists can be found just beneath the default Best Versions list.
For example there is a 32-bit Single CPU list.

Our standard testing is at 40 moves in 40 minutes repeating, while our current blitz testing is at both 40 moves in 4 minutes repeating and 40 moves in 12 minutes repeating, all adjusted to the AMD64 X2 4600+ (2.4GHz).

Currently active testers in our team are:
Graham Banks, Ray Banks, Shaun Brewer, Kirill Kryukov, Dom Leste, Tom Logan, Andreas Schwartmann, Charles Smith, George Speight, Chris Taylor, Chuck Wilson, Gabor Szots and Martin Thoresen.

A big thanks to all testers as usual for their efforts this week.

40/40 NOTES

There are currently 93,864 games in our 40/40 database.

Many engines on our list have few games and in many cases their ratings are likely to fluctuate (markedly for some) until a lot more games are played. Therefore no conclusions should be drawn about their strength yet.
To illustrate this point, when an engine has 200 games played, the error margin is still approximately +-40 ELO, after 500 games +-25 ELO, after 1000 games +-17 ELO and even after 2000 games there is a +-13 ELO error margin!
This of course highlights the importance of looking at other rating lists that are also available in order to draw comparisons and get a more accurate overall picture.
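
The margins quoted above shrink roughly with the square root of the number of games. The following is only a rough sketch of that relationship, assuming pure 1/sqrt(N) scaling calibrated on the +-40 ELO figure at 200 games; the function name and the calibration choice are illustrative assumptions, and the real margins on the list also depend on draw rates and the spread of opponents.

```python
import math

def approx_error_margin(games, ref_games=200, ref_margin=40.0):
    """Scale a known error margin to another game count.

    Assumption: the margin behaves like 1/sqrt(N), anchored at
    ref_margin Elo for ref_games games (taken from the report text).
    """
    return ref_margin * math.sqrt(ref_games / games)

for n in (200, 500, 1000, 2000):
    print(f"{n:>5} games: about +-{approx_error_margin(n):.1f} ELO")
```

Under this assumption the results land within about 1 ELO of the figures quoted for 500, 1000 and 2000 games, and quadrupling the number of games only halves the margin.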

4CPU 64-bit Engines

Rybka 2.3.2a is over 50 ELO stronger than Zappa Mexico.
The update to Zappa Mexico seems to add little, if any, strength.

Deep Shredder 11 lies 40 points further back in third spot.

Naum 2.2 comes in fourth, not too far behind Deep Shredder 11, but ahead of Deep Fritz 10.1 and Hiarcs 11.1.
Deep Fritz 10.1 is a 60 ELO improvement over Deep Fritz 10.
Hiarcs 11.1 is ahead of both Hiarcs 11.2 and Hiarcs 11.

The remaining well tested engines in order of rating are Loop M1-T, Glaurung 2.0.1, Deep Junior 10 and Deep Sjeng 2.7.

We have started testing Bright 0.2c.

2CPU Engines

With the emphasis of multi-cpu testing on 4CPU as opposed to 2CPU, there are still gaps in this category and some of the engines also require further games. However, the order of strength is almost identical to that described in the 4CPU notes.

Single CPU Engines

Rybka 2.3.2a is clearly a class apart at 120+ ELO ahead of Fritz 11, Shredder 11 and Zappa Mexico.

Naum 2.2, Toga II 1.3.1, Hiarcs 11.1, Loop M1-T and Fruit 2.3.1 are within a 25 ELO range of each other.
Fruit 2.3.1 and Fruit 051103 seem to be fairly even in strength despite their different playing styles.
Toga II 1.3.1 is stronger than Toga II 1.3.4 and Hiarcs 11.1 is likewise stronger than Hiarcs 11.2.

A little further back again, Spike 1.2 Turin, Deep Sjeng 2.7, Glaurung 2.0.1 and Junior 10 are very close in strength.
Junior 10 is stronger than Junior 10.1.

Bright 0.2c, Ktulu 8.0, SmarThink 1.00 and Chess Tiger 2007.1 are 40-50 ELO further back.

Chessmaster 11 is 40+ ELO stronger than CM10th Default and is also a little stronger than the best CM10th settings.

Movei 00.8.438 and Alaric 707 are fairly closely matched in strength, but the Movei 00.8.438 (10 10 10) settings add another 20+ ELO to the default settings.

Scorpio 2.0 continues to be disappointing and lies 30+ ELO adrift of the previous version, around the same level as SlowChess Blitz WV2.1, Delfi 5.2, Ruffian 2.1.0 and WildCat 7.
Delfi 5.2 seems to be similar in strength to the earlier Delfi 5.1.

Gandalf 6 has sadly fallen behind many of the top amateur engines.


Free Single CPU Engines

Rybka 1.0 64-bit is still the top free engine ahead of Toga II 1.3.1.

Fruit 2.3.1 comes in third ahead of Spike 1.2 Turin and Glaurung 2.0.1.
It will be interesting to see if Bright 0.2c can challenge these latter two.

Naum 2.0 is 40+ ELO further back, but with a good buffer over Movei 00.8.438 (10 10 10).

Next up is Alaric 707, ahead of SlowChess Blitz WV2.1, Delfi 5.2, Scorpio 2.0, Zappa 1.1 and WildCat 7.

After close to 200 games, Pro Deo 1.6b remains 20 ELO behind Pro Deo 1.2.

Some other recent releases worthy of mention are:
Colossus 2007d - continues to go from strength to strength. This latest version is on a par with Pharaon 3.5.1.
BugChess2 1.5.2 - Francois has made astounding progress and this latest version is roughly the same strength as both Booot 4.13.1 and Hamsters 0.6, other engines that have made impressive gains.
Sloppy 0.1.1 and Cyrano 0.2f are newer engines to keep an eye on!

We test a very extensive range of amateur engines through our Amateur Championship divisions (32-bit 1CPU) plus other tournaments, all of which can be followed in our public forum.

Our aim is of course to ensure that all engines lower on our lists get 200+ games.

BLITZ NOTES

There are currently 219,522 games in our 40/4 database.

The 40/4 update is usually done separately to our 40/40 update.
The latest ratings can be found at one of the following links:
http://computerchess.org.uk/ccrl/404/
http://computerchess.org.uk/ccrl/404.live/

An enormous amount of work goes into the blitz list and it is well worth a visit.

FRC NOTES

There are currently 25,200 games in the FRC 40/4 database.

Ray tests only those engines that can play FRC through the Shredder Classic GUI.
If engine authors have a new and stable version of their engine that will run under this GUI, they should contact Ray if they wish to see it tested.

Shredder 11 has now taken over second spot behind Rybka 2.3.2 FRC (private), an impressive 80 ELO ahead of Hiarcs 11.1 and Naum 2.2.

Ray has recently finished testing Glaurung 2.0.1 and it has made a 60 ELO gain over the previous version, slotting into seventh place on the pure list.

He has also just finished putting Hamsters 0.6 through its paces and it made a 60 ELO gain over the previous version.

For FRC the best list to look at is the pure list.
http://www.computerchess.org.uk/ccrl/404FRC/

STATS/PRESENTATION NOTES

The LOS stats on the right-hand side of each rating list are "likelihood of superiority" stats. They tell you the likelihood, in percentage terms, of each engine being superior to the engine directly below it.
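
For readers who want a feel for what an LOS figure means, here is a minimal sketch using a commonly quoted head-to-head approximation based on wins and losses only (draws carry no information about which engine is better). This is an assumption for illustration: the LOS values on the lists come from the rating calculation over all games, so they will not match this simple formula exactly, and the win/loss counts below are made up.

```python
import math

def likelihood_of_superiority(wins, losses):
    """Approximate LOS of engine A over engine B from a direct match.

    Uses the normal approximation 0.5 * (1 + erf((w - l) / sqrt(2 * (w + l)))).
    Returns 0.5 when there are no decisive games.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))

# Hypothetical match result: 60 wins, 40 losses, draws ignored.
print(f"LOS: {100 * likelihood_of_superiority(60, 40):.1f}%")
```

An LOS near 50% means the order of two neighbouring engines is essentially a coin toss, however far apart their displayed ratings look.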

A list of games played this week per engine can be found in the update thread in the CCRL public forum, accessible through the link given at the top of this post.

All games are available for download through the link given at the top of this post. They can be downloaded by engine or by month.
ELO ratings are now saved in all game databases for those engines that have 200 games or more.

Clicking on an engine name will give details as to opponents played plus homepage links where applicable.

Custom lists of engines can be selected for comparison.

An openings report page (link at bottom of index page) lists the number of games played by ECO codes with draw percentage and White win percentage. Clicking on a column heading will sort the list by that column.
Games can now be downloaded by ECO code.
By Roland Rösler (****) Date 2007-12-30 04:10 Edited 2007-12-30 04:18
It's always amazing to take a deeper look at rating lists. Rybka 2.3.2a mp 64-bit 4-core (3120) is less than 100 Elo points better than the sp 32-bit version (3023) after more than 900 games (and the 32-bit sp version has a better (!) score than the 64-bit 4-core version: 74.9% vs. 71.6%). Do you believe that? Rybka 64-bit on 4 cores is at least equal to Rybka 32-bit on 8 cores. So we have only 20-25 Elo points for doubling the performance (factor 4.4 for 8 cores).
The same goes for Shredder 11; here 32- vs. 64-bit isn't important. S11 sp (2942) is less than 80 Elo points worse than S11 4-core (3021); okay, here we have 30 Elo points for doubling the performance (factor 2.7 for 4 cores).
So we can choose: either doubling the speed of an engine (~3 cores) is good for at most 30 Elo points, or your list is bullshit! But Vas and Larry gain 30 Elo points in a quarter (okay, in six months when they are lazy :-)). So why this crazy rating list (with 2 or 4 cores and 32- vs. 64-bit) with these enormous confidence intervals (and only 95% probability; this means that for every 20th engine you test, you are wrong)?
You have enough to do. Test the engines on the market and give us a hint about mp scaling and about 32- vs. 64-bit scaling. That's enough!
By Uri Blass (*****) Date 2007-12-30 05:04
Your conclusion is wrong
You cannot divide the numbers.

97 elo for being 4.4 times faster is not 97/4.4 elo for doubling the performance.
It is 97/log(4.4)*log(2). You also cannot draw conclusions about rating from results that are only against weak opponents.
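
The difference between the two calculations can be checked with a short sketch. It assumes, as the formula above does, that the rating gain is proportional to the logarithm of the speed factor; the speed factors 4.4 and 2.7 and the rating gaps are the figures quoted earlier in this thread, and the function names are only illustrative.

```python
import math

def elo_per_doubling(elo_gain, speed_factor):
    """Elo per doubling of speed, assuming gain is linear in log(speed)."""
    return elo_gain * math.log(2) / math.log(speed_factor)

def naive_division(elo_gain, speed_factor):
    """The incorrect calculation: dividing the gain by the speed factor."""
    return elo_gain / speed_factor

cases = {
    "Rybka 2.3.2a (3120 vs 3023, factor 4.4)": (3120 - 3023, 4.4),
    "Shredder 11 (3021 vs 2942, factor 2.7)": (3021 - 2942, 2.7),
}

for name, (gain, factor) in cases.items():
    print(f"{name}: naive {naive_division(gain, factor):.1f}, "
          f"log-based {elo_per_doubling(gain, factor):.1f} elo per doubling")
```

With these inputs the naive division gives roughly 22 and 29 elo, while the log-based formula gives roughly 45 and 55 elo per doubling.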

I also think that we do not know the effective speed factor from having more cores.
Rybka-Rybka games may be misleading and the effective speed factor also may be different at different time controls.

Uri
By Roland Rösler (****) Date 2007-12-30 06:12
Your conclusion is wrong
Forget the numbers (I took them from the Rybka website). My conclusion was that either rating lists like CCRL 40/40 are not so great, or the assumption that doubling the speed of engines through hardware or OS (32- vs. 64-bit) brings 60-70 Elo (see the Rybka website) is wrong. The best you can say about rating lists like CCRL is that you have an Elo number for every engine (with hardware and OS). That's all. The probability that the Elo is exactly correct is less than 5%. The probability that it lies within the confidence interval is 95%, with bounds of +-19 Elo after 1000 games! With this in mind, what can rating lists tell us? Not as much as many people assume!
I predict the win of Cluster Toga against Rybka (ok, Cluster Toga wasn't rated).
I predict the no win of Hiarcs against Jonny (have a look at the CCRL rating list for Jonny). If I'm wrong ... , forget it!!
By Graham Banks (*****) Date 2007-12-30 06:44 Edited 2007-12-30 06:56
Perhaps you missed this excerpt from our report:

........when an engine has 200 games played, the error margin is still approximately +-40 ELO, after 500 games +-25 ELO, after 1000 games +-17 ELO and even after 2000 games there is a +-13 ELO error margin!
This of course highlights the importance of looking at other rating lists that are also available in order to draw comparisons and get a more accurate overall picture.


If you look at a range of rating lists, you'll get a fairly accurate picture overall. No one rating list is perfect.
And if other lists show similar results to ours, then you should be able to draw your own conclusions.

Regards, Graham.
By Uri Blass (*****) Date 2007-12-30 09:53
The CCRL rating list clearly supports 60-70 elo for doubling the speed.
Your calculation that gave 20-30 elo is simply wrong.

Compare the cases of pure speed improvement (64-bit is less than twice as fast as 32-bit).
The possible error is too high to draw a conclusion from one case, but the average seems to be near 40 elo for 64-bit.

64-bit is only about 1.6 times faster for Rybka and Zappa, so it is logical to believe that being twice as fast gives 60-70 elo.

Rybka 2.3.2a 64-bit 3075
Rybka 2.3.2a 32-bit 3023

Rybka 2.2 64-bit 2CPU 3056
Rybka 2.2 32-bit 2CPU 3019

Rybka 2.1 64-bit 2CPU 3076
Rybka 2.1 32-bit 2CPU 3003

Rybka 2.2 64-bit 3001
Rybka 2.2 32-bit 2989

Rybka 1.1 64-bit 2986
Rybka 1.1 32-bit 2959

Rybka 1.0 64-bit 2921
Rybka 1.0 32-bit 2884

Zappa Mexico 64-bit 2933
Zappa Mexico 32-bit 2890

Zappa 1.1 64-bit 2734
Zappa 1.1 32-bit 2688 
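
The average over the pairs listed above can be checked with a small sketch. The ratings are copied from the list; the 1.6 speed factor is the rough estimate given in this post, and the conversion to elo per doubling assumes the gain is proportional to the logarithm of the speed factor.

```python
import math

# (64-bit rating, 32-bit rating) for each pair quoted in the post.
pairs = {
    "Rybka 2.3.2a": (3075, 3023),
    "Rybka 2.2 2CPU": (3056, 3019),
    "Rybka 2.1 2CPU": (3076, 3003),
    "Rybka 2.2": (3001, 2989),
    "Rybka 1.1": (2986, 2959),
    "Rybka 1.0": (2921, 2884),
    "Zappa Mexico": (2933, 2890),
    "Zappa 1.1": (2734, 2688),
}

gains = {name: r64 - r32 for name, (r64, r32) in pairs.items()}
mean_gain = sum(gains.values()) / len(gains)

# Assumed speed-up of 64-bit over 32-bit (rough estimate from the post).
speed_factor = 1.6
per_doubling = mean_gain * math.log(2) / math.log(speed_factor)

for name, gain in gains.items():
    print(f"{name:<15} +{gain}")
print(f"mean 64-bit gain: {mean_gain:.1f} elo")
print(f"implied gain per doubling at x{speed_factor}: {per_doubling:.1f} elo")
```

The individual gains range from 12 to 73 elo, which shows why a single pair is not enough to draw a conclusion; the mean comes out near 41 elo, or about 60 elo per doubling under the 1.6 assumption.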

Uri