Rybka Chess Community Forum
Up Topic The Rybka Lounge / Computer Chess / Comparison of the rankings...
- - By Walter Eigenmann (**) Date 2007-08-08 10:03

Here you can see a comparison of the four most important rankings.
There are almost no differences between them:

http://glareanverlag.wordpress.com/tag/schach/

Regards: Walter

Parent - - By turbojuice1122 (Gold) Date 2007-08-08 15:15
How in the world does the COMP2007 list have a rating for Hydra with over 500 games?
Parent - - By Walter Eigenmann (**) Date 2007-08-08 20:28
playchess.de
Parent - - By turbojuice1122 (Gold) Date 2007-08-08 22:33
No, that's cheating--first of all, the engine named on an account there isn't always the one that's actually playing.  Second of all, the number of CPUs can often vary quite a bit (especially with Hydra--they wouldn't spend all of that money just to have all CPUs running to play blitz on Playchess).  Third of all, opponents' ratings on there are known to fluctuate so wildly as to be virtually invalid.  If they really use those games, then it is not at all a valid rating list.
Parent - - By Walter Eigenmann (**) Date 2007-08-08 22:56
No.
The Hydra games in COMP2007 (source: Internet/playchess.de) were played by Nick "Zor Champ" (the Hydra sponsor) at time controls of >30-120 min per engine.
Parent - - By turbojuice1122 (Gold) Date 2007-08-09 00:51
Time control doesn't matter--the opponent ratings on Playchess are completely unreliable--they typically have a standard deviation on the order of 100 points or so, often more, and are highly dependent on opening book used, time of day, previous opponents, etc.  The fact that the games are long simply means that the game quality is high, not that there is any better reliability in the rating.  Even worse, "Zor Champ" doesn't always actually play using Hydra!  He often plays using Shredder or other engines.
Parent - - By Walter Eigenmann (**) Date 2007-08-09 06:25
Sure. COMP2007 contains many games by "Zor Champ" with Shredder and other engines.
Parent - - By turbojuice1122 (Gold) Date 2007-08-09 09:38
Okay, but even with that problem eliminated, you still have the fact that the rating of a program and its opponents on Playchess is about as reliable as predicting the temperature in a city 10 days in advance based on the temperature now--yes, there is a correlation, but it's just not very reliable--you'll be right "on average", but the statistical uncertainties will be profoundly high.  This therefore isn't really a rating list, and certainly not one of the more reliable ones, as was implied in the original post, but instead more of a frequency distribution with some averaging applied--basically, there are virtually no fixed variables and relatively few known variables.  The scientific methodology just isn't there compared with lists like CEGT, CCRL, SSDF, CSS, etc.
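To put rough numbers on that argument, here is a minimal sketch using the standard Elo expected-score curve. It is only an illustration under stated assumptions: games are treated as independent win/loss trials (draws ignored), the 100-point opponent-rating spread mentioned earlier in the thread is used as a hypothetical error floor that does not average out, and the function names are made up for this example. It does not describe how COMP2007 or any other list is actually computed.

```python
import math

def expected_score(rating_diff):
    """Standard Elo expected score for a player rated `rating_diff` above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

def rating_diff_from_score(score):
    """Invert the Elo curve: rating difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def sampling_sd_elo(score, games):
    """Approximate one-sigma sampling error in Elo for `games` independent games.

    Assumption: win/loss trials only, so the spread is somewhat overstated
    when many games are drawn.
    """
    se = math.sqrt(score * (1.0 - score) / games)
    lo = max(score - se, 1e-6)          # clamp to keep the logarithm finite
    hi = min(score + se, 1.0 - 1e-6)
    return (rating_diff_from_score(hi) - rating_diff_from_score(lo)) / 2.0

def total_sd_elo(score, games, opponent_sd):
    """Combine sampling error with an opponent-rating error in quadrature.

    Assumption: `opponent_sd` is a systematic error that does NOT shrink
    with the number of games (hypothetical model, not a measured value).
    """
    return math.hypot(sampling_sd_elo(score, games), opponent_sd)

# Illustration: a 50% score against opponents whose ratings are off by ~100 points.
for n in (50, 500, 5000):
    print(n, round(sampling_sd_elo(0.5, n), 1), round(total_sd_elo(0.5, n, 100.0), 1))
```

Under these assumptions the sampling error keeps shrinking as games are added, but the combined uncertainty levels off near the opponent-rating error, which is the point about game counts alone not making a list reliable.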
Parent - - By Walter Eigenmann (**) Date 2007-08-11 08:19
Sure: COMP2007 isn't an engine tournament, but a database.
Nevertheless: the same results.
The difference: many more games per engine...
Parent - By Felix Kling (Gold) Date 2007-08-11 21:50
Hi Walter!

I think you have a good point; the result indicates that your way of calculating engine strength is also OK.
Your test shows that the test conditions aren't as important as one might think.

Anyway, I also agree that a test like the CCRL or CEGT should produce more precise results.