Rybka Chess Community Forum
The Rybka Lounge / Computer Chess / Chess Engine Ranking
- - By Razor (****) Date 2012-06-22 17:18
So, if IPON has the following:
Rank   Name                 ELO    +/- Error
1      Houdini 2.0 STD      3030   +10 / -10
2      Critter 1.6          2987   +10 / -10
3      Komodo 4             2984   +10 / -10
4      Stockfish 2.2.2 JA   2966   +10 / -10

Is the above correctly ranked or should it be . . .
Rank   Name                 ELO    +/- Error
1      Houdini 2.0 STD      3030   +10 / -10
2      Critter 1.6          2987   +10 / -10
2      Komodo 4             2984   +10 / -10
3      Stockfish 2.2.2 JA   2966   +10 / -10

and so on because of the statistical error declared.  BTW, not singling IPON out; all lists appear to have this apparent flaw!  :smile:
Parent - By Uly (Gold) Date 2012-06-22 18:20
Ties don't work like that; it would go 1 - 2 - 2 - 4 (no third place).
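
For anyone who wants to see the 1 - 2 - 2 - 4 numbering spelled out, here is a minimal Python sketch. The function name `competition_ranks` and the hand-written tie groups are just illustrative assumptions; nothing here comes from IPON or any rating tool.

```python
def competition_ranks(tie_groups):
    """Assign ranks so that tied engines share a rank and the
    following rank is skipped (1 - 2 - 2 - 4 style).

    tie_groups: list of lists of engine names, best group first.
    """
    ranks = {}
    position = 1  # rank given to every engine in the current group
    for group in tie_groups:
        for name in group:
            ranks[name] = position
        # The next group starts after all engines seen so far.
        position += len(group)
    return ranks


# Hypothetical grouping: Critter and Komodo treated as tied.
groups = [
    ["Houdini 2.0 STD"],
    ["Critter 1.6", "Komodo 4"],
    ["Stockfish 2.2.2 JA"],
]
for name, rank in competition_ranks(groups).items():
    print(rank, name)
```

Stockfish lands on 4 because three engines sit ahead of it, whether or not two of them are tied.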
Parent - - By Vempele (Silver) Date 2012-06-22 18:25
No, it should be

1      Houdini 2.0 STD      3030   +10 / -10
1      Critter 1.6          2987   +10 / -10
1      Komodo 4             2984   +10 / -10
1      Stockfish 2.2.2 JA   2966   +10 / -10

as the error margins are only 95% confidence (on most lists AFAIK). They won't reach 100% with a finite number of games so there's no point assigning rankings.
Parent - - By Razor (****) Date 2012-06-22 20:54
Not sure I understand what you are saying here. Are you saying that the +/- 10 ELO error margin specified on the IPON list is in error, and that you believe the actual error range on the IPON list is larger than this?
Parent - - By saurus_ (**) Date 2012-06-22 21:55
It means that there is a 95% probability that the engines are within +/- 10 points of the stated number.
100% cannot be achieved.
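
To put a rough number on this, here is a small Python sketch. It rests on assumptions the list itself does not state: that each +/- 10 is a 95% interval of a normally distributed error, and that the errors of two engines are independent (on a real list they share opponents, so they are probably correlated). The function names are made up for illustration.

```python
import math


def sigma_from_95(margin):
    """Standard deviation implied by a 95% margin, assuming normal error."""
    return margin / 1.96


def prob_stronger(rating_a, rating_b, margin_a, margin_b):
    """Approximate probability that engine A is truly stronger than B,
    assuming independent normal errors on both ratings."""
    # Standard deviation of the rating difference.
    sd_diff = math.hypot(sigma_from_95(margin_a), sigma_from_95(margin_b))
    z = (rating_a - rating_b) / sd_diff
    # Normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Critter 1.6 (2987) vs Komodo 4 (2984), both listed as +10 / -10
print(round(prob_stronger(2987, 2984, 10, 10), 2))
```

Under these assumptions Critter vs Komodo comes out at roughly two chances in three, while Houdini vs Critter (43 points apart) is very close to certain, though in exact arithmetic never 100%.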
Parent - By Razor (****) Date 2012-06-23 08:20
Yes, I understand confidence, and this is quite different from precision (or variation, if you prefer that term).  The sample size used (number of games, etc.) helps to shape the variation (precision), along with, of course, other considerations such as the type of distribution of the data we have here.

My claim ignores all of this, as I am only a 'User' of the ranking lists. My claim is therefore based on the numbers in front of me, not on how they have been calculated or on whether the sample size used and the standard error inherent in their solution are correctly stated.  Simply based on the numbers shown, I draw everyone's attention to the fact that you cannot claim a ranking exists between two engines on the list where the stated ELO ratings of both lie within the error range claimed.
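
As a sketch of how that rule could be applied mechanically, here is a short Python example. It assumes one particular reading of the claim: an engine is tied with the leader of the current group when its rating is within the stated margin of that leader. The helper names `tie_groups` and `intervals_overlap` are hypothetical.

```python
def tie_groups(entries, margin):
    """Group engines whose rating is within `margin` of the group leader.

    entries: list of (name, rating) tuples sorted by rating, best first.
    """
    groups = []
    for name, rating in entries:
        # Compare against the first (highest rated) engine of the last group.
        if groups and groups[-1][0][1] - rating <= margin:
            groups[-1].append((name, rating))
        else:
            groups.append([(name, rating)])
    return groups


def intervals_overlap(rating_a, rating_b, margin):
    """Looser alternative: do the two +/- margin intervals overlap at all?"""
    return abs(rating_a - rating_b) <= 2 * margin


ipon = [
    ("Houdini 2.0 STD", 3030),
    ("Critter 1.6", 2987),
    ("Komodo 4", 2984),
    ("Stockfish 2.2.2 JA", 2966),
]
for group in tie_groups(ipon, margin=10):
    print([name for name, _ in group])
```

With a margin of 10 this groups Critter with Komodo and leaves Houdini and Stockfish on their own. Note that the looser overlap test gives a different picture: the Komodo and Stockfish intervals overlap (18 points apart), while the Critter and Stockfish intervals do not (21 points apart), so the result depends on which criterion is chosen.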

I feel so much better now!  :smile:
Parent - By keoki010 (Silver) Date 2012-06-23 00:02

> +10 / -10

But isn't this plus/minus 90%? So it would be 10 +/- ??? Just asking questions; I'm not a mathematician.
Parent - By Uly (Gold) Date 2012-06-23 03:45
So isn't there then a 95% chance that 1 - 2 - 2 - 4 is correct?
