Rybka Chess Community Forum
Up Topic Rybka Support & Discussion / Rybka Discussion / So um, I'm doing a test with 1.2f and 2.2n. 300 games so far
- - By Tef [us] Date 2007-01-13 17:28
And 2.2n is only leading by 30ish Elo. An amusing streak occurred: 1.2f stayed neck and neck for the first 100 games, with wins even for a long time, then a 30-plus game streak hit, and I'm not exactly sure what to think.

Time: 3 minutes with a 1 second increment.
3-4-5 tablebases. (Three directories with them in various formats).
128 mb hash.
Priority is set to low for both engines. They both use the same amount of CPU time when it's their turn, cycling down to 0 when it's not. (Ponder is off).

Is this an appropriate result? I assume Fritz 10's percentage displays are confidence intervals, right? Last I checked, at a 99.7% confidence interval, Rybka 2.2n could be anywhere from 37 Elo to 121 Elo stronger. Which means I need to let it run a long time yet. :)
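For anyone who wants to sanity-check an interval like this by hand, here is a minimal sketch. It assumes the standard logistic Elo formula and a normal approximation on the per-game score; it is not how Fritz or EloStat computes its numbers. The win/draw/loss counts are made up for illustration, since the thread does not give the actual score.

```python
import math

def elo_from_score(p):
    """Elo difference implied by a score fraction p (0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_interval(wins, draws, losses, k=3.0):
    """Return (low, mid, high) Elo estimates at +/- k standard errors.

    Uses a normal approximation on the mean per-game score,
    with game results scored as 1, 0.5 and 0.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # Sample variance of a single game's score around the mean p.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * (0.0 - p) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(p - k * se),
            elo_from_score(p),
            elo_from_score(p + k * se))

# Hypothetical 300-game match (NOT the actual result from this test).
low, mid, high = elo_interval(wins=120, draws=90, losses=90, k=3.0)
print(f"Elo difference: {mid:.0f} (3 sd range: {low:.0f} to {high:.0f})")
```

With counts like these the 3 sd range comes out much wider than the point estimate, which is why a few hundred games rarely settle a 30 Elo difference.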
Parent - - By Vasik Rajlich (Silver) [hu] Date 2007-01-15 13:09
This confidence interval sounds buggy - 99.7% confidence will give you a variance of far more than 90 Elo.

You can just run EloStat directly:

http://wbec-ridderkerk.nl/html/download/other/elostat_13.zip

Vas
Parent - - By Banned for Life (Gold) Date 2007-01-16 01:02
I assume you meant standard deviation rather than variance. Also, I get an error message when I click on the link.
Parent - - By Vasik Rajlich (Silver) [hu] Date 2007-01-18 11:44
I wanted to say that the Elo span (from lowest to highest) which will cover 99.7% of the results is going to be more than 90 Elo. So, whatever the word for that is :)

It's not quite standard deviation either, 99.7% is going to span many standard deviations.

Re. link, you can get EloStat from here:

http://wbec-ridderkerk.nl/html/downmain.htm

Vas
Parent - - By Banned for Life (Gold) Date 2007-01-18 17:49
It looks like it is supposed to be the confidence interval from a Chi Squared test although it looks like most of the requirements for a Chi Squared test aren't being met. Thanks for the link to EloStat.
Parent - - By capiola [ru] Date 2007-01-18 21:21
It looks like a Chi Squared interval indeed, 3 Chi Squared standard deviations. To compare with CCRL one has to divide the interval by 1.5, since they seem to use 2 Chi Squared standard deviations. The result would be something like an 80 +/- 30 Elo advantage for the newest Rybka. The shortcut "variance" may, very rarely, be used for such intervals.
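The division by 1.5 can be sketched as follows. This assumes the interval is symmetric around its midpoint and that its half-width scales linearly with the number of standard deviations, which is only an approximation in Elo terms. The function name `rescale_interval` is made up for this example.

```python
def rescale_interval(low, high, sd_from, sd_to):
    """Rescale a symmetric interval from sd_from to sd_to standard deviations.

    Returns (center, half_width) of the rescaled interval.
    """
    center = (low + high) / 2.0
    half_width = (high - low) / 2.0 * sd_to / sd_from
    return center, half_width

# The 37..121 Elo range quoted earlier in the thread, taken from 3 sd to 2 sd.
center, half_width = rescale_interval(37, 121, sd_from=3, sd_to=2)
print(f"{center:.0f} +/- {half_width:.0f} Elo")
```

That gives a midpoint of 79 and a half-width of 28, which rounds to the "80 +/- 30" figure above.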
Parent - - By Vasik Rajlich (Silver) [hu] Date 2007-01-20 14:07
Ok, thanks.

Two standard deviations is the normally accepted "this is no longer just luck" interval. When you run an engine match in Fritz, it displays a range of values labeled "99.7%". I can't imagine what this could be; it certainly isn't the interval for 99.7% confidence.

Vas
Parent - By Tef [us] Date 2007-01-20 17:22
I had no idea what the percentages were, I just hoped they were confidence levels. :P
Parent - By PCMorphy72 (**) [it] Date 2007-01-20 18:06

> It's not quite standard deviation either, 99.7% is going to span many standard deviations.


To be precise, 3 standard deviations.

> Two standard deviations is the normal accepted "this is no longer just luck" interval.


To be precise, with a 95.45% probability of trustworthiness (confidence).

Follow this "68-95-99.7 rule" table from Wikipedia (sd=standard deviation)

1 sd -> 68.26894921371%
2 sd -> 95.44997361036%
3 sd -> 99.73002039367%
4 sd -> 99.99366575163%
5 sd -> 99.99994266969%
6 sd -> 99.99999980268%
7 sd -> 99.99999999974%

This is for a normal distribution (and Elo error is so distributed)
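The percentages in the normal-distribution table can be reproduced with the error function: the fraction of a normal distribution within k standard deviations of the mean is erf(k / sqrt(2)). A minimal sketch using only the Python standard library:

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in range(1, 8):
    print(f"{k} sd -> {100.0 * normal_coverage(k):.11f}%")
```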

For other, non-normal distributions (such as hardware/software-driven values), use these minimum coverages instead:

1.4 sd -> 50%
2 sd -> 75%
3 sd -> 89%
4 sd -> 94%
5 sd -> 96%
6 sd -> 97%
7 sd -> 98%
sd>7 -> ~ (100 - 100/sd^2) %
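These figures match Chebyshev's inequality, which guarantees that at least 1 - 1/k^2 of any distribution with finite variance lies within k standard deviations of the mean. It is a lower bound, so real distributions usually cover more. A minimal sketch:

```python
import math

def chebyshev_bound(k):
    """Minimum fraction of any finite-variance distribution within k sd of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # The bound is negative (uninformative) for k < 1, so clamp it at 0.
    return max(0.0, 1.0 - 1.0 / (k * k))

for k in (math.sqrt(2.0), 2, 3, 4, 5, 6, 7):
    print(f"{k:.1f} sd -> at least {100.0 * chebyshev_bound(k):.0f}%")
```

The same formula applies for any k, not only above 7 sd.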