Topic Rybka Support & Discussion / Rybka Discussion / So um, I'm doing a test with 1.2f and 2.2n. 300 games so far
And 2.2n is only leading by 30ish Elo. An amusing streak occurred: 1.2f stayed neck and neck for the first 100 games, with wins even for a long time, then a streak of 30-plus games hit, and I'm not exactly sure what to think.
Time: 3 minutes with a 1 second increment.
3-4-5 tablebases. (Three directories with them in various formats).
128 MB hash.
Priority is set to low for both engines; they both use the same amount of CPU time when it's their turn, cycling down to 0 when it's not. (Ponder is off.)
Is this an appropriate result? I assume Fritz 10's percentage displays are confidence intervals, right? Last I checked, at a 99.7% confidence interval, Rybka 2.2n could be anywhere from 37 to 121 Elo stronger. Which means I need to let it run a long time yet. :)
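For anyone who wants to sanity-check an interval like this by hand, here is a minimal sketch using the usual logistic Elo formula and a normal approximation of the match score. The win/draw/loss counts are hypothetical (the thread does not give them), the function names are made up for illustration, and this is not necessarily how Fritz or EloStat compute their figures.

```python
import math

def elo_from_score(p):
    """Convert a score fraction (0 < p < 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_interval(wins, draws, losses, z=2.0):
    """Return (low, estimate, high) Elo for a match, z standard errors wide.

    Assumes independent games and a normal approximation of the mean score.
    """
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n  # mean score per game
    # Per-game variance of the score around its mean.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * (0.0 - p) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(p - z * se),
            elo_from_score(p),
            elo_from_score(p + z * se))

# Hypothetical 300-game result, NOT the actual match data from this thread.
low, est, high = elo_interval(wins=110, draws=105, losses=85, z=2.0)
print(f"{est:+.0f} Elo, roughly {low:+.0f} to {high:+.0f} at 2 sd")
```

With only a few hundred games the interval comes out many tens of Elo wide, which is why a 30ish Elo lead after 300 games says little on its own.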
This confidence interval sounds buggy - 99.7% confidence will give you a variance of far more than 90 Elo.
You can just run EloStat directly:
http://wbec-ridderkerk.nl/html/download/other/elostat_13.zip
Vas
I assume you meant standard deviation rather than variance. Also, I get an error message when I click on the link.
I wanted to say that the Elo span (from lowest to highest) which will cover 99.7% of the results is going to be more than 90 Elo. So, whatever the word for that is :)
It's not quite standard deviation either, 99.7% is going to span many standard deviations.
Re. link, you can get EloStat from here:
http://wbec-ridderkerk.nl/html/downmain.htm
Vas
It looks like it is supposed to be the confidence interval from a Chi Squared test although it looks like most of the requirements for a Chi Squared test aren't being met. Thanks for the link to EloStat.
It does look like a Chi Squared interval, 3 Chi Squared standard deviations wide. To compare with CCRL, one has to divide the interval by 1.5, since they seem to use 2 Chi Squared standard deviations. The result would be something like an 80 +/- 30 Elo advantage for the newest Rybka. The shortcut "variance" may, very rarely, be used for such intervals.
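The arithmetic behind that figure can be sketched as follows, starting from the 37 to 121 Elo range quoted earlier in the thread. It assumes the interval is symmetric and scales linearly with the number of standard deviations, which is only an approximation in Elo space; the function name is made up for illustration.

```python
def rescale_interval(low, high, from_sd=3.0, to_sd=2.0):
    """Rescale a symmetric interval from one sd multiple to another.

    Returns (center, half_width) of the rescaled interval.
    """
    center = (low + high) / 2.0
    half_width = (high - low) / 2.0
    return center, half_width * to_sd / from_sd

# The 99.7% (3 sd) range quoted in the first post.
center, half_width = rescale_interval(37.0, 121.0)
print(f"{center:.0f} +/- {half_width:.0f} Elo")
```

The center is 79 and the 3 sd half-width of 42 shrinks to 28 at 2 sd, which rounds to the 80 +/- 30 above.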
Ok, thanks.
Two standard deviations is the normal accepted "this is no longer just luck" interval. When you run an engine match in Fritz, it displays a range of values labeled "99.7%". I can't imagine what this could be, it certainly isn't the interval for 99.7% confidence.
Vas
I had no idea what the percentages were, I just hoped they were confidence levels. :P
> It's not quite standard deviation either, 99.7% is going to span many standard deviations.
To be precise, 3 standard deviations.
> Two standard deviations is the normal accepted "this is no longer just luck" interval.
To be precise, that corresponds to a 95.45% probability of trustworthiness (confidence).
Follow this "68-95-99.7 rule" table from Wikipedia (sd = standard deviation):
1 sd -> 68.26894921371%
2 sd -> 95.44997361036%
3 sd -> 99.73002039367%
4 sd -> 99.99366575163%
5 sd -> 99.99994266969%
6 sd -> 99.99999980268%
7 sd -> 99.99999999974%
This is for a normal distribution (and Elo error is so distributed)
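The percentages in that table can be reproduced with the error function: for a normal distribution, the share of values within k standard deviations of the mean is erf(k / sqrt(2)). A short sketch (the function name is just for illustration):

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in range(1, 8):
    print(f"{k} sd -> {100.0 * normal_coverage(k):.11f}%")
```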
For other, non-normal distributions (such as hardware- or software-driven values), use these guaranteed minimums instead (Chebyshev's inequality):
1.4 sd -> 50%
2 sd -> 75%
3 sd -> 89%
4 sd -> 94%
5 sd -> 96%
6 sd -> 97%
7 sd -> 98%
sd>7 -> ~ (100 - 100/sd^2) %
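These figures match the Chebyshev bound, which says that for any distribution with a finite variance, at least 1 - 1/k^2 of the values lie within k standard deviations of the mean. A minimal sketch (the function name is just for illustration):

```python
import math

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations (valid for k > 1)."""
    return 1.0 - 1.0 / (k * k)

for k in (math.sqrt(2.0), 2, 3, 4, 5, 6, 7):
    print(f"{k:.1f} sd -> at least {100.0 * chebyshev_lower_bound(k):.0f}%")
```

Note that this is a floor, not an estimate: an actual distribution will usually cover more than this.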
