Rybka Chess Community Forum
The Rybka Lounge / Computer Chess / Excellent Stockfish contempt test
- By MrKris (***) Date 2020-02-06 02:54
Thanks to Stephan Pohl for his outstanding test!
https://www.sp-cc.de/experiments.htm
http://talkchess.com/forum3/viewtopic.php?f=2&t=72992

I re-sorted the results of Stephan Pohl's excellent test by draw rate (highest first):
     Program                 Elo    +    -   Games   Score   Av.Op.  Draws

   7 Stockfish 11 C=-40    : 3549    4    4  9000    49.2 %   3555   84.5 %
   3 Stockfish 11 C=-24    : 3555    4    4  9000    50.1 %   3554   82.7 %
   4 Stockfish 11 C=-15    : 3554    4    4  9000    50.0 %   3554   81.7 %
   1 Stockfish 11 C=0      : 3558    4    4  9000    50.7 %   3553   79.3 %
   2 Stockfish 11 C=+15    : 3558    4    4  9000    50.6 %   3554   77.8 %
   5 Stockfish 11 C=+24    : 3554    4    4  9000    50.0 %   3554   76.2 %
   6 Stockfish 11 C=+40    : 3551    4    4  9000    49.5 %   3555   73.8 %

Note that the draw-rate order for all 7 is exactly as expected: the draw rate falls steadily as contempt rises from -40 to +40!
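
For anyone who wants to check the ordering rather than eyeball it, here is a minimal sketch. The draw rates are copied from the table; the names (draw_rate, strictly_decreasing, and so on) are just mine for illustration. The last line assumes, purely for the sake of argument, that contempt has no effect on draws, so that every ordering of the 7 settings would be equally likely.

```python
import math

# Draw rates (%) keyed by contempt, copied from the table above.
draw_rate = {-40: 84.5, -24: 82.7, -15: 81.7, 0: 79.3,
             15: 77.8, 24: 76.2, 40: 73.8}

def strictly_decreasing(values):
    """True if each value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))

# Walk the contempt settings from -40 up to +40.
rates_by_contempt = [draw_rate[c] for c in sorted(draw_rate)]
ordered_as_expected = strictly_decreasing(rates_by_contempt)

# Assumption for illustration only: if contempt had no effect on draws,
# each of the 7! orderings would be equally likely.
chance_of_one_ordering = 1 / math.factorial(len(draw_rate))

print(ordered_as_expected, chance_of_one_ordering)
```

Under that no-effect assumption, one specific ordering of 7 items is a 1 in 5040 event, which is why the perfect ordering is hard to put down to luck.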

Original: https://www.sp-cc.de/experiments.htm
      Program                 Elo    +    -   Games   Score   Av.Op.  Draws

   1 Stockfish 11 C=0      : 3558    4    4  9000    50.7 %   3553   79.3 %
   2 Stockfish 11 C=+15    : 3558    4    4  9000    50.6 %   3554   77.8 %
   3 Stockfish 11 C=-24    : 3555    4    4  9000    50.1 %   3554   82.7 %
   4 Stockfish 11 C=-15    : 3554    4    4  9000    50.0 %   3554   81.7 %
   5 Stockfish 11 C=+24    : 3554    4    4  9000    50.0 %   3554   76.2 %
   6 Stockfish 11 C=+40    : 3551    4    4  9000    49.5 %   3555   73.8 %
   7 Stockfish 11 C=-40    : 3549    4    4  9000    49.2 %   3555   84.5 %

Conclusions: Only the +40 and -40 Contempt results are somewhat weaker. All other Contempts are inside errorbar at the same level of strength.


There is a common misconception here: that overlapping error bars negate a test result. They never do.
Statistically, C=0, and very slightly less so C=+15, are indicated as the strongest over the 9000 games.

Statistically, nothing is ever a stair step; everything is a bell curve.
Error bars do not negate that C=0 and C=+15 are strong top outliers.
Statistically, it is "ignore result differences within the error bars at your own risk", because error bars are just one part of statistics; Elo, raw scores, etc. are parts of it too.
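
To show where an error bar of this size can come from, here is a rough sketch. It is not the method used for the table (I do not know which tool produced those +/- 4 values). It assumes independent games with results 0, 0.5 or 1, a normal approximation, and the standard logistic Elo curve; the function name score_error_bar and the z=1.96 default are my own choices.

```python
import math

def score_error_bar(score, draw_rate, games, z=1.96):
    """Approximate Elo half-width of a z-sigma interval around a match score.

    score and draw_rate are fractions (0.507, not 50.7).
    """
    wins = score - draw_rate / 2                     # win fraction implied by score and draws
    variance = wins + draw_rate / 4 - score ** 2     # per-game variance of a 0 / 0.5 / 1 result
    std_err = math.sqrt(variance / games)            # standard error of the mean score
    # Slope of D = 400 * log10(s / (1 - s)) with respect to the score s.
    elo_per_score = 400 / (math.log(10) * score * (1 - score))
    return z * std_err * elo_per_score

# C=0 row: 50.7 % score, 79.3 % draws, 9000 games.
bar_c0 = score_error_bar(0.507, 0.793, 9000)
# C=+40 row: 49.5 % score, 73.8 % draws, 9000 games.
bar_c40 = score_error_bar(0.495, 0.738, 9000)
print(round(bar_c0, 1), round(bar_c40, 1))
```

Both values land in the same few-Elo range as the table's +/- 4. The point is that the bar is the width of a bell curve around each Elo, not a wall: a 4 Elo gap inside the bars is still more likely real than not.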

Correlation (https://www.dummies.com/education/math/statistics/how-to-interpret-a-correlation-coefficient-r/) is part of statistics too.
There appears to be a strong negative linear correlation between strength and the absolute value of contempt:

3558 3558
                     
            3555      
     3554   3554      
                   3551
                   3549
C 0  +-15   +-24   +-40
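
To put a number on that, here is a small sketch that computes the Pearson correlation coefficient r between |contempt| and Elo, using the seven Elo values from the table. The helper pearson_r is my own plain-Python version, written out so nothing beyond the standard library is needed.

```python
import math

# |contempt| and Elo for the seven settings, taken from the table:
# C=0, +15, -15, -24, +24, +40, -40
abs_contempt = [0, 15, 15, 24, 24, 40, 40]
elo = [3558, 3558, 3554, 3555, 3554, 3551, 3549]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson_r(abs_contempt, elo)
print(round(r, 2))
```

The result comes out close to -0.9, which the linked guide would class as a strong negative linear relationship. With only seven points this is suggestive, not proof.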


He says "Only the +40 and -40 Contempt results are somewhat weaker", going by the error bars.
- But +40 falls short of 50% by less (0.5%) than 0 and +15 exceed it (0.7% and 0.6%).
Note that CorChess has been using +12 as its default for quite some time now.

As he concludes, statistically it would take more games to prove which contempt is strongest.
However, one fact supports the supposition (or premise for further testing) that the 9000 games might be enough to judge strength, despite the error-bar warning: the 9000 games were enough to put all 7 draw rates in exactly the expected order.