Rybka Chess Community Forum
The Rybka Lounge / Computer Chess / Weekend entertainment with the IPON
Parent - - By Regularuser (***) Date 2013-05-19 19:26
One thing I don't understand about the results is that the overall Elo result is nowhere near the mean of the individual match results, which so far is about 3052 as opposed to 3035. Even if you weight the match results by the number of games played in each match so far, you still get 3052.

Is the overall result correct, in other words is this how elo is meant to work?
Parent - - By mindbreaker (****) Date 2013-05-19 21:36
Does look like there is a problem. I calculated an average of 3053. I thought he might have divided by the wrong number of engines but the error would have been much larger if he had. You got me.  You would think if it played the same number of games against each engine the average of the performances would be the rating.

A ratings formula can weigh matches that are closer to 50% a bit more because in theory more error is possible when the engines have a large strength disparity. But in this case that should give it a boost instead of dragging it down.

The only thing that makes sense is that there are two different ratings formulas in play.

But I am no expert on this either.
Parent - By Regularuser (***) Date 2013-05-20 05:46
I will be interested to see what Ingo has to say on this.

I may have to resort to reading up on ELO formulae!
Parent - By ernest (****) Date 2013-05-20 18:38

> Does look like there is a problem


There is no problem at all !!!

See for instance
http://www.talkchess.com/forum/viewtopic.php?p=422320#422320
Parent - By mindbreaker (****) Date 2013-05-19 22:00
I also tried throwing out the high and the low; still 3053.
Parent - By mindbreaker (****) Date 2013-05-19 22:09
There was a similar discrepancy in the boot performance posted above. The performance given was 2744 but the average was 2736.
Parent - - By Chess_Rambo (***) Date 2013-05-20 07:02
The arithmetic mean of the values of a nonlinear function at some x-values is not necessarily the same as the value of the function at the arithmetic mean of these x-values.

I hope this makes sense. If I look at Google's re-translation into German, I doubt it. :smile:
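
A small numeric sketch of this point, assuming the standard logistic Elo curve (the tool behind the actual list may well use a different model). The two opponents and scores are hypothetical, chosen only to make the effect visible, and `expected_score`, `performance` and `pooled_rating` are illustrative names:

```python
import math

def expected_score(diff):
    """Expected score for a rating advantage of `diff` (logistic Elo curve, assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def performance(opp_rating, score):
    """Performance rating from one match: invert the curve at the match score."""
    return opp_rating + 400.0 * math.log10(score / (1.0 - score))

def pooled_rating(matches, lo=0.0, hi=6000.0):
    """Single rating whose summed expected score equals the summed actual score.

    Assumes the same number of games in every match; solved by bisection.
    """
    target = sum(score for _, score in matches)
    for _ in range(100):
        mid = (lo + hi) / 2.0
        total = sum(expected_score(mid - opp) for opp, _ in matches)
        if total < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical data: (opponent rating, score fraction), equal game counts.
matches = [(2800, 0.90), (3000, 0.50)]

mean_of_perfs = sum(performance(o, s) for o, s in matches) / len(matches)
pooled = pooled_rating(matches)

print(f"mean of match performances: {mean_of_perfs:.0f}")
print(f"single pooled rating:       {pooled:.0f}")
```

With these made-up numbers the pooled rating comes out roughly 30 points below the mean of the two match performances, because the lopsided 90% match inflates its own performance figure more than it moves the overall fit.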
Parent - By Regularuser (***) Date 2013-05-20 08:24
Thanks for replying and yes it makes perfect sense :smile:
And yes, I understand that (I have post-grad maths). I was assuming that the Elo calculation function would be approximately linear over the range of results of the individual matches, and therefore that the overall Elo result would be close to the mean of the individual match results. I would expect it to be different, but for it to be so far out looks strange to me.

I have found the elo calculation rules and I will do some calculations later today.
Parent - By oudheusa (*****) Date 2013-05-20 10:27
Ingo, are you aware that there are ad pop-ups connected to your site?
Parent - By RFK (Gold) Date 2013-05-20 17:01 Edited 2013-05-20 17:07
For Entertainment! Well, let's place this next to say,

No Country for Old Men: great build-up... right up to the intersection!

If only the Coens could have devised something with a real twist, beyond Chigurh getting busted up at the intersection. But NO! They had to leave it nebulous! Like this engine test. :razz:
Parent - - By Ingo (***) Date 2013-05-20 17:19
Hi

I promised entertainment and, looking at my web counter, it was great fun. Those who are disappointed should try to get a life, or at least put the matter of "computer chess" into some real-life proportion ...

The solution was already given by Don Dailey. It is a CCT setting with 'time usage aggressiveness = 3'. It might perform better at long time controls, but I assume it would only look that way because of the increased draw rate ...

Anyhow, looking at the real data, there is not much difference between them:

2 Guess (K-CCT-TA3)           : 2700 (+1583,=899,-218), 75.3 %

Chiron 1.5                    : 150 (+ 79,= 62,-  9), 73.3 %
Deep Shredder 12              : 150 (+ 92,= 49,-  9), 77.7 %
Critter 1.4a                  : 150 (+ 47,= 85,- 18), 59.7 %
Booot 5.2.0                   : 150 (+121,= 23,-  6), 88.3 %
Protector 1.5.0               : 150 (+ 83,= 58,-  9), 74.7 %
Gull II                       : 150 (+ 62,= 75,- 13), 66.3 %
Quazar 0.4                    : 150 (+118,= 29,-  3), 88.3 %
Deep Junior 13.3              : 150 (+110,= 33,-  7), 84.3 %
Deep Rybka 4.1                : 150 (+ 55,= 73,- 22), 61.0 %
Stockfish 3                   : 150 (+ 46,= 82,- 22), 58.0 %
Deep Sjeng c't 2010 32b       : 150 (+111,= 29,- 10), 83.7 %
spark-1.0                     : 150 (+106,= 41,-  3), 84.3 %
Hannibal 1.3                  : 150 (+103,= 36,- 11), 80.7 %
Naum 4.2                      : 150 (+ 93,= 42,- 15), 76.0 %
Toga II 3.0 32b               : 150 (+117,= 27,-  6), 87.0 %
HIARCS 14 WCSC 32b            : 150 (+102,= 39,-  9), 81.0 %
Houdini 3 STD                 : 150 (+ 39,= 68,- 43), 48.7 %
Spike 1.4 32b                 : 150 (+ 99,= 48,-  3), 82.0 %

2 Komodo CCT                 : 2700 (+1590,=893,-217), 75.4 %

Chiron 1.5                    : 150 (+ 75,= 68,-  7), 72.7 %
Deep Shredder 12              : 150 (+ 97,= 47,-  6), 80.3 %
Critter 1.4a                  : 150 (+ 45,= 76,- 29), 55.3 %
Booot 5.2.0                   : 150 (+128,= 20,-  2), 92.0 %
Protector 1.5.0               : 150 (+ 90,= 48,- 12), 76.0 %
Gull II                       : 150 (+ 55,= 82,- 13), 64.0 %
Quazar 0.4                    : 150 (+116,= 28,-  6), 86.7 %
Deep Junior 13.3              : 150 (+114,= 31,-  5), 86.3 %
Deep Rybka 4.1                : 150 (+ 52,= 78,- 20), 60.7 %
Stockfish 3                   : 150 (+ 49,= 78,- 23), 58.7 %
Deep Sjeng c't 2010 32b       : 150 (+107,= 38,-  5), 84.0 %
spark-1.0                     : 150 (+112,= 34,-  4), 86.0 %
Hannibal 1.3                  : 150 (+ 95,= 48,-  7), 79.3 %
Naum 4.2                      : 150 (+ 99,= 39,- 12), 79.0 %
Toga II 3.0 32b               : 150 (+123,= 24,-  3), 90.0 %
HIARCS 14 WCSC 32b            : 150 (+ 95,= 46,-  9), 78.7 %
Houdini 3 STD                 : 150 (+ 33,= 71,- 46), 45.7 %
Spike 1.4 32b                 : 150 (+105,= 37,-  8), 82.3 %

Guess (K-CCT-TA3) - Komodo CCT                    : 150 (+ 26,=107,- 17), 53.0 %


0.1% difference - actually they are identical!

Of course the individual performances differ, and looking at the self test one might conclude the setting is better than the original, but this is just 150 games - which in itself is just a bit better than rolling dice.
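
For what it's worth, the percentages and the "150 games" remark can be checked from the win/draw/loss counts posted above. A minimal sketch, assuming a draw counts as half a point and using a plain normal approximation for the 95% margin; the function names are illustrative:

```python
import math

def score(wins, draws, losses):
    """Score fraction with a draw counted as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)

def score_margin(wins, draws, losses, z=1.96):
    """Approximate 95% margin on the score fraction (normal approximation, assumed)."""
    n = wins + draws + losses
    s = score(wins, draws, losses)
    # Per-game variance of the result, which is 1, 0.5 or 0.
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    return z * math.sqrt(var / n)

guess = score(1583, 899, 218)    # Guess (K-CCT-TA3), 2700 games
komodo = score(1590, 893, 217)   # Komodo CCT, 2700 games
print(f"Guess {guess:.1%}  Komodo CCT {komodo:.1%}")

head = score(26, 107, 17)        # Guess - Komodo CCT, 150 games
margin = score_margin(26, 107, 17)
print(f"self test {head:.1%} +/- {margin:.1%}")
```

Under these assumptions the margin on the 150-game self test is a bit over four percentage points, so the interval around 53.0% still includes 50%.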

Of course this is not included in the IPON list.

Btw, why was it this forum which really got that excited? I could even see in my web statistics where the clicks were coming from - interesting thought!

Bye
Ingo

PS: Congratulations to Graham, the only one who guessed right! He really has his ears (or reading skills) everywhere ;-)
Parent - - By RFK (Gold) Date 2013-05-20 17:34 Edited 2013-05-20 17:41

>Btw, why was it this forum which really got that excited? I could even see in my web statistics where the clicks were coming from - interesting thought!


Only because we thought it was Shredder! :yell:

Guess=Shredder!

Was that an accident?
Parent - - By Ingo (***) Date 2013-05-20 17:57

>Guess=Shredder!


>Was that an accident?


As was pointed out, this is the GUI, not the engine. You will find that in any tourney I am running.

Bye
Ingo
Parent - By RFK (Gold) Date 2013-05-20 19:34
Now we know!:smile:
Parent - - By Labyrinth (*****) Date 2013-05-20 21:21
What was guess's final rating?
Parent - - By Christian Packi (****) Date 2013-05-21 01:35
Pretty much the same as Komodo CCT.
Parent - By RFK (Gold) Date 2013-05-21 01:52 Edited 2013-05-21 01:56
:yell:

(Has anyone told Don?:twisted: Oh! the left side of his brain probably figured it out by now! )
Parent - - By Carl Bicknell (*****) Date 2013-05-22 08:06 Edited 2013-05-22 08:20
Same program with same rating just a different time setting...of course that's disappointing Ingo! It's like when you stopped testing Rybka 4.1 xyz

The reason this forum was excited is because people hoped it was R5.
Parent - By Banned for Life (Gold) Date 2013-05-22 08:49
A lot of people would have been happy to see a new competitive Shredder, or even an MP Komodo as strong as CCT, but agree that the same engine with a different time management algorithm is BORING!!!
- - By siam (**) Date 2013-05-18 20:51
I think it is Shredder 13.

According to the list so far, its performance against Shredder 12 is below 3000. It is the only one on the list lower than 3000 (halfway through the games).
Parent - By oudheusa (*****) Date 2013-05-18 22:15
I think you are right. Still, it would be an incredible Elo jump (literally)!
Parent - By mindbreaker (****) Date 2013-05-19 03:31
Shredder has bounced back more than once. Definite possibility.  Could be a new Thinker, or Zappa.  Still if it is anything other than Komodo or Houdini 3 with a different setting, I think it would be a surprise. Gull's been making progress? Sjeng or Spike? They haven't been heard from in a while.

I suspect it is German in origin if it is not Komodo or an H3 setting.
Parent - - By M ANSARI (*****) Date 2013-05-19 04:34
The most probable guess would be Komodo MP ... it is the engine that we know is being developed and would be the engine many are waiting for.  I am pretty sure it is not a Rybka or one of the other authors breathing new life into an old engine.
Parent - - By Gaмßito (****) Date 2013-05-19 06:03
I agree. Probably Komodo 5.1 MP or Equinox 2.0.

About Komodo 5.1 MP: some weeks ago Don clearly said that this engine would be around 30 Elo points weaker (using 1 core) than Komodo CCT. When he said that, Komodo 5.1 MP was about 5 Elo points weaker than Komodo 5.0, and he did not expect to add so many Elo points (so fast) to that version. So, if "Guess" is really Komodo 5.1 MP, that only shows the great work that Don and Larry have been doing recently.

I really was hoping that Komodo 5.1 MP (with 1 core) would have at least the same strength as Komodo CCT.

Regards,
Gaмßito.
Parent - By M ANSARI (*****) Date 2013-05-19 06:08
I would think that they would be able to quickly improve efficiency on a new engine and thus quickly gain ELO points.  This is probably similar to changing tires on a Formula 1 car race ... you lose some time changing tires, but you are very quickly able to make that up  and more, after a few laps.
Parent - - By Gaмßito (****) Date 2013-05-19 10:49
Guess - Houdini 3 STD (3082)     52.0 - 52.0    50.00%    Perf=3082
Guess - Komodo CCT (3039)        57.0 - 48.0    54.29%    Perf=3068
Guess - Critter 1.4a (2980)      61.0 - 43.0    58.65%    Perf=3040
Guess - Stockfish 3 (2976)       61.0 - 43.0    58.65%    Perf=3036
Guess - Deep Rybka 4.1 (2957)    63.0 - 41.0    60.58%    Perf=3031

http://www.inwoba.de/index.html

Impressive performance against the top five!
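
The Perf column above appears to follow the usual logistic performance formula, opponent rating + 400*log10(score/(1-score)), truncated to a whole number. That is an inference from the five rows, not something the site states. A quick sketch to check it (`performance` is an illustrative name):

```python
import math

def performance(opp_rating, points_for, points_against):
    """Performance rating against one opponent (logistic Elo formula, assumed)."""
    s = points_for / (points_for + points_against)
    return opp_rating + 400.0 * math.log10(s / (1.0 - s))

# (opponent, opponent rating, points for Guess, points against), from the table.
rows = [
    ("Houdini 3 STD", 3082, 52.0, 52.0),
    ("Komodo CCT", 3039, 57.0, 48.0),
    ("Critter 1.4a", 2980, 61.0, 43.0),
    ("Stockfish 3", 2976, 61.0, 43.0),
    ("Deep Rybka 4.1", 2957, 63.0, 41.0),
]

# Truncate rather than round, since that is what the posted values seem to do.
perfs = [int(performance(rating, pf, pa)) for _, rating, pf, pa in rows]
for row, perf in zip(rows, perfs):
    print(f"{row[0]:16s} Perf={perf}")
```

Each per-opponent figure uses only that one match, so it says nothing by itself about how the list combines the matches into a single rating.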

Regards,
Gaмßito.
Parent - By Chaotic Chess (****) Date 2013-05-19 15:06
It's cryptic?
Parent - By Bloodbane (**) Date 2013-05-19 06:12
Could the new engine be a new version of Sjeng? Sjeng hasn't been updated in three years so it's definitely a possibility.
- - By Magnus Friedmann (***) Date 2013-05-19 15:19
I think it is Chessmaster 12 :twisted:
Parent - By Dr.Wael Deeb (***) Date 2013-05-19 16:19
Hell yaaaa :grin:
- - By oudheusa (*****) Date 2013-05-19 17:28
Don/Ingo revealed on Talkchess that it is Komodo CCT with time usage aggressiveness set to 3.
http://talkchess.com/forum/viewtopic.php?t=48035&postdays=0&postorder=asc&topic_view=&start=30

What a bunch of nonsense.
Parent - By RFK (Gold) Date 2013-05-19 17:34
+1
Parent - - By tomgdrums (****) Date 2013-05-19 18:53

> Don/Ingo revealed on Talkchess that it is Komodo CCT with time usage aggressiveness set to 3.
> http://talkchess.com/forum/viewtopic.php?t=48035&postdays=0&postorder=asc&topic_view=&start=30
>
> What a bunch of nonsense.


+1  indeed.

This was nonsense!
Parent - - By RFK (Gold) Date 2013-05-20 01:58
Hi Tom,

Has Don indicated how much improvement he expects to see in this latest "IPON-Rating Test" -or is it just a publicity stunt to sell Komodo CCT and  raise revenue and finish the redecorating of his bathroom?
Parent - By tomgdrums (****) Date 2013-05-20 03:13

> Hi Tom,
>
> Has Don indicated how much improvement he expects to see in this latest "IPON-Rating Test" -or is it just a publicity stunt to sell Komodo CCT and  raise revenue and finish the redecorating of his bathroom?


I think it was just a publicity stunt.  I am not sure if they expected any improvement or anything.  I am not really sure what the purpose of all this was.  They have managed to sell three versions of Komodo before releasing an MP version.  Pretty shrewd business tactics.   :)
Parent - By Bloodbane (**) Date 2013-05-19 20:36
+1
Parent - By Labyrinth (*****) Date 2013-05-20 00:12
Komodo mp on one core was a good guess then =)
Parent - - By M ANSARI (*****) Date 2013-05-20 05:52 Edited 2013-05-20 05:55
Well this just shows how much Time Management can increase "perceived" ELO at fast time controls!  I had similar results with Rybka during beta testing, where one TM setup would score 70 ELO to 100 ELO stronger at a certain time control setup.  The only problem was that once you change the time control, the time management setup would need to be changed.  A good TM algo that will modify its TM setup according to the time control is probably worth many ELO points!  But I will be the first to admit that this does not mean the engine is stronger, it just means that it uses the time more effectively at a given time control.  This has zero ELO gain for analysis and is more for scoring well in engine vs. engine tourneys at faster time controls.
Parent - By Regularuser (***) Date 2013-05-20 06:11
Exactly :)

And it leaves an interesting question.... to what extent do the rating lists tell us what engine is best for analysis?
- By Magnus Friedmann (***) Date 2013-05-20 07:51
It would have been interesting to see Naum 5 coming back ;)