Imagine Rybka 3 had the following EXTRA AMAZING results in CCRL.
Note that Rybka 3 obviously will not get results this good, not even close; I just wanted to see what Elo an almost GOD engine would have.
Rybka 3 64-bit 4CPU - Naum 3.1 64-bit 4CPU -----> +42 =8 -0 ||Rybka 2.3.2a ----> +18 =28 -4
Rybka 3 64-bit 4CPU - Zappa Mexico II 64-bit 4CPU -----> +36 =7 -0 ||Rybka 2.3.2a ----> +17 =20 -6
Rybka 3 64-bit 4CPU - Naum 3 64-bit 4CPU -----> +25 =7 -0 ||Rybka 2.3.2a ----> +7 =23 -2
Rybka 3 64-bit 4CPU - Zappa Mexico 64-bit 4CPU -----> +100 =20 -0 ||Rybka 2.3.2a ----> +48 =86 -25
Rybka 3 64-bit 4CPU - Toga II 1.4.1SE 4CPU -----> +29 =6 -0 ||Rybka 2.3.2a ----> +19 =12 -5
Rybka 3 64-bit 4CPU - Hiarcs 12 4CPU -----> +45 =10 -0 ||Rybka 2.3.2a ----> +26 =28 -1
Rybka 3 64-bit 4CPU - Hiarcs Paderborn 2007 4CPU -----> +30 =8 -0 ||Rybka 2.3.2a ----> +19 =16 -3
Rybka 3 64-bit 4CPU - Deep Fritz 10.1 4CPU -----> +28 =5 -0 ||Rybka 2.3.2a ----> +19 =12 -2
Rybka 3 64-bit 4CPU - Glaurung 2.1 64-bit 4CPU -----> +45 =7 -0 ||Rybka 2.3.2a ----> +29 =20 -3
Rybka 3 64-bit 4CPU - Toga II 1.3.1 -----> +29 =3 -0 ||Rybka 2.3.2a ----> +22 =10 -0
Rybka 3 64-bit 4CPU - Deep Sjeng 2.7 4CPU -----> +36 =4 -0 ||Rybka 2.3.2a ----> +27 =11 -2
Rybka 3 64-bit 4CPU - Fruit 051103 -----> +39 =5 -0 ||Rybka 2.3.2a ----> +2 =1 -0
Rybka 3 64-bit 4CPU - Glaurung 1.2.1 64-bit 4CPU -----> +49 =4 -0 ||Rybka 2.3.2a ----> +40 =13 -0
Rybka 3 64-bit 4CPU - Chess Tiger 2007.1 -----> +34 =3 -0 ||Rybka 2.3.2a ----> +3 =0 -0
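As a quick sanity check on the hypothetical results above, here is a minimal Python sketch that tallies them. The `RESULTS` list and the `tally` helper are just illustrative names, and a draw is assumed to count as half a point.

```python
# Hypothetical Rybka 3 results from the list above: (opponent, wins, draws, losses).
RESULTS = [
    ("Naum 3.1 64-bit 4CPU", 42, 8, 0),
    ("Zappa Mexico II 64-bit 4CPU", 36, 7, 0),
    ("Naum 3 64-bit 4CPU", 25, 7, 0),
    ("Zappa Mexico 64-bit 4CPU", 100, 20, 0),
    ("Toga II 1.4.1SE 4CPU", 29, 6, 0),
    ("Hiarcs 12 4CPU", 45, 10, 0),
    ("Hiarcs Paderborn 2007 4CPU", 30, 8, 0),
    ("Deep Fritz 10.1 4CPU", 28, 5, 0),
    ("Glaurung 2.1 64-bit 4CPU", 45, 7, 0),
    ("Toga II 1.3.1", 29, 3, 0),
    ("Deep Sjeng 2.7 4CPU", 36, 4, 0),
    ("Fruit 051103", 39, 5, 0),
    ("Glaurung 1.2.1 64-bit 4CPU", 49, 4, 0),
    ("Chess Tiger 2007.1", 34, 3, 0),
]


def tally(results):
    """Return (games, points, score fraction), counting a draw as half a point."""
    wins = sum(w for _, w, _, _ in results)
    draws = sum(d for _, _, d, _ in results)
    losses = sum(l for _, _, _, l in results)
    games = wins + draws + losses
    points = wins + 0.5 * draws
    return games, points, points / games


games, points, score = tally(RESULTS)
print(f"{games} games, {points} points, {score:.1%}")
```

This comes to 664 games and a score of roughly 93%, which is what the first rating list shows for Rybka 3.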
WOW! Incredible results, someone would say. 3800 Elo? 3900 Elo? Well, not quite:
Here is the Elo calculated with the "Bayeselo" program that CCRL uses (Elostat gives only 7 points more for Rybka 3). I have calibrated the results to the 3130 Elo that Rybka 2.3.2a 64-bit 4CPU has on CCRL:
Rank  Name                         Elo    +    -  Games  Score
1     Rybka 3 64-bit 4CPU         3387   36   33    664    93%
2     Rybka 2.3.2a 64-bit 4CPU    3130   15   15   1500    72%
3     Rybka 2.2 64-bit 4CPU       3112   30   29    403    73%
4     Rybka 2.3.2a 64-bit 2CPU    3091   17   17   1181    73%
5     Rybka 2.1 64-bit 4CPU       3086   39   37    248    73%
6     Rybka 2.1 64-bit 2CPU       3077   26   25    571    73%
7     Naum 3.1 64-bit 4CPU        3074   23   23    607    57%
8     Zappa Mexico II 64-bit 4CPU 3073   20   20    821    61%
9     Rybka 2.3.2a 64-bit         3069   12   12   2376    72%
10    Naum 3 64-bit 4CPU          3068   19   19    929    60%
11    Zappa Mexico 64-bit 4CPU    3065   16   16   1358    58%
One would be disappointed. "Only" 3387?
The point of all this is to show that if your opponents are not very strong, your Elo can't rise too far above that of a standard top player, due to the drawish nature of chess as we play it (due to our inability to play it perfectly, although this, if chess is a draw, leads to the opposite effect).
Remember that Elo just measures the performance of a chess player based on the results of the games he played against some other players. If the other players are weak, his Elo can't become too high (although Rybka's results in the rating lists don't confirm that).
So people who expect a super engine with 4000 Elo (always relative to the 3130 Elo of Rybka 2.3.2a 64-bit 4CPU) will have to wait until Rybka's opponents become much stronger.
And what if we break CCRL rules and match Rybka 3 with Rybka 2.3.2a? Then we would have:
Rybka 3 64-bit 4CPU - Rybka 2.3.2a 64-bit 4CPU ---> +16 =20 -4 (+108 Elo, a result that agrees with Larry's results)
And then the ratings would be:
Rank  Name                         Elo    +    -  Games  Score
1     Rybka 3 64-bit 4CPU         3363   33   31    704    91%
2     Rybka 2.3.2a 64-bit 4CPU    3130   15   15   1540    71%
3     Rybka 2.2 64-bit 4CPU       3109   30   29    403    73%
4     Rybka 2.3.2a 64-bit 2CPU    3087   17   17   1181    73%
5     Rybka 2.1 64-bit 4CPU       3082   39   37    248    73%
6     Rybka 2.1 64-bit 2CPU       3073   26   25    571    73%
7     Zappa Mexico II 64-bit 4CPU 3069   20   20    821    61%
8     Naum 3.1 64-bit 4CPU        3069   23   23    607    57%
9     Rybka 2.3.2a 64-bit         3065   12   12   2376    72%
10    Naum 3 64-bit 4CPU          3064   19   19    929    60%
11    Zappa Mexico 64-bit 4CPU    3061   16   16   1358    58%
(PS: I want to thank Kirill Kryukov, who helped me learn how CCRL calculates its ratings.)
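The +108 figure quoted for the +16 =20 -4 match can be reproduced with the plain logistic Elo curve. This is only a sketch of that textbook formula, not of what Bayeselo does internally (Bayeselo also models draws and rates the whole pool at once); `elo_diff_from_score` is an illustrative name.

```python
import math


def elo_diff_from_score(wins, draws, losses):
    """Rating difference implied by a match score under the logistic Elo curve."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        # A 0% or 100% score has no finite rating difference on this curve.
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


diff = elo_diff_from_score(16, 20, 4)  # 26 points out of 40 games, i.e. 65%
print(f"{diff:+.0f} Elo")
```

A 65% score works out to roughly +108 on this curve, in line with the number in the post.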
So here it is (again with the 3130 of Rybka 2.3.2a 64-bit 4CPU as the base Elo):
(As a reminder, I gave Rybka 3, against all 15 programs I mentioned earlier, all wins except one draw with Black.)
Rank  Name                         Elo    +    -  Games  Score
1     Rybka 3 64-bit 4CPU         3706   88   70    704    99%
2     Rybka 2.3.2a 64-bit 4CPU    3130   15   15   1540    70%
3     Rybka 2.2 64-bit 4CPU       3112   30   29    403    73%
4     Rybka 2.3.2a 64-bit 2CPU    3091   17   17   1181    73%
5     Rybka 2.1 64-bit 4CPU       3086   39   37    248    73%
6     Rybka 2.1 64-bit 2CPU       3076   26   25    571    73%
7     Naum 3.1 64-bit 4CPU        3075   24   24    607    57%
8     Zappa Mexico II 64-bit 4CPU 3074   20   20    821    60%
9     Rybka 2.3.2a 64-bit         3069   12   12   2376    72%
10    Naum 3 64-bit 4CPU          3067   19   19    929    60%
11    Zappa Mexico 64-bit 4CPU    3066   16   16   1358    57%
> And what if we break CCRL rules and match Rybka 3 with Rybka 2.3.2a?
And what rule would that break, exactly? Don't play imaginary matches between versions of the same engine? There's plenty of hot Rybka incest on CCRL, you know... :-p
The main problem is that you may need to play against stronger players: it may be hard or even impossible to score 90% against 2010's top non-Rybka engines, and you may need 90% in order to get to 3700.
Uri
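To get a rough feel for the 90% remark, here is a small sketch of the expected score of a 3700-rated engine against fields of different strength, using the standard logistic Elo curve. The opponent ratings in the loop are made-up sample values, and the rating lists use Bayeselo rather than this bare formula, so the percentages are indicative only.

```python
def expected_score(rating, opponent_rating):
    """Expected score (0..1) under the standard logistic Elo curve."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


TARGET = 3700  # the rating discussed in the post above

# Sample opponent ratings, chosen only for illustration.
for opponent in (3130, 3200, 3300, 3400):
    print(opponent, f"{expected_score(TARGET, opponent):.1%}")
```

On this curve a 400-point edge corresponds to about a 91% score, so a 3700 rating at around 90% would imply a field averaging somewhere near 3300.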
Perhaps they can test both contempt=0 and contempt set to a value you propose, and see which does better. That would be interesting...
It is possible that in 2012, when there may be some programs stronger than Rybka 3, we will find that contempt=0 has a higher rating than the default version.
Uri
> because we have to choose a default contempt which will be based on our strongest likely opponents.
Note that it changes over the months: you could set a high default contempt now, but it should be lower (because of stronger opponents) in a year, and keep decreasing each month...
> Ideally, we could have a place to enter the opponent's rating, and the program would calculate the optimum contempt factor based on that, but I don't know if that would be allowed by the testing organizations, and there is also the problem that they have different scales (CCRL ratings run about 50 higher than CEGT).
In high-level chess, it's very common to know the strength of your opponent, so why not in computer chess?
The problem of losing Elo points when playing weaker opponents is well known. It seems to be a fact, and Jeff Sonas claims that it is due to a flaw in the definition of the Elo formula.
Actually, the way FIDE calculates ELO is pretty archaic and from the pre-computer era.
What contempt should you use for playing 2200 players in blitz, 300? :-)
>the best value can be approximated by rating difference/15
Hi Larry,
Would your contempt formula work when playing against stronger opponents?
I.e., when playing an opponent that is 200 Elo higher, would a contempt setting of -13 give better results?
-v
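For reference, the arithmetic behind that -13 is just the quoted rule of thumb (rating difference / 15) applied with a negative difference. The sketch below only evaluates the rule; whether it actually holds against stronger opponents is the open question. The function name and the sample ratings are hypothetical.

```python
def contempt_from_ratings(own_rating, opponent_rating, divisor=15):
    """Rule of thumb quoted above: contempt ~ rating difference / 15."""
    return round((own_rating - opponent_rating) / divisor)


# Hypothetical ratings, chosen only to give a 200-point gap in each direction.
print(contempt_from_ratings(3000, 3200))  # opponent 200 Elo stronger
print(contempt_from_ratings(3200, 3000))  # opponent 200 Elo weaker
```

A 200-point deficit gives -13 and a 200-point surplus gives 13.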
> This 97.2% score translated to +613 Elo, which when added to the Crafty ratings gives us 3242 CCRL and 3218 CEGT, both remarkably close to my estimates for Rybka 3 quad.
http://www.open-aurec.com/wbforum/viewtopic.php?t=6587
You need more games.
Agreed. But unfortunately it's quite the arms race in the opening theory realm. It won't be long until we start seeing pretty much all major lines strongly booked 30 moves (60 ply) deep. That will really sink the Elo rating of the top program.
