Rybka Chess Community Forum
Topic: Rybka Support & Discussion / Rybka Discussion / Rybka 3 CEGT results
- - By Venator (Silver) Date 2008-08-07 13:08
First results (see: http://cegt.foren-city.de/topic,65,-testing-rybka-3-uci.html)

x64 - 2CPU
vs. Naum 3.1 x64 2CPU: 42,5:7,5 / +301 / perf. 3256 (!!)

w32 - 1CPU
vs. Zappa MxII x64 1CPU: 36,0:14,0 / +164 / perf. 3054
vs. Naum 3.1 w32 1CPU: 40,5:9,5 / +252 / perf. 3115
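
The Elo differences quoted above (for example +301 for 42,5:7,5) are consistent with the usual logistic Elo formula. Here is a minimal sketch, assuming that formula is what the CEGT figures are based on; the function name `elo_diff` is made up for illustration.

```python
import math

def elo_diff(points: float, games: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model."""
    p = points / games
    if not 0.0 < p < 1.0:
        # A 0% or 100% score has no finite Elo difference.
        raise ValueError("score must be strictly between 0% and 100%")
    return -400.0 * math.log10(1.0 / p - 1.0)

# The three 50-game matches quoted in this post.
for points in (42.5, 36.0, 40.5):
    print(points, round(elo_diff(points, 50)))
```

Under this assumption the "perf." figure would simply be the opponent's list rating plus the computed difference.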
Parent - - By Venator (Silver) Date 2008-08-07 13:11
Oops, missed another one:

w32 - 1CPU
vs. Fritz 11.1 [2917]  69.0-31.0  perf=3056
Parent - - By Uri Blass (*****) Date 2008-08-07 13:24 Edited 2008-08-07 13:51
A comparison of blitz ratings:
Rybka 2.3.2a x64 2CPU                 rating 3056  performances of rybka3: 3256 (100 games)
Rybka 2.3.2a w32 1CPU                rating 2954  performances of rybka3: 3056,3054,3115(100 games for every performance)
Rybka 2.3.2a x64 4 CPU                rating 3082  performance of rybka 3: 3214(400 games)
Rybka 2.3.2aDynamic x64 4CPU                                                        3219(200 games)
Rybka 3 Human x64 4CPU                                                                3243(224 games)

The 2 CPU version has performed the best so far.
Parent - By Roland Rösler (****) Date 2008-08-07 14:01
Not enough games! R3 w32 sp will end up with 3075 and R3 x64 2CPU with 3185. :-)
Parent - - By lkaufman (*****) Date 2008-08-07 16:45
     The weighted average of the results you quote for the three versions of Rybka 3 on a quad is 3227. My prediction for R3 was 3220, and my results for the three versions (against the opposition) were virtually identical. So far it seems that my prediction was pretty close, if a bit conservative.
Parent - - By Heinz van Kempen (***) Date 2008-08-07 17:23
Hi Larry :-),

I am not so sure yet. Naum 3.1 somehow seems to be Rybka 3's favourite opponent.

To be careful, serious testers should not give ELO results with less than 500 games (error bars +-30) or even 1000 games (error bars +-18), and those games should of course be against different opponents. To demonstrate this, here are the ratings after adding only 35 to 40 games for each personality against Zappa Mexico II x64 4CPU, and 36 games for Rybka 3 Human vs. Deep Fritz 10.1 4CPU. You will see that there are already considerable changes, as ratings that start out very high often tend to drop somewhat:

Program                          Elo    +   -   Games   Score   Av.Op.  Draws

Rybka 3 x64 4CPU               : 3215   31  31   441    78.9 %   2986   25.9 %
Rybka 3 Human x64 4CPU         : 3206   35  34   295    77.1 %   2995   32.9 %
Rybka 3 Dynamic x64 4CPU       : 3195   42  41   235    75.3 %   3001   27.2 %
Rybka 2.3.2a x64 4CPU          : 3083   13  12  1950    72.4 %   2915   37.8 %

We will see further considerable changes as we continue with the other opponents and finish the matches in progress.

Personally I think a 100 ELO improvement over a version that was already at the level of Rybka 2.3.2a is almost unbelievable, and so far the weakest personality is still 112 ELO ahead and default is now 132 ELO ahead, although it started as the worst of all. So I think it does not matter much whether it is 100, 120 or 140 ELO; this will differ from rating list to rating list and from time control to time control, not to mention other conditions.

Anyway, in a few days, once the results from the other testers are added, our rating will be reliable given our conditions. We will have more than 1000 games in the rating list on August 10th, at least for all 4CPU personalities.

Best Regards
Heinz
Parent - By lkaufman (*****) Date 2008-08-07 18:51
My tests indicate that Zappa Mexico is our worst opponent, and Deep Shredder 11 our favorite opponent, though Naum is also a good "customer".
Parent - - By Vasik Rajlich (Silver) Date 2008-08-07 20:55
Thanks Heinz for also running tests with the Dynamic and Human versions. We think that these may do better at longer time controls (especially the Human one), although it's really hard to be sure.

Vas
Parent - - By Heinz van Kempen (***) Date 2008-08-08 04:48 Edited 2008-08-08 04:51
Hi Vas :-),

no problem, it is interesting to test the Dynamic and Human versions too. For Blitz this only takes a few days, but for the two CEGT 40/120 lists you will have to be very patient, as I will now use repeated time controls, which means it will take twice as long to collect a given number of games. I just want to avoid people picking a few 200-move games from a database of more than 20,000 tournament time control games to demonstrate that some engines' time management is faulty in a limited third time control and that this might influence the list, which of course it will, but not significantly.

I also suppose that Rybka 3 Human might be the best at long and very long time controls. We will know about this in a few months :-). For the moment I do not know whether to start with the 40/120 quad list, which has been neglected for some time, or with the 40/120 dual core list, or with 40/400, or even to add something using own books :-). Well, there are already offers from other CEGT testers willing to help with these time controls, which require a lot of patience from the testers.

Best Regards
Heinz
Parent - - By lkaufman (*****) Date 2008-08-08 13:46
I highly recommend that you use repeating time controls with half the time after the first control, at least for 40/120 and 40/400. All the programs have tested 40/x, so there is little chance of favoring one over the other, and this roughly simulates the way chess is normally played in real tournaments; there is almost always some accelerated or sudden death time control after either one or two controls. This would roughly reduce testing time by 20% or so, make it more interesting for the testers, and have hardly any downside.
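
To illustrate the time saving being discussed, here is a rough sketch that compares the maximum clock time per side under a plain repeating control (40/120 repeating) with one that halves the time after the first control (40/120, then 40/60 repeating). The function name and the chosen game lengths are assumptions for illustration only; real savings depend on how long games actually last and how much of their time the engines use, so this is an upper bound and not a derivation of the 20% estimate.

```python
import math

def max_minutes_per_side(moves: int, first: int, later: int, per_control: int = 40) -> int:
    """Upper bound on one side's clock time for a game lasting `moves` moves."""
    controls = max(1, math.ceil(moves / per_control))
    return first + (controls - 1) * later

for moves in (40, 60, 100):
    full = max_minutes_per_side(moves, 120, 120)  # 40/120 repeating
    half = max_minutes_per_side(moves, 120, 60)   # 40/120, then 40/60 repeating
    print(f"{moves} moves: {full} vs {half} minutes per side")
```

Games that end within the first 40 moves cost the same under both schemes; the difference only appears in longer games.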
Parent - - By Heinz van Kempen (***) Date 2008-08-08 14:35 Edited 2008-08-08 15:23
Hi Larry :-),

yes, I thought about this, and I currently lean towards starting on Monday with a real 40/120 repeated time control, not adapted to weaker hardware as CEGT and CCRL usually do, in order to offer high quality at the cost of large numbers of games (including only the top six engines anyway).

The following matches on four quads are planned first:

Rybka 3 x64 4CPU vs. Zappa Mexico II x64 4CPU using my set of openings that I will soon offer for download
Rybka 3 x64 4CPU Human vs. Zappa Mexico II x64 4CPU, same as above

Rybka 3 x64 4CPU vs. Zappa Mexico II x64 4CPU with Jeroen Noomen book for Rybka and Zappa's own book
Rybka 3 x64 4CPU Human vs. Zappa Mexico II x64 4CPU with Jeroen Noomen book for Rybka and Zappa's own book

We tested some years ago that there is not much difference in rating performance between short generic books and own books, but such tests have to be repeated from time to time, and maybe Jeroen's book is really a killer. Moreover, I would like to help Jeroen possibly improve his own book even more, not relying only on server bullet and Blitz games.

This will give 12 games daily on 4 machines, three for each match. The focus is on offering replays of nice games and maybe discussing some of them here, instead of stats with more than 1000 games for each version. Those games will go into the 40/120 quad rating list. If own books turn out to make a difference, a small new list considering only those games will be extracted. Until then, all four variations will be given separately for comparison.

http://www.husvankempen.de/nunn/40_120_ratinglist/Quad/qratinglist.html

Additionally Wolfgang and I (maybe also Charles) will start tests for Rybka 3 2CPU for the dual core 40/120 rating list.

http://www.husvankempen.de/nunn/40_120_ratinglist/ratinglist/rangliste.html
Best Regards
Heinz
Parent - - By Kapaun (****) Date 2008-08-08 15:55
How about testing with and without persistent hash?
Parent - - By Heinz van Kempen (***) Date 2008-08-08 16:47
Hi Kapaun :-),

regarding persistent hash, I think it has to be either one or the other, because it is not possible to test and compare everything at these time controls. What do Vas and Larry recommend here?

I will probably also drop Rybka 3 Dynamic x64 4CPU anyway, although she is currently leading the Blitz match against Hiarcs 12 4CPU with a score of +30 =2 -0, which is quite scary and much better than default and Human against Hiarcs. I checked all matches: no losses on time.

Best Regards
Heinz
Parent - - By Vasik Rajlich (Silver) Date 2008-08-08 19:31
Hi Heinz,

that's cool that you will also run games with Jeroen's book. There are a lot of different issues which we are trying to understand, so from our point of view the more different tests you run, the better.

Re. persistent hash, there may be a bug with using this in game play. I'll report about this as soon as I can. In the meantime, it's probably best to play without it.

Vas
Parent - - By Heinz van Kempen (***) Date 2008-08-08 19:59
Hi Vas :-),

thanks for the reply regarding persistent hash. What do you recommend regarding EGTB use and settings? Is there anything else I should change in the options for long time controls?

Best Regards
Heinz
Parent - By Vasik Rajlich (Silver) Date 2008-08-09 16:39
Hi Heinz,

I can confirm now that persistent hash has a bug for game play. It's best to leave it disabled - although for analysis I would definitely suggest using it.

Re. EGTB - default should be fine, there is no big difference.

Vas
Parent - - By Venator (Silver) Date 2008-08-08 17:17
.... not relying only on server bullet and Blitz games.

You will see that the Rybka 3 book is a combination of GM and computer games and is not relying solely on computer blitz games :-).

Furthermore, I think there is a great misconception about these computer blitz games: a lot of them are played with very good books, in which a lot of time and analysis has been invested. Hence the opening play in these games is of very high quality. And.... the Rybka 3 book is thoroughly computer checked and tested.
Parent - - By Heinz van Kempen (***) Date 2008-08-08 17:25
Hi Jeroen :-),

sounds good. I am curious. Any adaptations to other top engine books (killer variations)? Does the Petroff novelty refer to the main lines with 5.d4 or to the latest experiments with 5.Nc3? Hopefully you also refuted Kramnik's Catalan. I am fed up with that one :-).

Best Regards
Heinz
Parent - - By Venator (Silver) Date 2008-08-08 17:34
Hi Heinz,

Of course it has been tested against a lot of other engine books and if those books play a weaker line, the chance is high this will be neatly refuted :-)

The Petroff novelty is in the 5.Nc3 line. I am afraid the Catalan can't be refuted, Rybka3.ctg is playing it with both colours!

Best regards, Jeroen
Parent - By Kappatoo (*****) Date 2008-08-08 19:10
I'm sure you already handed the refutation of the Catalan for good money to Anand.
Parent - - By Uri Blass (*****) Date 2008-08-08 18:58
The main problem with a repeating time control at half the time is comparability with other time controls.

If all the time controls are like that, it may save time, but if they are not, people may suspect that time management plays an important role.

If you use 40/120+40/60+40/60 and so on, then a program that often uses only 100 minutes for the first 40 moves may perform relatively better at this time control than at 40/20+40/20+40/20 because of better time management.

I am not against starting a new list with 40/20+40/10+40/10 and 40/4+40/2+40/2, but a new list also has disadvantages.
If you think one year ahead, then maybe a new list is the best thing to do, but a new list is going to have fewer games in the near future.

Uri
Parent - By lkaufman (*****) Date 2008-08-08 19:23
The same objection you make could also be made about combining results with sudden death time controls with the normal ones. Unless I misunderstand, CEGT has already been combining such sudden death games with repeating controls for the 40/120' time control. In my opinion this is far more likely to favor one program over another than just using a faster form of the same type of time control after the first one.
Parent - By Heinz van Kempen (***) Date 2008-08-08 19:49 Edited 2008-08-08 20:03
Hi Larry and Uri :-),

there are always pros and cons, and it is impossible to do things in a way that makes everybody happy.

My point of view is that hardware is advancing and programs are advancing, so every year or two something new can be started. At the beginning it might be possible to combine old and new data, accepting some inaccuracies, and later extract the matches played under the new conditions for a new list. Look at what SSDF is testing: very different hardware, all combined over many years. There is no perfect rating list with really exactly equal conditions; there are different hardware, benchmarks, different opening sets or books and so on. You can only achieve this by not combining results with other testers, as Klaus Wlotzka did for a long time, and by not upgrading the hardware, but then it is hard to collect enough games for many engines.

For me it is necessary to get some new motivation after all these years, and I currently get it by testing the Rybka 3 versions. It was a good idea to release three different ones. Thanks to Larry and Vas. After this I might find that, with the methods currently used by the testing groups, it is impossible to give good ratings when there is no tough competition and the rating gap to the next best engines is 200 ELO or more, and just sell my hardware or apply as a beta tester somewhere.

Best Regards
Heinz
Parent - - By Hamlet (**) Date 2008-08-07 21:19
Hi,

I have a question regarding this:
"ELO results with less than 500 games (error bars +-30) or even 1000 games (error bars +-18)"

How do you calculate these error bars?
Do you make a normality assumption? Is it based on simulations? What is the significance level (5% or 1%)?

regards,
hamlet
Parent - - By Heinz van Kempen (***) Date 2008-08-08 04:30 Edited 2008-08-08 05:02
Hi Hamlet :-),

the statisticians in the testing groups use programs like EloStat and BayesElo. After processing the PGN file with all the games to be included in a rating list, the program outputs several lists. In EloStat, for example, the main one is rating.dat, which gives only the ratings, the margins within which they may fluctuate, the score, the opponents' average strength, the share of draws and the number of games. Another one, programs.dat, gives the individual performances against each opponent along with the percentages for these performances, and there are two more lists. In the main output, rating.dat, you have entries such as:

Program                          Elo    +   -   Games   Score   Av.Op.  Draws

Rybka 3 x64 4CPU               : 3215   31  31   441    78.9 %   2986   25.9 %
Rybka 2.3.2a x64 4CPU          : 3083   13  12  1950    72.4 %   2915   37.8 %

You find the error bars underneath the plus and minus signs. Here, for example, they are +-31 after 441 games, and they shrink with more games, as in the example of Rybka 2.3.2a, to +-12 or 13 after almost 2000 games. With less than one hundred games the error bars are sky-high, so that you can predict practically nothing.
Frank Schubert, the author of the rating calculation program, said that with a probability of 95% the ratings will stay within the given bounds as many more games are added, so there is a probability of 5% that a rating ends up outside them. So the forecast is that the rating for Rybka 3 x64 4CPU should finally be between 3184 (31 below the current rating) and 3246 (31 ELO above the current rating).
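
As a rough illustration of where margins of this size can come from, the sketch below applies a plain normal approximation to the score and converts the interval to Elo with the logistic formula. This is an assumption made for illustration, not EloStat's actual algorithm, and it ignores the uncertainty in the opponents' ratings; the function names are hypothetical.

```python
import math

def elo(p: float) -> float:
    """Elo difference corresponding to an expected score p (logistic model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_error_bars(games: int, score: float, draws: float, z: float = 1.96):
    """Return (plus, minus) Elo margins from a normal approximation of the score.

    score and draws are fractions of the games played (e.g. 0.789 and 0.259).
    """
    wins = score - draws / 2.0
    # Per-game variance of the result, counting win=1, draw=0.5, loss=0.
    var = wins + 0.25 * draws - score ** 2
    half_width = z * math.sqrt(var / games)
    plus = elo(score + half_width) - elo(score)
    minus = elo(score) - elo(score - half_width)
    return plus, minus

# Rybka 3 x64 4CPU row from the table above: 441 games, 78.9 %, 25.9 % draws.
plus, minus = elo_error_bars(441, 0.789, 0.259)
print(f"+{plus:.0f} / -{minus:.0f}")
```

For the two rows shown above this approximation lands in the same range as the listed margins (roughly 30 Elo for 441 games and 12 to 13 Elo for 1950 games), which suggests the published bars behave like ordinary 95% intervals.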

But... I have used the program for many years, and this gives me the impression that the probabilities are not as exact as stated, especially under certain circumstances. With an engine as dominant as Rybka 3, which lacks real opponents, it might well be that even after many hundreds of games you will not see the expected differences between the single, dual core and quad versions or between 32-bit and 64-bit. For instance, Wolfgang Battig yesterday got the insane result of 45.5 to 4.5 for Rybka 3 x64 on two cores against Zappa Mexico II x64 2CPU. Adding this and the other extreme results so far for the 2CPU version would show it leading the Blitz rating list far ahead of any 4 CPU Rybka version. We call such things statistical anomalies, and they more or less disappear after more than 1000 games for each engine version.

Hope this was your question. English is not my mother tongue as you can easily detect.

Best Regards
Heinz
Parent - - By Hamlet (**) Date 2008-08-08 09:30
OK. Thanks. I have to study these programs, because I am interested in the formulas behind these error bars.
Also, I would like to point out that when we are talking about a 5% level (95% confidence), it means that about one out of 20 real results will be outside these bars.
So if you test 100 engines, the results for about 5 of them will be outside the error bars. People tend to forget that 5% happens quite often.
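
A small sketch of this arithmetic, assuming the engines' results are independent and every error bar is an exact 95% interval (both are simplifications); the function names are made up for illustration.

```python
def expected_outside(n_engines: int, alpha: float = 0.05) -> float:
    """Expected number of engines whose true rating lies outside its error bars."""
    return n_engines * alpha

def prob_at_least_one_outside(n_engines: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent intervals misses the true rating."""
    return 1.0 - (1.0 - alpha) ** n_engines

print(expected_outside(100))
print(round(prob_at_least_one_outside(20), 3))
```

Even with only 20 engines in a list, the chance that at least one of them sits outside its error bars is already well above one half under these assumptions.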

regards,
hamlet
Parent - By Uri Blass (*****) Date 2008-08-08 16:47
Note that a rating depends on the opponents, so in practice you cannot define the level of a program by a single number; things are more complicated.

If you define the exact opponents that you play against, then you can talk about a rating, but even in this case the 5% interval is not correct, because the actual opponents may be slightly different from the hypothetical opponents on which the rating is based.

This mistake may reduce the confidence, but other errors may increase it, because it is possible that the error bars are based on a bigger variance than the real variance of the results. So I am not sure whether more than 1 out of 20 or less than 1 out of 20 results are outside the error bars.

Uri
Parent - - By Uri Blass (*****) Date 2008-08-08 16:36
http://cegt.foren-city.de/topic,65,-testing-rybka-3-uci.html

Summary of the results (I did not calculate the performance where it is not given):

Rybka 3 x64 4CPU          : 3224  627 (+412,=169,- 46), 79.2 % by heinz
Zappa Mexico II x64 4CPU      : 200 (+130,= 60,- 10), 80.0 %
Deep Fritz 10.1 4CPU          : 200 (+143,= 41,- 16), 81.8 %
Hiarcs 12 4CPU                :  27 (+ 17,=  8,-  2), 77.8 %
Naum 3.1 x64 4CPU             : 200 (+122,= 60,- 18), 76.0 %

Gerhard:
Naum 3.1 x64 4CPU        [2998]  77.0-23.0 perf=3208 by Gerhard (total number of games 727, performance higher than 3220, Kaufman's prediction)

Rybka 3 Human x64 4CPU    : 3216  616 (+396,=174,- 46), 78.4 %

Zappa Mexico II x64 4CPU      : 200 (+123,= 58,- 19), 76.0 %
Deep Fritz 10.1 4CPU          : 200 (+143,= 38,- 19), 81.0 %
Hiarcs 12 4CPU                :  16 (+  8,=  7,-  1), 71.9 %
Naum 3.1 x64 4CPU             : 200 (+122,= 71,-  7), 78.8 %

4 Rybka 3 Dynamic x64 4CPU  : 3200  475 (+295,=133,- 47), 76.1 %

Zappa Mexico II x64 4CPU      : 200 (+113,= 67,- 20), 73.2 %
Deep Fritz 10.1 4CPU          :  65 (+ 46,= 12,-  7), 80.0 %
Hiarcs 12 4CPU                :  10 (+  9,=  1,-  0), 95.0 %
Naum 3.1 x64 4CPU             : 200 (+127,= 53,- 20), 76.8 %

5 Rybka 2.3.2a x64 4CPU     : 3084  1950 (+1043,=737,-170), 72.4 %

Naum 3.1 x64 4CPU             : 100 (+ 32,= 55,- 13), 59.5 %
Zappa Mexico II x64 4CPU      :  50 (+ 23,= 19,-  8), 65.0 %
Deep Fritz 10.1 4CPU          :  50 (+ 27,= 19,-  4), 73.0 %
Hiarcs 12 4CPU                :  50 (+ 20,= 24,-  6), 64.0 %

Heinz 2 cpu

1 Rybka 3 x64 2CPU          : 3231  336 (+238,= 81,- 17), 82.9 %

Zappa Mexico II x64 2CPU      : 136 (+ 86,= 42,-  8), 78.7 %
Naum 3.1 x64 2CPU             : 200 (+152,= 39,-  9), 85.8 %

Anthony 2 cpu

1 Rybka 3 Human x64 2CPU   42.0/50
2 Zappa Mexico II x64 2CPU   8.0/50

1 Rybka 3                  39.0/50
2 Zappa Mexico II x64  11.0/50

1 Rybka 3 x64 2CPU   41.0/50
2 Naum 3.1 x64 2CPU   9.0/50

Wolfgang results

Rybka 3 x64 2CPU     45.5/50 perf=3352
Zappa MxII x64 2CPU 4.5/50

Rybka 3 x64 2CPU    42.5/50 perf=3256
Naum 3.1 x64 2CPU   7.5/50
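
The percentages in the statistics lines above follow directly from the win/draw/loss counts. Here is a hedged sketch of a parser for lines in this layout; the regular expression and the function name `parse_line` are written for this post and are not part of EloStat.

```python
import re

# Matches e.g. "Rybka 3 x64 4CPU   : 3224  627 (+412,=169,- 46), 79.2 %"
# The four-digit rating before the game count is optional (opponent lines lack it).
LINE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*(?:(?P<elo>\d{4})\s+)?(?P<games>\d+)\s*"
    r"\(\+\s*(?P<w>\d+),=\s*(?P<d>\d+),-\s*(?P<l>\d+)\)"
)

def parse_line(line: str) -> dict:
    """Extract name, game count and score percentage from one statistics line."""
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    w, d, l = int(m["w"]), int(m["d"]), int(m["l"])
    games = int(m["games"])
    if w + d + l != games:
        raise ValueError("wins, draws and losses do not add up to the game count")
    return {
        "name": m["name"],
        "games": games,
        "score_pct": 100.0 * (w + 0.5 * d) / games,  # a draw counts as half a point
    }

row = parse_line("Rybka 3 x64 4CPU          : 3224  627 (+412,=169,- 46), 79.2 %")
print(row["name"], row["games"], round(row["score_pct"], 1))
```

Checking that the three counts add up to the game total is a cheap way to catch copy-and-paste errors when results from several testers are merged by hand.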
Parent - - By Heinz van Kempen (***) Date 2008-08-08 16:42
Hi Uri :-),

thanks for the summary.

Best Regards
Heinz
Parent - - By Uri Blass (*****) Date 2008-08-08 16:55
The summary was for the 64-bit results.
Here is a summary of the 32-bit single-CPU results:

Fritz 11.1       69.0-31.0                   perf=3056
Deep Sjeng 3.0 w32 1CPU  76.5-23.5  perf=3026

average 3041

Zappa MxII x64 1CPU: 36,0:14,0        perf. 3054
Naum 3.1 w32 1CPU: 40,5:9,5 /         perf. 3115

Fruit 2.4 Beta A w32 1CPU: 35,5:14,5  perf. 3011
Ktulu 8: 45,5:4,5 /                           perf. 3170

average 3087.5

total performance near 3064, while 2.3.2a has 2954
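
The "near 3064" figure can be reproduced as a game-weighted average of the six performances listed in this post (the first two matches had 100 games each, the other four 50 each). This is a minimal sketch with a hypothetical helper name; averaging performance ratings is only an approximation of what a rating program does when it pools all games.

```python
def weighted_performance(results: list) -> float:
    """Average performance ratings, weighting each by its number of games."""
    total_games = sum(games for _, games in results)
    return sum(perf * games for perf, games in results) / total_games

# (performance, games) for the Rybka 3 w32 1CPU matches summarised above.
results_32bit = [
    (3056, 100),  # Fritz 11.1
    (3026, 100),  # Deep Sjeng 3.0 w32 1CPU
    (3054, 50),   # Zappa MxII x64 1CPU
    (3115, 50),   # Naum 3.1 w32 1CPU
    (3011, 50),   # Fruit 2.4 Beta A w32 1CPU
    (3170, 50),   # Ktulu 8
]
print(weighted_performance(results_32bit))
```

That is about 110 points above the 2954 listed for Rybka 2.3.2a w32 1CPU.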
Parent - By lkaufman (*****) Date 2008-08-08 18:24
A gain of 110 over 2.3.2a as your calculations show sounds reasonable, although I predict a slight improvement with more data. Some of the estimated 140 Elo gain on quads is due to better scaling, but probably not as much as 30 Elo, especially since the results on 2 cores have been so good. By my calculations, the data you cite for 64 bit 2 core Rybka 3 works out to 3237.5 after 536 games, while the four core result is 3222 after 727 games (sorry for my 3220 pre-release estimate, I shouldn't have rounded down!). Obviously the two core result is too high and will surely drop, but it does suggest that the four core and 32 bit ratings might rise with more data. The "Human" result was even better on two cores, and not much behind on four, which suggests that it will be a while before we know the relative strength of the three versions; just as my testing showed, they are too close to call.
Parent - By Juergen Faas (**) Date 2008-08-08 16:53
Hm, Wolfgang´s results are really neat :)
Parent - By Venator (Silver) Date 2008-08-08 17:22
Hi Uri,

Thanks a lot! I was kind of losing track after today's results, but with your great summary this has been solved instantly :-)

Kind regards, Jeroen
Parent - - By Dragon Mist (****) Date 2008-08-07 16:47
Uri,

what is "Rybka 2.3.2aDynamic x64 4 CPU" (4th row)? You meant Rybka 3 ...?
Parent - By Uri Blass (*****) Date 2008-08-07 16:59
Yes

I meant Rybka3
Parent - - By Heinz van Kempen (***) Date 2008-08-07 13:45
Hi Jeroen and all :-),

just updated. Looks awful :-).

Best Regards
Heinz (CEGT)
Parent - By Venator (Silver) Date 2008-08-07 13:51
Looks awful

I think it looks great! ;-)
Parent - - By lkaufman (*****) Date 2008-08-07 17:01
I have a question about CEGT testing. Were there a substantial number of games played between programs on 4 CPUs against programs on 2 or 1 CPU? If there were not, the relative ratings of quad and non-quad programs could be out of line. I ask because my testing shows that Rybka 3 on 1 CPU does much better (in terms of performance rating) against other programs on 1 CPU than she does against other programs on quad, which suggests that your quad ratings in general are too close to the non-quad ratings. Or perhaps there is some other explanation?
Parent - By Heinz van Kempen (***) Date 2008-08-07 17:30 Edited 2008-08-07 17:34
Hi Larry :-),

yes, a lot of matches were played in the Blitz rating list for 4CPU against 2CPU and for 2CPU against 1CPU. You can check this by clicking the engine names in our rating lists. This way you get the individual performances for every engine. Here is an example with Rybka 2.3.2a x64 4CPU for 40/20:

http://www.husvankempen.de/nunn/40_40%20Rating%20List/40_40%20All%20Versions/1.html

and another one for 40/4

http://www.husvankempen.de/nunn/40_4_Ratinglist/40_4_AllVersion/1.html

Usually you will find more matches of weaker engines on 4 CPUs against stronger engines on 2 CPUs, because something like DJ 10 2CPU  vs. Rybka 2.3.2a x64 4CPU is very lopsided.

For the moment we still have very few games for 2CPU and 1CPU Rybka 3, as some of our testers are on holiday, and Rybka 3 2CPU in particular is without doubt currently overperforming, having played many games against Naum 3.1 on 2 CPUs. I will also start a match here tonight for the dual version against Zappa Mexico II x64 2CPU.

Best Regards
Heinz
Parent - - By Venator (Silver) Date 2008-08-08 01:13
Update (comment by CEGT):

x64 - 2CPU
vs. Zappa MxII x64 2CPU: 45,5:4,5 / +402 / perf. 3352 (!!)

I really did not trust my eyes, no single loss for Rybka in these 50 games. So I decided to add 50 games overnight with another PGN-Random-database; games 51-60 ended 9:1 for Rybka, but Zappa really managed to win a game in game 55 (!!).

w32 - 1CPU
vs. Fruit 2.4 Beta A w32 1CPU: 35,5:14,5 / +156 / perf. 3011
vs. Ktulu 8: 45,5:4,5 / +402 / perf. 3170 (!!)
Parent - By lkaufman (*****) Date 2008-08-08 02:25
The Zappa result is really nice, as in my testing Zappa MexII was always our toughest opponent!
Parent - - By Roland Rösler (****) Date 2008-08-08 02:52
I really did not trust my eyes, no single loss for Rybka in these 50 games

He Jeroen, you like it really?

Once upon a time, we see the best two bookmakers of the world in Mexico! They have the smartest baby of the world and kissed her all the time. She was called Rybka (Rybusia?). But when they show the baby to the Mexican crowd, they only cried: We want Zappa. He is much stronger and looks much better than this female baby. And the end of this myth; the baby lost and one of the bookmakers was lost too. Ten months later on neutral territory, the female baby was stronger than ever. Nobody helped her really, but she punished this Frankenstein named Zappa. And she was glad and pleased. But in still moments she thinks, Mexico was my best time. At least two people loved me really; Jeroen and Dagh! Where is Daghhhhhhhhhhhhhhh?
Parent - By Uly (Gold) Date 2008-08-08 02:57
Oh, I really expected Rybka 3 to drag Dagh back to computer chess, but so far no Dagh in sight :(
Parent - By Roland Rösler (****) Date 2008-08-08 03:00
Nobody helped her really? This isn't the whole truth. There were some strangers, called Iweta, Vas and Larry. But this is another story! :-)
Parent - - By Venator (Silver) Date 2008-08-08 07:15
He Jeroen, you like it really?

The comment was from CEGT, as I wrote between brackets, not by me.

Where is Daghhhhhhhhhhhhhhh?

We would all like to know :-).

Once upon a time, we see the best two bookmakers of the world in Mexico!

Okay, Dagh and Erdo. But what about me!?
Parent - By Banned for Life (Gold) Date 2008-08-08 20:21
What about you? Why weren't you in Mexico? I must assume you are not fond of tequila or Latino women...
Parent - - By Venator (Silver) Date 2008-08-08 07:26
Update 2 (all 40/4, like above):

Rybka 3 UCI w32 1CPU
Deep Sjeng 3.0 w32 1CPU  [2821]  76.5-23.5  perf=3026

Rybka 3 UCI x64 4CPU
Naum 3.1 x64 4CPU        [2998]  77.0-23.0  perf=3208

First intermediate result posted by Heinz this morning:

1 Rybka 3 x64 2CPU          : 3247  260 (+188,= 61,- 11), 84.0 %

Zappa Mexico II x64 2CPU      :  66 (+ 40,= 23,-  3), 78.0 %
Naum 3.1 x64 2CPU             : 194 (+148,= 38,-  8), 86.1 %

2 Rybka 3 Human x64 4CPU    : 3220  534 (+348,=147,- 39), 78.9 %

Zappa Mexico II x64 4CPU      : 147 (+ 92,= 40,- 15), 76.2 %
Deep Fritz 10.1 4CPU          : 187 (+134,= 36,- 17), 81.3 %
Naum 3.1 x64 4CPU             : 200 (+122,= 71,-  7), 78.8 %

3 Rybka 3 x64 4CPU          : 3218  550 (+359,=149,- 42), 78.8 %

Zappa Mexico II x64 4CPU      : 150 (+ 94,= 48,-  8), 78.7 %
Deep Fritz 10.1 4CPU          : 200 (+143,= 41,- 16), 81.8 %
Naum 3.1 x64 4CPU             : 200 (+122,= 60,- 18), 76.0 %

4 Rybka 3 Dynamic x64 4CPU : 3180  338 (+199,=100,- 39), 73.7 %

Zappa Mexico II x64 4CPU      : 138 (+ 72,= 47,- 19), 69.2 %
Naum 3.1 x64 4CPU             : 200 (+127,= 53,- 20), 76.8 %

5 Rybka 2.3.2a x64 4CPU     : 3084  1950 (+1043,=737,-170), 72.4 %
Parent - - By Heinz van Kempen (***) Date 2008-08-08 15:25 Edited 2008-08-08 15:32
Hi all :-),

a quick update:

My tests for all Rybka 3 4CPU versions against Naum 3.1 x64 4CPU and Zappa Mexico II x64 4CPU are finished. Those against Deep Fritz 10.1 4CPU will be finished soon. The match against Hiarcs 12 default 4CPU (with its own book deactivated) has started, and DS 11 x64 4CPU will follow.

[code]Individual statistics:

1 Rybka 3 x64 2CPU          : 3231  336 (+238,= 81,- 17), 82.9 %

Zappa Mexico II x64 2CPU      : 136 (+ 86,= 42,-  8), 78.7 %
Naum 3.1 x64 2CPU             : 200 (+152,= 39,-  9), 85.8 %

2 Rybka 3 x64 4CPU          : 3224  627 (+412,=169,- 46), 79.2 %

Zappa Mexico II x64 4CPU      : 200 (+130,= 60,- 10), 80.0 %
Deep Fritz 10.1 4CPU          : 200 (+143,= 41,- 16), 81.8 %
Hiarcs 12 4CPU                :  27 (+ 17,=  8,-  2), 77.8 %
Naum 3.1 x64 4CPU             : 200 (+122,= 60,- 18), 76.0 %

3 Rybka 3 Human x64 4CPU    : 3216  616 (+396,=174,- 46), 78.4 %

Zappa Mexico II x64 4CPU      : 200 (+123,= 58,- 19), 76.0 %
Deep Fritz 10.1 4CPU          : 200 (+143,= 38,- 19), 81.0 %
Hiarcs 12 4CPU                :  16 (+  8,=  7,-  1), 71.9 %
Naum 3.1 x64 4CPU             : 200 (+122,= 71,-  7), 78.8 %

4 Rybka 3 Dynamic x64 4CPU  : 3200  475 (+295,=133,- 47), 76.1 %

Zappa Mexico II x64 4CPU      : 200 (+113,= 67,- 20), 73.2 %
Deep Fritz 10.1 4CPU          :  65 (+ 46,= 12,-  7), 80.0 %
Hiarcs 12 4CPU                :  10 (+  9,=  1,-  0), 95.0 %
Naum 3.1 x64 4CPU             : 200 (+127,= 53,- 20), 76.8 %

5 Rybka 2.3.2a x64 4CPU     : 3084  1950 (+1043,=737,-170), 72.4 %

Naum 3.1 x64 4CPU             : 100 (+ 32,= 55,- 13), 59.5 %
Zappa Mexico II x64 4CPU      :  50 (+ 23,= 19,-  8), 65.0 %
Deep Fritz 10.1 4CPU          :  50 (+ 27,= 19,-  4), 73.0 %
Hiarcs 12 4CPU                :  50 (+ 20,= 24,-  6), 64.0 %[/code]

Please note that these unofficial overviews do not contain the matches from Gerhard, Wolfgang, Anthony or others in the CEGT Forum thread, and that the rating list might look very different at this point.

Anyway, I fear that we will get a statistical anomaly for Rybka 3 x64 2CPU and will not be able to play enough games here before the rating list on August 10th.

Best Regards
Heinz
Parent - - By Venator (Silver) Date 2008-08-08 16:17
Hi Heinz,

Thanks for the update!

Jeroen
Parent - - By Heinz van Kempen (***) Date 2008-08-09 12:53 Edited 2008-08-09 13:01
Hi all :-),

a quick update copied from CEGT Forum:

Hi Gerhard  :) ,

wait and see  :) . After adding games with Hiarcs 12 4CPU and the first ones with DS 11 x64 4CPU, ratings are shooting up. But maybe you will have to lower the StartELO (currently 2711) by 7 to 10 points, because Rybka 2.3.2a x64 4CPU also gained 7 ELO since I started testing without any games being added for it, which is a normal effect when many games for strong engines are added to the database used for the calculation. But what will count are the differences between the Rybka 3 and Rybka 2.3.2a versions, and those are currently shocking. I am curious to see what happens tomorrow when you add your games and those by others.

I guess I will only run until tomorrow morning to complete 1000 games per 4CPU engine version, so that I can also add some more for Rybka 3 x64 2CPU.

[code]Individual statistics:

1 Rybka 3 x64 4CPU          : 3258  839 (+586,=199,- 54), 81.7 %

Deep Shredder 11 x64 4CPU     :  39 (+ 29,=  9,-  1), 85.9 %
Zappa Mexico II x64 4CPU      : 200 (+130,= 60,- 10), 80.0 %
Deep Fritz 10.1 4CPU          : 200 (+143,= 41,- 16), 81.8 %
Hiarcs 12 4CPU                : 200 (+162,= 29,-  9), 88.2 %
Naum 3.1 x64 4CPU             : 200 (+122,= 60,- 18), 76.0 %

2 Rybka 3 Dynamic x64 4CPU  : 3237  830 (+561,=203,- 66), 79.8 %

Deep Shredder 11 x64 4CPU     :  30 (+ 22,=  6,-  2), 83.3 %
Zappa Mexico II x64 4CPU      : 200 (+113,= 67,- 20), 73.2 %
Deep Fritz 10.1 4CPU          : 200 (+134,= 52,- 14), 80.0 %
Hiarcs 12 4CPU                : 200 (+165,= 25,- 10), 88.8 %
Naum 3.1 x64 4CPU             : 200 (+127,= 53,- 20), 76.8 %

3 Rybka 3 Human x64 4CPU    : 3223  803 (+519,=222,- 62), 78.5 %

Deep Shredder 11 x64 4CPU     :  29 (+ 18,=  8,-  3), 75.9 %
Zappa Mexico II x64 4CPU      : 200 (+123,= 58,- 19), 76.0 %
Deep Fritz 10.1 4CPU          : 200 (+143,= 38,- 19), 81.0 %
Hiarcs 12 4CPU                : 174 (+113,= 47,- 14), 78.4 %
Naum 3.1 x64 4CPU             : 200 (+122,= 71,-  7), 78.8 %

4 Rybka 3 x64 2CPU          : 3200  583 (+402,=138,- 43), 80.8 %

Deep Shredder 11 x64 2CPU     :  26 (+ 19,=  4,-  3), 80.8 %
Zappa Mexico II x64 2CPU      : 213 (+137,= 60,- 16), 78.4 %
Hiarcs 12 2CPU                : 108 (+ 71,= 26,- 11), 77.8 %
Naum 3.1 x64 2CPU             : 200 (+152,= 39,-  9), 85.8 %
Deep Fritz 10.1 2CPU          :  36 (+ 23,=  9,-  4), 76.4 %

5 Rybka 2.3.2a x64 4CPU     : 3088  1950 (+1043,=737,-170), 72.4 %

Deep Shredder 11 x64 4CPU     : 100 (+ 39,= 52,-  9), 65.0 %
Zappa Mexico II x64 4CPU      :  50 (+ 23,= 19,-  8), 65.0 %
Deep Fritz 10.1 4CPU          :  50 (+ 27,= 19,-  4), 73.0 %
Hiarcs 12 4CPU                :  50 (+ 20,= 24,-  6), 64.0 %
Naum 3.1 x64 4CPU             : 100 (+ 32,= 55,- 13), 59.5 %

6 Rybka 2.3.2a x64 2CPU     : 3056  2694 (+1426,=1000,-268), 71.5 %

Deep Shredder 11 x64 2CPU     : 190 (+ 94,= 72,- 24), 68.4 %
Zappa Mexico II x64 2CPU      : 190 (+ 75,= 85,- 30), 61.8 %
Hiarcs 12 2CPU                : 150 (+ 75,= 56,- 19), 68.7 %
Naum 3.1 x64 2CPU             : 100 (+ 39,= 50,- 11), 64.0 %[/code]

Best Regards
Heinz