Parameters Experiment 38: 64 Elo over R4 default
- - By mindbreaker (****) Date 2010-09-02 05:44
My latest batch of experiments is a little over 400 games in per engine. One is showing very good results: it has won 3 of 4 gauntlets and came in a close second in the fourth.

They only play against non-R4s, just as ratings are determined in most ratings charts.

I am getting 3357 Elo based on 406 games, whereas Rybka 4 is at 3293 Elo based on 1986 games.

It is of course very preliminary; I expect to run the engines 900-1200 games each.

So far I have not seen any indication that I am running out of room to improve the parameters.

The current batch of 9 has completed 100-game matches with Stockfish 1.8, Deep Fritz 11, Critter 0.80, and an unnameable opponent.  The current opponent is Zappa Mexico II, in a gauntlet which has just gotten under way.

I plan to give an update at 800 games in.  And at the end at 1200 games per engine or less if I end it sooner, I plan to post the parameters and final results.
Parent - - By Uly (Gold) Date 2010-09-02 07:28
Looking forward to it, you could be finding new Rybka 4.1 defaults!
Parent - - By mindbreaker (****) Date 2010-09-02 20:43
I don't think it will become the default, though it might be good as an included personality.  I suspect R4 was tuned against other variants of itself and is probably optimized to get the best results against other variants of R4.  If that is the case, my variant could easily lose to it head to head.

What it should be good at is tournaments where there will only be one Rybka 4, or ratings charts.

I may try for a match version as well that will be very good at not losing, but at the cost of more draws.  But I suspect 4.1 will arrive before I get to it.
Parent - - By Uly (Gold) Date 2010-09-02 21:14

> But I suspect 4.1 will arrive before I get to it.


Or not, Vas is still nowhere to be found (eh, virtually-virtually, in the real-virtual world I could send him an email and he'd answer, but I don't want to bug him).
Parent - - By oudheusa (*****) Date 2010-09-05 06:17
You don't want to bug him. :razz:
Parent - - By Uly (Gold) Date 2010-09-05 19:40
Er, I don't want to bother him.
Parent - By keoki010 (Silver) Date 2010-09-05 20:15
Bug him would be better. :yell::lol:
Parent - By Akbarfan (***) Date 2010-09-02 22:20

> But I suspect 4.1 will arrive before I get to it.


You believe everything?

I believe it only when I see it.
Parent - By Geomusic (*****) Date 2010-09-21 11:36
Yes, I have noticed this myself in (C) tournaments where you have multiple personalities that all score slightly better or worse than default, and then one particular personality does well against those variants but worse against default itself head to head. Very annoying... and back to the drawing board :)
Parent - - By mdt (**) Date 2011-01-06 22:24
Probably I'm wrong (because I think that, deep inside myself, I simply can't believe you get 60+ Elo over default), but I have the feeling you tested too many variants...
Let's suppose you want to test all the engines in this world (obviously I'll exaggerate) by doing a, let's say, 32-game match against Rybka.
After testing thousands of engines, I wouldn't be surprised to see somewhere in the huge table something like   Booot 4.15 - Rybka 4    19.0 - 13.0  :roll:
How wrong am I? Could you test the best variant more?
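
The concern about many candidates playing short matches can be put into rough numbers. The sketch below assumes every game is decisive (no draws) and that games are independent, neither of which holds exactly for engine matches; the function names and the figure of 1000 candidates are illustrative, not taken from the thread.

```python
from math import comb


def prob_at_least(wins: int, games: int, p_win: float = 0.5) -> float:
    """Chance of scoring at least `wins` in `games` decisive, independent games."""
    return sum(
        comb(games, k) * p_win**k * (1 - p_win) ** (games - k)
        for k in range(wins, games + 1)
    )


def prob_any_candidate(n_candidates: int, p_single: float) -> float:
    """Chance that at least one of n independent candidates reaches that score."""
    return 1 - (1 - p_single) ** n_candidates


# An engine exactly equal to its opponent scoring 19-13 or better in 32 games.
p_equal = prob_at_least(19, 32)
print(f"equal engine, 19+/32: {p_equal:.3f}")

# Hypothetical: 1000 equal candidates each play one 32-game match.
print(f"at least one of 1000 does so: {prob_any_candidate(1000, p_equal):.3f}")

# A clearly weaker engine (assumed 30% win chance per game) doing the same.
print(f"weaker engine, 19+/32: {prob_at_least(19, 32, p_win=0.3):.5f}")
```

The per-match probability falls quickly as the number of games grows, so the same reasoning is much less damaging for matches of several hundred games than for 32-game matches.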
Parent - By mindbreaker (****) Date 2011-01-07 05:51
You need to go to the last page (Tournament of the Champions).  Indeed, the best of the variants appear closer to 30 Elo stronger rather than 60.  It however is ridiculous to compare my tests to 32-game matches with one opponent.  Most of the variants play over 800 games total against 10 or more opponents.  As for how the tests pointed at 60+ Elo gains...that has not been resolved but probably has something to do with the rating formula functions or Rybka 4 Default just having very bad luck initially.
Parent - - By Bouddha (****) Date 2010-09-02 09:29
Looking forward to the results & parameters.

Up to now I have not seen any real improvement from changing the parameters.

regards
Parent - - By Gaмßito (****) Date 2010-09-02 14:27 Edited 2010-09-02 14:31

> Up to now I have not seen any real improvement from changing the parameters.


Me neither.

You also can see here:

http://www.computerchess.org.uk/ccrl/4040/rating_list_all.html that the claim that Rybka 4 64-bit TC3100150 4CPU was stronger than default was just an illusion.

Regards,
Gaмßito.
Parent - - By mindbreaker (****) Date 2010-09-02 20:25
Well, I would not characterize it as an illusion exactly.  It is just that it only outperforms at short time controls: http://www.computerchess.org.uk/ccrl/404/rating_list_all.html

I found those timings effective as well, but I have not tried to refine them, though they are included.  All my efforts have been directed at piece valuation beyond the initial investigations of parameters given in the forums.  I am hoping by doing so, that the improvements are general.  Still it is conceivable that the parameters I am working on are only or most effective at faster time controls.

Ah just remembered, I did investigate the rook endgame scaling thing.  That got me nowhere. 100 is the optimal, any deviation is destructive according to my findings.
Parent - - By titanium cranium (***) Date 2010-09-04 18:31
"It is just that it only out performs at short time controls:"

LOL, so at my 30 days/move controls, the defaults are better.
Parent - By mindbreaker (****) Date 2010-09-05 04:02
I guess you just need 1000 computers and a couple of years to find out.
Parent - By Wayne Lowrance (***) Date 2010-09-02 18:07
Sounds interesting. I will stay tuned
Wayne
Parent - - By mindbreaker (****) Date 2010-09-08 10:19
As promised, here is the update after 800 games each.  I plan at least 300 more per engine.  The current competitors are Exp 30-Exp 38.

I suppose it is not that surprising that Exp 38 could not continue that performance, however it did manage a narrow overall victory at this stage.
For a second I thought it had surrendered its first place; the Fritz software actually mis-sorted! That's a new one.

Each generation/cycle has seen a ratings increase over the previous generation.  It looks like this time it will not be a big jump; 24 was the leader of the last generation, 18 before that, and 8 and Forum before that.  The current gain over the default settings is 44 Elo.  But as I have already stated, these settings are not configured for direct head-to-head matches with other R4 configurations, only for other top engines in, say, the first 10-15 places on most ratings lists.

Here is the new list:

Rank    Engine    Rating (Elo)    Games
1    Rybka 4 x64 Exp 38 v1  3337  806
2    Rybka 4 x64 Exp 31 v1  3336  810
3    Rybka 4 x64 Exp 24 v2  3332  1331
4    Rybka 4 x64 Exp 37 v1  3331  807
5    Rybka 4 x64 Exp 30 v1  3330  810
6    Rybka 4 x64 Exp 26 v2  3329  1329
7    Rybka 4 x64 Exp 34 v1  3329  810
8    Rybka 4 x64 Exp 28 v1  3325  1327
9    Rybka 4 x64 Exp 25 v2  3324  1330
10    Rybka 4 x64 Exp 18 v1  3321  3039
11    Rybka 4 x64 Exp 11 v1  3320  453
12    Rybka 4 x64 Exp 27 v1  3319  1328
13    Rybka 4 x64 Exp 32 v1  3318  810
14    Rybka 4 x64 Exp 36 v1  3318  807
15    Rybka 4 x64 Exp 33 v1  3318  810
16    Rybka 4 x64 Exp 22 v1  3317  2231
17    Rybka 4 x64 Exp 35 v1  3317  809
18    Rybka 4 x64 Exp 29 v1  3312  1327
19    Rybka 4 x64 Exp 23 v1  3311  2178
20    Rybka 4 x64 Exp 8 v1  3309  1310
21    Rybka 4 x64 Exp 9 v1  3306  429
22    Rybka 4 x64 Exp 14 Human  3304  898
23    Rybka 4 x64 Exp 7 v1  3301  815
24    Rybka 4 x64 Exp 4 v1  3298  898
25    Deep Rybka 4 x64 v1  3298  161
26    Rybka 4 x64 Forum v1  3296  1159
27    Rybka 4 x64 Exp 1 v1  3296  916
28    Rybka 4 x64 Exp 20 v1  3297  590
29    Rybka 4 x64 Exp 15 v1  3296  1799
30    Rybka 4 x64 Exp 3 v2  3296  422
31    Rybka 4 x64 Exp 16 v1  3294  555
32    Deep Rybka 4 x64  3293  1986
33    Rybka 4 x64 Exp 21 v1  3293  900
34    Rybka 4 x64 Exp 10 v1  3289  459
35    Rybka 4 x64 Exp 19 v1  3289  189
36    Deep Rybka 4 x64 Lasker  3285  477
37    Rybka 4 x64 Exp 13 vC13510  3283  749
38    Rybka 4 x64 Exp 2 v1  3282  471
39    Rybka 4 x64 Beta 15 v1  3282  426
40    Rybka 4 x64 Exp 12 v1  3282  896
41    Rybka 3  3279  1897
42    Rybka 3 Dynamic  3279  892
43    Rybka 3 Human  3277  1461
44    Rybka 4 x64 Exp 17 v1  3270  354
45    Rybka 4 x64 Exp 6 v1  3257  121
46    Deep Rybka 4 x64 Human  3255  174
47    Stockfish 1.8 JA 64bit  3245  3118
48    Stockfish 1.7.1 JA 64bit 4t  3233  4225
49    Stockfish 1.7 JA 64bit 4t  3222  560
50    Stockfish 1.6.3 JA 64bit  3205  384
51    Critter 0.80 64-bit  3172  1800
52    Deep Fritz 11  3115  4772
53    Naum 4 4t  3112  3414
54    HIARCS 12 MP  3105  3283
55    Deep Shredder 12  3101  1824
56    spark-0.4  3093  1802
57    Critter 0.70 64-bit  3075  2324
58    Komodo64 1.1 JA  3075  2333
59    Zappa Mexico II  3068  2519
60    Protector 1.3.3 x64 4t  3067  385
61    Deep Shredder 11 UCI  3061  2410
62    Komodo64 1.0 JA  3051  407
63    bright-0.5c  3032  2350
64    bright-0.4a  3014  21
65    Protector 1.3.6 x64 4t  2968  900
66    Thinker54AInert-MP64-UCI  2959  900
67    spark-0.3a  2933  23
68    Deep Fritz 12  2651  385

Sorry about the alignment; it just does that when I copy and paste from Excel.
Parent - - By Uly (Gold) Date 2010-09-08 13:54
Thanks for the report, what are the settings of Exp 38?
Parent - By mindbreaker (****) Date 2010-09-08 15:04
"And at the end at 1200 games per engine or less if I end it sooner, I plan to post the parameters and final results." ;)
Parent - - By Banned for Life (Gold) Date 2010-09-08 14:26
It would be interesting to carefully look at the statistics to see if any of these are really better than the default (in a statistically significant sense), or whether as Dagh suggested, this is just a matter of having a large number of similar strength settings, with some unsurprisingly faring better than others due to statistical noise.
Parent - - By Uly (Gold) Date 2010-09-08 14:54
There's still the question of why new experimental parameters are consistently doing better than old ones (i.e. he always finds a new one that performs better than all the old ones).
Parent - By Banned for Life (Gold) Date 2010-09-08 15:47
First, I'm not sure if he is changing time allocation values. As a person who is only interested in analysis strength, I have zero interest in this aspect of tuning. Second, it's very easy, and natural, to come up with great results using a mixture of a large number of candidates and survival bias (just get rid of the laggards after 1000 games or so and you will end up with miraculous results, even if all the engines are identical in every way).

Anyway, as I stated below, I'm very willing to be proven wrong (but also rather skeptical).
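
The survival-bias effect described here can be illustrated with a small simulation. This is only a sketch under stated assumptions: 40 hypothetical variants that are all exactly as strong as their opposition, 1000 independent games each, and an assumed 40% draw rate. None of these figures come from the actual test setup.

```python
import random
from math import log10


def score_to_elo(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return -400 * log10(1 / score - 1)


def simulate_identical_variants(n_variants=40, games=1000, draw_rate=0.4, seed=1):
    """Apparent Elo of each variant when every variant has a true edge of zero."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_variants):
        points = 0.0
        for _ in range(games):
            r = rng.random()
            if r < draw_rate:
                points += 0.5  # draw
            elif r < draw_rate + (1 - draw_rate) / 2:
                points += 1.0  # win
            # otherwise a loss, worth 0 points
        results.append(score_to_elo(points / games))
    return results


elos = simulate_identical_variants()
print(f"best apparent Elo:  {max(elos):+.1f}")
print(f"worst apparent Elo: {min(elos):+.1f}")
```

Picking only the top few of these identical variants would show an apparent gain that disappears on retesting. Whether this accounts for the results in the thread depends on how each new generation of candidates was chosen, which the sketch does not model.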
Parent - - By mindbreaker (****) Date 2010-09-08 14:57
If it were statistical noise, then each cycle would not perform better than the one before as a group. If you will notice, the latest 9 engines finished no lower than 17th place, and even the lowest one is 24 Elo higher than the default settings. If you took the group to be the same engine, then that engine would have played 7269 games and earned an Elo of 3326: hardly statistical noise, and an improvement of 33 Elo.
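
The pooled figure can be approximated from the 800-game table with a games-weighted mean of the nine ratings. This is a sketch only: a weighted mean of ratings is not the same as re-rating the pooled games, and the variable names are illustrative.

```python
# (experiment number, rating, games) for Exp 30-38, copied from the 800-game table.
latest = [
    (30, 3330, 810), (31, 3336, 810), (32, 3318, 810),
    (33, 3318, 810), (34, 3329, 810), (35, 3317, 809),
    (36, 3318, 807), (37, 3331, 807), (38, 3337, 806),
]
DEFAULT_RATING = 3293  # Deep Rybka 4 x64 in the same table

total_games = sum(games for _, _, games in latest)
pooled = sum(rating * games for _, rating, games in latest) / total_games
lowest = min(rating for _, rating, _ in latest)

print("total games:", total_games)
print("pooled rating:", round(pooled))
print("gain over default:", round(pooled) - DEFAULT_RATING)
print("lowest variant vs default:", lowest - DEFAULT_RATING)
```

With the table's figures the games column sums to 7279, slightly different from the 7269 quoted in the post, while the weighted mean still rounds to 3326. The calculation says nothing about whether the gain is statistically significant; it only reproduces the arithmetic.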
Parent - - By Banned for Life (Gold) Date 2010-09-08 15:43
Actually, what I should have said is statistical noise combined with survivor bias. I should further point out that my comments apply only to non-time allocation changes. Everyone agrees that the time allocation for engine games in R4 is God awful, but this has zero relevance to the large majority of people using the engine for analysis purposes rather than engine-engine games.

I'll admit that I'm skeptical that you have found changes in piece values that measurably improve strength, but I would be happy to be proven wrong.  Perhaps after you release the settings, the piece value settings can be tested against a gamut of other engines by a third-party tester and the results can be compared to the default piece value settings.
Parent - - By Uly (Gold) Date 2010-09-08 16:57
I predict that mindbreaker's improvements will hold against non-Rybka engines.
Parent - - By Banned for Life (Gold) Date 2010-09-08 22:11
I hope that there are improvements due to non-time-management parameter changes. But we shall see...
Parent - - By mindbreaker (****) Date 2010-09-09 04:45
In the first generation I tried a few timings that were given on the site but pretty much since then (second to fifth generation each with roughly 9 engines per generation) I have not messed with the timings.  The last 38 engines have been 3 100 150 except a handful of engines which were default timings instead.  As timings have been essentially constant, the piece values should be what is stratifying their performances.
Parent - By Uly (Gold) Date 2010-09-09 06:15
Then default settings with only 3 100 150 changed should appear as reference (so timings can be ignored).
Parent - - By mindbreaker (****) Date 2010-09-12 03:25
It is just statistical noise but check out what Exp 30 is doing to Komodo ;)
Parent - - By mindbreaker (****) Date 2010-09-12 03:42 Edited 2010-09-12 04:38
Here is the current ratings table (without the vs Komodo round as it is not complete).  I will probably run one more cannon fodder engine after that.  As you can see, we have a new leader!

1    Rybka 4 x64 Exp 31 v1    3346    1053
2    Rybka 4 x64 Exp 38 v1    3341    1050
3    Rybka 4 x64 Exp 30 v1    3339    1053
4    Rybka 4 x64 Exp 37 v1    3338    1051
5    Rybka 4 x64 Exp 24 v2    3335    1331
6    Rybka 4 x64 Exp 34 v1    3333    1053
7    Rybka 4 x64 Exp 36 v1    3333    1051
8    Rybka 4 x64 Exp 26 v2    3332    1329
9    Rybka 4 x64 Exp 32 v1    3330    1053
10    Rybka 4 x64 Exp 28 v1    3329    1327
11    Rybka 4 x64 Exp 25 v2    3328    1330
12    Rybka 4 x64 Exp 33 v1    3327    1053
13    Rybka 4 x64 Exp 11 v1    3326    454
14    Rybka 4 x64 Exp 35 v1    3324    1052
15    Rybka 4 x64 Exp 18 v1    3323    3039
16    Rybka 4 x64 Exp 27 v1    3322    1328
17    Rybka 4 x64 Exp 22 v1    3319    2231
18    Rybka 4 x64 Exp 29 v1    3316    1327
19    Rybka 4 x64 Exp 23 v1    3314    2178
20    Rybka 4 x64 Exp 8 v1    3311    1315
21    Rybka 4 x64 Exp 9 v1    3312    434
22    Rybka 4 x64 Exp 14 Human    3305    900
23    Rybka 4 x64 Exp 7 v1    3303    815
24    Rybka 4 x64 Forum v1    3299    1160
25    Rybka 4 x64 Exp 4 v1    3299    900
26    Rybka 4 x64 Exp 20 v1    3298    590
27    Deep Rybka 4 x64 v1    3299    161
28    Rybka 4 x64 Exp 1 v1    3297    916
29    Rybka 4 x64 Exp 15 v1    3297    1800
30    Rybka 4 x64 Exp 3 v2    3297    423
31    Rybka 4 x64 Exp 21 v1    3294    900
32    Rybka 4 x64 Exp 16 v1    3294    557
33    Rybka 4 x64 Exp 10 v1    3295    462
34    Deep Rybka 4 x64    3294    1986
35    Rybka 4 x64 Exp 19 v1    3290    189
36    Deep Rybka 4 x64 Lasker    3286    479
37    Rybka 4 x64 Exp 13 vC13510    3284    750
38    Rybka 4 x64 Exp 2 v1    3283    472
39    Rybka 4 x64 Beta 15 v1    3283    426
40    Rybka 4 x64 Exp 12 v1    3282    900
41    Rybka 3    3279    1897
42    Rybka 3 Dynamic    3279    892
43    Rybka 3 Human    3276    1461
44    Rybka 4 x64 Exp 17 v1    3270    355
45    Rybka 4 x64 Exp 6 v1    3258    121
46    Deep Rybka 4 x64 Human    3256    174
47    Stockfish 1.8 JA 64bit    3251    3131
48    Stockfish 1.7.1 JA 64bit 4t    3234    4235
49    ******    3221    24
50    Stockfish 1.7 JA 64bit 4t    3220    560
51    Stockfish 1.6.3 JA 64bit    3202    384
52    Critter 0.80 64-bit    3178    1800
53    Deep Fritz 11    3118    4774
54    HIARCS 12 MP    3115    4183
55    Deep Shredder 12    3107    1824
56    Naum 4 4t    3103    4311
57    spark-0.4    3099    1802
58    Komodo64 1.1 JA    3077    2338
59    Critter 0.70 64-bit    3077    2326
60    Zappa Mexico II    3072    2519
61    Deep Shredder 11 UCI    3062    2410
62    Komodo64 1.0 JA    3049    407
63    bright-0.5c    3033    2350
64    Protector 1.3.3 x64 4t    3027    782
65    bright-0.4a    3016    21
66    Protector 1.3.6 x64 4t    2972    900
67    Thinker54AInert-MP64-UCI    2962    900
68    spark-0.3a    2931    23
69    Deep Fritz 12    2654    385

Oh, ignore the v1/v2 stuff it is meaningless.
Parent - - By mindbreaker (****) Date 2010-09-12 07:38
Sorry, I guess one clone slipped through.
Parent - By TheNightFlier (*) Date 2010-09-12 09:08
That is to say?
Parent - - By jpqy (**) Date 2010-09-12 10:29
So after the Komodo test you will again have a new leader, Exp 30, because at the moment Exp 38 & 31 are getting the lowest results. Right?

JP.
Parent - - By mindbreaker (****) Date 2010-09-12 19:39
We can't jump to conclusions.  Here is the Komodo match now:
Parent - - By mindbreaker (****) Date 2010-09-12 19:55
Even with about 85 rounds complete we really don't know which is best against Komodo.  We would need maybe 3000 rounds or more for that.  I am just getting games from several opponents which when collected together tells me the top handful of engines.
Parent - - By Banned for Life (Gold) Date 2010-09-12 20:12
For Elo testing purposes, engines that are nearly equal give the most valuable rating information. Spending a large amount of time on matches against Komodo where Rybka's win percentage is ~90%, is not time well spent.
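
One way to see why lopsided pairings carry less rating information is to compare the approximate standard error of an Elo estimate at different expected scores. The sketch below uses the standard logistic Elo curve and assumes decisive, independent games (draws would reduce the variance); the helper names are illustrative.

```python
from math import log, log10, sqrt


def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1 / (1 + 10 ** (-elo_diff / 400))


def elo_for_score(score: float) -> float:
    """Elo gap that corresponds to a given expected score."""
    return 400 * log10(score / (1 - score))


def elo_stderr(elo_diff: float, games: int) -> float:
    """Approximate standard error of an Elo estimate from `games` decisive games."""
    p = expected_score(elo_diff)
    score_se = sqrt(p * (1 - p) / games)  # binomial standard error of the score
    slope = log(10) / 400 * p * (1 - p)   # d(score)/d(Elo) on the logistic curve
    return score_se / slope


for score in (0.5, 0.75, 0.9):
    gap = elo_for_score(score)
    print(f"score {score:.2f} (gap {gap:4.0f} Elo): "
          f"+/- {elo_stderr(gap, 100):.0f} Elo after 100 games")
```

Under these assumptions the error bar at a 90% score is roughly two thirds wider than at 50% for the same number of games, so nearly three times as many games are needed for the same precision.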
Parent - - By mindbreaker (****) Date 2010-09-12 22:22 Edited 2010-09-12 22:25
I am running out of opponents. Komodo is the 9th best engine: http://www.computerchess.org.uk/ccrl/404.live/

There is one engine not on the table because it never won or drew a game so rating could not be calculated.  I ended that after 126 games.  It is rated 2950 at CCRL.  If you make them strong, they have this tendency of winning ;)

I am trying to use the strongest opponents I can find.  I have run all the stronger opponents I have except Rybka 3, Rybka 3 Dynamic, and Rybka 3 Human. Maybe I will run those but I am not a fan of running different versions of the same program against one-another.

And who is to say that strength is only determined by close pairings?  Should not ratings be legitimate even with some distance between opponents provided there are some draws and losses by the stronger side?  According to critiques of the current ratings formulas it is actually the stronger side's rating that is underestimated by Elo tables.

A straight line is better than the Elo curve: http://www.chessbase.com/newsdetail.asp?newsid=562
Parent - - By Banned for Life (Gold) Date 2010-09-12 23:43
First, how many cores is Rybka 4 using when playing against Komodo? I hope the answer is 1 (potentially allowing you to play multiple simultaneous games).

Jeff Sonas' excellent article does not argue against the fact that closely spaced opponents provide more rating information than widely spaced opponents. This is shown in the first graph, where the deviation for two opponents at the same Elo is much smaller than the deviation at +/- 300 Elo. It's always going to be more difficult to make predictions based on the tail of the distribution. One way to achieve this result would be to give non-R4 engines a time advantage when they play against Rybka. This would diminish your ability to determine how much better R4 is than the other engines, but would enhance your ability to discriminate between different flavors of R4.

Also note that Jeff's results are based on a list of human-human games with constrained rating differences between players (I think he mentions 100-120 Elo for the top players). I suspect he would have ended up with somewhat different results if he had instead relied on a database of engine-engine games (this would be an interesting experiment). One would expect engines to be more consistent than people for a couple of reasons:
- They don't have bad days and don't make random blunders, and
- Their strength doesn't really change over time as people's does.

For these reasons and others, I suspect that if Jeff generated a Sonas E rating system for engine-engine games, it would have significant differences from the optimal human-human predictive rating system he developed.
Parent - - By mindbreaker (****) Date 2010-09-13 01:10
All of my variants are running at 4-threads as I have repeatedly stated. Komodo is rated 9th even though it is one thread.  The rating is the rating.

You get more deviation at the ends because there are less games in the database with high ratings disparity.  I also think that when a player does earn the chance to play a much higher opponent it is because they are playing better than their rating or they are promising juniors...whose ratings may not be able to keep pace with their rate of improvement.

Time handicap is possible but not what I am after.  I am trying to find where the engines would actually end-up on a ratings chart.  Handicapping engines makes any correction guesswork.

If anything, I suspect the result would be more linear.  It would likely reach a point where it was just impossible for the weaker engine to win.  What I would like to see is a graph where only decisive games were included, because I think it is easier to get two draws than a win at high ratings disparity. Something that if true should be figured into the ratings.
Parent - - By Banned for Life (Gold) Date 2010-09-13 03:50
> You get more deviation at the ends because there are less games in the database with high ratings disparity.

This is not a reasonable explanation. You have a scatter plot and for near equal ratings, where most of the points are falling, you see very few outliers, whereas when you have significantly different ratings, where there are a lot fewer games, you see many more outliers. This shows that the rating has better predictive results when the two players have similar ratings. In this case, this also works in reverse, i.e. the true strength of one of the entities is easier to ascertain if it is playing against an entity having nearly equal rating.

> I am trying to find where the engines would actually end-up on a ratings chart.

Once again, if you primarily want to know that the engines are X Elo better than Shredder or Komodo, then your method is appropriate. On the other hand, if you primarily want to know which parameter variations are stronger against the other engines, a time handicap will lead to faster convergence. With this approach, you would first find which parameter variation works best, and then test only that variation against the other engines without the time handicap.

> What I would like to see is a graph where only decisive games were included, because I think it is easier to get two draws than a win at high ratings disparity. Something that if true should be figured into the ratings.

If you are playing with reversing colors, you can do this by throwing away all sets of openings where:
White won both games - under the assumption that white left book better,
Black won both games - under the assumption that black left book better, and
Both games were drawn - under the assumption that the book exit position was drawish.

This leaves 2 game trials where one game was drawn and the other was decisive, and where one engine won with both colors. This might be a better method of figuring out if one engine is better than another (it will have less bias), but it won't correlate directly with Elo.
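
The pair-filtering rule described above can be sketched as follows. The data layout is an assumption made for illustration: each opening yields a pair of PGN-style result strings, the first with engine A as White and the second with engine A as Black.

```python
# Result strings are from White's point of view, as in PGN.
WHITE_POINTS = {"1-0": 1.0, "1/2-1/2": 0.5, "0-1": 0.0}


def filter_pairs(pairs):
    """Keep opening pairs whose two games did not end with the same result string.

    Identical strings mean White won both, Black won both, or both were drawn,
    which are exactly the three cases the rule throws away.
    """
    return [(g1, g2) for g1, g2 in pairs if g1 != g2]


def points_for_a(pairs):
    """Engine A's points: White's points in game 1 plus Black's points in game 2."""
    return sum(WHITE_POINTS[g1] + (1 - WHITE_POINTS[g2]) for g1, g2 in pairs)


# Hypothetical results for six openings.
sample = [
    ("1-0", "1-0"),          # White won both: discarded
    ("1-0", "0-1"),          # A won with both colors: kept
    ("1/2-1/2", "1/2-1/2"),  # both drawn: discarded
    ("1/2-1/2", "0-1"),      # draw plus a win for A: kept
    ("0-1", "0-1"),          # Black won both: discarded
    ("0-1", "1/2-1/2"),      # loss plus a draw for A: kept
]

kept = filter_pairs(sample)
print(f"{len(kept)} of {len(sample)} pairs kept; "
      f"A scores {points_for_a(kept)} of {2 * len(kept)}")
```

As the post notes, a score computed from the surviving pairs is a comparison between the two engines and should not be read directly as an Elo difference.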
Parent - By mindbreaker (****) Date 2010-09-13 06:17
It is rather inflammatory to claim my argument is unreasonable.  I highly doubt that was a normal scatter plot.  The whole screen would be black if there were 266,000 games.  How could you plot games anyway, they only have three outcomes not percentages unless they are 100%, 50%, and 0% which would make for a dull graph.  My guess is that each dot represents the average % of all games (where a game is worth 1 and draws are split .5-.5) with the same Elo difference and color.  As there were fewer games with the higher disparities they will be more distorted by chance.

There is no reason the engines have to be close in strength to find a rating hence no reason to deprive us of both relative Elo among variants and relative Elo to other engines. 

More games is generally better for a calculation...all sorts of extraneous things average out to nothing.  I was unclear.  I was talking about something else: the arbitrary equality of two draws to a win. 

If, for example, there was a 20 game match between players A and B where player A is 300 Elo stronger than player B and the results were that B got 3 wins, I think that is stronger than if B got 6 draws instead, but current ratings formulas automatically gauge these performances as the same and would in both instances subsequently award the same ratings adjustment.
Sonas is saying the statistical results of many thousands of games should be the guide to the formulas...I agree with that.  And we should look into the rate of drawing and the rate of winning separately, as the chance of a draw may not be double the chance of a win, especially at the extremes.  Wrongly assuming that it is awards more points to the lower-rated player than is appropriate.  Of course, without the data, this is just an intuition.  But it seems hardly likely that two draws exactly equal a win. It could even be the other direction, but the chance that it just lines up is rather small.
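
The 20-game example can be checked against the standard Elo treatment, in which a draw counts as half a point. The sketch below uses the logistic Elo curve and assumes all of B's other games are losses; the function names are illustrative.

```python
from math import log10


def match_score(wins: int, draws: int, games: int) -> float:
    """Score fraction with a win worth 1 point and a draw worth half a point."""
    return (wins + 0.5 * draws) / games


def implied_elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    return -400 * log10(1 / score - 1)


three_wins = match_score(wins=3, draws=0, games=20)
six_draws = match_score(wins=0, draws=6, games=20)

print("score with 3 wins: ", three_wins)
print("score with 6 draws:", six_draws)
print("implied Elo difference:", round(implied_elo_diff(three_wins)),
      "and", round(implied_elo_diff(six_draws)))
```

Both results give player B the same score fraction and therefore the same implied rating difference, which is the equivalence being questioned here. Testing the intuition would need win and draw rates tabulated separately by rating gap, which this sketch does not attempt.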
Parent - - By mindbreaker (****) Date 2010-09-13 01:29
Komodo final and effect on interim chart.

1    Rybka 4 x64 Exp 31 v1    3354    1153
2    Rybka 4 x64 Exp 30 v1    3349    1153
3    Rybka 4 x64 Exp 37 v1    3349    1151
4    Rybka 4 x64 Exp 38 v1    3349    1150
5    Rybka 4 x64 Exp 34 v1    3345    1153
6    Rybka 4 x64 Exp 36 v1    3343    1151
7    Rybka 4 x64 Exp 24 v2    3341    1331
8    Rybka 4 x64 Exp 33 v1    3339    1153
9    Rybka 4 x64 Exp 26 v2    3338    1329
10    Rybka 4 x64 Exp 32 v1    3338    1153
11    Rybka 4 x64 Exp 35 v1    3336    1152
12    Rybka 4 x64 Exp 28 v1    3335    1327
13    Rybka 4 x64 Exp 25 v2    3334    1330
14    Rybka 4 x64 Exp 11 v1    3333    454
15    Rybka 4 x64 Exp 27 v1    3328    1328
16    Rybka 4 x64 Exp 18 v1    3326    3039
17    Rybka 4 x64 Exp 22 v1    3323    2231
18    Rybka 4 x64 Exp 29 v1    3322    1327
19    Rybka 4 x64 Exp 9 v1    3319    434
20    Rybka 4 x64 Exp 23 v1    3318    2178
21    Rybka 4 x64 Exp 8 v1    3314    1315
22    Rybka 4 x64 Exp 14 Human    3305    900
23    Rybka 4 x64 Exp 7 v1    3303    815
24    Rybka 4 x64 Forum v1    3302    1160
25    Rybka 4 x64 Exp 10 v1    3302    462
26    Rybka 4 x64 Exp 4 v1    3299    900
27    Deep Rybka 4 x64 v1    3299    161
28    Rybka 4 x64 Exp 20 v1    3299    590
29    Rybka 4 x64 Exp 15 v1    3297    1800
30    Rybka 4 x64 Exp 1 v1    3298    916
31    Rybka 4 x64 Exp 3 v2    3297    423
32    Deep Rybka 4 x64    3294    1986
33    Rybka 4 x64 Exp 21 v1    3295    900
34    Rybka 4 x64 Exp 16 v1    3295    557
35    Rybka 4 x64 Exp 19 v1    3290    189
36    Deep Rybka 4 x64 Lasker    3286    479
37    Rybka 4 x64 Exp 13 vC13510    3284    750
38    Rybka 3 Dynamic    3284    892
39    Rybka 4 x64 Exp 2 v1    3284    472
40    Rybka 4 x64 Beta 15 v1    3284    426
41    Rybka 3    3283    1897
42    Rybka 4 x64 Exp 12 v1    3283    900
43    Rybka 3 Human    3281    1461
44    Rybka 4 x64 Exp 17 v1    3271    355
45    Stockfish 1.8 JA 64bit    3258    3131
46    Rybka 4 x64 Exp 6 v1    3259    121
47    Deep Rybka 4 x64 Human    3257    174
48    Stockfish 1.7.1 JA 64bit 4t    3237    4235
49    Stockfish 1.7 JA 64bit 4t    3224    560
50    Stockfish 1.6.3 JA 64bit    3207    384
51    Critter 0.80 64-bit    3186    1800
52    Deep Fritz 11    3123    4774
53    HIARCS 12 MP    3119    4183
54    Deep Shredder 12    3115    1824
55    Naum 4 4t    3108    4311
56    spark-0.4    3107    1802
57    Zappa Mexico II    3079    2519
58    Critter 0.70 64-bit    3078    2326
59    Deep Shredder 11 UCI    3064    2410
60    Komodo64 1.1 JA    3057    3238
61    Komodo64 1.0 JA    3054    407
62    bright-0.5c    3035    2350
63    Protector 1.3.3 x64 4t    3035    782
64    bright-0.4a    3020    21
65    Protector 1.3.6 x64 4t    2977    900
66    Thinker54AInert-MP64-UCI    2967    900
67    spark-0.3a    2935    23
68    Deep Fritz 12    2659    385
Parent - - By Banned for Life (Gold) Date 2010-09-13 03:57
I suspect that if you ran this gauntlet again (i.e. another 900 games), you would find that the ordering of the R4 engine variants has no statistical significance and that the only thing you can ascertain from the games is that R4 with 4 threads is much better than Komodo on one thread.
Parent - By mindbreaker (****) Date 2010-09-13 08:39
I believe I said something like that.  In itself it is not very meaningful, as each mini-match is only 100 games, but together with the other ten opponents and their one hundred games versus each engine, you do get something more reliable.  There is information, but it only shows itself when many games are collected like puzzle pieces.  Each piece by itself makes little sense, but together they reveal a picture.
Parent - - By Clemens K (*) Date 2010-09-12 10:09
Hello mindbreaker

After all of your pioneering work on setting tests, I wonder why you don't post the parameters, so we could try to reproduce, or even benefit from, your work.
I also wonder why nobody seems to be curious about what the settings are.
Would you share some of your best settings with us?

Kind regards, Clemens
Parent - By Akbarfan (***) Date 2010-09-12 16:16
http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?pid=277108#pid277108
"And at the end at 1200 games per engine or less if I end it sooner, I plan to post the parameters and final results."
Parent - By mindbreaker (****) Date 2010-09-12 20:04
Exp 24 has been posted; it is pretty decent according to my tests: http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?pid=275286;hl=exp

And I posted some earlier ones too.

The whole collection will be posted soon.
Parent - - By mindbreaker (****) Date 2010-09-13 01:37
As I have reached 1150 games and will likely do more than the 1200, I thought I should go ahead and post all the parameters even though it is not quite complete. So here it is...attached.
Parent - - By Clemens K (*) Date 2010-09-13 18:03
Hello mindbreaker

Thank you for your parameter file. I will try some of them and report back here.

Have a nice day

Clemens