CEGT Blitz 40/4
1 Rybka 2.3.2a x64 4CPU 11½½½½0½101½1½½½0½½1½½1½1½0½½½1½10111½1½1110½½½111 32.0/50
2 HIARCS 12 4CPU 00½½½½1½010½0½½½1½½0½½0½0½1½½½0½01000½0½0001½½½000 18.0/50
sainzlei @ Republic of Taiwan
Rybka 2.3.2a 64-bit 4CPU - Hiarcs 12 4CPU: 32-18
Naum 3.0 64-bit 4CPU - Hiarcs 12 4CPU: 26-24
Zappa Mexico 64-bit 4CPU - Hiarcs 12 4CPU: 27,5-22,5
Hiarcs 12 4CPU - Deep Shredder 11 64-bit 4CPU: 32-18
Rybka 2.3.2a 64-bit 2CPU - Hiarcs 12 2CPU: 32,5-17,5
Naum 3.0 64-bit 2CPU - Hiarcs 12 2CPU: 27-23
Zappa Mexico 64-bit 2CPU - Hiarcs 12 2CPU: 26,5-23,5
Hiarcs 12 2CPU - Deep Shredder 11 64-bit 2CPU: 29-21
P.S. I was a Hiarcs fan long before I was a Rybka fan, and I remain one. An engine doesn't need to be stronger than Rybka to be interesting for me.
It looks like all competitors will be at least one 'version generation' behind after the Rybka 3 release, if they are behind even 2.3.2a. Amazing. Although maybe I underestimate Deep Fritz 11. From Junior, there are no signs and no news visible. So I think DF11 will be the last big hero for a while to try to challenge Rybka 2.3.2a. :-)
The results clearly show that 2.3.2a is still top, and the NEXT generation of programs will match it. Therefore Rybka 3 will be equal to the version after that, i.e. Shredder 14 / Hiarcs 14 etc.
Style matters a lot now
I actually won some games only because Zappa Mexico 2 made some good moves
and they didn't play worse :)
i) Freestyle, where the humans play a large role, is very different from engine play.
ii) The sample size of games is far too small to draw any conclusions. If you kept up with CEGT etc. you'd see that a 50-game sample only gives an accuracy of plus/minus 100 Elo. For 9 games I bet it's plus/minus 200 Elo, making any comparison meaningless.
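To make the sample-size point concrete, here is a minimal sketch that turns a match result into an Elo difference with a rough 95% interval. It uses the Rybka result string from the CEGT Blitz table at the top of the thread; the function names, the logistic Elo formula and the normal approximation are my own assumptions, not necessarily how CEGT computes its error bars.

```python
import math

# Rybka's game-by-game results from the CEGT Blitz 40/4 match quoted above.
RYBKA_VS_HIARCS = "11½½½½0½101½1½½½0½½1½½1½1½0½½½1½10111½1½1110½½½111"

def parse_results(results):
    """Count wins, draws and losses in a crosstable string."""
    return results.count("1"), results.count("½"), results.count("0")

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction (logistic model)."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

def elo_interval(wins, draws, losses, z=1.96):
    """Return (low, mid, high) Elo estimates using a normal approximation.

    Assumption: games are independent; z=1.96 gives roughly 95% coverage.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score (1, 0.5 or 0 points per game).
    variance = (wins * 1.0 + draws * 0.25) / n - mean ** 2
    margin = z * math.sqrt(variance / n)

    def clamp(p):
        # Keep the score fraction strictly inside (0, 1) so the log is defined.
        return min(max(p, 1e-9), 1.0 - 1e-9)

    return (elo_diff(clamp(mean - margin)),
            elo_diff(clamp(mean)),
            elo_diff(clamp(mean + margin)))

wins, draws, losses = parse_results(RYBKA_VS_HIARCS)
low, mid, high = elo_interval(wins, draws, losses)
print(f"+{wins} ={draws} -{losses}: {mid:+.0f} Elo (about {low:+.0f} to {high:+.0f})")
```

The 32-18 score works out to roughly +100 Elo, but with an interval more than 100 Elo wide. Running the same function on a 9-game sample widens the interval much further.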
Even when we give Hiarcs 2 CPUs against a single CPU for Rybka 1.x, Hiarcs 12 is clearly behind, based on CEGT 40/20:
Rank  Program              Elo    +    -   Games  Score  Av.Op.  Draws
  17  Rybka 1.1 x64        2959   17   17   1796  82.7%    2687  20.7%
  22  Rybka 1.2f x64       2949   16   16   1464  74.3%    2765  29.6%
  37  Hiarcs 12 MP 2CPU    2910   33   33    287  40.8%    2974  35.5%
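As a rough cross-check of these three rows, the sketch below applies the standard logistic Elo expectation to each rating and average-opponent rating. It assumes the third numeric column is the Elo rating and the 2687/2765/2974 column is the average opponent rating; the helper name `expected_score` is mine, and CEGT's own rating software may compute things differently.

```python
def expected_score(rating, opponent_rating):
    """Expected score fraction under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))

# (program, rating, average opponent rating, reported score) from the rows above.
rows = [
    ("Rybka 1.1 x64", 2959, 2687, 0.827),
    ("Rybka 1.2f x64", 2949, 2765, 0.743),
    ("Hiarcs 12 MP 2CPU", 2910, 2974, 0.408),
]

for name, rating, avg_opp, reported in rows:
    predicted = expected_score(rating, avg_opp)
    print(f"{name:18s} predicted {predicted:.1%}, reported {reported:.1%}")

# Head-to-head expectation implied by the two list ratings alone.
print(f"Rybka 1.1 vs Hiarcs 12 2CPU: {expected_score(2959, 2910):.1%}")
```

Under these assumptions the predicted scores land within about a tenth of a percentage point of the reported ones, and the 49-point rating gap corresponds to an expected score of about 57% for single-CPU Rybka 1.1.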
>So, any big expectations (if there were any?) are disappointed, at least in blitz. - Let's wait and see if it is much different, at medium or long time controls...
For 1 CPU and 32-bit using own books, unlike in CEGT and CCRL, Hiarcs 12's performance seems very good, and it beat Rybka 2.3.2a:
However, I would bet a large amount of money that Hiarcs 12 would beat Naum 3 in true match conditions of 24 games. I would not bet either way in a match between Hiarcs 12 and Rybka 2.3.2a, as I think that both sides have an equal chance of winning such a match. The same goes between Hiarcs 12 and Zappa Mexico II.
This 32 - 18 result doesn't convince you? Why not?
>This 32 - 18 result doesn't convince you? Why not?
I guess because:
He said: true match conditions of 24 games. So:
•This match used blitz time controls, while true match conditions would probably mean long time controls of more than 2 minutes per move on top hardware.
•This match had 50 games, while turbojuice1122 speaks of 24 games, where a "surprise" can happen more frequently.
•This match did not use the programs' own books, while true match conditions would probably mean using the engines' own books.
In matches with equal conditions it is simply inferior to Rybka. Hiarcs' book can
make up for that by playing good lines for Hiarcs, so the book brings extra elo
points and not vice versa.
To call the 32-18 result 'not a result of playing chess' is very funny. Why is this
result ridiculous? Because it doesn't match with your expectations!? This is what
people often do when they are disappointed with results: they simply call them
Therefore I call the result of the forthcoming european football championship
ridiculous when Holland doesn't win it. Then it simply is not a true tournament for
this title. Holland can beat any of these teams competing and therefore should win
(Sorry, couldn't resist this cynical joke!)
The 32-18 result is ridiculous because it was obtained under conditions different from those of normal play, and so it gives a very different strength estimate from the one based on normal play.
Does that say something about the Hiarcs strength, or about the book strength? Will Hiarcs 12 be close to Rybka 2.3.2a if
Rybka plays with a more recent book? As RybkaII is about 16 months old now, it is not too difficult to make a very strong book
against it. So what are you saying? They killed RybkaII and now suddenly this is caused by 'Hiarcs being almost equal strong to
Can you perhaps post the games of the Hiarcs-Rybka match played by Mark, so we can see how many Hiarcs 12 wins were
because of the book?
CEGT plays with a general book that every program has to use. This means equal conditions, no possibilities for book wins,
just measure engine strength. For all programs the same conditions. The 'extra elo due to book wins' disappears and that gives
a much better view on the engine strength than with books. At least, in my opinion :-).
So the problem is that you mix up engine strength with book strength. Given the proper conditions (i.e. best book for Hiarcs and
a worse book for the opponent) you are inclined to think that Hiarcs 12 is close to Rybka. But it isn't :-). Give Zappa Mexico the
latest Erdo book and Rybka my latest book and maybe the gap between Zappa/Rybka and Hiarcs could even be much bigger than
A) the engine
B) the book
C) the GUI
If playing strength is all that matters, you only search for the best A. Of course in that case you'd like to compare results
without book influence.
If you look for the best combination of engine and book (e.g. when you are a correspondence player), then you search for
the best A and B. It is very well possible that the best engine doesn't come with the best book, so a combination of engine A
with the book of engine B could be a reality here.
If you look for a nice GUI with lots of possibilities, and strength and book don't matter, you can get yourself the GUI that
you like best and download a free engine with a free book.
>CEGT plays with a general book that every program has to use. This means equal conditions, no possibilities for book wins,
>just measure engine strength. For all programs the same conditions. The 'extra elo due to book wins' disappears and that gives
>a much better view on the engine strength than with books. At least, in my opinion
I don't agree with this. Actually I do agree, but let me describe what I mean:
•Yes, the way you describe it we measure an engine's strength on a wide range of positions.
•But what if a programmer does not want his engine to play certain kinds of positions? That is one reason (among others) to create a specialized book for the engine. And perhaps with this book the engine may play extremely well, not losing a single game! Let's take a (random) example:
Hiarcs 12 on a wide range of positions (with 1.f4, with 1.e4 e5 2.f4, etc.) may be 100 Elo points worse than Rybka 2.3.2a when playing against a large number of other engines, as happens in CEGT and CCRL.
1) But perhaps there is a book with which Hiarcs 12 is 20 Elo better than Rybka 2.3.2a when it plays against the same opponents (Rybka 2.3.2a included), even though every opponent, Rybka 2.3.2a too, has its own most suitable book.
2) Perhaps there is a book that makes Naum 3.0 stronger than everything else, Rybka 2.3.2a included, even if everyone else uses the best book suitable for them and Rybka uses the latest Rybka III.ctg book.
Improbable? Maybe in the latter situation 2), but what about the first, 1)?
So in short we have 2 kinds of tests:
1) CEGT/CCRL-type tests, which test engine strength over a wide range of positions. So they actually test the analysis abilities/strength of the engines over a wide range of positions.
2) Own-book tests (SSDF), which test engines at the highest strength they can reach, if we assume the own book is the most suitable one for the engine.
that can play only a limited number of lines, but with the data of both CEGT and SSDF I could figure that out.
BTW, I find the SSDF list rather unsuitable for comparing engines with books, as there are a lot of good engines
missing and a few of them are not playing with their best book. I think it would be rather helpful if they take
a CEGT type of approach, i.e. a single version list and a MP version list and do faster time controls. Now you
see results like Zappa Quad - Fritz 9 A1200 with something like 36-4, all 40/120 games, not very interesting IMO.
I would love it when they make a special Quad list with all the latest versions of the programs available, with their
To find out which engine plays best with which book is interesting, but probably this is impossible to find out
within a reasonable time. You will have an enormous amount of possible combinations engine-book :-).
But to give you a counterexample which might be confusing for you as a customer:
Suppose you find information on several sites that engine A with book A beats Engine B with book B. You confidently
assume that engine A is the better one, so you purchase it. Alas, when you start playing your correspondence games
you are outplayed most of the time... And your opponents are all using Engine B (they tell you after you resigned).
Now wouldn't you be a bit pissed? What I want to tell you is that relative engine strength DOES matter :-). Another
counterexample is when your engine plays only a few lines well and (too bad) it appears in your correspondence
games it doesn't understand anything of your gambit lines and King's Indians. Now wouldn't you be better off with
an engine that understands a wider variety of lines!?
Have a nice evening!