Rybka Chess Community Forum
"An opening book is part of an engine's strength just like an opening book is part of Chess."
i disagree, altho it may simply be a matter of semantics. if u said "an opening book is part of a *chess program's* strength...", then i agree completely. but then i'd divide "chess program" into 2 components: (default) opening book, and the engine. rybka is the engine. the book is the book.
"Who cares if the Zappa team created an opening book made for use in the Rybka match? This is what you are SUPPOSED to do."
if one's goal is to win the match, u'll get no argument from me. but if u wanna demonstrate engine superiority, then i disagree.
"There was nothing wrong with the Zappa team preparing an opening book to be used in the match with Rybka. This is part of playing Chess."
well i certainly never said there's anything "wrong" w/ going into a h2h match w/ an anti-opp. book. and i'm pretty sure u're not implying i said that. however the issue is: for many people (not necessarily u and me who can see the forest thru the trees), many will go away from the results of the mexico match and say to themselves, "wow, i guess zappa *is* stronger than rybka on monster hardware." and that would be a very naive conclusion to make, as u know. that's the problem.
it would be like someone drawing the same conclusion if jeroen were to suddenly defect to chessbase, and someone now thinks fritz 8 can eat rybka for lunch simply because it has jeroen's opening book mated to it (he published similar results recently). but that doesnt tell us that fritz is a stronger engine than rybka. far from it.
This argument fails too. The Rybka team had a lot more man hours invested in its tournament book than the Zappa team did. Once again, Jeroen has claimed that if the opening positions for each of the games were reversed, Rybka would have done much worse than it did...
after finding the rybka/zappa match thread, i stand corrected.
however the match wasnt under 120/40, so the results of such a match are still left as an exercise to the reader... :-)
Actually, there was some discussion of using remote Caneland (16-core) machines for the match. The Rybka team was very much against this under the thin pretext that this would allow cheating. In reality, it would have given Zappa a huge advantage since Rybka 2.3.2a is already running out of gas on an octal.
all bets are off on rybka's superiority when we're talking about a 4x core deficit given your opp's superior scaling efficiency. :-)
>Actually, there was some discussion of using remote Caneland (16-core) machines for the match.
Was this also for Rybka-Zappa, or did this only come up in the negotiations for Rybka versus Junior?
>The Rybka team was very much against this under the thin pretext that this would allow cheating.
I don't think the phrase "Rybka team" has quite the right focus here --- it was more Convekta (the money agent) who decried remote play.
It was shot down as an idea during the Junior match discussions, but was also discussed during the run up to the Zappa match. Convekta derailed this by setting up the importing of two octals into Mexico :-). Of course the match ended up being played on remote machines and nobody ever suspected cheating (which was always a red herring).
Of course Convekta wanted to hold onto its money, so it was determined to limit the machines to octals. You know the golden rule right? He who has the gold makes the rules.
I guess you will have to wait until Erdo writes his biography for the complete story. Yes, remote machines were used: Zappa ran on my oct and Rybka on one belonging to Lukas. There was no cheating here and I am 100% sure Lukas did not cheat either.
I never considered cheating to be a serious matter, unless we consider the voodoo fish that you were poking needles into. :-)
No needles but we certainly skewered and cooked a few. :)
A rotisserie can be considered a very large needle. :-)
In the case of zappa it is easy to test things because zappa mexico is available so there is no reason to suspect cheating.
The Junior team did not offer to make a special Junior Mexico version that would be available after the match in order to rule out cheating, and they were against showing their logfile, so the fear of cheating was obvious.
Claiming that octal machines work against Zappa is not convincing.
By the same logic you can say that the Zappa team wanted to make money, so they insisted on playing on octals and not on quads or duals.
Note that if the Convekta team made rules in order to win the match, then they could easily have made a rule that the match be played only on single processor machines.
They did not do it so it is illogical to claim that they made rules to favour rybka.
It is simply the opposite.
Almost nobody has an octal so they made rules that favoured zappa.
I believe that most customers are more interested in what the machines they use in 2008 can do, so I guess that they are going to prefer a match between dual machines (I do not even have a dual at this moment).
I recall the original challenge was a bet with Chessbase for $100K with Vas offering something in the way of odds. Clearly Convekta was willing to stake this bet and it seemed like a reasonable gamble. When no additional funding showed up, Convekta clearly saw no reason to throw their money away so they tried to reduce risk by reducing the prize fund by an order of magnitude and trying to make sure that Rybka didn't get outgunned in the processor area. So they insisted on using the same hardware, which is appealing from a fairness perspective, and insisted on bringing the machines on-site, which reduced the risk of having to use monster hardware. Since Zappa scales a lot better than Rybka, it would be natural for the Zappa team to want to play on 16 core machines, while Rybka would have been better off on single processor machines.
Most people have dual core machines but I don't think this means that most people want to see world championship matches on dual core machines. This is for the same reason that Formula 1 racing is more popular than showroom stock racing.
Going by my oft-repeated maxim that strategy is doing the right things and tactics is doing things right, it is clear to me that Erdo had the right tournament strategy in the first half of the match and Rybka caught on that it had to play to its strengths in the second half--too late.
I'm sure Erdo didn't know he had the winning strategy beforehand, but I sense that he sized up his opponents better than the reverse. Sometimes that's all it takes to eke out enough of an advantage.
FWIW, the first two games (which were, nevertheless, won) were played with Rybka 2.3.2a, if I recall correctly.
/* Steinar */
The number of games isn't enough to draw a conclusion about strength.
You can get a pretty good idea in match conditions, but not in the far more random rating list testing conditions.
Regardless of books, match condition vs. other, or anything else, you need at least a result that is statistically significant before you can say X is better than Y. If we want high confidence, that means two sigma. If only ten games are played, and we assume six draws and four decisive results, we need all four wins to be by the same side to approach two sigma. With a hundred decisive games, we need 60-40. If the two contestants are quite close in strength, you are likely to need thousands of games to prove which is stronger.
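The two-sigma arithmetic in this post can be checked with a short sketch. It assumes the null hypothesis that both sides are exactly equal, so each decisive game is a fair coin flip and draws carry no information. The function name `sigma_of_decisive_split` is only an illustrative choice, not something from the thread.

```python
from math import sqrt


def sigma_of_decisive_split(wins, losses):
    """Standard deviations by which a win/loss split departs from 50-50.

    Assumption: under the null hypothesis of equal strength, each decisive
    game is a fair coin flip, so wins ~ Binomial(n, 0.5) with mean n/2 and
    standard deviation sqrt(n)/2. Draws are ignored entirely.
    """
    n = wins + losses
    if n == 0:
        raise ValueError("need at least one decisive game")
    # (wins - n/2) / (sqrt(n)/2) simplifies to (wins - losses) / sqrt(n)
    return (wins - losses) / sqrt(n)


print(sigma_of_decisive_split(4, 0))    # 2.0: ten games, six draws, 4-0 in wins
print(sigma_of_decisive_split(60, 40))  # 2.0: a hundred decisive games, 60-40
```

Both examples land exactly on two sigma, which matches the figures quoted in the post. Anything short of a clean sweep of the four decisive games in a ten-game match falls below that threshold.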
But this is assuming that we're dealing with random variables. I think this is a much better approximation with CEGT and CCRL tests than with match conditions, where the operators of each side are doing their best to accentuate the strengths of their program and the weaknesses of the opposing program. My point here is that what we have is a "pretty good idea" that these programs were about the same strength on octacores, and my statements have been reflecting the fact that if you repeated the Mexico match with the lower strength Rybka 2.3.2a (compared with the last 8 games) and you doubled Zappa's hardware, Zappa would clearly win.
In a ten game match, either player can hope for victory (without needing a miracle) if they are within something like 150 points of each other. I don't want to get into a discussion of which mode of testing is better, that's probably a matter of opinion. My point is that no method will help in a ten game match where the players are closely matched. There is just too much luck; one side can be winning and overlook a deep sequence or a drawn endgame outside its knowledge, or one may just happen to pick an opening that the other side did not expect and did not prepare well for (that's how Kramnik beat Kasparov). The results are random when the sides are closely matched; there is no way around this.
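To put a rough number on "without needing a miracle", here is a sketch under loudly stated assumptions: the standard logistic Elo expected-score formula, independent games, and a fixed draw rate of 40%. The draw rate is a made-up illustrative figure, not something measured or quoted in this thread, and the function names are hypothetical.

```python
def expected_score(elo_diff):
    """Standard logistic Elo expectation for the higher-rated side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def match_outcome_probs(elo_diff, games=10, draw_rate=0.4):
    """Return (favourite wins match, match tied, underdog wins match).

    Assumptions: games are independent, and every game is drawn with the
    same probability `draw_rate` (an illustrative guess, not thread data).
    """
    e = expected_score(elo_diff)
    p_win = e - draw_rate / 2.0          # favourite wins a single game
    p_loss = 1.0 - e - draw_rate / 2.0   # underdog wins a single game
    if p_win < 0.0 or p_loss < 0.0:
        raise ValueError("draw_rate is too high for this Elo difference")

    # dist maps (favourite wins - underdog wins) to its probability
    dist = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for margin, p in dist.items():
            for step, q in ((1, p_win), (0, draw_rate), (-1, p_loss)):
                nxt[margin + step] = nxt.get(margin + step, 0.0) + p * q
        dist = nxt

    favourite = sum(p for m, p in dist.items() if m > 0)
    tied = dist.get(0, 0.0)
    underdog = sum(p for m, p in dist.items() if m < 0)
    return favourite, tied, underdog


for gap in (0, 50, 150):
    fav, tied, dog = match_outcome_probs(gap)
    print(f"{gap:>3} Elo gap: favourite {fav:.3f}, tied {tied:.3f}, underdog {dog:.3f}")
```

The underdog's chances depend heavily on the assumed draw rate, so the output should be read as an order-of-magnitude illustration rather than a prediction for any particular pair of engines.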
Yes, and if we admit that the sides are closely matched, I think that one would have trouble supporting the argument that Rybka 2.3.2a in match conditions would beat Zappa on twice the hardware in match conditions, which is the point against which I'm arguing here.
If you are talking about a ten game match, my answer would be that it's like arguing over whether the next flip of a coin will be heads or tails.
Perhaps, but I think this is a somewhat heavily weighted coin. :-)
What is strength in chess? Only Elo and statistics? In human chess, I'm not sure, although here you can't see big differences in skills in openings, middlegames and endgames like in computer chess. But I have my preferences. I like the generalists! Ivanchuk and Anand are generalists, Kramnik and Topalov not!
In computer chess, it should be clear; Rybka is far above the rest. But Rybka isn't a generalist; she is a Bronstein (the best player of the 50s, but he couldn't play endgames). So I took Rybka (Bronstein) for the middlegame and Zappa and Shredder (Botvinnik) for the endgame analysis.
It's really both things at the same time: Zappa gains more than Rybka from an increase in time control, and Zappa also gains more than Rybka from an increase in hardware. There was speculation in the past that the latter might be due to the greater hardware simulating an increase in the time control, but there seems to be a slightly bigger effect than what one would expect from that assumption. It seems, though, that between 1 and 8 cores, the main gap closer between Zappa and Rybka is an increase in the time control. After 8 cores, the main gap closer is an increase in the hardware.
pursuant to my above reply to u (where i asked if u had actual match data/results), i cant seem to find any evidence to support your claim.
for instance, at 120/40 time controls on quads, 89 elo separate rybka and zappa. even assuming zappa gains 75 elo (probably a bit too optimistic) from a doubling of cores, that still puts it marginally weaker.
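The arithmetic here can be spelled out with the standard logistic Elo expectation. This is only a sketch: the 89 and 75 Elo figures are the ones quoted in the post, while the function name `expected_score` is an illustrative choice.

```python
def expected_score(elo_diff):
    """Standard logistic Elo expectation for the side that is `elo_diff` ahead."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


gap_on_quads = 89      # Elo gap quoted in the post for 120/40 on quads
assumed_gain = 75      # the post's (admittedly optimistic) gain from doubling cores
remaining_gap = gap_on_quads - assumed_gain   # 14 Elo still in rybka's favour

print(remaining_gap)
print(round(expected_score(remaining_gap), 3))   # roughly 0.52 for rybka
```

A 14 Elo edge works out to an expected score of about 52%, which is what "marginally weaker" looks like in practice and is well inside the noise of a short match.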
We are talking about two very different things in the same thread. Rybka and Zappa have been tested against each other on octals in both conditions of general books (where Rybka wins--CEGT did this) and own books (where Zappa won in Mexico on equal hardware).
Dadi has also done this in a situation that should favor Zappa more than in the CEGT octal testing, and Rybka won that, too. I think he called it something like "Turbo style", since I had made the case before that Zappa would win such a thing. Anyway, it's well-known that Rybka 2.3.2a has no scaling after 8 cores and that Zappa continues to scale well far above that.
>... after 8 cores and that Zappa continues to scale well far above that
I still recall the WCCC in Turin 2006, where Zappa played on 512 cores or something similar and didn't come close to first place.....
yup. finished 4th (2 progs tied for 2nd). however that was w/ an old version of zappa, maybe it had an inferior search? maybe inferior scaling too?
> maybe inferior scaling too?
He never got another chance to test on a 512-processor machine.
Not inferior scaling--it was getting something like 200 Mnps. I seem to recall something about a memory bottleneck screwing things up. I have gotten the impression from various posts that Zappa works best on either 32 or 64 cores.
CEGT has matches where they use octals? can u give me the URL to those pages? i've looked thru the CEGT website and cant seem to find it.
right, i'm aware of that match setup. and as u can see, zappa still cant win.
Yes, but this was with generic books, and this is regarding a different conversation, i.e. my reply to Paul (NATIONAL12) that if Zappa cannot beat Rybka 2.3.2a in testing conditions, then it certainly won't be able to beat Rybka 3 in those same conditions. Testing conditions are different from match conditions. That is why I was making the point that we are having two very different conversations in the same thread on the same type of topic.
"Testing conditions are different from match conditions. That is why I was making the point that we are having two very different conversations in the same thread on the same type of topic."
we know in a rybka 2.3.2 quad vs zappa mexico quad match at "marathon" time controls, rybka wins (60% expected score over 50 games) using generic books. we dont know what the score would be in a marathon match using their default books, but i'd be shocked if zappa can win simply because they switch to their default books.
we know rybka 2.3.2 can be beat by zappa mexico if the zappa book is home-cooked to be an anti-rybka book.
what we dont know (empirically speaking) is if zappa mexico can beat rybka 2.3.2 with both on octals, using either generic books or default books, at 120/40 time controls. i say it can not, altho i think it'll be close (maybe an expected score of 55% for rybka?).
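For readers who think in Elo rather than percentages, the expected scores mentioned in these posts can be converted with the inverse of the standard logistic Elo formula. A minimal sketch, assuming that formula applies; the function name `elo_diff_from_score` is hypothetical.

```python
from math import log10


def elo_diff_from_score(score):
    """Elo difference implied by an expected score strictly between 0 and 1.

    Inverts E = 1 / (1 + 10 ** (-d / 400)), giving d = 400 * log10(E / (1 - E)).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * log10(score / (1.0 - score))


for pct in (0.60, 0.55):
    print(f"{pct:.0%} expected score is about {elo_diff_from_score(pct):.0f} Elo")
```

Under this formula a 60% score corresponds to roughly 70 Elo and a 55% score to roughly 35 Elo, so the guessed octal result would amount to about half the gap seen in the marathon match.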
I think that you are right about this for a long match in your last point. My point is that Rybka would not stand a chance if Rybka was on a quad and Zappa was on an octal in such a situation. I believe, though it would be difficult to prove this without actually doing this particular type of match, that Zappa's advantage would be more than one would calculate based on the elo increase of doubling the hardware that one sees on the rating lists.
actually that was a typo on my part. i meant to say rybka *quad* vs. zappa octal at 140/20. what is the effective speedup that zappa has going from a quad to an octal? is it gonna be more than 80 elo?
"...that Zappa's advantage would be more than one would calculate based on the elo increase of doubling the hardware that one sees on the rating lists."
what is your hypothesis on why this might be true? IOW, what is it about zappa's strength as u double its cores that isnt reflected in its elo rating?
Going from quad to octal, I think that Zappa's scaling is in the 80-90% range from what I've heard. As for my hypothesis, this is more of a hunch, but I think it would turn out to be correct. Basically, I think that Zappa would start to see a lot of tactics that Rybka on a quad would miss.
"Basically, I think that Zappa would start to see a lot of tactics that Rybka on a quad would miss."
tactics or positional ideas? if it's tactics, that would be really cool to see in action (i.e. rybka going down from very deep tactical shots from zappa).
I say this after testing Rybka 2.3.2a vs. Zappa on 8 cores at 4.8 Ghz in a 200 game match. Zappa and Deep Fritz 10 gain much more strength than Rybka does on that platform ... in other words Rybka's big advantage is less when hardware is increased.
"I say this after testing Rybka 2.3.2a vs. Zappa on 8 cores at 4.8 Ghz in a 200 game match."
what were the time controls? and what was the result?
"...in other words Rybka's big advantage is less when hardware is increased."
this is not in dispute. what i am disputing u on is your claim that "...so much so that it seems there is a linear point where it basically equals Rybka 2.3.2a and might even surpass it." can u tell us at what hardware differential does zappa beat rybka? because the evidence says it definitely does not occur when u pit zappa octal vs. rybka quad (as the marathon match tells us). maybe zappa on a 16-core will achieve parity against rybka on a quad?
Thanks for that interesting information!
Vas told us a few weeks ago how Elo differences vary depending on whether you play against an engine of the same type, against other engines, or against humans.
If you have another engine E and a human H who both have the same strength as Rybka 2.3.2a (they get 50:50 results),
and if you reach a 100 Elo point difference when playing Rybka 3 against Rybka 2.3.2a,
what would you reach against E (this could be the difference that shows up later in the Elo ratings),
and what would you reach against H?
(4:1 and 5:1 seem very fascinating!
But a 2:1 against H would also be a fine result.)
My previous opinion on this was 80 for E and 60 for H. However I now think that this would be true only if Rybka 3 was improved primarily by search and not eval (Vas is not convinced of this theory); if the improvement was only eval E should be 100. Since it's roughly half and half between eval and search gains, I'll say 90 for E and 70 for H.
What impresses me is that you said it was "very close". In other words you guys are STILL making tweaks! Amazing what some applied science can do!
Tweaks, or as others call them: bugfixes ;-)
It's a fine line. Yesterday I discovered that some otherwise helpful changes had the bad side effect of reducing Rybka's incentive to castle a bit more than I feel comfortable with, so I'm trying to remedy that now.
Yes. I'll make the bold forecast that the final Rybka 3 will be at least 1 Elo point stronger than the one I've been testing!