Rybka Chess Community Forum
Rybka Support & Discussion / Rybka Discussion / Rybka 2.3.2a readme
By Vasik Rajlich (Silver) Date 2007-06-18 06:54
Rybka 2.3.2a is cleared for release. Below is the readme:

Vas

-----------------------

Rybka 2.3.2a
June 18, 2007

This is a minor update for Rybka 2.3.2. A few small bugs are fixed, the most serious being a problem with zugzwang detection in Rybka 2.3.2.

The style is identical to the style of 2.3.2, and the playing strength will be very close to identical (within 2 Elo or so). It's perfectly fine for test groups to merge results for these two versions.

Vasik Rajlich
By Vasik Rajlich (Silver) Date 2007-06-18 08:15
Bump.
By MightyMouse (**) Date 2007-06-18 13:11 Edited 2007-06-18 13:16
Got it, and noticed some differences immediately... It gives some extra information in the engine window (Shredder Classic GUI) that I never saw before. It also shows a CPU Usage option in the engine options, which wasn't present in 2.3.2 if I recall correctly.

For example, when I define and load a modified engine, it tells me in the GUI engine window which options were modified. That never happened with any prior version of Rybka.

Also, when I reduce the number of PVs being analyzed, it makes a statement to that effect in the GUI engine window. It never did that before in any prior version of Rybka.

So I'm wondering: is this an improvement, simply new information (of no effect), or a bug such as a compiler error? I'm running the 32-bit single-processor version, of course.
By Fulcrum2000 (****) Date 2007-06-18 13:19
Might it be caused by the "info string" fix?
By MightyMouse (**) Date 2007-06-18 13:30
It might be, I guess, but I'm not privy to such insider info; I'm not a beta tester. I'm not complaining about the new information, which may simply be more feedback about what's happening. If so, that's OK. I only worry because these are completely new outputs, a new kind of behavior in the GUI, without any mention in the 'readme' or elsewhere.
By revengeska (**) Date 2007-06-18 22:15
It's most likely not a bug. It also displays new information when starting in Arena. The programmer (Vas) would specifically have to put in extra code to make Rybka display those values, so I'm 99% sure he included it on purpose. The only way I can think of for it to be an accident is if Vas commented out such code earlier and accidentally deleted the comment characters, making the code visible to the compiler. It's far more likely he did it on purpose. Either way, it won't affect the playing strength of the program.

I suppose it could also be an accident if the GUI displayed them because something changed in the program, but since it shows up in multiple GUIs, I doubt this is the case as well.
By Fulcrum2000 (****) Date 2007-06-18 22:26
I'm quite sure it's related to the "info string" bug. Rybka didn't use the correct UCI syntax when sending info: it sent "info blablabla", while the fixed version (2.3.2a) correctly sends "info string blabla". I think the GUIs were confused by the wrong syntax and didn't show anything. Using the correct syntax made the additional info appear in the GUIs.
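For readers unfamiliar with UCI: the protocol defines `info string <str>` as free text for the GUI to display, with everything after the `string` keyword taken as the message. A line like "info blablabla" has no keyword a GUI can match, so a spec-following GUI may simply drop it. The Python sketch below is a hypothetical helper (`extract_info_string` is not code from Shredder, Arena or Rybka), and the sample message is made up.

```python
from typing import Optional


def extract_info_string(line: str) -> Optional[str]:
    """Return the free text of a UCI 'info string' line, or None.

    Simplification: whitespace inside the message is collapsed to single spaces.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "info" or "string" not in tokens[1:]:
        return None  # nothing a GUI would display as free text
    # Per the UCI spec, everything after "string" is the message.
    start = tokens.index("string", 1) + 1
    return " ".join(tokens[start:])


print(extract_info_string("info string hypothetical option message"))
print(extract_info_string("info hypothetical option message"))  # ignored
```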
By Vasik Rajlich (Silver) Date 2007-06-19 12:48
Yes, this is the "info string" bugfix. The information was always intended to be provided to the GUI, now finally it is done as specified in UCI.

Vas
By Uri Blass (*****) Date 2007-06-18 13:16
I wonder how you know that the difference in playing strength is not more than 2 Elo.
I suspect that zugzwang detection can be worth more than 2 Elo points, and to detect a 5 Elo improvement you need to play many thousands of games.

I don't think it is OK for testers to merge the results for 2.3.2 and 2.3.2a.

Uri
By Jim Walker (***) Date 2007-06-18 14:16
Uri, please do the math! How many games will it take to tell whether it's 2 Elo or 5 Elo? It's simply an educated guess, and it's likely that a new version will be out before anybody proves what the difference is. Bottom line: the difference will be negligible for most people.
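A rough way to do that math, as a sketch under stated assumptions: independent games between two nearly equal engines, wins and losses equally likely, and a normal approximation. With draw rate d the per-game score has standard deviation sqrt(1 - d) / 2, and near a 50% score one unit of score is worth about 1600 / ln 10 ≈ 695 Elo. The 60% draw rate default and the function name `games_needed` are assumptions for illustration.

```python
import math


def games_needed(elo_margin: float, draw_rate: float = 0.6, z: float = 1.96) -> int:
    """Rough number of games for a +/- elo_margin error bar at z standard errors.

    Assumptions: independent games, two nearly equal engines (score near 50%),
    wins and losses equally likely, normal approximation.
    """
    if not 0.0 <= draw_rate < 1.0:
        raise ValueError("draw_rate must be in [0, 1)")
    # Per-game score is 1, 0.5 or 0; with win = loss = (1 - d) / 2 its
    # standard deviation around 0.5 is sqrt(1 - d) / 2.
    sigma = math.sqrt(1.0 - draw_rate) / 2.0
    # Slope of the logistic Elo curve at 50%: about 694.9 Elo per unit of score.
    elo_per_score = 1600.0 / math.log(10)
    return math.ceil((z * sigma * elo_per_score / elo_margin) ** 2)


for margin in (5, 3, 2):
    print(f"+/-{margin} Elo: about {games_needed(margin):,} games")
```

Under these assumptions a ±5 Elo margin takes roughly 7,400 games, a ±3 Elo margin (enough to tell 2 Elo from 5 Elo) roughly 20,600, and ±2 Elo roughly 46,000. Fewer draws push all of these numbers up.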
Jim
By Uri Blass (*****) Date 2007-06-18 20:44
I am sure we need many thousands of games to be convinced of a 5 Elo difference, but it is not OK to merge versions when we do not know what the difference is.

The difference may also be 20 Elo and not 5 Elo.
The difference at 40/40 CCRL may even be 50 Elo.
We simply do not know, and although I do not believe the difference is 50 Elo, the only way to be practically sure that it is smaller than 50 Elo is by testing.

I do not plan to do the testing, but if others do, I am interested in the difference they find between 2.3.2 and 2.3.2a even if it is not significant, so I do not like merging versions.

Uri
By turbojuice1122 (Gold) Date 2007-06-18 21:10
I've been on it since yesterday: I started a gauntlet at a 3'+3" time control, 50 games each against Hiarcs 11.1, Fritz 10, Loop 10.32f, Shredder 10, and Zap Zanzibar, first with Rybka 2.3.2a, then with Rybka 2.3.2. All programs use their own books (Loop uses Sheebar.ctg, Zap uses TourBookII.ctg), so the conditions in the two sets of gauntlets will be identical. I should know a lot more in a few days, but after only 95 games, the performance rating for Rybka 2.3.2a is looking very, very close to Rybka 2.3.2's in terms of percentage score (about 70%) against rated competition (about 2816 CEGT average). The results for both Rybka 2.3.2a and Rybka 2.3.2 should be finished by the end of the week. I'm using a "longer" time control than before to increase the accuracy of the results and to let the programs handle the endgame better.
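The step from a percentage score to a performance rating can be sketched with the standard logistic Elo formula. This minimal Python version assumes a single average opponent rating, which is a simplification; a more careful calculation would treat each opponent separately. The function name is hypothetical.

```python
import math


def performance_rating(score: float, avg_opponent: float) -> float:
    """Rating whose logistic expected score against avg_opponent equals `score`.

    Simplification: uses one average opponent rating instead of a
    per-opponent calculation.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))


print(round(performance_rating(0.70, 2816)))
```

Under this approximation, a 70% score against a 2816 average works out to roughly 2963.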
By Jim Walker (***) Date 2007-06-19 11:17
Uri, have you ever seen even one program tested to the point where the uncertainty is, say, less than +/- 0.2 Elo? That is the kind of uncertainty you would need to say for sure that a program has changed by 2 Elo.
In the end, what difference will that make to anybody? Sometimes your "Spock"-type logic has no place in the real world. Common sense should tell you to forget about it.
Jim
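Putting rough numbers on that claim, as a hedged estimate: assume a normal approximation, independent games between nearly equal engines, a 95% interval (z = 1.96), and a per-game standard deviation σ ≈ 0.32 (about 60% draws, since σ = sqrt(1 - d)/2 for draw rate d). Near a 50% score one unit of score is worth 1600/ln 10 ≈ 694.9 Elo, so the number of games N for a ±Δ = 0.2 Elo interval is roughly:

$$
N \approx \left(\frac{z\,\sigma\,(1600/\ln 10)}{\Delta}\right)^{2}
= \left(\frac{1.96 \times 0.316 \times 694.9}{0.2}\right)^{2}
\approx 4.6 \times 10^{6}\ \text{games}
$$

With no draws at all (σ = 0.5) it grows to roughly 1.2 × 10^7 games.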
By grolich (***) Date 2007-06-18 15:03
I suspect the 2 was only a way of saying it's almost unmeasurable and not enough to show any significant differences.

I'm sure Larry and Vas keep running their tests all the time, and they probably have not noticed any difference in ELO between the versions.
So it may be 1-2 ELO, it may be 5... Does not make much of a difference.

Statistically speaking, even a difference as large as 5 ELO should only mean a difference of about 0.625% in overall score.
You'd be within a statistical margin of error in almost any series of tests you're going to make. (Unless you're REALLY REALLY patient and you have tons of computers to run your tests on).
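To check the percentage figure: under the standard logistic Elo model, a rating gap D gives the stronger side an expected score of 1/(1 + 10^(-D/400)). This minimal Python sketch assumes that model. It puts 5 Elo at about 0.72% above 50%, a little more than the 0.625% quoted above, which does not change the point that such a gap hides inside ordinary test noise.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of the side that is elo_diff points stronger (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


for diff in (2, 5, 20, 50):
    print(f"{diff:>2} Elo -> {100 * (expected_score(diff) - 0.5):.2f}% above 50%")
```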

So all in all, it doesn't make that much difference.

But I tend to believe Vas's estimate of about 2 ELO rather than your opinion, simply because I've seen how much experience can affect these approximations, and he's the one who has been working on Rybka, noticing the effect each change had on performance while he programmed the different Rybka versions, changes none of the users had a chance to experience... The experience of developing these things must have made his a very 'educated' guess.

Of course, it's still only a guess, and so is your opinion.
It's too small a difference. And (un)fortunately for us, only the aggregate effect of many small changes can be tested if we want a high degree of certainty about the difference in performance. (Of course, one huge change is also enough, but those are never a problem...)
By jaeger1975 (**) Date 2007-06-18 18:42
Hi friends

I want to share my experience from yesterday on my tiny PC: a Pentium 4, 3.2 GHz, HT and 1 GB RAM. I ran this test and I think it has a very interesting result.

Match: Rybka 2.3.2 vs 2.3.2a
Opening book: none (Nunn2 match, 50 opening positions)
Blitz: 2 min per side
Number of games: 50
Result: +10 =30 -10

So if the difference is about 2 ELO points, it means that technically they have the same strength!

Regards

DAVID AMIEL
By grolich (***) Date 2007-06-18 19:39
I'm afraid that doesn't contradict what Uri said either...

Even a 500-game match would not be reliable for determining such a small Elo difference.

Another thing is that EVEN a 2-point ELO difference is guaranteed to produce a difference in results, given a sufficiently large number of games,
but it will take a huge number of games for that difference to show up in a close to constant way...

Of course, you were also contradicting yourself... If there is indeed a 2-point ELO difference, that means they are NOT equal in strength (by the definition of ELO).

Your test shows nothing about equal strength, only about approximate equality of strength, not absolute equality. Actually, there's a very high chance that even a 5 or 8 point difference in Elo would give equal results in a test as small as 50 games, and that is even if we ignore the statistical margin of error.
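For the 50-game match above (+10 =30 -10), a quick error bar makes this concrete. This Python sketch assumes independent games and a normal approximation, and converts the score error to Elo using the local slope of the logistic Elo curve; the function name `match_error_bar` is hypothetical.

```python
import math


def match_error_bar(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (score fraction, +/- Elo error bar at z standard errors) for one match.

    Assumptions: independent games and a normal approximation; the score error
    is converted to Elo with the local slope of the logistic Elo curve.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0% and 100%")
    # Sample variance of the per-game result (1, 0.5 or 0).
    var = (wins * (1.0 - score) ** 2 + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    se_score = math.sqrt(var / n)
    # d(Elo)/d(score) for E = 1 / (1 + 10 ** (-D / 400)).
    elo_per_score = 400.0 / (math.log(10) * score * (1.0 - score))
    return score, z * se_score * elo_per_score


score, bar = match_error_bar(10, 30, 10)
print(f"score {score:.0%}, 95% error bar about +/-{bar:.0f} Elo")
```

For +10 =30 -10 this gives a 50% score with a 95% error bar of roughly ±61 Elo, so a 2, 5 or even 20 Elo difference would be invisible in a match of this size.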

Now, if you were to play tens of thousands of games (making sure the programs do not repeat the same critical games over and over again with so many games and the same programs), you'd get a result that can verify such a difference somewhat better.

I haven't done the math (yet), but I think a couple of thousand games should give a semi-accurate result, meaning there will be a more than reasonable chance that an Elo difference of about 2 would start to show.
If it's about 5, maybe even a thousand games are enough for a reasonable chance.
(Even ignoring the possibility of statistical error, and assuming the difference is 5 ELO, which should make it a lot more pronounced, it still means only about +0.6 score to the better engine in your 50-game match, so even statistically they are extremely likely to stay tied over a 50-game match.
If it's only a 2-point difference, then... well, you get the picture. Your test would not have picked up even a 5-point difference.)

Still... an ELO difference is an ELO difference, even of 2 points. It's a definite difference in strength.

Of course I don't think such tests are even important.
It's an almost immeasurable difference.
I simply believe Vas's estimate better because of his experience as Rybka's developer.

Just wanted to clarify that there is something fundamentally wrong with your statement.
Even a two point difference is NOT the same strength. It's just difficult to test for, but it's a slightly stronger engine.
By Uri Blass (*****) Date 2007-06-18 20:37
My opinion is that zugzwang detection is worth more than 2 Elo.
Maybe the detection comes at the price of being slower in positions where there is no zugzwang, so the total difference is not more than 2 Elo, but we need tests to know, and I think it is not responsible to decide that 2.3.2 and 2.3.2a have the same rating without testing.

It is possible that 2.3.2 is stronger and it is possible that 2.3.2a is stronger.
A result in blitz also proves nothing about long time controls.

It is possible that there is a 20 Elo difference at long time controls and no difference in blitz, and the only way to know is by testing.

The fact that Vasik is the programmer does not mean that he knows, and he did not explain how he reached the conclusion that the difference is no more than 2 Elo.

Uri
By Banned for Life (Gold) Date 2007-06-19 02:13
It looks to me like zugzwang "detection" more likely means not using null moves in some endgame scenarios (based on the large slowdowns I see in certain positions in 2.3.2a but not in 2.3.2). Vas can tell me if I'm full of shit. :-) Anyway, this could definitely cut either way, so I agree with you that the testers should test both versions individually. If the zugzwang solution cost playing strength on average, I would be very tempted to use the unfixed version...

Regards,
Alan
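One common way engines restrict null-move pruning in endgames, and a plausible reading of Alan's guess, is to skip the null move when the side to move has nothing but king and pawns, where zugzwang is most frequent. This is a generic heuristic seen in open-source engines, not Rybka's actual rule (which this thread does not describe); the FEN-based helper names below are hypothetical.

```python
def side_to_move_has_pieces(fen: str) -> bool:
    """True if the side to move owns at least one knight, bishop, rook or queen."""
    placement, side = fen.split()[:2]
    own_pieces = "NBRQ" if side == "w" else "nbrq"
    return any(ch in own_pieces for ch in placement)


def null_move_allowed(fen: str, in_check: bool) -> bool:
    """Hypothetical gate for trying a null move in a search node.

    A null move is never tried when in check, and with only king and pawns
    left zugzwang is common enough that many engines simply skip it.
    """
    return not in_check and side_to_move_has_pieces(fen)


print(null_move_allowed("8/8/8/4k3/8/4K3/4P3/8 w - - 0 1", in_check=False))
```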
By Uri Blass (*****) Date 2007-06-19 03:16
I think things may also depend on the time control.

In theory, if the time control is slow enough, the version with zugzwang detection will play perfect chess while the version without zugzwang detection will not.

This is why I believe zugzwang detection should pay off at slower time controls.

Uri
By grolich (***) Date 2007-06-19 07:29
That IS possible,
but I'll take a wild guess here:

Removing the null move optimization once you reach certain endgame positions will just result in a performance degradation (null move causes an ENORMOUS increase in calculation power. It creates more search cutoffs than you can imagine if you haven't tried to implement it in a program yourself).

Also, you're left with the secondary problem of testing for the right "positions" that will "kill" the null move optimization when reached.
Not a simple problem either.

Then there's the fact that even just doing the null move optimization and ALWAYS running a verification after a null move cutoff in those positions or endgames results in a stronger engine than one that simply won't use it in those positions.
(I've done some tests with my own code in the past, even though that engine's strength was nowhere near Rybka's.)
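The verification idea can be illustrated on a toy game tree. In the hypothetical tree below, node `Z` is a zugzwang: both real moves lose, but passing looks great, so plain null-move pruning wrongly cuts off. The verified variant only accepts the cutoff if a shallower search with real moves also fails high. This is a simplified illustration in the spirit of verified null-move pruning, not Rybka's code; the tree, the reduction `R = 2` and all names are assumptions.

```python
import math

# Hypothetical toy tree. "eval" is a static score from the point of view of
# the side to move at that node; "null" is the node reached by passing.
TREE = {
    "Z": {"moves": ["a", "b"], "null": "zn", "eval": 0},  # zugzwang node
    "a": {"moves": [], "null": None, "eval": 3},          # opponent +3 after move a
    "b": {"moves": [], "null": None, "eval": 4},          # opponent +4 after move b
    "zn": {"moves": [], "null": None, "eval": -5},        # opponent -5 after a pass
}
R = 2  # assumed null-move depth reduction


def search(node, depth, alpha, beta, use_null=True, verify=False, null_ok=True):
    """Negamax over TREE with optional (verified) null-move pruning."""
    info = TREE[node]
    if depth <= 0 or not info["moves"]:
        return info["eval"]

    if use_null and null_ok and info["null"] is not None:
        # Give the opponent a free move; search reduced depth, null window.
        null_score = -search(info["null"], depth - 1 - R, -beta, -beta + 1,
                             use_null, verify, null_ok=False)
        if null_score >= beta:
            if not verify:
                return beta  # plain null-move pruning: trust the pass
            # Verification: cut off only if a shallower search with real
            # moves (and no null move at this node) also fails high.
            if search(node, depth - 1, beta - 1, beta, use_null, verify,
                      null_ok=False) >= beta:
                return beta

    best = -math.inf
    for child in info["moves"]:
        score = -search(child, depth - 1, -beta, -alpha, use_null, verify)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    return best


print(search("Z", 2, -math.inf, 0))                          # 0: false fail-high
print(search("Z", 2, -math.inf, 0, verify=True))             # -3: true value
print(search("Z", 2, -math.inf, math.inf, use_null=False))   # -3: no null move
```

The price of the verified version is the extra shallow search at every null-move cutoff, which is the kind of slowdown this thread talks about.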

So I'd have to guess your guess is probably wrong.

As to the slowdowns you have noticed, that means that in those positions the verification is probably used, which would indeed slow things down as compared with a simple null move usage.

Either Vas found some way to detect with a high degree of accuracy the transition into positions in which zugzwang is probable,
and turned the verification on in those positions,

or he gave up and decided to always use the verification. Based both on his previous comment on this approach ("ugly") and on search speed in middlegame positions, I'd have to go with the first option.

A note to Uri:
The slowdown that results from adding the verification is not as small as you think, and it can definitely be the reason for Vas's estimate of about 2 ELO.
By Vasik Rajlich (Silver) Date 2007-06-19 12:52
It's true that the bugfix does have some (tiny) cost.

FWIW - I am not pulling the 2 Elo figure completely from thin air. It comes from my private testing procedure, which I admit is not 100% watertight and could be wrong. However, this is extremely unlikely.

Note also that earlier Rybka versions handled zugzwang properly in most cases. The main problem was that as a human analyst, there was always some doubt in your mind when analyzing an endgame with zugzwang themes. This is an analysis feature, not a strength feature.

Vas
By Roland Rösler (****) Date 2007-06-18 22:57
For me, the Elo gain from zugzwang detection is an academic question. In practice, a program with this feature is a better program! Now you can analyse endgames much better and be more sure that your analysis is correct.
You also have to consider that most programs can't detect zugzwang. Therefore nobody knows exactly what this feature brings in Elo, because nobody takes a deeper look at the endgames played in CEGT or CCRL.
Last statement: Rybka is now grown up and no longer this little princess!
By Michael Waesch Date 2007-06-18 23:10
Most of what's written here is purely theoretical. I am still baffled by how excited people get about rating lists and about what other people tell them about how strong an engine is and what it can do. I don't care at all. I am only interested in how strong an engine is on my very own specific configuration, and I won't get that information from what others want to tell me.

While 2.2n2 rocks on one system, it is 2.1d3 on another and 2.3.2 on yet another. Plain and simple. One can only find out more by running an awful lot of games on the system being tested. So if one does not want to spend such a huge amount of time to find out for sure, and to start again from scratch whenever a new computer is bought, then one really has to rely on what others have found out for their systems, cross one's fingers and hope it's also true for one's own configuration.

If only hard facts counted, I'd bet this board would become a wasteland overnight.

Mike
By Roland Rösler (****) Date 2007-06-19 00:37
I can't agree with all of that. Personal opinion is important; statistics are only an aid. I look at the rating lists the way a personnel manager looks at the candidates' references. Only the five best candidates are invited, and then we take a deeper look at them; only one can get the job. And then I see Rybka's play in a rook endgame with f-, g- and h-pawns on both sides and an a-pawn on one side. It's as if the exemplary candidate stinks. No company can tolerate this. Therefore I make a wise decision: Rybka gets the job in the department of the interior, but a second candidate gets the better-paid job of looking over Rybka's work and answering the audience's questions.
Last statement: I believe that in the future I will only need the hard worker Rybka. These days, he is interested in deodorants!
By plicocf (***) Date 2007-06-18 19:01
Great news for me. I like to do test suites.

Paulo Soares
By Pia (****) Date 2007-06-19 11:39
The most visible bug is that there is still no difference between the Ultra Optimistic and Pessimistic modes (both behave like Ultra Optimistic).
Can anyone else confirm this?
By Vasik Rajlich (Silver) Date 2007-06-19 12:56
You're right. I don't have access to my code, so I don't know which one is not working properly (probably Ultra Optimistic).

Vas
By Pia (****) Date 2007-06-19 15:44
Old tests show that Ultra Optimistic is close to Very Optimistic (which is close to Slightly Optimistic) and far from Neutral.