Rybka Chess Community Forum
Rybka Support & Discussion / Rybka Discussion / Deep Rybka 4 SSE42 X64 vs Deep Rybka 3 X64 /2
- - By Bouddha (****) [ch] Date 2010-05-27 09:22
Ok, I started the test again with the following settings

Opening book => Fritz 12.ctg
i7 980x at 4 GHz
Ponder => off
Large pages => off
Time control => 3min +2sec per side

After 74 games played the score is

Deep Rybka 4 SSE42 X64 => 41.5/74

Which means that Deep Rybka 4 SSE42 X64 is currently only about 42 ELO stronger than Deep Rybka 3 x64
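The 42 Elo figure can be reproduced from the score alone. A minimal sketch, assuming the standard logistic Elo model (a 400-point gap corresponds to 10:1 odds); the function name `elo_diff` is only illustrative and not part of any chess GUI:

```python
import math

def elo_diff(points: float, games: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model."""
    p = points / games  # score fraction, e.g. 41.5 / 74
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    # Invert the expected-score formula p = 1 / (1 + 10 ** (-diff / 400))
    return 400.0 * math.log10(p / (1.0 - p))

print(round(elo_diff(41.5, 74)))  # prints 42
```

This is only a point estimate; it says nothing about how much the number could move with more games.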

I know it may have better results against other engines compared to Rybka 3 but this is still to be proven.

With 42 ELO over Rybka 3, this is below my minimum expectation and thus, I am not satisfied.

Also I wouldn't be surprised if the next Stockfish (free engine), which may be released at the beginning of June, would already be stronger than Rybka 4.

What is maybe more frustrating to me is to imagine that what was released for purchase, and that I bought, was not the strongest of what Vas had in his pocket!

regards
Parent - By Werewolf (*****) [gb] Date 2010-05-27 09:27
My results echo yours and by the sounds of it we're not alone. I'm now testing at longer TC to see if that makes a difference.
I believe the next Stockfish will surpass Rybka 3, not sure about R4...but we'll see.
Parent - - By Eckeneckepen (*) [de] Date 2010-05-27 10:03
The same for me: I am not satisfied.

Additionally, you must take into account that time control was optimized. If you subtract the TC improvements from R4, there are no Elo improvements worth mentioning left. Sigh
Parent - - By Fulcrum2000 (****) [nl] Date 2010-05-27 10:13
The default TC settings are not (fully) optimized for all time controls. There is still a lot to be gained there.
Parent - By Bouddha (****) [ch] Date 2010-05-27 10:16
I do not agree with that.

Other engines are not expected to have their TC settings changed depending on the opponent and the time control selected.

So default has to be used.
Parent - - By Silvian (***) [ro] Date 2010-05-27 10:24

> I am not satisfied.
>


Mammy,mammy, only +42 ELO points ????????????

Parent - By Bouddha (****) [ch] Date 2010-05-27 10:28
Yes exactly !
Parent - By Eckeneckepen (*) [de] Date 2010-05-27 11:09
Silvian, please don't dinky weep. *wipe your tears away* ... it's only 120 dollars.

The point is that R4 does not seem to be the result of two years of work. I have the feeling that playing strength was intentionally reduced because of future plans with rental Rybka.
Parent - By stvs (***) [gr] Date 2010-05-27 12:18
If the next Stockfish is stronger than Rybka 4, maybe expect more and more claims that Stockfish is a clone too :)
Parent - - By turbojuice1122 (Gold) [us] Date 2010-05-27 13:52
74 games has an elo uncertainty of around +/- 65, I think.
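A rough way to check a margin like this: treat each game as an independent result of 1, 0.5 or 0, compute the standard error of the score, and convert both ends of the interval to Elo. The sketch below assumes a normal approximation and the logistic Elo model. The draw ratio of the match is not given in this thread, so `draw_ratio` is a hypothetical input, and `elo_interval` is just an illustrative name:

```python
import math

def elo_from_score(p: float) -> float:
    """Elo difference implied by score fraction p (logistic model)."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(points, games, draw_ratio=0.0, z=1.96):
    """Approximate Elo interval for a match result (z=1.96 is about 95%).

    draw_ratio is the fraction of drawn games (assumed, not from the thread);
    0.0 gives the widest interval possible for the given score.
    """
    p = points / games
    if not 0.0 <= draw_ratio <= 2.0 * min(p, 1.0 - p):
        raise ValueError("draw_ratio is inconsistent with the score")
    # Per-game variance of a result in {1, 0.5, 0}: E[x^2] - p^2,
    # where E[x^2] = wins + draws / 4 = p - draw_ratio / 4
    variance = p - draw_ratio / 4.0 - p * p
    margin = z * math.sqrt(variance / games)
    low, high = p - margin, p + margin
    if low <= 0.0 or high >= 1.0:
        raise ValueError("too few games for this approximation")
    return elo_from_score(low), elo_from_score(high)

low, high = elo_interval(41.5, 74)  # the 74-game score, no draws assumed
print(f"{low:+.0f} to {high:+.0f} Elo")
```

With no draws assumed, the interval for 41.5/74 spans well over 100 Elo in total. A realistic share of draws narrows it, and quadrupling the number of games roughly halves it.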
Parent - - By Bouddha (****) [ch] Date 2010-05-27 14:49
That sounds huge to me and difficult to trust.

Can you back up this statement with data?

Also, it is still running, and when I am back home I will post the latest results. (I assume it's over 100 games by now.)
Parent - By turbojuice1122 (Gold) [us] Date 2010-05-27 15:00
Check the CCRL or CEGT rating lists and see the uncertainty for numbers of games between about 50 and 90, and you'll get a pretty good idea.
Parent - - By Werewolf (*****) [gb] Date 2010-05-27 14:57
Kind of weird that we decide human world championship matches in 20 games or less...
Parent - By turbojuice1122 (Gold) [us] Date 2010-05-27 15:01
Yes, though many of the statistical uncertainties present in computer play are negated by extreme preparation and concentration during the games.  Computers don't have this "sense", so I think that the true uncertainty of human matches is much less than one would obtain from the statistical analysis.
Parent - - By Leto (***) [us] Date 2010-05-28 04:58
In my view human world championships are for mere entertainment only; they're pretty much worthless for determining who the stronger player is.  Even if these championships ran for 1000 games you cannot know who the stronger player is from that match alone.  Just because B beats A doesn't necessarily mean B is stronger than A.  B could perform worse against C, D, E, F, and G, making A the superior player.
Parent - By Bouddha (****) [ch] Date 2010-05-28 05:05
Maybe we should ask CEGT to test humans also ? :-D
Parent - By Bouddha (****) [ch] Date 2010-05-27 18:01
OK,

final result after 126 games

Deep Rybka 4 SSE42 X64 => 72/126 => +50 ELO over Rybka 3
Parent - - By Uly (Gold) [mx] Date 2010-05-27 19:25

> Time control => 3min +2sec per side


The TC settings were definitely not optimized for this time control.

Why wait for the next release of a different engine when you can try different TC settings yourself?

Imagine that a R4+ is released, but the only difference is that the TC settings are optimized for 3+2 time controls, would you be very happy if you get a +80 elo result? Nonsense, you could have found those TC settings yourself.
Parent - - By Bouddha (****) [ch] Date 2010-05-27 19:36
What is nonsense is to fine-tune the settings for each time control and opponent.

Is Deep Rybka 4 supposed to play the clone game ?

Same as all other engines, Deep Rybka 4 was released with some settings, why not test like that, what we bought ?
Parent - - By Dadi Jonsson (Silver) [is] Date 2010-05-27 19:45

> Rybka 4 was released with some settings, why not test like that, what we bought ?


Didn't you buy the option to change the settings? Do you always leave the hash size as it is set by default :wink:

I think it's important to know as much as possible how the default settings perform, but results for different settings are also interesting (as long as they are based on enough games). However, different settings are hard to evaluate unless you know how the standard version performs under the same conditions.
Parent - - By Bouddha (****) [ch] Date 2010-05-27 20:17
Yes, but if you have to change the settings for each opponent, this is not very convenient.

Beta testers have worked on default settings which should do fine in time controls like 3min + 2 sec.

This is not such a special border limit time control.

And by the way, there is currently no guidance or publication on what settings should be used for which time control and opponent.

What is a customer who wants the best settings supposed to do? Test over 1000 games?

Yes, a fun game and pastime, but maybe some customers don't want to spend that time setting up Deep Rybka 4 correctly.

Maybe if Stockfish also has that function in its next release we can play a new game:
I have settings that destroy the settings you set to destroy my settings, I set to........

Don't get me wrong, I am not saying that it's not good that we can play around with that feature.

But testing with default is fine and there is no reason to criticize that.
Parent - By Dadi Jonsson (Silver) [is] Date 2010-05-27 20:23

> testing with default is fine and there is no reason to criticize that.


Agreed.
Parent - By Vempele (Silver) [fi] Date 2010-05-27 20:09

> Imagine that a R4+ is released, but the only difference is that the TC settings are optimized for 3+2 time controls, would you be very happy if you get a +80 elo result?


I would be happy. I mean, that's a bigger improvement than thinking twice as long but mysteriously not losing on time.
Parent - - By hal9000 (**) [no] Date 2010-05-28 16:24 Edited 2010-05-28 16:31

> With 42 ELO over Rybka 3, this is below my minimum expectation and thus, I am not satisfied.


So why did you purchase Rybka 4 so early then? You could have waited until more test results were available. So please do us a favor and stop whining :)

Besides, you draw a general conclusion about Rybka 4's rating based on 3+2 games with ponder off (I think it's preferable to enable ponder and use two different computers). You also need a lot more games before you can start to draw conclusions. Furthermore, Rybka 4's rating isn't only determined by how she performs against Rybka 3, but against other engines as well. Your blitz results do not necessarily reflect Rybka 4's performance at longer time controls either.
Parent - - By Bouddha (****) [ch] Date 2010-05-28 17:56

> So why did you purchase Rybka 4 so early then? You could have waited until more test results were available. So please do us a favor and stop whining :-)


You don't have to read my posts if they are so annoying to you.

I bought it because I trusted in Rybka blindly since version 1.
I also read the posts where it was said that it should be at a minimum about 60 ELO better.

For the rest of your post on analysis, I am OK with it.
Parent - - By hal9000 (**) [no] Date 2010-05-28 21:58

> You don't have to read my posts if they are so annoying to you.


It's nothing personal, but generally speaking I just think some people are complaining too much. People have the right to complain and offer constructive criticism, of course, but I think some of these complaints are unreasonable. Sometimes I'm left with the feeling that certain people here believe they can make demands from Vas, as if he was owing them something...

> I bought it because I trusted in Rybka blindly since version 1.


I can understand why you feel disappointed if you expected a much bigger improvement, but perhaps your expectations were too big? The kind of improvement you saw going from R2 to R3 isn't commonplace, and shouldn't be expected. Besides, purchasing a product blindly isn't always a good idea...

> I also read the posts where it was said that it should be at a minimum about 60 ELO better.


I think it's too early to draw conclusions. So far we have (AFAIK) only blitz games to base our estimates on. And these results vary from around 60 (IPON) to over 100 points (CCRL). More results at longer time controls will become available over the next few days, and should give us an indication of how strong R4 is at tournament levels.

If it turns out that R4 is actually 60 points stronger than R4 at tournament level, it would still represent a nice improvement over R3 in my opinion.
Parent - - By Bouddha (****) [ch] Date 2010-05-28 22:58
I agree with what you say.

But I still have a certain level of frustration for the following reasons:
- My expectations were set higher than what currently seems to be delivered
- I have the feeling that Vas had better in his pocket but he kept it for something else.... (cloud...?)

I would most likely feel much less frustrated if I had the feeling that what we received was the best he could offer. But that's not the case.
Parent - - By hal9000 (**) [no] Date 2010-05-28 23:57
Bouddha wrote:

> I have the feeling that Vas had better in his pocket but he kept it for something else.... (cloud...?)


Do you have anything to support this 'notion' or is it just pure speculation or wishful thinking? I must admit that what you're saying doesn't make a lot of sense to me.

There was a typo in my previous post, BTW: "If it turns out that R4 is actually 60 points stronger than R4 ..." should read "If it turns out that R4 is actually 60 points stronger than R3".
Parent - By Milton (***) [us] Date 2010-05-29 02:25

> I have the feeling that Vas had better in his pocket but he kept it for something else.... (cloud...?)


>Do you have anything to support this 'notion' or is it just pure speculation or wishful thinking? I must admit that what you're saying doesn't make a lot of sense to me.


Of course, this is not proof; but it does show that Bouddha is not the only one in this forum who holds this belief. 

http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?pid=247960;hl=cloud

"I think Vas is doing his best to improve R4, but withholding one or two really original ideas for Cloud Rybka, So R4 is still progressing normally, just it's handicapped by some unknown amount. "  Larry Kaufman
- - By Ron Bateman (**) [us] Date 2010-05-28 04:06
How should one use the SSE4.2 Rybka 4 engine?  Just what is it anyway?  Does anyone know anything about the Rybka 4 Observer engine?  I notice that you can only use 1 core.  Thanks.
Parent - By Vempele (Silver) [fi] Date 2010-05-28 07:20
It's about 5% faster on processors that support it (Nehalem and Phenom).

> Does anyone know anything about the Rybka 4 Observer engine?


A renamed Rybka 4. Aquarium uses it to check for user blunders in Play mode, I think.