Rybka Chess Community Forum
Rybka Support & Discussion / Rybka Discussion / Will Rybka 3 scale good on a 12 CPU & 16 CPU ?
- - By Bouddha (****) [ch] Date 2008-04-22 09:32
Hi Vas,

It's “common” today to see 4-core or 8-core machines.

You explained that, for Rybka 3, you are working on scalability improvements for multi-core machines.

I may be wrong, but usually when you double the number of cores, a program that scales perfectly should improve its speed by a factor of 1.7.

Currently, with Rybka 2.3.2a, going from 4 CPUs to 8 CPUs brings around a 1.5x speedup, not 1.7x.

Will Rybka 3 bring a 1.7x speedup?

And now, even more important: I am considering buying a machine with the Intel 2x Penryn (6 cores => 12 cores) when it becomes available (2nd half of 2008). Will Rybka 3 scale well on a 12-core or, let's say, a 16-core machine?

Can you maybe give numbers for the speed improvement going from 4 cores to 16 cores?

A perfect program should gain around 2.9x (1.7 x 1.7) in that case.

Thanks for your input
Parent - - By Vasik Rajlich (Silver) [hu] Date 2008-04-23 00:02
I don't have precise figures yet, but yes, I believe that going from 8 cores to 16 will give at least a 1.7 effective speedup.

Vas
Parent - - By M ANSARI (*****) [kw] Date 2008-04-23 09:20
Wow ... so things must have changed.  That is almost as dramatic as the speedup from single to dual core.  I must re-think my hardware strategy now :)
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-04-24 09:02
Your ideas about multiple "Rybka-type entities" still make a lot of sense for clusters. We'll take a look at this after Rybka 3.

Vas
Parent - - By ernest (****) [fr] Date 2008-04-23 19:10
> I believe that going from 8 cores to 16 will give at least a 1.7 effective speedup

Conceptually, how is it possible that there is not a decrease of speedup, going 1 -> 2, 2 -> 4, 4 -> 8, 8 -> 16, ...? :-P
Parent - - By Roland Rösler (****) [de] Date 2008-04-23 20:26 Edited 2008-04-23 20:33
There is a decrease of speedup, but I can't follow your column of numbers.
Rybka 2.3.2 has a speedup of 1.7 (2 cores), 2.8 (1.7*1.65; 4 cores), 4.4 (1.7*1.65*1.57; 8 cores) and 6.2 (estimated) (1.7*1.65*1.57*1.41; 16 cores).
If Vas is right with his estimate from 8 to 16 cores (>=1.7), we will maybe see the following speedups for Rybka 3 mp:
1.8 (2 cores), 3.2 (1.8*1.78; 4 cores), 5.6 (1.8*1.78*1.74; 8 cores) and 9.5 (1.8*1.78*1.74*1.7; 16 cores).
Not bad, I think :-)! But let us see ...
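
The chained products in this post can be reproduced with a short sketch. The per-doubling factors are the ones quoted above (the Rybka 3 row is a guess, as stated in the post); the function and variable names are only illustrative.

```python
from itertools import accumulate
from operator import mul


def cumulative_speedups(step_factors):
    """Multiply per-doubling factors into total speedups relative to 1 core."""
    return list(accumulate(step_factors, mul))


# Per-doubling factors for 1->2, 2->4, 4->8, 8->16 cores, as quoted in the post.
rybka_232 = cumulative_speedups([1.7, 1.65, 1.57, 1.41])
# Guessed factors for Rybka 3, assuming Vas's >=1.7 estimate for 8->16 holds.
rybka_3_guess = cumulative_speedups([1.8, 1.78, 1.74, 1.7])

for cores, old, new in zip([2, 4, 8, 16], rybka_232, rybka_3_guess):
    print(f"{cores:2d} cores: Rybka 2.3.2 {old:.1f}x, Rybka 3 (guess) {new:.1f}x")
```

Rounded to one decimal, this gives 1.7, 2.8, 4.4, 6.2 for the first row and 1.8, 3.2, 5.6, 9.5 for the second, matching the figures in the post.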
Parent - - By Bouddha (****) [ch] Date 2008-04-23 20:42
I have no idea, but my experience is that if a program scales perfectly, it should always be around a 1.7 speedup when you double the number of cores.

1 core
2 cores = 1.7 speedup
4 cores = 1.7x1.7
8 cores = 1.7x1.7x1.7 or 4cores x 1.7
16 cores = etc
Parent - - By Vempele (Silver) [fi] Date 2008-04-23 21:01

> I have no idea, but my experience is that if a program scales perfectly, it should always be around a 1.7 speedup when you double the number of cores.


Perfect scaling is defined to be 2x.

>2 cores = 1.7 speedup


No. Cray Blitz got 2.0, for example.

>4 cores = 1.7x1.7


Cray Blitz, 3.7.

>8 cores = 1.7x1.7x1.7 or 4cores x 1.7


6.6
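
One way to read the Cray Blitz figures is as parallel efficiency (speedup divided by cores) and as the factor gained at each doubling. The sketch below uses only the three speedups quoted in this post; the 1-core entry is the baseline by definition, and the helper names are hypothetical.

```python
# Cray Blitz speedups quoted in the post (cores -> speedup over 1 core).
# The 1-core entry is the baseline, 1.0 by definition.
cray_blitz = {1: 1.0, 2: 2.0, 4: 3.7, 8: 6.6}


def efficiency(speedups):
    """Parallel efficiency: speedup divided by the number of cores."""
    return {cores: s / cores for cores, s in speedups.items()}


def doubling_factors(speedups):
    """Factor gained at each step between consecutive core counts."""
    cores = sorted(speedups)
    return {f"{a}->{b}": speedups[b] / speedups[a] for a, b in zip(cores, cores[1:])}


for cores, eff in efficiency(cray_blitz).items():
    print(f"{cores} cores: efficiency {eff:.3f}")
for step, factor in doubling_factors(cray_blitz).items():
    print(f"{step}: x{factor:.2f}")
```

With these numbers the per-doubling factor falls from 2.0 to about 1.85 to about 1.78, so even the Cray Blitz data shows a shrinking gain per doubling rather than a constant one.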
Parent - - By boo! (**) [no] Date 2008-04-23 21:48
A speedup of 2.0 can only be achieved if no extra overhead whatsoever is needed to parallelize the problem. This is clearly not the case with chess, so a speedup of 2.0 can't be true.
Parent - - By Vempele (Silver) [fi] Date 2008-04-24 07:55
Or if the overhead is less than 0.05. Please learn to round.
Parent - - By Vasik Rajlich (Silver) [hu] Date 2008-04-24 09:04
For effective speedup, I will have trouble believing anything higher than 1.9x on 2 cores.

For time-to-depth speedup, I can believe even 2.5x on 2 cores.

Vas
Parent - - By Sesse (****) [no] Date 2008-04-24 23:29
Why superlinear? Because there's more cache on two cores than on one?

/* Steinar */
Parent - - By Vasik Rajlich (Silver) [hu] Date 2008-04-28 13:56
Because multi-processing changes the shape of the search tree.

Try playing single-processor Rybka vs multi-processor Rybka in fixed-depth games. You won't get an even score.

Vas
Parent - - By Bouddha (****) [ch] Date 2008-04-28 14:35
What will be the result ?
Parent - - By Vasik Rajlich (Silver) [hu] Date 2008-04-28 15:23
Multi-processor Rybka will win.

Vas
Parent - - By Carl Bicknell (*****) [gb] Date 2008-04-28 17:10
Does the logic continue to hold true for more cores? So would fixed depth quad beat fixed depth duo?
Parent - - By Vasik Rajlich (Silver) [hu] Date 2008-05-01 08:07
For 2 vs 4 and 4 vs 8 cores, yes. At some point, it will tail away.

Vas
Parent - - By Carl Bicknell (*****) [gb] Date 2008-05-02 09:59
Sorry to labour the point still further, but is this the reason why a doubling of speed by adding more cores seems to give more than the traditional 40elo gain from doubling the clock speed on a single processor?

I realise there is some efficiency loss when doubling cores, but if the search shape is improved, this seems to at least make up for it. Correct?
Parent - - By Vempele (Silver) [fi] Date 2008-05-02 10:22

> the traditional 40elo gain from doubling the clock speed on a single processor?


40 points? Double that.
Parent - - By Carl Bicknell (*****) [gb] Date 2008-05-02 12:10
At 40 moves in 2 hours, going from 500 MHz to 1000 MHz was always thought to yield around 40elo (cf. Selective Search magazine, ratings section, etc.)
Parent - By turbojuice1122 (Gold) [us] Date 2008-05-02 12:52
I would guess that against human opponents, it's less, possibly a lot less: humans win against computers by finding areas where the computers are weak, and these are usually involved in positional understanding and strategical planning, something that is not really improved by increased processing power.
Parent - By ernest (****) [fr] Date 2008-05-02 18:31
> was always thought to yield around 40elo

It seems you are definitely on the low side...

Standard "thinking" is 60-70 Elo.

But then, there is the "diminishing returns" debate (when you go deep)!
Parent - - By turbojuice1122 (Gold) [us] Date 2008-05-02 18:48
I just remembered--the 40 elo is usually what is considered "standard" if you double the number of cores, while 60-70 elo is considered "standard" in computer vs. computer matches if you double the processing speed.
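
The two "standard" figures can be related with a rough sketch. It assumes a log-linear model in which every doubling of effective speed is worth a fixed number of Elo; that model is an assumption for illustration and is not stated anywhere in this thread, and `elo_gain` is a hypothetical helper.

```python
import math


def elo_gain(speedup, elo_per_doubling):
    """Elo gain under the ASSUMED model: fixed Elo per doubling of speed."""
    return elo_per_doubling * math.log2(speedup)


# 1.7x is the two-core figure quoted in this thread for Rybka 2.3.2.
for elo_per_doubling in (40, 60, 70):
    gain = elo_gain(1.7, elo_per_doubling)
    print(f"{elo_per_doubling} Elo per doubling, 1.7x speedup: {gain:.1f} Elo")
```

Under this assumption, a 1.7x speedup at 60-70 Elo per doubling of speed comes out at roughly 46-54 Elo per doubling of cores, which sits between the two numbers being argued about.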
Parent - - By Carl Bicknell (*****) [gb] Date 2008-05-02 19:03
please read the post below Vas' - my question was that although I TOTALLY see where you're coming from, his information about the search tree changing shape seems to indicate that more cores = better performance NOT ONLY because of the speed increase.

So, if you're right and it's a 60-70elo gain for a processor speed increase (i.e. 1 GHz going to 2 GHz) but only 40elo per doubling of cores (traditional thinking), then it might ACTUALLY be 60-70elo per doubling of cores because of Vas' point.

I guess we'll only know when he replies!

:)
Parent - By turbojuice1122 (Gold) [us] Date 2008-05-02 19:15
Yes, I remember his statement on that--that's an interesting point.  Of course, I'd be pretty surprised if it's more than about 30 elo against humans in either case.
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-05-03 16:38
All of this is taken into account when I report effective speedup. Any other speedup measurement would be meaningless.

Vas
Parent - - By Roland Rösler (****) [de] Date 2008-05-03 01:34
That's the ruling doctrine. But nobody believes in it :-)! With Quad (factor 2.8 of time in Rybka) you don't achieve this +80 Elo. Take a closer look at CEGT and CCRL.
Parent - By ernest (****) [fr] Date 2008-05-03 15:57
> With Quad (factor 2.8 of time in Rybka)

Well, maybe that 2.8 is a bit on the high side too... :-p
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-05-03 16:36
Older versions of Rybka had an "effective speedup" of around 1.7x. This means that if you played two-processor Rybka against one-processor Rybka, you'd need to give a 1.7x time handicap to have an equal match.

No engine will have an effective speedup of more than 1.9 or so, unless it's doing something pathological in single-process mode.

Vas
Parent - - By Sesse (****) [no] Date 2008-04-28 14:58
It feels a bit misleading to talk of a "speedup" if you potentially get worse moves (which would be what could happen with a more tightly pruned tree, no?)...

/* Steinar */
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-04-28 15:24
Yes, exactly. When I use the word "speedup", I mean "effective speedup". By itself, a time-to-depth speedup is pointless.

Vas
Parent - - By boo! (**) [no] Date 2008-04-24 16:30
I'm a programmer, I don't round, I truncate :P
Parent - By Vempele (Silver) [fi] Date 2008-04-24 20:20
LOL, me too. But I learned math before programming, so that must be why I got it right! :-p
Parent - - By Roland Rösler (****) [de] Date 2008-04-23 21:53
Many thanks for the data. Very interesting!

>2 cores = 1.7 speedup
No. Cray Blitz got 2.0, for example.


I don't believe in 2.0. I can believe in 1.9999 :-).
Parent - By BB (****) [au] Date 2008-04-23 22:03

> I don't believe in 2.0. I can believe in 1.9999 :-).


From the data given, I get 37772 versus 74024 seconds for the 24 test positions, for a ratio of 1.96. For some individual positions, the ratio is actually above 2, presumably due to rounding or noise effects.
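
The ratio can be checked from the two totals alone. This sketch assumes the larger total is the one-processor run and the smaller one the two-processor run; the per-position times are not reproduced here, and the variable names are illustrative.

```python
import math

# Totals for the 24 test positions, as given in the post (seconds).
# ASSUMPTION: the larger total is the 1-processor run.
one_cpu_seconds = 74024
two_cpu_seconds = 37772

speedup = one_cpu_seconds / two_cpu_seconds

print(f"ratio:                  {speedup:.2f}")                    # 1.96
print(f"rounded to 1 decimal:   {round(speedup, 1)}")              # 2.0
print(f"truncated to 1 decimal: {math.floor(speedup * 10) / 10}")  # 1.9
```

So the published 2.0 is consistent with a measured 1.96 once it is rounded to one decimal, while truncating instead gives 1.9.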
Parent - - By BB (****) [au] Date 2008-04-23 21:58
The info about Cray Blitz (CB) is from: http://www.cis.uab.edu/hyatt/search.html
Parent - - By boo! (**) [no] Date 2008-04-23 22:12
This could very well be what Vas has been reading lately. :)
Interestingly, Hyatt's speedup from 8 to 16 cores corresponds pretty well with Vas' estimates: 1.68 ~ 1.7
Parent - - By BB (****) [au] Date 2008-04-23 22:23

>This could very well be what Vas has been reading lately. :-)


Just as long as he doesn't ask Hyatt for help :) --- see here:
I have no beef against Vas. I have a beef with all that do this. There are others far more egregious in this behavior. They know who they are. Literally hundreds of emails asking specific questions about DTS this or DTS that. And then they are commercial. DTS is non-trivial to implement. The devil is in the details.
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-04-24 09:06
Hmm .. there is only one commercial engine with DTS. Sounds interesting :)

BTW - my new algorithm is better than DTS, I am quite sure of it.

Vas
Parent - By Roland Rösler (****) [de] Date 2008-04-23 21:41
Maybe you are right. But I don't believe in perfect scaling (a constant speedup factor) in chess engines. A speedup of ~200 for 1024 cores? Never! On the other hand, I see no reason why you can't get a speedup of 1.8 from 1 to 2 cores (or from 2 to 4).
Okay, the speedup factor is <2. This is a fact. The speedup factor should be >1, but that isn't trivial to implement. I remember the Kasparov vs Junior match, where they marketed their 8-processor machine but played on a 4-processor machine, because the speedup from 4 to 8 cores was <1. In the last Kramnik vs Fritz match, the real speedup was maybe =1 from one to four cores.
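
The "~200 for 1024 cores" figure follows from applying a constant 1.7 at every doubling, which is the model being doubted here. A minimal sketch, with a hypothetical function name:

```python
import math


def constant_factor_speedup(cores, per_doubling=1.7):
    """Total speedup if EVERY doubling of cores gave the same factor."""
    doublings = math.log2(cores)  # 1024 cores -> 10 doublings
    return per_doubling ** doublings


for cores in (2, 4, 16, 1024):
    total = constant_factor_speedup(cores)
    print(f"{cores:4d} cores: {total:7.1f}x (efficiency {total / cores:.1%})")
```

For 1024 cores this gives about 201.6x, i.e. roughly 20% efficiency, and for 4 cores 2.89x.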
Parent - - By grolich (***) Date 2008-04-24 09:33 Edited 2008-04-24 09:36
That's never true. For any program. Scaling ALWAYS becomes worse as the number of processors increases.

4->8 is always worse than 2->4, 2->4 is always worse than 1->2 etc.

The constant factor you seem to believe in, which remains constant regardless of the number of cores, is more in the department of science fiction... It's never true, for any Chess program (actually, for ANY programs, but given certain presuppositions on memory accesses and usage of other system resources, there can be, theoretically, programs that scale with an almost constant scaling factor. For chess programs, even the "almost" part is never true. Actually... For almost any program)

The perfect scaling=1.7  part is strange too:
As others have pointed out, there are programs that scale MUCH better than 1.7 from 1->2

A scale factor of >=1.8, as Vas suggested may be the case for Rybka 3, sounds really good though.
I wasn't expecting such results that quickly.
Parent - By Sesse (****) [no] Date 2008-05-02 10:57
I don't buy your "always" -- there are HPC programs that scale superlinearly (usually there's a bump when your working set starts to fit into the L2 cache, which usually grows larger with more cores), and for those, the jump from 2->4 can very well be better than the jump from 1->2.

For chess, of course, probably not.

/* Steinar */