Rybka Chess Community Forum
Up Topic The Rybka Lounge / Computer Chess / Dual Sandy Bridge E5 Xeon Review
- - By Werewolf (*****) [gb] Date 2012-03-30 16:54
Sorry this is brief, but last week I tested the 2.6 GHz version, with Turbo 2 taking all cores to 3 GHz.

Here are the results. Everything is nps from the start position, typically running for 2 minutes before I took a measurement:

Houdini 2 Pro
16 cores @ 3.0 GHz. Hash = 2 GB
24.2 Mn/s
(In vain I tried lowering the hash to 512 MB, hoping that would undo the effect of the ECC memory. However, it only showed a slight speedup: 25.99 Mn/s. Then I tried a hash of only 16 MB = 26.09 Mn/s)

From now on Hash = 512 MB

Houdini 2 Pro
1 core @ 3.3 GHz = 2104.49 KN/s

(That’s slower than 1 core on my Nehalem 3.6 GHz = 2247 KN/s)

Now for Rybka 4.1
Deep Rybka 4.1 SSE 16 Cores @ 3.0 GHz = 790.55 KN/s (corrected to overcome Rybka's mismeasurement)
Deep Rybka 4.1 SSE 8 Cores @ 3.2 GHz = 474.6 KN/s
Deep Rybka 4.1 SSE 4 Cores @ 3.3 GHz = 331.9 KN/s
Deep Rybka 4.1 SSE 1 Core @ 3.3 GHz = 116.6 KN/s
Parent - - By Banned for Life (Gold) Date 2012-03-30 17:10
Your numbers for Rybka aren't right.

You claim you are getting the following speedups for doubling cores:

1 to 4    1.69 per doubling
4 to 8    1.43
8 to 16  1.67

And this is without correcting for clock speed. I don't know what was done to "correct to overcome Rybka's mismeasurement" but I call bullshit.
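
For anyone who wants to check the per-doubling figures above, they follow directly from the posted Rybka numbers. A minimal Python sketch (the function and variable names are only illustrative, and no clock-speed correction is applied, as noted above):

```python
import math

# Rybka 4.1 figures as posted at the top of the thread: cores -> KN/s
rybka_knps = {1: 116.6, 4: 331.9, 8: 474.6, 16: 790.55}


def speedup_per_doubling(cores_a, knps_a, cores_b, knps_b):
    """Geometric-mean speedup per doubling of cores between two measurements."""
    doublings = math.log2(cores_b / cores_a)
    return (knps_b / knps_a) ** (1.0 / doublings)


steps = [(1, 4), (4, 8), (8, 16)]
per_doubling = {
    (a, b): round(speedup_per_doubling(a, rybka_knps[a], b, rybka_knps[b]), 2)
    for a, b in steps
}

for (a, b), s in per_doubling.items():
    print(f"{a} to {b}: {s:.2f} per doubling")
```

Going from 1 to 4 cores is two doublings, so the square root of the overall ratio is taken there; the other two steps are single doublings.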
Parent - - By Werewolf (*****) [gb] Date 2012-03-30 17:37 Edited 2012-03-30 17:40

> And this is without correcting for clock speed. I don't know what was done to "correct to overcome Rybka's mismeasurement" but I call bullshit.


This is why I often don't post benchmarks here - people like you comment without thinking and don't say thank you.

The numbers ARE right, try thinking first. The reason 4 cores are low is because it spread them over 2 cpus, even though it would be much faster on one. It did that to spread the thermal load and to raise the clockspeed. That's why it's not showing the 1.7x speedup you usually get.

Secondly, the only Rybka number which was corrected was the 16 core one which had +5% added to it because Lukas informed me that is the most accurate way of doing it. In other words it's EXACTLY what appeared on the screen + 5%.
All other Rybka results (including the 4 core result) ARE what appeared on the screen.

The numbers are right.
Parent - - By Banned for Life (Gold) Date 2012-03-30 18:27
I do thank you for running the tests, but your numbers are clearly wrong.

> The reason 4 cores are low is because it spread them over 2 cpus, even though it would be much faster on one.

4 cores are NOT low. 4 cores show 1.69x speedup per doubling, which is just what Vas programmed in.

8 cores drops to an increase of 1.43x over 4 cores. Then 16 cores, running at a significantly lower clock speed increases back to 1.67X over 8 cores. This isn't reasonable behavior for anyone who claims to be thinking.
Parent - - By Werewolf (*****) [gb] Date 2012-03-30 18:35 Edited 2012-03-30 18:38

> 4 cores are NOT low. 4 cores show 1.69x speedup per doubling, which is just what Vas programmed in.
>
> 8 cores drops to an increase of 1.43x over 4 cores. Then 16 cores, running at a significantly lower clock speed increases back to 1.67X over 8 cores. This isn't reasonable behavior for anyone who claims to be thinking.


Sorry, that was a typo. It's obviously 8 cores which are spreading over to 2 CPUs.

Basically it's like this:
For every doubling of cores Rybka gives 1.7x speed up, right?
But when it goes from 1 physical processor to 2 there must be some sort of penalty. This is applied when 8 cores gets used.

The reason for spreading over onto the second CPU is that that allows turbo 2 to keep the clock speed high because each physical CPU has a low temperature (only 4 cores running on each CPU).

So: the numbers are right.

Anyway the number CAN'T be wrong: I just divided nodes by time as always.....
Parent - - By Banned for Life (Gold) Date 2012-03-30 19:11
> It's obviously 8 cores which are spreading over to 2 CPUs.

OK, so your thesis is that spreading onto two cores is causing Rybka to report a significantly lower result. The problem is that IIRC, Rybka doesn't really look at anything other than the master process, and the speedup over a single process is calculated as 1.7 ^ log2(N). So when the number doesn't increase as it is supposed to, it means that the master process has not increased as expected, which may or may not be indicative of what is happening with the slave processes.

Most likely the slowdown when going from one to two processors is due to a significant increase in memory latency. This doesn't show up in the jump from 8 to 16, because it's already reflected in the values from 4 to 8. In fact, the extra 5% that you added wiped out the loss that you would have seen due to a clock speed reduction from 3.2 to 3 GHz.

So what you've really seen are the following:

1) Vas programmed in an increase per doubling of ~1.69.
2) Your machine, when normalizing out Vas' scale factor and the changing clock rate, slows down Rybka about 13% when you're using two cores rather than one.
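
The arithmetic behind these two points can be sketched as follows. This assumes the 1.7 ^ log2(N) scaling recalled above (IIRC, so not confirmed Rybka behaviour) and uses ~1.69 per doubling as the factor to divide out; the function names are made up for illustration.

```python
import math


def rybka_reported_scale(n_cores, per_doubling=1.7):
    """Assumed multi-core scale factor: per_doubling ^ log2(N)."""
    return per_doubling ** math.log2(n_cores)


def normalized_ratio(knps_a, ghz_a, knps_b, ghz_b, per_doubling):
    """Speed of run B relative to run A for ONE doubling of cores, after
    dividing out the programmed-in factor and the clock-speed difference.
    A value of 1.0 means no slowdown."""
    raw = knps_b / knps_a
    return raw / per_doubling * (ghz_a / ghz_b)


# 4 cores @ 3.3 GHz -> 8 cores @ 3.2 GHz (figures from the original post)
ratio_4_to_8 = normalized_ratio(331.9, 3.3, 474.6, 3.2, per_doubling=1.69)

# 8 cores @ 3.2 GHz -> 16 cores @ 3.0 GHz, with the added 5% taken back out
ratio_8_to_16 = normalized_ratio(474.6, 3.2, 790.55 / 1.05, 3.0, per_doubling=1.69)

print(f"assumed scale factor at 16 cores: {rybka_reported_scale(16):.2f}")
print(f"4 -> 8 cores, normalized:  {ratio_4_to_8:.3f}")
print(f"8 -> 16 cores, normalized: {ratio_8_to_16:.3f}")
```

The 4 to 8 step comes out at roughly 0.87, which is the ~13% slowdown in point 2, while the 8 to 16 step comes out at about 1.00 once the 5% is removed and the clock drop from 3.2 to 3 GHz is accounted for.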
Parent - - By Werewolf (*****) [gb] Date 2012-03-30 22:26 Edited 2012-03-30 22:32

> OK, so your thesis is that spreading onto two cores is causing Rybka to report a significantly lower result.


yes.

> The problem is that IIRC, Rybka doesn't really look at anything other than the master process, and the speedup over a single process is calculated as 1.7 ^ log2(N). So when the number doesn't increase as it is supposed to, it means that the master process has not increased as expected, which may or may not be indicative of what is happening with the slave processes.
>
> Most likely the slowdown when going from one to two processors is due to a significant increase in memory latency. This doesn't show up in the jump from 8 to 16, because it's already reflected in the values from 4 to 8.


This sounds very plausible.

> In fact, the extra 5% that you added wiped out the loss that you would have seen due to a clock speed reduction from 3.2 to 3 GHz.


I added 5% to the 16 core value because Lukas said Rybka counts nodes accurately to 15 cores but then underestimates from then on. He provided the 5% increase...and of all people he should know I suppose!

> 1) Vas programmed in an increase per doubling of ~1.69.
> 2) Your machine, when normalizing out Vas' scale factor and the changing clock rate, slows down Rybka about 13% when you're using two cores rather than one.


2 CPUs rather than one, but yes.

Sorry about earlier....it's just been a long slog this week and driving around a lot to get a couple of benchmarks from 30 miles away. It's not my machine, I just had 30 minutes to test it.
Parent - By Banned for Life (Gold) Date 2012-03-31 00:18
Yes, 2 CPUs. Anyway, my fault for setting the wrong tone.

I hope Intel is able to fix this (assuming it's being caused by long delays when accessing non-local DRAM). I don't care too much about chess applications, but it's important for us stockholders for Intel to be really good when it comes to scaling, since their performance on the low end of the scale hasn't been awe-inspiring.
Parent - - By NATIONAL12 (Gold) [gb] Date 2012-03-30 22:02
I would like to ask suj and Lukas what they think of your testing. After all, the Intel guys were there and accepted it, and it was their build.

thanks for your hard work and time.
Parent - - By suj (***) Date 2012-04-02 13:44 Edited 2012-04-02 13:50
These numbers are all OK, not totally wrong, but I really wouldn't like to measure nps through the GUI. Command line is best.
Second, was NUMA or large pages enabled? That gives quite a lot of speedup. Was turbo used? Usually with servers turbo can be disabled in the BIOS.

As I said earlier, think about what you are comparing it to: a 2687W against a heavily overclocked X5690 is not the same. Do it at the same clock speed, not against what you have, if you are just doing a review of the CPU/architecture. Also, you are looking at 2 engines, Houdini and Rybka, and how much they can benefit from 4 extra cores in comparison to having fewer cores (12, but at a much higher clock speed).
I know Sjeng and Rondo benefit from the extra cores more than any other engine, but I am a touch surprised about Houdini, as even a stock X5680 yields 24 Mnps on average in a 3/0 Playchess game, and an X5690 around 25 Mnps at stock speeds, which is comparable to a 2687W.

If you are thinking about whether you want to upgrade, then play some games, as memory handling, position and ply depth come into play.

Of course the E5s are very good for cluster use: much lower power usage with better performance/watt at the same clock speed, and I surely got more nps even on a 2690 in a few games last weekend.
Parent - By NATIONAL12 (Gold) [gb] Date 2012-04-02 14:47
You should have addressed this to Werewolf and not me; he said turbo was on and the memory was 1333.
Parent - - By Regularuser (***) [gb] Date 2012-03-30 19:02
Thanks for posting this.  Very interesting.

It seems to confirm what we expected - the new Xeons cannot match up to overclocked 5650s.
Parent - - By NATIONAL12 (Gold) [gb] Date 2012-03-30 20:15
Agree, I use 2x5680's o/c to 4.2 and in the start position after 5 mins get over 32,000 with Houdini and over 1000 kN/s with Rybka.

Werewolf has seen this for himself when visiting me.
Parent - - By Werewolf (*****) [gb] Date 2012-03-30 22:39

> Agree, I use 2x5680's o/c to 4.2 and in the start position after 5 mins get over 32,000 with Houdini and over 1000 kN/s with Rybka.
>
> Werewolf has seen this for himself when visiting me.


Given that you lose 20-30% efficiency when doubling cores, I think it is VERY safe to assume that the old Westmere 12 core systems are faster than the new SB E5 Xeons, because these new chips cannot be overclocked.

HOWEVER, having said that here are a few reasons to go for the new E5 variety:
i) Turbo 2 is very cool. You get this neat little graph in the corner of the screen which shows how fast the cores are going. It powers down when idle and shoots up to beyond the maximum clock speed when there's a need.
ii) Therefore power consumption is good, especially when idle.
iii) Memory speed supported is now 1600 MHz.
iv) The future Ivy Bridge E5 Xeons ought to plug straight into the same motherboard.
Parent - By NATIONAL12 (Gold) [gb] Date 2012-03-30 23:35
> i) Turbo 2 is very cool (Werewolf)

Like Fritz 13. :smile:

Sorry Carl, I could not resist.
Parent - By Regularuser (***) [gb] Date 2012-03-31 07:02 Edited 2012-03-31 07:07
Actually I was basing my comment largely (but not entirely) on your previously posted figures :)

It's really good that you guys post this stuff.
Parent - By Mark Eldridge (****) [gb] Date 2012-03-31 19:33
Agreed. I get just over 30,000 with Houdini after 5 mins with 2x5650's o/c to 3.8
Parent - - By Lukas Cimiotti (Bronze) [de] Date 2012-03-31 12:12
Thank you for this test. All of us have to keep this in mind: In such a short test performance numbers are not very precise (I guess +/- 5%). In fact you have to repeat these tests several times - but I know you didn't have enough time for this.
The relatively bad numbers for 8 and 16 cores might also have been caused by a bad BIOS setup. Rybka is NUMA aware, but if you switch off NUMA in the BIOS it won't work. I don't know if NUMA support was on or off and I don't know what the penalty is for switching it off.
Do you know if it was on?
And can you please tell me if hyperthreading was on or off?
But to sum things up - these results are quite in line with what I expected after testing my Core i7-3930K. Sandy Bridge has no advantage over Westmere for chess. And a reasonably overclocked 12 core Westmere is still faster than a 2x E5-2687W.
Parent - - By jpqy (**) [be] Date 2012-03-31 17:29
Chess test Dual Xeon E5-2687w

FEN:
rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1


Cyclone xTreme II:
1/1 00:00 2 0 +0.08 Nb1a3
1/1 00:00 3 0 +0.51 Nb1c3
2/2 00:00 43 0 +0.20 Nb1c3 Nb8c6
3/3 00:00 140 0 +0.51 Nb1c3 Nb8c6 Ng1f3
4/4 00:00 250 0 +0.20 Nb1c3 Nb8c6 Ng1f3 Ng8f6
5/6 00:00 940 0 +0.09 Nb1c3 Nb8c6 Ng1f3 Ng8f6 d2d4
6/11 00:00 1.782 0 +0.20 Nb1c3 Nb8c6 Ng1f3 Ng8f6 d2d4 d7d5
7/11 00:00 3.363 0 +0.12 Nb1c3 Nb8c6 Ng1f3 Ng8f6 d2d4 d7d5 Bc1f4
8/11 00:00 5.336 0 +0.20 Nb1c3 Nb8c6 Ng1f3 Ng8f6 d2d4 d7d5 Bc1f4 Bc8f5
9/22 00:00 16.411 0 +0.13 Nb1c3 d7d5 d2d4 Ng8f6 Bc1f4 Bc8f5 Nc3b5 Nb8a6 Ng1f3
10/22 00:00 22.142 0 +0.16 Nb1c3 Ng8f6 d2d4 e7e6 Ng1f3 Nb8c6 d4d5 e6xd5 Nc3xd5 Nf6xd5 Qd1xd5 Bf8b4+ Bc1d2 Bb4xd2+ Nf3xd2
11/22 00:00 27.456 0 +0.18 Nb1c3 Ng8f6 Ng1f3 d7d5 d2d4 Bc8f5 e2e3 Nb8c6 Bf1d3 Bf5xd3 Qd1xd3 Qd8d6
12/24 00:00 93.891 0 +0.22 Nb1c3 Ng8f6 Ng1f3 d7d5 d2d4 Bc8f5 e2e3 Nb8c6 Bf1d3 Nf6e4 OO Ne4xc3 b2xc3
13/26 00:00 194.498 0 +0.17 Nb1c3 d7d5 Ng1f3 Ng8f6 e2e3 e7e6 Bf1b5+ Bc8d7 OO Bd7xb5 Nc3xb5 Nb8c6 Nb5d4 Bf8d6 Nd4xc6 b7xc6
14/30 00:01 357.052 9.631.068 +0.20 Nb1c3 d7d5 Ng1f3 Ng8f6 e2e3 e7e6 d2d4 Bf8d6 Bc1d2 OO Bf1d3 Nb8c6 OO Bc8d7
14/30 00:01 518.500 9.631.068 +0.26 e2e4 e7e5 Ng1f3 Nb8c6 Bf1b5 Ng8f6 OO Bf8c5 Nb1c3 Qd8e7 Nc3d5 Nf6xd5 e4xd5 a7a6 Bb5xc6 d7xc6
15/30 00:02 753.392 9.791.161 +0.14 e2e4 e7e5 Ng1f3 Nb8c6 Bf1b5 Ng8f6 OO Bf8c5 Nb1c3 OO d2d3 Rf8e8 Qd1e2 Nf6g4 Bc1g5
16/34 00:12 3.866.126 10.142.946 +0.22 e2e4 e7e5 Ng1f3 Nb8c6 Bf1b5 Ng8f6 OO Bf8c5 Nb1c3 OO d2d3 d7d6 Bb5xc6 b7xc6 Bc1e3 Bc5xe3 f2xe3 Bc8e6
17/34 00:15 4.854.430 10.244.821 +0.25 e2e4 e7e5 Ng1f3 Nb8c6 Bf1b5 Ng8f6 OO Bf8c5 c2c3 OO d2d4 e5xd4 c3xd4 Bc5e7 e4e5 Nf6e4 Nb1c3 d7d5 e5xd6/ep Ne4xd6 Bb5xc6 b7xc6
18/35 00:20 6.652.213 10.404.363 +0.32 e2e4 e7e5 Ng1f3 Nb8c6 Bf1b5 Ng8f6 OO Bf8c5 c2c3 OO d2d4 e5xd4 c3xd4 Bc5e7 d4d5 a7a6 Bb5d3 Nc6b4 Nb1c3 Nb4xd3 Qd1xd3
19/39 00:33 11.040.297 10.466.787 +0.23 e2e4 e7e5 Ng1f3 Nb8c6 Bf1b5 a7a6 Bb5xc6 d7xc6 OO Bc8g4 h2h3 Bg4xf3 Qd1xf3 Ng8f6 d2d3 Bf8c5 Nb1c3 Qd8e7 Bc1g5 OOO Bg5xf6 Qe7xf6 Qf3xf6 g7xf6
20/39 00:52 17.435.761 10.563.000 +0.28 e2e4 e7e5 Ng1f3 Nb8c6 Bf1b5 a7a6 Bb5xc6 d7xc6 OO Bc8g4 h2h3 Bg4xf3 Qd1xf3 Ng8f6 Qf3b3 b7b6 Nb1c3 Bf8c5 d2d3 OO Bc1e3 Qd8d6
21/46 01:31 30.147.825 10.583.336 +0.27 e2e4 e7e5 Ng1f3 Nb8c6 Bf1b5 a7a6 Bb5xc6 d7xc6 OO Bc8g4 h2h3 Bg4xf3 Qd1xf3 Ng8f6 a2a3 Bf8d6 Nb1c3 OO d2d3 h7h6 Bc1e3 Qd8e7 Qf3f5
22/46 02:30 49.889.418 10.614.174 +0.23 e2e4 e7e5 Ng1f3 Nb8c6 Bf1b5 a7a6 Bb5xc6 d7xc6 OO Bc8g4 h2h3 Bg4xf3 Qd1xf3 Ng8f6 Nb1c3 Bf8c5 d2d3 Qd8e7 Bc1g5 OOO a2a4 Qe7e6 Bg5xf6 g7xf6 Ra1c1

JP.
Parent - By Werewolf (*****) [gb] Date 2012-03-31 18:10
can you test with Rybka 4.1 or Houdini 2 Pro by any chance?
Parent - By Werewolf (*****) [gb] Date 2012-03-31 18:15

> In fact you have to repeat these tests several times - but I know you didn't have enough time for this.


yep...I wish I had more time.

> Rybka is NUMA aware


I didn't know that. In January I asked Vas if he was going to add NUMA to Rybka 5 and he said "probably not" which made me think 4.1 doesn't have it. But Vas may have thought I meant something else.

> I don't know if NUMA support was on or off and I don't know what the penalty is for switching it off.
> Do you know if it was on?


I don't know - it didn't occur to me to ask. The engineer is away until Wednesday but I will find out then.

> And can you please tell me if hyperthreading was on or off?


This I did check! Hyperthreading was OFF.
Parent - - By Werewolf (*****) [gb] Date 2012-04-05 15:14

> Rybka is NUMA aware, but if you switch off NUMA in the BIOS it won't work. I don't know if NUMA support was on or off and I don't know what the penalty is for switching it off.
> Do you know if it was on?


I can now confirm that NUMA was ON.
According to data on Robert H's website, Houdini gains +10% from NUMA.

The message to chess players is clear: The E5 Sandy Bridge Xeons are not a good buy.
Parent - - By NATIONAL12 (Gold) [gb] Date 2012-04-05 20:37 Edited 2012-04-05 20:40
Thanks for doing these tests. If David Evans' setup is still for sale it could be a good cheap buy for you (12 core at 3.8).

http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=24489
Parent - - By Werewolf (*****) [gb] Date 2012-04-05 22:11

> David Evans' setup is still for sale it could be a good cheap buy for you (12 core at 3.8).


Too late I'm afraid Paul - today I took the plunge and bought a new system (also an overclocked 12 core).

Thanks for those benchmarks btw. Here's hoping for Ivy Bridge...
Parent - - By NATIONAL12 (Gold) [gb] Date 2012-04-08 00:12
Carl, when these guys do your build, I know it's costing a lot of money. I believe you told me 4.5 on water.

I give you the following suggestion: make them run H2 on IA for 48 hours from the start position.

They may not realise how tough this test is.

I don't know anyone on the forum who has yet achieved this over this time.

If they can't do this, I believe you have no contract with them.

Just trying to save you money.

I take it for granted they are using Xeon 5690's or at least 5680's like I have.
Parent - By NATIONAL12 (Gold) [gb] Date 2012-04-08 00:18
Now that my friend Tony, who built my computer, is back in the UK, I feel he will give you advice on this and the right questions to ask these guys.

http://rybkaforum.net/cgi-bin/rybkaforum/message_add.pl?uid=6