I am planning to get a new PC, and I would like to know whether this "configuration" would be powerful enough to run games and analysis quickly with the Rybka 2.3.2a 64-bit engine in the Shredder 10 GUI, on Vista 64-bit naturally.
CPU Intel Quad Q6600 (or Q9300?)
RAM 4 GB (2x2 GB)
MOTHERBOARD Asus P5K-E
Video Card ZOTAC (or XFX) GeForce 8800GT 512 MB
HDD - 2x Western Digital 250 GB, 16 MB cache - RAID 0
Frankly speaking, I was astonished to read that it is far better to put tablebases (I am thinking of the Nalimov ones... so many huge files ;-)... I only have those up to 5-men, 7.05 GB) on a USB pen drive in order to get the fastest access times! I had assumed that a RAID hard disk setup was faster than a USB 2.0 connection. Very, very interesting!
So, in your opinion, I could just get a single 500 GB hard disk with 16 MB cache for normal use and that's it...
As far as the CPU is concerned, since I would like a PC with good longevity, I would be inclined to consider the Q9550 for my system. The 45nm technology of this processor should be fully supported and exploited by the Asus P5K-E motherboard, shouldn't it? Do you believe that 4 GB (2x2 GB) will be enough on Vista 64-bit under heavy use? (I am mainly thinking of the calculation involved in chess analysis, but also of the latest demanding games...)
As regards the cooling part of the "project", what about a case from Lian Li, maybe one from the Silent range? (Maybe the a12... ATX midtower.) Would it be enough to keep temperatures down under moderate overclocking while staying quiet? I know they are somewhat expensive, but they seem to be "the state of the art" in silent and "refrigerated" cases.
Many issues, my kind friend....:-)
May I ask you one thing about Rybka 2.3.2a?
In your opinion, which is the strongest opening book for this engine? I downloaded Perfect13.ctg and imported it into the Shredder 10 GUI, but I have not seen much of an improvement compared to RybkaII.ctg. Am I getting something wrong with the configuration or anything else?
Sorry for such a long post.
I am not sure I like the choice of case too much. Lian Li make wonderful stuff, but avoid anything that has low noise as a design feature rather than very good air circulation. I do like their V2000 series very much. I would also have a look at the Antec 900 for a well-ventilated case at very good value.
I think the Q9550 should be the way to go, though I would first check that the P5K-E will support the 8.5 multiplier. If you were restricted to 8.0x, you would need higher FSB speeds than might be comfortable. I have had very good experience with the X38 boards from Gigabyte, and they do support 8.5 multipliers.
Books are something of a black art, and I am not the best person to ask. I found HS for Rybka to be a very solid performer. In general, books composed to complement the playing style of a specific engine will always be best, which probably explains why RybkaII is still a good performer despite its age.
I created my own book from computer games downloaded from various sites, then added in my own game history. It plays OK, but probably no better than stuff you can download.
I don't have one of these beauties but for playing tournaments or serious Blitz with a program that uses tablebases or bitbases I would like to have one!
I have a question; maybe I could find an answer somewhere on the forum, but can anyone tell me whether, with a 64-bit operating system (XP or Vista 64-bit, for instance), there are problems using more than 3 gigabytes of main memory? I keep reading that Vista normally only recognizes the first 3 gigabytes, and most desktop systems seem to have that as a maximum. Thanks for any advice!
Doing your RAM as 2x2 GB is also a good plan. I am not sure I would bother with a RAID setup for the hard disks; the speed is not really a great benefit for gaming or chess. If you are thinking of tablebases, the best way is to put them on a USB stick, which speeds them up nicely, and these are getting cheaper almost daily.
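If you want to check the USB stick versus hard disk claim on your own hardware rather than take it on trust, a rough Python sketch along these lines might do. The function name and parameters are made up for illustration, and it only times small random reads, which is my assumption about how an engine probes tablebases. Note that the operating system's file cache will flatter any file that has been read recently, so treat the numbers as indicative only.

```python
import os
import random
import time


def mean_random_read_latency(path, reads=200, block_size=4096, seed=0):
    """Return the mean time in seconds for `reads` random block reads from `path`.

    Hypothetical helper for illustration; it is not part of any chess GUI or engine.
    """
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    total = 0.0
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        for _ in range(reads):
            offset = rng.randrange(0, size - block_size + 1)
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            total += time.perf_counter() - start
    return total / reads
```

Run it once against a large tablebase file on the hard disk and once against a copy of the same file on the USB stick, ideally right after a reboot, and compare the two averages.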
As far as the CPU is concerned, if the budget allows, I would consider a Q9550, as it will run cooler and overclock higher for a given temperature than a Q6600. It also has a larger L2 cache, which should provide a slight performance benefit.
I am assuming you will want to overclock this at least a little. Money spent on good quality cooling will help enormously. A Q6600 G0 should be easily capable of running at 3.2 GHz with good air cooling; with the Q9550, about 3.4 GHz should be simple to achieve and perfectly stable on air. This is provided it's in a case with good air circulation, of course.
As far as I have seen so far, it overclocks well, the only issue being a maximum multiplier of 8.5. That also raises a possible issue with P35 motherboards. I know the P5K-E does support 45nm CPUs, but I am not sure it will allow you to access the 8.5 setting.
I have two QX9650s running on my X38 and P35 Gigabyte boards, and while the .5 option appeared after a BIOS update on the X38, it's not yet available on the P35. So it may be a P35 limitation, or just Gigabyte being sluggish with updates. I shall try to find out.
I have seen a few at 4+ GHz, though obviously with pretty high FSBs for that, but Vcore requirements are, as you would expect, in line with the QX9650. I like the idea of running one at 8.5x440 for a nice stable 3.74 GHz at hopefully 1.38 to 1.40 Vcore, which should be a very good 24/7 overclock that is not too hard to cool.
To the extent that a 45nm quad setup is better than a 65nm quad setup at the same clock speed and FSB, the improvement in nodes/clock is most likely attributable to other minor changes and corrections that Intel made when they did the shrink (and for dual quad core setups, the major improvement is probably in the chipset rather than the CPUs).
Has anyone measured Rybka 2.3.2a kn/s for 45nm and 65nm quad setups with the same clock speed, FSB, and chipset to quantify this difference?
Zappa doesn't care about that extra cache - speedup is less than 2%.
There are no really important improvements besides the bigger cache.
A 12MB L2 cache is just as useless as an 8MB L2 cache if you're silly enough to try to use it for fast access to a 2GB transposition table. Any hash cluster that went into L2 would be gone before it was needed again. I would be very surprised if Vas was using it for this, and the small difference in hit rates for instructions and other tables will be hardly measurable.
Rybka would be expected to get more of an improvement than Zappa because it is more sensitive to latencies than Zappa. Period.
At the same speed, Rybka performed 5% better on San Diego.
Your statement that the only difference between the 65nm and 45nm processors running at the same clock and FSB rate is the cache size is demonstrably wrong. The most likely reason for better Rybka performance is probably shown in the table below.
I doubt there would be even a 1% difference due to higher instruction and non hash table memory accesses going from 8 MB to 12 MB.
That's something different, I think.
I don't think Rybka makes heavy use of SSE instructions - but I don't know exactly. There is one fact that makes me assume this: Rybka performs badly on P4 although P4's SSE performance isn't all that bad.
So one way to find out if Rybka likes bigger caches would be to get a Core 2 Quad 9300 and compare it to my (then) underclocked QX9650.
Or maybe we should ask Vas if he uses SSE at all.
> There is one fact that makes me assume this: Rybka performs badly on P4 although P4's SSE performance isn't all that bad.
Chess engines perform badly on P4, period.
My last P4 ran at 700 MHz before I switched to AMD. The P4 was designed with very long pipelines to allow high clock rates. These long pipelines lead to very high latency and major penalties for things like branch misprediction. Rybka runs poorly on systems with high latency, so it's not surprising that its performance on the P4 is really atrocious.
And why does Zappa not get faster on the new 45 nm chips? Or does Zappa not use data alignment? Like all chess engines?
Zappa doesn't care about that extra cache - speedup is less than 2%.
But I guess what you're really saying is that either Intel is bullshitting about the new shuffle engine or that data alignment should be exactly the same proportion of processing in every engine? I can't follow your argument.
The obvious explanation is that Vas is either doing more shuffling than Anthony or Rybka is more sensitive to shuffle latency than Zappa.
Please read my last post again - I repeated that Zappa isn't faster on the new chips - but that doesn't match with your theories.
If my last post was unclear, I apologize.
Note that one could also ask why a larger cache helps Rybka but not Zappa (I don't think it helps either one, but this is your theory).
My favorite theory is: Rybka loves bigger caches.
We all know that Rybka takes good advantage of fast memory. Zappa doesn't care for memory speed.
Vas once told me when we were discussing mp speedup, hash fetches are not the problem. So maybe Rybka's inner loop doesn't fit into the processor cache. This could be an explanation - but I doubt that.
A bigger cache improves average memory speed - maybe it's that simple - maybe some data from RAM is used twice?
It would be super fantastic if anyone could figure out a way to have hash table accesses be available in the L2 cache rather than having to go out to DRAM. But if you allow the hash accesses to be stored in L2, you end up with cache pollution, where important stuff in L2 cache (like instructions and other tables) gets overwritten by hash table entries. Since the hash table is designed to randomize the location of any hash entry in memory, your chance of finding the hash entry in the L2 cache would be approximately cache size / hash size. If you use the maximum possible hash size, this comes out to 12 MB / 2 GB, or a hit rate of roughly 0.6%. This is too low to cause a meaningful improvement (and the loss of L2 access to instructions and non-hash tables would be a much bigger performance hit).
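For anyone who wants to play with the numbers, the cache size / hash size estimate can be sketched in a few lines of Python. The function name is made up for illustration, and the model assumes hash entries are spread uniformly over the table and that the whole cache is given over to hash entries, which is the best case for the hit rate.

```python
def expected_hash_hit_rate(cache_bytes, hash_bytes):
    """Chance that a uniformly random hash entry is already in the cache.

    Simplified model (an assumption): the whole cache holds hash entries
    and entries are spread evenly over the hash table.
    """
    if hash_bytes <= 0:
        raise ValueError("hash size must be positive")
    return min(1.0, cache_bytes / hash_bytes)


MB = 1024 ** 2
GB = 1024 ** 3

for cache_mb in (8, 12):
    rate = expected_hash_hit_rate(cache_mb * MB, 2 * GB)
    print(f"{cache_mb} MB L2, 2 GB hash: {rate:.2%}")
```

With a 2 GB hash this gives about 0.39% for 8 MB of L2 and about 0.59% for 12 MB, so the extra 4 MB buys roughly 0.2 percentage points of hit rate at best.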
Vas is well aware of all these issues and most likely does not allow hash entries to be stored in L2. It's possible that Rybka uses huge tables that need close to 12 MB of storage. Even then, I would be very surprised if the performance difference compared with an 8 MB cache was 6%. This would only be possible if the accesses to this data were cyclical, negating the cache's LRU replacement strategy.
Some questions arise:
Are hash fetches really only used once (in case Vas allows them to be stored in L2)?
I made tests with very small hash tables (with only 1 core too), down to 1 MB - kn/s didn't go up very much. This could be because they aren't stored in L2, or - as Vas claims - because hash fetches aren't the problem.
Is it possible that Rybka uses huge tables that don't fit into L2? I don't think so - the size of the executable is only 2.69 MB for Rybka 2.3.2i22 (current freestyle version). OK, tables could be compressed in the executable file - but that is unlikely.
I really don't know why Rybka (and only Rybka) is so memory bound.
Btw. our little discussion really helped me.
To confirm my statements I clocked down my X5460 to 3 GHz and made some brief tests comparing it to an X5365. When I first saw the results I was shocked:
Rybka being ~22% faster on the X5460, Zappa ~16%. Was everything I told you bogus? That X5365 machine was formerly my surfing computer, using two 5120 Xeons, and I hadn't optimized its BIOS settings for chess performance. I put in two 5365s that were left over when I disassembled another computer. My tests reminded me to optimize the BIOS settings. Worst for chess are the prefetchers (OK, they improve streaming memory performance, but random accesses get slower); the snoop filter of the 5000X is bad for chess too. After correcting these settings, performance was normal again: next to zero performance increase for Zappa and an 8% gain for Rybka (these numbers are never super precise). I hope this will help Vas too, as he is currently using this computer (and some more) in Freestyle.
> Are hash fetches really only used once (in case Vas allows them to be stored in L2)?
Twice (probing and storing).
I get the feeling that Vas has implemented a more efficient memory management system than other engines are using and this explains the increased importance of memory (and other) latencies. Get rid of one bottleneck and another one instantly appears...
Prefetching and cache snooping both cause additional FSB traffic, and cache snooping may require more work in the TLB as well. Prefetching is useless for hash accesses that are cache-line aligned, so turning this off is a no-brainer. Turning cache snooping off could result in occasionally using out-of-date data, but this has been shown to have a negligible effect on chess engine performance.
I have also gotten something out of this discussion. When I am running a combination of 45nm and 65nm quads, I will know to run Rybka on the 45nm processors and Zappa on the 65nm processors to optimize results!
My question is this: as soon as I get my funding, I am going to purchase a Mac Pro 8x 3 GHz, in part based on our conversations. As I understand it, the Mac Pro has no BIOS that can be optimized for chess. Is my understanding right? Should that affect my decision to buy the Mac Pro over the Jncs equivalent?
Maybe you should ask Dummkoller - I know he's got a new Mac Pro 2.8 GHz and he had an "old" Mac Pro 8x 3 GHz. He told me the new one was faster for Rybka despite the lower clockspeed. But I don't know if he really made thorough tests.
>bypassing the L2 cache when accessing the hash
There could be a slight ambiguity with "accessing" here. The use of PREFETCHNTA (NTA is non-temporal access) on some processors elides later eviction to the L2 cache. However, this instruction will typically look in L2 at the onset (and if it hits, will, in this case, evict back to L2). As indicated in a different thread, there is good reason to expect that later versions in the Rybka series are not insouciant toward the use of this.
As far as PREFETCHNTA is concerned, it's true that if the hash cluster was found in L2, it would be returned there, but one would hope this wouldn't happen very often. Having hash values throw instructions and non-hash tables out of L2 would be highly undesirable.
I am also thinking about buying new PC, mainly computer chess and analysis.
I prefer workstations rather than desktop configurations, due to the expansion options.
What do you think?
- motherboard: Tyan Tempest
- 1 Xeon 5460 3000 MHz (wish list: another Xeon...)
- 8 GB RAM
- 500 GB HD
- Antec P-190
What are your opinions?
Thanks very much in advance,
I'd take 2x E5430 (2.66 GHz) instead of 1x X5460 if you haven't got enough money for 2x X5460
Tyan Tempest must have 5400 Seaburg chipset (there are also Tyan Tempest boards with 5000 chipset that won't work)
8 GB is OK - must be FB-DIMMs
Thx for answering.
Do you think this is a good approach, or would it be better to go for the QX9650 series...?
My 2 QX9775 run on Intel Skulltrail board D5400XS.
Do you think a Skulltrail + 2 QX9775 is better than a Tyan Tempest 5400 + 2 Xeon 5460?
Thanks for your opinions!
One more thing, I am looking at your signature right now, just a question: on which motherboard have you mounted your 2 quads?
Thx a lot!
(Not that many places even have the Q9300 in stock here yet...)
/* Steinar */
The Q9550 can go to 8.5x, which doesn't sound like a big difference, but that extra 0.5 can make a noticeable impact.
Most P35, X38 or X48 motherboards will accept an FSB of around 440 MHz before overclocking starts to become challenging. With a Q9450 this means 3.520 GHz, which can be achieved quite easily with a Q6600 G0. The Q9550 will be at 3.740 GHz, which would be difficult (but not impossible) to achieve with a Q6600.
So much depends on what you are hoping to achieve. For analysis of games, a Q6600 running at 3.2 GHz is probably enough for anyone. If you are looking to be competitive in either an engine room or freestyle chess, then any extra speed you can achieve can be a big help.
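The arithmetic behind these figures is just FSB times multiplier. A small Python sketch, with function names made up for illustration, shows both directions: the clock you get from a given FSB, and the FSB you would need if the board only offered the 8.0x setting.

```python
def core_clock_ghz(fsb_mhz, multiplier):
    """Core clock in GHz for a given FSB (in MHz) and CPU multiplier."""
    return fsb_mhz * multiplier / 1000.0


def fsb_needed_mhz(target_ghz, multiplier):
    """FSB in MHz needed to reach a target core clock at a given multiplier."""
    return target_ghz * 1000.0 / multiplier


for mult in (8.0, 8.5):
    print(f"{mult}x at 440 MHz FSB: {core_clock_ghz(440, mult):.2f} GHz")

print(f"FSB for 3.74 GHz at 8.0x: {fsb_needed_mhz(3.74, 8.0):.1f} MHz")
```

At 440 MHz this gives 3.52 GHz for 8.0x and 3.74 GHz for 8.5x. Reaching 3.74 GHz on the 8.0x multiplier would take an FSB of 467.5 MHz, which is past the point where most of these boards stay comfortable.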