The following reference is obviously dated but I still think it has merit.
Allocation of memory in Transposition tables
In general, if you are running engine versus engine matches on one machine, the total memory allocated to both programs should be half your system's total memory.
I must add that this advice applies only to people with low amounts of RAM [say, below 256 MB]. Given that the amount of memory Windows needs is somewhat fixed, if you have a large amount of RAM you do not need to follow the "50% rule" above. For example, Andreas Schwartmann's 512 MB machine can support a total of 420 MB for the tournament without any problems. [Thanks to Andreas Schwartmann and Mogens Larsen for pointing this out.]
How much RAM you should allocate also depends on the time controls used. At lightning and blitz time controls, large amounts of RAM [64 MB and up?] allocated to transposition tables do not improve playing strength and may instead hurt the engine. Large hash tables are only useful at long time controls, when the transposition tables actually fill up.
In the interest of fair competition you should allocate the same amount of memory to both engines. However, depending on the program, you might not have full control over the allocation. Some programs [e.g. Francesa] use a single hash size fixed at compile time that cannot be adjusted at all, while others allow almost any allocation of memory. Lastly, some programs like Crafty lie in between, allowing you to adjust the size of the hash tables, but only in discrete increments.
Therefore it may not be possible to be totally "fair" in the allocation of hash for engine versus engine matches on the same machine.
Another question arises when allocating memory for engines that use endgame tablebases against those that don't: should the total memory allocated to each engine include the memory allocated to the endgame caches?
OpenChess - Topic: Ideal hash size for short time controls
" ...Bigger hash is definitely bad, because the larger your program, the more physical pages of RAM you use, and the more TLB entries you need to avoid "walking the tables" to convert virtual to physical addresses. If you blow the TLB, you can (on 64 bit hardware) add 4 extra memory accesses to each primary memory address because of the virtual-to-physical mapping that must be done. So more RAM = more TLB entries needed, and when you run out, memory access time goes _way_ up."
For infinite analysis or long time control matches you should use the largest hash that fits in the physical memory of your system. For example, on a system with 4 GB of memory you can use up to 2048 MB hash size.
For shorter games, for example 3 to 5 minute games, it’s better to use 256 MB or 512 MB as this will provide the best performance. For 16 min games 1024 or 2048 MB hash size should be fine.
If someone has a difference of opinion on this, I'd like to hear it. Personally, I don't put too much stock in the limited number of blitz games that I run. And if I present a single blitz game, one-on-one, I would not suggest it has any real merit beyond its entertainment value.
With that said, I do think running a plethora of both short and long time control games can give insights into both the strengths and weaknesses of a chess engine.
Houdini 3's autotune facility indicated 2 GB of hash would be full within 80 seconds on my quads, similar to what Houdini 3 shows in the Shredder GUI. 2 GB takes just 25 seconds to fill on the 16-core, so the general rule of lower hash for faster time controls may have been appropriate for hardware from 10 or more years ago, but it seems out of line with what happens in practice with today's hardware and engine combinations.
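That autotune figure is consistent with a back-of-envelope estimate. The numbers below are assumptions for illustration only: a 16-byte table entry, roughly 8 Mnps for a quad-core, and a guessed fraction of searched nodes that produce a fresh table store; none of them come from Houdini's documentation.

```python
ENTRY_BYTES = 16  # assumed size of one transposition-table entry

def fill_time_seconds(hash_mb, nodes_per_second, store_fraction=0.2):
    """Estimate seconds until a hash table of hash_mb megabytes is full,
    assuming store_fraction of searched nodes write a fresh entry."""
    entries = hash_mb * 1024 * 1024 // ENTRY_BYTES
    return entries / (nodes_per_second * store_fraction)

# 2 GB hash on a hypothetical 8 Mnps quad-core:
print(round(fill_time_seconds(2048, 8_000_000)))  # 84
```

The point is only that at modern node rates even a 2 GB table fills within a minute or two, which matches the autotune reading quoted above.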
Rybka has always been a bit unpredictable with hash changes. You'd think giving more hash would help for deeper analysis, but overall solutions were being found faster with smaller hash, hence my finding that 512 MB gave the best results up to a maximum of 2 GB. As I am always quick to point out, this may vary between hardware setups, so people should check for themselves; there are always likely to be exceptions. Perhaps my machines have been the exception, but these results are based on earlier AMD 4800 dual-core CPUs and my current overclocked Intel Q9550 CPUs.
Following serious resource conflicts affecting certain engines' performance when playing matches with two engines on one machine, even with ponder and LP switched off, I now only run matches using two machines of the same specification, so until I can get another dual E5-2687W machine to match my current unit, matches are still restricted to my pair of quads. If anyone checks Zappa Mexico II's games in the recent Graham Banks "Duck for Cover" tournament, the first game against Exchess in Round 4 is a good example. Zappa slows very quickly once out of book: 5m:10s for move 14.Rf2, giving just 16 ply. Something was already going seriously wrong by this move when I ran through the moves on one of my machines; the speed ratio compared with my machine is way out of line with the first few moves out of book. So basically I have little faith in results from one machine, and if those results go into the rating system then some skewing of Elo is almost certain.
Some earlier engines gave their best performance with what initially seemed odd hash values, e.g. Shredder 7 and Fritz 8 at 160 MB, but given that back then a top machine would likely have just 256 MB running Windows 98, it makes sense. Until the MP engines, HIARCS seemed best with 64 MB hash.
So don't take what the "experts" say for granted, conduct your own tests and stick with your own findings.
I don't know whether someone has more latitude in applying hash in short time control games when using a high-end machine (16 cores or more) against an average user with a 4-core or at best 6-core machine. In long time control games you want as much as you can afford.
So what you are saying is that this is a recent phenomenon due in part to the advancement in chess engine strength?
What I am saying is that there has been a long-standing preconception of how hash is used in each engine, as though the influence of hash on all engines were the same, but as I have found in actual tests, practice and theory can be very different. It came as quite a shock when testing Rybka 2.3.2a on one core: it was finding solutions to problems in, say, 210s with 512 MB hash, and the anticipated speed-up did not happen using 1 GB or 2 GB hash, even to the point where the higher hash value did not solve at all after 30 minutes. There were positions to the contrary where higher hash values solved quickly and 512 MB did not. It became clear the way Rybka was using hash gave very unpredictable results. With one core it was certainly deterministic, but change the hash value and it would seem like a different engine. Overall I found 512 MB hash gave the best performance for Rybka 2.3.2a and 3 on 1 through 4 cores. Initial tests suggest it may be the same for Deep Rybka 4.1.
In my last serious tests, back in the Rybka 3 period, I found three basic groups:
1. Engines where in mid game positions different hash values made insignificant differences.
2. Engines that seemed to perform significantly better with a small hash value; and
3. Rybka, which exhibited completely different output when hash was changed.
As you can appreciate, thoroughly testing this is very time consuming, but having done so a while back, I do not assume that the preconception for hash allocation is right for every engine. Because of MP engine variability it takes much longer to obtain clear results than with SP engines. Today, even the HIARCS author recommends there are likely benefits from allocating large hash for short time controls and I have found nothing to contradict that!
Houdini 3 Pro gave me hope when Robert Houdart wrote,
"For infinite analysis or long time control matches you should use the largest hash that fits in the physical memory of your system."
But my few tests (using Windows 7 Pro with "large memory pages" on an i7-3930K @ 4.3 GHz machine with 16 GB) gave discouraging results: using 4 GB of hash gave a better, faster solution than using 8 GB of hash. [By the way, "large memory pages" with Houdini 3 Pro performs just as well as Houdart claims, if not better.]
I think that Bob Hyatt's comment about overflowing the TLB may apply to the Rybka and Zappa (and perhaps Houdini) results. However, it is also possible that as you increase the size of the transposition table, you rely more and more on table entries that have some level of "corruption." I don't know the exact source of this "corruption" (pruning? cutoffs?), but when I worked with Crafty about 8 years ago, I found it very frustrating. In most cases, I found it better to ignore the score in the transposition table, make the stored move, and continue analyzing at ply+1. It would be interesting if Houdart would publish more data regarding hash sizes.
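The "ignore the score, keep the move" idea can be sketched as a probe that uses a stored entry only for move ordering. The entry layout and names here are hypothetical illustrations, not Crafty's actual structures.

```python
# Hypothetical entry layout: (depth, score, bound, best_move)
def probe_move_only(tt, zobrist_key):
    """Return the stored best move for move ordering while deliberately
    discarding the stored score/bound, so a possibly 'corrupt' score
    never produces a cutoff; the position is then re-searched normally."""
    entry = tt.get(zobrist_key)
    return entry[3] if entry is not None else None

tt = {0xABCD: (12, 35, "exact", "e2e4")}
print(probe_move_only(tt, 0xABCD))  # e2e4
print(probe_move_only(tt, 0x1234))  # None
```

Searching the stored move first still recovers most of the table's move-ordering benefit, while giving up the depth-saving cutoffs that a trusted score would allow.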
I think I saw nTCEC using 8 GB per engine in its matches. I'd have liked to see less.
> I think I saw nTCEC using 8GB per engine in it's matches.
Why WOULD you even think of using 8GB per engine?
It's like going to the grocery store with a $5000 note just to buy $256 worth of groceries, or at best $512 worth.
What cashier is going to have that kind of cash flow in their register drawer to break the note? She'll need (or he'll need) to get the manager, and the manager is going to look at you like you're some kind of idiot, or a counterfeiter. He won't even fool with it and will tell you to go to the bank and have them break it. Meanwhile, some poor clerk has to put all your crap back on the shelf, thinking that you're not coming back, believing for real that the ink just dried on that note.
The only reason I can imagine is so that you can have bragging rights, " For REAL dude, I used 8gigs of hash per engine-word, man!"
" I've never heard of uh gigs of hash, man! Is that more than a kilo?"
From nTCEC info page.
Each engine is allowed to use up to 8192 MB of hash. Not all engines support this much hash, so the maximum for that engine will be used in this case, typically 1024 MB or 2048 MB.
So, again, it isn't because Martin Thoresen has 16 cores that he arbitrarily decides to use 8 gigs, or because it is a long time control match.
>So, again, it isn't because Martin Thoresen has 16 cores that he decides arbitrarily to use 8gigs
What are you talking about? Go back to your original statement.
>Well, nTCEC did use 16 cores and a very long time control. [that is the gist of your entire statement and that...] The 8 GiB could have been beneficial, considering that.[may be, may be not!]
Just because he has 16 cores and uses long time controls doesn't mean he has to use 8 gigs of hash, or that it is any more beneficial than using, say, 4096.
I am not sure much science had gone into Martin's decision other than what the author has allowed in usage. For analysis purposes I think large hash can be very useful. In engine vs engine games I wonder.
What I suppose it comes down to is that there IS no real hard evidence that large hash in SHORT TIME CONTROL games is a benefit. I don't want to get into a pissing match over LONG TIME CONTROL HASH; there, I think it is less relevant.
> For analysis purposes I think large hash can be very useful. In engine vs engine games I wonder.
> Today, even the HIARCS author recommends there are likely benefits from allocating large hash for short time controls and I have found nothing to contradict that!
Okay, but HIARCS's hash defaults to 128.
Stockfish 3 JA 64-bit SSE4.2 defaults to 32.
Critter 1.6a 64-bit also defaults to 32.
I think it is a bit arbitrary to assume that when an author/developer recommends that there are "likely" benefits (assuming these are an authors words and not your own) from allocating large hash for short time controls, what they might more accurately be implying is that "large" could very well mean anything between 256 and 512, given that the default in most cases is at best 128. I have yet to see an engine with a default higher than 128, but I would love for someone to point me to one that defaults to 256 or higher!
I think the key here is not just installing some arbitrarily large cache of hash because you bought and installed 32 gigs and now want to impress everyone with the magnitude of your hardware! (I'm not referencing you here, Peter! Apparently, you're one of the few who actually will do the work of experimenting with databases of accumulated games to see the resulting effects.)
> (assuming these are an authors words and not your own)
I quote this extract from the Deep HIARCS 14 engine installation's readme file ...
"You are recommended to use 50% of your machines physical RAM for Hash tables. Please check that your hash table size setting does not cause disk activity. If so HIARCS will run SIGNIFICANTLY SLOWER - in this case please decrease the hash table size and try again. HIARCS can benefit from large hash tables even in blitz time controls." (My highlights.) HIARCS is currently limited to 2 GB hash, as I understand it.
> you're one of the few who actually will do the work of experimenting with databases of accumulated games to see the resulting effects
For me that has always been the key to obtaining the best hash settings: experiment with different values for each engine and measure the differences. If other people test their machines and get different results then that is fine, because that is what I advocate. It always seemed odd to me that people would spend big bucks on fast machines and then possibly slow the engines by using sub-optimal settings. Some prefer to follow others' advice, and so I post my findings from time to time.
I think testing the engines on the 16-core is going to be delayed a little while as the weather forecast shows no let up in the high temperatures.
I have only found the Arena 3 GUI to allow hash settings greater than 2 GB. The Fritz and Shredder GUIs show less than 4 GB available, which means the maximum hash for many engines is 2 GB. As I complete different tests I will update if I find any significant implications of differing hash values. I believe the Fritz GUI is the only one that allows a database to replace the book in Autoplayer matches using two machines; Shredder does not, and I cannot see it in Arena, so it looks like a 2 GB hash limit for tests despite the 32 GB available on the machine.
If anyone has the time to test six-core machines, it would be interesting to see some postings of hash impact findings, to know whether the behaviour changes or is consistent with 2^n threads/cores.
Below is an example where not only the hash but also the GUI impact on Rybka 3 single-core performance for the original Rebel 10-position test set is plain to see. ChessBase GUI 11 implemented sampled search whereas GUI 9 did not, so that seemed to be the influencing factor, because when repeated, the same timing results were obtained. Repeating this for many test sets and matches highlights the extent of the work that goes into testing these effects, so my postings are not based on perceptions; they are actual results.
I don't remember whether Rybka 3 supported large pages. If so, were you using them? Without large pages, there would certainly be a smaller optimal hash value due to TLB thrashing with larger hashes of 4 KB pages. This should not be an issue with a good large-page implementation (but we don't know if this is the case either).
It is a pity you have not undertaken any research before making your post, because if you had you would have known there is no parameter switch, or any indication in Windows, that Rybka 2.3.2a or Rybka 3 uses Large Pages, whereas there is a switch in Deep Rybka 4 and 4.1 to turn LP on, although it is off by default. For that matter, which engines before Deep Rybka 4 supported Large Pages? Well, exactly! Therefore I discard your criticisms as flippant and throwaway.
Admonishment over. Disappointingly below your usual standard Alan!
> For that matter, which engines before Deep Rybka 4 supported Large Pages?
Third-party memory managers can be used to support large pages (true for quite some time now). These are very useful with engines that don't support Large Pages (like Houdini 3 standard), but not for R3 and R4 (because Vas didn't use malloc calls, but instead relied on Windows functions which aren't intercepted by the memory manager).
I use R4.1 and am aware of the switch. I didn't recall R3 using large pages.
I still consider the results to be statistical noise though...
> I still consider the results to be statistical noise though...
I would likely agree with you if those spreadsheet results were the only basis for my findings. I was surprised I still had them following a big clear-out last year. BT2630, BS2830, Nolot, Richter, WM100 and my own 100-position test set were used to establish any trends, with some changes over time if positions became too easy for engines to solve, such as the majority of the Richter test set, for example.
Having established a trend from these position tests, the question then was whether it could be reproduced in engine versus engine matches using database openings, not books. It was, and furthermore, as far as Rybka engines were concerned, the MP engine followed the same pattern as the SP engine, contrary to many people's belief in the rule "double the cores, double the hash". It was pleasing, then, when I explained my MP test method to average out the MP variability and Vas posted his endorsement of the method, while I was getting the usual statistical barrage from others.
Statistics give the error range for the probability, and all that can be said is that the results will likely fall within that range. Unfortunately, people ignore the fact that the results may be spot on, instead assuming the results are always at the extremities of the error range. The reason I posted that table when I found it was that, at the time, as far as I recall, those specific test positions were a very good starting indicator of what was likely to be the final outcome, perhaps the reason why I retained them.
> Statistics give the error range for the probability and all that can be said is that the results will likely fall within that range. Unfortunately people ignore the fact that the results may be spot on, instead assuming the results are always at the extremities of the error range.
This is definitely not the case here. Given the very small number of positions, and the large variance of the metric that was used, even if Vas' hypothesis was correct, the hash positions would be properly ordered only a very tiny percentage of the time. In fact, if one uses rank order techniques (which are almost certainly a better fit for your data set), one comes up with a different ranking of the different hash sizes.
In any event, R3 is the one engine that I wouldn't have any interest in for this metric, because it is the only engine that I care about that doesn't allow use of large pages...
Having been in industrial measurement and control engineering for over 40 years, my view on statistics is likely different from yours, for I am definitely of the "there are lies, damned lies and statistics" ilk. Statistical predictions are based on the past and can be pretty lousy at predicting future outcomes. I just don't view statistics with your faith in them. It seemed to be the case that statisticians could show why what just happened should not have happened, but failed to point out that the unpredicted quality issue happened because some important criterion was missed out of the evaluation. It reminds me of the guy who tried to show he could bring a process back into control using six sigma and was dumbfounded when I explained to him it was impossible without a major upgrade, because the furnace in the process was being continuously run flat out, loaded at 40% above its design capability, so it was out of control for most of its operating time. If essential information is missing, the statistics are useless.
Statistics have value for statisticians; the rest of us rely on practical results. With statistics you can come up with whatever you want to show, but practical results are what you get, and if they differ from the statistical prediction then revisit the criteria used for the statistical evaluation, because one thing is for sure: if all else stays the same, the practical results are not going to change.
You say a small number of positions. When conducting the tests for selected engines, with MP mode tested in each position and run 100 times at a 240s cut-off per position to average out the variability, I can tell you they were very time consuming and did not seem a small number. WM100 and my 100 test positions equated to 20,000 positions for each hash possibility. I recall a week to a fortnight to complete a single hash value test, even with the benefit of the Fritz GUI's early cut-off when a solution was found. If I followed your requirement, I'd likely be pushing up the daisies before being able to present acceptable results for you. It took long enough back then using my 4 machines together with help from other enthusiasts. What is the point of testing thousands of positions through a lengthy process when a quickly discernible trend is repeated? It just wastes time. If you want accuracy to the nth degree that is fine, but a quick pointer to the best setup is what most people look for.
I went a bit further than that but found what I was looking for.
> If an engine has large pages support, I'd be more likely to use bigger hash
I agree, Ray, because that is what I tend to do these days, not having had the time nor the inclination to run the necessary tests for each new engine release. I have started to run some tests with Deep Rybka 4.1, if only out of interest to see whether the engine/hash relationship changed much from Rybka 3 and whether the MP engine follows the same trend using 1 thread.
Generally though, the amount of hash used may not show much benefit in analysis, but results did seem to indicate that getting it right made some difference at 3 to 5 minute blitz and 40/5 repeating time controls. It should be remembered that the position tests, where there is a single best move, were used to try to identify under what conditions the engine was more likely to find the best move. Where a position is quiet, hash may not make much difference.
From what I remember of the Rybka 2.3.2a and Rybka 3 MP tests, given the degree of variability in the primary search depth to find the same result, the key seemed to be the secondary or extended search depth the engine ran to at the particular primary search depth. Unfortunately, Rybka engines do not show secondary search so that was a guess based on what was seen with other MP engines.
This is absolutely correct.
I too have found that the larger the hash, the better and faster the depth reached in a given amount of time, even if the kN/s drops a bit; better results with more hash on my 6-core machine.
By far the easiest way to determine the optimal size is to test. The only problem is the computational requirements. One can also just pick a few positions (opening, middlegame, endgame), run them to a fixed depth where the search takes about as long as the games will take per move, and then adjust the hash size to get optimal time to depth. That's a lot easier, computationally, but a little less accurate...
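Automating that time-to-depth comparison mostly comes down to parsing the engine's UCI `info` lines. A minimal sketch of the parsing step; the sample lines are invented, and a real harness would drive the engine over a pipe with `position` / `go depth N` commands:

```python
import re

def time_to_depth(info_lines, target_depth):
    """Return the reported time (ms) at which the search first reached
    target_depth, or None if it never did."""
    for line in info_lines:
        # UCI info lines report e.g. "info depth 18 ... time 742 ..."
        m = re.search(r"\bdepth (\d+)\b.*\btime (\d+)\b", line)
        if m and int(m.group(1)) >= target_depth:
            return int(m.group(2))
    return None

sample = [
    "info depth 18 seldepth 25 score cp 31 time 742 nodes 1200000 pv e2e4",
    "info depth 19 seldepth 27 score cp 28 time 1630 nodes 2900000 pv e2e4",
]
print(time_to_depth(sample, 19))  # 1630
```

Running the same positions at several hash sizes and comparing the returned times gives the time-to-depth curve the post describes.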
Someone running a 1 minute + 3 second match can jack up whatever limits the author has established by way of hash (on a 4-CPU machine, 6 at best!). I find this really hard to believe, but I'll take your word for it if you say so. I guess things have really changed.
(1) larger TT means fewer overwrites, which avoids losing useful information. Search is more efficient.
(2) larger TT means more physical pages of RAM. More pages mean that either a virtual-to-real translation is done via a TLB hit, which is cheap, or via walking through 4 separate page tables (very expensive).
BIG pages have two good effects. First, you have fewer pages, reducing stress on the TLB. Second, the number of page tables is reduced, which helps even when the TLB doesn't have very many entries for big pages (TLBs generally have a bunch of small-page entries and a much smaller number of big-page entries, and they can't be interchanged).
So a table size that is significantly too large won't offer any benefit since overwriting will not be a problem, but it might slow down memory access times due to the increased overhead caused by the excessive TLB misses.
Hence my suggestion to test, because these effects are difficult to predict, particularly when every new Intel/AMD processor has a different number of TLB entries
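The overwrite half of that trade-off is easy to see in a toy simulation: hashing random keys into a direct-mapped, always-replace table, the smaller table clobbers live entries far more often. This is a sketch, not any particular engine's actual replacement scheme.

```python
import random

def overwrite_rate(table_entries, num_stores, seed=1):
    """Fraction of stores that clobber an existing (different) entry
    in a direct-mapped, always-replace transposition table."""
    rng = random.Random(seed)
    table = [None] * table_entries
    overwrites = 0
    for _ in range(num_stores):
        key = rng.getrandbits(64)      # stand-in for a Zobrist hash key
        slot = key % table_entries
        if table[slot] is not None and table[slot] != key:
            overwrites += 1
        table[slot] = key
    return overwrites / num_stores

small = overwrite_rate(1 << 10, 100_000)
large = overwrite_rate(1 << 16, 100_000)
print(small > large)  # True: the small table loses far more information
```

Past the point where overwrites stop being a problem, further size increases buy nothing on this axis, which is exactly when the TLB cost can dominate.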
> I wouldn't say "doesn't hurt"; More like "shouldn't hurt".
Shouldn't is a Big word!
Wait! No "Big pages" for -
( That is "large pages" support for the following = none NADA! )
Critter 1.6a 64-bit
Deep HIARCS 14
Deep Junior 13
Komodos - in plural
Stockfish 3 JA 64bit SSE4.2
(Whoops almost forgot)
Naum - Aleksandar Naumov hasn't developed the engine any further than 4.2.
Zappa Mexico II
Those that do, you have your head up your butt about. So why are you even discussing Big pages (large pages for the uninitiated)!
Ippolit chess family engines et al.
plus the devil engines-
As far as the name goes, they are most commonly called "huge pages". As to who uses 'em, I'd assume that will vary over time.
The discussion was about the effect, not who benefits or not due to lack of programming code...
Huge pages are a win-win idea, except that they increase the difficulty for the operating system. Today one reserves N huge pages early, before the big pages get broken up as small pages inside them are allocated. One day this won't be the case: the O/S will be able, with some programming effort, to recover broken huge pages by moving the internal fragments to other available small pages and restoring the huge page to a contiguous block of memory with no holes...
Then the O/S can discover, by running the program, whether huge pages would be better, or whether it should stick with the easier-to-deal-with regular 4K pages...
>As far as the name goes, they are most commonly called "huge pages".
In Linux they are called "huge pages". In Windows they are called "large pages".
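For reference, the Linux side of this can be sketched with an anonymous `mmap`. `MAP_HUGETLB` only succeeds if the administrator has reserved huge pages (e.g. via `vm.nr_hugepages`) and the size is a multiple of the huge-page size, so falling back to ordinary 4K pages is the usual pattern; the Windows equivalent, `VirtualAlloc` with `MEM_LARGE_PAGES`, is not shown here.

```python
import mmap

def alloc_hash(num_bytes):
    """Allocate a hash-table backing buffer, preferring huge pages on Linux.
    Returns (buffer, used_huge_pages)."""
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    if hasattr(mmap, "MAP_HUGETLB"):  # Linux-only flag
        try:
            return mmap.mmap(-1, num_bytes, flags=flags | mmap.MAP_HUGETLB), True
        except OSError:
            pass  # no huge pages reserved; fall through to 4K pages
    return mmap.mmap(-1, num_bytes, flags=flags), False

buf, huge = alloc_hash(1 << 21)  # 2 MB, one x86-64 huge page, for illustration
print(len(buf), huge)
```

Engines with native large-page support do essentially this dance at hash allocation time, which is why they need both OS configuration and (on Windows) the "lock pages in memory" privilege.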
> Wait! No "Big pages" for -
> ( That is "large pages" support for the following = none NADA! )
> Critter 1.6a 64-bit
Critter 1.6a uses large pages by default. There is no switch in the engine parameters; it will use them if they are set up in the operating system, same as Houdini 3.
But where did you come by that information? Per Houdini, you are referencing Houdini 3 Pro, are you not?
- Automatic detection of SSE4.1 capable CPU. Both SSE4/non-SSE4 codepaths are now compiled in a single executable and chosen dynamically at runtime.
- Large pages support. This requires sufficient user privileges and the right OS settings.
- Own book is automatically turned off when analysing.
- Engine now honors multi-pv even when the root position is a tablebase hit
- Session file now supports IDEA. SF writes are disabled at root when any moves were excluded.
- Session file writes are now protected by OS IPC mechanisms (semaphores/mutexes). The same session file can be accessed concurrently by multiple engine instances without risking corruption (useful when running IDEA)
- Removed UCI option "SF strategy", now its value is hardwired to "depth"
- New UCI option "SF move limit" which disables learning after game has reached given number of moves. When set to 0 this option is ignored.
- New UCI option "SF material limit" (0..32). Disables learning when total amount of material is less than the given amount [Q=6, R=3, B&N=1]. When set to 0 this option is ignored.
- Gameplay related changes:
* Tweaked blocked-pawn recognizer
* Skewer detection in eval
* King safety tweaks
* Pseudo-contempt: slightly increased preference of moves increasing pressure on the opponents king
* slightly increased preference of pawn pushes in semi-blocked positions
* Recognizing more types of drawish endgames
- new console mode commands "sf probe", "sf delete", "sf store" for manipulating session file entries
> and then adjust the hash size to get optimal time to depth
Not easy at all, because of SMP randomness!... and besides, you will get different results for the different positions.
But playing matches with different hash sizes will still be valid...
BTW, changing the hash size will change the score about as often as a parallel search produces a different score. Producing a different move is pretty rare when changing either the TT size or the number of processors, however...
Playing enough games is a difficult but tractable problem. It's difficult because the games should be at a long enough time control to ensure significant exercise of the hash replacement algorithm. Its strength is that the results are unimpeachable: if two engines, identical in every way but with one having twice the hash of the other, are matched up over many, many games, a valid result will be obtained. In short, it is an unbiased estimator. The fact that doubling the hash size will result in only a small improvement means lots and lots of games, but also improves the accuracy of the result.
You can pick enough positions and run them multiple times and still be able to complete the test in a reasonable time.
True, you can get valid results for a number of positions and complete a test in reasonable time, but each position will have a different result from the underlying population of positions, so you will obtain a biased estimate, and the bias may be much larger than what you are trying to measure. So you end up with an accurate estimate of a biased estimator. Not good! Take, for example, the table of 10 positions selected by Peter Grayson above. Looking at this chart, it is clear that the results are driven by a few positions that take much longer at some hash sizes than others (this dependence on a small number of positions would be even larger if he hadn't arbitrarily stopped processing at 240 seconds). If we accept Peter's claim that the results are reasonably repeatable, we have verified that a very large number of positions will be required to come up with valid results. We could estimate this number of positions by repeatedly doubling it until the results are stable, but it isn't clear whether the resulting processing requirements for valid results will be less than for the games approach...
Given clusters such as what I use, the games approach becomes a manageable problem, but most don't have those kinds of computing resources available, and are left doing what is actually practical given limited computing power.
Based on the argument above, I don't believe there is any shortcut available.
- Playing games provides an estimate with significant variance, but zero bias. A large number of games must be played to reduce the variance.
- Looking at a set of positions provides an estimate with much less variance, but significant bias. A large number of positions must be played to reduce the bias, and the set of positions must be representative of what would occur in games.
Pick your poison. I would go for the games every time. At least you can do a reasonably good job estimating the variance, and even if you can't pinpoint the exact advantage of a larger hash size, you can bound it...
I personally ALWAYS pick games, unless time constraints dictate otherwise. For about 99.9% of the people on the planet, positions are all that is tractable computationally. If you can play 4 games a day on one box, you need 30K games or so to get the error bar down to +/-4, which is the kind of change you might see going from 4 to 8 gigs, for example.
30K games =20 years or so. That I call intractable.
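That 30K figure checks out against a standard error-bar estimate. The sketch below assumes a match scored near 50% and ignores the draw-rate correction (draws shrink the variance, so this slightly overstates the bar):

```python
import math

def elo_error_bar(num_games, confidence_z=1.96):
    """Approximate 95% error bar in Elo for a match scored near 50%."""
    se_score = math.sqrt(0.25 / num_games)       # std. error of the score fraction
    elo_per_score = 400 / (math.log(10) * 0.25)  # dElo/dscore at a 50% score
    return confidence_z * se_score * elo_per_score

print(round(elo_error_bar(30_000), 1))  # 3.9
```

So roughly +/-4 Elo at 30,000 games, and at 4 games a day that is indeed 7,500 days, about 20 years.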
I personally believe that if 300 positions say 8 GB is no better than 4 GB in terms of speedup, that is likely to match game measurements VERY closely. And choosing one over the other is not going to be a serious performance killer; the difference will REALLY be tiny. I personally would never refuse any valid speedup, however, but I would not accept any slowdown at all, since we are not talking about something that makes a qualitative difference, just something that affects the search depth/time.
OK, but you think 300 positions would be OK? They certainly wouldn't be. In the example above from Peter Grayson, on Position One, Rybka 3 without selective search (i.e. GUI 9) takes 42 times longer to solve a problem with a 256 MB hash size than it does with a 4 MB hash size. This difference would be even larger if Peter hadn't arbitrarily assigned a maximum value of 240 seconds. This error is much, much too large to be swamped out by 299 other positions. It might lead you to believe, as Peter's table indicates, that Rybka 3 without selective search performs better with 4 MB of hash than with 256 MB. This is almost certainly a fallacy, and with this ratio of hash sizes, the fallacy would likely be exposed in a reasonable number of match games.
This is the universal problem of trying to estimate underlying statistics from badly biased estimates. You end up with precision nonsense.
There is another inherent fallacy in using mean estimates from the solution of positions, relating to the importance of the variance of the estimate in game play. If one modeled two engines where each generated the same moves with the same mean time per move, but one engine had twice the variance in coming up with those moves, the higher-variance engine would lose (badly) to the engine with less variance in solution time. This effect is taken into account when games are played, but not when using mean solution times from a set of positions.
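One simple mechanism behind this is losing on time under a fixed clock budget. A hypothetical Monte Carlo sketch, with move counts, per-move times, and budget all invented for illustration: both engines have identical mean thinking time, but the one with twice the per-move spread blows the clock far more often.

```python
import random

random.seed(1)

MOVES = 60                     # moves per game (assumed)
MEAN = 5.0                     # mean seconds per move, identical for both engines
BUDGET = MOVES * MEAN * 1.2    # clock budget: 20% headroom over the mean total

def forfeit_rate(sd, trials=20000):
    """Fraction of games whose total thinking time exceeds the budget."""
    lost = 0
    for _ in range(trials):
        total = sum(max(0.0, random.gauss(MEAN, sd)) for _ in range(MOVES))
        if total > BUDGET:
            lost += 1
    return lost / trials

low = forfeit_rate(sd=2.0)    # steadier engine
high = forfeit_rate(sd=4.0)   # same mean time, twice the per-move variance

print(f"time forfeits, low-variance engine:  {low:.2%}")
print(f"time forfeits, high-variance engine: {high:.2%}")
```

Game play exposes this asymmetry automatically; a table of mean solution times over a position set cannot, because both engines would post identical means.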
> This difference would be even larger if Peter hadn't arbitrarily assigned a maximum value of 240 seconds
Can't disagree, but when the SMP tests were still to be done the 240s looked sensible, given that each position was to be tested 100 times for each hash setting. I had considered doubling the failure penalty to 480s, but as the primary criterion was correctly solved positions there seemed no point. Even with assistance this was a massive test to take on in terms of time and resources, and I actually regretted not choosing a shorter time limit some way into the testing. It is easy to criticise, but decisions have to be taken at the outset, and even then the testing time went way beyond what I had anticipated. One drawback now testing Deep Rybka 4.1 is that much of the benefit of the early cut-off in the Fritz GUI is being lost because of Deep Rybka's reiterative search at the same ply depth. So even having got the right move in maybe 20 seconds, say, it can still take the full 4 minutes before moving on to the next position. Already beginning to regret starting this no matter what the outcome!
Unfortunately Rybka engines only show one search depth value, but they still followed the SMP trend of other engines: out of 100 test runs for the same position, the indicated ply depth to solve varied by as much as 3 or 4 ply per position at the 240s cut-off. From what was seen with other engines, the key was likely the secondary search depth achieved at the particular primary search depth. In early SMP tests with Shredder 7 (yes, the standard Chessbase release was SMP) and Deep Fritz 8, changing the hash sometimes caused a position search to take longer because it extended the secondary search by some considerable depth. This may have given a more precise evaluation, and I guess one further drawback of the correct-move cut-off is that it does not record the evaluation, but I took the view that finding the correct move was the principal objective of the exercise.
From my own experience and from feedback from those who helped with the tests, getting the hash optimised had some impact at faster time controls: as much as 50 Elo according to one tester, but around 30 Elo in my own experience. I never tested at 40/120. From my perspective, once I had a pointer to the specific question there was no further need to test the long time controls.
Postulations on statistics apart, it is not clear to me what model could be used when different engines seemed to have different reactions to hash changes. Particularly with the SMP engine variability: if averaging the 100 same-position tests had not resulted in a similar trend to the SP performance early on, then I would likely not have completed the tests to confirm the same trend could be seen with other test sets. Without averaging, the SMP results looked chaotic, perhaps random, so it was fascinating to see that averaging revealed a pattern in the chaos.
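This "pattern in the chaos" effect can be illustrated with a toy model: a mild underlying hash-size trend buried under large run-to-run SMP noise is invisible in single runs, but emerges once 100 runs per setting are averaged. All numbers here are invented for illustration, not taken from the actual tests:

```python
import random

random.seed(7)

# Hypothetical underlying trend: solve time (s) falls mildly with hash size.
HASH_MB = [4, 16, 64, 256]
TRUE_TIME = {4: 120.0, 16: 110.0, 64: 100.0, 256: 95.0}
NOISE_SD = 40.0   # SMP nondeterminism: huge spread on any single run

def run_once(h):
    """One noisy solve-time measurement for a given hash size, clamped positive."""
    return max(1.0, random.gauss(TRUE_TIME[h], NOISE_SD))

single = {h: run_once(h) for h in HASH_MB}                          # looks chaotic
averaged = {h: sum(run_once(h) for _ in range(100)) / 100 for h in HASH_MB}

print("single runs:", {h: round(t) for h, t in single.items()})
print("mean of 100:", {h: round(t) for h, t in averaged.items()})
```

Averaging 100 runs shrinks the noise by a factor of 10, which is enough here to make the trend between the smallest and largest hash settings reappear even though any individual run is dominated by noise.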
Another observation on my quad machines was that Rybka 3 in GUI 11, which supported sampled search, was overall faster than in GUI 9, which did not, but only up to approximately 2 minutes; after that, GUI 9 was faster. This conflicted with what I had anticipated, because updating the sampled-search information window must have taken resources away from the engine, even if only marginally.
I have started testing Deep Rybka 4.1 with Large Pages in a similar fashion on one of my quads. The first thing to establish is whether the SMP results give a similar averaged trend to the SP results. While the current heat wave persists, the opportunity to test is limited, so unless someone living in colder climes offers to take this on, it will be a little while before the first test-set results are completed.
Before the topic gets too remote from the original discussion the question was around "is more hash better or worse?" and where Rybka engines are concerned I believe I can accurately predict ... "it is when it is and it isn't when it isn't".