Rybka Chess Community Forum
[Persistent Hash Merge File] question
- - By Ozymandias (****) [es] Date 2008-07-30 22:02
When you merge two Persistent Hash files, it's said that the "merged information will be contained in the main persistent hash file, whose size will be unchanged". When you merge two files, you actually add information, unless they're identical, and if you don't delete anything from the original "main persistent hash file", the size should always be bigger after adding new information from the secondary file. Am I missing something here?
Parent - - By Banned for Life (Gold) Date 2008-07-30 22:13
If the persistent hash format is actually a hash table image, it will be some power of 2 bytes, up to 2 GB, in length. If this is the case, hash merging would look at each location in the table and put in the more important result. Of course it's easy to come up with this more important result if one or both hash table entries are empty. Otherwise it might be chosen by giving one input table priority, or by choosing the one with greater depth, or whatever other rule Vas fancies.

Regards,
Alan
Parent - - By Ozymandias (****) [es] Date 2008-07-30 22:21
That would be the case for downward propagation, but I'm talking about the other one. Besides, "this functionality does not support resizing or merging". From what I've read, upward propagation is the one that should be called "Persistent Hash". The other one is nothing more (and nothing less) than a save/load hash.
Parent - - By Banned for Life (Gold) Date 2008-07-30 23:23
If you have two hash images (from running save hash twice) that you want to merge so that you can load the result, they would be combined in the manner I suggested and would stay the same size. But I probably don't  understand your original question.

Alan
Parent - - By Ozymandias (****) [es] Date 2008-07-30 23:32
I'm asking about the function detailed under the title "Persistent Hash", here: http://www.rybkachess.com/docs/Rybka_3_persistent_hash.htm
Parent - - By Banned for Life (Gold) Date 2008-07-30 23:51
It appears the persistent hash size can only be changed with the persistent hash size command or the persistent hash resize command. This would prevent the file from growing in an unbounded manner. This would mean that as new positions are added either through analysis or via merge operations, lower priority entries might have to be discarded if there are not enough empty slots available in the persistent hash file.

Alan
Parent - - By Ozymandias (****) [es] Date 2008-07-31 00:02
That's a thought, but it contradicts the spirit of phrases like: "for upstream propagation, we want to save everything, automatically and forever, and we want to be able to merge it with everything that other users have done", or "the persistent hash file is always kept on disk, and accessed there directly. It can be as big as the hard drive". The whole matter seems a little bit unclear.
Parent - - By Banned for Life (Gold) Date 2008-07-31 00:17
You get to set the size initially and reset it if you want, but it certainly shouldn't be left unbounded as this would result in many angry customers when the persistent hash file took over their entire disk. The described implementation seems pretty reasonable to me. As it says, you can set it for a whole disk drive if you want.

Alan
Parent - - By Ozymandias (****) [es] Date 2008-07-31 00:32
Well, that's what the Windows "low on disk space" warning is for. It's also stated that "the conclusions which must be stored have a low volume", so there's no danger of the file growing so quickly that the program runs out of disk space if left unattended for, say, a night. This size restriction would mean that those of us who don't care how much space the information requires would have to allocate large amounts of the hard drive up front, just in case it's needed. Not to mention that we would never know how much of the allocated space is actually being used and how much is empty. We therefore wouldn't know when new information is replacing, or being replaced by, other information with higher priority (we wouldn't know that we actually want to reset the size, because there's no way to monitor the file's "capacity").
Parent - - By Banned for Life (Gold) Date 2008-07-31 01:31
How fast the file grows will depend on how long the engine runs and on the persistent hash write depth. It's bad practice to have files that can grow to an unbounded size, so it's not surprising that a maximum size is established for the file. It's not clear whether there will be some way to tell if the file is nearly full. Maybe that will be handled by the GUI rather than the engine.

Alan
Parent - - By Vasik Rajlich (Silver) [hu] Date 2008-08-05 21:47
The persistent hash file is itself a hash table (although it's unrelated to the hash table used by the engine during a normal search). This means that you choose the size manually and data is always "hashed" to the appropriate place. When you choose a bigger size, you get fewer collisions and more data is kept.

Note that the phrase "for upstream propagation, we want to save everything, automatically and forever, and we want to be able to merge it with everything that other users have done" is a slight exaggeration. Of course, there are collisions and data is lost. It's just that the collisions (and lost data) are really tiny. A 30-second search might cause 10 entries to be stored, while a 512-mb persistent hash has space for 67,108,864 entries.

Vas
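As a rough illustration of why the file size never changes as data is added, here is a minimal C++ sketch of the scheme described above, assuming 8-byte entries and simple modulo indexing. The numbers come from this thread; the names and layout are assumptions, not Rybka's documented format.

    #include <cstdint>
    #include <cstdio>

    int main() {
        const uint64_t fileBytes  = 512ULL * 1024 * 1024;   // user-chosen persistent hash size
        const uint64_t entryBytes = 8;                       // assumed 8-byte entries
        const uint64_t slots      = fileBytes / entryBytes;  // 67,108,864 slots

        const uint64_t positionKey = 0x9D39247E33776D41ULL;  // some 64-bit position key
        const uint64_t slot        = positionKey % slots;    // data is "hashed" to this slot

        std::printf("slots = %llu, key maps to slot %llu\n",
                    (unsigned long long)slots, (unsigned long long)slot);
        // Writing a position overwrites whatever is already in its slot (a collision),
        // so the file never grows; choosing a bigger size just means fewer collisions.
        return 0;
    }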
Parent - - By Vempele (Silver) [fi] Date 2008-08-05 21:55 Edited 2008-08-05 22:04

>A 30-second search might cause 10 entries to be stored, while a 512-mb persistent hash has space for 67,108,864 entries.


You're using 8-byte entries?
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-08-05 22:03
Yes.

In theory, I could open up the exact format. For example, perhaps some other type of merging operation might make sense, i.e. a merge which combines more than two files, or which prefers one file over another, etc.

Vas
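As an illustration of the kind of alternative merge being mused about here, a tiny sketch of a merge that simply prefers one file wherever it has an entry. The layout and names are my own assumptions, not anything confirmed in this thread.

    #include <cstdint>
    #include <vector>

    struct Entry { uint32_t key32; int16_t score; int16_t depth; };   // assumed layout

    // "Prefer one file over another": wherever the preferred file has an entry,
    // it overwrites the corresponding slot of the main file.
    void mergePreferring(std::vector<Entry>& mainFile, const std::vector<Entry>& preferred) {
        for (size_t i = 0; i < mainFile.size() && i < preferred.size(); ++i) {
            const bool hasEntry = (preferred[i].key32 != 0 || preferred[i].depth != 0);
            if (hasEntry)
                mainFile[i] = preferred[i];
        }
    }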
Parent - - By Banned for Life (Gold) Date 2008-08-05 22:09
A 30-second search might cause 10 entries to be stored, while a 512-mb persistent hash has space for 67,108,864 entries.

OK, hashing this data seems to be an improvement over the flat file formats I've seen previously. But I must be missing something on the write frequency aspect. Assuming you use an intelligent process for deciding whether to update the file or not (maybe greater depth), I would think you would want to have an intermediate write rate. Not so high as to cause blocking on disk accesses but high enough to store results at reasonable depths. I would think you would want to be saving maybe 1000 positions per second (preferably on a large solid state memory) rather than 10.

Will the persistent hash file store upper and lower bounds or only exact values?

Regards,
Alan
Parent - - By Vasik Rajlich (Silver) [hu] Date 2008-08-06 07:56
The persistent hash only stores exact values, because only exact values are useful from the point of view of upstream propagation. The write frequency is extremely limited, first because exact values are extremely rare (they only exist along the PV), and second because the entire table is prioritized by age rather than depth.

Vas
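To make the write filter concrete, here is a hedged sketch of the rule described above (only exact, PV-node scores at or above the write depth reach the file). The function and parameter names are assumptions, not Rybka's actual API.

    // Hypothetical filter, consistent with "only exact values, only along the PV".
    enum Bound { EXACT, LOWER_BOUND, UPPER_BOUND };  // usual transposition-table bound types

    bool shouldPersist(Bound bound, int depth, int writeDepthThreshold /* e.g. 10 */) {
        // Non-exact scores (fail-high/fail-low bounds) are useless for upstream
        // propagation, and shallow results are not worth keeping on disk.
        return bound == EXACT && depth >= writeDepthThreshold;
    }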
Parent - - By Banned for Life (Gold) Date 2008-08-19 16:42
and second because the entire table is prioritized by age rather than depth.

That's interesting.

Alan
Parent - - By Vasik Rajlich (Silver) [pl] Date 2008-08-21 11:41
I explained this in a thread which is (IIRC) linked to from the FAQ.

The entire table is one-way associative, always-replace. This is the same as saying that it is prioritized by age.

Vas
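As a sketch of what "one-way associative, always-replace" means in practice (assumed 8-byte layout and names, purely illustrative): the writer seeks to the slot and overwrites it without reading what was there, so the most recently written result always survives.

    #include <cstdint>
    #include <cstdio>

    struct Entry { uint32_t key32; int16_t score; int16_t depth; };  // assumed 8-byte entry

    // One slot per key, no buckets, no comparison with the existing entry:
    // the newest result always wins, which is the "prioritized by age" behaviour.
    void alwaysReplaceStore(std::FILE* file, uint64_t slots,
                            uint64_t key, int16_t score, int16_t depth) {
        const Entry e{ uint32_t(key >> 32), score, depth };
        std::fseek(file, long((key % slots) * sizeof(Entry)), SEEK_SET);
        std::fwrite(&e, sizeof(Entry), 1, file);   // write only, no read-before-write
    }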
Parent - By Vasik Rajlich (Silver) [pl] Date 2008-08-21 11:42
Actually, I realize now - this is that thread :)

Vas
Parent - - By Banned for Life (Gold) Date 2008-08-21 12:43 Edited 2008-08-21 12:48
This is not important, but replacement policy and set associativity are completely independent. You could have a hash table that was N-way set associative with replacement purely based on age (each set would be a FIFO queue), or you could have a 1-way set associative hash table with a very complicated replacement policy. In the FAQ, you mentioned the one-way set associativity, but not the always-replace part. In any event, the savings from not having to read before replacing are obvious, so now it is both clear and understandable.

Regards,
Alan
Parent - By Vasik Rajlich (Silver) [pl] Date 2008-08-23 13:27
Yes, you're right. A two-way FIFO queue would of course be more efficient at prioritizing by age - the average table age would be lower. The cost would be more complex write operations.

Vas
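For comparison, a minimal sketch of the two-way FIFO alternative mentioned above, where each bucket holds two entries and the incoming entry evicts the older one. Names and layout are my own assumptions.

    #include <cstdint>

    struct Entry  { uint32_t key32; int16_t score; int16_t depth; };
    struct Bucket { Entry slot[2]; };   // slot[0] is the newer entry, slot[1] the older

    // Still prioritized by age, and the average table age kept is lower than with
    // always-replace; the cost is a read-modify-write of the whole bucket.
    void fifoStore(Bucket& b, uint64_t key, int16_t score, int16_t depth) {
        b.slot[1] = b.slot[0];                                     // evict the older entry
        b.slot[0] = Entry{ uint32_t(key >> 32), score, depth };    // newest goes in front
    }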
Parent - - By Ozymandias (****) [es] Date 2008-08-06 00:28
When you said that the information stored was of "low volume", I never imagined that it would be so low. In fact, the only "problem" I see now is that Rybka 4 should also make good use of these files, because it's going to take a very long time to fill them up.
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-08-06 07:58
I'm certainly planning to have Rybka 4 be backward-compatible with this format. At the very least, we should have a way to convert Rybka 3 persistent hash files to Rybka 4 persistent hash files. Anyway, this is some time away - we'll see.

Vas
Parent - - By AndrewWalker (**) [au] Date 2008-08-06 09:22 Edited 2008-08-06 09:25
The main problem would be if the evaluation changes a lot; then it may be better to wipe the learning and start again.

Andrew
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-08-06 09:49
This can probably be handled with a "depth bias", i.e. a Rybka 3 entry with a depth of 15 is as valuable as a Rybka 4 entry with a depth of 13 (or something like that).

Starting again of course solves all of these issues, but at quite a cost.

Vas
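A small sketch of the "depth bias" idea: when entries from two engine versions compete for a slot, the older version's depth is discounted by a couple of plies. The bias of 2 and the names are assumptions ("or something like that").

    // Hypothetical comparison with a version-dependent depth bias: a Rybka 3
    // entry at depth 15 is treated like a Rybka 4 entry at depth 13.
    struct VersionedEntry { int depth; int engineVersion; };

    int effectiveDepth(const VersionedEntry& e) {
        const int rybka3Bias = 2;                                   // assumed bias
        return e.engineVersion == 3 ? e.depth - rybka3Bias : e.depth;
    }

    bool prefer(const VersionedEntry& a, const VersionedEntry& b) {
        return effectiveDepth(a) > effectiveDepth(b);               // keep the more valuable entry
    }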
Parent - - By BB (****) [au] Date 2008-07-31 04:33

>If this is the case, hash merging would look at each location in the table and put in the more important result.


You are simplifying a bit. Probably there are clusters (N-way associative, if you prefer the term, with N=4 most likely), and the merging would choose the N most useful entries that go in each cluster.
Parent - By Ozymandias (****) [es] Date 2008-07-31 09:13
If somebody from the team with some insight about this had something to say...
Parent - - By Banned for Life (Gold) Date 2008-07-31 22:39
Yes.
Parent - - By Ozymandias (****) [es] Date 2008-07-31 23:08
"Yes" to the cluster thing or "yes" to wait for somebody with answers? ;-)
Parent - - By Dragon Mist (****) [hr] Date 2008-08-01 06:17
I have another question: I would like to know how the new Persistent Hash improves total search time. Is it the same as having Rybka do infinite analysis, then playing one move for both sides, with Rybka keeping most of the analysis in the hash (as you can see when playing engine-engine with ponder off), but with the difference that with Persistent Hash you can switch off the PC, then after a while switch it on again, start Rybka, and keep everything?
Parent - - By Banned for Life (Gold) Date 2008-08-01 07:36
It will really help if you want to start a search with infinite analysis at the root for a long period of time, then explore a number of options without losing the initial analysis. This is a major problem in R2.3.2a, and using preserve analysis introduces a whole different set of problems.

Alan
Parent - - By Dragon Mist (****) [hr] Date 2008-08-01 08:24
Sorry, not sure that I quite understood this. I thought that the existing Preserve Analysis serves exactly this purpose, keeping most of the analysis while wandering around left and right.
Parent - - By Banned for Life (Gold) Date 2008-08-01 08:27
Preserve Analysis keeps the deeper-depth stuff without regard to how old it is. You can only do this for so long before your available hash, and your performance, go down the tubes.

Alan
Parent - - By Dragon Mist (****) [hr] Date 2008-08-01 09:54
Hm. Does this mean the following: (with Preserve Analysis) during infinite analysis, Rybka considers as best 22. Bb3 Rc8 23. a4 etc. This is analysed through depth 22. Black however plays 22...Kh7. Now, Rybka pulls out of hash what it planned after 22...Kh7, but part of the hash is still occupied with entries concerning 22...Rc8?
Parent - - By Vasik Rajlich (Silver) [hu] Date 2008-08-05 22:00
Right, high-depth entries for 22. .. Rc8 will be preserved and make life more difficult for newer entries.

When you work with preserve analysis, it's best to clear the hash every time you move on to a new position.

Vas
Parent - - By Dragon Mist (****) [hr] Date 2008-08-06 09:12
And Persistent Hash in this particular situation does nothing better, except that it allows you to switch off your PC between sessions and start again?
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-08-06 09:51
Preserve Analysis actually locks up entries in your hash table and decreases performance on new positions.

Persistent Hash is a separate mechanism, without this particular side-effect.

Vas
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-08-05 21:59

> It will really help if you want to start a search with infinite analysis at the root for a long period of time, then explore a number of options without losing the initial analysis. This is a major problem in R2.3.2a, and using preserve analysis introduces a whole different set of problems.


This is actually the scenario for invoking Save Hash.

The persistent hash is intended for remembering the analysis higher up in the tree.

Vas
Parent - - By Banned for Life (Gold) Date 2008-08-01 07:33
Yes to BB's observation that the best N elements will be kept in the merged persistent hash file. No doubt Vas will have more to say about this after the release, but nothing discussed to this point has really been unclear.

Alan
Parent - - By Ozymandias (****) [es] Date 2008-08-01 09:53
It's unclear to me, at this point, if the file size will really remain unchanged even when adding information from another one. Can you confirm this 100%?
If that's so, what remains to be seen is the availability of some tool that monitors the file's "free space" for manual resizing, so more data can fit in.
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-08-05 21:49
The persistent hash size is always chosen by the user and is 100% independent of the actual contents of the file.

Vas
Parent - - By Vasik Rajlich (Silver) [hu] Date 2008-08-05 21:57
Actually, the persistent hash is implemented with a one-way-associative (or maybe non-associative is a better word) always-replace scheme.

This might sound a little bit bizarre, but I put a lot of thought into it and convinced myself that it is the best way.

Keep in mind three things:

1) Only a very small handful of PV entries are actually written to this file.
2) During persistent hash merging, entries are preferred by depth.
3) It is only via persistent hash merging that really high-quality high-volume information will be put together. No single computer is that productive.

It actually makes some sense for a user to back up his own persistent hash every few weeks and then merge with the previously-backed-up hash. This may give a better balance between depth and age.

In practice, it's probably not necessary for most users to dig into this too much. Simply enabling the persistent hash and merging it with everything they can get their hands on will give them 95% of the maximum performance.

Vas
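A minimal sketch of a slot-by-slot merge consistent with point 2 above (entries preferred by depth), assuming both files use the same assumed 8-byte layout and the same number of slots. This is illustrative, not the shipped merge tool.

    #include <cstdint>
    #include <vector>

    struct Entry { uint32_t key32; int16_t score; int16_t depth; };   // assumed layout

    // The merge only decides which of the two entries survives in each slot,
    // which is why the main file's size is unchanged by merging.
    void mergeByDepth(std::vector<Entry>& mainFile, const std::vector<Entry>& otherFile) {
        for (size_t i = 0; i < mainFile.size() && i < otherFile.size(); ++i) {
            const bool mainEmpty = (mainFile[i].key32 == 0 && mainFile[i].depth == 0);
            if (mainEmpty || otherFile[i].depth > mainFile[i].depth)
                mainFile[i] = otherFile[i];                           // deeper entry wins
        }
    }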
Parent - - By Banned for Life (Gold) Date 2008-08-05 23:47
Yes, it does sound bizarre! I can think of no other field where associative memory, without significant additional cost in extra storage and/or latency, is considered a detraction.

Alan
Parent - - By Vasik Rajlich (Silver) [hu] Date 2008-08-06 08:03
Several points:

1) Age is not stored in the hash table (and in fact there is no good way to store it there), so when you increase the associativity, you necessarily shift the focus from age to depth.
2) Users can always manually shift focus from age to depth by doing their own merges.
3) Increased associativity typically has the same effect as an increased hash size, which in this case doesn't seem too important.
4) No associativity is quite a bit cleaner - there are no disk reads before disk writes.

Vas
Parent - - By Banned for Life (Gold) Date 2008-08-19 16:48
4) No associativity is quite a bit cleaner - there are no disk reads before disk writes.

OK, so you are always placing the most recent values above the write threshold into PH regardless of what is already there? I was assuming it was prioritized by depth. This makes things easier to understand from an implementation standpoint, but makes the default values appear really bizarre. For this type of replacement scheme, I would go with a fairly high write threshold, probably 14 or 15, rather than 10...

Regards,
Alan
Parent - - By Vasik Rajlich (Silver) [pl] Date 2008-08-21 11:44

> OK, so you are always placing the most recent values above the write threshold into PH regardless of what is already there?


Yes.

> For this type of replacement scheme, I would go with a fairly high write threshold, probably 14 or 15, rather than 10...


The key point to understand here is that because only PV nodes are written into the table, there is a linear (rather than geometric) relationship between the write depth threshold and the number of entries which are written.

Vas
Parent - - By Banned for Life (Gold) Date 2008-08-21 13:20
The key point to understand here is that because only PV nodes are written into the table, there is a linear (rather than geometric) relationship between the write depth threshold and the number of entries which are written.

I'm not sure I understand this. As an example, let's say we have a branching factor of 2 and reach depth 13 in one second, and consider two cases: one where the write depth is set to 13, and another where it is set to 15.

For the case where write depth equals 13, new entries are made at 1 second, 2 seconds, 4 seconds, etc.
For the case where write depth equals 15, new entries are made at 4 seconds, 8 seconds, 16 seconds, etc.

Example PH writes as a function of write depth and time (assuming no change in PV):

Time (sec)   Writes so far (depth 13)   Writes so far (depth 15)
0.5          0                          0
1            1                          0
2            2                          0
4            3                          1
8            4                          2
16           5                          3
32           6                          4
64           7                          5
This function shows a geometric progression, but with a factor less than one.

Separate question: Are the persistent hash entries used in the search or only at the root? I've been assuming the latter.

Regards.
Alan
Parent - - By Vempele (Silver) [fi] Date 2008-08-21 13:23

>Separate question: Are the persistent hash entries used in the search or only at the root? I've been assuming the latter.


PV nodes is my guess.
Parent - - By Banned for Life (Gold) Date 2008-08-21 13:32
OK, that would make sense!

Alan
Parent - - By Vasik Rajlich (Silver) [pl] Date 2008-08-23 13:36
Actually, the entries are probed everywhere (given a high-enough depth). Otherwise, you'd miss important upward-propagation effects.

Take a tree like this:

A
|\
| \
B  C

Let's say that without any stored data, B would be the main line for a long time, but after a long search C proves crushing.

We'd like to be able to analyze C manually, prove that it's crushing, close Rybka, reload, and instantly get correct analysis of A.

Vas
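A hedged sketch of probing "everywhere" during the search rather than only at the root, so that the stored result for C is seen again while A is being re-analyzed. probePersistentHash is a placeholder stub, not a real Rybka function, and the threshold name is an assumption.

    #include <cstdint>
    #include <optional>

    struct PHResult { int score; int depth; };

    // Placeholder: a real implementation would read the entry's slot from disk.
    std::optional<PHResult> probePersistentHash(uint64_t /*key*/) { return std::nullopt; }

    // At any node searched deeply enough, a stored exact score can be used
    // (or seeded into the in-memory hash) before searching the subtree again.
    std::optional<int> probeBeforeSearch(uint64_t key, int depth, int probeDepthThreshold) {
        if (depth >= probeDepthThreshold) {
            if (auto hit = probePersistentHash(key); hit && hit->depth >= depth)
                return hit->score;
        }
        return std::nullopt;
    }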
Parent - - By Uly (Gold) [mx] Date 2008-08-23 13:49

> close Rybka, reload


I wasn't aware that we needed to close and reload Rybka.