Not logged inRybka Chess Community Forum
Up Topic Rybka Support & Discussion / Rybka Discussion / R.V. about Rybka 1.0 beta vs Fruit 2.1
1 2 Previous Next  
- - By Richard Vida (**) Date 2013-08-11 00:31 Edited 2013-08-11 00:34 Upvotes 2
First, I apologize for opening a new thread, but the one where this post would apply is already cluttered beyond repair ;)

I got several PMs and a couple of emails asking if I could revise the "correctness" of the RE job done on Rybka1.0beta. While I think the RE itself (as done by Zach W.) is fine, double-checking never hurts. But I very much doubt I could find some serious flaws in it. I sincerely think that the RE is correct, and all the problems (and all flamewars on this forum) are caused by _interpretation_ of the results.

Please give me 4-5 days to make annotations and I will be able to answer any questions regarding R10b. And please... only _technical_ questions - I am not here to judge.
If someone will try to push me to WII or WIG camp - I will go away.

For reference - I downloaded the first google match on "Rybka 1.0 beta": http://www.chess.com/download/view/rybka-beta-10
Any further reference by me will be to this particular binary.

What can I tell for sure right now (after 15 mins):
- The rotated bitboard code from Crafty (of that era anyway) is _not_ there. R10b uses more compact occupancy lookup tables - 64*64 arrays as opposed to Crafty's 256*64 arrays. Trick is in discarding the edge squares where it does not matter whether they are occupied or not. The cost is an extra ALU operation.
- I read that there is some controversy about the 0.0 constant in time allocation code. I missed some discussions about it, but I can assure anyone with doubts that there is indeed an unusual comparison between an integer value and a floating-point constant (0.0) in the binary (I can cite the relevant disassembly lines if wanted). I wonder.. did I miss something?
- Overall, Rybka1.0beta is - by todays standards - a very simple engine. I really doubt there is anything else to be found, but we shall see...

Recently an interesting discussion aroused about a specific position where both F21 & R10b failed in a very similar way. I see that Robert Hyatt did some debugging with Fruit and found the culprit - I am now really curious if I can found the same bug in Rybka...

I will update this thread as I progress with RE... Meanwhile feel free to ask any _technical_ questions. But again, I do this voluntarily and in my free time, so please don't ask me whether I think Vas is guilty of code copying or not. It is not my duty to judge.

Richard
Parent - By Rebel (****) Date 2013-08-11 00:43
Good to see you here Richard.
Parent - - By marcelk (***) Date 2013-08-11 01:27 Edited 2013-08-11 12:48
The crafty-19.7beta sources that Hyatt sent me, dated January 18, 2004, have these edge-discarding 64x64 tables. They also appear 'officially' in crafty 19.9 for the first time a few days later on January 23, 2004. It is documented in main.c as follows:

                         changes to non-COMPACT_ATTACKS code to shrink the    *
*           array sizes.  this was done primarily for the Opteron where the   *
*           COMPACT_ATTACKS stuff was a bit slower than the direct look-up    *
*           approach.  this change reduced the table sizes by 75%.


EDIT: fixed typo (16.7beta --> 19.7beta). Thanks Ed.
Parent - By Rebel (****) Date 2013-08-11 05:32
The crafty-16.7beta

Typo or real?

Not 19.7 ?

Second time you state 16.7
Parent - - By Rebel (****) Date 2013-08-11 06:24
Good morning Richard,

- I read that there is some controversy about the 0.0 constant in time allocation code. I missed some discussions about it, but I can assure anyone with doubts that there is indeed an unusual comparison between an integer value and a floating-point constant (0.0) in the binary (I can cite the relevant disassembly lines if wanted). I wonder.. did I miss something?

Guess so, 0.0 was a hard nut to crack, took some months.

Info :
http://www.top-5000.nl/fadden.htm
http://www.top-5000.nl/evidence.htm#C6

Recently an interesting discussion aroused about a specific position where both F21 & R10b failed in a very similar way. I see that Robert Hyatt did some debugging with Fruit and found the culprit - I am now really curious if I can found the same bug in Rybka...

Have a look here first, it might save some of your time.

A last piece of technical information here.

Looking forward to your findings.

I am also curious about some parts that weren't investigated and if they are the same, on top of my head:

1. Move format
2. Repetition handling
3. Handling of special moves (castling, en passant, promotions)
4. Search differences.
5. Related to (3) move generation.
6. Hash table (structure, code and Zobrist handling)

As you certainly remember there is Hex-rays source code available, not sure if that helps you in your new RE adventure, the previous was quite impressive :wink:
Parent - - By Richard Vida (**) Date 2013-08-11 13:38
From your page at http://www.top-5000.nl/fadden.htm:

> Lastly, Rybka uses a lot of variables in its time control Fruit does not use at all. Although given fancy names by Fadden (ignoreClockFlag, depthLimit, stopSearch etc.) no reference nor description is given to its meaning.


RYBKA                         FRUIT
ignoreClockFlag               no such code
depthLimit                    no such code
stopSearch                    no such code
nodeTickLow = 1024;           no such code
bestMove   = 0;               no such code
moveScore  = 0;               no such code
depthScore = 0;               no such code
bad_1      = 0;               no such code
bad_2      = 0;               no such code
change     = 0;               no such code
easy       = 0;               no such code
clackFlag  = 0;               no such code
nodeTickHigh = 1;             no such code

You seem to have missed something important. These are indeed present in Fruit, and are initialized by a call to search_clear() from line 352 of protocol.cpp. In Fadden's decompile these look like "orphan" variables, where in fact they might be members of search_*_t structures (and their memory layout strongly suggest that).
Parent - - By Richard Vida (**) Date 2013-08-11 13:42 Edited 2013-08-11 13:45
There is my version of decompiled parse_go(). The first line certainly looks very suspicious...

void __cdecl parse_go(char *gostr)
{
  search_info.last_time = 0.0;    // this member does not seem to be used anywhere else (!)

  search_input.time_limit1 = -1;
  search_input.time_limit2 = -1;

  int mtg = 25;
  int movetime = 0;

  search_input.infinite = false;
  search_input.depth_limit = 0x7FFFFFFFu;

  search_info.stop = false;
  search_info.check_nb = 1024;

  search_best.move = 0;
  search_best.value = 0;

  search_root.last_value = 0;
  search_root.bad_1 = false;
  search_root.bad_2 = false;
  search_root.root_changed = false;
  search_root.easy = false;
  search_root.stop_flag = false;

  search_info.nodes_div_1024 = 1;

  bool go_infinite = false;
  bool go_ponder = false;
  int go_mtg = 25;
  int go_winc = 0;
  int go_wtime = 0;
  int go_binc = 0;
  int go_btime = 0;
  int go_movetime = 0;

  strtok(gostr, " ");
  char *s = strtok(0, " ");
  if (s) {
    do {
      if      (string_equal(s, "winc")) go_winc = atol(strtok(0, " "));
      else if (string_equal(s, "wtime")) go_wtime = atol(strtok(0, " ")) - 5000;
      else if (string_equal(s, "binc")) go_binc = atol(strtok(0, " "));
      else if (string_equal(s, "btime")) go_btime = atol(strtok(0, " ")) - 5000;
      else if (string_equal(s, "depth")) search_input.depth_limit = max(atol(strtok(0, " ")) + 2, -1);
      else if (string_equal(s, "is_infinite")) go_infinite = true;
      else if (string_equal(s, "movestogo")) go_mtg = min(30, max(1, atol(strtok(0, " "))));
      else if (string_equal(s, "movetime")) go_movetime = atol(strtok(0, " "));
      else if (string_equal(s, "ponder")) go_ponder = true;
    } while (s = strtok(0, " "));
    mtg = go_mtg;
    movetime = go_movetime;
  }

  // search_clear(); Fruit initializes various search_*_t members here

  int time, inc;

  if (pos.stm) {
    time = go_btime;
    inc = go_binc;
  }
  else {
    time = go_wtime;
    inc = go_winc;
  }
  if ((double)go_movetime > 0.0) {
    search_input.time_limit1 = 5 * movetime;
    search_input.time_limit2 = 1000 * movetime;
  }
  else {
    if (time > 0) {
      int time_max = time - 5000;
      int target = inc * (mtg - 1) + time - 5000;
      int alloc1 = target / mtg;
      if (target / mtg >= time_max) alloc1 = time_max;
      int alloc2 = target >> 1;
      search_input.time_limit1 = alloc1;
      if (alloc2 <= alloc1) alloc2 = alloc1;
      if (alloc2 >= time_max) alloc2 = time_max;
      search_input.time_limit2 = alloc2;
    }
  }
  if (go_infinite || go_ponder) search_input.infinite = true;
  searching = true;
  infinite = (go_infinite || go_ponder);
  delay = false;
  search();
  searching = false;
  delay = infinite;
  if (!delay) send_best_move();
}
Parent - By Richard Vida (**) Date 2013-08-11 13:55

Memory layout of serach_*_t equivalent structures:

.data:00667A18 search_input search_input_t <?>
.data:00667A28 search_info  search_info_t <?>
.data:00667A80 search_root  search_root_t <?>

00000000 search_input_t  struc ; (sizeof=0x10)
00000000 time_limit1     dd ?
00000004 time_limit2     dd ?
00000008 depth_limit     dd ?
0000000C infinite        db ?
0000000D _padding        db 3 dup(?)
00000010 search_input_t  ends

00000000 search_info_t   struc ; (sizeof=0x58)
00000000 jmp_buf         dd 16 dup(?)  ; buffer for setjmp()
00000040 stop            db ?
00000041 _alignment      db 3 dup(?)
00000044 check_nb        dd ?
00000048 start_time      dd ?
0000004C nodes_div_1024  dd ?
00000050 last_time       dq ?
00000058 search_info_t   ends

00000000 search_root_t   struc ; (sizeof=0xC)
00000000 last_value      dd ?
00000004 bad_1           db ?
00000005 bad_2           db ?
00000006 root_changed    db ?
00000007 easy            db ?
00000008 stop_flag       db ?
00000009 _padding        db 3 dup(?)
0000000C search_root_t   ends
Parent - - By Rebel (****) Date 2013-08-11 16:11
search_info.last_time = 0.0;    // this member does not seem to be used anywhere else (!)

That's odd indeed. My hex-rays notes state:

//                                RYBKA                 FRUIT
//                                =====                 =====     
  dbl_667A78 = 0.0;            // gTimeDouble = 0.0     Fadden name calling
  dword_667A18 = -1;           // time_limit_1 = -1;    no initialization in Fruit
  dword_667A1C = -1;           // time_limit_2 = -1;    no initialization in Fruit
  v2 = 25;                     // movestogo = 25;       Hex-rays oddity ?? (v2 is also v44)
  v3 = 0;                      // movetime = 0;         Hex-rays oddity ?? (v3 is also v45)


You are right, the dbl_667A78 is death code. It's not in Fruit either. It's probably why I missed the relevance.
Parent - - By Richard Vida (**) Date 2013-08-11 17:02 Edited 2013-08-11 17:04

> You are right, the dbl_667A78 is death code. It's not in Fruit either. It's probably why I missed the relevance.


It _IS_ in Fruit. In search_clear() which is called from parse_go(). Fruit uses this variable in search_send_stat() to periodically send info every second. Rybka does not send periodic info, so this assignment is useless - but still there.
Parent - By Rebel (****) Date 2013-08-11 17:24
But not in the UCI part of Fruit, same misunderstanding. My focus is on the code differences between R10b and Fruit in the UCI  parse_go part.
Parent - - By Rebel (****) Date 2013-08-11 15:56
You seem to have missed something important. These are indeed present in Fruit,

But contrary to Fruit not in routine itself which is what I wanted to accentuate. After all the accusation was that Vasik copied Fruit's time control hence my focus is (entirely) on the differences.

Alright, I should rephrase that sentence,

Lastly, Rybka uses a lot of variables in its time control Fruit does not use at all.

to

Lastly, Rybka uses a lot of variables in its time control Fruit does not use here.
Parent - - By Richard Vida (**) Date 2013-08-11 16:49

> But contrary to Fruit not in routine itself which is what I wanted to accentuate.


Since it is called from this routine, technically it _is_ in this routine. Just to verify I compiled Fruit with MSVC2010 and indeed the compiler inlined the search_clear() function. Same with GCC -O3 -flto.
Parent - By Ray (****) Date 2013-08-16 09:56

> Since it is called from this routine, technically it _is_ in this routine. J


I don't agree, it demonstrates a difference between the two.
Parent - - By Richard Vida (**) Date 2013-08-14 11:56 Edited 2013-08-14 12:16

> I am also curious about some parts that weren't investigated and if they are the same, on top of my head:
>
> 1. Move format


Rybka uses 2 move formats. Due to this Rybka has to use 2 separate move_do() functions.

In move generators and in the search where it does not consider underpromotions it uses a 15 bit move format:

bits 0.. 5 (mask 0x003f) = to square (0-63)
bits 6..11 (mask 0x0fc0) = from square (0-63)
bit     12 (mask 0x1000) = promotion flag
bit     13 (mask 0x2000) = castle flag
bit     14 (mask 0x4000) = en passant flag


UCI parsing accepts all (under)promotions and the move format is 18 bits wide:

bits 0.. 5 (mask 0x003f ) = to square (0-63)
bits 6..11 (mask 0x0fc0 ) = from square (0-63)
bit     12 (mask 0x1000 ) = Q promotion flag
bit     13 (mask 0x2000 ) = R promotion flag
bit     14 (mask 0x4000 ) = B promotion flag
bit     15 (mask 0x8000 ) = N promotion flag
bit     16 (mask 0x10000) = castle flag
bit     17 (mask 0x20000) = en passangt flag
Parent - By bob (Gold) Date 2013-08-14 16:32
While we are here, Fruit is somewhat different.

Fruit uses this (bit 0 = rightmost or LSB)

0-5 = to
6-11 = from
12-13 promotion piece (0-knight, 1-bishop, 2-rook, 3-queen)
14-15 move type (0-normal, 1=castle, 2-promotion, 3-en passant)

Pretty inefficient for Rybka to use 4 flags to indicate a promotion piece, but it shows a difference anyway.

Instead of a 1 bit promotion flag, and 4 bits to indicate which piece, I wonder why he would not use the obvious 3 bits for promotion, 0=none, otherwise use the normal piece number.  Makes everything fit in 16 bits, one could then avoid the duplicated do_move() code.  In any case...
Parent - - By Rebel (****) Date 2013-08-15 06:35
In move generators and in the search where it does not consider underpromotions it uses a 15 bit move format:

bits 0.. 5 (mask 0x003f) = to square (0-63)
bits 6..11 (mask 0x0fc0) = from square (0-63)
bit     12 (mask 0x1000) = promotion flag
bit     13 (mask 0x2000) = castle flag
bit     14 (mask 0x4000) = en passant flag


Is this decalred as short or integer in R10b ?
Parent - By Richard Vida (**) Date 2013-08-15 15:02
It is handled as unsigned int everywhere except hashtable entry where it is truncated to short
Parent - - By Richard Vida (**) Date 2013-08-14 12:13

> I am also curious about some parts that weren't investigated and if they are the same, on top of my head:
>
> 1. Move format
> 2. Repetition handling


Repetion detection is done by comparing the current hashkey to previous hashkeys stored in a flat array (like in most engines).

bool board_is_repetition(bool check_rule50)
{
  if (check_rule50 && board.ply_nb > 100)
    return true;
 
  for (int i = 4; i <= board.ply_nb; i += 2) {
    if (board.stack[board.sp - i] == board.hashkey)
      return true;
  }
  return false;
}

Almost same as Fruit except that Rybka does not care about the case where ply_nb is equal to 100 (where fruit returns true if not checkmated). Rybka checks the 50 move rule only in non-QS PV nodes.
Note to Ed: This function is inlined too (in all search functions)
Parent - By Rebel (****) Date 2013-08-15 06:42
Almost same as Fruit except that Rybka does not care about the case where ply_nb is equal to 100 (where fruit returns true if not checkmated). Rybka checks the 50 move rule only in non-QS PV nodes.

Crazy Vas :grin:

Note to Ed: This function is inlined too (in all search functions)

Got it.
Parent - - By Richard Vida (**) Date 2013-08-14 15:22

> I am also curious about some parts that weren't investigated and if they are the same, on top of my head:
>
> 1. Move format
> 2. Repetition handling
> 3. Handling of special moves (castling, en passant, promotions)


Here the code itself cannot be directly compared between Fruit and Rybka (due to different board representation). I would instead briefly describe move_do() and move_undo() routines of both.

Steps where they are identical:
- updating 2 pst values (opening, endgame)
- updating hash key, pawn hash key, material key
- calculating new castle rights
- pushing the hash key into an array for repetition detection

Steps where they differ:
- Fruit maintains piece-lists, not present in Rybka
- Fruit maintains a pseudo "bitboard" for pawns, Rybka has the real thing
- Rybka needs to update 4 rotations of occupancy bitboard
- Rybka updates a rough estimate of material balance with weights of 1:3:3:5:10 (in the eval this value is then corrected by a delta obtained from the material table)
- Fruit has 16*16 square mailbox, Rybka 8*8
- Fruit supports all promotions, Rybka only queening

A point of interest might be that both Rybka and Fruit treat null move as irreversible (resetting the 50 move rule counter and ignoring repetitions through null move). AFAIK there is no general consensus whether it should be treated that way.
Parent - - By Richard Vida (**) Date 2013-08-14 16:20 Edited 2013-08-14 16:23
decompiled board representation:

struct undo_t {   // sizeof(undo_t) = 0x38
  int cr;         // castle rights (named flags in fruit)
  int ep_square;
  int ply_nb;     // 50 move rule counter
  int sp;         // hash history pointer
  int mg_pst;
  int eg_pst;
  uint64_t hash_key;
  uint64_t pawn_key;
  int mtrl_key;
  int mtrl_value; // 1:3:3:5:10
  int capture;    // last captured piece
  int _padding;   // make sizeof(undo_t) divisible by 8
};

struct board_t {  // sizeof(board_t) = 0x21d0

  // 8x8 mailbox
  int squares[64];

  // occupancy bb by color
  uint64_t w_occ, b_occ;

  // piece bb by color;
  uint64_t w_pawns, b_pawns;
  uint64_t w_knights, b_knights;
  uint64_t w_bishops, b_bishops;
  uint64_t w_rooks, b_rooks;
  uint64_t w_queens, b_queens;
  uint64_t w_king, b_king;

  // 4 rotated occupancy bitboards
  uint64_t occ;
  uint64_t occ_90;
  uint64_t occ_lt;
  uint64_t occ_rt;

  // side to move (WHITE=0; BLACK=1)
  int stm;

  // unused (alignment to nearest QWORD ?)
  int _alignment;

  // board state (epsq, castle rights, etc)
  undo_t state;

  // hash history for rept detection
  uint64_t stack[1024];
};

extern board_t board; // .data:00667A90
Parent - - By Rebel (****) Date 2013-08-15 06:07
Thanks for the info, much appreciated.

Question, I see that you posting your analysis in structures (which is convenient for the eye) but my question is how reliable that is. Isn't that kind of information (structures) not lost during compilation time?
Parent - - By Richard Vida (**) Date 2013-08-15 15:26 Edited 2013-08-15 15:31
It tried to reproduce it as reliable as possible. Ordering of the member fields are given by how they are laid out in memory. Exact structure sizes can be deduced by looking at how large chunks are memset-ed during initialization, or memcpy-ed elsewhere. I even left in some alignment and padding fields (probably added by compiler) deliberately to match exactly the binary.

In the above example, it is somewhat ambiguous whether undo_t is contained in board_t or its members are duplicated (like in fruit). I choose the former just because the whole thing is copied with a single rep/movsd in move_do().
Edit: In real source code it would be more convenient to use the latter (writing eg. board.state.ep_square requires more typing than board.ep_square)
Parent - By Rebel (****) Date 2013-08-17 07:04
That's what I assumed and given that I am okay with your (pleasing the eye) presentation. For a moment I thought the decompilers had become so smart they could detect structures flawlessly, which IMO is impossible. It practically means no exact comparison can be made with Fruit which is a pity. Or do you see that differently?
Parent - By Rebel (****) Date 2013-08-15 08:01
Steps where they are identical:
- updating 2 pst values (opening, endgame)
- updating hash key, pawn hash key, material key
- calculating new castle rights
- pushing the hash key into an array for repetition detection


Don't we all :wink:

Steps where they differ:
- Fruit maintains piece-lists, not present in Rybka
- Fruit maintains a pseudo "bitboard" for pawns, Rybka has the real thing
- Rybka needs to update 4 rotations of occupancy bitboard
- Rybka updates a rough estimate of material balance with weights of 1:3:3:5:10 (in the eval this value is then corrected by a delta obtained from the material table)
- Fruit has 16*16 square mailbox, Rybka 8*8
- Fruit supports all promotions, Rybka only queening


Cool.

A point of interest might be that both Rybka and Fruit treat null move as irreversible (resetting the 50 move rule counter and ignoring repetitions through null move). AFAIK there is no general consensus whether it should be treated that way.

I do the same (both 50 move and repetion should be resetted) except that I do check for repetitions during null move. Why not? Not that it matters elo-wise I guess.
Parent - - By Richard Vida (**) Date 2013-08-15 22:02
Today my post is about

> 6. Hash table (structure, code and Zobrist handling)


Zobrist hash keys are calculated and handled identically in both programs, only the random numbers are different.

The transposition table:

struct trans_t {
  entry_t *table;
  int _unused_04;
  uint32 mask;
  int _unused_0c;
  int date;
  uint32 size;
  int age[DateSize];
} trans; // .data:0066C480

Fruit has some extra fields here for collecting hash usage statistics which Rybka does not. The age[] array is precalculated identically. DateSize is 256 for Fruit, but interestingly, only 4 for Rybka.
In hash allocation code I found something unusual (a bug? or am I missing something? any help welcomed). See the commented line in the code below:

void trans_alloc(uint32 target)  // .text:0040E230
{
  if (target < 2) target = 64;
  target = target * 1024 * 1024;
  for (uint32 size = 1; size != 0 && size <= target; size *= 2)
    ;

  size /= 2;
  size /= sizeof(entry_t);

  trans.size = size + (ClusterSize - 1);

  // Why subtracting 2 ? Normally, I would expect -1 here.
  // Now this scheme can address only even entries
  trans.mask = size - 2;

  trans.field_0c = 0;
  trans.table = (entry_t *)malloc(trans->size * sizeof(entry_t));
  trans.clear();
}

I wonder, why would someone want to increase the addressing granularity?
Parent - - By Richard Vida (**) Date 2013-08-15 22:07
Hash table entry format is virtually identical, though the fields appear in slightly different order.

Rybka                            Fruit

struct entry_t {                 struct entry_t {   
  uint32 lock;                     uint32 lock;    
  uint16 move;                     uint16 move;    
  sint8 depth;                     sint8 depth;    
  uint8 date;                      uint8 date;     
  sint16 min_value;                sint8 move_depth;
  sint16 max_value;                uint8 flags;    
  sint8 move_depth;                sint8 min_depth;
  uint8 _unused;    // flags?      sint8 max_depth;
  sint8 min_depth;                 sint16 min_value;
  sint8 max_depth;                 sint16 max_value;
};                               };
                                
Meaning and usage of the fields is indentical. It is worth noting that the "flags" field is unused in Fruit. Strangely enough, rybka has an unused byte too. It would be natural if it appearead at the end, but a structure containing a hole (which is not for alignment purposes) is somewhat suspicious... Or let's just say it was for some experimental feature that was removed before releasing the beta... :wink:
Parent - By Richard Vida (**) Date 2013-08-15 23:06 Edited 2013-08-15 23:15
Now let's look at trans_store() function. Semantically identical, but speed optimized in Rybka. First, Rybka omits all hash usage stats stuff. While Fruit has only 1 generic function, Rybka has 4(!) specialized instances. All 4 are derived from the generic one. At the call site it is already known which blocks of the generic code would be never executed thus calling a specialized version will save some CPU cycles (and branch mispredictions).

+---+--------------------+-----------------+--------------+--------------+---------------------+----------------+
| # | call site          | move available? | lower bound? | upper bound? | specialization      | address        |
+---+--------------------+-----------------+--------------+--------------+---------------------+----------------+
| 1 | pv move found      |      yes        |    yes       |     yes      | trans_store_exact() | .text:0040E330 |
| 2 | best_score > beta  |      yes        |    yes       |     no       | trans_store_min()   | .text:0040E420 |
| 3 | best_score < alpha |      no         |    no        |     yes      | trans_store_max()   | .text:0040E5F0 |
| 4 | null-move cutoff   |      no         |    yes       |     no       | trans_store_null()  | .text:0040E510 |
+---+--------------------+-----------------+--------------+--------------+---------------------+----------------+

For example in a fail low situation 2 comarisons can be omitted (depth >= trans->move_depth and depth >= trans->min_depth) for there is no best move and no lower bound to store.
Only trans_store_exact() has all three code blocks:


void trans_store_exact(short move, int depth, int value)
{
  int best_score = 0;
  entry_t *best_entry;
  entry_t *entry = &trans.table[trans.mask & board.state.hashkey];
  for (int i = 0; i < ClusterSize; i++, entry++) {
    if (entry->lock == board.state.hashkey) {
      entry->date = trans.date;
      if (depth > entry->depth)
        entry->depth = depth;
 
      /* move available [block present in #1 and #2] */
      if (depth >= entry->move_depth) {
        entry->move_depth = depth;
        entry->move = move;
      }
     
      /* lower bound [block present in #1, #2 and #4] */
      if (depth >= entry->min_depth) {
        entry->min_depth = depth;
        entry->min_value = value;
      }
     
      /* upper bound [block present in #1 and #3] */
      if (depth >= entry->max_depth) {
        entry->max_depth = depth;
        entry->max_value = value;
      }
      return;
    }
    int score = trans.age[entry->date] - entry->depth;
    if (score > best_score) {
      best_entry = entry;
      best_score = score;
    }
  }
  best_entry->lock = HIDWORD(board.state.hashkey);
  best_entry->date = trans.date;
  best_entry->depth = depth;
  best_entry->move_depth = depth;  // cases [#1,#2] else zero
  best_entry->move = move;         // cases [#1,#2] else zero
  best_entry->min_depth = depth;   // cases [#1,#2,#4] else zero
  best_entry->min_value = value;   // cases [#1,#2,#4] else zero
  best_entry->max_depth = depth;   // cases [#1,#3] else zero
  best_entry->max_value = value;   // cases [#1,#3] else zero
}

I will leave the comparison to Fruit code for the reader...
Parent - - By Rebel (****) Date 2013-08-16 06:55
Strangely enough, rybka has an unused byte too. It would be natural if it appearead at the end, but a structure containing a hole (which is not for alignment purposes) is somewhat suspicious... 

Not necessarily suspicious, I actually keep 2 free bytes for experiments. On top of my head, idea's such as :

1. A second best move.

2. Maintain the score difference with the previous result.

3. Marking positions to extend in the next iteration.

4. Another 8 or 16 byte hash-key for collision checking.

-----------------

uint8 _unused;    // flags?

What's odd is that in the Rybka format there is no "flags". Perhaps Vas uses 2 bits of another structure entry? A hash table that returns real scores only would be a novelty, likely a very bad novelty :lol:
Parent - - By Richard Vida (**) Date 2013-08-16 10:15 Edited 2013-08-16 10:20

> What's odd is that in the Rybka format there is no "flags". Perhaps Vas uses 2 bits of another structure entry? A hash table that returns real scores only would be a novelty, likely a very bad novelty :lol:


You mean flags like FLAG_LOWER_BOUND and FLAG_UPPER_BOUND? With this hashing scheme these are not necessary. If min_depth != 0 then min_value is a lower bound, if max_depth != 0 then max_value is an upper bound, if min_value = max_value it is an exact score.
Parent - - By Richard Vida (**) Date 2013-08-16 10:32 Edited 2013-08-16 10:35
This is how probing looks like:

  entry_t entry = &trans.table[board.state.hash_key & trans.hash_mask];
  for (int i = 0; i < ClusterSize; i++, entry++) {
    if (entry->lock == HIDWORD(board.state.hash_key)) {
      hash_move = entry->move;
      entry->date = trans->date;
      if (entry->min_depth >= depth && entry->min_value >= scout) {
        iid_move = hash_move;
        return entry->min_value;
      }
      if (entry->max_depth >= depth && entry->max_value < scout)
        return entry->max_value;
      break;
    }
  }
Parent - - By bob (Gold) Date 2013-08-19 00:47
the flagless approach is unusual.  As is the double bound storing.  You'd have to ask Fabien why, I suppose...

I once did a hash something similar, storing both bounds.  I did it in at least two versions, one was in a non-released version of 23.0 where I modified the search to use the mtd(f) idea.  The first was back in version 12.0 (not the released 12.0 version).  I believe Don was exhorting its effectiveness and I tried it.  Since you expect to "cross over" the actual score, by searching null-window searches for efficiency, sometimes multiple times, saving the upper AND lower bounds is almost a requirement.

But if not using mtd(f), I've found no benefit at all, and it stretches the entry a bit because you have to store two scores, and two drafts.  But not FOUR as in fruit...

So two things that stand out to me (not counting how closely the code matches Fruit).

(1) two bounds...

(2) 4 different draft values...

The code match makes it beyond a coincidence...
Parent - - By Rebel (****) Date 2013-08-19 09:44
the flagless approach is unusual.  As is the double bound storing. 

The code match makes it beyond a coincidence...

Yikes...

Shall we have a "neutral" look?

Hash table implementation follows strict rules a programmer can't sin against else you will end-up with a buggy hash table. As such the symantical equivalence between engines is already extremely high, the code must do the same and there is little room for variation. The usual exceptions are the bucket size (about everyone uses 4, right?), replacement scheme (about everyone uses the draft, right?) and the aging. It's the "aging" system where the flexibity is.

Bucket-size:
Had already in the mid 90's a bucket size of 4x16=64 bytes because already back then caching became important. I have thouroghly tested it back then and it was the best. We had multiple discussion about it back then in RGCC and Vincent was right when he pushed the 4x16 bucket size overthere. It's still in use by most (modern) programs if I have understood the authors well. Right so far?

Replacement scheme:
Various experiments have been done, seems that draft based replacement is still the standard nowadays? Right?

Aging:
It's where engines start to differ in solutions.

flags:
And now the point Fruit starts to differ from the classic (2-bit) "flags" approach (lowerbound, upperbound, realscore). I might assume Fabien was aware of the classic 2-bit flag approach but he choose something else (the 2 bounds + related depths) because 1) it does exactly the same but 2) performed better. And that's the ONLY topic we are talking about right now, a different "flag" system. Agree?

Arrived here we start to differ, where you immediately conclude "copied" because it fits your (unproven) theory Vas copied Fruit as a whole a neutral person would be more careful and consider:

1. That Vas (by his own voluntary words 2 weeks after the Rybka 1.0 beta release) went forwards and backwards to Fruit's source code and took many things, saw the Fruit "flag" idea, tested it in his own code and used it. Like you used the Fruit interpolation idea in Crafty.

2. In one of the ICGA attack documents Zach (incorrectly) accused Vas using Fruit interpolation code (BTW, see how ugly that sounds since you have done it yourself?). He was proven wrong because the idea of interpolation was already known for 5 years. Following Zach's logic Fabien stole the interpolation idea from the author of Phalanx, what a nonsense.

Point is, how new was Fruit's hash table "flag" replacement idea? If it was already known, how do you possibly conclude Vas took the idea from Fruit?

Meaning, do your homework first before you accuse. Then we will reason.
Parent - - By Richard Vida (**) Date 2013-08-19 11:56

> Bucket-size:


They don't really use buckets. Although they indeed do probe 4 consecutive entries, they can start with an arbitrary one.

> Aging:
> It's where engines start to differ in solutions.


Rybka ages entries exactly same way as Fruit:

age = (age_delta << 8) - depth

where

age_delta = ((trans->date - entry->date) & (DateSize-1)) + 1


> flags:
> And now the point Fruit starts to differ from the classic (2-bit) "flags" approach (lowerbound, upperbound, realscore). I might assume Fabien was aware of the classic 2-bit flag approach but he choose something else (the 2 bounds + related depths) because
>1) it does exactly the same but
>2) performed better.


1) false
2) Performs about the same. Besides that, performance was not a priority for Fabien
Parent - By Rebel (****) Date 2013-08-19 21:53
They don't really use buckets. Although they indeed do probe 4 consecutive entries, they can start with an arbitrary one.

That's what I meant.

Rybka ages entries exactly same way as Fruit:

Okay.

1) false

Not "false". I was merely referring to your previous post which explains how Fruit deals with transpositions that can be used to cut-off the tree. I mean, there has to be information in the TT to make that decision whether it is by the classic 2-bit flags or something else, as long as it is present.
Parent - By bob (Gold) Date 2013-08-19 17:37 Edited 2013-08-19 18:30
Zach didn't "accuse" regarding the interpolation.  He called it ANOTHER similarity to Fruit.  Until Fruit came along, it was a rarely mentioned approach...  The point of Zach's report is that there are too many similarities to be accidental. 

Back to the hashing.  I used the words "code similarity".  Did you compare Fruit's code to the code Richard posted?  Didn't think so.  There are a LOT of different options in terms of hash implementations.  I can probably go back to old Crafty versions and pull out at LEAST 4 or 5 different approaches.  Yet Fruit and Rybka end up basically identical?  AGAIN? 

For replacement, NO, draft is not the primary issue.  This has been known for many years.  When you complete a node, you MUST store that hash entry or suffer a significant performance hit.  If you just use draft, you end up with a table full of deep draft entries that don't help the search much at all where the help is needed.  Even in Belle, Ken used a two-tier hash approach, one table depth-preferred, one "always replace no matter what".  There are lots of options built around that theme.

For "buckets" I suspect the answer is "no".  Just probing 4 consecutive 16 byte entires is not the "bucket concept".  In crafty, All probes to a bucket go to the first entry, so that the entire bucket is guaranteed to lie within a single cache block, every time.  Fruit does not appear to do this in 2.1..  I did not look at the older versions, however.  So it gets no benefit from the "bucket concept" since 3 of every 4 probes will span 2 cache blocks...

The "backward and forward" excuse wears thin.  If you STUDY a program, for any amount of time, then go write your own, the code, etc are NOT going to match so closely.  Unless you took the original as a guideline and wrote your code while directly looking at the original.  That's not "taking an idea" by any stretch of the definition.

I won't claim to have looked at every source program available.  But I will claim to have looked at most, over the years.  Not having EXACT, LOWER and UPPER is something I had not seen until Fruit.  Doesn't mean it wasn't used, but I had not seen it.  I had only seen the two-bound approach in programs that used mtd(f), for obvious reasons.  Whether Fabien started with mtd(f) and then gave up, or whether he started with normal PVS, tried mtd(f) and then gave up on it and switched back to pvs, I don't know.  I certainly followed that approach.  But when I switched back, I dumped the dual-bound hashing to keep the hash entry at 16 bytes, as I had stretched it to 20 when adding a second bound and depth for it.

I believe I have done my homework.  You should also.  Start by comparing Fruit and Rybka hashing.  Then look at non-fruit programs.  Arasan is easy to find.  Uses same basic idea used by everyone, as mentioned in my previous post.  One bound, flags, etc.  The only examples I found for this dual-bound approach was the various fruit/Fabien derivatives such as toga, gnu chess, etc.  Doesn't mean there are none, of course.  But in any case, this is yet another similarity to Fruit.

As to the "only topic" question, there are two topics at hand.

1.  The odd table structure common to fruit/rybka.

2.  the code match between fruit and rybka hashing.

As I have said repeatedly, if this were the ONLY perfect match between the two, it would still not be very significant.  But there are so MANY matches between the two that it is impossible to believe it is just the result of pure luck.

BTW as far as aging goes, I believe the "Cray Blitz algorithm" is used universally today.  That is, an "age" variable that is incremented by one each time a new search (not new iteration) is started.  Store the rightmost N bits of this age value in the table.  Then the replacement strategy will choose to overwrite "old" entries regardless of their draft.  This replaced the previous approach of going thru and marking every hash entry as "invalid" (so that the scores/best moves could still be used) which resulted in their getting replaced first.  Only problem was that large loop through the entire hash array was expensive.  About the only variability I see in that approach is "how many bits to store".
Parent - - By Rebel (****) Date 2013-08-18 23:12
I hardly look at the work of others so I am unaware of this technique. Therefore allow me this question, how common is this technique? Prior to Fruit or unique by Fabien?
Parent - - By Richard Vida (**) Date 2013-08-19 12:06 Edited 2013-08-19 12:09
For normal PVS engines it was quite unusual at the time. For MTD(f) search it is common to maintain both bounds.
Parent - - By Rebel (****) Date 2013-08-19 14:53
From the old CCC archives this might interest you, 7 days after the R10b release. It would explain the origin of the TT classification. Furthermore I do remember a MTD(f) discussion between Fabien and Vas, perhaps you can look for it yourself.

http://www.stmintz.com/ccc/index.php?id=469380

On December 12, 2005 at 04:58:52, Chrilly Donninger wrote:

I did not follow the CCC-discussion the last months/years. So please do not "kill" me if I am telling old stuff. I got questions about Rybka from the Hydra-sponsor so I read yesterday evening some of IM Vasik Rajlichs very interesting postings in the archive.
Vasik mentions several time MTD(f) and his experience with it. But when playing with Rybka, it looks like a normal PVS-search. I looked therefore also in the Rybka-Code. What I have seen so far is typicall PVS-Code.

From this first look I can say that Rybka is not at all a Fruit clone.

Chrilly

---

Yes, I dropped MTD (f) a long time ago - it just doesn't work. In the best case you save a tiny handful of nodes, and for this you have to live without true bounds in the search. Even very simple things like how to do a null move become really complicated.

Vas
Parent - - By bob (Gold) Date 2013-08-19 18:12
One has to wonder when he used it.  Not in 1.4, 1.5 or 1.6.1...  not in 1.0 beta.
Parent - By Banned for Life (Gold) Date 2013-08-19 19:40
There is no reason to believe this experiment left his basement.
Parent - - By Banned for Life (Gold) Date 2013-08-15 22:53
I wonder, why would someone want to increase the addressing granularity?

I'm guessing that Vas was probably doing this to try to align TT accesses with cache lines, but he didn't get it right. Fruit does no alignment at all, so either 75% of TT accesses will hit two cache lines, or 100% if the 16-byte hash entries are not 16-byte aligned.

There was a discussion about this many years ago and Vempele suggested a simple working method for using a four entry set locked to 64 byte cache lines, with the ability to walk up to three hash entries without leaving a cache line.
Parent - - By Richard Vida (**) Date 2013-08-15 23:44 Edited 2013-08-15 23:49

> There was a discussion about this many years ago and Vempele suggested a simple working method for using a four entry set locked to 64 byte cache lines, with the ability to walk up to three hash entries without leaving a cache line.


This is indeed how most engines do it.

As a side note, it would be worth looking up the hash allocation code in the pre-beta Rybka. Crafty had cache-aligned allocation since ages. It would be pretty damning if the older Rybka had a nicely aligned hash and suddenly Vas "forgot" how to do it properly.
Parent - By bob (Gold) Date 2013-08-16 02:23 Edited 2013-08-16 02:29
One quick note.  I just checked the 19.x code (19.7beta specifically) and it does the usual 64 byte alignment trick.  I need more mental energy to look at the early versions.  But a couple of things that suggest the old versions copied crafty exactly...  Mark found some places where I initialized things, and I had an error or two where one piece was initialized twice and another piece was not initialized at all.  This was in Rybka 1.6.1 exactly.  I'd be totally surprised to find that Rybka 1.6.1's hashing was different from Crafty after all the copied code we found...

I am sure that one or two here will offer perfectly plausible explanations why this is completely unimportant, however... 

I'll try to poke around once I recover from my annual trip to the dermatologist...

edit:

there are two alignment issues.  The first is to force the basic hash table to lie on a 64 byte address alignment.  The second is to make sure that if you are going to do multiple probes to choose the best entry to replace, that you make sure that the probes all go to the same cache block.

Crafty correctly forced the ttable to lie on a 64 byte boundary, but it used a 3-entry table much like Belle, where one entry was depth-preferred, the other two entries were a set of two, one was selected by a bit of the hash signature not used in addressing.  It tried to overwrite the depth-preferred entry if the new depth was greater, otherwise it wrote to the always entry selected by the bit mentioned above.  So this COULD split across two cache blocks.  Later versions went to the more usual 4 entry bucket, with all 4 entries (one bucket) guaranteed to be in a single cache line...
Parent - - By Banned for Life (Gold) Date 2013-08-16 05:56
As Bob mentioned, there's aligning the TT to 64-byte boundaries (and it's unclear from the above how Rybka's TT is aligned) and then there is the issue of only using TT entries that fall in a single cache line.

With the 'Fruit' TT methodology, there is no benefit to using a 64-byte alignment, but there is a small benefit to a 16-byte alignment (in this case 25% of accesses will remain in one cache line as opposed to 0% if there is 8-byte alignment only).

The simplest way to fix this while maintaining the Fruit methodology is to 64-byte align the TT and use trans.mask to ditch the two least significant address bits so that only every fourth hash record can be accessed, forcing the four record sets to be cache aligned. There is no penalty for doing this after the TT fills up, since each of the four hash records will be checked each time anyway, and this reduces memory accesses (but not as significantly as might be expected because this second cache line access is one of the few instances in accessing the TT where the TLB will actually be useful).
Parent - - By Richard Vida (**) Date 2013-08-16 10:07

> (and it's unclear from the above how Rybka's TT is aligned)


It is allocated with plain malloc(). Typically, it is guaranteed to be 8-byte aligned on 32bit systems and 16-byte aligned on 64bit systems.

e.g. the glibc manual says:
"The address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems)."
Parent - By Banned for Life (Gold) Date 2013-08-16 23:26
I was using msvc++ 7.1 in 2005 with the 64-bit SDK, which I think is what Vas used to compile Rybka 1.0 Beta. I'm pretty sure this always aligned to 8-byte boundaries.
Parent - - By Rebel (****) Date 2013-08-16 07:17
  // Why subtracting 2 ? Normally, I would expect -1 here.
  // Now this scheme can address only even entries
  trans.mask = size - 2;


Looks like a bug (typo) indeed.

 
I wonder, why would someone want to increase the addressing granularity?

What do you mean by that? Related to the odd "-2" ?
Parent - By bob (Gold) Date 2013-08-16 17:09
You can only address every other entry, because the rightmost bit of the AND mask is zero.

Odd unless you use a "bucket" approach.
Up Topic Rybka Support & Discussion / Rybka Discussion / R.V. about Rybka 1.0 beta vs Fruit 2.1
1 2 Previous Next  

Powered by mwForum 2.27.4 © 1999-2012 Markus Wichitill