> 64-bit has 60% increase
I think a 64-bit bitboard build on a 2.4 GHz Opteron 842 was 100% faster than a 32-bit build on a 3.2 GHz P4, but I'm not sure about the comparison here due to the different processor types. My impression is that a 2x speedup is more typical than 60%, but I will check this directly in a day or two.
A 2.4 GHz Opteron would, per processor, have the expected speed of a 4.8 GHz Pentium 4. Noting that 4.8 x 1.6 = 7.7 (applying the quoted 60% 64-bit increase), I'm surprised that it was only 100% faster, as I'd expect the ratio to be more like 7.7/3.2 = 2.4, or 140% faster.
1) What is counted.
2) The shape of the search tree.
3) Search-like operations to help with search (which is related to #1).
Forgetting about Rybka for a second, a high-NPS program like Junior is around 10x faster than a low-NPS program like Hiarcs. This doesn't mean that Junior is 10x better optimized :)
Vas
Firstly, I was remembering wrong - Fruit computes a SEE value, though all that is used is whether SEE >= 0. [I noticed this in Fruit, so in my own code I had just returned a boolean.] For the sorting idea in noncaptures, I'm not sure it is a big gain in any case, as gen_captures is executed more often by a factor of 20 or so - also, I might expect mergesort to be used if speed were that important. The claim that about 15 changes are necessary to modify the code need not be so bad if a macro is used somehow [macros could make a big difference when looking at the Strelka code versus the actual Rybka 1.0 source, especially if VR used an extended macro language on top of the C preprocessor].
There are definitely some things I personally would think nonintuitive if writing for clarity rather than speed [though the speed/clarity choice need not be binary]. For instance, the incremental changes to what Strelka calls Board->mat_summ (an index for material lookup) - this is computed in a pleasantly clever way as part of make/unmake rather than being recomputed from scratch each time (through brutal multiplication) in eval. The SEE early exits have already been mentioned - there is also the fact that the do/while loop here is unrolled so that the identities of me/opp do not switch when the side to move changes. Thirdly, the delta pruning eliminates pieces of sufficiently small value from consideration before calling gen_captures [this could be seen as another instance of incrementalism] - my impression is that the gain here should be noticeable, in that gen_captures is called quite often. Finally, many bitboard engines (Crafty and Glaurung) have a bitboard for each piece, say WhiteKnights, and then a switch/case to get the right one in make/unmake - Strelka avoids the switch branching by having 12 bitboards enumerated by the number of the piece on the square [this should not yuck up the code, as one can easily #define WhiteKnights to be what Strelka calls Board->mp[WhiteKnight]].
This being said, I could agree if VR were to claim that all these optimisations are "high-level", or at an algorithmic level, but I would also agree with AC that I would probably not do things in this way if speed were not a consideration [real or imagined] near the outset.
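A minimal sketch of that last point - the piece-indexed bitboard array versus a switch/case - with hypothetical piece codes and board layout (Strelka's actual struct is richer; only mp[] and the #define mirror what is described above):

```c
#include <stdint.h>

/* Twelve bitboards indexed by piece number, as in Strelka's Board->mp[].
   The piece codes and the sq[] mailbox are illustrative assumptions. */
enum { WhitePawn, WhiteKnight, WhiteBishop, WhiteRook, WhiteQueen, WhiteKing,
       BlackPawn, BlackKnight, BlackBishop, BlackRook, BlackQueen, BlackKing };

typedef struct {
    uint64_t mp[12]; /* one bitboard per piece kind */
    int      sq[64]; /* piece kind on each square */
} BoardT;

/* With a piece-indexed array, a quiet move in make/unmake needs no
   switch/case: the piece on the from-square selects its bitboard. */
static void move_piece(BoardT *b, int from, int to) {
    int pc = b->sq[from];
    b->mp[pc] ^= (1ULL << from) | (1ULL << to);
    b->sq[to] = pc;
    b->sq[from] = -1; /* hypothetical 'empty' marker */
}

/* Named bitboards can still be recovered for readability: */
#define WhiteKnights(b) ((b)->mp[WhiteKnight])
```

The #define shows why this need not hurt clarity: call sites can keep reading as if per-piece named bitboards existed.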
> if a macro is used somehow [macros could make a big difference when looking at the Strelka code versus the actual Rybka 1.0 source, especially if VR used an extended macro language on top of the C preprocessor].
Seriously, not bad, I tip my hat :)
Although I prefer inlines. Macros are about 25% as evil as people think, which is still pretty bad.
> For instance, the incremental changes to what Strelka calls Board->mat_summ
I'd definitely say that this is more elegant.
By the way, I would be quite curious to see if you can find one example of inelegance. Take it as a challenge :)
Vas
> Take it as a challenge :-)
Part of your plot to get me to waste my time rather than work on my own engine? :) Maybe if I can't find any other reading material for the next boring seminar, I'll look at the Strelka code again.
>> For instance, the incremental changes to what Strelka calls Board->mat_summ
>I'd definitely say that this is more elegant.
There have been various words bandied about - simplicity, elegance, clarity, intuitiveness - and these should mean slightly different things. I personally find arithmetic in split-base 9-9-3-3-3-3-3-3-2-2 to be elegant, but none of the other three. Clarity is usually achieved with comments in concert with the code, simplicity could describe the typical code produced when trying to meet a deadline, and intuitiveness will differ from person to person, with particular dependence on which "school of coding" they received their training in. [One correspondent of mine works for an "independent corporation whose sole client is the federal government", and his job, from what I can tell, involves getting aircraft to land without human intervention - bugs are not an option, one might say. (I think Torvalds said something similar about the Linux kernel and/or kernel programming in general at one point, when he was tired of getting dubious submissions.)]
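To make the split-base remark concrete, here is a mixed-radix material key in which each piece-count is one digit. The radices 9-9-3-3-3-3-3-3-2-2 follow the post (up to 8 pawns per side, up to 2 of each minor/rook, up to 1 queen), but the piece order, names and the idea that this is exactly Strelka's Board->mat_summ encoding are my illustrative guesses:

```c
/* Mixed-radix material index: digit d holds a piece count, with
   radix R[d]. Radices follow the 9-9-3-3-3-3-3-3-2-2 description;
   which digit is which piece is an assumption for illustration. */
static const int R[10] = {9, 9, 3, 3, 3, 3, 3, 3, 2, 2};

/* Place value of digit d: product of the radices below it. */
static int place(int d) {
    int v = 1;
    for (int i = 0; i < d; i++) v *= R[i];
    return v;
}

/* Recompute the index from scratch ("brutal multiplication"). */
static int mat_summ_full(const int count[10]) {
    int s = 0;
    for (int d = 0; d < 10; d++) s += count[d] * place(d);
    return s;
}

/* The incremental alternative for make/unmake: capturing one piece
   of kind d just subtracts a single precomputable place value. */
static int mat_summ_capture(int summ, int d) {
    return summ - place(d);
}
```

The point of the incremental form is that make/unmake touches one addend instead of re-doing all ten multiplications in eval.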
> I personally find arithmetic in split-base 9-9-3-3-3-3-3-3-2-2 to be elegant, but none of the other three.
I'd definitely go with all four for that one.
> Clarity is usually achieved with comments in concert with the code
If you need comments to have clarity, something is wrong. Actually, I love using no comments. If that's too ridiculous, I'll go with a three-word summary. It's hard to beat terseness.
> simplicity could describe the typical code produced when trying to meet a deadline
For me it's quite the opposite. When I'm in a hurry, I'll produce a huge mess. When I have time, every line is perfect.
Vas
>> For instance, the incremental changes to what Strelka calls Board->mat_summ
> I'd definitely say that this is more elegant.
As 'Board->mat_summ' and 'Board->mat_diff' are needed only for positions which are going to be evaluated, a recalculation of summ and diff in each position to be evaluated might be good enough.
Almost every move_do in the search of Strelka is followed by something like in_check = evaluate(pos_info_entry), because without this Strelka cannot know if the king is under threat (the same function that finds if the king is under threat also calculates the evaluation of the position).
The only exception is in the qsearch when Strelka generates checks, so it does not need the evaluation function to know if the king is under threat; in these cases some cheaper function is called to help find details like whether an illegal move was made, and also to help generate legal moves later.
For example, if white makes a checking move, then squares that black controls are needed to know if the white king is under threat, and squares that white controls are needed in order not to make illegal moves with the black king out of check.
Uri
> By the way, I would be quite curious to see if you can find one example of inelegance. Take it as a challenge :-)
Having re-read the REBEL description of ES, I can now note that he gives a trick that can be applied here to skip (part of) the capture generation code: simply AND the (masked) opponent's pieces with the squares you attack (the latter is at hand, having been computed in eval), and if this gives 0, you can jump straight to pawn promotions (and the fickle en passant) without needing to loop over your pieces. Admittedly, this could be termed a lack of elegance rather than an example of inelegance - in any event, it gives (in Strelka) almost a 1% speedup in multiple runs (about an hour's worth) of a test suite, though I managed to get myself confused with the REBEL description, and only applied this from qsearch
(for some reason, I wanted to skip the function call, which on many modern processors is not much overhead in any event - oh yeah, another pithy inelegance might be the checking of the pawn hash at the top of pawn_eval rather than before the function call, though a really good compiler might catch this). I've made other minor (again not more than 1%) gains via (say) separating the disjoint parts of the move-generator functions, so as to avoid the branch for white-to-move (on a 2 GHz 64-bit machine, movegen gets called maybe 2 million times a second, and at 10-15 cycles per mispredict [which should occur nearly 50% of the time], this starts to add up... well, at least to a 0.5% speedup or so on the back of my envelope, and about this in practice). Of course, you can likely gain much more via implementing a different bitboard scheme, or perhaps by memory-mapping the executable (at the linker stage) to ameliorate cache pollution, etc. - but the speedups here shouldn't take more than about 15-30 minutes to code and test, and are likely orthogonal to the other (more tedious) types of cycle-grinding.
> the function call, which on many modern processors is not much overhead in any event
Now that I think some more (and before the Nordic forum experts beat me to it), it seems that the possible lack of pipelining with redirected program flow could also be a worry with function calls, especially ones that are indirect. [This doesn't change the analysis of the first idea in any case, but it is reason for skepticism regarding (say) the typical latency of CALL as opposed to what is listed in processor manuals.]
> [idea 2 gave] a 0.5% speedup or so on the back of my envelope, and about this in practice
I posted this in the first place partially because I was initially puzzled by the largeness of the real-world gain, and my above reconsideration has not de-confused me totally. Indeed, all I've done is replace a direct function call and branch with an indirect function call... In a perfect world, the direct function call need not stall the pipelines, but (I guess that) in a chess programme, the call to movegen itself comes right after a branch, and so the flow structure is non-obvious - thus the first method probably leads to 2 pipeline stalls (or about 1.5 on average with the misprediction), and my method only one.
Anyway, this isn't elegance, this is optimization. IIRC I guessed that a 10-20% optimization to Strelka would be possible without hard-core methods like pure assembly language. It seems like you're well on your way.
Of course, you can argue that optimization is elegance - but then we're getting a bit too far afield. :)
Vas
> Indeed, Osipov missed a trick/simplification with the pawn shield/storm (these are the king-sheet stuff - there are three, for abc/def/fgh files) that would generate it from about 4 lines rather than 125 lookup entries.
I'm curious, what do you have in mind?
Vas
Letting P0 and P1 be the arrays, I can replicate the contents via:
const int SH[5]={1121,0,214,749,915}; const int ST[5]={0,0,2334,653,310};
for (j=0;j<4096;j++)
{ ap=j&01111; A=first_one(ap)/3; if (ap) A++;
  bp=j&02222; B=first_one(bp)/3; if (bp) B++;
  cp=j&04444; C=first_one(cp)/3; if (cp) C++;
  P0[j]=SH[A]+2*SH[B]+SH[C]; if (!P0[j]) P0[j]=794; P1[j]=ST[A]+ST[B]+ST[C]; }
And if you really want 4 lines rather than 6, one could save another line or two via a macro for the ap,bp,cp computations. :)
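The snippet above leans on a first_one bit-scan; here is a portable stand-in (a sketch - what to return for an input of 0 is purely a convention, and hardware bit scans such as x86 BSF leave the result undefined for 0, which is exactly why conventions differ between compilers and engines):

```c
#include <stdint.h>

/* Index of the least significant set bit; returns 0 for an empty
   input by convention. With this convention an unguarded
   first_one(ap)/3 stays within a 5-entry table even when ap is 0. */
static int first_one(uint64_t b) {
    int i = 0;
    if (b == 0) return 0;
    while (!(b & 1ULL)) {
        b >>= 1;
        i++;
    }
    return i;
}
```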
I also consider your code as more than 4 lines because you have variables that you did not define and need to define.
Note that I understood the fact that the P0 and P1 arrays are based on a few numbers, but my code to generate them is longer.
Here is my code:
void init_pawns()
{
  int i,j;
  for (i=0;i<4096;i++)
  {
    Pawn_king_defend[i]=0;
    Pawn_king_attack[i]=0;
    if ((i&7)==7)
      Pawn_king_defend[i]=794;
    else
      for (j=0;j<=2;j++)
      {
        if (i&(1<<j))
          ; /* pawn on the nearest rank: no score */
        else if (i&(8<<j))
        {
          Pawn_king_attack[i]+=2334;
          Pawn_king_defend[i]+=214*(1+(j==1));
        }
        else if (i&(64<<j))
        {
          Pawn_king_attack[i]+=653;
          Pawn_king_defend[i]+=749*(1+(j==1));
        }
        else if (i&(512<<j))
        {
          Pawn_king_attack[i]+=310;
          Pawn_king_defend[i]+=915*(1+(j==1));
        }
        else
          Pawn_king_defend[i]+=1121*(1+(j==1));
      }
  }
}
> Your code does not work and the problem is that first_one(0)=32
Sorry, I changed the Strelka code because the GCC assembler I needed is not of the same form. I can imagine 0 or 64 being a reasonable output for first_one(0), but 32 is quite odd. The specs say that bit scans should leave the target unchanged if the input is 0, so that doesn't help much in deciding a standard.
> I also consider your code as more than 4 lines
Rather than a 6-line code snippet, here is an (obfuscated) 4-line 80-column code block (with plenty of white space still):
#define Z(o,m) x=j&o; v=x?1+first_one(x)/3:0; P0[j]+=m*S[v]; P1[j]+=T[v];
void URI() {int S[5]={1121,0,214,749,915},T[5]={0,0,2334,653,310},j,v,x;
for (j=0;j<=07777;j++)
{Z(01111,1); Z(02222,2); Z(04444,1); if (!P0[j]) P0[j]=794;}}
Defining the P0 and P1 arrays I leave as an exercise to the reader. :)
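As a cross-check of the equivalence claimed in this exchange, here is a self-contained sketch (with a guarded bit scan standing in for first_one, and local array names, since the originals' surroundings are not shown) that builds the tables both ways and compares them:

```c
#include <stdint.h>

static int P0a[4096], P1a[4096]; /* compact generator's output  */
static int P0b[4096], P1b[4096]; /* explicit generator's output */

/* Guarded bit scan: index of the least significant set bit, 0 for 0.
   (The guard matters: an unguarded scan of 0 is what broke the
   earlier 6-line version.) */
static int lsb(int x) {
    int i = 0;
    if (!x) return 0;
    while (!(x & 1)) { x >>= 1; i++; }
    return i;
}

/* Table generation in the compact, lookup-driven style. */
static void gen_compact(void) {
    static const int SH[5] = {1121, 0, 214, 749, 915};
    static const int ST[5] = {0, 0, 2334, 653, 310};
    int j;
    for (j = 0; j < 4096; j++) {
        int ap = j & 01111, A = ap ? 1 + lsb(ap) / 3 : 0;
        int bp = j & 02222, B = bp ? 1 + lsb(bp) / 3 : 0;
        int cp = j & 04444, C = cp ? 1 + lsb(cp) / 3 : 0;
        P0a[j] = SH[A] + 2 * SH[B] + SH[C];
        if (!P0a[j]) P0a[j] = 794;
        P1a[j] = ST[A] + ST[B] + ST[C];
    }
}

/* Table generation in the explicit per-file if/else style. */
static void gen_explicit(void) {
    int i, j;
    for (i = 0; i < 4096; i++) {
        P0b[i] = 0; P1b[i] = 0;
        if ((i & 7) == 7) { P0b[i] = 794; continue; }
        for (j = 0; j <= 2; j++) {
            int w = 1 + (j == 1); /* middle file counts double */
            if (i & (1 << j))        { /* nearest rank: no score */ }
            else if (i & (8 << j))   { P1b[i] += 2334; P0b[i] += 214 * w; }
            else if (i & (64 << j))  { P1b[i] += 653;  P0b[i] += 749 * w; }
            else if (i & (512 << j)) { P1b[i] += 310;  P0b[i] += 915 * w; }
            else                       P0b[i] += 1121 * w;
        }
    }
}

/* 1 if both methods agree on every entry, else 0. */
static int tables_match(void) {
    int i;
    gen_compact();
    gen_explicit();
    for (i = 0; i < 4096; i++)
        if (P0a[i] != P0b[i] || P1a[i] != P1b[i]) return 0;
    return 1;
}
```

The equivalence rests on the lowest set bit of each masked file shadowing the higher ones, exactly as the first matching branch of the if/else chain does.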
Vas
If I might interject: I do not think Vas has claimed the move generator code was taken from Rybka 1.0 beta, and if I recall correctly (and some Google guys can check the Russian), the Strelka author said he started with his own move generator, which might have been optimized more than Rybka's.
I do agree that the Strelka code is nicely written, very fast, and does things in an efficient way. I am just not so sure it does them in the way Rybka does. It is possible it was stolen from Rybka, but if it was, it might just indicate that Vas writes efficient code from the get-go and does not need to go through many cycles of optimization.
Mark
> P.S. I have never understood why it's important that Rybka is a high
> knowledge program anyway. If it works, who cares?
Isn't this more just a matter of definition? The key point being that you have to have code that works first. If by optimizing you mean making essentially the same code execute faster, then I think Vas would argue that no matter what clever speed tricks you use it is almost impossible to get a two-fold speedup; even improving a very inefficient parallelisation, I think it is difficult to go two times faster. At most you would get about 50 Elo points out of that doubling and, while nice, that is not enough to stay at the top, because in a way you are doing what the dinosaurs did: you risk overspecializing, and if an opponent finds something new for which it is difficult to find an answer because you were only working towards speed, it is hard to adapt, even if you are a programming dinosaur that can alter his or her own code, which of course the dinos could not do in a million years.
I imagine you have to have code that works first, and if it works it does not really matter how fast it works. Also the code has to remain flexible enough to implement almost any new knowledge that you think is necessary, which goes against speed optimization, and then you have to tune changes again with the existing knowledge. I agree knowledge can be something simple like a tunable constant for pruning, but this is more knowledge than speed optimization in my opinion. It depends on the definition used. Anthony would probably say that such a constant certainly does not qualify as chess knowledge, it is just speed enhancement.
How many of the general data structures that you could use in programming can still be found in a final compilation-decompilation-rewrite 'product' like Strelka, I do not know.
Well just my two cents.
Eelco
I) Separate PV search - this is to help me think. PV nodes and scout nodes are just different conceptually. My way shouldn't be any faster, as the multiple routines will pollute the cache.
II) Moves & scores together - my way is definitely more elegant. The point you're missing here is that in the Rybka code the history stuff is encapsulated. What you're looking at is post-inlining - the hackers just didn't clean it up.
III) SEE - again, my way is just better, simpler, more compact, more to the point. Why output some value which you don't plan to use?
I also don't see why you're getting so worked up about this. You're not taking Knuth literally, are you? :)
Vas
And I talk only about Strelka.
The fact that Strelka has separate code for white and for black, not only for pawns, is not elegant.
My guess is that for pawns it is faster but not for the other pieces, but it is only a guess and I simply do not know.
Uri
One offsetting benefit is that the eval can be debugged by checking for black/white eval symmetry. This doesn't help with all of the search stuff of course.
Vas
> probably speed up 10-15% without crossing over
I just want to clarify: I read you as guessing a 10-15% gain overall, not just a 10-15% gain in bitboard ops [make/unmake, movegen] - 10-15% overall would be at least 30% in bitboard ops.
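To spell out the arithmetic behind that clarification (a back-of-envelope sketch; the 40% runtime share for bitboard ops is an assumed figure, not a measurement):

```c
/* Fraction of total runtime saved when a component occupying
   fraction f of the runtime has its own cost cut by fraction r.
   With bitboard ops at f = 0.40 of runtime, a 30% cut in their
   cost (r = 0.30) yields a 12% overall saving - so a 10-15%
   overall gain does need roughly 30%+ inside the bitboard ops,
   for any plausible f in the 0.33-0.5 range. */
static double overall_saving(double f, double r) {
    return f * r;
}
```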
I suspect that the bitboard ops are more of a bottleneck than you think. Look at for example the 64-bit speedup. I'm not sure, though.
Vas
I didn't yet read the Strelka code, but what I noticed about Rybka from the beginning is its huge exe size (the current version is 20 times as big as Toga; an earlier version was even twice this size). I assumed the code itself might not be much larger than that of other engines, but that Vas found a way to attach (his) chess knowledge and look it up during search.
Unfortunately you now tell us that the engine just gets blown up by loop unrolling, inline functions and other optimization stuff, if I got that right. That would not only be boring to study, but almost unreadable (at least after decompilation). Besides, Vas once confessed his style of programming is not exactly the industrial way. In this case I will maybe not waste the time examining the Strelka code.
Maybe I should have known it. In the 90s I attended a chess programming course with Chrilly Donninger, and one point he taught was that a chess master will never write a top-class program, because his chess knowledge detains him from the correct ideas of how a chess program should work. That's not actually true, since Vas is a master, but considering his chess knowledge it seems he just managed not to get confused by it.
It's a pity that computer chess today is all about optimization and processor speed. I wonder what would have happened if we hadn't experienced this extensive speed gain since the early days. Let's assume processors today would still run at 1 MHz, allowing 300 n/s, but we had today's memory size. Do you think the strongest chess programs would work differently then, or would they just be the same?
Best Regards,
Josef
The importance of speed is going to be even higher with slow hardware.
Chrilly Donninger talked nonsense, because having chess knowledge does not mean that you try to implement everything that you know.
I see no reason to assume that people are stupid and are going to be confused by their knowledge.
It is the opposite.
Knowledge shows that the person has intelligence, and intelligent people know how to develop good things.
I expect a very strong chess player, or very strong go player, or every intelligent person, to develop a better chess program relative to a person who has a lower IQ (IQ may not be the right term to use, and maybe I should instead write inferior thinking skills).
Uri
Even if they look at the position for one second, they do some search.
I will give you a simple example.
Show one of the following positions to chess players that are not weak chess players (2000 FIDE rating is clearly enough) and tell them to evaluate the positions in 1 second.
I guess most of them are going to be correct.
Humans can see immediately that white wins in the first position and loses in the second position because they do a search.
In the first position they will see that black cannot defend against mate, and in the second position they will see immediately the line Qh6 Ne6 when black wins.
If you ask them what their evaluation was before doing a search, the answer can be that they detect the pattern Bf6 Qh6 and understand that the mate threat needs to be analyzed.
They do not think in the terms white is better / black is better, but only in the terms: can black prevent Qg7 mate after white plays Qh6?
The first thought that humans are going to have is Qh6, and not an evaluation of the position.
A human will detect the mating pattern and conclude those positions aren't beta-quiescent.
Progress cannot be other than incremental. Like other difficult and hence interesting games, chess is badly structured, and most of its "phase space" (the immense all-position set) could be deemed irregular and chaotic. That means not much reliable or definitive "knowledge" could ever be extracted from chess positions. In an unstable and chaotic realm the only cure is searching deeply and comprehensively, however hopeless (exponential) it could appear to be. Human strategic patterns cannot be built in advance into the software because most of these patterns emerge unpredictably and spontaneously in our brains during the very course of live games! Of course, all strategic rules and guidelines suffer from numerous random (unpredictable) exceptions and loopholes, which only systematic and exhaustive search can find out about.
One could imagine a neural network-like technology able to learn and train itself toward layouts of chess positions which led to success in past games - something akin to automatic recognition of visual scenes. Memory demand for positional layouts would be huge, though; and how would you assess dynamical factors like the exchange for initiative, or active piece play in the spirit of Tal, Kasparov, Topalov? Unfortunately, brute-force search, geared toward execution efficiency and boosted by sheer technological speed, seems the only viable way to catch up with and exceed the curious capacity of the human mind to recognize and generate global (collective) strategic patterns in disordered and chaotic domains like the game of chess.
> Human strategic patterns cannot be built in advance in the software because most of these patterns emerge unpredictably and spontaneously in our brains during the very course of live games!
It's a complex issue, so let me just say this: I would have agreed with this much more 18 months ago than I do today.
Vas
I believe that many strategic or orderly patterns are quite unique and emerge randomly during the course of a game, depending on the particular number, type, locations and hence dynamical interactions of the chess pieces on the board. Humans adapt to, recognize and exploit such patterns; machines cannot unless preprogrammed; obviously emergent positional patterns, if and when they appear, are not to be found in the already running software. Fortunately, this is hardly a significant constraint. Software embodies, and keeps being enriched with, the most common, reliable and widely employed (in most positional types) principles and guidelines comprising chess knowledge. Coupled with fast, deep and comprehensive search, this makes for powerful chess engines. Adjusting the evaluation function and parameters with the latest strategic and positional insights will always be incremental and unreliable (unstable), at least for a while. This seems to follow from the poor (rather chaotic) overall structure of the game, the context dependence of guiding rules and the numerous random exceptions in actual positions. Barring a complete and exact combinatorial solution of the game, one will never know why some chess engines are better than others and what precisely, if significant at all, is the contribution of the so-called chess knowledge. In poorly ordered or random domains the only cure appears to be universal and comprehensive search.
ok, I see what you mean. Yes, humans are constantly learning new things, even within a single chess game. For example, a human may spend two minutes looking at variations and conclude something very specific to that position, for example "the knight has to stay on a6 until white plays a3", etc.
It's true that every evaluation heuristic has all sorts of exceptions. Search heuristics have exceptions too. In theory your statement that "the only cure appears to be the universal and comprehensive search" is true, but it's not a good practical guideline. There is no way to magically search deeper - well, except for optimization. :) In practice, you have to rely on imperfect search and evaluation heuristics.
Vas
The final outcome and "perfect" play depend solely on knowledge of each legal position that might occur in the game.
Even the stupidest positions and moves must be examined (considered) to scientifically prove what wins, loses or draws.
The most primitive "brute force" search and counting is an answer to such a problem. No, one can't improve that part much.
Yes, the truth hurts sometimes: the game of chess with ~10^44 positions will hardly be solved in the next 100 years.
With the technology of today, a computer with RAM the mass of the Earth and a CPU the size of the Moon would be necessary to crunch numbers for years to solve chess played on an 8x8 board. But the situation is much better with chess on a 5x5 board:
http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=3233
Might be a *little* scientific project for you or Anthony or whoever wants to engage computers to do something in the field where they are at their best: counting.
Everything else you mentioned about strategies, recognition, dynamical position assessment, and approaches to "teach" a machine to play chess are just poor approximations, not to say guessing, from the standpoint of the theory of games.
This is simply wrong.
You can prove that a class of positions wins or loses without proving it for every one of them.
For example, it is possible to prove without tablebases that KR vs K wins, and it is easy to prove that without the 50-move rule the rook wins even on a 1000000*1000000 board (and no tablebase of O(10^18) positions is needed for it).
Uri
> You can prove that a class of positions wins or loses without proving it for every one of them.
You are correct. This is how checkers was solved.
Yes, but in chess the number of these positions is relatively small compared to the total number of positions.
Unfortunately.
With checkers, it's another story.
Powered by mwForum 2.27.4 © 1999-2012 Markus Wichitill