Strelka source code, I assume that there will be a hue and cry from
the unwashed masses about how I am pirating the intellectual
property of Vasik Rajlich. Let me just immediately say that there
are simply no new ideas in Strelka anyway; my shock at reading the
Strelka source code was not generated by the brilliancies contained
therein but by the lack thereof. Looking at Strelka finally
convinced me to do some things that everyone else has done for a
long time (like futility pruning), and to clean up my code a bit
since I had always been developing in "crunch" mode. I did not add
any eval terms from Strelka (my view being that Zappa's eval is way
better than Strelka's anyway) and I also did not do the crazy
futility pruning where it drops into the qsearch at depth 3. In
summary, Strelka was more a source of motivation than chess ideas.
Of course, you will just have to take my word for it until some
enterprising Russian hacker disassembles Zappa. :)"
( from the read-me file of the new Zappa Mexico II )
(Note that identifying parts of another engine's code does not necessarily mean identifying a new idea.)
see also http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?pid=40986;hl=#pid40986
Strelka 1.8 has behaviour so similar to Rybka that it obviously cannot have happened by accident, and people who saw the source can tell you that Strelka 2 is based on Strelka 1.8.
I guess that you simply did not read about the similarity in analysis between Strelka 1.8 and the Rybka beta, otherwise you would have had no reason to suggest this nonsense.
Uri
I accept your word that Strelka is based upon Rybka. But something tells me that it was more of a success at emulating Rybka than at revealing the secrets of the Rybka code. I assume that many chess engines have similar code.
Anyhow, I was just trying to put myself in the shoes of Vasik.
Practically, it is not logical to expect that, and Juri Osipov also admitted that he used reverse engineering of Rybka.
I also read that some parts of the exe were clearly the same.
Uri
Although, if someone really isn't familiar with the topic, it must be a nightmare trying to find the relevant observations and statements and tell them apart from the rest, which includes a giant amount of message-board trash and even wrong expertise.
And it just won't stop...
> So I wonder why that has to be explained again and again.
Osipov said it was a decompiled Rybka. Vas said it was a decompiled Rybka. Is billyraybar arguing that both Vas and Osipov are lying?
Anyhow, no biggie. I'm not trying to stir the pot. I was actually trying to be a bit 'tongue in cheek'. Just thought it would be comical if the code only vaguely reminded Vas of Rybka but he said the opposite to throw people. Highly unlikely. No doubt.
> I thought it were impossible to actually extract the original code of any software program.
This is true for many programming languages, but not for others. For instance, you can't get "the original" code from something written in C, but with Java you can get something close.
However, in all cases, there do exist reverse-engineering tools (decompilers) that take an executable and turn it into functionally equivalent C code (or another language). The output is usually not the most pleasant, but it can be hacked [a long and tedious process] by someone until it is more palatable to the eye. This is essentially the claim about Strelka/Rybka - I don't think anyone familiar with the facts doubts that this was done to some degree (one obvious instance is that the same positional weights for various things like mobility, king safety, etc., appear in both Rybka 1.0 and some Strelka versions). I personally don't consider the concordance of positional evaluations to be at the same level of evidence.
> The real proof is that if you turn it into a .wav file and play it backwards, you can clearly hear the phrase "Vas is dead, Vas is dead". [link]
This only worked for me in October when I was in a grassy knoll with black helicopters hovering overhead. :-P It might also work in Atlanta, at least for chess players. [bizarre conspiracy theory link]
(funny, Italians in this case touch iron!)
Why do we even need an "opinion" of the quality of the code when we have an objective measure: engine strength? What Anthony says is simply unrelated to reality. If the ideas in Rybka were just "old" and trivial, then you would have seen 100 engines at Rybka's strength.
Maybe the secret is in the implementation of the ideas (e.g., bug-free code). But even this is nontrivial, because again, objectively Rybka does better than the 1000 or so other UCI engines (and better than the programs that have been professionally developed for over a decade).
Let me translate what Anthony is saying to golf: Tiger Woods is nothing special, because he uses the same golf clubs and the same strokes as everybody else; there is absolutely nothing new in Tiger Woods' game.
best
Joseph
You have a different viewpoint than my own, and yours is probably much more
common. One of the interesting things about computer chess is that it
is both a scientific and competitive endeavour. As a graduate
student, I am emphasizing the former. As a user and computer chess
enthusiast, you are emphasizing the latter.
In academia, novelty rules all. It is much better to be novel than to
have great experimental results. The goal is to add to the knowledge
of the human race, and the key question is "What can we as a community
learn from this?" You will have to take my word for it that the
source code of Strelka simply does not contribute to the state of the
art in building computer chess engines. Consider Fruit: you can take
the new ideas from Fruit, add them to an existing chess program, and
make it better. Just about everyone uses late move reductions of some
sort nowadays. There is no equivalent in Strelka, no magical
techniques that could be added to a new chess program and give it 100
elo. There is no equivalent to null move, history pruning, or even
singular extensions. For two years, whenever things were becoming
boring computer chess enthusiasts could always debate about what
Rybka's secret was. In hindsight, Vasik must have been quite amused
by those discussions, because Rybka's "secret" was simply to be super
optimized to 3 or 4 times faster than everything else.
There is only one thing that is really interesting about Strelka: that
search depth is still important. A lot of people (myself included, under
the dark school of Diepeveen) thought that 16-18 or at most 20 ply was
sufficient for any position. Obviously, we were wrong. This is why I
said earlier that I thought Vasik went in a different direction (at
least from me). While I was trying to improve the evaluation, he was
trying to optimize his program, and since Rybka gets about 4-5 more
plies than Zappa, those extra plies of search, especially in quicker
games, proved better than my heuristic evaluation.
Anyway, from an engineering standpoint Strelka (rybka) is fantastic.
It packs most of the search and evaluation of Fruit/Crafty into
something almost 3 times faster. For those of you that have never
written code, 3 times is quite a bit, considering that Fruit and
Crafty are already reasonably well coded. Rybka is even faster than
the ultimate beancounter, Fritz 5, while having a lot more under the
hood. And this is why I said that Strelka was more a source of
motivation than insight, because I had never really taken the time to
optimize Zappa, persuaded by the standard CS doctrine of "don't waste
your time trying to improve the constants on an exp-time algorithm".
What this all boils down to is the difference between interesting and
useful, two qualities that are often unrelated.
I think that is about the best I can do to try to explain my
viewpoint. I personally have had a lifelong love affair with science
and engineering. I find it amazing that they work at all. Imagine
the zillions of quantum interactions involved when we say that the
American dollar has lost value against the Euro! That we can build
models for such complex phenomena is preposterous - and yet those
models work down to 20 decimal places. This is why I am trying to
earn my PhD: I would like to make some small contribution to science.
Of course, I don't expect everyone to have this view. I don't think
this particularly bothers Rajlich; he seems a pure competitor who wants
to win games and keep himself in twinkies. Just because it isn't
scientifically interesting
(or wasn't in 2005. Perhaps he has done something interesting
since then) doesn't make Rybka any less of a great chess program.
cheers,
anthony
P.S. I don't consider Zappa to be particularly interesting from a
scientific standpoint either. This is why I don't work on it that
much.
P.P.S. I think this scientific attitude is what gets Hyatt in trouble in this forum as well.
P.P.P.S. Yes, I am sort of arrogant. We all have these little flaws, but it is considered courteous not to point them out.
Thanks for your considered response. That makes sense to me (I am an academic too, though not in computer science).
best
J
ps. I never said you were arrogant
It's nice to know that you're back working on Zappa again. It keeps computer chess competition alive. I'm sure you've got tons of ideas by now. Please surprise this chess community by releasing a 150+ ELO stronger Zappa soon!
regards :)
For me at least!
Regards,
Silvian
However, it can't JUST be that - many programs are weaker on a quad than Rybka is on a single cpu.
Here he expresses that it is fast AND has a good eval/more chess knowledge, or whatever makes an engine great in addition to its search.
Things would get really interesting if he spent lots of time speeding Zappa up.
anthony
I understand if you are not interested in making the same thing faster but I think that other programmers can learn from strelka's code how to do it.
Uri
> Even if strelka is better only because it is designed to do things faster then it still has things that programmers can learn from her.
> I understand if you are not interested in making the same thing faster but I think that other programmers can learn from strelka's code how to do it.
I agree with this in principle, but only to a point - if you are really interested in speed, assembler (for critical code) can be noticeably faster than C. There is also the issue of whether decompiling and recompiling is truly lossless in speed (if the original used ASM code, the chance of this being a worry is increased). Finally, there are some C-level micro-optimisations that might be useful in some programmes, but not in others. I'll give a Strelka code snippet as an example:
for (mask = Board->mp[WhiteBishop]; mask != 0; mask &= (mask-1))
{ sq = first_one(mask); WH_BISHOP_CODE(sq);}
This is (diagrammatically) Strelka looping over White bishops, with some code executed for each one. We can first note that this loop can be partially unrolled, as (typically) there is a maximum of 2 White bishops. Thus I claim that
if (mask) { sq = first_one(mask); WH_BISHOP_CODE(sq); CLR_BIT(sq, mask);
  if (mask) { sq = first_one(mask); WH_BISHOP_CODE(sq); CLR_BIT(sq, mask);
    while (mask) { sq = first_one(mask); WH_BISHOP_CODE(sq); CLR_BIT(sq, mask); } } }
should give (slightly) better branch prediction, at least when doing a search from a position where White has 2 bishops. Whether or not the final code is faster could depend on matters such as pipelining, the size of the WH_BISHOP_CODE vis-a-vis the instruction cache (especially as POPCNT is currently 20-odd instructions), branch-prediction-counter interaction, etc. In general, you have to test it and see. [At the ASM level things can be controlled a bit better, but it's still trial-and-error in the end].
Secondly, for an ASM-level attempt at a nano-improvement, the CLEAR_BIT(square, mask) as implemented by Strelka via mask &= (mask-1) can instead use Bit Test and Reset (BTR) [this could be tricky with 32 bits, but mask &= (mask-1) also requires care in that case]. As above, this is just an idea, and a generic real-world plug-and-play need not lead to faster code.
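To make that concrete, here is a minimal sketch of both variants (my own illustration, not Strelka's code), assuming GCC-style inline assembly on x86-64; as said above, whether BTR actually wins on a given chip is something you have to measure:

#include <stdint.h>

/* Clear the lowest set bit of mask - exactly what mask &= (mask - 1) does. */
static inline uint64_t clear_lowest(uint64_t mask)
{
    return mask & (mask - 1);
}

/* Clear the bit with index sq via BTR (Bit Test and Reset).
   GCC-style inline asm, x86-64 only; portable fallback otherwise. */
static inline uint64_t clear_bit_btr(uint64_t mask, uint64_t sq)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("btrq %1, %0" : "+r"(mask) : "r"(sq));
    return mask;
#else
    return mask & ~((uint64_t)1 << sq);
#endif
}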
I do not know assembler but based on my knowledge assembler was significantly faster than C mainly in the past when compilers were slow. Today when compilers are fast it is clearly less important.
I believe that the main reason for the speed of Strelka is not assembler tricks but good data structures.
Uri
> I do not know assembler but based on my knowledge assembler was significantly faster than C mainly in the past when compilers were slow.
> Today when compilers are fast it is clearly less important.
Much low-level "mission-critical" code is still written in assembler, and can still be 2x faster. An obvious example is the first_one and popcnt in many engines (and other programmes). Beyond such trivialities, the gains rapidly decline, but an additional gain of 20-25% over C code is typically a reasonable hope from a suitable amount of effort. Indeed, the difference between compilers (or even from fiddling with flags) can often be nearly this much. [For instance, compiling Toga with gcc 4.2.2 gives me a result that is 10-15% slower than the distributed executable.]
One of my favourite recent quotes (from Zimmermann and Dodson in 20 years of ECM) is: "Efficient Assembly Code. While using clever high-level algorithms may give a speedup of 10% or 20%, at the expense of several months to invent and implement those algorithms, a twofold speedup may be obtained in a few days, just rewriting one of the assembly routines for integer arithmetic." [There is a footnote, but I would be forced to digress in order to relate its whole history.] Of course, GMP (multi-precision arithmetic) is a very specific processor-intensive case, and most code should not see this type of speedup.
As for possible gains of rewriting code into assembler: it depends a lot on the code, it depends a lot on the compiler, and it depends a lot on how spread-out your "hot" code is. If 95% of your time is spent inside a 10-line loop (this is not uncommon for multimedia tasks), there's a good chance you can spend some time, write careful assembler and get it faster. (Modern compilers are really, really good, though -- what they usually can't handle well is vectorization, and that's where you can still beat them without _too_ much effort. Still, speaking as someone who has just spent two days trying to optimize such an "obvious" loop for not more than 2% practical overall gain, I do tend to be skeptical of claims that just porting it to assembler is easier.) However, counting cycles is only a small part of the entire picture; an incredibly often overlooked effect is that of cache, for instance, and fixing that on the C form is, erm, slightly easier. :-)
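(To illustrate the cache point with a toy example - this is just a generic sketch, nothing to do with any engine: both functions below compute the same column sums of a row-major matrix, but the second walks memory contiguously and is typically several times faster once the matrix no longer fits in cache.)

#include <stddef.h>
#define N 2048

/* Column-major traversal of a row-major array: stride of N doubles per step. */
void column_sums_strided(const double a[N][N], double sum[N])
{
    for (size_t j = 0; j < N; j++) {
        sum[j] = 0.0;
        for (size_t i = 0; i < N; i++)
            sum[j] += a[i][j];
    }
}

/* Same result, but each row is read contiguously - far more cache-friendly. */
void column_sums_streaming(const double a[N][N], double sum[N])
{
    for (size_t j = 0; j < N; j++)
        sum[j] = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum[j] += a[i][j];
}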
Actually, I'm not ready to agree that much speed-critical code is written in assembler at all, unless you have different ideas of either what "much" is, or what set of code we're talking about. :-)
/* Steinar */
> Actually, I'm not ready to agree that much speed-critical code is written in assembler at all, unless you have different ideas of either what "much" is, or what set of code we're talking about. :-)
Perhaps I spend too much time with multi-precision arithmetic, where getting the 0th level to be blazingly fast percolates all the way to the top, due to the recursive structure.
> an incredibly often overlooked effect is that of cache, for instance, and fixing that on the C form is, erm, slightly easier. :-)
Indeed, when Dan Bernstein visited a group I was working with, we spent most of the time getting a Gröbner basis algorithm to be cache-friendly. Big speedup in the end. Bill Hart and David Harvey have done similar things with polynomial arithmetic in their FLINT package.
> The single reason first_one is written in assembler,
Yes, yes, I agree this is merely a historical C thing.
> that just porting it to assembler is easier
I would never say something this strong. :) Indeed, inline ASM should be sufficient. Also, I'd guess that even DSP programmers are likely to use a C-like ASM-extension these days.
FWIW, I've programmed DSPs (my degree is in signal processing), and the only thing written in assembler was the vendor's FFT library. The rest was C, all C.
/* Steinar */
> spent two days ... for not more than 2% practical overall gain
I'd suspect that there are over-clockers who have spent longer than this for an even smaller gain - and largely only for their own box. :)
>> One of my favourite recent quotes
Now that I think about it, an even better one in this genre (though not assembly versus high level) would be from Dan Bernstein regarding the recent Van Buskirk tangent FFT - it's not in the paper, but when I saw him give a 20-minute blurb about it he said something like: "Recently, the computation of the DFT has seen an algorithmic improvement over the split-radix methods that were described almost 40 years ago. The (asymptotic) gain is about 5%. [Obviously no one should be much impressed, so he turns to a salesman pose]. This might not seem like much, especially when compared to the 10000-times speed improvements in hardware over the same time period, but(!) [extreme irony, but not mockingly so], the algorithmic improvement is largely orthogonal to the hardware advances [this is not exactly true - for instance, Bernstein himself describes how to make it cache-friendly], and so, when compounded with that gain, we can hope to achieve an overall speedup by a factor of around 10500!"
The GMP 4.2.2 comparison ended up 3.7x faster with P4-ASM versus generic, but as I say, this is a really special case.
I'm a little surprised by your example. If you look at this FFTW page:
http://www.fftw.org/speed/CoreDuo-3.0GHz-icc64/
and scroll down to the most commonly used FFTs (single-precision real and complex, 1D transforms), you will see that the difference between the best algorithms and the median algorithm approaches an order of magnitude. For many detection-theory type applications, these differences are really, really important!
Regards,
Alan
> I'm a little surprised by your example. If you look at this FFTW page:
Ah, great. FFTW benchmarketing! :) Bernstein had a long dispute with them (back when he maintained it, djbfft was faster than FFTW), as they wouldn't compile his programme in the manner he desired. [My impression is that Bernstein cares more about theory than practice nowadays].
In any case, the quote was meant to exemplify the difference between a pure algorithmic advance and hardware advances - there would seem to be a large middle-ground in implementation.
These exercises tend to be very non-theoretical. The added complexity of having to run critical code on a larger set of hardware costs lots of money and adds significant schedule risk on jobs that are frequently bid fixed price. The FFTW suite normally does pretty well but is avoided if possible due to commercial licensing issues.
Regards,
Alan
> No doubt the FFTW team is biased toward (you guessed it!) FFTW
I am reminded of the apocryphal Soviet reporting of a hypothetical head-to-head footrace: "Gorby takes 2nd, Reagan next to last" - it's all in how you spin it. :)
/* Steinar */
> quite good writeup on the FFTW/djbfft benchmarking controversy
I can ask Dan about it next time I see him, but that could be fraught with danger. :) His operational methodology has frequently been "Do ONE thing - do it VERY well" [the qmail design, particularly its security, shows a similar concept], which is less broad than the FFTW focus. As such, this seems to have led to an inadequate comparison between the two programmes. His comments about their benchmark methodology do not seem incorrect - however, if typical consumers/users find the FFTW metric to be ill-devised, they could simply ignore it. If you were particularly concerned about the special case(s) [which are not uncommon] covered by djbfft circa 1998, you'd probably have used it, or at least considered it. I'd agree with Bernstein that FFTW's Format of Results does leave a lot to be desired - this is why I do not hesitate much when using the pejorative "benchmarketing" moniker to describe much of the content of Alan's link. Johnson's WikiTalk indisputables seem just so - in any case, the feud has long passed, and has not been rejoined.
> Wiki page: I'd never heard of Dan Bernstein until I looked at this page.
:) I first met him playing bridge at a math conf back in 2000. Knew nothing about his (in)famous lawsuit, qmail, djbfft, etc. He has ALWAYS worn black - nothing else. [He actually ran out of black shirts at one conf, and chose to buy a (black) T-shirt at an "Everything a Euro" store.]
> Note that in the two benchmarks I referenced (which not coincidently are the most commonly used), Intel's ipps significantly outperforms FFTW (in some cases by 20%). Its not obvious how this could be construed to be be marketing for FFTW.
Aha! So FFTW puts the most-common near the bottom, but the ones for which they excel are put near the top! Sounds like marketing to me... :-P I guess I'd prefer the raw data from their site as opposed to their graphical representation - "mflops" is simply wrongly named, and I share some misgivings about inverse time and scaling in any case. Sadly, Intel's MKL page does the same. [Maybe I just don't like graphs - they're USA Today material :)]. Somewhat like gcc, one important (and impressive) feature of FFTW is its portability, so it should be expected that it can be beaten in a limited venue.
I'm proud of having written code that actually beat FFTW in terms of speed (2x on my own machine, more on others), but I was using the GPU, so it wasn't a fair comparison. :-) (It was part of a larger algorithm, but fast DSTs and IDSTs were an essential and time-critical part of it.)
/* Steinar */
> but I was using the GPU, so it wasn't a fair comparison
A few years back, I was implementing an algorithm to search for points on an intersection of two quadrics in projective 3-space. More down to earth, you have two 4x4 symmetric matrices M and N, and want a (nonzero) integral 4-vector v with both <v,Mv> and <v,Nv> equal to zero. One guy with whom I was working had the brainstorm to offload some/most of the computation to the GPU - after all, we were doing a lot of (presumably streamable) operations with 4x4 matrices on integers of smallish size, and for this, the GPU should be quite fast, even if it is doing everything in floating point. We gave it a bit of thought, but didn't use it in the end - perhaps if we had 53 bits of mantissa with the GPU, it would have been realistic. With the recent FireStream9170, this sort of GPGPU-esque methodology might need a rethink, but it's not exactly my area of expertise.
Overall the GPU was very much worth it for this kind of task. With a fast GPU, the GPU implementation was a _thousand_ times faster than the reference MATLAB implementation on my Athlon 64 3000+... (And that was 64-bit, JITed MATLAB, and almost all the time was spent inside native MATLAB routines, so it wasn't like you can blame all of that on interpretation overhead either.) It depends a _lot_ on the task, though... and it's a very different model from what programmers are used to. I don't know how computationally intensive your stuff was, from the description it sounds a bit like the cost of getting the data to and from the GPU would outweigh the advantage of more processing power.
/* Steinar */
>>>One of my favourite recent quotes
Not exactly related, but a bizarre quote... at the end of a seminar I gave today, an emeritus professor from Oxford asks in the question period: "I guess I don't understand why you didn't mention dessins d'enfants." Literally this means children's drawings, so I guess I have something to tell the grandkids, as it were ("I once got asked why I didn't talk about children's drawings") [dessins d'enfants is also a technical mathematical term].
/* Steinar */
- only has one course in abstract algebra, would have loved more =)
> However, counting cycles is such a little part of the entire picture; an incredibly often overlooked effect is that of cache, for instance, and fixing that on the C form is, erm, slightly easier.
I'm not really an expert on this, but did download some of the AMD and Intel manuals a long time ago. It's a huge area, you can spend your life studying it. I can definitely imagine that it's very interesting.
One interesting story about Fritz: apparently, Franz is very good at this stuff, including things like balancing the load _inside_ the CPU. Fritz actually uses more power than other engines (since all aspects of the CPU are constantly in use), and one user had an instability problem which only arose when Fritz was running.
Vas