Rybka Chess Community Forum
Up Topic Rybka Support & Discussion / Rybka Discussion / Anthony Cozzie speak !
- - By Silvian (***) Date 2008-02-03 12:27
"Since this release comes immediately after the release of the
Strelka source code, I assume that there will be a hue and cry from
the unwashed masses about how I am pirating the intellectual
property of Vasik Rajlich. Let me just immediately say that there
are simply no new ideas in Strelka anyway; my shock at reading the
Strelka source code was not generated by the brilliancies contained
therein but by the lack thereof. Looking at Strelka finally
convinced me to do some things that everyone else has done for a
long time (like futility pruning), and to clean up my code a bit
since I had always been developing in "crunch" mode. I did not add
any eval terms from Strelka (my view being that Zappa's eval is way
better than Strelka's anyway) and I also did not do the crazy
futility pruning where it drops into the qsearch at depth 3. In
summary, Strelka was more a source of motivation than chess ideas.
Of course, you will just have to take my word for it until some
enterprising Russian hacker disassembles Zappa. :)"

                                                ( from the read-me file of the new Zappa Mexico II )
Parent - - By Banned for Life (Gold) Date 2008-02-03 16:44
It's good to see Anthony putting some additional effort into Zappa.
Parent - - By billyraybar (***) Date 2008-02-03 16:55 Edited 2008-02-03 17:50
Seems like Anthony is indirectly taking a jab at Vas by saying how unimpressed he was with the Strelka code.
Parent - - By Carl Bicknell (*****) Date 2008-02-03 17:45
well yes, but it's good to see he's not a copy cat.
Parent - - By turbojuice1122 (Gold) Date 2008-02-03 19:58
Yes, though I would expect nothing else of him since he's a real chess programmer who is out in the open, can be verifiably located, etc.  That's why it's so ridiculous when some Neanderthals try to defend Osipov and his actions and try to put him on the same level as Vas.
Parent - - By Permanent Brain (*****) Date 2008-02-03 20:29
There seems to be disagreement even among chess programmers, about new ideas in Strelka's code. Some say it contains none, others say it does.

(Note that identifying parts of another engine's code does not necessarily mean identifying a new idea.)

see also http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?pid=40986;hl=#pid40986
Parent - - By billyraybar (***) Date 2008-02-04 19:28
Just had a thought - what motivation would Vas have to say that much of the Strelka code is very close to Rybka's code?  Maybe Vas said that to throw everyone off.  Maybe the Strelka code contains hardly any similarity to the Rybka code.  If I were the Rybka author and the Strelka code was very similar to my code, I would probably say 'There is hardly any similarity to Rybka at all.'
Parent - - By Uri Blass (*****) Date 2008-02-04 20:00
Every intelligent person who has seen the facts knows that Strelka is based on Rybka regardless of what Vas says, and Vas had no reason to lie and claim that Strelka is not based on Rybka.

Strelka 1.8 behaves so similarly to Rybka that it obviously cannot be unintentional, and people who have seen the source can tell you that Strelka 2 is based on Strelka 1.8.

I guess that you simply did not read about the similarity in analysis between Strelka 1.8 and Rybka beta; otherwise you would have had no reason to suggest this nonsense.

Uri
Parent - - By billyraybar (***) Date 2008-02-04 21:01
Yeah, I read about all the similar analysis.  Before the Strelka source was made available, I remember seeing a lot of similar analysis on CCC, and there was a debate as to whether Strelka was a reverse-engineered Rybka or not.  Is it possible to have similar analysis, yet have very different code producing that similar analysis?  I am not a computer programmer/software developer (or whatever you want to call it), but if the answer to my question is 'yes', then I think my suggestion is not nonsense.  Do Strelka and Rybka have the exact same analysis on every single position?  My guess is no.
 
I accept your word that Strelka is based upon Rybka.  But something tells me that it was more of a success at emulating Rybka than at revealing the secrets of the Rybka code.  I assume that many chess engines have similar code.

Anyhow, I was just trying to put myself in the shoes of Vasik. 
Parent - By Uri Blass (*****) Date 2008-02-04 21:35
In theory it is possible to have similar analysis and different code, but in practice it is not reasonable to expect it, and Juri Osipov also admitted that he used reverse engineering of Rybka.

I also read that some parts of the exe were clearly the same.

Uri
Parent - - By Permanent Brain (*****) Date 2008-02-04 22:51
For the first Strelka, many examples were found where the engine output wasn't just "similar" to Rybka 1.0 beta, it was identical (except for reported depths of d+2 and other node counts): same moves at certain depths and same evals.  Even identical bugs were found, at least one but IIRC more than one.  This was just some months ago, and the message boards were FULL of it after May 2007.  I am sure that 97% of those postings can still be found, in CCC and elsewhere. - So I wonder why it has to be explained again and again.  The disassembling of Rybka 1.0 was never denied.

Although, if someone really isn't familiar with the topic, it must be a nightmare trying to find and separate the relevant observations and statements from the rest, which included a giant amount of message-board trash and even wrong expertise.

And it just won't stop...
Parent - - By Uly (Gold) Date 2008-02-05 03:31

> So I wonder why that has to be explained again and again.


Osipov said it was a decompiled Rybka. Vas said it was a decompiled Rybka. Is billyraybar arguing that both Vas and Osipov are lying?
Parent - - By billyraybar (***) Date 2008-02-05 16:41
No, I'm not arguing that they are both lying.  I must be confused about what 'decompiling' means.  I thought that decompiling was a process of trial and error - seeing how Rybka evaluates position A and then making Strelka evaluate the position the same way.  Then take another position, position B, and try to get Strelka to evaluate that position the same.  Then make sure the changes made to evaluate position B correctly didn't change the way position A was evaluated, and if they did, make changes so the evaluations of both positions remain the same as Rybka's evaluations of those positions.  Then go to position C, and D, and so on.  I thought it was impossible to actually extract the original code of any software program.  My logic is that if the two programs disagree about the evaluations of a few positions, then it is possible that the code of the two programs is very different even though they usually produce similar output.  There must be a more sophisticated method of reverse engineering.

Anyhow, no biggie.  I'm not trying to stir the pot.  I was actually trying to be a bit 'tongue in cheek'.  Just thought it would be comical if the code only vaguely reminded Vas of Rybka but he said the opposite to throw people.   Highly unlikely.  No doubt.   
Parent - - By BB (****) Date 2008-02-05 17:02

> I thought it was impossible to actually extract the original code of any software program.


This is true for many programming languages, but not for others. For instance, you can't get "the original" code from something written in C, but with Java you can get something close.

However, in all cases, there do exist reverse-engineering tools (decompilers) that take an executable and turn it into functionally equivalent C code (or another language). The output is usually not the most pleasant, but it can be hacked on [a long and tedious process] by someone until it is more palatable to the eye. This is essentially the claim about Strelka/Rybka - I don't think anyone familiar with the facts doubts that this was done to some degree (one obvious instance is that the same positional weights for various things like mobility, king safety, etc., appear in both Rybka 1.0 and some Strelka versions). I personally don't consider the concordance of positional evaluations to be at the same level of evidence.
Parent - - By Banned for Life (Gold) Date 2008-02-05 19:12
The real proof is that if you turn it into a .wav file and play it backwards, you can clearly hear the phrase "Vas is dead, Vas  is dead".
Parent - - By BB (****) Date 2008-02-05 20:17

> The real proof is that if you turn it into a .wav file and play it backwards, you can clearly hear the phrase "Vas is dead, Vas  is dead". [link]


This only worked for me in October when I was in a grassy knoll with black helicopters hovering overhead. :-P It might also work in Atlanta, at least for chess players. [bizarre conspiracy theory link]
Parent - By Nelson Hernandez (Gold) Date 2008-02-05 20:25
Hey, don't knock the black helicopters.  I've seen them.  When I lived in McLean, VA (4 miles down the road from CIA) all kinds of choppers used to fly by, some with the presidential seal, some black, some military, some with NWO or ZOG stenciled on the side.
Parent - By noctiferus (***) Date 2008-02-05 20:29
After that, I can well imagine Vas touching wood :)
(funny, Italians in this case touch iron!)
Parent - By billyraybar (***) Date 2008-02-05 21:39
LOL
Parent - - By ozziejoe (**) Date 2008-02-03 20:42
If the ideas in Strelka/Rybka are trivial, why does it perform so much better than other engines?  Could it be that Anthony has missed something, or underestimates the importance of what he is looking at?

Why do we even need an "opinion" of the quality of the code when we have an objective measure: engine strength?  What Anthony says is simply unrelated to reality.  If the ideas in Rybka were just "old" and trivial, then you would have seen 100 engines at Rybka's strength.

Maybe the secret is in the implementation of the ideas (e.g., bug-free code). But even this is nontrivial, because again, objectively Rybka does better than the 1000 or so other UCI engines (and better than the programs that have been professionally developed for over a decade).

Let me translate what Anthony is saying to golf: Tiger Woods is nothing special, because he uses the same golf clubs and same strokes as everybody else; there is absolutely nothing new in Tiger Woods's game.

best
Joseph
Parent - - By Anthony C (*) Date 2008-02-04 02:58
I have to thank you for your post, because it exposes a completely
different viewpoint than my own, and yours is probably much more
common.  One of the interesting things about computer chess is that it
is both a scientific and competitive endeavour.  As a graduate
student, I am emphasizing the former.  As a user and computer chess
enthusiast, you are emphasizing the latter.

In academia, novelty rules all.  It is much better to be novel than to
have great experimental results.  The goal is to add to the knowledge
of the human race, and the key question is "What can we as a community
learn from this?"  You will have to take my word for it that the
source code of Strelka simply does not contribute to the state of the
art in building computer chess engines.  Consider Fruit: you can take
the new ideas from Fruit, add them to an existing chess program, and
make it better.  Just about everyone uses late move reductions of some
sort nowadays.  There is no equivalent in Strelka, no magical
techniques that could be added to a new chess program and give it 100
elo.  There is no equivalent to null move, history pruning, or even
singular extensions.  For two years, whenever things were becoming
boring computer chess enthusiasts could always debate about what
Rybka's secret was.  In hindsight, Vasik must have been quite amused
by those discussions, because Rybka's "secret" was simply to be super
optimized, 3 or 4 times faster than everything else.

There is only one thing that is really interesting about Strelka: that
a search is still important.  A lot of people (myself included, under
the dark school of Diepeveen) thought that 16-18 or at most 20 ply was
sufficient for any position.  Obviously, we were wrong.  This is why I
said earlier that I thought Vasik went in a different direction (at
least from me).  While I was trying to improve the evaluation, he was
trying to optimize his program, and since Rybka gets about 4-5 more
plies than Zappa, those extra plies of search, especially in quicker
games, proved better than my heuristic evaluation.

Anyway, from an engineering standpoint Strelka (rybka) is fantastic.
It packs most of the search and evaluation of Fruit/Crafty into
something almost 3 times faster.  For those of you that have never
written code, 3 times is quite a bit, considering that Fruit and
Crafty are already reasonably well coded.  Rybka is even faster than
the ultimate beancounter, Fritz 5, while having a lot more under the
hood.  And this is why I said that Strelka was more a source of
motivation than insight, because I had never really taken the time to
optimize Zappa, persuaded by the standard CS doctrine of "don't waste
your time trying to improve the constants on an exp-time algorithm".

What this all boils down to is the difference between interesting and
useful, two qualities that are often unrelated.

I think that is about the best I can do to try to explain my
viewpoint.  I personally have had a life long love affair with science
and engineering.  I find it amazing that they work at all.  Imagine
the zillions of quantum interactions involved when we say that the
American dollar has lost value against the Euro!  That we can build
models for such complex phenomena is preposterous - and yet those
models work down to 20 decimal places.  This is why I am trying to
earn my PhD: I would like to make some small contribution to science.
Of course, I don't expect everyone to have this view.  I don't think
this particularly bothers Rajlich; he seems a pure competitor who wants
to win games and keep himself in twinkies. Just because it isn't
scientifically interesting
(or wasn't in 2005.  Perhaps he has done something interesting
since then) doesn't make Rybka any less of a great chess program.

cheers,

anthony

P.S. I don't consider Zappa to be particularly interesting from a
scientific standpoint either.  This is why I don't work on it that
much.

P.P.S. I think this scientific attitude is what gets Hyatt in trouble in this forum as well.

P.P.P.S. Yes, I am sort of arrogant.  We all have these little flaws, but it is considered courteous not to point them out.
Parent - By ozziejoe (**) Date 2008-02-04 04:06
Hi Anthony,

thanks for your considered response. That makes sense to me (i am an academic too, though not in computer science)

best

J

ps. I never said you were arrogant
Parent - By JhorAVi (***) Date 2008-02-04 04:40
Hi Anthony,

   It's nice to know that you're back working with Zappa again. It keeps computer chess competition alive.  I'm sure you got tons of ideas by now.  Please surprise this chess community by releasing a 150+ ELO stronger Zappa soon!

regards :)
Parent - - By Silvian (***) Date 2008-02-04 08:03
Simply, you are THE GREAT!
For me at least!
Regards,
Silvian
Parent - - By Carl Bicknell (*****) Date 2008-02-04 09:07
His viewpoint on rybka is very revealing - I knew several people who thought early Rybka prototypes were VERY fast.

However, it can't JUST be that - many programs are weaker on a quad than Rybka is on a single cpu.
Parent - - By Luckybaer (**) Date 2008-02-04 11:48
> Rybka is even faster than the ultimate beancounter, Fritz 5, while having a lot more under the hood.

Here he expresses that it is fast AND has a good eval/more chess knowledge, or whatever makes an engine great in addition to its search.
Parent - By Carl Bicknell (*****) Date 2008-02-04 13:41
yep, thank you, I saw that but making a comparison with Fritz 5 doesn't reveal too much: all the top progs have 'more under the hood' than it does.

Things would get really interesting if he spent lots of time speeding Zappa up.
Parent - By Anthony C (*) Date 2008-02-04 14:43
Of course not.  Rybka, like everyone else, has its special tricks.  However, if we assume it gets 80 elo from being 3x faster than everything else, then on CCRL 40/40 Rybka 2.3.2a would be rated 3040, which is basically where all the rest of the quad engines are, plus or minus some noise.

anthony
Parent - - By Uri Blass (*****) Date 2008-02-04 10:00
Even if Strelka is better only because it is designed to do things faster, it still has things that programmers can learn from it.

I understand if you are not interested in making the same thing faster, but I think that other programmers can learn from Strelka's code how to do it.

Uri
Parent - - By BB (****) Date 2008-02-05 19:53

> Even if Strelka is better only because it is designed to do things faster, it still has things that programmers can learn from it.
> I understand if you are not interested in making the same thing faster, but I think that other programmers can learn from Strelka's code how to do it.


I agree with this in principle, but only to a point - if you are really interested in speed, assembler (for critical code) can be noticeably faster than C. There is also the issue of whether decompiling and recompiling is truly lossless in speed (if the original used ASM code, the chance of this being a worry is increased). Finally, there are some C-level micro-optimisations that might be useful in some programmes, but not in others.

I'll give a Strelka code snippet as an example:
  for (mask = Board->mp[WhiteBishop]; mask != 0; mask &= (mask - 1))
  { sq = first_one(mask); WH_BISHOP_CODE(sq); }


This is (diagrammatically) Strelka looping over White bishops, with some code executed for each one. We can first note that this loop can be partially unrolled, as (typically) there is a maximum of 2 White bishops. Thus I claim that
  if (mask) { sq = first_one(mask); WH_BISHOP_CODE(sq); CLR_BIT(sq,mask);
              if (mask) { sq = first_one(mask); WH_BISHOP_CODE(sq); CLR_BIT(sq,mask);
                          while (mask) { sq = first_one(mask); WH_BISHOP_CODE(sq); CLR_BIT(sq,mask); } } }

should give (slightly) better branch prediction, at least when doing a search from a position where White has 2 bishops. Whether or not the final code is faster could depend on matters such as pipelining, the size of the WH_BISHOP_CODE vis-a-vis the instruction cache (especially as POPCNT is currently 20-odd instructions), branch-prediction-counter interaction, etc. In general, you have to test it and see. [At the ASM level things can be controlled a bit better, but it's still trial-and-error in the end].

Secondly, for an ASM-level attempt at a nano-improvement, the CLR_BIT(square,mask) as implemented by Strelka via mask &= (mask-1) can instead use Bit Test and Reset (BTR) [this could be tricky with 32 bits, but mask &= (mask-1) also requires care in that case]. As above, this is just an idea, and a generic real-world plug-and-play need not lead to faster code.
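For anyone who wants to try this idiom outside Strelka, here is a minimal compilable sketch of the partially unrolled bitboard loop (first_one, CLR_BIT and visit_bishops are hypothetical stand-ins implemented with gcc/clang builtins; this illustrates the idiom, it is not Strelka's code):

```c
#include <stdint.h>

typedef uint64_t bitboard;

/* Hypothetical stand-ins for the quoted Strelka macros. */
static int first_one(bitboard b) { return __builtin_ctzll(b); } /* index of lowest set bit */
#define CLR_BIT(sq, mask) ((mask) &= (mask) - 1)                /* clear the lowest set bit */

/* Visit every set square of 'mask', partially unrolled for the common
   case of at most two bishops; records the squares so it can be checked. */
static int visit_bishops(bitboard mask, int squares[])
{
    int n = 0, sq;
    if (mask) {
        sq = first_one(mask); squares[n++] = sq; CLR_BIT(sq, mask);
        if (mask) {
            sq = first_one(mask); squares[n++] = sq; CLR_BIT(sq, mask);
            while (mask) { sq = first_one(mask); squares[n++] = sq; CLR_BIT(sq, mask); }
        }
    }
    return n;
}
```

Note that each branch must recompute sq with first_one before using it, exactly as in the corrected snippet above; whether the unrolled form is actually faster than the plain loop still has to be measured.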
Parent - - By Uri Blass (*****) Date 2008-02-05 20:17
I do not know assembler, but based on my knowledge, assembler was significantly faster than C mainly in the past, when compilers generated slow code.
Today, when compilers are good, it is clearly less important.

I believe that the main reason for the speed of Strelka is not assembler tricks but good data structure.

Uri
Parent - - By BB (****) Date 2008-02-05 20:51

> I do not know assembler, but based on my knowledge, assembler was significantly faster than C mainly in the past, when compilers generated slow code.
> Today, when compilers are good, it is clearly less important.


Much low-level "mission-critical" code is still written in assembler, and can still be 2x faster. An obvious example is the first_one and popcnt in many engines (and other programmes). Beyond such trivialities, the gains rapidly decline, but an additional gain of 20-25% over C code is typically a reasonable hope from a suitable amount of effort. Indeed, the difference between compilers (or even from fiddling with flags) can often be nearly this much. [For instance, compiling Toga with gcc4.2.2 gives me a result that is 10-15% slower than the distributed executable].

One of my favourite recent quotes (from Zimmermann and Dodson in 20 years of ECM) is: Efficient Assembly Code. While using clever high-level algorithms may give a speedup of 10% or 20%, at the expense of several months to invent and implement those algorithms, a twofold speedup may be obtained in a few days, just rewriting one of the assembly routines for integer arithmetic. [There is a footnote, but I would be forced to digress in order to relate its whole history]. Of course, GMP (multi-precision arithmetic) is a very specific processor-intensive case, and most code should not see this type of speedup.
Parent - - By Sesse (****) Date 2008-02-06 04:34
The single reason first_one is written in assembler is that the functionality, for historical reasons, is not exposed to C -- it's a bit of a special case, really. (If you're using gcc, you can use ffs() and __builtin_clz() to access the bsf and bsr instructions.) Of course, you could claim that, say, log() is written in assembler too -- it's the same thing; it's just that the standard C libraries don't happen to expose bit counting.

As for possible gains of rewriting code into assembler: It depends a lot on the code, it depends a lot on the compiler, and it depends a lot on how spread-out your "hot" code is. If 95% of your time is spent inside a 10-line loop (this is not uncommon for multimedia tasks), there's a good chance you can spend some time, write careful assembler and get it faster. (Modern compilers are really, really good, though -- what they usually can't handle well is vectorization, and that's where you can still beat them without _too_ much effort. Still, speaking as someone who has just spent two days trying to optimize such an "obvious" loop for not more than 2% practical overall gain, I do tend to be skeptical of claims that just porting it to assembler is easier.) However, counting cycles is such a little part of the entire picture; an incredibly often overlooked effect is that of cache, for instance, and fixing that in the C form is, erm, slightly easier. :-)

Actually, I'm not ready to agree that much speed-critical code is written in assembler at all, unless you have different ideas of either what "much" is, or what set of code we're talking about. :-)

/* Steinar */
Parent - - By BB (****) Date 2008-02-06 11:09

> Actually, I'm not ready to agree that much speed-critical code is written in assembler at all, unless you have different ideas of either what "much" is, or what set of code we're talking about. :-)


Perhaps I spend too much time with multi-precision arithmetic, where getting the 0th level to be blazingly fast percolates all the way to the top, due to the recursivity.

> an incredibly often overlooked effect is that of cache, for instance, and fixing that on the C form is, erm, slightly easier. :-)


Indeed, when Dan Bernstein visited a group I was working with, we spent most of the time getting a Gröbner basis algorithm to be cache-friendly. Big speedup in the end. Bill Hart and David Harvey have done similar things with polynomial arithmetic in their FLINT package.

> The single reason first_one is written in assembler,


Yes, yes, I agree this is merely a historical C thing.

> that just porting it to assembler is easier


I would never say something this strong. :)  Indeed, inline ASM should be sufficient. Also, I'd guess that even DSP programmers are likely to use a C-like ASM-extension these days.
Parent - By Sesse (****) Date 2008-02-06 17:30
Good to see that we converge to agreement. :-)

FWIW, I've programmed DSPs (my degree is in signal processing), and the only thing written in assembler was the vendor's FFT library. The rest was C, all C.

/* Steinar */
Parent - - By BB (****) Date 2008-02-06 14:12

> spent two days ... for not more than 2% practical overall gain


I'd suspect that there are over-clockers who have spent longer than this for an even smaller gain - and largely only for their own box. :)

>> One of my favourite recent quotes


Now that I think about it, an even better one in this genre (though not assembly versus high level) would be from Dan Bernstein regarding the recent Van Buskirk tangent FFT - it's not in the paper, but when I saw him give a 20-minute blurb about it he said something like: "Recently, the computation of the DFT has seen an algorithmic improvement over the split-radix methods that were described almost 40 years ago. The (asymptotic) gain is about 5%. [Obviously no one should be much impressed, so he turns to a salesman pose]. This might not seem like much, especially when compared to the 10000-times speed improvements in hardware over the same time period, but(!) [extreme irony, but not mockingly so], the algorithmic improvement is largely orthogonal to the hardware advances [this is not exactly true - for instance, Bernstein himself describes how to make it cache-friendly], and so, when compounded with that gain, we can hope to achieve an overall speedup by a factor of around 10500!"

The GMP 4.2.2 comparison ended up 3.7x faster with P4-ASM versus generic, but as I say, this is a really special case.
Parent - - By Banned for Life (Gold) Date 2008-02-06 14:51
I'm a little surprised by your example. If you look at this FFTW page:

http://www.fftw.org/speed/CoreDuo-3.0GHz-icc64/

and scroll down to the most commonly used FFTs (single-precision real and complex, 1D transforms), you will see that the difference between the best algorithms and the median algorithm approaches an order of magnitude. For many detection-theory type applications, these differences are really, really important!

Regards,
Alan
Parent - - By BB (****) Date 2008-02-06 15:04 Edited 2008-02-06 15:07

> I'm a little surprised by your example. If you look at this FFTW page:


Ah, great. FFTW benchmarketing! :) Bernstein had a long dispute with them (back when he maintained it, djbfft was faster than FFTW), as they wouldn't compile his programme in the manner he desired. [My impression is that Bernstein cares more about theory than practice nowadays].

In any case, the quote was meant to exemplify the difference between a pure algorithmic advance and hardware advances - there would seem to be a large middle-ground in implementation.
Parent - - By Banned for Life (Gold) Date 2008-02-06 15:19
No doubt the FFTW team is biased toward (you guessed it!) FFTW, but there are fields where every job starts with benchmarking of a number of DSP primitives on a small set of target hardware and differences of 10X between the best and some of the also rans are not unusual. In cases where more efficient DSP primitives allow a critical portion of an application to stay on a single die, or a single socket, or single blade rather than spilling over, the difference can be even larger.

These exercises tend to be very non-theoretical. The added complexity of having to run critical code on a larger set of hardware costs lots of money and adds significant schedule risk on jobs that are frequently bid fixed price. The FFTW suite normally does pretty well but is avoided if possible due to commercial licensing issues.

Regards,
Alan
Parent - By BB (****) Date 2008-02-07 01:15

> No doubt the FFTW team is biased toward (you guessed it!) FFTW


I am reminded of the apocryphal Soviet reporting of a hypothetical head-to-head footrace: "Gorby takes 2nd, Reagan next to last" - it's all in how you spin it. :)
Parent - - By Sesse (****) Date 2008-02-06 17:27
Again straying offtopic, one of the FFTW authors has a quite good writeup on the FFTW/djbfft benchmarking controversy on the talk page for FFTW on Wikipedia -- basically they acknowledge the screwup with regard to compiler flags, but there's a bit more to it than that. It's an interesting read, and not too long either.

/* Steinar */
Parent - - By BB (****) Date 2008-02-06 22:39

> quite good writeup on the FFTW/djbfft benchmarking controversy


I can ask Dan about it next time I see him, but that could be fraught with danger. :) His operational methodology has frequently been "Do ONE thing - do it VERY well" [the qmail design, particularly its security, shows a similar concept], which is less broad than the FFTW focus. As such, this seems to have led to an inadequate comparison between the two programmes. His comments about their benchmark methodology do not seem incorrect - however, if typical consumers/users find the FFTW metric to be ill-devised, they could simply ignore it. If you were particularly concerned about the special case(s) [which are not uncommon] covered by djbfft circa 1998, you'd probably have used it, or at least considered it. I'd agree with Bernstein that FFTW's Format of Results does leave a lot to be desired - this is why I do not hesitate much when using the pejorative "benchmarketing" moniker to describe much of the content of Alan's link. Johnson's WikiTalk indisputables seem just so - in any case, the feud has long passed, and has not been rejoined.

> Wiki page: I'd never heard of Dan Bernstein until I looked at this page.


:). I first met him playing bridge at a math conf back in 2000. Knew nothing about his (in)famous lawsuit, qmail, djbfft, etc. Has ALWAYS worn black - nothing else. [He actually ran out of black shirts at one conf, and chose to buy a (black) T-shirt at an "Everything a Euro" store].
Parent - - By Banned for Life (Gold) Date 2008-02-06 23:12
Note that in the two benchmarks I referenced (which not coincidentally are the most commonly used), Intel's ipps significantly outperforms FFTW (in some cases by 20%). It's not obvious how this could be construed to be marketing for FFTW.
Parent - - By BB (****) Date 2008-02-07 00:02

> Note that in the two benchmarks I referenced (which not coincidentally are the most commonly used), Intel's ipps significantly outperforms FFTW (in some cases by 20%). It's not obvious how this could be construed to be marketing for FFTW.


Aha! So FFTW puts the most-common near the bottom, but the ones for which they excel are put near the top! Sounds like marketing to me... :-P I guess I'd prefer the raw data from their site as opposed to their graphical representation - "mflops" is simply wrongly named, and I share some misgivings about inverse time and scaling in any case. Sadly, Intel's MKL page does the same. [Maybe I just don't like graphs - they're USA Today material :)]. Somewhat like gcc, one important (and impressive) feature of FFTW is its portability, so it should be expected that it can be beat in a limited venue.
Parent - - By Sesse (****) Date 2008-02-07 00:50
Also, FFTW is great for prototyping. It's there, it's known to be pretty good (quite often, although not always, the fastest option available), and it's incredibly flexible. You can always throw it out later (or buy a license) if you're not writing free software.

I'm proud of having written code that actually beat FFTW in terms of speed (2x on my own machine, more on others), but I was using the GPU, so it wasn't a fair comparison. :-) (It was part of a larger algorithm, but fast DSTs and IDSTs were an essential and time-critical part of it.)

/* Steinar */
Parent - - By BB (****) Date 2008-02-07 01:10

> but I was using the GPU, so it wasn't a fair comparison


A few years back, I was implementing an algorithm to search for points on an intersection of two quadrics in projective 3-space. More down to earth, you have two 4x4 symmetric matrices M and N, and want a (nonzero) integral 4-vector v with both <v,Mv> and <v,Nv> equal to zero. One guy with whom I was working had the brainstorm to offload some/most of the computation to the GPU - after all, we were doing a lot of (presumably streamable) operations with 4x4 matrices on integers of smallish size, and for this, the GPU should be quite fast, even if it is doing everything in floating point. We gave it a bit of thought, but didn't use it in the end - perhaps if we had 53 bits of mantissa with the GPU, it would have been realistic. With the recent FireStream9170, this sort of GPGPU-esque methodology might need a rethink, but it's not exactly my area of expertise.
Parent - By Sesse (****) Date 2008-02-07 02:24
You can fake doubles using two floats, but it's computationally expensive. (You get about the precision of doubles, but only the range of floats.) But yes, the GPU is a very different beast :-)

Overall the GPU was very much worth it for this kind of task. With a fast GPU, the GPU implementation was a _thousand_ times faster than the reference MATLAB implementation on my Athlon 64 3000+... (And that was 64-bit, JITed MATLAB, and almost all the time was spent inside native MATLAB routines, so it wasn't like you can blame all of that on interpretation overhead either.) It depends a _lot_ on the task, though... and it's a very different model from what programmers are used to. I don't know how computationally intensive your stuff was, from the description it sounds a bit like the cost of getting the data to and from the GPU would outweigh the advantage of more processing power.

/* Steinar */
Parent - - By BB (****) Date 2008-02-06 22:50 Edited 2008-02-06 22:55

>>>One of my favourite recent quotes


Not exactly related, but a bizarre quote... at the end of a seminar I gave today, an emeritus professor from Oxford asks in the question period: "I guess I don't understand why you didn't mention dessins d'enfants." Literally this means children's drawings, so I guess I have something to tell the grandkids, as it were ("I once got asked why I didn't talk about children's drawings") [dessins d'enfants is also a technical mathematical term].
Parent - By Sesse (****) Date 2008-02-07 00:44
If you're actually speaking on stuff like group theory, I think your grandchildren will see you as too weird anyway ;-)

/* Steinar */
- only has one course in abstract algebra, would have loved more =)
Parent - - By Vasik Rajlich (Silver) Date 2008-02-08 09:31

> However, counting cycles is such a little part of the entire picture; an incredibly often overlooked effect is that of cache, for instance, and fixing that on the C form is, erm, slightly easier.


I'm not really an expert on this, but did download some of the AMD and Intel manuals a long time ago. It's a huge area, you can spend your life studying it. I can definitely imagine that it's very interesting.

One interesting story about Fritz: apparently, Franz is very good at this stuff, including things like balancing the load _inside_ the CPU. Fritz actually uses more power than other engines (since all aspects of the CPU are constantly in use), and one user had an instability problem which only arose when Fritz was running.

Vas

Powered by mwForum 2.27.4 © 1999-2012 Markus Wichitill