Rybka Chess Community Forum
Topic: Rybka Support & Discussion / Rybka Discussion / High Reflection Tourneys
- - By Kamsky [us] Date 2007-01-12 15:12
I'm just curious whether anyone has done any tests with Rybka against other top engines when the programs have been allowed more reflection time, say an hour or so per move. Most of the tourneys I've seen people post are shorter ones, but I'm curious about this area since I can't recall anyone reporting such results. Of course the number of games would be a lot less, and it would take a long time to finish and get any good stats to go by. Higher reflection times would of course allow each engine to find stronger moves, in theory anyway.
Thanks
Parent - - By Nelson Hernandez (Gold) [us] Date 2007-01-12 15:56
Mark Lacrosse is the expert on this topic, but he concluded that in terms of measuring engine strength the time control is not really that important a variable.  Obviously engines play more strongly the longer they have to calculate, but there is little tangible evidence that some engines make better or worse use of the extra time in relative terms.  This observation, which he supported with statistical tables, exploded years of mythology regarding the need to let machines play at long time controls in order to get a more "fair" appraisal.  What this means, inferentially, is that an engine that plays x% sub-optimal moves given 3 seconds will play a steadily lower, but relatively comparable, percentage of sub-optimal moves given 3 or 30 minutes.  I nominate his study for "important new analytic insight of 2006".
Parent - - By Vasik Rajlich (Silver) [hu] Date 2007-01-13 10:14
FWIW - you can definitely do things on the engine side which will make an engine better at blitz.

Vas
Parent - - By Nelson Hernandez (Gold) [us] Date 2007-01-13 14:35
Again, every engine plays better at longer time controls than at blitz, but the question is how it performs relative to other engines on a sliding time scale.  Lacrosse showed that if you play a set of engines at 1+1 and at some longer control, the rankings don't change that much.  I.e. the level of play rises more or less equally across all engines.

Now, obviously engines are not all the same in their internal dynamics; some are more efficient than others.  But in broad terms he showed the differences weren't really that measurable.  As the world's leading expert in these things, what do you think?
Parent - - By PCMorphy72 (**) [it] Date 2007-01-13 19:03 Edited 2007-01-13 19:07
This is exactly the subject of a post for the CEGT forum that I prepared a week ago (unfortunately I'm still waiting for them to activate my account there...).
Before reporting the full post, I'll quote something from it:

[...]
I think that for an ideal future engine, the rating gain from doubling the number of CPUs (e.g. from 256 to 512) will be reduced to roughly the rating gain from doubling the hash table from 256 to 512 MB, i.e. no more than 7 Elo points (in the past it was estimated at ~10 Elo points), while the gain from doubling the time control could presumably stay around 15 Elo points (at least up to time controls involving depths on the order of 30 plies).
This argument has been discussed since 1997, and before that in several inconclusive experiments by Andreas Junghanns and others (I quote: [All these comparisons inevitably led to speculative extrapolations of the graphs which Levy characterized to the point as the "meta-science of prediction in computer chess" in his latest article about the subject in 1997]), but a serious test by Ernst A. Heinz in 2000 (New Self-Play Results in Computer Chess) demonstrated that "frodo" was thinking badly (more recent confirmations from 2003 and 2004 are in these papers:
http://csn.umit.at/staff/eheinz/Self-FUp.pdf
http://www.st.ewi.tudelft.nl/~renze/doc/ICGA_2005_4_DeepSearch.pdf)
The supposed 15-point gap can be calculated, with great difficulty, for a particular engine (naturally it can differ from 15 for each engine). In a simulated calculation of the 15-point gap for Zap!Chess, considering [...]
The rest (where I describe my methods and mathematical reasoning...) is too complicated and rather off-topic here...
However, don't forget that collected blitz games are much less useful than longer time control games:
[I quote again from my post]
I don't think that rating lists at faster time controls are more useful than ones at longer time controls, simply because the accuracy of the rating calculation grows with the square root of the number of games: 10000 games (for each engine) are sufficient, after which the "real" accuracy (in terms of quality of play) is improved by a longer time control [...]
Parent - By Vasik Rajlich (Silver) [hu] Date 2007-01-15 13:19
This is a slightly different issue. But yes - you could argue for example that if you made a rating list of all engines at some absurdly long time control, the ratings would all tend to be more compressed than in the existing rating lists.

Vas
Parent - - By Vasik Rajlich (Silver) [hu] Date 2007-01-15 13:12
Actually, I haven't tried to quantify these things. It's really hard to do that without running actual experiments.

Probably, a quick study of long-TC rating lists and short-TC rating lists would give a pretty good indication.

Vas
Parent - - By PCMorphy72 (**) [it] Date 2007-01-15 21:08
Well, for technical reasons I can't become a member of the CEGT forum, so I've decided to move that set of arguments here (I'm sorry if it perhaps betrays a little criticism towards you, Vas, but I always start out in forums carried away by enthusiasm).
That thread ("40/4 or 40/2 rating list?") was about the advisability of using faster time controls in rating lists.
I would have liked to reply with the following post:

My answer is: 40/600

Hi all,
I want to introduce myself to the forum, and especially to you, Heinz, with a question:
Are you interested in extrapolation?
Yes... extrapolation... in order to predict future observations...
I'm sure that all of you are interested in engine tournaments, and I know that you aren't novices in statistical analysis if you publish such rating lists. Lately I've seen an increased time control here (the new 40/120) and its rating comparisons with 40/20 (in the Replay Zone), very interesting from my point of view, but also for Heinz, I think, judging by his first enthusiastic post on it. So I've decided to ask this question.
I have never seen serious threads on statistics in correspondence chess, whereas very long time controls represent for me the last frontier for chess engines, also in view of the human-machine challenge.
Many chess programmers don't think so. Vasik Rajlich on his forum says:

>What prevents Rybka's eval from being bigger isn't CPU time, it's that I like to have some semblance of order in there


But they are programmers, not always good statisticians.
I'm also interested in computer chess programming, but at present I'm not working on the differences between increasing the time control and, for example, improving chess knowledge, algorithm performance, or multiprocessing optimizations. Currently I'm working hard on merging several rating lists (similar to Walter Eigenmann's work) to find a precise way to calculate the Elo increment from an increase in time control, practically speaking a more precise way to interpret the gains in your comparisons, and Heinz seems interested when he says:

>Zap!Chess not surprisingly is the engine gaining the most profit from 2 CPUs and tournament time control.
>[...]
>You will see that Zap gains 36 Elo with the long time control while all the others do not vary by more than +/- 22 Elo, which is not so much when you bear in mind that we have some engines included, like SpikeMP 1.2 Turin, Rybka 2.1c mp or Deep Junior 10, with relatively few games.


Let me instead explain a difference between increasing the time control and increasing the number of CPUs in SMP, because many people may think that they are after all the same thing, and that the performance of an ideal engine on a supercomputer with 512 CPUs would therefore be similar to the performance obtained by waiting 500 times as many hours on a normal PC.
Even though that's partially true for 2 CPUs versus a single CPU (where the virtual-time gain factor is not quite 2 but about 1.6, mainly owing to a 15% duplication of nodes in the hash table, see here), it's not the same thing for a sufficiently large number of CPUs working in parallel (SMP).
If a process requires that A be computed from B, B from C, C from D, and so on 100 times over, that process takes the same time on 1 CPU as on 500.
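A minimal sketch of this serial-dependency point (framed via Amdahl's law, with assumed serial fractions; the ~1.6x figure for 2 CPUs quoted above happens to correspond to a serial fraction of 0.25 in this simple model):

```python
# Sketch of the serial-dependency argument via Amdahl's law.
# serial_fraction = the share of the work that cannot be parallelized.
def amdahl_speedup(serial_fraction: float, n_cpus: int) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

for n in (1, 2, 64, 512):
    # serial fraction 0.25 reproduces the ~1.6x figure for 2 CPUs;
    # serial fraction 1.0 is the pure chain (A from B, B from C, ...).
    print(f"{n:3d} CPUs: search {amdahl_speedup(0.25, n):5.2f}x, "
          f"pure chain {amdahl_speedup(1.0, n):4.2f}x")
```

With these assumed numbers, 512 CPUs give less than a 4x effective speedup, nowhere near the 500x you would get by simply waiting 500 times longer, and the pure dependency chain gains nothing at all.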
I think that for an ideal future engine, the rating gain from doubling the number of CPUs (e.g. from 256 to 512) will be reduced to roughly the rating gain from doubling the hash table from 256 to 512 MB, i.e. no more than 7 Elo points (in the past it was estimated at ~10 Elo points), while the gain from doubling the time control could presumably stay around 15 Elo points (at least up to time controls involving depths on the order of 30 plies).
This argument has been discussed since 1997, and before that in several inconclusive experiments by Andreas Junghanns and others:

>All these comparisons inevitably led to speculative extrapolations of the graphs which Levy characterized to the point as the "meta-science of prediction in computer chess" in his latest article about the subject in 1997


but a serious test by Ernst A. Heinz in 2000 (New Self-Play Results in Computer Chess) demonstrated that "frodo" was thinking badly (more recent confirmations from 2003 and 2004 are in these papers:
http://csn.umit.at/staff/eheinz/Self-FUp.pdf
http://www.st.ewi.tudelft.nl/~renze/doc/ICGA_2005_4_DeepSearch.pdf)
The supposed 15-point gap can be calculated, with great difficulty, for a particular engine (naturally it can differ from 15 for each engine). In a simulated calculation of the 15-point gap for Zap!Chess, consider a 14 Elo point gap from your comparison between the 40/120 and 40/20 ratings (the 36 Elo gained by Zap!Chess, cited above, actually refines to a more accurate 14, especially with the indications from Deep Junior and Spike, two other conceptually hardware-independent engines at long time controls), and consider also the gap from the "previous-order" comparison between 40/20 and 40/4, which would be ~8 for Zap!Chess. With blitz games included, that last comparison and its calculated gap of 8 are not as statistically significant as the 40/120 vs 40/20 one (what do you think about a 40/600 rating list?); in any case, the difference between the two gaps is 14 - 8 = 6. These precious six points indicate how close the engine is to an ideal engine (theoretically these gap differences would always be negative if we compared only close-to-ideal engines, due to the diminishing returns Heinz proved), and adding half of these points (half because from 40/20 to 40/120 we double the time control twice) to a precalculated theoretical Heinz value of 12 (estimated for Fritz 6 up to an 11 vs 12 ply search depth comparison, but possibly slightly different for each engine) we get 12 + 6/2 = 15, which is the 15 Elo point gap supposed above.
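Restated as a worked calculation (the inputs are just the figures quoted above, nothing new):

```python
# The gap arithmetic above, restated step by step (inputs are the
# figures quoted in the text, not new measurements).
gap_long  = 14  # Elo gap, 40/120 vs 40/20 lists (refined from the 36)
gap_short = 8   # Elo gap, 40/20 vs 40/4 lists (blitz games included)
heinz     = 12  # Heinz's theoretical value, estimated for Fritz 6

closeness = gap_long - gap_short        # 14 - 8 = 6
# Halved because 40/20 -> 40/120 is treated as two doublings.
print(heinz + closeness / 2)            # 12 + 3 = 15 Elo per doubling
```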
My actual goal is to find Δ(Elo)/Δ(ln t) for an engine, and the convergence of Δ(Elo)/Δ(ln(ln t)). I don't claim to find the technological curve for each engine, and the difference mentioned above between CPU/RAM doubling and time/clock-frequency doubling (a 7 Elo point gain vs 15) is not a considerable difference nowadays, but it should grow larger with CPUs working in series (pipelined), as the FPGA cards in Hydra now try to do for the CPU (a real chess-dedicated pipeline inside a CPU is a different thing; see this to understand the current limits of Hydra).
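As a sketch of the kind of fit I have in mind (the (minutes, Elo) samples below are invented for illustration, not real rating-list data):

```python
# Hypothetical fit of Elo against ln(time control in minutes).
# The three (minutes, Elo) samples are invented for illustration.
import math

samples = [(4, 2700), (20, 2735), (120, 2772)]
xs = [math.log(m) for m, _ in samples]
ys = [e for _, e in samples]

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
      / sum((x - mx) ** 2 for x in xs)

# slope is Elo per unit of ln(t); times ln(2) gives Elo per doubling.
print(f"{slope * math.log(2):.1f} Elo per time-control doubling")
```

With these made-up samples the fit comes out to roughly 15 Elo per doubling, consistent with the figure above.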
Chrilly Donninger isn't as fine and deep a connoisseur of chess as Vasik Rajlich, but he is a good statistician, so perhaps he grasped the importance of making an engine's strength independent of parallelization: that's what I think, although his Hydra currently uses 64 CPUs, and perhaps more in the future. I also think he would be happy to have Rybka's knowledge in his FPGA cards, but for now Vas only wants to implement his chess knowledge himself; he doesn't care about the "only" 5 or 10 times speedup of the FPGA. I think it's too early for a programmer to follow this quotation from Junghanns's work:

>There will come a time when the performance gains for the additional search effort are small. Programmers will then have to resort to other means for improving performance, such as additional evaluation function knowledge.[...] Chess programmers will have to invest more time on their knowledge if they want their program to improve in a significant way


I want to add to this quotation: "And when the main hardware-dependent variable will be CPU clock frequency" (which will probably be expressed in THz instead of today's MHz/GHz).
What do you think about this prediction? (or extrapolation, if you want :-) )
However... why this question here?
The inspiration came to me from the inversely proportional problem involved in Wolfgang Battig's reply to Uri Blass's thread on continuously decreasing time controls (in order to get more games, I think); and although I'm Italian and don't speak English very well, I've posted all these ideas with the assistance of a translator.
I don't think that rating lists at faster time controls are more useful than ones at longer time controls, simply because the accuracy of the rating calculation grows with the square root of the number of games: 10000 games (for each engine) are sufficient, after which the "real" accuracy (in terms of quality of play) is improved only by a longer time control. (Moreover, other factors degrade the accuracy of today's blitz games: computers need considerable time for endgame management and tablebase probing, and reasonable time for other "sequential" (pipelinable?) procedures such as loading large numbers of nodes and/or filling the hash table, starting and auto-configuring other software modules, etc.) It doesn't matter to know exactly how imprecise an engine is; we want to know what probability the engine has of bringing the truth (of Chess) to light.
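To make the square-root law concrete, here is a rough sketch (it ignores draws and treats each game as an independent trial, so the numbers are only indicative):

```python
# Approximate 1-sigma Elo error after N games between equal opponents,
# treating each game as an independent win/loss trial (draws ignored).
import math

def elo_error(n_games: int, score: float = 0.5) -> float:
    se_score = math.sqrt(score * (1.0 - score) / n_games)
    # derivative of Elo = 400*log10(score/(1-score)) w.r.t. score
    return 400.0 / math.log(10) * se_score / (score * (1.0 - score))

for n in (100, 1000, 10000):
    print(f"{n:6d} games: ~{elo_error(n):5.1f} Elo")   # error ~ 1/sqrt(N)
```

At 10000 games the statistical error is already down to about 3.5 Elo, so beyond that point more games buy very little; only a longer time control improves what the rating actually measures.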
So, as I have explained, a solution for more accuracy could be a 40/600 rating list, but with the current number of testers, and the lack of interest of the correspondence chess world in such tests, I think I can only wait for chess to become a mass phenomenon in 2067, like the Beatles in 1967 (careful: I haven't predicted this :-) )
Bye Heinz
Bye everybody, and a good 2007!
Parent - By Vasik Rajlich (Silver) [hu] Date 2007-01-18 11:34
This looks like an interesting post. I'm in a bit of a hurry and can't give it the response it deserves. Just a couple of points:

1) There are two reasons why one engine may do (relatively) better with more CPUs: the multi-processor implementation is better, or the algorithm scales better.

2) There is a lot of nonsense written about computer chess, so take everything with a grain of salt. (Except from me of course :))

3) The main tradeoff in computer chess is programmer time vs (X), where (X) is everything. :)

Vas
Parent - - By Banned for Life (Gold) Date 2007-01-13 19:24
Vas,

You imply that if you were building an engine solely for "fast" time controls, you could make choices that would improve the results. In another set of posts, you mentioned that you already make some changes when there is very little time left on the clock. Any idea how much (in Elo) these changes could improve results? Do you foresee a point where the estimated available time per move could influence the way you do search and evaluation aside from just affecting the search depth? This might imply different settings for "fast" time controls, "normal" time controls and infinite analysis where "fast" and "normal" would obviously change over time as hardware progresses.

Regards,
Alan
Parent - By Vasik Rajlich (Silver) [hu] Date 2007-01-15 13:17
First, what I'm saying is really more of an opinion than some sort of an established fact. It just makes sense to me that certain things will work better at blitz than at longer time controls.

For example - making an engine faster (via optimization) should work better in blitz, while improving an engine's selectivity should work better in long time controls. You could say that an optimization is applied once to each branch of the tree, while improved selectivity is applied recursively and will be more important when the tree is bigger.
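Here's a toy way to see it (a uniform-tree model with assumed numbers, nothing to do with Rybka's actual search): an optimization doubles the nodes you search and buys a fixed amount of depth, while better selectivity shrinks the effective branching factor, which buys more depth the bigger the tree gets.

```python
# Toy uniform-tree model: depth reached = ln(nodes) / ln(branching).
# Compare a 2x speed optimization with trimming the effective
# branching factor from 2.0 to 1.9 (all numbers assumed).
import math

def depth(nodes: float, b: float) -> float:
    return math.log(nodes) / math.log(b)

for label, nodes in (("blitz", 1e6), ("long TC", 1e9)):
    base = depth(nodes, 2.0)
    print(f"{label:7s}: 2x speed +{depth(2 * nodes, 2.0) - base:.2f} ply, "
          f"selectivity +{depth(nodes, 1.9) - base:.2f} ply")
```

In this model the 2x speedup is worth one ply at any time control, while the selectivity gain grows from about 1.6 ply at blitz-sized trees to about 2.4 ply at long-TC-sized ones.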

And yes, in theory, you could have different approaches for different time controls. None of us really have the time for that. In practice, when you make a change, it's just good to think about how it might behave in the long-term at a longer time control (and not just in some blitz games that you might be playing).

Vas
Parent - By Vasik Rajlich (Silver) [hu] Date 2007-01-13 10:15
Here are the highest-search-resources games that I am aware of:

http://www.husvankempen.de/nunn/Replay/replay.htm

Vas