Rybka Chess Community Forum
- - By Rebel (****) Date 2012-03-18 08:37 Edited 2012-03-18 10:29
Monday, 12 March 2012 20:59

From now on, all chess engines participating in CSVN tournaments will be tested with a similarity test against already known engines, if the organisation thinks it is necessary.

For this the CSVN will use the similarity test created by Komodo's programmer Don Dailey. The similarity test uses software to build a profile of the chess engine. These profiles can be compared to determine similarities with other chess engines.

When a percentage of 60% or more is found in the similarity test the tournament organisation will not accept the entry of the chess engine in the tournament. The organisation keeps the right to deviate from this rule.

For a more concise description of the similarity test we forward you to the study of Adam Hair (http://www.top-5000.nl/clone.htm). There you also find a link to the software, which can be freely downloaded.

http://www.computerschaak.nl/index.php?option=com_content&view=article&id=521:gebruik-van-ponder-hit-systeem-voor-toernooideelname&catid=18:vereniging&Itemid=28&lang=en
Parent - - By Uly (Gold) [mx] Date 2012-03-18 09:30

> For a more concise description of the similarity test we forward you to the study of Adam Hair (http://www.top-5000.nl/cone.htm).


Getting a page error, did you mean clone instead of cone?
Parent - - By Felix Kling (Gold) [de] Date 2012-03-18 10:30
corrected :)
Parent - - By Rebel (****) Date 2012-03-18 12:45
Thank you Felix. It's funny, the Dutch page has the right link. I will mail the CSVN folks.
Parent - - By Arrière Pensée (Gold) Date 2012-03-18 13:06
Has anyone tested the accuracy and reliability of Mr. Dailey's software?
Parent - By Rebel (****) Date 2012-03-18 14:26
I have done some self-similarity tests to see if I could hide my own origins :wink: and that wasn't so simple.

1. ProDeo 1.74 (default setting of 2012)
2. ProDeo 1.74 (doubled king safety)
3. ProDeo 1.74 (Q3 - tactical engine)
4. ProDeo 1.74 (Rebel 12 settings of 2003)

        1       2       3       4
1.    -----   78.09   48.24   69.74
2.    78.09   -----   51.76   72.32
3.    48.24   51.76   -----   54.56
4.    69.74   72.32   54.56   -----

Even the old Rebel 12 positional settings from 9 years ago show enough similarity (69%) to link both programs to the same origin and author. Only the extreme settings of the Q3 tactical engine are able to hide the ProDeo origins (54%), but with these settings the engine plays considerably weaker.
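
For readers who want to reproduce a table like this, here is a minimal sketch of a move-match percentage. It assumes, as the thread suggests, that similarity is the share of test positions on which two engines pick the same move. The function names and the sample moves are made up for illustration and are not taken from Don Dailey's tool.

```python
def similarity(moves_a, moves_b):
    """Percentage of positions on which both engines chose the same move."""
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must be tested on the same positions")
    if not moves_a:
        raise ValueError("no positions given")
    matches = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * matches / len(moves_a)


def similarity_matrix(profiles):
    """Pairwise similarity for every pair of engine profiles (name -> move list)."""
    names = list(profiles)
    return {
        a: {b: similarity(profiles[a], profiles[b]) for b in names if b != a}
        for a in names
    }


# Hypothetical profiles: one chosen move per test position, in the same order.
profiles = {
    "default": ["e2e4", "g1f3", "d2d4", "c2c4"],
    "tactical": ["e2e4", "b1c3", "d2d4", "f2f4"],
}
matrix = similarity_matrix(profiles)
print(matrix["default"]["tactical"])
```

The measure is symmetric by construction, which is why the table above mirrors across its diagonal.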
Parent - - By Permanent Brain (*****) Date 2012-03-18 13:48
Interesting. - The table of "notable pairs of engines" does NOT include Fruit 2.1 + Rybka 1.0 Beta. :lol: If I got things right, then that means their move-match percentage must be smaller than 59.37% (R3 <-> H1).

Unfortunately, the similarity spreadsheet in the complete report file does not include Rybka 1.0 Beta.
Parent - By Rebel (****) Date 2012-03-18 14:16
(My) Rybka numbers are here: http://www.top-5000.nl/quick_guide.htm

54% with Fruit 2.1
Parent - By Adam Hair (**) [us] Date 2012-03-21 13:45
The number I have found is 55% to 56%. I did not include the Fruit 2.1/Rybka 1.0 Beta pair in the spreadsheet precisely because it is less than the threshold that I think can be defended as being suspiciously high.
Parent - - By leavenfish (***) [us] Date 2012-03-19 00:25
"When a percentage of 60% or more is found in the similarity test the tournament organisation will not accept the entry of the chess engine in the tournament. The organisation keeps the right to deviate from this rule."

Kind of humorous, don't you think? If someone thought 60% was so appropriate to weed out derivatives/clones (whatever) and then says they keep the right to decide they were off the mark... kind of makes you think the whole thing hasn't been thought out too well to begin with.
Parent - By Banned for Life (Gold) Date 2012-03-19 01:22
Yes. Why have an objective test that you might take exception to once in a while, when you can have a totally arbitrary system run by Bob and Harvey? Makes no sense at all! :razz:
Parent - By Rebel (****) Date 2012-03-19 08:56
I have told the CSVN that 50-55% is relatively safe, that at 56-60% you are being tolerant, and that anything above that is suspect, with the advice not to allow it. Ultimately it's up to them, but I think that sentence should be read in 2 ways:

1. To have the possibility to ask questions first to names with a history that (suddenly) score below 60% after all.

2. To exclude possible false positives. Known names such as Fritz, Hiarcs, Shredder etc. are above all suspicion, but if they suddenly score above 60%, questions can be asked first.

This is a new way to fight cloning. I am sure there will be improvements to the system in the coming years, but first it needs acceptance by the programmers and tournament directors / organizers. Ultimately it's up to them. The main advantage of the system is that there finally is some sort of active, preemptive doping policy which avoids dramas afterwards.

The CSVN might as well set a new standard for others to follow, time will tell.
Parent - - By Razor (****) [gb] Date 2012-03-19 18:01
The similarity test is in my view pointless and more about PR than it is about 'cloning'.  I mean, scientists have sequenced the genome of the chimpanzee and found that humans are 96 percent similar to the great ape species yet clearly you and I are not chimps even with a 96% similarity!  :smile:
Parent - - By Kappatoo (****) [de] Date 2012-03-19 20:35
If anything, the analogy seems to undermine your case. The DNA sequencing may not show that the species are identical, but it strongly suggests that they have a common origin.
Parent - - By Razor (****) [gb] Date 2012-03-20 05:55
Sorry if my analogy wasn't clear but that was part of my point.  The fact that we know so much about our physical relationship to chimps as we have investigated over time is well documented from a scientific view point.  Yet you and I both know that whilst there is a relationship we don't know how it all started.  This is after all from the chess viewpoint the most important part in the relationship between one engine and another.  Sure I was pointing the 'finger of fun' at the person who claimed 60% to be a reasonable percentage to show correlation existed, however, none of this shows how the two engines arrived at the incredibly high correlation value of 60% {another piece of funny finger pointing from me!}

Just like the natural events that have occurred over time and forced an 'evolutionary' sequence of events, I would expect the same to occur in chess engine development, i.e., something like LMR comes along and before a year is out, everyone has LMR and so on.  Chess after all has a finite number of possibilities and an even smaller number of 'best' moves for any given position.  So checking to see if engine A plays the same move as engine B, and saying that where this occurs more than 60% of the time one must be based on the other, is ridiculous - it would be like putting the entire animal kingdom in a forest, setting light to the forest and seeing what the behaviour of all the animals would be - yep, they would all move away from the flames and I suspect we would see a significant correlation in this move {a lot higher than 60% - couldn't resist}, so using the CSVN approach all the animals in the forest would be seen as clones of each other.

Does this help?
Parent - - By Kappatoo (****) [de] Date 2012-03-20 10:01
I should say in advance that I have really no opinion about the usefulness of this similarity detector. My point was that our DNA leaves very little doubt that humans and chimpanzees have a common ancestor. (I hope you agree with that.) And if two chess engines have a common ancestor, then one is a derivative of the other.

Your second analogy is about behavior, and thus in this respect more to the point:  Similar behavior is then the analogue of similar move choice. Common DNA, however, rather seems analogous to common code.
Parent - - By Banned for Life (Gold) Date 2012-03-20 15:23
> And if two chess engines have a common ancestor, then one is a derivative of the other.

This is a logical fallacy. You are ignoring the possibility that both engines are derivatives of a third engine that you may not even know about. The rest of your argument is fine.

This is often seen in relating correlation to causation. I probably wake up consistently about nine hours later than you do every day, yet even though the correlation is nearly perfect, there is no causation whatsoever. (I like this example because it also shows that a high correlation to something that occurred earlier is also not indicative of causation).
Parent - - By Kappatoo (****) [de] Date 2012-03-20 15:55
You are right, that sentence went wrong. It is even worse than you say, for if they have a common ancestor, then neither of them is a derivative of the other - rather, they are both derivatives of the ancestor.
Parent - - By Razor (****) [gb] Date 2012-03-20 18:00
I have replied here because you and 'Banned for Life' are still missing one key point {yes I agree with the other points you and 'Banned for Life' made} and that means I only have to type this once!  :smile:  That is, if we don't know how we got to this point, then accusing someone of cloning, just by the way something behaves in a similar way to something else for 60% of the time {which is why I included the great ape 96% similarity in the first place!} is just ridiculous.  If one wants to ban someone or some engine from a tournament based on this method of detection then as you would expect, innocent parties will be impacted.
Parent - By Kappatoo (****) [de] Date 2012-03-20 18:05
I don't disagree with this.
My complaint against your analogy was supposed to target the ape part and not the engine part: It is not 96% behavioral similarity between chimps and humans, but rather genetic similarity, which seems even more telling.
Parent - - By Banned for Life (Gold) Date 2012-03-20 18:31
You are pointing out that the proposed approach isn't perfect, which is certainly true, but that isn't the correct frame of reference. Rather than comparing to perfection, a better approach would compare the proposed approach to the existing system. In both cases the hypothesis will have some probability of detection and some probability of false alarm, and the systems can also be graded on where they are on a scale of completely deterministic to completely arbitrary. I would argue that by each of these metrics, the proposed system will score considerably better than the current system.
Parent - - By Razor (****) [gb] Date 2012-03-20 19:17
OK, here is something completely opposite to think about - I believe that the only rule that should apply for any engine entering a competition is that it can indeed play chess.  I mean that it understands all the rules of chess and abides by these rules.  A chess engine after all is just an inanimate object that has no malice or evil silicon substrate in its atomic structure; all it has ever been told to do {in a very artificial way it has to be said} is to play chess.  Trying to judge the morality {or anything else for that matter} of the programmer or programmers that created this chess engine by looking at the output alone is, in my view, both ridiculous and pointless.

A tournament organiser can of course use any entry/exit criteria they like for engines playing in their tournaments and if this turns out to be Don's similarity tester then so be it.
Parent - - By OleM (**) [no] Date 2012-03-21 22:22
As a chessplayer you can actually enter any open tournament regardless of how well you know the rules. By your criteria all Rybka versions that don't do bishop underpromotion would not be allowed, because your wording requires it to know all the rules.

Apart from that extreme case, with more precise wording, I actually basically agree. I want to see which entity plays the best chess, and that's it.
Parent - By Razor (****) [gb] Date 2012-03-22 05:48
Don't mistake ability {Bishop under-promotion} with rules.  There is no chess rule that states you MUST promote a pawn to a Bishop; only a rule that says you can.  The fact that Rybka can {try it for yourself - place Rybka in infinite mode - play for both sides and promote to a Bishop - see if Rybka prevents you from doing this} again proves that it allows this; it just has no ability to do this!  :smile:
Parent - - By michiguel (**) Date 2012-03-22 23:23

> The similarity test is in my view pointless and more about PR than it is about 'cloning'.  I mean, scientists have sequenced the genome of the chimpanzee and found that humans are 96 percent similar to the great ape species yet clearly you and I are not chimps even with a 96% similarity!  :smile:


That is a good analogy to prove the opposite of what you imply. The point is not to prove that chimps are not human; the point is to prove that chimps are more closely related to humans than rabbits are. That is exactly what this procedure does. For instance, you do not prove that Loop is Fruit; you prove that Loop is more similar in terms of move selection to the Fruit family than to any other engine available. When cluster trees with bootstrapping are done, the similarity or lack thereof becomes more evident, particularly when two different people run the same experiment with thousands of different positions and still get the same relationship.

Miguel
Parent - - By Razor (****) [gb] Date 2012-03-23 06:15
I don't believe so Miguel,

Our behaviour {NB: the similarity tester checks the behaviour of chess engines} is different to Chimps and if we were to simply look at the behaviour of apes to humans we would not have known that there was around 96% similarity of our genome pool.  Checking the behaviour of something {as I have given in other examples on this thread} does not confirm cloning/copying/or whatever else we want to call it.  In fact, if you think for a moment about any given chess position, there will be a tendency for chess move selections to become more similar as we start to compare engines of similar knowledge.  Whilst all of this is fascinating perhaps, what it doesn't do is tell us anything about the actions the programmer or programmers took in developing their software.
Parent - - By Lukas Cimiotti (Bronze) [de] Date 2012-03-23 07:06
If you look at the behavior of rabbits, apes and humans you'll find out human behavior has more similarities with apes' behavior than with rabbits' behavior. This is also what you would expect when looking at genome pools.
Parent - By Razor (****) [gb] Date 2012-03-23 19:30
Indeed Lukas, all living things on this planet have something in common.  Effectively as I suspect you may know, there are four basic elements - A, T, C, and G which make up our genes.  The diversity of our planet's living organisms comes from countless different combinations of these four basic elements.

The similarity between humans and other living things is especially strong when we look in detail at the genetic makeup of organisms such as apes or rabbits and of course mice, with which we share a common mammalian ancestor.  For these species, both the number of genes and the way in which they are combined are very similar to ours.

Just to look at this from a different perspective for a moment, on the surface, humans seem very different from each other.  Indeed, we are all unique {I'm sure you would agree with this} but we are also very much the same!  Genetically speaking, the most that any two people differ from each other is only 0.02%.  So, almost 99.9% similar yet so very different!  And here lies the fundamental problem of using something that measures behaviour via this similarity tester and taking the output data and extrapolating in some way based on a % similarity, and concluding from this whether we suspect 'foul play' {or not}, without any knowledge whatsoever on what has caused this behaviour.

I remain convinced that this is a ridiculous approach to be taking and believe innocent parties will be impacted by the results of this method.
Parent - By Adam Hair (**) [us] Date 2012-03-23 17:04
High similarity of move selection does not prove anything. But it does warrant further investigation. The possibility that copying (literal or non-literal) took place is higher if engines tend to choose the same moves more often. Sure, engines with similar knowledge tend to pick similar moves. But the data seems to show that engines with possibly similar ideas but different code do not choose the same move 60% to 70% of the time (given this set of positions).
Parent - - By turbojuice1122 (Gold) [us] Date 2012-03-20 20:47
How would you test something like Hydra, Deep Blue, or the Rybka Cluster?
Parent - - By Rebel (****) Date 2012-03-21 01:01
Add UCI or Winboard to it :grin:
Parent - By Uly (Gold) [mx] Date 2012-03-22 19:39
I think the point turbo was trying to show is that:

1. Clones can still protect against such measures by being locked remotely.

2. Excluding such entities from tournaments is not satisfactory as chances are high they're the strongest ones around.
- - By rocket (***) [se] Date 2012-03-20 17:59

>"When a percentage of 60% or more is found"<


This is very harsh. I would say such figures could be measured without any cloning at all. It depends on the nature of the positions: whether they are complex, drawish, etc.
Parent - - By Rebel (****) Date 2012-03-20 18:36
I wrote a utility to investigate the forced nature of the 8238 positions; see the output below. It seems that on 24 of the 8238 positions all 99 engines chose the same move, 682 positions had 2 different moves, 906 positions had 3 different moves... all the way up to 1 position with 20 different moves :eek:

1.   24 (0.29%)
2.  682 (8.28%)
3.  906 (11.00%)
4. 1024 (12.43%)
5.  945 (11.47%)
6.  993 (12.05%)
7.  864 (10.49%)
8.  741 (8.99%)
9.  605 (7.34%)
10.  434 (5.27%)
11.  373 (4.53%)
12.  252 (3.06%)
13.  165 (2.00%)
14.  114 (1.38%)
15.   47 (0.57%)
16.   39 (0.47%)
17.   18 (0.22%)
18.    7 (0.08%)
19.    4 (0.04%)
20.    1 (0.01%)
21.    0 (0.00%)
22.    0 (0.00%)

I am not a statistician, but I feel that in a system where a few percent may matter, the positions with 1 and 2 solutions (24 and 682) need to be replaced or deleted.
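
A count like the one above could be produced along these lines. This is only a sketch: the data layout (one list of chosen moves per position, one entry per engine) and all names are assumptions, not the actual utility.

```python
from collections import Counter


def distinct_moves(choices):
    """Number of different moves the engines chose in one position."""
    return len(set(choices))


def distribution(positions):
    """Map 'number of distinct moves' -> (position count, percent of all positions)."""
    tally = Counter(distinct_moves(choices) for choices in positions)
    total = len(positions)
    return {k: (tally[k], 100.0 * tally[k] / total) for k in sorted(tally)}


def drop_forced(positions, min_distinct=3):
    """Keep only positions where the engines disagree enough to be informative."""
    return [c for c in positions if distinct_moves(c) >= min_distinct]


# Hypothetical data: each inner list holds the move picked by each engine.
positions = [
    ["e2e4", "e2e4", "e2e4"],
    ["d2d4", "d2d4", "c2c4"],
    ["g1f3", "b1c3", "e2e3"],
    ["a2a3", "h2h3", "e2e4"],
]
table = distribution(positions)
kept = drop_forced(positions)
print(table, len(kept))
```

With the figures posted above, dropping the positions with one or two distinct moves would remove 24 + 682 = 706 of the 8238 positions, roughly 8.6% of the set.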
Parent - - By Banned for Life (Gold) Date 2012-03-20 19:34
Excellent! But now you need a statistic for repeatability as well. The most useful positions will of course be those where engines give different moves in a repeatable manner. On the other hand, if repeatability on a position is poor, different move outputs may signify nothing.
Parent - - By Rebel (****) Date 2012-03-21 01:00
Not sure what you mean with "repeatability", perhaps you can rephrase your question ?

I do have something like this, the statistic per position, to easily find the "forced" ones.

Results for SIMILARITY_99.DATA

1. (11)
2. (9)
3. (5)
4. (7)
5. (5)
6. (5)
7. (2)  --> out
8. (6)
9. (8)
10. (14)
11. (5)
12. (5)
13. (11)
Parent - - By Banned for Life (Gold) Date 2012-03-21 01:26
Finally, some engines (notably Rybka 3 and later, Robbolito/IvanHoe, and Houdini) show surprisingly low self-similarity (~75%) when comparing multiple runs of the test. The typical engine shows 95+% self-similarity.

This line is copied from Adam Hair's web site. What Adam calls 'low self-similarity' I call 'poor repeatability', i.e. what are the statistics for the same engine with the same position and the same time? The test would work best if all engines came up with the same answer, but as pointed out, this is not the case.

This factor has a different effect on engines that are far apart than it does on engines that are close together. Engines that are far apart, will move closer together, while those closer together will move farther apart.
Parent - - By Lukas Cimiotti (Bronze) [de] Date 2012-03-21 07:10
Isn't this just a question of MP / SP engines? (I didn't look.) I expect SP engines to always give a deterministic result, whilst MP engines should give different results.
Parent - - By Banned for Life (Gold) Date 2012-03-21 07:26
Of course most SP engines give deterministic results, but apparently not komodo which has a deterministic setting in its UCI controls.

In any event, I didn't see the test was restricted to SP engines (if it was, Adam's comment would only have held true for komodo), and some MP engines are apparently more non-deterministic than others (led by Rybka and other engines inspired by Rybka).
Parent - - By Lukas Cimiotti (Bronze) [de] Date 2012-03-21 07:49 Edited 2012-03-21 13:20

>but apparently not komodo which has a deterministic setting in its UCI controls


That's interesting. I never looked at Komodo. So it must use some random numbers - maybe the idea was taken from Rybka's randomizer?
Parent - - By Labyrinth (****) [us] Date 2012-03-21 12:45
Only Rondo can use a Rondomizer :razz:
Parent - By Lukas Cimiotti (Bronze) [de] Date 2012-03-21 13:20
typo corrected
Parent - By Banned for Life (Gold) Date 2012-03-21 14:19
Random numbers are one possibility. It is also possible (but unlikely) that Komodo uses sleeping threads and the order in which the scheduler puts them on the ready-to-run list affects search results.
Parent - - By Adam Hair (**) [us] Date 2012-03-21 13:23
There is always information that you realize you forgot to include :red: :smile:

All engines were tested with one core. SIM03 makes use of configuration files, where you can specify the values of UCI options.

At one point, I thought that the low repeatability for some of the engines was related to the positions found in SIM03. I changed the positions and the effect seemed to go away. But I found a mistake. I forgot to change the total number of the positions that I used as a replacement (my set was >10,000, whereas the original set contains 8238 positions). The number is used in calculating the percent of matches. When I corrected my mistake, the relatively low percentages returned.

For most engines, the repeatability was high (>95%). For some, like Tornado, the repeatability was 80% to 90%. For Rybka 3 & 4, Komodo, all Robbolitos, and Houdini (along with Critter, IIRC), the repeatability was relatively low (70% to 75%).
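
The denominator mistake described in this post is easy to illustrate. The sketch below uses made-up figures (the real replacement set is only known to have been larger than 10,000 positions, and the match count is invented); only the 8238 comes from the original set.

```python
def percent_matches(matches, total_positions):
    """Share of positions with identical moves, as a percentage."""
    return 100.0 * matches / total_positions


stale_total = 8238         # size of the original SIM03 set, left in by mistake
actual_positions = 10_000  # hypothetical size of the replacement set
matches = 7_500            # hypothetical number of identical moves between two runs

inflated = percent_matches(matches, stale_total)
correct = percent_matches(matches, actual_positions)
print(round(inflated, 2), round(correct, 2))
```

Dividing by a total that is too small inflates every percentage, which would make low repeatability appear to go away until the total is corrected.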
Parent - - By Banned for Life (Gold) Date 2012-03-21 14:15
I'm not sure what to make of this. SP Rybka has always been deterministic and repeatable if you ignore factors that have nothing to do with the engine (i.e. running tests with a constant number of nodes should always give equivalent results). I guess that the low repeatability you saw was due to different number of nodes being performed on different tests, due to extraneous factors related to other processes running on the machine. If you think there is another explanation, I'd love to hear it...
Parent - - By Nick (****) [gb] Date 2012-03-21 14:39
Looks like Don's similarity tester works on time rather than nodes -- even depth might be better for SP repeatability.

Sim03 tester readme:


similar.exe is program designed to test the similarity of any two UCI
chess programs.   

It is a program designed to be run from a command shell.  It is not a
graphical utility.

When the program is run a data file is created in the working
directory.  Because of this, the program should always be run from the
same directory.

Invoke the program with no arguments to get usage - and also a catalog
of all programs measured.  

To add data from a new program called "myChessProgram", run it like
this:

    similar.exe -t myChessProgram

This assumes "myChessProgram" is in the PATH and can be executed and is
a UCI chess program.

The program can be run at different times which can be specified in
milliseconds.  The default is 100 which is 1/10 of a second per
position.  However, we could run it at 1/2 second like this:

    similar.exe -t myChessProgram 500

To get a report, you must specify a program to be reported on by id.  When
you run the program without command line arguments you will get a list of
all programs with their "id" on the left column:

  1) Komodo64 1.2 JA (time: 100 ms)
  2) RobboLito version 0.084 (time: 100 ms)
  3) Houdini 1.5 w32 (time: 100 ms)

So to report on RobboLito, do this:

   similar.exe -r 2
Parent - By Banned for Life (Gold) Date 2012-03-21 15:15
Thanks Nick. I was too lazy to look. :surprised:

I never tested R4 or R4.1 for repeatability on number of kn per second on a very low overhead machine, but I've always had the impression that the variability is higher than other engines. I've always attributed this to Vas' heavy dependence on Microsoft solutions for things like memory management...
Parent - - By Adam Hair (**) [us] Date 2012-03-21 16:41
The sim utility uses time to determine when to ask the engine for the best move. But, it can be set up to use "go nodes" or "go depth" (it will not send the next position until it receives the engine's best move). The problem, of course, is that depth and nodes mean different things to different engines. So, thinking time (adjusted to try to cancel out the effects of search speed on move selection) is seemingly more reliable for comparing different engines.

I have no idea as to why there is relatively low repeatability for Rybka 3. Extraneous factors were kept to a minimum, and I believe any such factors would affect other engines unrelated to Rybka 3. Yet, other engines give highly repeatable results, including Rybka 2.3.2a and earlier. The short amount of thinking time given to the top engines surely has some effect on repeatability (20ms to 40ms per position, depending on engine strength), but IIRC, Stockfish does not exhibit low repeatability. Moreover, the low repeatability still existed when I used 2 seconds per position.

One thing that I have not done is create a logfile for Rybka 3 and see if something funny is going on during the test. I have done this for other engines, but not with Rybka 3. The best moves recorded for it at the beginning and end of the test do not look strange, but perhaps some positions cause problems (though I got similar results from a different set of positions).

I do not know how well this will display here, but here is some data that I generated from my initial testing with SIM03. All of the data can be found here, http://talkchess.com/forum/viewtopic.php?t=38772&postdays=0&postorder=asc&highlight=pairwise+analysis&topic_view=flat&start=40 .

sim version 3
------ Crafty 23.4(1) (time: 149 ms scale: 1.0) ------
96.38 Crafty 23.4(8) (time: 149 ms scale: 1.0)
96.35 Crafty 23.4(10) (time: 149 ms scale: 1.0)
96.32 Crafty 23.4(7) (time: 149 ms scale: 1.0)
96.27 Crafty 23.4(5) (time: 149 ms scale: 1.0)
96.22 Crafty 23.4(4) (time: 149 ms scale: 1.0)
96.18 Crafty 23.4(6) (time: 149 ms scale: 1.0)
96.16 Crafty 23.4(2) (time: 149 ms scale: 1.0)
96.15 Crafty 23.4(9) (time: 149 ms scale: 1.0)
96.12 Crafty 23.4(3) (time: 149 ms scale: 1.0)

sim version 3
------ Houdini 1.00(1) (time: 19 ms scale: 1.0) ------
75.88 Houdini 1.00(4) (time: 19 ms scale: 1.0)
75.21 Houdini 1.00(8) (time: 19 ms scale: 1.0)
75.19 Houdini 1.00(2) (time: 19 ms scale: 1.0)
75.15 Houdini 1.00(9) (time: 19 ms scale: 1.0)
75.07 Houdini 1.00(6) (time: 19 ms scale: 1.0)
74.98 Houdini 1.00(7) (time: 19 ms scale: 1.0)
74.65 Houdini 1.00(3) (time: 19 ms scale: 1.0)
74.63 Houdini 1.00(10) (time: 19 ms scale: 1.0)
74.57 Houdini 1.00(5) (time: 19 ms scale: 1.0)

sim version 3
------ Houdini 1.5(1) (time: 10 ms scale: 1.0) ------
70.04 Houdini 1.5(3) (time: 10 ms scale: 1.0)
70.02 Houdini 1.5(6) (time: 10 ms scale: 1.0)
69.90 Houdini 1.5(10) (time: 10 ms scale: 1.0)
69.87 Houdini 1.5(4) (time: 10 ms scale: 1.0)
69.74 Houdini 1.5(7) (time: 10 ms scale: 1.0)
69.74 Houdini 1.5(5) (time: 10 ms scale: 1.0)
69.63 Houdini 1.5(8) (time: 10 ms scale: 1.0)
69.59 Houdini 1.5(9) (time: 10 ms scale: 1.0)
69.45 Houdini 1.5(2) (time: 10 ms scale: 1.0)

sim version 3
------ Loop 2007(1) (time: 166 ms scale: 1.0) ------
99.38 Loop 2007(7) (time: 166 ms scale: 1.0)
99.37 Loop 2007(6) (time: 166 ms scale: 1.0)
99.36 Loop 2007(4) (time: 166 ms scale: 1.0)
99.33 Loop 2007(5) (time: 166 ms scale: 1.0)
99.31 Loop 2007(8) (time: 166 ms scale: 1.0)
99.30 Loop 2007(10) (time: 166 ms scale: 1.0)
99.21 Loop 2007(2) (time: 166 ms scale: 1.0)
99.09 Loop 2007(3) (time: 166 ms scale: 1.0)
99.08 Loop 2007(9) (time: 166 ms scale: 1.0)

sim version 3
------ Naum 4.2(10) (time: 40 ms scale: 1.0) ------
99.83 Naum 4.2(6) (time: 40 ms scale: 1.0)
99.82 Naum 4.2(9) (time: 40 ms scale: 1.0)
99.82 Naum 4.2(4) (time: 40 ms scale: 1.0)
99.81 Naum 4.2(3) (time: 40 ms scale: 1.0)
99.77 Naum 4.2(5) (time: 40 ms scale: 1.0)
99.73 Naum 4.2(2) (time: 40 ms scale: 1.0)
99.65 Naum 4.2(7) (time: 40 ms scale: 1.0)
99.47 Naum 4.2(8) (time: 40 ms scale: 1.0)
99.47 Naum 4.2(1) (time: 40 ms scale: 1.0)

sim version 3
------ Robbolito 0.085g3(1) (time: 19 ms scale: 1.0) ------
76.90 Robbolito 0.085g3(10) (time: 19 ms scale: 1.0)
76.44 Robbolito 0.085g3(4) (time: 19 ms scale: 1.0)
76.06 Robbolito 0.085g3(5) (time: 19 ms scale: 1.0)
76.05 Robbolito 0.085g3(7) (time: 19 ms scale: 1.0)
75.92 Robbolito 0.085g3(3) (time: 19 ms scale: 1.0)
75.86 Robbolito 0.085g3(9) (time: 19 ms scale: 1.0)
75.86 Robbolito 0.085g3(6) (time: 19 ms scale: 1.0)
75.63 Robbolito 0.085g3(8) (time: 19 ms scale: 1.0)
74.93 Robbolito 0.085g3(2) (time: 19 ms scale: 1.0)

sim version 3
------ Rybka 3 (1) (time: 21 ms scale: 1.0) ------
76.84 Rybka 3 (10) (time: 21 ms scale: 1.0)
76.20 Rybka 3 (2) (time: 21 ms scale: 1.0)
76.00 Rybka 3 (8) (time: 21 ms scale: 1.0)
75.78 Rybka 3 (7) (time: 21 ms scale: 1.0)
75.76 Rybka 3 (9) (time: 21 ms scale: 1.0)
75.76 Rybka 3 (3) (time: 21 ms scale: 1.0)
75.70 Rybka 3 (5) (time: 21 ms scale: 1.0)
75.70 Rybka 3 (4) (time: 21 ms scale: 1.0)

sim version 3
------ Stockfish 2.1(1) (time: 18 ms scale: 1.0) ------
96.92 Stockfish 2.1(7) (time: 18 ms scale: 1.0)
96.48 Stockfish 2.1(9) (time: 18 ms scale: 1.0)
96.48 Stockfish 2.1(5) (time: 18 ms scale: 1.0)
96.47 Stockfish 2.1(6) (time: 18 ms scale: 1.0)
96.44 Stockfish 2.1(4) (time: 18 ms scale: 1.0)
96.43 Stockfish 2.1(8) (time: 18 ms scale: 1.0)
96.43 Stockfish 2.1(2) (time: 18 ms scale: 1.0)
96.42 Stockfish 2.1(3) (time: 18 ms scale: 1.0)
96.42 Stockfish 2.1(10) (time: 18 ms scale: 1.0)
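
To summarise blocks like these, one could average the scores of the repeated runs against run 1. The sketch below assumes the report layout shown above (a leading percentage on each data line); the function name is hypothetical and it is not part of the sim tool.

```python
import re

# A data line starts with a percentage such as "96.38 "; header lines do not.
SCORE = re.compile(r"^\s*(\d+\.\d+)\s")


def mean_self_similarity(report_lines):
    """Average the leading percentages of one report block."""
    scores = []
    for line in report_lines:
        match = SCORE.match(line)
        if match:
            scores.append(float(match.group(1)))
    if not scores:
        raise ValueError("no score lines found")
    return sum(scores) / len(scores)


# First three data lines of the Crafty block above, plus its two header lines.
block = [
    "sim version 3",
    "------ Crafty 23.4(1) (time: 149 ms scale: 1.0) ------",
    "96.38 Crafty 23.4(8) (time: 149 ms scale: 1.0)",
    "96.35 Crafty 23.4(10) (time: 149 ms scale: 1.0)",
    "96.32 Crafty 23.4(7) (time: 149 ms scale: 1.0)",
]
print(round(mean_self_similarity(block), 2))
```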
Parent - By Banned for Life (Gold) Date 2012-03-21 21:08
It is well known that on many occasions R3 and R4 will take vastly (sorry for the pun!) different times to get to the same point in its processing, but many have assumed that this was somehow associated with MP, either because of search luck or some sort of bug. Using SP very rarely (only when deterministic behavior was required), it never occurred to me that this same behavior might pop up in SP mode.

But rest assured that SP is completely deterministic (maybe with an exception for when sampled search is being used). If this is taken as a given, it seems the testing procedure is suboptimal when it substitutes a high percentage of self-divergence rather than the underlying totally deterministic behavior. This is true for many other engines as well.

Possibly one way to get around this might be to log the number of nodes computed during the first timed test, and then compute that same number of nodes in subsequent tests. This assumes of course that this UCI command is well implemented...
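
The node-replay idea could be sketched as below. `go movetime`, `go nodes` and `go depth` are standard UCI commands, and engines report `nodes` in their `info` lines. The helper names and the sample log lines are hypothetical, and whether a given engine honours `go nodes` exactly remains the open question.

```python
def go_command(mode, value):
    """Build a UCI 'go' command limited by time (ms), node count, or depth."""
    if mode not in ("movetime", "nodes", "depth"):
        raise ValueError("unsupported search limit: " + mode)
    return "go {} {}".format(mode, int(value))


def last_node_count(engine_output):
    """Return the node count from the last 'info' line that reports one."""
    nodes = None
    for line in engine_output:
        tokens = line.split()
        if tokens[:1] == ["info"] and "nodes" in tokens:
            index = tokens.index("nodes")
            if index + 1 < len(tokens) and tokens[index + 1].isdigit():
                nodes = int(tokens[index + 1])
    return nodes


# Hypothetical engine output captured during the first, timed run.
first_run_log = [
    "info depth 7 score cp 18 nodes 9120 time 61 pv e2e4 e7e5",
    "info depth 8 score cp 21 nodes 15320 time 98 pv e2e4 e7e5",
    "bestmove e2e4",
]

timed = go_command("movetime", 100)
replay = go_command("nodes", last_node_count(first_run_log))
print(timed)
print(replay)
```

The first run uses the time limit and records how many nodes the engine searched; later runs send the node-limited command for the same position, which takes machine load out of the comparison.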
Parent - - By Rebel (****) Date 2012-03-21 08:38
The issue of "self-similarity" indeed is a matter that needs attention and research although I don't know how for the moment and I can only speculate:

1. Some engines add small randomness to the evaluation to escape from "learning opponents". I remember a quote from Amir (Junior): it was his way to deal with repeated book opening lines.

2. Related to (1), engines may have some sort of learning themselves. Either Adam did not turn it off, or SIM03 offers no such possibility, or engines use learning no matter what the setting is.

It's all pure speculation.

I will ask Adam for the needed data files to look into the matter.