Rybka Chess Community Forum
Rybka Support & Discussion / Rybka Discussion / Does 2.3.2a beat 2.3 in engine/engine games?
- - By hebroots (**) Date 2007-07-24 17:51
I am running my first set of tests using the Fritz interface.  I am running games in the 10'/40+10'/40+10'/40... format.

The same exact parameters and hash size are used on each engine, and the games alternate between white and black for each engine.

So far on 5 games, I have seen 4 draws and 1 WIN by the 2.3 engine....

I thought 2.3.2a was decisively better, am I missing something?  Is there some better "benchmark" I should be trying?
Parent - - By Svilponis (***) Date 2007-07-24 18:08
5 games? That is definitely too few for any statistics or conclusions. Run 1000 or 10000 games and then tell us the result.
Rybka v. X vs Rybka v. Y games are not very good indicators. You should run multiple tournaments of the Rybkas against many other strong engines.
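
As a rough illustration of why five games say so little, here is a minimal sketch that puts an approximate 95% interval around the average score per game. It assumes a plain normal approximation (which is crude for very small samples), and the function name `score_interval` is made up for this example; it is not something the Fritz interface provides.

```python
import math

def score_interval(wins, draws, losses, z=1.96):
    """Return (low, high) for the mean score per game (win=1, draw=0.5, loss=0)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Variance of a single game's result around the observed mean.
    var = (wins * (1 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0 - mean) ** 2) / n
    # Normal-approximation half width; an assumption, not an exact method.
    half_width = z * math.sqrt(var / n)
    return mean - half_width, mean + half_width

# The 5-game result from the first post, seen from 2.3's side: 1 win, 4 draws.
print(score_interval(1, 4, 0))
# The same proportions over 1000 games.
print(score_interval(200, 800, 0))
```

With 1 win and 4 draws the interval still contains 50%, so the result is compatible with the two versions being equal or with either one being stronger. The same proportions over 1000 games give an interval that no longer contains 50%.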
Parent - - By Felix Kling (Gold) Date 2007-07-24 18:47
I think Rybka vs. Rybka is ok; if you play more games you will see that 2.3.2 is stronger.
Parent - By Svilponis (***) Date 2007-07-25 10:31
I think that engine games against only one opponent are not very good indicators of strength, because one engine may have been programmed (deliberately or not) to exploit some special weaknesses of the opponent. A tournament between those two might then not reflect actual chess playing strength. For instance, if Vasik developed a special anti-Rybka engine, it might easily beat Rybka, but not so easily (or maybe not at all) other engines.
Parent - - By hebroots (**) Date 2007-07-24 20:34
My Fritz 7 engine was beating several other engines I downloaded from the Fritz website. Then the Rybka 2.3 engine seemed to beat Fritz easily.

I do not see the point of comparing Rybka 2.3 to others because it is claimed to be the strongest...but I am sure that depends on all the settings involved on any engine.

I will just try 2.3.2a out there in "the field" and see what happens.  I guess I was thinking that 2.3.2a would beat anything at all times, aside from any operator errors.....

just what does "the strongest" mean?..if you don't win in the game you are playing, all the statistics for 10000 other games won't matter
Parent - - By turbojuice1122 (Gold) Date 2007-07-25 00:13
You can't use that logic because by that logic, you would have to say that Rybka 2.3.2a is not stronger than Fritz 5.32 in spite of being rated probably about 450 elo points higher on the same hardware.

When you ask, "which is the strongest engine", the only objective statement is which engine, if you were to play a huge amount of games, would have the highest rating.  Rybka 2.3.2a will occasionally lose to Fritz 5.32 because there are some small things in each engine that cause Fritz 5.32 to be stronger after the opening--but this might happen only once out of 50 or once out of 100 games--but it will always happen.  When I was at my peak as a human player some years ago (probably somewhat over 2200 FIDE, though now I'd be very lucky to be over 2000), you would expect that, out of 1000 games, I would get possibly a few wins (and more draws) against Kasparov--that's just the nature of things--but he would still beat me in the nine hundred and something other games.  I certainly hope you would conclude that Kasparov is the stronger player, even though he would not beat me in every game.
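
To put a number on "will occasionally lose", here is a small sketch of the usual logistic form of the Elo expected-score formula. The 450-point gap is the estimate from this post; the function name `expected_score` is just for illustration, and rating lists may use slightly different formulas.

```python
def expected_score(rating_diff):
    """Expected score per game (win=1, draw=0.5) for the side that is
    rating_diff Elo points stronger, using the common logistic formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

for diff in (0, 100, 450):
    print(diff, round(expected_score(diff), 3))
```

At a 450-point gap the stronger side is expected to score about 93%, which leaves roughly 7% of the points for the weaker side. Those points can come from draws as well as wins, so a single lost game does not contradict the rating.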
Parent - - By billyraybar (***) Date 2007-07-25 01:49
It would be interesting to test engine strength with no opening book.  It seems to me that would reveal the strongest engine (or version) in the purest sense.
Parent - - By turbojuice1122 (Gold) Date 2007-07-25 02:56
This is a long process, but one that I began undertaking earlier this summer.  The results will be interesting, but they do not reveal the strongest engine "in the purest sense"--they simply reveal which engine doesn't play the opening so weakly as to counteract the great middlegame play.  Most programmers have intentionally not written many good or complex algorithms for opening play because they know that the opening book will control this aspect of the game and thus the elo gains are to be found with improving middlegame and endgame play.  Anyway, thus far, it is looking like Rybka and Hiarcs are leading, but this is still in the early stages--there are currently 24 programs competing with blitz time controls, and the top 12 will then go on to rapid time controls, with the top 6 or 8 competing at tournament time controls (which in my case will probably be 150'+120").  The tournaments are double-round robin.
Parent - - By hebroots (**) Date 2007-07-25 03:57
Turbo, what you say makes perfect sense about having the highest Elo rating, and that any person or computer can lose a game here and there.  Also good comments on the opening book plays.  In my somewhat limited experience with chess engines, in playing games of 10 min or less, I have tried to play both with and without the opening books.

It quite often happens that I emerge from the opening period of moves with a better point score NOT using the book vs using the book.  However, for the short games that costs a big price in opening time and in depth on subsequent moves.  Using the book gets me out of the opening phase much quicker but with a somewhat neutral score, which hopefully will improve later on.  However, nothing more boring than the next 10 moves at +/- .02 or so. 

Further complicating things is that many of those opening book moves, after being studied for many years by hand, in my opinion would have a longer range perspective over many moves (like 20 or more), than a chess engine in a shorter game just calculating the best possible score over 6 or 7 moves. So in a game using an opening book, one might hope that the benefits of the book may pay off later in the game.  But that is just speculation on my part.  I think I do better overall using the opening book, as I have been nailed enough times in the middle game with some crummy "search depth 8 or 9" moves.  That happens when the computer seems pressed for time.

One technical question, if you don't mind: how do engines calculate the point score that one is ahead or behind?  That seems to differ so dramatically from engine to engine.  When 2 engines are calculating a point score for a given move, why is there such a big difference?  Sometimes the white engine thinks that it is ahead, but the black engine thinks that it is ahead (both using absolute score)!  Usually both engines agree on which side is ahead, but not always.
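
As a very rough illustration of what a "point score" is, here is a toy evaluation that only counts material in pawn units. This is an assumption-laden sketch and not how Rybka, Fritz or any particular engine works: real engines add many positional terms with their own weights and combine the evaluation with a deep search, which is one reason two engines can report quite different numbers for the same position. The piece values and function names below are illustrative only.

```python
# Conventional textbook piece values in pawns (an assumption, not engine data).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_score(placement):
    """Material balance from White's point of view for the piece-placement
    field of a FEN string (uppercase = White, lowercase = Black)."""
    score = 0
    for ch in placement:
        if ch.lower() in PIECE_VALUES:  # digits and '/' are skipped
            value = PIECE_VALUES[ch.lower()]
            score += value if ch.isupper() else -value
    return score

def from_side_to_move(white_score, white_to_move):
    """Convert a White-relative ("absolute") score to the mover's viewpoint."""
    return white_score if white_to_move else -white_score

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
no_black_queen = "rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
print(material_score(start), material_score(no_black_queen))
print(from_side_to_move(material_score(no_black_queen), white_to_move=False))
```

The second helper shows the sign convention: a score given from White's side changes sign when shown from the side to move, so it is worth checking which convention the interface uses before comparing two engines' numbers.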
Parent - By j_futur (**) Date 2007-07-25 06:39
Hi all,

I was curious about this too. I based my test on the 2.2 mp, 2.3.2 mp and 2.3.2a mp versions. I saw that the "mp" version seems better than the others, and 2.2mp was my best before!

These were the results (based on a lot of matches and tournaments, 350 games in all, at 30'+0 and 5'+2):
PIV 3.2 GHz / 2 GB RAM / 256 MB hash - Fritz 10 opening book and GUI - no tablebases

Then .....

v2.3.2mp is always the strongest (in matches and tournaments)! It always takes 1st place.
v2.3.2a had some difficulty winning against the two other engines.

My conclusion:

If you don't have a Core 2 Duo or Core 2 Quad PC, v2.3.2mp seems better for you.
v2.3.2a is the strongest of all, but you must use it on a Core 2 Duo or Core 2 Quad (or AMD X2) PC
if you want to get its full performance.

But I really can confirm that 2.3.2a mp is my best now!!

I think that will help you.

PS. When you run an engine match, use the same (good) opening for both games of a pair, with colors reversed! With this, I think 100 different games will be sufficient.

In my opinion, when you take part in a tournament, you won't have 1000 games there to judge who is the best. So if you are really the best, you will be able to solve a lot of positions, not only one or two.

I accept that Rybka, even the latest version, like other engines, is not able to solve some positions, but if you choose a good opening book like Fritz10 or RybkaII for your test, a hundred games will show you 90% of success!

Good day./FUTUR

Parent - By billyraybar (***) Date 2007-07-25 08:41
Turbojuice, I understand that authors haven't put much effort into programming for opening play.  However I think that should change.  Regardless, I am interested in the results of your test. 
Parent - - By Henrik Dinesen (***) Date 2007-07-25 06:53
The main problem with the no-book approach is doubles. When the engines have no way to mark moves played in prior games, they'll surely repeat themselves under the exact same conditions, with the exception that mp-engines may create slight variation, and engines with learn-files and the like active might vary their play after a number of games.

This leaves 2 possibilities, when disregarding learn-features existing in some engines, assuming identical conditions:

1) Each engine plays each other engine twice, one game with each color; even with a huge number of engines, the result will still be a small sample of games.
2) Let each engine play with its own book containing all possible moves for, say, moves 1 and 2 (black and white); of course no human preferences are included and no statistics are present, and the engines mark the moves as they play.

Approach 2) is not "no-book", but the closest you can get when you want a larger sample of games while trying to avoid too many doubles.
Then take it a step further: make new books, put in all possible moves for black and white up to move 20... let a bunch of engines play a few hundred thousand games at tournament TC, and presto, each engine has now formed its own repertoire, ready to challenge the known GM-theory... ;) *cough*
Parent - By billyraybar (***) Date 2007-07-25 08:45
Yeah, I suppose that is a problem.  I wonder what the probability of a duplicate game is, let's say up to move 35, in an engine vs engine match.
Parent - - By Linus (***) Date 2007-07-25 10:04

>It would be interesting to test engine strength with no opening book. 

That is possible with testset books. E.g. Nunn Testsets 2 is a collection of 25 different openings, which are considered to be almost equal for black and white (although I do not agree with that 100%). To eliminate white/black disadvantage you just let the engines play each position two times with alternating colors. Then you get 50 games without book influence.
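
The scheme above can be sketched in a few lines. The opening labels and the helper name `reversed_color_schedule` are placeholders for this example; the actual Nunn positions would have to be loaded from the testset itself.

```python
def reversed_color_schedule(openings, engine_a, engine_b):
    """Build (opening, white, black) entries so that every opening is played
    twice, once with each engine as White."""
    games = []
    for opening in openings:
        games.append((opening, engine_a, engine_b))
        games.append((opening, engine_b, engine_a))
    return games

# 25 placeholder labels standing in for the testset positions.
openings = [f"position {i}" for i in range(1, 26)]
schedule = reversed_color_schedule(openings, "Rybka 2.3", "Rybka 2.3.2a")
print(len(schedule))
```

Each engine ends up with White in exactly half of the games, so any built-in advantage for one color in a given position is shared equally.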
Parent - By Henrik Dinesen (***) Date 2007-07-25 11:09
I don't see how that differs much from books: you still feed the engine with human selections, and with even fewer choices than a book provides, to put it mildly.
Testsets are good when you want a quick peek into an engine's way of handling some preselected positions, and you may get a good idea about the way it handles certain positions. But these positions are still chosen by somebody.
Parent - By Michael Waesch Date 2007-07-25 16:35
Use no opening book with the same time controls and you end up with the engines playing more or less one and the same game over and over again.

Parent - - By George Tsavdaris (****) Date 2007-07-25 11:04

>When you ask, "which is the strongest engine", the only objective statement is which engine, if you were to play a huge amount of games, would >have the highest rating.

Why is this the only objective one?
This is subjective too.
Perhaps he defines the strongest engine between A and B as the one that wins ALL games they play each other....
That is a perfectly normal and correct definition.

Your definition is much more useful, but we can't say it is more correct or objective....
Parent - - By turbojuice1122 (Gold) Date 2007-07-25 12:42
I think that a natural requirement in a definition for "strongest" is that it at least allow for some player to be the strongest.  God, who omnisciently has access to a 32-man tablebase, should be considered as being stronger than Rybka or Zap on one of Suj's machines--say, Zap running on one of the 64-bit machines.  However, I am quite certain that, out of 1000 games, God would have to give up at least a few draws (but won't lose any!) to Zap in such a situation--there will be some games in which Zap makes few enough mistakes not to lose.  However, we know that in predictions of what will happen, nothing in the Universe is absolute, and so in defining "strongest", we must take this into account and define it in a way that the word can correctly be used in conversation.  My definition isn't subjective--that's why I said, "a huge amount of games" (which, when used in mathematical or physical arguments, means some number that might approach infinity)--this makes any conclusions ones that can, at least in principle, be verified independently and will be verified with the same results.  Of course, I'm naturally assuming that we have allowed for an element of randomness instead of requiring one engine to play lines that always lose and another engine to play lines against it that always allow it to win.
Parent - By hebroots (**) Date 2007-07-25 13:39
well, you folks made a great discussion out of my "seed" of a question...anybody have any comments on how engines calculate point values for moves, and why each engine is so different?

Parent - - By SR (****) Date 2007-07-25 15:41
No one can prevent you from making your own private definitions, but please do not expect us to use them.  Your alternative "definition", that A is stronger than B only if A always wins against B, has the serious flaw that no one can be stronger than a randomly playing monkey.   Your claim that the definition is "perfectly normal and correct" is incorrect, since it follows from the definition that Kasparov is not stronger than a randomly playing novice. There is a possibility that a monkey typing randomly on a typewriter reproduces the complete works of Shakespeare, and there is a probability that a novice playing random moves beats Kasparov in a game. Theoretically your definition is absurd.

On the other hand, to claim that the only objective definition of who is the stronger of two chess playing units is who has the higher rating (if they played a huge number of games) is of course an overclaim. For a start, there are slightly different notions of rating, and it has been argued that Professor Elo's (which is based on the normal distribution) is not the most accurate. It might also be argued that a program that never makes any mistakes and always finds a forced win (plays perfectly) is stronger than a program that makes mistakes and does not always find the correct moves.
There is no doubt that chess playing units that play "perfectly" can vary hugely in how well they score against humans.  If the "perfect" playing unit plays ridiculous moves (except that the moves are not mistakes), I suppose even a beginner might be able to achieve a draw in most games, and the perfect chess playing unit would have a rating very similar to that of the opponents it happens to have played.
Parent - - By turbojuice1122 (Gold) Date 2007-07-25 21:33
On the strongest being the one with highest rating, that is why I keep including the phrase "after a huge number of games"--that point is essential, as it is only as the number of games grows very large that Professor Elo's result will converge with other good mathematically objective results.
Parent - - By George Tsavdaris (****) Date 2007-07-25 22:07

>that is why I keep including the phrase "after a huge number of games"--that point is essential,

That point is a no-point. "huge number of games" is indefinite.....
You should give a value to "huge"....
Parent - By turbojuice1122 (Gold) Date 2007-07-26 02:00
That's the point--it's indefinite--"huge" always means, when referring to numbers mathematically or physically, numbers that are so large that any substantial changes in such numbers will have absolutely no effect on the outcome.  In our case, a million would probably serve such a purpose, and a hundred thousand probably would also--you can probably figure out, if you wish, how large must be the number of games such that the criterion for huge would be satisfied and the Elo rankings would converge with any other sort of objective mathematical rating.
Parent - - By SR (****) Date 2007-07-26 11:24 Edited 2007-07-26 11:27
Please read the last part of my post, where I refute the idea that Prof. Elo's system will always produce results that are "objective".  In the thought experiment I provide, we notice that the hypothetical chess program will essentially have a rating identical to the average of its opponents. Thus the same program would have 2000 were it to play mainly 2000 players, but an identical copy of the program playing 3000-rated programs would have a rating of 3000.  Of course, when it comes to human chess I completely agree with you: Elo seems to be a very accurate measure that (I am tempted to say) measures chess strength so well that in practice we use Elo as the definition of chess strength!
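
The arithmetic behind this thought experiment can be sketched with the common logistic "performance rating" formula. This is an assumption made for illustration (official rating lists use their own tables), and `performance_rating` is a hypothetical helper name.

```python
import math

def performance_rating(avg_opponent_rating, score_fraction):
    """Rating at which the logistic Elo formula predicts score_fraction
    against opponents averaging avg_opponent_rating."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score_fraction must be strictly between 0 and 1")
    return avg_opponent_rating + 400.0 * math.log10(
        score_fraction / (1.0 - score_fraction)
    )

# A player who draws every game scores 50% whoever the opponents are.
print(performance_rating(2000, 0.5))
print(performance_rating(3000, 0.5))
# Scoring 75% instead lifts the performance above the opposition.
print(round(performance_rating(2000, 0.75)))
```

An entity that draws every game scores 50% against any field, so under this formula its performance equals the field's average rating. Whether any real player or program would actually draw every game at both levels is a separate question that the formula does not answer.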
Parent - - By turbojuice1122 (Gold) Date 2007-07-26 19:10
In your previous post, you mention the idea of playing "ridiculous moves except for moves that are not mistakes"--keep in mind that in reality, a mistake is something that changes the outcome of the game--changes a previously won position into a drawn position, a previously drawn position into a lost position, or a lost position into one that loses quicker.  Of course, there are exceptions, as long as the outcome of the game isn't changed.  This makes your point on that somewhat moot, I think.  Also, it's incorrect to say that a program mainly playing 2000-rated players will have an elo of around 2000 and also the same program playing 3000-rated players will have an elo of around 3000.  If the program plays 2000-rated players and has an elo of 2000, that means that it achieves a 50% score against that population.  There is absolutely no way that a program that loses half of its points against 2000-rated competition is going to get anything more than a single-digit percentage score against 3000-rated competition.  It's possible that programs against 2000-rated competition might have somewhat different ratings than the same programs against 3000-rated competition, but the difference certainly isn't going to be 1000 points.  For example, the program Tornado might have a rating of, say, 2200 against 2000-rated competition and perhaps 2400 against 3000-rated competition due to occasionally getting lucky and getting draws, and the program Rybka 1.0 Beta might have a rating of 2700 against 2000-rated competition (due to occasionally unluckily giving up draws or missing attacks) and 2850 against 3000-rated competition, but it would be a mistake to suggest that either of these programs would have ratings similar to their competition.  Perhaps you were really intending to suggest something else, but the argument I'm addressing is how it appears to me on the screen.
Parent - - By SR (****) Date 2007-07-26 23:02 Edited 2007-07-26 23:05
A 32-piece tablebase program that always finds the fastest win, if a win exists, will never lose a game. However, my point is that if the program - as a thought experiment - deliberately plays ridiculous moves (always keeping a drawn position drawn), a 2000 player will have no problem drawing the game with virtual certainty.  If the program plays white, it might open with 1.a4 (assuming this holds), then a reasonable player might answer 1.- e5, after which the program might play 2.Ra2 (which might theoretically still be a draw).
The point is that if a perfect program wants to "help" the opponent as much as possible, I think any moderately decent player will lose only with very small probability (when he, for example, happens to make a clear blunder). My point was - and I agree the thought experiment is rather extreme - that perfect play (in the game-theoretic sense) does NOT in general imply a high rating. In the extreme case the rating of a player might (within a certain range) reflect the rating of the opponents.

Though it is rare, I have seen examples of human players (drawing specialists) with a similar tendency. I knew a quite weak player who nevertheless would draw a very high percentage of his games against players rated 1500, but draw almost the same high percentage against players rated 1900.   So in the range (1500-1900) his rating probably was not really well defined, since it depended too much on the rating of the opponents he happened to be playing.
For strong chess programs I think one should be prepared to see a similar tendency (especially when some programs might become drawing specialists). These programs might have a rating similar to the opponent's, anywhere in the range 2900-3300, since the programs are very hard to beat, but on the other hand are not that good at drumming up complications they can exploit.
Parent - - By turbojuice1122 (Gold) Date 2007-07-26 23:47
Okay, I see what you mean now.  However, the conclusion still isn't quite correct--but in your situation, the spread of ratings between playing against 2000-rated competition and 3000-rated competition will be a bit larger than if the program wants to maximize its winning chances.  However, even with 2000-rated competition, if the program plays "ridiculous" moves, there will be a number of positions in which only one move by the opponent truly keeps the draw, and occasionally the opponent won't find this move, changing the position into a won position for the perfect program.  Naturally, we would have almost all draws with the 3000-rated competition.  I would guess that in a case such as the one you describe, the rating against 2000-rated competition would be in the range of 2400-2600, while the rating against 3000-rated competition would be in the range of 3000-3100.
Parent - - By SR (****) Date 2007-07-28 08:57 Edited 2007-07-28 09:16
Yes, there will be positions where only one move holds the draw, but these will in my view most often be recaptures. It looks like our views of chess somehow differ.  I think a 1500 player would easily draw against a hypothetical helpful 32-piece tablebase.  The key issue here, I think, is how comfortable an advantage one player can have while the game is still theoretically drawn. You seem to think that it is a rather thin line, so the human who has the nominally better position has to be very careful. My judgment is that the situation is different, and that one player, even in the opening or middlegame, can have an advantage that is not enough to win, but where the game is "play to one goal" since the opponent has absolutely no realistic chance of winning (if no attempt is made).

Have a look at positions in tablebase situations that are drawn. For example, a drawn position in Rook+Bishop against Rook or, even more striking, Rook+Knight against two Knights.

If you have Rook and Bishop (or Rook and Knight) against Rook (two Knights) in a drawn position, you have a very easy game with zero danger of losing. What it seems to me you are saying is that maybe there will be positions where only one move draws, but I hope you can see that this concern, at least, is absurd for the person with the advantage in the two endgames above. Don't tell me you are afraid of losing Rook+Bishop against Rook!

Another illustration:
Consider the Rook+Knight endgame against two Knights, and assume in this example that you have the two Knights. Assume that the two armies are far apart, with your 3 pieces on one side of the board while the computer/tb has its 3 pieces on the other side. Then (playing around a bit with the tablebase) you will notice that the position (according to the tb) is a "clear" draw. You as the defender can make any reasonable move and it will remain a draw. If the attacker with the Rook+Knight just moves around near the border, the game stays drawn almost independently of how you play (this represents cooperative behavior from the tb). If, on the other hand, the tb tries to make progress by centralizing the pieces etc., you will soon arrive at a position where it becomes completely impossible for any human player to defend. What you will find (try this) is that one reasonable move might lose in 205 moves, another in 186 moves, while a third move keeps the game drawn. Even if you manage to play the move that holds the draw, on your next move you often have a similarly hard choice.  What such an experiment shows is that tb + good strong moves in drawn positions in some cases produces a VERY, VERY strong player, while tb + deliberately weak moves in drawn positions leads to a player with a rating more or less identical to the opponent's. My original claim (which I think is highly plausible) is that this phenomenon also applies to the 32-piece tb.

P.S. Sorry for my long response, but since my original highly plausible view was challenged I decided to elaborate on it...
Parent - - By turbojuice1122 (Gold) Date 2007-07-28 12:53
Oh, I can definitely say, with 100% certainty, that your claim would not be true for a 1500-level player.  Analyze any game between two 1500-level players, and you see that it typically goes back and forth between which side has what seems to be a won position.  It also happens in games between 2000-level players, though there are some games between such players where it doesn't happen.  Having analyzed enough games among players of that level, I would say it happens often enough to give a "helpful 32-man tablebase" at least 400 elo points--keep in mind that there are many positions in such games where the tablebase only has one or two possible moves to hold the draw, and each is often of a very subtle nature and requires a careful reply--that's just the nature of chess.  A well-known and much-discussed fact is that games up until around 2200-level are usually decided by tactics (or have the capability of being decided by tactics)--some people say it's closer to 2000, while others say it's around 2400.  When games are decided by tactics, even a "helpful" 32-man tablebase will win the vast majority of the games.  Moves to which you refer as "silly" are often the ones that have the greatest possibility of leading to highly tactical positions.
Parent - - By SR (****) Date 2007-07-28 14:54 Edited 2007-07-28 14:59
It seems you do not "get it". Sorry I tried my best...
Parent - - By turbojuice1122 (Gold) Date 2007-07-28 15:09
I understand exactly your argument and what you're saying--I know for certain you are quite incorrect in thinking that a 1500-rated player will always find drawing moves when playing against a perfect playing entity who seeks to draw by making "ridiculous" moves, and I believe you're quite incorrect in thinking that a 2000-rated player would be able to do this, also.  It is possible that your argument would be a bit more valid if you wish to compare groups of 2500-rated players versus groups of 3000-rated players--in that case, I think you have a valid point (though not a useful one, as you already know).
Parent - - By SR (****) Date 2007-07-28 15:35 Edited 2007-07-28 15:46
Please explain why your argument is not valid in the 6-piece endgame I discussed above. Players of 1500 or 2000 are notoriously weak in endgames. Would your argument also apply to the endgames discussed above? If not, what do you think is the difference?   Maybe you are talking about US ratings, which are grossly inflated. In my terminology a 1500 player plays (blitz) at a level similar to a 1500 player in the Internet Chess Club. Weak, yes, absolutely, but the player's moves resemble proper chess and the 1500 player does not give away pieces unless under a little pressure.  [Incidentally, my rating is 3000 against Rybka (rated 3400) on my PC.  I used to have 2300 Elo and play board 1 on the Oxford graduate team, so I am not a complete idiot (though I often feel like one) when it comes to chess]
Parent - - By turbojuice1122 (Gold) Date 2007-07-28 15:59
Sure--actually, this is quite easy.  In the 6-piece endgames, the level of complexity (which is proportional to the natural logarithm of the number of possible positions in the analysis) is far, far smaller and so the probability of a bad player going awry on a particular move is far smaller--when complexity is lower, the percentage of moves in a drawn position that are losing moves is also lower, in general.  Of course, there are positions for which this isn't the case, such as the opening position (which doesn't have much in the way of complexity compared with a typical middlegame position).  As for analysis, upon which my argument is based, I haven't actually analyzed the play of 1500-level players in 6-piece endgames--such analysis is a waste of time for me, and such endings are rare for 1500-level players anyway, as their games are usually completely decided in the middlegame, and the endgame is just cleanup work.  Furthermore, in 6-piece endgames, an under-2200-rated player who is going to be able to hold a draw against a "benevolent 32-piece tablebase" will also probably be able to hold the draw against Kasparov, and such is certainly not true in an entire game of chess.  Also, of course, in an entire chess game, the distance to "inability to mate" is much farther than in a 6-piece endgame, so the probability of a player screwing up royally is much higher, and that is assuming that the complexity is always the same as in a 6-piece endgame.  The fact that this assumption is obviously wrong makes things even more critical for the low-rated player.
Parent - - By SR (****) Date 2007-07-28 16:31 Edited 2007-07-28 16:41
Of course the 1500 player will NOT hold a draw against Kasparov. This is way outside the realistic possibilities of a 1500 player, even if he starts with a quite comfortable middlegame.  The crux of my argument is that a benevolent player hooked up to a 32-piece tb has an easy task "helping" the weaker player. Notice that the players might not even get to the middlegame:

1. d4, d5 2.c4, e6 3. Nc3, Nf6 4.Bg5, Be7 5.Bc1 (??), Bf8 (!) 6. Bg5, Be7 7. Bc1 (??), Bf8 8.Bg5 draw

White might, strictly speaking, not have made any mistakes (since the resulting position after 5.Bc1 is drawn), but please do not tell me it requires much rating to play the black pieces.

The example is of course absurdly irrelevant to practical chess and chess programming; however, my point is that I think that "psychological" elements play an important role in high-level chess programming, and part of the art is to drum up complications (e.g. unbalanced positions) that increase the chances of success. My examples are extreme cases, but I think less extreme examples are relevant.

For more relevant examples see my review in Chess Base.  I was told that the article I wrote there was highly praised by Kasparov and that Kasparov had very similar views. In fact I was told that he thought that Chess Base should have given my review an even higher profile.
Parent - - By turbojuice1122 (Gold) Date 2007-07-28 20:14
I had mentioned the Kasparov comparison to show that you can't make an extension from results of benevolent 6-piece tablebases to those of benevolent 32-piece tablebases because of the fact that a low-level player able to hold the draw in the endgame situation will also usually be able to hold it against Kasparov.  In an actual game, just as he would lose to Kasparov immediately, there would still occur many positions of sufficient complexity against a benevolent 32-piece tablebase so as to require deep, precise play--my belief is that, in general, these positions would occur noticeably more, not noticeably less, than in games with which we're more familiar.  Unless, of course...

Unless we allow for the two players to work together to produce a draw by third repetition.  I knew that the discussion would eventually come to this, and I didn't address it earlier because this is something different altogether--this is a situation in which a technicality in the rules is what fixes the game result, a technicality that in many situations, including the example you gave, really has nothing to do with the actual game of chess (though there are obviously situations that we see all the time in which it is best for BOTH sides to repeat).  You might just as well have said, 1.Nf3 Nf6 2.Ng1 Ng8 3.Nf3 Nf6 etc.  I think the discussion is really only worthwhile if we make the provision that the position must be "played out" to where technicalities such as intentional third repetition aren't allowed unless it happens to be the one move that holds the draw.  You might think this sounds silly, but we are talking about something very, very different if we allow for intentional third-repetition.

I think that referring to your article is kind of changing the subject somewhat, but I will comment that as soon as I hit the link, I remembered the article, remembering that I was glad even at the time you wrote it and that it came as a necessary rebuttal to the ridiculous claims made by the Crafty analysis group--my first comment after reading that original article was that all they'd done is found which grandmasters play most similarly to a non-tactical third-tier chess engine.  In reality, I think that such a method might be possible with a much stronger engine (such as Rybka or even Shredder or Fritz or, better yet, all three combined) if retrograde analysis of the game is first performed and stored before going back through the game.  However, this by itself doesn't overcome the problem that you note having to do with psychological elements, such as playing moves that you happen to know will cause difficulty for that particular player, but might not cause difficulty for some other player.  Thus, in addition, in the retrograde analysis, one would have to make some sort of lower limit for recording the evaluation difference between "best" move and the text move--perhaps at least 0.25 pawns, which is often used as a criterion for difference between best and second-best moves to determine the probability that someone cheated by using a particular computer program in a game.  This, or perhaps double the amount, would overcome most (but not all) psychological problems.  Of course, there still remains the problem that some players are far more tactical players than others, and so Fischer and Kasparov would still probably be graded down just because of their styles.
Parent - - By SR (****) Date 2007-07-28 20:36 Edited 2007-07-28 20:40
This discussion can of course go on forever. However, even if the benevolent 32tb player is not allowed these artificial draws, no one can prevent the 32tb player from refusing to cross the 4th rank (unless it's demanded by the tb). The human player can also decide not to push too much for a win and mainly stay behind his 4th rank as well.

It is the nature of the discussion that none of us can "prove" we are right; however, I still think it is intuitively clear that even a weak player should easily be able to hold a draw against a benevolent 32tb player. And I still think it's intuitively clear that the rating of a benevolent 32tb player will essentially be that of the average of his opponents within quite a large range (which essentially was my original claim, which was then challenged).

Maybe you could summarize your view and we can then move on....
Parent - By turbojuice1122 (Gold) Date 2007-07-28 23:51
You are correct in that nobody can prevent the 32TB player from refusing to cross the 4th rank, but your qualifier, "unless it's demanded by the tablebases" is quite important.  I am assuming that the 32tb player's opponents are actually seeking to win the chess game if possible--otherwise, the population of players against whom you're talking about getting a rating actually doesn't have a well-defined rating under any system, Prof. Elo's or otherwise.  The opponent will eventually, in the course of trying to win the game, have to make a move that breaks through the wall, seeing a potential advantage.  It is after this point when mistakes can and will occur.

However, I see a possible counterargument here, though I think it's unclear that it would work: in its benevolence, the 32TB player not only refuses to move past the fourth rank, but plays moves that, as much as possible, discourage any type of breakthrough by the other player.  Even with this, however, it seems unclear that a 1500-rated player who is trying to win wouldn't lose.  However, I'm starting to see that the lowest Elo rating that a benevolent 32-piece tablebase player could have might be lower than I would have thought.  There is also possibly an isolated rating range in which the 32-piece tablebase player would almost certainly have the same rating as his opponents, somewhere higher than the randomly playing monkey but somewhat lower than a 1500-rated player. The player I have in mind knows how to move the pieces and how to avoid direct blunders, but doesn't know how to formulate a winning plan; such a player might have an Elo of around 800 or so.
Parent - - By George Tsavdaris (****) Date 2007-07-28 21:34

>If you have Rook and Bishop (or Rook and Knight) against Rook (two Knights) in a drawn position you have a very easy game with zero danger of losing. What it seems to me you are saying is that maybe there will be positions where only one move draws, but I hope you can see that this concern at least is absurd for the person with the advantage in the two endgames to have. Don't tell me you are afraid of losing Rook+Bishop against Rook!

Yes, but in most cases endgames are with Pawns, and there the loss of a single tempo is usually critical. So out of the 50 or so possible moves, the 1500 Elo player would see 5-10 as playable, and of these only 1-2 will keep the draw. He can find the right one once, twice, even three times, but I don't think you can expect him to find it for 20-30 moves in a row.....
Even worse, of course, is that the position will almost surely become lost for the 1500 player during the middlegame or even the opening.
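
To put a rough number on the "20-30 moves" argument, here is a minimal sketch. It assumes (this is not from the thread) that the player finds a drawing move with the same fixed probability on every critical move and that the moves are independent, which is certainly a simplification. The function name and the 90% figure are hypothetical.

```python
def hold_probability(p_per_move: float, critical_moves: int) -> float:
    """Chance of finding a drawing move on every one of `critical_moves` moves.

    Assumes each critical move is found independently with probability
    `p_per_move` (an illustrative simplification, not a chess fact).
    """
    if not 0.0 <= p_per_move <= 1.0:
        raise ValueError("p_per_move must be between 0 and 1")
    if critical_moves < 0:
        raise ValueError("critical_moves must be non-negative")
    return p_per_move ** critical_moves


if __name__ == "__main__":
    # Hypothetical player who finds the right move 90% of the time.
    for n in (1, 3, 10, 25):
        print(n, round(hold_probability(0.9, n), 4))
```

Under these assumptions even a player who gets each critical move right 90% of the time holds all 25 of them well under one time in ten. How many moves are really "critical" against a benevolent tablebase is exactly what the thread disagrees about.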
Parent - By SR (****) Date 2007-07-28 21:54
Please read the full subthread carefully (especially the last two messages I posted above).  This might help you appreciate the (simple ?) point I was trying to make....
Parent - By George Tsavdaris (****) Date 2007-07-28 21:29

>A 32 tablebase program that always finds the fastest win if a win exists, will never lose a game.

This is not true.  Assuming you mean that this tablebase program will also play perfect moves in drawn positions, and not play bad moves that convert drawn positions into losing ones, then:
If a win for White exists, for example, and the program is playing the Black side, there is always some probability that the White side finds all the best moves and wins against this perfect 32-tablebase program.

>if the program deliberately plays ridiculous  moves (always keeping a drawn position drawn)

I don't understand this.
If you have a drawn position, then if you play a move that keeps the draw, why do you call it ridiculous move?
I would call ridiculous a move, when you have a drawn position and you play a move that loses.

>However my point is that if the program - as a thought experiment - deliberately plays ridiculous moves (always keeping a drawn position drawn) a 2000 player will have no problem with virtual certainty to draw the game.  If the program plays white, it might open with 1.a4 (assuming this holds), then a reasonable player might answer 1.- e5 after which the program might play 2.Ra2 (that might theoretically still be a draw).

I don't agree with this either, since after 1.a4 (a draw, we assume) 1...e5, there might be many drawing moves that would give the Black side much more difficulty than 2.Ra2, which is not so complicated to play against.
And after 2.e4 (a hypothetical drawing move too), for example, Black could have only 1-3 choices that draw, and much more difficult ones, since 2.e4 gives him more complications and troubles than Ra2. Imagine now that Black has to play many such moves correctly to hold the draw. I don't think a 2000 Elo player can do it....
Parent - By George Tsavdaris (****) Date 2007-07-25 22:18

>No one can prevent you from making your own private definitions, but please do not expect us to use them.

I did not say you should use them. I said that there are other definitions for the sentence "stronger Chess engine"......

>Your alternative "definition" that A is stronger than B only if A always wins against B has the serious flaw that no one can be stronger than a randomly playing monkey.

Even if that were true, it doesn't matter and does not mean the definition has a flaw. A definition of that type can't have a flaw.

>Your claim that the definition is "pretty normal and correct" is incorrect since it follows from the definition that Kasparov is not stronger than a randomly playing novice.

This is not correct. If Kasparov plays 10 games against a randomly playing novice he would definitely win all 10 games. So according to the aforementioned definition Kasparov is stronger than the randomly playing novice......

>Theoretically your definition is absurd.

Silly yes, with no real meaning yes, without any usefulness yes, but not wrong. A definition of that type can't be wrong or correct. It's just a definition.....
Parent - By lkaufman (*****) Date 2007-07-28 14:52
      Your point about  how the rating of a perfect but "helpful" program would tend to be near the rating of its opposition is valid, and is indeed a real issue -- the more  "drawish" a player's style is, the greater the tendency for his rating to be near his opposition's rating.
     The solution is simple: don't rate draws!
Parent - - By Quapsel (****) Date 2007-07-25 08:40

> just what does "the strongest" mean?..if you don't win in the game you are playing, all the statistics for 10000 other games won't matter

That one engine A is stronger than another engine B can mean:
A will score an average of 60% of all points.
You can only detect this with satisfactory probability if you play many games.

Parent - - By Sesse (****) Date 2007-07-26 00:22
The 60% number seems quite arbitrary. Honestly, the only sane number here is 50% -- from there it becomes a question of statistics to find out whether you have enough games to say that your >50% score for one side really indicates which is stronger, or if it was just a random fluctuation (and you need more games to find a valid result).

Note that by that metric, unless you're playing exactly the same engine against itself, one side is almost guaranteed to be stronger than the other. However, at that point, the next question becomes "by how much"... which leads to the Elo rating system. :-)

/* Steinar */
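
As a rough illustration of the "by how much" question, the sketch below converts a long-run score fraction into an Elo difference using the standard logistic Elo expectation formula. The function names are made up for this example, and the model only makes sense for scores strictly between 0% and 100%.

```python
import math


def elo_difference(score: float) -> float:
    """Elo gap implied by a long-run score fraction (0 < score < 1).

    Uses the usual logistic Elo model: E = 1 / (1 + 10 ** (-d / 400)).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def expected_score(elo_diff: float) -> float:
    """Inverse of elo_difference: expected score for a given Elo gap."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


if __name__ == "__main__":
    for s in (0.50, 0.55, 0.60):
        print(f"{s:.0%} score -> {elo_difference(s):+.1f} Elo")
```

On this model a 50% score is a zero gap, 55% works out to roughly 35 Elo and 60% to roughly 70 Elo, so the threshold you pick for "stronger" is really a choice of how large a gap you care about.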
Parent - By Quapsel (****) Date 2007-07-26 16:17
What I meant was:

Even if an engine A is stronger than another engine B,
and even if this strength difference leads to an expected score of 60% for A,
you will have a rather large chance of getting the opposite result if you have played only a few games.

In other words:
If you have played only a few games with the result "B is stronger than A, it won 60%", you must be aware that this result could occur with a rather large probability even though in fact "A is stronger than B, it will win 60% in the long run".
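
A small sketch of how large that chance can be. It makes a simplifying assumption that is not in the thread: draws are ignored, and A wins each decisive game independently with a fixed probability. The function name is hypothetical.

```python
from math import comb


def prob_weaker_side_leads(n_games: int, p_win: float) -> float:
    """Probability that the stronger engine A wins strictly fewer than half
    of `n_games` decisive games, when it wins each one with probability
    `p_win` (draws ignored; games assumed independent)."""
    if n_games < 0:
        raise ValueError("n_games must be non-negative")
    return sum(
        comb(n_games, k) * p_win**k * (1.0 - p_win) ** (n_games - k)
        for k in range(n_games + 1)
        if 2 * k < n_games
    )


if __name__ == "__main__":
    for n in (5, 11, 51, 101):
        print(n, round(prob_weaker_side_leads(n, 0.6), 4))
```

With a true 60% win rate, B still comes out ahead in a 5-game match nearly a third of the time under this model; the chance only becomes small once the number of games is in the dozens or hundreds.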

Parent - By George Tsavdaris (****) Date 2007-07-25 11:10

>just what does "the strongest" mean?..if you don't win in the game you are playing, all the statistics for 10000 other games won't matter

It would matter if you use the current ELO method for measuring performance that gives the indication of the strength of a program.....

If you have a box with 1000 balls, 999 white and 1 black then you can't expect that each time you draw a ball, you will pick a white one. There will be times a black one will be chosen.....
The same occurs with Rybka 2.3.2a and Fritz 7.
Parent - - By Vasik Rajlich (Silver) Date 2007-07-26 08:29
Rybka 2.3.2a will score around 55% against Rybka 2.3.

Parent - - By hebroots (**) Date 2007-07-26 22:58
Can you explain any particular game or computer setup conditions for which that applies?

Thank you
Parent - - By Felix Kling (Gold) Date 2007-07-26 23:07 Edited 2007-07-26 23:17
This should be true for every time control, I guess. The engine setup (hash size etc.) will also make no big difference.
Just try 5+0 games (5 minutes without increment). After 100 games you will normally see a first trend, but you will of course need infinitely many games to get an exact result.
That's why there are the testing groups, for example this rating list has a good number of games (~1300 for the latest version, but only against other engines):

you can find more links to rating lists on the front page of (under independent testing)
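
To see why 100 games only gives a first trend, here is a minimal sketch that turns a win/draw/loss tally into a score with an approximate 95% interval (normal approximation, games treated as independent). The function name and the tallies in the example are invented for illustration, not real test results.

```python
import math


def score_with_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (score, low, high) for a match result.

    score is the fraction of points won; low/high is an approximate 95%
    interval based on the per-game variance of the results 1, 0.5 and 0.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("need at least one game")
    mean = (wins + 0.5 * draws) / n
    variance = (
        wins * (1.0 - mean) ** 2
        + draws * (0.5 - mean) ** 2
        + losses * (0.0 - mean) ** 2
    ) / n
    half_width = z * math.sqrt(variance / n)
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical tallies, both scoring exactly 55%.
    for w, d, l in ((35, 40, 25), (3500, 4000, 2500)):
        score, low, high = score_with_interval(w, d, l)
        print(f"{w + d + l} games: {score:.3f} ({low:.3f} .. {high:.3f})")
```

For the made-up 100-game tally the interval still includes 50%, so a 55% result there cannot rule out equal strength; with the same proportions over 10000 games the interval sits clearly above 50%.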
Parent - By Vasik Rajlich (Silver) Date 2007-07-28 07:55
Yes, it should hold roughly true for all reasonable conditions. In general, blitz games will give you more decisive games and slightly higher winning percentage for the stronger version.
