1. Ehlvest played well, at least until time pressure. The final score was exactly the one I voted for in the poll, but Ehlvest in general had much better positions around move 30 or so than I expected him to have.
2. Rybka's performance rating was around 3000. I get this by adding Ehlvest's rating (2629) to the rating differential for the score (191) plus a reasonable estimate for the cumulative value of the handicaps (180). This figure of 3000 is well below the 3112 rating on the CCRL list, but in my opinion such lists overstate the rating differences between computers as compared to how they would rate against humans, by around a 4-3 ratio. So if we assume that 2700 rated computers are properly rated on CCRL, then a 3100 rating there really implies about a 3000 performance against humans, which is what we saw in this match.
3. As for the individual handicaps, the White pieces make an increasing difference as the level goes up. White is worth around 40 points in grandmaster play, but here the average rating of the contestants was even higher than in a World Championship, so probably 50 points is a more accurate figure for White.
4. The three move opening book proved to be less of a handicap than many expected. By choosing lines that are only slightly inferior and not very risky, I was able to keep the opening disadvantages to a minimum, and in some cases to avoid any disadvantage at all. I would now say that this handicap was only around 40 Elo (and in this particular match even less).
5. The lack of endgame tablebases should have cost Rybka one win, but didn't. Since Rybka lacks certain basic endgame knowledge because Rajlich assumes that it will be used with at least minimal tablebases, this handicap may be more costly than we thought and is probably not an appropriate one for Rybka in general. Call it 20 Elo.
6. The 2-1 time handicap is supposed to be worth 70 Elo or so, but because of pondering it is worth somewhat less. Let's say that the reduced hash table size offsets the pondering. There's no way to tell how the time odds affected this match.
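The arithmetic in point 2 and the handicap figures in points 3-6 can be checked with a short Python sketch. It assumes the standard logistic Elo curve and a 75% score for Rybka (4.5/6), which is the score that reproduces the 191-point differential. The function names are illustrative only, and the 4-3 CCRL compression is applied above a 2700 anchor, as in point 2.

```python
import math


def elo_diff(score: float) -> float:
    """Rating difference implied by a score fraction (logistic Elo model)."""
    return 400 * math.log10(score / (1 - score))


# Handicap estimates from points 3-6 (Elo). Point 6 treats the smaller
# hash table and pondering as cancelling out, so only the time odds count.
HANDICAPS = {
    "White pieces for Ehlvest": 50,
    "three-move opening book": 40,
    "no endgame tablebases": 20,
    "2-1 time odds": 70,
}


def performance(opponent_rating: float, score: float) -> float:
    """Performance rating: opponent's rating plus the score differential."""
    return opponent_rating + elo_diff(score)


def ccrl_to_human(ccrl_rating: float, anchor: float = 2700, ratio: float = 3 / 4) -> float:
    """Shrink the gap above an assumed well-calibrated anchor by a 4-3 ratio."""
    return anchor + (ccrl_rating - anchor) * ratio


raw = performance(2629, 0.75)  # actual performance, handicaps included
adjusted = raw + sum(HANDICAPS.values())  # estimated handicap-free performance
print(round(raw), round(adjusted), round(ccrl_to_human(3100)))  # 2820 3000 3000
```

The first figure is the performance with the handicaps in place; the second adds back the 180 Elo of estimated handicaps.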
It's clear from this match that strong GMs with the White pieces and ample time have very good drawing chances against Rybka, though whether they can win more than once in a blue moon is questionable. Ehlvest told me that he thinks strong GMs with the White pieces will always be able to make draws against computers at long time limits if that is their goal; it's up to us to prove him wrong! I would like to see such draw-odds matches played once we find ways to make Rybka avoid draws effectively, but that goal is some way off, I'm afraid. For now, pawn handicap matches look like the best bet, so that the human has both the means and the incentive to play for the win and the matches are genuine toss-ups.
>2. Rybka's performance rating was around 3000. I get this by adding Ehlvest's rating (2629) to the rating differential for the score (191) plus a reasonable estimate for the cumulative value of the handicaps (180).
Rybka's performance was 2820 and not 3000.
I agree that the handicaps cost roughly the amount of Elo you estimated (I believe it is between 120 and 180 Elo).
But Rybka didn't play without handicaps, so we can't really speak about its performance without handicaps.
In any case, we can agree that Rybka's estimated performance without handicaps (estimated by you and me, at least) would be 3000.
But Rybka's actual performance with handicaps was 2820.
I wish for the following in future matches (I know your intentions for future matches, and unfortunately they are not the same as mine):
-NO handicap games at all. I prefer each side to have all its weapons.
-If you have to make a handicap, then no piece handicaps. I hate non-Chess Chess matches. :)
-If you have to make a handicap, then tablebases, hash, and book should always be normal and non-handicapped. I prefer time handicaps.
For the usual reasons, it's very difficult to measure the Elo handicap of a short book against people. It's not that difficult to measure against engines. If you take your quad and 3-move book onto the CB server and look at your Black games, you will see that you end up with a lost position in a significant number of games before your opponent exits book. I suspect there are at least a dozen engine room regulars who, given White, would come out of book with easily won positions (easily won for a GM :-)) against your 3-move book and Rybka at least 25% of the time.
Obviously, quantifying this from just 6 games is really hard. Each half-point is worth ~100 Elo of performance rating.
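How far a single half-point moves a 6-game performance rating depends on where the score sits. A small Python sketch, again assuming the logistic Elo curve, lists the offset for every half-point total short of 0/6 or 6/6 (those two are unbounded):

```python
import math


def elo_diff(score: float) -> float:
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return 400 * math.log10(score / (1 - score))


GAMES = 6
# Performance offset for each half-point total from 0.5 to 5.5.
offsets = {pts / 2: elo_diff(pts / (2 * GAMES)) for pts in range(1, 2 * GAMES)}

for points, diff in offsets.items():
    print(f"{points:4.1f}/{GAMES}: {diff:+5.0f}")
```

Near an even score consecutive totals differ by about 60 Elo, around 4.5/6 by 70 to 90, and at the extremes by well over 100, so ~100 works as a rough rule of thumb rather than a constant.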
My thesis can be easily tested by running a quad with Rybka and a 3 move book against some of the strong engine room nicks, and looking at the result when your opponent exits book.
(For some people these statements are synonymous.)
Anyway, your experiment could be tried. I'm not sure this book will be used in the future. Larry, if you're reading, what do you think about it?
I agree that GM Ehlvest has played very well and Rybka, too :-). The match was interesting and thanks for organizing it.
I do have a different point of view concerning some of the handicaps: what counts as a handicap, who has it, and where.
Let me explain my point of view; I am trying to defend the human side.
-Ehlvest had neither an opening book nor endgame tablebases. In those circumstances the computer program should not have them either, or the human should have access to the same additional data as the program. In this match that was fine, since neither side had access to endgame tablebases.
- Rybka did have an opening book (a short one), but Ehlvest did not; that was an advantage for Rybka. Here you helped Rybka, so in that part of the game the GM was playing against you and Rybka, not in a match of Ehlvest against Rybka. This was an important factor because you had analysed Ehlvest's games, so here we have to subtract something from Ehlvest's rating :-)
- In accepting or refusing draw offers, it was once more program + operator vs Ehlvest. I think the operator's role should only be to transmit the moves and proposals; the program should have taken the decision by itself. The operator here had a pretty decent rating :-) and access to the program's analysis. That is another important factor worth Elo points ;-).
If those conditions were fulfilled, we could speak of a real match between a human and a program.
The format of the existing matches, not only this one, is in effect a human playing against a program plus a centaur.
The human's advantages are removed by adding human help on the program's side, and then the human has to face the program.
So the human was not beaten by the program alone, but by the program with the help of another human.
If you think that Rybka's moves can be predicted, then you are wrong.
There are some problems:
1)Ehlvest did not know Rybka's choice in the first 3 moves, so he needed to prepare more than a hundred different openings to be ready for whichever line Rybka would choose (against 1.e4 it could play 1...e5, 1...c5, 1...e6, 1...d5 or 1...d6, and even if we assume only 5 choices on every move we get 125 lines).
2)Rybka with 4 processors is not deterministic
3)The version that was playing was different than the commercial version.
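The count in point 1 is just a product of Black's branching factors over the three book moves. A tiny Python sketch; the uneven branching list in the second call is hypothetical and only shows how quickly the total changes:

```python
import math
from typing import Sequence


def black_lines(branching: Sequence[int]) -> int:
    """Distinct Black move sequences when Black has branching[i] choices on move i+1."""
    return math.prod(branching)


print(black_lines([5, 5, 5]))  # 125, as in point 1
print(black_lines([5, 4, 3]))  # 60, hypothetical narrower tree
```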
Larry wants to keep his book private, but you can easily perform this experiment. Have somebody make a 3-move book with this philosophy, and then run it on Playchess for 50 games or so against a variety of opponents with strong books.
I got a good laugh from Larry's desire to keep his book private. If he played it against my book, nothing meaningful would be revealed, since it's hard to imagine any top GM playing 1. b3 with money on the table. The question I have is: if Larry put together ten 3-move responses to 1. b3, how many would end with White leaving book with a very large advantage (say a Rybka eval of at least 1 pawn after a long analysis)? I suspect that out of 10 unique openings, the answer would be 3-5.
OK, I'll tell you what: I will make a 3-move book against 1. b3 right here. Please run a (fair) match between this book and your book at tournament settings, and tell us the results. You don't have to post the games.
Needless to say, you're not allowed to analyze my lines and add to your book before the match starts.
My lines are (against everything):
1) 1. .. Nf6 2. .. e6 3. .. Be7
2) 1. .. Nf6 2. .. b6 3. .. Bb7
3) 1. .. Nf6 2. .. Nc6 3. .. d6
4) 1. .. Nf6 2. .. Nc6 3. .. e6
5) 1. .. Nf6 2. .. g6 3. .. d6
Let's say, 5 games for each of the four "defenses". That will be enough to tell us if 75 Elo is really badly off.
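Whether a match of roughly 20 to 25 games can tell 75 Elo apart from a much larger figure is a statistics question. A rough Python sketch, assuming independent games and a normal approximation to the average per-game score (a common shortcut, not the only method), shows that the resulting interval is typically well over 100 Elo wide. Such a match can therefore catch an estimate that is badly off but not a modest error. The W/D/L counts are made up for illustration.

```python
import math


def elo_diff(score: float) -> float:
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return 400 * math.log10(score / (1 - score))


def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Point estimate and approximate confidence interval (Elo) for a match.

    Uses a normal approximation to the mean per-game score (1, 0.5 or 0),
    then maps the bounds through the logistic curve.
    """
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1 - mean) ** 2 + draws * (0.5 - mean) ** 2 + losses * mean ** 2) / n
    half_width = z * math.sqrt(var / n)
    eps = 1e-6  # keep scores off 0 and 1 so log10 stays finite

    def clamp(p: float) -> float:
        return min(max(p, eps), 1 - eps)

    return elo_diff(clamp(mean - half_width)), elo_diff(clamp(mean)), elo_diff(clamp(mean + half_width))


# Hypothetical 20-game result for the side with White; NOT data from this thread.
low, point, high = elo_interval(wins=6, draws=10, losses=4)
print(f"{point:+.0f} Elo, 95% interval roughly {low:+.0f} to {high:+.0f}")
```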
1.b3 Nf6 2.e4 XX 3.e5 XX?? 4.exf6! (the fourth move is the first move out of book, but it will certainly be played).
1.b3 Nf6 2.Bb2 XX 3.Bxf6 XX?? 4.Bxe7, Bxd8 or Bb2, depending on which third move was played. Hmmm... I think the statement "against everything" should be revised... :-)
I am running each of your 5 lines against my book, with Black's continuation after move 3 decided by Rybka searching up to and including a complete depth 20. Each game will be stopped when White runs out of book, and the resulting position will be presented here. This is rather time-consuming on my 219 kn/s machine (1GB hash, default settings except with EGTBs set to normal), but it should give a good indication of how well Rybka can perform with a simple 3-move book and lots of computation time. Preliminary results indicate that Rybka with the three-move book is doing well, and I wish I had chosen depth 19 for Rybka rather than 20 (after seeing Black avoid several traps during the depth 20 analysis).
This still leaves open the question of converting your results to Elo - after all, we did agree that white would score well.
I guess we can try to come up with some formula based on the eval scores at the end of each variation and white's expected time advantage at that point.
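One possible formula, sketched in Python: convert the book-exit eval into Elo with an assumed pawn-to-Elo rate, convert the clock difference with the 70-Elo figure for 2-1 time odds given at the top of the thread (assumed to scale with log2 of the time ratio), add the two, and read an expected score off the logistic curve. PAWN_ELO = 100 is a hypothetical placeholder rather than a measured value, and the example eval and clock times are made up.

```python
import math

PAWN_ELO = 100          # hypothetical: Elo value of one pawn of engine eval
TIME_DOUBLING_ELO = 70  # 2-1 time odds estimate from earlier in the thread


def expected_score(elo_edge: float) -> float:
    """Logistic Elo expectation for the side holding the edge."""
    return 1 / (1 + 10 ** (-elo_edge / 400))


def book_exit_edge(eval_pawns: float, white_time: float, black_time: float) -> float:
    """Hypothetical Elo edge for White at the moment White leaves book.

    eval_pawns: engine eval from White's point of view at the book exit.
    white_time, black_time: remaining clock time for each side (same unit).
    """
    clock_edge = TIME_DOUBLING_ELO * math.log2(white_time / black_time)
    return eval_pawns * PAWN_ELO + clock_edge


edge = book_exit_edge(eval_pawns=0.3, white_time=60, black_time=45)
print(f"edge {edge:.0f} Elo, expected score {expected_score(edge):.2f}")
```

Averaging these expected scores over all the variations would give a single match estimate to compare with the 75 Elo figure.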
Only a clock advantage from this uncommon opening.
Not much doing here either:
Once again, there were very few games in the database from this opening with very little analysis.
By the way - was your book playing all the way to these positions on the basis of just 15 unanalyzed games?
The tendency of most people (and engines) when confronted with a flank opening is to grab control of the center. Thus the great majority of the lines in my database are based on black making an early grab for control of the center. Interestingly, Rybka at depth 20 never seemed to make this early attempt after its third book move.
All of the White moves came from my book. Not infrequently, a line that has only a few continuations at move 4 will merge with other lines at a later move. The multiple positions represent cases where more than one move was present in my book. If any of these openings were played frequently, they would be analyzed with one or more moves highlighted. The test took longer than I thought because I underestimated the time for Black to go through depth 20 early in the game. It's actually surprising in this case that Black didn't end up with better positions, given that the book moves probably came from games analysed at a much shallower depth.
I agree that the best openings for this type of test would be ones that have been very well explored. The major surprise for me was that your openings, designed to be quiet so they could be played against anything, were so much more effective than the openings that I see every day on the CB server (or for that matter in freestyle competitions).
Of course, I chose my lines based on frequency of occurrence (in RybkaII.ctg). Obviously, I expected your book to have more data (and it did), but I bet that your analysis is proportioned quite similarly to Jeroen's book.
If we repeated this with 1. e4, white would probably do a bit better, simply because the move is stronger (:)). However, the general trend would be the same - black would find slightly inferior moves which escape big theoretical systems without totally destroying his position, and normal chess would be played. This is exactly what Larry did in preparation against Ehlvest and as you can see from that match, it didn't cause huge problems for Rybka.
On the whole, I think an estimate of 75 Elo is still quite accurate.
1 ... e5 41%
1 ... d5 38%
1 ... Nf6 7%
1 ... c5 6%
So only your last line is represented by hundreds of games in the database.
One problem I have with a 75 Elo number is that in main lines, Black generally starts with a significant time deficit; I'm guessing he might on average have used a third of his time to get through the opening. That wouldn't leave much Elo for lower move quality or prepared traps.
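The clock argument can be put into numbers with a quick Python sketch. It borrows the 70 Elo for 2-1 time odds from the top of the thread and assumes, as a simplification, that the Elo value of time scales with log2 of the time ratio and that White, still in book, has used essentially no time. The one-third figure is the guess from the paragraph above.

```python
import math

TIME_DOUBLING_ELO = 70  # 2-1 time odds estimate from earlier in the thread
BOOK_HANDICAP_ELO = 75  # the book-handicap estimate under discussion


def clock_cost(fraction_used: float) -> float:
    """Elo cost for Black of leaving the opening with (1 - fraction_used) of
    the time White still has, assuming Elo scales with log2 of the time ratio."""
    return TIME_DOUBLING_ELO * math.log2(1 / (1 - fraction_used))


cost = clock_cost(1 / 3)
remaining = BOOK_HANDICAP_ELO - cost
print(f"clock: {cost:.0f} Elo, left for move quality and traps: {remaining:.0f} Elo")
# clock: 41 Elo, left for move quality and traps: 34 Elo
```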
Another problem is that it doesn't square with engine room results, unless the top players' engine room books are actually hurting them rather than helping. Normally in these matches, the relative level of preparation is much more important than the strength of the move, so a weaker move with a big advantage in preparation can be more successful than a stronger move with equal preparation. Anyway, a good record against engines running on hardware 3-4 times as fast, minus the advantage of having White, still seems to be worth well over 100 Elo.
Re. your last paragraph, I just wonder if you're not "mentally cherry-picking". In other words, when you really nail somebody with preparation, it counts, while when you don't, it's because you just didn't prepare enough yet in that line.
Obviously, an inhumanly-good book would be incredibly useful, worth hundreds of Elo.
I am talking about cherry-picking when you come up with your estimate of 100+ Elo. In other words, you only count the games where your preparation was actually very good. The value of good deep preparation will be well over 75 Elo.
My lines against your book are the following:
1.b3 Nf6 2.Bb2 e6 3.Bxf6 Be7 4.Bxg7 Rg8 5.Bb2
1.b3 Nf6 2.Bb2 b6 3.Bxf6 Bb7 4.Bb2
1.b3 Nf6 2.Bb2 Nc6 3.Bxf6 d6 4.Bb2
1.b3 Nf6 2.Bb2 Nc6 3.Bxf6 e6 4.Bxd8
1.b3 Nf6 2.Bb2 g6 3.Bxf6 d6 4.Bxh8