Let's say you enter a computer in a tournament with the best correspondence players in the world. The computer uses a thoroughly worked opening book and then gets 1 hour or 24 hours per move.
We give the computer Black in all games. How many draws would it get out of 100 games? 10%? 90%? What is the guess of the correspondence experts in the forum?
I don't have any idea, and I would like to hear your thoughts.
(D_0 + (1-D_0)/2) > D > (D_0 - (1-D_0)/2) ) would be a pretty good estimate. I am basing this off of the fact that the opponents who take the least risks would result in draw rates roughly half-way between those players' draw rates against themselves and 100%, while the difference from the normal corr draw rate would be that same amount for the strongest opponents, but in the other direction. Anyway, I'm sure you could probably improve on this estimate greatly.
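To make the bracket above concrete, here is a minimal sketch, assuming D_0 is the normal correspondence draw rate expressed as a fraction. The function name, the clamp at zero, and the sample value 0.8 are illustrative assumptions, not figures from the thread.

```python
def draw_rate_bounds(d0):
    """Return (lower, upper) bounds for the draw rate D given baseline d0.

    Implements D_0 - (1 - D_0)/2 < D < D_0 + (1 - D_0)/2 from the post.
    The lower bound is clamped at 0, which is an added assumption.
    """
    if not 0.0 <= d0 <= 1.0:
        raise ValueError("d0 must be a fraction between 0 and 1")
    half_gap = (1.0 - d0) / 2.0
    return max(0.0, d0 - half_gap), d0 + half_gap


if __name__ == "__main__":
    # 0.8 is a made-up baseline, not a measured correspondence draw rate.
    low, high = draw_rate_bounds(0.8)
    print(f"{low:.2f} < D < {high:.2f}")
```

The upper bound never exceeds 1, since D_0 + (1 - D_0)/2 is the midpoint between D_0 and 100%.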
Does it make much of a difference if the opponents know they play against a spacebar?
Yes, I had an extra bracket, though if Dagh is having trouble understanding, I obviously didn't do a very good job in explaining. (Or, I'm just wrong.)
> How hard is it to beat a spacebar?
What's a spacebar?
IMHO, the problem with the spacebar approach is that there are many cases where the engine will have a number of moves with nearly equal evaluation which result in very different games. And while the engine will decide between these moves with a similar one dimensional evaluation number in a random manner, a person can evaluate the different positions based on the characteristics of the resulting game. With engines still having issues properly evaluating in certain areas, e.g. king safety, kingside pawn storms, long term positional play, and closed positions, there is still the potential for value added when the time frame is long enough, generally four days per move for cc, to target areas where engines aren't that strong. This seems to hold even when using weaker hardware.
This is not too different than the situation where you won the PAL freestyle event with inferior hardware a number of years ago, although with stronger engines and hardware, the human factor is certainly less critical than it was back then.
But I'm pretty sure that Uri would disagree with my argument, since he has been claiming for years that the spacebar approach is nearly optimal...
The Black spacebar will play without an opening book. It is an unknown engine of strength similar to Komodo and Stockfish (with similar strengths and weaknesses as current engines) and uses strong hardware and 24 hours per move.
You get 100$ for a win, nothing for a draw or loss.
How much would you pay to play this game?
Let's make some assumptions:
0. The human player plays white.
1. Both sides are using equal hardware.
2. The human player has a copy of each of the top engines. In reality, this would quickly allow figuring out which engine spacebar was using, which would provide the significant advantage of allowing a priori prediction of moves in most cases. I guess you could assume use of an unknown engine developed specifically for this event, or in the past, rental of a one of a kind engine (e.g. the Rybka cluster, although this would violate the equal hardware requirement). In any event, the focus of this game would be on playing the opponent, rather than playing objectively the best moves.
3. The human player has nothing else to do for the duration of the game, other than to eat, sleep, and shit. In the interest of playing the best chess, fornication would be postponed until after the match. Attention to personal hygiene would be optional.
Under these conditions, I would guess that the expected win-draw-loss percentage would be something like 30%-65%-5%.
Betting on the game isn't as straightforward as you seem to believe because you need to take into account opportunity cost and risk aversion. Nobody in their right mind would pay anything to play a game that could go on for three months, for $100 (that's why it makes little sense to have a prize fund for cc events, and I get a good chuckle thinking about the ~$5 an hour that was earned by the winners of the last Infinity Chess freestyle event if you don't count Nelson's preparation time). This problem could be addressed by increasing the stakes to a million dollars, but risk aversion would still be a major factor in determining the size of the bet. The takeaway is that these things aren't really done based on expectations. That said, I would pony up $100K to play this match for a prize of $1M if I won. That would provide a large enough expected return to make the time required for the match a worthwhile investment, without being so high that I would have to listen to my wife complain about a possible loss of the investment for the rest of my life...
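For reference, the raw expectation arithmetic behind these numbers can be sketched as follows. It deliberately ignores opportunity cost and risk aversion, which the post above argues are the real drivers, and it takes the guessed 30% win rate at face value. The function names are made up for illustration.

```python
def break_even_stake(p_win, prize):
    """Largest stake with non-negative expected profit (draws and losses pay 0)."""
    return p_win * prize


def expected_profit(p_win, prize, stake):
    """Expected profit of paying `stake` for a p_win chance at `prize`."""
    return p_win * prize - stake


if __name__ == "__main__":
    p_win = 0.30  # guessed win rate from the thread, not a measured value

    # The $100 game: what a ticket would be worth on expectation alone.
    print(f"break-even ticket price: ${break_even_stake(p_win, 100):.2f}")

    # The $1M match with a $100K stake.
    print(f"expected profit: ${expected_profit(p_win, 1_000_000, 100_000):,.0f}")
```

On pure expectation, any stake below the break-even value is favorable, so the gap between that value and what someone would actually pay is a rough measure of the risk aversion and time cost being discussed.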
Are you saying that such a ticket is worth about 30$?
In other words, can we hope to win about 1 out of 3 games against a spacebar?
What about the importance of the opening book?
What would be the result against no opening book?
What would be the result against an opening book consisting of "currently known critical lines" analysed in a mini-maxed IDEA tree? (Exactly how to construct the IDEA tree can be argued about, the main idea here and now is to just "teach" the spacebar about the known danger spots in White's normal ambitious lines, forcing White to use more cheesy lines.)
1) There are a limited number of top engines, so the play of the spacebar can be accurately predicted in most cases by testing with the same engine.
2) Many openings are very problematic for unassisted engines. I have always liked the Sicilian Dragon and am constantly amazed at how poorly current engines perform when looking at the position from the root. Performance in some KIA and KID openings is even worse. Can a person be clever enough to get an engine without a book to play into one of these types of lines? Maybe. This could lead to an early blowout. Being able to predict the engine's moves makes this a viable option.
3) Engines still make ugly, positionally indefensible moves because they don't have a rule against them, and can't see a problem in their search. These moves don't disappear at long time controls. Exploiting these moves may be difficult in a freestyle game with only a few minutes per move. With 24 hours per move, the consequences should be more severe.
4) Due to the many 'enhancements' to alpha-beta search, it's not at all uncommon to have a position that evaluates at one score at the root, but a very different score when you advance a few moves along the PV. A person will always check his plan by moving along the PV and ensuring that things are as expected. Engines don't do this for whatever reason (probably because it's not optimal with a pure alpha-beta search).
An opening book with full coverage of "currently known critical lines analysed in a mini-maxed IDEA tree" would be a tougher animal. The opening is obviously far and away the most studied part of the game, and there are a lot of openings that engines in general don't play well, and if you know the exact engine, you can test it against all of these critical lines and find out which ones it plays worst against.
If the engine has a book, it's almost a requirement to force it out of book as early as possible. Since the most likely outcome is a draw, we are treating draws as equivalent to losses, and we are happy winning a third of the time, we really want to enter into really complicated, really murky positions, with most of the pieces still on board. Ideally, these positions will be likely to end in a decisive manner and will break one way or the other in a manner difficult to predict (especially from the root).
Let's say you wanna play 1 e4.
The spacebar has a book based on:
1 e4 e5 2 Nf3 Nc6 3. Bb5 Nf6
3. Bc4 Bc5
The Berlin and the Italian game, thoroughly booked. It also has normal defenses against the Scotch and King's Gambit etc.
Where do you want to take the spacebar out of book? (For instance, you could claim that there must be slow lines in the Italian game around move 10-15-20 where there are so many equally good options for both sides that it's impossible to book much further, or you could suggest some offbeat moves like 3. a3 or whatever).
We ask you to play against the spacebar for your life. You get half a year of endless resources etc. to decide on your first move.
What would you play? (and how do you follow up?)
I know this is difficult to answer in less than half a year, but let's get some qualified guess and reasoning. I want to know where current spacebars are most vulnerable!?
1) Get out of book as early as possible, with a closed but not blocked, roughly equal, and non-drawish position (which is possible for white). I suspect the best way to do this would be with a flank opening that gives a wide berth to all playchess theory.
2) Figure out which engine is being used for spacebar to allow accurate prediction of its moves.
3) Test top engines for vulnerabilities, especially in king safety issues. For example, SF likes to play g3 (without a fianchettoed bishop), and doesn't respect an opponent's half-open h-file with a rook on it.
Of course my opening database is a million miles behind those of Nelson or Eros, but using Convekta's Opening Encyclopedia as a poor substitute, I seem to be able to get to a sparse point in the book pretty early by fianchettoing both bishops and combining this with preparations for a knight on d2 or e2. This cedes the center to black, which I'm ok with, and allows castling on either wing, with a preference for cross castling to increase the probability of a decisive game, and also to take advantage of known issues that engines have with pawn storms.
The biggest question in my mind is how feasible it is to achieve goal number 1. If it is possible to achieve this goal, with knowledge of the opposing engine (and there are only a few top-engines so this wouldn't be that difficult), I think it should be possible to achieve a 30% win rate if one doesn't care if the alternative is a draw or a loss (this is obviously important because it allows for much riskier play).
In short, with a little work you can make anybody's book irrelevant in six moves, maybe less, without your position losing viability. My rule of thumb is that it takes something like an order of magnitude of more data to advance your coverage one move further out reliably. Of course, I am talking here about empirical results alone. If you combine that with evaluative data it's another story.
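The rule of thumb above (roughly ten times more data per extra move of reliable coverage) can be sketched numerically. This is only an illustration of the stated heuristic; the baseline of 1,000 games and the function names are hypothetical.

```python
import math


def games_needed(extra_moves, baseline_games, factor=10):
    """Games needed to push reliable book coverage `extra_moves` deeper,

    assuming each additional move multiplies the data requirement by `factor`.
    """
    return baseline_games * factor ** extra_moves


def extra_moves_covered(data_multiplier, factor=10):
    """Inverse view: how many extra moves a database `data_multiplier` times larger buys."""
    return math.log(data_multiplier) / math.log(factor)


if __name__ == "__main__":
    baseline = 1_000  # hypothetical number of games giving reliable coverage today
    for extra in range(1, 5):
        print(extra, games_needed(extra, baseline))
```

Read the other way round, even a database a thousand times larger only extends reliable coverage by about three moves under this assumption, which is why leaving book early is cheap for the side doing the dodging.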
What is our expected winning rate in the two scenarios:
1) We play against a spacebar that has no opening book.
2) We play against a spacebar that has no opening book, however, we (or black) have to play a novelty/rare move before move 15. (If no novelty is found, we cancel the game and start over.)
A second question:
Let's say our strategy is to play a g3 + b3 system and a knight to d2 or e2, a pawn on d3 or e3, slow stuff. But now, the spacebar is told about this and its operators spend a month preparing against 1.g3.
How much would our winning rate suffer?
If our winning rate suffers a lot, do we have other good alternatives to the g3 + b3 system? How many? When will we run out?
I agree that we can practically always dodge any book with at least a viable position. But will it still be a WINNABLE position?
Some examples I think about:
1) We play 1 e4 e5 2 d3. We have a perfectly viable position, but I would not be happy to bet on winning with white after black probably replies 2... d5 and has a straightforward edge (but probably quickly analysed to 0.00).
2) The spacebar happens to like the Najdorf, and now we surprise it with the rare move 6 h3 (it used to be quite rare). Here I think we would have decent winning chances if the spacebar is on its own. How many truly "critical" semi-novelties are there left against a truly worked opening system? (I don't think there are any left against the Berlin Ruy Lopez, for instance.)
3) Something in the middle between 1 and 2: We don't get an edge, but we also don't give black an edge. We can think of slow lines of the Italian game, for instance.
I think the Carlsen strategy of playing dry positions and combining that with relentlessly accurate play applies to engines to a large extent. Viable positions are all you can hope for in an opening; if you get more it's because your opponent didn't play optimally. Whether you win or not depends on your opponent making mistakes which are subsequently capitalized upon. No mistakes, no winning chances. (At least this is the case 99.99% of the time. There are very rare instances where you can't completely figure out where a player went wrong and you conclude the mistake must have come very early in the game and a long series of subsequent moves could not have been improved upon.)
A possibly fruitful avenue of analysis might be a study of draw-rates, which might be a proxy for "exhausted opening theory", though maybe not. If you were to study ECOs individually or well-traveled positions within an ECO, for instance, and somehow normalize the draw-rates for average Elo (lower average Elos produce fewer draws, as do bigger variances in opponent Elos), you might find lines that are remarkably below-average on an Elo-adjusted basis while not unduly sacrificing corresponding success-rates.
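One way such a study could be set up is sketched below. It is only one possible normalization, chosen for brevity: games are bucketed by average Elo, each bucket's overall draw rate serves as the expected rate, and each ECO code is scored by how far its observed draw rate sits from that expectation. It does not adjust for the rating gap between opponents, which the post also mentions. The function names and the sample games are invented for illustration.

```python
from collections import defaultdict


def elo_band(avg_elo, width=100):
    """Bucket an average rating into a band, e.g. 2450 -> 2400."""
    return int(avg_elo // width) * width


def draw_rate_residuals(games, width=100):
    """Return {eco: observed draw rate - Elo-band-expected draw rate}.

    `games` is a list of (eco, avg_elo, is_draw) tuples.
    """
    band_total = defaultdict(int)
    band_draws = defaultdict(int)
    for _eco, avg_elo, is_draw in games:
        band = elo_band(avg_elo, width)
        band_total[band] += 1
        band_draws[band] += int(is_draw)

    eco_total = defaultdict(int)
    eco_observed = defaultdict(int)
    eco_expected = defaultdict(float)
    for eco, avg_elo, is_draw in games:
        band = elo_band(avg_elo, width)
        eco_total[eco] += 1
        eco_observed[eco] += int(is_draw)
        # Expected draws: the draw rate of all games in the same Elo band.
        eco_expected[eco] += band_draws[band] / band_total[band]

    return {
        eco: (eco_observed[eco] - eco_expected[eco]) / eco_total[eco]
        for eco in eco_total
    }


if __name__ == "__main__":
    # Invented toy data; the ECO codes are placeholders, not real statistics.
    sample = [
        ("B90", 2450, False), ("B90", 2460, False),
        ("C67", 2440, True), ("C67", 2470, True),
    ]
    for eco, residual in sorted(draw_rate_residuals(sample).items()):
        print(eco, f"{residual:+.2f}")
```

Strongly negative residuals would flag lines that are less drawish than the players' ratings suggest. Those would still need checking against the corresponding success rates, as the post notes.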
Of course it's always tempting to believe that the current crop of engines are nearly perfect, but actually they are far from it, and I'm sure that future engines will make the current best available engine look as bad as Crafty playing against the current Stockfish. The lines that will work ten years from now against today's lines would work just as well today if we could find them...
The point of the analysis I proposed is to identify lines that have greater or lesser draw-rates than you would expect given the players involved. Based on that you might identify openings where existing human/engine theory is more misaligned with evaluation scores.
> and I'm sure that future engines will make the current best available engine look as bad as Crafty playing against the current Stockfish.
If current engines are somewhere around 3200 or so then perfect chess is edging towards 4000 elo.
I've heard people argue it may be somewhat lower than that.
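To put those rating gaps in perspective, here is a small sketch using the standard logistic Elo expectation, E = 1 / (1 + 10^((R_b - R_a)/400)). The 3200 and 4000 figures are the guesses quoted above, and whether the Elo model extrapolates sensibly that close to perfect play is itself an open assumption.

```python
def expected_score(rating_a, rating_b):
    """Expected score (win = 1, draw = 0.5) for player A under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # 3200 vs 4000 are the thread's guesses, not measured ratings.
    score = expected_score(3200, 4000)
    print(f"expected score for the 3200 side: {score:.4f}")
```

Under this model an 800-point gap leaves the weaker side with an expected score of about 1%, which could come entirely from the occasional draw.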
If you want to see this yourself, play the following run-of-the-mill Sicilian Dragon and let black start playing at move 8. You'll be amazed how quickly black's game falls apart!
1.e4 c5 2.Nf3 Nc6 3.d4 cxd4 4.Nxd4 g6 5.Nc3 Bg7 6.Be3 Nf6 7.Bc4 O-O 8.Bb3
Black's play from this position is much closer to 1400 Elo than to 3100...
How easy/hard is it for you to get a draw now?
Of course white has a much easier game with the dragon, and I have personally screwed up the move sequence in a freestyle final against Eros a few years ago, but engines frequently screw up from the white side as well. One thing that happens with great frequency is after white pushes the g and h pawns and black plays h5, white pushes the g pawn to g5 ending up with pawns on h4 and g5 locked with black's pawns on h5 and g6, effectively locking up the kingside and killing off white's attack. I've seen Komodo 8 do this in recent engine-engine games. I'm not sure that lots of pondering time would change this behavior.
> I'm sure that future engines will make the current best available engine look as bad as Crafty playing against the current Stockfish.
But keep in mind that just as with scientific theories, improvements over time bring things closer and closer to the truth. I think it will take a bit longer than the time between Rybka 1.0 Beta and current Stockfish to make current Stockfish look so silly, at least at long time controls. In fact, I would be more surprised if it happened in the same time frame than if it never happened at all. The difference between chess and scientific theories is that in chess, we know what perfection will look like (all games are draws), while in the case of scientific theories, we really don't necessarily know in some fields what perfection will look like.
This is not always true. Sometimes the truth has to be modified to bring it closer and closer to the scientific theory. Take for example the best funded of the sciences, climate science. There is now a 35 year record of temperatures from satellites, and a ten year record of temperatures from ocean buoys. This data is ignored in the formulation of models which do not conserve energy and not only cannot predict future events well, but amazingly can't even predict past events. This will continue as long as the funding continues.
Of course financial perversion has never been a problem for chess! In computer chess now we have two top engines, one searches really well, and the other has a much better evaluation. Since the better search is open source, it seems likely that its ideas will eventually get copied into the engine with a better evaluation to produce a clear number one (actually this has probably been occurring for some time now).
Of course as we approach 'perfect' chess, there can be no improvement with additional time per move. I'm not sure we've seen this so far, at least in cases where books weren't used. When we get to the point when ten hours per move is no better than one hour per move, the end will be near...
> When we get to the point when ten hours per move is no better than one hour per move, the end will be near...
We may only solve the opening position. There are still zillions of other positions that are more difficult ;-)
Anyway, your statement about future engines making current Stockfish look as bad as it currently makes Crafty might turn out to be true in various critical positions, e.g. those relevant to the points shared between you and Dagh above.
I agree with your statement in its entirety, but because AGW is probably getting more funding than everything else put together, I would be surprised if it doesn't become the model for other scientific endeavors, real or imaginary. For some things it fits quite well, e.g. looking for near Earth orbit objects that could potentially hit the Earth and cause problems. In other areas it would be a real stretch, but the key is to get the media to write lots of stories for low information voters, then get politicians involved to 'save the Earth'. If there is enough money involved, the UN might even get involved with a cast of thousands, to collect a management fee. Extra bonus points if remediation requires redistribution of income from more to less developed countries. The UN is very interested in managing this type of effort, and will help with the science to keep it all going!