Is there an estimate of the Elo of perfect play, and how long would Deep Rybka 4 have to think to reach it?


> I was looking at the game in the post "Deep Rybka 4 (8 hours) - Deep Rybka 4 (60 hours)" and noticed a rating estimate by turbojuice1122 that puts White at ~3250 Elo and Black at ~3450. http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=18651
>
> Is there an estimate of the Elo of perfect play, and how long would Deep Rybka 4 have to think to reach it?
I think the Elo system breaks down when the weaker of two players becomes too strong to beat. He can play lousy chess, taking no chances, not setting up any attacks, etc., but if he can avoid fatal moves he will not get beaten. It is like running into a brick wall: it does not matter how fast you can run. All matches between these two players are drawn, no matter how beautiful the chess the other, stronger player plays. The stronger player will get much better results against weaker players, but he can never beat 'The Wall'.
The Elo ratings of these two players would depend on the results they both get against the chess population in general, which is still normal, but the ratings would not reflect the results they get against each other. I don't think Elo really accounts for this at present.
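To make 'The Wall' concrete, here is a minimal sketch of the standard logistic Elo update. The K-factor of 10 is an assumption (rating lists use various K values or fit ratings over whole databases rather than game by game), and the starting ratings are simply the 3450/3250 estimate quoted above. A run of head-to-head draws only drags the two ratings toward each other; nothing in the formula records that the lower-rated side is simply unbeatable.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Logistic Elo expectation for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_after_game(r_a: float, r_b: float, score_a: float,
                      k: float = 10.0) -> tuple[float, float]:
    """Return the new (r_a, r_b) after one game; score_a is 1, 0.5 or 0."""
    delta = k * (score_a - expected_score(r_a, r_b))
    # Zero-sum between the two players: whatever A gains, B loses.
    return r_a + delta, r_b - delta


if __name__ == "__main__":
    strong, wall = 3450.0, 3250.0  # the estimate quoted earlier in the thread
    print(f"expected score of the stronger side: {expected_score(strong, wall):.3f}")
    for _ in range(100):  # assumed: 100 head-to-head draws and no other games
        strong, wall = update_after_game(strong, wall, 0.5)
    print(f"after 100 draws: {strong:.0f} vs {wall:.0f}")
```

In isolation the gap keeps shrinking toward zero, which is exactly the "equal head-to-head, unequal against everyone else" tension described above.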



I suppose someone could make a version of Rybka that only tries to draw engines, even engines rated anywhere from 3000 down to zero Elo. This might even be of interest to humans who are analyzing their games!!
And:
What does "perfect play" mean?
1) Would it only mean "if it has a drawn position, it moves to another (perhaps very simple) drawn position"?
2) Or would it mean "...then it will find the drawn position that makes it most difficult for the opponent to reach that draw"?
3) Or "...then it will calculate typical human mistakes and try to exploit them while staying in a drawn position"?
"Perfect play" will need 2) and 3) too, I think!
And:
Imagine that, *plop*, a super strong engine A suddenly exists which wins every game against humans.
What Elo would engine A get?
And imagine that later, *bang*, a truly perfect engine B suddenly exists, which wins every game against all humans and against A too.
What Elo would engine B get?
And would B have gotten the same Elo if A had never existed?
Quap
I proposed the following definitions:
1. Perfect Play: Just like today's 6-man tablebases: in a drawn position, play moves randomly; otherwise, play the moves that win fastest. Some people would consider that perfect play "too weak" and would add "play the variation that keeps the game going the longest, in the hope that the opponent will make a mistake along the way", but who knows, 1.f3 might be the slowest draw, so this strategy could backfire if the longest draw is also the easiest one.
2. Optimal play: Exactly like we play today, but we avoid the losing positions, and we know which positions are lost for the opposing side, so we try to steer the game into complex positions in which an imperfect opponent is very likely to slip.
3. Ideal play: This one would use opponent modeling, studying the opponent's history to find the positions in which that specific opponent's weaknesses would show up and make it blunder.
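Definitions 1 and 2 can be sketched as move-selection rules over tablebase-style data. Everything here is hypothetical: `MoveInfo`, its fields, and the `error_chance` estimate are illustrative names, not any real tablebase API, and the rule for all-losing positions (delay mate as long as possible) is an added assumption the definitions don't specify.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoveInfo:
    """Hypothetical tablebase-style facts about the position after a move.

    wdl: +1 win, 0 draw, -1 loss, from the mover's point of view.
    dtm: distance to mate in plies (meaningful only when wdl != 0).
    error_chance: made-up estimate of how likely the opponent is to go
                  wrong from there (used only by the "optimal play" sketch).
    """
    wdl: int
    dtm: int = 0
    error_chance: float = 0.0


def _decisive_choice(moves: dict[str, MoveInfo]) -> Optional[str]:
    """Fastest win if one exists; longest loss if every move loses; else None."""
    wins = [m for m, info in moves.items() if info.wdl > 0]
    if wins:
        return min(wins, key=lambda m: moves[m].dtm)
    if all(info.wdl < 0 for info in moves.values()):
        # Assumption (not in the definitions): when every move loses, delay mate longest.
        return max(moves, key=lambda m: moves[m].dtm)
    return None  # no win, but at least one draw exists


def _draws(moves: dict[str, MoveInfo]) -> list[str]:
    return [m for m, info in moves.items() if info.wdl == 0]


def perfect_move(moves: dict[str, MoveInfo], rng: random.Random) -> str:
    """Definition 1: fastest win if there is one, otherwise any draw at random."""
    choice = _decisive_choice(moves)
    return choice if choice is not None else rng.choice(_draws(moves))


def optimal_move(moves: dict[str, MoveInfo]) -> str:
    """Definition 2: like definition 1, but among draws pick the trickiest one."""
    choice = _decisive_choice(moves)
    if choice is not None:
        return choice
    return max(_draws(moves), key=lambda m: moves[m].error_chance)


if __name__ == "__main__":
    # Hypothetical drawn position with three legal moves.
    position = {
        "Kd2": MoveInfo(wdl=0, error_chance=0.02),
        "Rf7": MoveInfo(wdl=0, error_chance=0.15),
        "Kc1": MoveInfo(wdl=-1, dtm=31),
    }
    print(perfect_move(position, random.Random(0)))  # Kd2 or Rf7
    print(optimal_move(position))                    # Rf7
```

Definition 3 would replace the single `error_chance` number with a per-opponent model, which is where the game-theory discussion below comes in.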
I don't think a super strong engine A that beats ALL humans ALL the time is possible, even theoretically: some human could throw dice to pick his moves and eventually draw the engine by hitting a perfect move every time, or just be lucky and never enter a losing position, which is very likely considering that perfect games have been played since the 1800s or so.
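A rough way to see that the dice-throwing human is possible but astronomically unlikely: if every move is chosen uniformly at random and some fraction of the legal moves holds the draw at each turn, the chance of surviving the whole game is the product of those fractions. The 50% fraction and 40-turn game length below are made-up numbers for illustration, and treating the turns as independent is a simplification.

```python
import math


def survival_probability(safe_fractions: list[float]) -> float:
    """Chance that uniformly random moves never step into a lost position.

    safe_fractions[i] is the (hypothetical) share of legal moves at the
    random player's i-th turn that still hold the draw.
    """
    return math.prod(safe_fractions)


if __name__ == "__main__":
    p = survival_probability([0.5] * 40)  # assumed: 40 turns, half the moves safe
    print(f"P(draw) = {p:.2e}, roughly 1 game in {1 / p:.2e}")
```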
Maximum Elo is obviously a function of time controls. An optimal player's Elo against a lineup of strong entities at, say, 1 move/2 days will be lower than his Elo against the same field if the time control were 1 move/min.
> Your "ideal play" term doesn't have a corresponding term in game theory, for reasons that should be obvious.
No, sorry, I'm no expert in game theory. So it doesn't have anything about opponent modeling? That's weird.
However, best response is based on the assumption that you know your opponent's entire strategy with 100% fidelity. This is impossible if our opponent is a centaur, so maybe this is the source of my "brain fart."
BTW: I'm not a game theory expert by any stretch of the imagination. I simply know more than the average person.
I find game theory fascinating, but I still have a lot to read and I'm not familiar with the technical terms. Anyway, "best play" (or best response) would just use opponent modeling as much as possible. I think that would be acceptable, since 100% fidelity isn't even achievable with nondeterministic MP engines.
A related question: what is the Elo of randomly chosen legal moves?
In other words, if we divide expertise into categories, with a category range being 200 Elo, a player from the next category up from RandomPlayer would be expected to score ~75% against him.
How many categories before we reach Kasparov and Carlsen? How many further categories before we reach perfect play?
In fact, it is sometimes argued that go is more complex than chess because it has more such categories of expertise.
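The "~75% per 200-point category" figure follows from the logistic Elo formula; a minimal sketch below computes it and counts categories between two ratings. The anchors (RandomPlayer at 0 Elo, a top human at 2850) are hypothetical placeholders, since the thread doesn't settle either number.

```python
import math


def expected_score(gap: float) -> float:
    """Expected score of the higher-rated player for a rating gap (logistic Elo)."""
    return 1.0 / (1.0 + 10 ** (-gap / 400.0))


def categories_between(low: float, high: float, width: float = 200.0) -> int:
    """Number of whole `width`-point categories that fit between two ratings."""
    return math.floor((high - low) / width)


if __name__ == "__main__":
    print(f"one category up scores {expected_score(200):.1%}")
    # Hypothetical anchors: RandomPlayer at 0 Elo, a top human at 2850.
    print(categories_between(0, 2850), "categories")
```

The same function answers the "how many more categories to perfect play" question once someone supplies a rating for perfect play, which is exactly the unknown in this thread.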
>A related question: what is the Elo of randomly chosen legal moves?
Must be below zero, if that is theoretically possible. I don't have any data on it, but my gut feeling is that in most chess positions the majority of moves lead to a forced loss, so I'd be quite surprised if such play would in any circumstances have a positive Elo.
> ...but my gut feeling is that in most chess positions the majority of moves lead to a forced loss...
Perhaps, but if its opponent is almost as bad, random play will even win some games.
> Perhaps, but if its opponent is almost as bad, random play will even win some games.
>
Random will win occasionally against an opponent that is only slightly better than random, but Random will lose more over the long run and thereby give back any points it has gained. Random's Elo would probably "hover" between 0 and 75 (guesstimate).
I reckon it ought to look around until it finds a miserable-looking player called SelfMater.

> I reckon it ought to look around until it finds a miserable-looking player called SelfMater.
>
I added: I agree, unless there are players in the tournament who are trying to play "Lose Chess"!


> How many categories before we reach Kasparov and Carlsen? How many further categories before we reach perfect play?
> In fact, it is sometimes argued that go is more complex than chess because it has more such categories of expertise.
>
Is the same level reached in both chess and Go? Is there a human limit? Is one easier for people to play than the other (i.e., do people reach a higher category in one game, e.g. chess, than in another, e.g. Go)?
The estimate could be refined, but it will always be a shaky estimate until the depth of the "perfect" game can be found.
For instance, say a 5800 Elo engine manages to score AT LEAST 25% against anything you throw at it. Elo requires a 75.1% score for a 192-point rating advantage, so 6000 Elo would never be achieved: nothing could hit the performance against that engine that Elo requires.
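The arithmetic behind this ceiling can be checked with the logistic form of the Elo curve, D = 400 log10(p / (1 - p)). This sketch assumes the logistic curve; Elo's original model used a normal distribution, which gives slightly different numbers. The 5800 engine is the hypothetical one from the post.

```python
import math


def rating_gap(score: float) -> float:
    """Rating difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def rating_ceiling(floor_rating: float, min_score: float) -> float:
    """Highest rating any opponent can show against a player who always scores min_score."""
    # The opponent scores at most 1 - min_score against the floor player.
    return floor_rating + rating_gap(1.0 - min_score)


if __name__ == "__main__":
    print(f"{rating_gap(0.751):.1f}")           # gap implied by a 75.1% score
    print(f"{rating_ceiling(5800, 0.25):.1f}")  # hypothetical 5800 engine, never below 25%
```

The ceiling lands just under 5991, so 6000 is out of reach as long as that 25% floor holds.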
People have been talking about the "draw death" of chess for more than 100 years. One of today's best blitz players, playing blitz against one of those players from 100 years ago at standard tournament time odds, would win the match.
We have beliefs about proper play and the right technique. I think this is the origin of the draw death myth. But this is just faith in a method; it has no grip if the opponent is not bound by those beliefs and simply calculates deeper and more open-mindedly.
Of course it is not infinite... there is a point at which a "wall" will appear, but I think that would be near the solution, which I suggest is way, way out there.
Consider what we have already seen from perfect play. There are 6-man tables that will blow your mind with mates over 200 moves away. Move to 7-man and it is only going to grow. At 32-man, perfection is going to look very peculiar. In fact, a perfect game may last over 2000 moves, imperceptibly appearing to gain nanopawn by nanopawn, when in fact the outcome is already known from the start. Or, even more likely, it may go through long cycles of seemingly pointless, even backward, moves that return to a virtually identical position with one very subtle change, only to enter another pointless-looking cycle, yet the cycles cumulatively lead to real gains.
The draw rate for top engines at long time controls only gets higher and higher as time marches forward. You'll be alive to witness the day when the draw rate at 40/120 (or even 40/120 for one side vs. 1 move/48 hrs) exceeds 98% (à la checkers) in all but the very sharpest lines.
"There are 6-man tables that will blow your mind with mates over 200 moves away. Move to 7-man and it is only going to grow."
This is missing the forest for the trees. Consider the following: what % of all 6-man positions are actually won/lost? Is this % higher or lower than for the 5-man data set? 4-man? 3-man? If this % decreases for each EGTB piece added, it strongly suggests the starting position is a draw.
Yes, it's true that chess isn't proven to be a draw, but overwhelming evidence clearly tells us it is.
"This is missing the forest for the trees. Consider the following: what % of all 6-man positions are actually won/lost? Is this % higher or lower than for the 5-man data set? 4-man? 3-man? If this % decreases for each EGTB piece added, it strongly suggests the starting position is a draw."
1. That was an "if" that was never demonstrated to be an "is"... and you call that "overwhelming"? Hmm.
2. Even if it were true, I see no compelling reason that the slippery slope goes anywhere. And the pattern looks to me like it is going in the other direction: 2-man (K-K) is 100% drawn! For 3-man, only KQ-K, KR-K and some KP-K are wins... so perhaps 50%. And there are lots of 4-man wins. I don't think there is a pattern, at least nothing compelling with only 6 data points. It is definitely not a line, and as we have no idea what shape the curve is, I can't see this as meaning anything. Also, even if the pattern you claim on faith is true, it says nothing about possibility, only probability.
The marathon match of R3 vs. Naum had a draw rate of ~75% if I recall, which is higher than the draw rate achieved at 40/120. If they run a marathon match of R4 vs. R3, I'm certain the draw rate will be even higher. Or R4 vs. whatever is the clear #2 non-Rybka engine now, if there is one.
1. Then it starts begging the question.
2. There are a lot of 4-piece wins in absolute terms. But the question is: out of all the legal 4-piece positions that don't have severe material handicaps (e.g. there's no real point in counting the 3-1 positions, agreed?), what percent are won? Same stipulations, except this time we move to 5-piece? To 6-piece? 32-piece? You're right, we don't have data for this beyond 6-piece, but I'd bet this percentage drops for every EGTB piece that's added. It's just a belief, but I'm quite certain of it, just like I'm quite certain the game-theoretic value of chess is 0.5.
There are three questionable aspects of the conclusions you have drawn.
1. This one should be sufficient in itself... It is well known that Naum gains strength from additional time faster than Rybka does. If it gains on Rybka, the increased parity would be expected to produce more draws. You certainly can't draw conclusions from only 2 obsolete engines.
2. Virtually all engine testing by programmers is done at fast time controls; otherwise the programmers can't get enough games to draw conclusions rationally. This means that virtually all of a programmer's time is spent fixing errors that appear in fast games. So the strength does not necessarily carry over, and the optimizations in particular are tuned for the speed the programmer tests at. If engines were configured for longer games, there would likely be sizable gains in quality.
3. Comparing a draw rate from 100 games with another draw rate from only 100 games leaves considerable doubt that the disparity even exists, though it does seem likely, and it is expected given point 1.
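Point 3 can be quantified with a standard two-proportion z-test on draw rates. The 75/100 and 65/100 counts below are hypothetical, and the test assumes the games are independent, which repeated opening books make only approximately true.

```python
import math


def draw_rate_z(draws_a: int, games_a: int, draws_b: int, games_b: int) -> float:
    """Two-proportion z statistic (pooled variance) for a difference in draw rates."""
    p_a = draws_a / games_a
    p_b = draws_b / games_b
    pooled = (draws_a + draws_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / games_a + 1.0 / games_b))
    return (p_a - p_b) / se


if __name__ == "__main__":
    z = draw_rate_z(75, 100, 65, 100)  # hypothetical: 75% vs 65% draws, 100 games each
    verdict = "significant" if abs(z) >= 1.96 else "not significant"
    print(f"z = {z:.2f} ({verdict} at the 5% level)")
```

Even a 10-point difference in draw rate is not significant at 100 games a side. The same rates over 1000 games each would be.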
I presume you mean 2.3.2a. There was a more recent one: www.husvankempen.de/nunn/Replay/cegtextreme2.htm
1. Does this hold true for R3 (or R4)? By how much?
2. Not sure how this argument applies. We're talking about 40/400 matches. Even if you use a larger data set, you're still looking at 40/120 matches.
3. Like I said, it'd be nice if we had data for R4 vs. [some other top engine] at 40/400. My guess is they haven't gotten around to it, although what puzzles me is that they don't even have 40/120 data for R4 or some of the other newer engines.
I guess it is more convenient to ignore my point that Naum gains strength faster with added time, which increases parity and draw frequency?
There is nothing special about 400 minutes on a Q6600... in no time, computers will do the equivalent in 40 minutes, possibly in as little as 6 months. Dual-socket 16-core Bulldozers (32 cores) are going to crunch. I highly doubt that the draw frequency is going to rocket.
Still, thanks for the location. I will have a look at those games...I bet they are great.
To resolve this, several engines need to be tested at, say, 5 time controls with the same format (not some at sudden death, others with increment, others with multiple time controls and so on). They should be thoroughly tested under the same conditions and against the same opponents, say 5,000 games each. Then compare their relative gains in strength with increased time, find the 2 whose gains with time match most closely, and play them at 40/400.
Or, perhaps easier, just run a 20-cycle round robin with the top 8-10 engines at 40/40 and at 40/400 and compare the draw rates. The first idea may be faster... not sure. The trick to making the first one work is having more short time control settings.
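Why thousands of games per engine? A rough ~95% error bar on a match result shows how slowly the uncertainty shrinks. This is a normal-approximation sketch that assumes independent games; the 25% / 50% / 25% win/draw/loss split is hypothetical.

```python
import math


def elo_with_margin(wins: int, draws: int, losses: int,
                    z: float = 1.96) -> tuple[float, float]:
    """Elo difference implied by a match result and its ~95% half-width.

    Normal approximation: assumes independent games and a score not at 0% or 100%.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (1, 0.5 or 0 points).
    variance = (wins * 1.0 + draws * 0.25) / n - score ** 2
    se_score = math.sqrt(variance / n)
    elo = 400.0 * math.log10(score / (1.0 - score))
    # Slope of the Elo curve at this score converts a score error into Elo.
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return elo, z * slope * se_score


if __name__ == "__main__":
    for games in (100, 5000):
        w, d = games // 4, games // 2  # hypothetical 25% / 50% / 25% split
        elo, margin = elo_with_margin(w, d, games - w - d)
        print(f"{games} games: {elo:+.0f} +/- {margin:.0f} Elo")
```

At 100 games the margin is close to 50 Elo, about the size of the differences being compared. At 5,000 games it drops to under 10.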
http://computerchess.org.uk/ccrl/4040/ is kept well updated. I tried to contact CEGT a couple of years ago about becoming a tester... they did not even have the decency to reply to my email. If they are slipping... I wonder why.
That was intentional. I was going to ask you for data on this (particularly for R3, and not 2.3.2a, as Vas claims R3's scaling is "much improved"), but I decided to let it slide and give you the benefit of the doubt.
"Dual socket 16-core Bulldozers (32-cores) are going to crunch."
We'll have to see if they can out-crunch Intel's 12-core. I was shocked to see Paul's rig being much faster than his Skully (normalized). Apparently the Skulltrail was bottlenecked by slow memory or the like.
At 40/4 (CCRL) R3=3259 N4=3156 dif= 103 Elo
At 40/20 (CEGT) R3=3184 N4=3093 dif= 91 Elo
At 40/40 (CCRL) R3=3228 N4=3151 dif= 77 Elo
At 40/120 (CEGT) R3=3149 N4=3088 dif= 61 Elo
If we project assuming a straight line: 4 to 40 is a factor of ten, and so is 40 to 400. So 103 - 77 = 26, and 77 minus another 26 puts the difference at 51 Elo. I would think a difference of only 51 Elo would generate more draws than at the shorter time controls. A logarithmic graph with all four data points might suggest a curve with a more accurate projection of the 40/400 Elo difference. I guess my Excel is a bit rusty.
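Fitting a straight line against log10 of the time control is one way to do the "logarithmic graph" with all four points. This least-squares sketch uses the four list entries exactly as given; mixing CCRL and CEGT (different hardware and time scaling) is a real caveat, so the projection is illustrative only.

```python
import math

# (minutes per 40 moves, R3 - N4 Elo difference) from the four list entries above.
# Caveat: CCRL and CEGT use different hardware and time scaling, so mixing them is rough.
DATA = [(4, 103), (20, 91), (40, 77), (120, 61)]


def fit_log_line(data: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of gap = a + b * log10(minutes); returns (a, b)."""
    xs = [math.log10(t) for t, _ in data]
    ys = [gap for _, gap in data]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def project_gap(minutes: float, data: list[tuple[float, float]] = DATA) -> float:
    """Projected Elo gap at `minutes` per 40 moves, from the fitted line."""
    a, b = fit_log_line(data)
    return a + b * math.log10(minutes)


if __name__ == "__main__":
    _, b = fit_log_line(DATA)
    print(f"{b:.1f} Elo per tenfold increase in time")
    print(f"projected 40/400 gap: {project_gap(400):.0f} Elo")
```

With only the 40/4 and 40/40 points it reproduces the 51 Elo estimate. All four points give a slightly smaller gap, a bit under 50 Elo.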
To add one more data point, my tests at 70/1 (repeating) yield a difference of 176 Elo. However, the time has not been adjusted for the hardware difference; I calculated the adjustment factor once, but I don't remember it. Still, the general pattern is consistent.
IPON, which runs at 5 min with only one thread but with ponder on (almost 2 threads?), has a difference of 127 Elo.
As for whether chess is a draw or a win... of course the conventional wisdom is that it is a draw, but the pattern with tablebases, if there has been any, is the discovery that many material imbalances once thought to be draws were actually wins when solved. KQ vs. KBB is one such example. And consider Connect Four. If you had a couple of chimps playing it, with wins and draws scored by computer, I bet the side that moves first would win only a fraction more often than the side that moves second... still, it is a solved win.
I am not saying chess is a win, but I think the possibility that it is has been underestimated. There are two forces at work. On the one hand, advantage by its very nature tends to grow. This is because with more options, statistically there is an increased chance that the best choice is better than the best choice in a smaller set of options. By lengthening the game, there is more time for this advantage to reach a win state, because each move is an opportunity for a marginally better option, which could cumulatively win. The win might be 1.e3!!!!!!!!!! or some other move that slows the game down and extends the endpoint. Humans and current machines might not be able to milk a .05 advantage into a win, but that does not mean it is not a win with crazy accurate play. On the other hand, even some two-piece advantages can be draws. This means that Black has a lot of room to lose ground intentionally to steer the game's exchanges into something less fatal. The question is: does White's bulldozer run out of gas, or does he succumb to Black's force-redirecting jujitsu before he pushes Black off the cliff? I don't know. But I have a favorite ;) I think that tractor has more gas than is suspected ;)
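The statistical part of this argument (more options make the best option better) can be illustrated with a toy model in which each candidate move's value is an independent Uniform(0, 1) draw. That independence is a strong simplifying assumption, since real candidate moves are highly correlated. For n options the exact expected best value is n/(n+1), which the simulation should approach.

```python
import random


def mean_best_option(n_options: int, trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo average of the best of n i.i.d. Uniform(0, 1) 'move values'."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.random() for _ in range(n_options))
    return total / trials


if __name__ == "__main__":
    for n in (5, 20, 40):
        # Compare the simulation with the exact expectation n / (n + 1).
        print(n, round(mean_best_option(n), 3), round(n / (n + 1), 3))
```

Note that the gain flattens quickly as n grows, so this toy model supports the "more options" intuition without saying how far it carries.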
If (assuming) chess is a forced win, do you think it comes from:
1. material imbalance (forced from the opening), or:
2. superior piece positioning (i.e. forced from the opening)
If you think it's #1, then you're right, we can't throw out the positions that are too lopsided. But if it's #2, I don't see why not. To clarify: I mean throwing out the extreme imbalances, like 5+1 or 4+2 (i.e. data no engine needs, because engines can find the mate themselves every single time).
"On the one hand advantage by its very nature has the tendency to grow."
I've been told there are certain openings where White can't grow his (initial) advantage beyond a certain point. One such example, I think, is the Dutch.
"But I have a favorite"
Clearly you do. I'd also love to see chess proven a win, but I'm not holding my breath or losing any sleep over that small possibility.
It would really be something if someone found such a line. I'd put it right next to someone proving P=NP.
> It would really be something if someone found such a line. I'd put it right next to someone proving P=NP.
Too bad solving chess wasn't one of the Millennium Problems. Even though the answer seems fairly obvious, it'd be the toughest one to prove!
Now, throwing sets that are too one-sided out of your calculation: that is stacking the deck, and statistically useless.
"If (assuming) chess is a forced win, do you think it comes from:
1. material imbalance (forced from the opening), or:
2. superior piece positioning (i.e. forced from the opening)"
It is the first move... whether that generates material or position is an irrelevant distinction; material and position are the currency of chess and are exchanged back and forth. It is like matter and energy in E=mc².
I believe I explained why advantage grows; no need to revisit that.
Any practical problems in specific situations have to do with our inadequacies, or with our judgment of what constitutes an advantage. We tend to generalize "advantages" from one position to the next, but across all lines there are millions of unique advantages, each applying only to a very small subset of positions. No one is yet expert enough to recognize these small things in specific situations, even in single lines... we are all stuck in generalizations, humans and current engines alike. We are groping in the dark... the machines just have longer arms that move faster. ;)
I realize now that if we didn't filter, the % of won positions probably does increase! Here's why: suppose we had 10-piece EGTBs. The subsets are 9+1, 8+2, 7+3, 6+4 and 5+5. If virtually all 9+1, 8+2 and 7+3 positions are won, these don't need to be looked at unless you feel such positions can be forced from the starting position.
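The subsets listed above, and the proposed filter, can be enumerated mechanically. In this sketch `max_gap` is a hypothetical threshold: `max_gap=2` drops 9+1, 8+2 and 7+3 as suggested here, while the earlier 6-man example (dropping 5+1 and 4+2) corresponds to `max_gap=1`. That the threshold has to be chosen by hand is the crux of the "stacking the deck" objection. Counting men also ignores piece values, a deliberate simplification.

```python
def material_splits(total_men: int) -> list[tuple[int, int]]:
    """All (stronger side, weaker side) splits of a tablebase's men.

    Each side needs at least its king, so the weaker side has 1..total//2 men.
    Counting men ignores piece values, which is a deliberate simplification.
    """
    return [(total_men - k, k) for k in range(1, total_men // 2 + 1)]


def keep_balanced(splits: list[tuple[int, int]], max_gap: int) -> list[tuple[int, int]]:
    """Drop lopsided splits; max_gap is a judgment call, not a fixed rule."""
    return [(a, b) for a, b in splits if a - b <= max_gap]


if __name__ == "__main__":
    splits = material_splits(10)
    print(splits)                    # [(9, 1), (8, 2), (7, 3), (6, 4), (5, 5)]
    print(keep_balanced(splits, 2))  # [(6, 4), (5, 5)]
```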
"It is like matter and energy in E=mc²"
I kind of see what you're saying, although I don't see one side being down more than 1 pawn IF chess is a forced win.
"We are groping in the dark..."
It's said chess was in the dark ages before chess programs came about. If you want to say we're less in the dark than before, that would be more reasonable.
> Is this % higher or lower than for the 5-man data set? 4-man? 3-man? If this % decreases for each EGTB piece added, it strongly suggests the starting position is a draw.
Some conclusions about 3-, 4- and 5-man positions might be gleaned from the attached file, which I downloaded years ago.
I have another file for 6-man positions, but it doesn't have any percentages.
And what is your estimate of the time control Deep Rybka 4 would need to reach 3800 Elo on, e.g., an AMD Phenom II X4 955 @ 3.2 GHz or an Intel Core i7-980X (6 cores) @ 3.33 GHz?

As you're aware, it depends on the time control used. At 40/120, I'd say it's 3400 +/- 100. At 1 move/2 days, it's definitely lower, maybe ~3200?