Rybka Chess Community Forum
Up Topic Rybka Support & Discussion / Rybka Discussion / Rybka Limit Strength Mode.
- - By billwv Date 2007-12-04 00:08
Hi All,

I am new to this forum, and have ordered Rybka 2.3.

I have Fritz 10 and Hiarcs 10. My interest in Rybka is its strength for analysis, and its positional playing style.

My question is: What is the experience playing Rybka at limited Elo, for example 1400?

I have played Fritz and Hiarcs in the Fritz GUI Friend Mode. Does anyone have experience playing Rybka in Friend Mode?

Would appreciate any comments on the best way to play Rybka for a not so strong player (maybe 1400 to 1500 on a good day).

Thanks for your help.

Bill
Parent - - By lkaufman (*****) Date 2007-12-04 03:28
You can of course limit Rybka by rating, but then what is the point of buying Rybka? You could just get the free one-second-per-move version, which is far stronger than the ratings you mention, or any other free program (of course for analysis, you are right to buy the real Rybka). I suggest you try playing Rybka at full strength but with a material handicap. You can start with queen odds, and if you are able to win regularly (with no takebacks!) then reduce the handicap to rook and knight (computer is white and pieces are removed from the queen's side). After that, reduce to two knights and finally to a rook. If you can win at rook odds, you are no 1500 player!
Parent - - By Nelson Hernandez (Gold) Date 2007-12-04 03:31
Ho-boy, Larry.  I don't know if I could beat Rybka with Queen and two Rook odds.
Parent - - By billwv Date 2007-12-04 04:41
Thanks for the comments -- good points.

As you point out these programs are brutally strong. And the advantage of Rybka is analysis -- which is my primary reason for getting it.

But, I also enjoy being able to play a game, at my own pace (no time pressure), just to enjoy. And then let the program tell me where I might have done better (analysis).

So the question, for me, becomes: what programs can best simulate a "real" chess game experience for me -- at my level -- without time constraints?

Chessmaster doesn't make it (it just gives away pieces); the Fritz series does a really good job with Friend mode. So I am wondering about Rybka.

The top programs have clearly mastered strength. In my opinion, the challenge, now, is to provide a challenging, "real game" experience for chess players at all levels. That will get mass appeal and further the game of chess. I realize that any weakening of a program's strength is essentially handicapping, but, how well they can hide that is the challenge. What we want is the player to come away from the game with the feeling that "I won, but it was a real hard fight" and not "I just waited for a stupid mistake".

Anyhow, thanks for your comments. I am sure I will enjoy Rybka. I am enjoying all the information in this forum.

Bill
Parent - By InspectorGadget (*****) Date 2007-12-04 06:25
I think it is also possible to load Rybka into your Fritz GUI and play it in Friend Mode there.

As for Chessmaster, it plays stronger and blunders less as you win games.
Parent - - By Uly (Gold) Date 2007-12-04 20:02
Two engines that are good at simulating "consistent" weak play are ProDeo and Homer. They aren't going to randomly blunder a piece and then play like super-grandmasters for the rest of the game; instead they maintain a certain strength. I recommend Homer, since it's already weaker, whereas with ProDeo you'd have to try several parameters to get the desired effect.

I don't think Rybka would work for this (unless you indeed try material odds). You may want to try 1-ply Rybka: it plays instantly yet is incredibly powerful (yet I can beat other engines at 5 ply, for example).
Parent - - By turbojuice1122 (Gold) Date 2007-12-04 20:36
For those rated in the 1500 range, the Turing engine works as a decent opponent.  There are also other amateur engines out there playing at around the 1800-2000 level, and these make good opponents; it's been a while since I've played much OTB chess, so I've forgotten which these are.
Parent - - By NATIONAL12 (Gold) Date 2007-12-04 21:53
Aranha 05 is rated at 1865; the trouble is I can't remember where I got it from.
Parent - By InspectorGadget (*****) Date 2007-12-05 10:55
SMASH is also weaker, I think it is rated between 2000 and 2200. I got it from http://www.superchessengine.com/new_page_5.htm
Parent - - By bayselo (*) Date 2007-12-04 04:13
How do you measure playing strength anyway besides playing tons of chess games?
Parent - - By lkaufman (*****) Date 2007-12-04 06:02
I can usually estimate the rating of an amateur player (let's say anything below 2000) within about a hundred points just by playing a couple of handicap games with him (considering both the results and the nature of the game). The size of the handicap (knight, rook, queen, etc.) needed to make for a game in which both players have decent chances of victory is a remarkably good predictor of actual rating, even for kids with ratings like 500 or so.
Parent - - By bayselo (*) Date 2007-12-05 10:57
Is there any way to calculate computer or human chess strength with 1-Elo accuracy?
Parent - By Vempele (Silver) Date 2007-12-05 11:21
Sure. You just need lots of games. Or a very loose definition of 'accuracy'.
Parent - - By lkaufman (*****) Date 2007-12-05 15:11
We routinely estimate Rybka's strength (relative to an older version) with about a 1 Elo point standard deviation by playing 80,000 games between them overnight. Of course they are rather fast games (!).
Parent - - By Henrik Dinesen (***) Date 2007-12-05 17:30

> Of course they are rather fast games (!).


"Rather"??? Now really ;)
Parent - - By lkaufman (*****) Date 2007-12-05 21:48
Actually they are not quite as fast as you might think, because I have two quads playing the games, which works out to 10,000 games for each of the eight cores, so roughly 1200 games per hour per core which is 20 games per minute or three seconds per game. That's practically correspondence chess! Actually, the level of play is probably better than the average amateur correspondence chess player (unaided).
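Larry's throughput figures are easy to sanity-check with a few lines; note that the length of the overnight run is inferred from his numbers, not stated in the post:

```python
# Sanity check of the match-throughput arithmetic quoted above.
total_games = 80_000                         # overnight total, from the post
cores = 2 * 4                                # two quad-core machines
games_per_core = total_games // cores        # 10,000 games per core
games_per_hour = 1200                        # Larry's "roughly 1200" per core
run_hours = games_per_core / games_per_hour  # ~8.3 hours -- a plausible overnight run
games_per_minute = games_per_hour / 60       # 20 games per minute
seconds_per_game = 60 / games_per_minute     # 3 seconds per game

print(games_per_core, games_per_minute, seconds_per_game)  # 10000 20.0 3.0
```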
Parent - - By turbojuice1122 (Gold) Date 2007-12-05 22:12
But even so, we're talking a factor of 7200 faster than tournament time controls, or effectively 12.8 doublings of speed. At 70 Elo per doubling, this is around 900 Elo, so Rybka is playing at a level of 2000-2100. Is it really good to have your simulations at that kind of level instead of playing, say, 1000 games overnight at a level of 2500, which is much closer strategically to the kind of games you're trying to simulate?

You'll have larger error bars here, but is there really any reason at all to think that 1-2 Elo at the 2000-2100 level will equate to improvement at the 2500 or 3000 level? There would definitely be an intrinsic uncertainty to levels of Elo improvement, such that "certain" testing at the 2100 level would still give something like an intrinsic uncertainty of (e.g.) 5-15 Elo at the 2500 level, no matter what, such that any Elo gains less than the intrinsic uncertainty are completely worthless, i.e. practically just as likely to be harmful as helpful.

One might think that you could take my same argument and apply it to comparing 2500 play with 3000 play, but it's not the same at all: the intrinsic uncertainty between 2500 and 3000 would be smaller than that between 2100 and 2500, since the difference in the latter is mainly tactics, which can cause wild variation, while the difference in the former is positional play, which would tend to be more certain, in a sense, i.e. less driven by "luck". Furthermore, the difference in intrinsic uncertainty between 2500 and 3000 would definitely be far less than that between 2100 and 3000, which is what you're currently doing. Thus, if you have a 20 Elo improvement with 1000 games overnight, I think you can be fairly sure that it's not going to turn into a 2 Elo improvement at 3000 Elo, whereas if you have a 20 Elo improvement with 80,000 games overnight, there is definitely a reasonable chance that it could.

Obviously there is a balance somewhere in between -- perhaps with "big" changes, wait 24 hours or more so that you can get perhaps 4000 games and thus halve the uncertainty, or perhaps play overnight at a level of 2360 and also halve the uncertainty, but at the cost of a pretty large increase in intrinsic uncertainty. Sounds like a good Lagrange multipliers problem...
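The speed-to-Elo conversion at the top of this post can be checked directly; keep in mind that 70 Elo per doubling is the poster's rule of thumb, not a measured constant:

```python
import math

speedup = 7200          # 3 s/game vs. tournament time controls, from the post
elo_per_doubling = 70   # rule-of-thumb value used in the post

doublings = math.log2(speedup)             # ~12.8 doublings of speed
elo_deficit = doublings * elo_per_doubling
print(round(doublings, 1), round(elo_deficit))  # 12.8 897 -- i.e. roughly 900 Elo
```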
Parent - - By lkaufman (*****) Date 2007-12-05 23:35
Well you do make some valid points, but we are well aware of all this. Most of the successful changes I make are only worth 1-2 Elo points so they cannot possibly be detected by playing just a thousand games or so. We do periodically check to make sure that our cumulative changes help at more realistic speeds like game in 1 minute or such. It's actually a very interesting question (and one that Vas and I often discuss) to what degree changes that help in "hyper-bullet" chess (say game in one second) will prove helpful at normal blitz or even tournament chess. If you compare version 2.3.2a with version 2.3 you can see that our method works, but I do have the strong suspicion that we could improve Rybka much more if we could play the needed 80,000 (let's say) games at game in 1 minute instead of game in 1 second. This will require 60 times faster computers, so we'll have to wait a while for that! So it's fair to say that as computers get faster, not only will Rybka directly benefit from faster searching, but she will indirectly benefit as we become able to make improvements that cannot be detected with current hardware.
Parent - By Vasik Rajlich (Silver) Date 2007-12-06 11:09
Indeed, as Larry says, testing is a complex and important topic.

If you can't play 100,000 games against a wide variety of opponents at long time controls, the bottom line is that you don't know if your change gained an Elo or lost an Elo. You have to rely on your intuition, use as many different feedback loops as possible, and hope that you go forward more often than you go backward.

We do confirm every one or two months that we do at least go forward.

Vas
Parent - - By Marc Lacrosse (**) Date 2007-12-06 10:48
Which interface do you use for performing these ultrabullet tests ?

Marc
Parent - - By Vasik Rajlich (Silver) Date 2007-12-06 11:10
We have our private application for this.

Vas
Parent - - By Marc Lacrosse (**) Date 2007-12-06 12:19
Any chance to have it publicly available (paid or free) ?

Marc
Parent - By Vasik Rajlich (Silver) Date 2007-12-08 17:03
I'll keep it in mind, but probably not. We have our own private engine protocol, etc. It's really only useful for engine authors :)

Vas
Parent - By BB (****) Date 2007-12-05 17:45
> by playing 80,000 games between them overnight. Of course they are rather fast games (!).

Everything is relative. Computers can play a game of SCRABBLE(r) quite well [scoring about 40% against optimal play] in 0.1 seconds or less (static considerations dominate in almost all cases, so move generation is the bottleneck). Now that checkers has been solved, someone might try to measure how well perfect play can be emulated via simple heuristics [however, the current proof that checkers is a draw doesn't quite allow this to be done in a completely straightforward manner, as the proof prunes large subtrees for which only a bound on the game-theoretic value is known].
Parent - - By Mark (****) Date 2007-12-05 19:10
When you had mentioned before about improving Rybka by 1 or 2 elo per week, I had no idea you could actually test that accurately.  I thought it was just an average guesstimate.  Have you done any tablebase vs no tablebase testing at this rate?
Parent - - By lkaufman (*****) Date 2007-12-05 20:20
No, but I did play a thousand games at a reasonable fixed depth (anyone can do this themselves, if you do please report the results) and got zero benefit from 5 man tablebase. This is rather strange, even if they are only worth five Elo or so, but is within the margin of error on a thousand games.
Parent - - By BB (****) Date 2007-12-05 21:02
> got zero benefit from 5 man tablebase.

There could be some dependence on what Evaluate() does in a 5-piece endgame when the TB is not present. For instance, if there are KRP vs KR recognisers, or B+hP worries, etc. Some programmes might be just guessing about KP vs K (say), as they assume that a lookup will be available. I definitely agree that it would be interesting to see some more results here (they might not reveal the rating equivalent of tablebases, but, rather, might give us an idea of how tablebase-dependent various engines are).
Parent - By lkaufman (*****) Date 2007-12-05 21:36
Right, but this was using Rybka, which (so far) does not handle such simple endings properly without tablebases, especially at limited fixed depth. If any test should show a benefit to tablebases, this type of test should be the one.
Parent - - By Vempele (Silver) Date 2007-12-05 21:08

> No, but I did play a thousand games at a reasonable fixed depth (anyone can do this themselves, if you do please report the results) and got zero benefit from 5 man tablebase.


That's probably flawed. I thought using tablebases (usually) resulted in higher depths?
Parent - - By lkaufman (*****) Date 2007-12-05 21:40 Edited 2007-12-05 21:44
This test should be fine. Even if we don't try to access tablebases until we're already in the 5 man endgame, Rybka should then play perfectly as opposed to the possibly poor play of a fixed depth search. Even though we're limiting the search to (for example) 5 plies, the tablebase will still report "mate in forty" if that is the case.
Parent - - By Vempele (Silver) Date 2007-12-05 22:00

> This test should be fine.


Suppose every extra ply adds 50 Elo. Further suppose we have a position where 5-man tablebases help so much that you get 3 extra plies.

Fixed depth cripples the tablebase-using version by 150 points in this hypothetical position. Ouch.

> Even if we don't try to access tablebases until we're already in the 5 man endgame, Rybka should then play perfectly as opposed to the possibly poor play of a fixed depth search. Even though we're limiting the search to (for example) 5 plies, the tablebase will still report "mate in forty" if that is the case.

Seems like probing inside the tree actually hurts at fixed depth, or the TB-using version should've been stronger. Then again, 1000 games aren't really enough.
Parent - - By lkaufman (*****) Date 2007-12-05 23:24
I don't know much about tablebases, it's not very relevant to my eval work on Rybka. But I thought that if we are doing a five ply search and we reach a tablebase position on ply 4, for example, we just score it from the tablebase. I don't really know what situation you are talking about where a fixed ply search would lose plies due to tablebase access, unless it's something to do with the hash tables. Vas never mentioned anything to me about any possibility of tablebase actually hurting a fixed-depth search.
Parent - - By turbojuice1122 (Gold) Date 2007-12-06 00:13
I don't think I'd expect much from either 5-man tablebases or from testing at fixed depth.  First, isn't it generally true that if you let an engine think for a specified time in a typical position where it has tablebase hits, it will tend to reach a higher depth than in typical positions where it doesn't have tablebase hits (after all, fewer pieces on the board in the former case)?  If so, then in normal games, a noticeably higher percentage of the nodes will be tablebase hits compared with fixed depth tests.  Also, I think that engines would tend to be helped more by hitting tablebases during the evaluation with 6 men than with 5, since there is a higher variety of positions here that the engine can screw up when left on its own (though I don't think that this logic can be extended indefinitely--I highly doubt that there would be a noticeable elo difference in having 32-piece tablebases compared with 30-piece tablebases).  Naturally, the "proper" test would be testing at normal time controls, but nobody has time for that--but I think there is a reasonable chance here that the same results would occur by testing at one-minute games.  Perhaps I'll try this sometime...
Parent - - By lkaufman (*****) Date 2007-12-06 01:01
Perhaps you are right that tablebases actually help more at fixed time than at fixed depth, I don't know, but this certainly does not mean that they are not helpful at fixed depth. And of course you are also right that six man TBs should help more than five man, but since we are talking about short searches, there are a decent number of five man endings that will be misplayed without TBs, and more positions that will be misplayed before actually reaching the five man level. After all, rook and pawn vs. rook is a five man ending that is very common and likely to be misplayed by a short search. Short searches play the endgame really poorly in general.
Parent - By Vasik Rajlich (Silver) Date 2007-12-06 11:15
It's true that tablebases cause cutoffs inside the search and should decrease node counts during fixed-depth searches. It's also true that a tablebase probe is slower than a static eval, so tablebase usage should decrease the nodes-per-second. Both of these effects, however, are absolutely microscopic, we're talking about less than 1 Elo.

The only interesting effect is the higher evaluation quality. This might give 2 or 3 Elo. We know that it doesn't give more than 10 or so.

Vas
Parent - - By Mark (****) Date 2007-12-05 21:29
Thinking about it some more, playing at the rate of 80,000 games overnight (about 2 games or so a second), tablebases would probably do more harm than good due to the access time.
Parent - By lkaufman (*****) Date 2007-12-05 21:42
The test of tablebase benefit should be done at fixed depth if we're talking about hyper-blitz. After the benefit is calculated, one can do a separate calculation to see how much of the observed gain would be lost in normal chess due to the slowdown.
Parent - - By Banned for Life (Gold) Date 2007-12-06 06:14
Larry,

This seems very strange to me. Obviously there are some endgames that Rybka can't play where having the 5-man TBs will give an extra half point or maybe even a full point. This means that in order to end up at zero benefit, there must be cases where the TBs are actually hurting performance. In a straight alpha-beta fixed-depth search, this should never happen, so there must be a bad interaction between the TBs and some other heuristic being used to reduce the size of the tree. This sounds like something Vas should check into, since it may be costing 5-20 Elo (whatever the gain of the TBs ought to be).

Regards,
Alan
Parent - - By Vasik Rajlich (Silver) Date 2007-12-06 11:19
I'm pretty sure that there is no adverse interaction.

Note that one other issue is resignation threshold. If Rybka can't mate with KR v K at low depths, this won't matter if such positions are resigned.

Anyway, I'm not sure why you say that TBs should gain 5-20 Elo. 5 is probably an upper bound.

Vas
Parent - - By Henrik Dinesen (***) Date 2007-12-06 12:42
If memory serves correctly, the difference between Fruit 2.2 (no TB) and 2.2.1 (with TB) was less than 5.
Parent - - By turbojuice1122 (Gold) Date 2007-12-06 12:58
If I remember correctly, some of that was due to the fact that Fruit 2.2.1 had some slight bugs that Fruit 2.2 didn't have, and that when both were played without tablebase access, Fruit 2.2 tended to perform better by a couple of elo (and here, we're talking about statistical error coming into play, too).
Parent - By Henrik Dinesen (***) Date 2007-12-06 14:02
Ok, might be so... I was unaware of that.
And yes, always the stats ;)
Parent - By Banned for Life (Gold) Date 2007-12-06 16:06
Henrik,

This would not be a fair comparison unless the test were done with constant search depth.

Regards,
Alan
Parent - - By Banned for Life (Gold) Date 2007-12-06 16:05
Vas,

I have no idea what the actual gain in Elo is, but I think the argument is sound. There are certainly some cases where TBs will turn a loss into a draw or a draw into a win, and maybe even rarely a loss into a win. It's possible, but probably much less likely, that in some cases the TB moves may end up swindling (as Hyatt put it) the opponent less often than Rybka's own moves would. Normally, TB accesses carry a penalty, small or large, due to disk access time, but this should not be a factor for fixed-depth games.

So if there is a known advantage with TBs in some games but the net result is 0 Elo, there must be a hidden disadvantage somewhere else to end up with this result. I can see only the following possibilities:

1) The swindle effect listed above (which I consider to be highly unlikely), or
2) The use of TBs can adversely affect the search for whatever reason, or
3) The advantage of using the TBs in the first place is so small that it isn't registering in Larry's test (< 1 Elo). This doesn't seem to be consistent with anecdotal evidence.

Regards,
Alan
Parent - - By lkaufman (*****) Date 2007-12-07 16:53
The margin of error in a thousand games is far more than 1 Elo point. I need something like 60,000+ games to get it down to that level. If the true gain is only 5 Elo, the thousand games I ran would not be nearly enough. It is also possible that I set the resign margin too low, which would further reduce the tablebase benefit.
Parent - - By ernest (****) Date 2007-12-07 17:57

> The margin of error in a thousand games is far more than 1 Elo point.


Statistics to remember are very simple:
1000 games, with 1/3 being draws, give a standard deviation of 9 Elo points.
Thus, the 95% probability range is 18 Elo points (Gauss curve).

For n times 1000 games, that range is divided by the square root of n.
For 1000 games divided by n, that range is multiplied by the square root of n.
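These figures can be reproduced with a short sketch. It assumes the two engines are evenly matched (equal win and loss chances among the non-drawn games) and linearizes the Elo curve around a 50% score -- a reasonable approximation for self-play testing:

```python
import math

def elo_stddev(games, draw_ratio):
    """Standard deviation (in Elo) of a match result between two
    evenly matched engines. Per-game score variance is (1 - d) / 4
    when wins and losses are equally likely; near a 50% score the
    Elo curve has slope 400 / (ln 10 * 0.25), about 695 Elo per
    unit of score fraction."""
    score_sd = math.sqrt((1 - draw_ratio) / (4 * games))
    elo_per_score_unit = 400 / (math.log(10) * 0.25)
    return score_sd * elo_per_score_unit

print(round(elo_stddev(1000, 1/3)))  # 9, matching the figure above
print(round(elo_stddev(1000, 1/2)))  # 8, matching the half-draws case
```

At 80,000 games with half draws, the same formula gives about 0.87 Elo, consistent with Larry's "about a 1 Elo point standard deviation" earlier in the thread.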
Parent - - By lkaufman (*****) Date 2007-12-07 20:00
In our testing, half draws rather than 1/3 is far more realistic.
Parent - - By ernest (****) Date 2007-12-08 13:18

> half draws rather than 1/3


OK, with 1/2 draws, standard deviation is 8 Elo points and 95% probability range is 15 Elo points.

(the numbers are proportional to the square root of the non-draw ratio, so you multiply by sqrt(0.5/0.666) = 0.866)
Parent - By lkaufman (*****) Date 2007-12-08 16:21
That's why I say you need over 60 thousand games to get the standard deviation down to 1 Elo point; according to your numbers we would need 64,000 since the square root of 64 is 8.
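The 64,000-game figure follows directly from the inverse-square-root scaling Ernest described -- halving the standard deviation requires four times as many games:

```python
sd_at_1000 = 8   # Elo standard deviation at 1000 games with half draws (from above)
target_sd = 1

# SD scales as 1/sqrt(n), so games scale as (sd ratio) squared.
games_needed = 1000 * (sd_at_1000 / target_sd) ** 2
print(int(games_needed))  # 64000
```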
Parent - - By Vasik Rajlich (Silver) Date 2007-12-08 17:11
#1 might exist. In fact, it might be possible to have some sort of hybrid approach, where some drawn-but-better tablebase positions are assigned slightly positive scores (and maybe even some lost tablebase positions are assigned higher scores than other lost tablebase positions).

#2 is hard to believe. The basic search mechanics are just not that complicated.

The main thing is #3, I'm quite sure about it. For me, it's not really inconsistent with anecdotal evidence - the anecdotes are all quite rare.

Vas
