Rybka Chess Community Forum
Up Topic Rybka Support & Discussion / Rybka Discussion / UCI Limit ELO function
- - By tasdourian (**) Date 2008-08-03 14:38
In Rybka 2.3.2a, one could limit the strength with the UCI_ELO function to the range between 1200 and 2400.

Three questions about that function in Rybka 3 (essentially identical in both the ChessBase and Aquarium GUIs, right?  Because if not, that would matter to me):

1) Is that going to be the same range in Rybka 3, or will it be different?

2) Larry mentioned in an earlier post that he expects this function will be more accurate (will play closer to the requested ELO number) in Rybka 3 than in Rybka 2.3.2a.  Why is that?

3) An old Chessbase article said that the way its engines play "weakly" when the strength is set low is roughly as follows:  the program tries to lose a piece early on in the game, either by just hanging the piece or via a short combination that a human would likely see, so that it essentially becomes an "odds" game, and then it reverts to full strength once it is down a piece.  This strikes me as one step short of crazy, and would make a player confused and annoyed as to why the supposedly "weaker" setting was doing so incredibly well even though it was down, say, a rook.  How, hopefully in contrast, does Rybka play "weakly"?

Here is an excerpt from the Chessbase article (by Steve Lopez), in case people think I'm making this up, followed by a link to the entire article:

"The problem is that the "little guy", the average club player, got lost in the shuffle. As far back as the late 1980's you could buy a dedicated tabletop chess computer for $200 that could routinely beat 95+% of the world's chessplayers. So the emphasis has shifted a bit -- now programmers need to write chessplaying software that can tear the head off of a top player while still possessing the ability to be "dumbed down" enough to give a beginner or club player a competitive game. And the programmers discovered something interesting: it's actually harder to write a "bad" chess program than it is to write a good one.

Now I'm not talking "lobotomy" bad here, nothing like playing 1.e4 e5 2.Qh5 in the hope of exploiting the inherent weakness of f7 after something like Bc4. I mean "convincingly" bad -- the little mistakes that allow the human player to accumulate those Steinitzian small advantages which we were talking about a few paragraphs back. It's hard to write a program like that for a couple of reasons. While some of these mistakes may be mathematically quantifiable, they might not matter within the search horizon of a software program. For example, a backward pawn in the early middlegame is definitely bad but it might not be exploitable until the late middlegame or endgame twenty or twenty-five moves later. A computer just can't see this far ahead. So how would you, the programmer, handle this? Remember that we have to work with ones and zeros here. You either write the program to tell it that a backward pawn is bad (in which case the program will avoid such a structure like the dickens) or else ignore that positional motif (in which case the program will leave backward pawns all over the board).

It's a classic Catch-22. If you ignore the positional factor of a backward pawn, you wind up with a weak chess program and receive complaints that the error isn't one that a competent human player would make. If you include that motif, you wind up with a program that doesn't make that kind of "subtle" mistake -- and you displease players who want an engine which makes "human-like" errors.

As I said above, and someone else commented in the Internet message board thread that inspired this column, "It's harder to write a bad chess program than it is to write a good one".

So how do programmers set about the task of giving you a "handicap" game? The easiest way to do it -- and one which many programmers have adopted -- is to give you a modified form of an "odds" game. Back in the 1800's, when a strong master was playing against a much weaker opponent, he'd give him "odds": he'd start a game with some of his own material removed from the board in order to even the odds (and that's why these games are called "odds" games). There was nothing disrespectful about it: the weaker player was acknowledging the reputation and ability of the master, while the stronger player was being chivalrous and offering the weaker player the chance at a good competitive game instead of a slaughter.

These days, though, we live in a more "egalitarian" world in which many people don't like to admit that someone else is better at anything. That's why odds games are a thing of the past. It's also why I won't teach adults how to play chess anymore. If you offer them odds, they feel insulted. If you don't give them odds and you beat them, they feel humiliated or that advantage has been taken of them. So they expect unlimited takebacks (and we'll talk more about this in a moment) and, after they beat you (for reasons which we'll elaborate upon in a moment), they crow loudly about how they "beat the chess guy". It's another Catch-22. That's why kids are so cool: you offer them odds and they accept, because they're smart enough to know that they don't know, and that you want to give them a chance.

I'll bet not one (adult) chessplayer in a hundred would sit down to play a game against a computer program giving them odds (that is, with some of the program's material missing at the game's start). So the programmers disguise the "odds": depending on what level is set by the player, the program will deliberately hang x amount of material (either outright or, in the case of Fritz' Sparring mode, through setting up and allowing a material-winning combination to the player) and, once the material is off the board, will play like a house afire thereafter.

This works really well for the average player. But what do you do for a titled player who complains that the program at full strength is "too strong" but wants the program to make "intelligent mistakes" (an oxymoron if ever there was one, but it's an expression I've heard used by many chess software users down through the years)?

That's a really tough task. Programmers have become better at it, though. Fritz' Sparring mode took several years of development. It was the early 1990's when I first heard it was in the works, but it didn't see the light of day until Fritz5's introduction in 1997. Even strong players can make use of Sparring mode -- just set the difficulty level at its highest and Fritz will start looking for multi-move combinations for you to play against it instead of allowing just one-movers.

But even that sometimes doesn't satisfy folks. They still want "intelligent mistakes" of an extremely subtle nature. And, in the ultimate Catch-22, I can still remember the days of the early versions of Fritz when titled players would complain when the program made those kind of mistakes ("anti-positional pawn pushes" was a favorite complaint of these folks when talking about Fritz2). Of course, those mistakes were programming problems that were corrected in later versions. But how does a programmer backtrack to cause his chessplaying creation to make such errors on purpose? "

Entire Article:
Parent - By diskamyl (**) Date 2008-08-05 09:02
I never tried to use uci engines for this. However after some level, around 1800 elo etc, Chessmaster personalities become pretty realistic.
Parent - - By lkaufman (*****) Date 2008-08-05 19:26
The bottom limit, I believe, is still 1200; I think the top limit will be higher. The ratings might be more accurate simply because I didn't get involved in this issue in the past, and I probably know more about what levels on different engines correspond to what ratings than almost anyone, because I was involved with the rating of many chess computers by the USCF in the past. The UCI_Elo function works by setting a search depth that I believe equates to those ratings based on evidence from other programs that earned ratings. Since one ply is now rather strong on Rybka 3, the settings go all the way down to -2 depth (which some people think should have been called one ply to be comparable to other engines). Of course even 1200 is stronger than the average non-tournament chess player. This method of weakening play is probably better than forcing random blunders or playing poor positional moves all the time, but it does have one defect: the endgame is much weaker than the middlegame on fixed depth. So you may find that if you play Rybka at the appropriate limit level for your rating, you will often feel that you are getting outplayed, only to recover in the endgame. If you find that this is a problem, you can start on a lower level and raise the level when the queens come off. Finally, the ChessBase and Convekta versions come with a material handicap menu and supporting small opening book.
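
For anyone driving the engine from a script rather than a GUI menu: the UCI protocol defines two standard options for this, `UCI_LimitStrength` and `UCI_Elo`, sent as `setoption` lines. Below is a minimal sketch of building those lines. The helper name `limit_strength_commands` is made up for illustration, and the 1200-2400 clamp is only the Rybka 2.3.2a range quoted at the top of the thread, not a confirmed Rybka 3 range.

```python
def limit_strength_commands(elo, lo=1200, hi=2400):
    """Return the UCI setoption lines asking an engine to play at `elo`.

    lo/hi are assumptions (the range quoted for Rybka 2.3.2a); a real GUI
    would read the allowed range from the engine's own `option` lines.
    """
    clamped = max(lo, min(hi, int(elo)))
    return [
        "setoption name UCI_LimitStrength value true",
        f"setoption name UCI_Elo value {clamped}",
    ]


if __name__ == "__main__":
    for line in limit_strength_commands(1500):
        print(line)
```

How the engine turns the requested number into weaker play (fixed depth, in Rybka's case as described above) is entirely up to the engine; the protocol only carries the request.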
Parent - - By Vempele (Silver) Date 2008-08-05 20:43

> Finally, the ChessBase and Convekta versions come with a material handicap menu and supporting small opening book.

Aquarium also has its own strength-handicap algorithm.
Parent - - By Schiffermueller (*) Date 2008-08-06 16:44
1) What is the algorithm?
2) Does the playing strength depend on the time control if the ELO is selected?
3) Does the speed of play depend on the time control in this case ?

Parent - - By Zruty (*****) Date 2008-08-06 17:51

>1) What is the algorithm?

The basic idea is to 'mix' the real chess engine with the 'very human-like' one. The 'very human-like' engine doesn't do any search at all, but it looks at the position and evaluates all moves according to its knowledge. It may also 'overlook' a simple knight fork etc.
So we take X (0<=X<=1) of ENGINE and (1-X) of HUMAN_EMULATOR.
Handicapped engine = X*ENGINE + (1-X)*HUMAN_EMULATOR.
X depends on the rating.

It's not as simple as this, but I don't know much more - I wasn't programming this.
This only applies to skill handicaps, not material ones.
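
One possible reading of the X*ENGINE + (1-X)*HUMAN_EMULATOR formula above is a per-move blend of two score tables. The sketch below assumes exactly that; it is not Aquarium's actual code (which, as noted, is more involved), and the function names and the sample scores are invented for illustration.

```python
def mixed_scores(engine_scores, human_scores, x):
    """Blend two per-move score tables as x*ENGINE + (1-x)*HUMAN_EMULATOR.

    Both tables map move -> score in centipawns and are assumed to cover
    the same moves. x = 1 is the pure engine, x = 0 the pure emulator.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be between 0 and 1")
    return {
        move: x * engine_scores[move] + (1.0 - x) * human_scores[move]
        for move in engine_scores
    }


def pick_move(engine_scores, human_scores, x):
    """Play the move with the highest blended score."""
    blended = mixed_scores(engine_scores, human_scores, x)
    return max(blended, key=blended.get)


if __name__ == "__main__":
    engine = {"Nf3": 30, "Qh5": -150}   # searched evaluation (made-up numbers)
    human = {"Nf3": 10, "Qh5": 80}      # no-search "knowledge" score (made-up)
    for x in (0.0, 0.5, 1.0):
        print(x, pick_move(engine, human, x))
```

With these sample numbers the emulator alone (x = 0) goes for the early queen sortie, while any substantial share of the engine pulls the choice back to the sound move.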

>2) Does the playing strength depend on the time control if the ELO is selected?
>3) Does the speed of play depend on the time control in this case ?

Yes to both questions. But not a confident yes, because I didn't write this.
Parent - - By Nelson Hernandez (Gold) Date 2008-08-06 19:55
I've always thought reducing ELO should be a fairly straightforward exercise.  You start by having some kind of evaluation on all legal moves, and you leave the search and evaluation parameters alone.

Once out of book, moves would be divided into "playable" and "not playable" depending on some error tolerance.  E.g. you might say "all moves within 50 centipawns of the PV are playable".  Then, among the playable moves, the next move would be selected totally at random.  Naturally the error tolerance would scale with ELO.  It seems to me you could easily establish relative ELOs for each error tolerance factor through extensive testing.

Another key factor is the book.  You'd use the same principle.  Book moves would be identified as playable/not playable based on some set of criteria and then selected at random.  You'd probably avoid really grotesque moves except at very low skill levels.

Finally you could create different profiles.  You might have a profile that was strong in the opening but fell apart out of book or vice-versa.  Or a profile that was very inclined toward draws, or had some other salient characteristic.  Each one of these would need to be extensively tested to establish relative ELO as well.
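
The out-of-book part of this proposal (score every legal move, keep those within a tolerance of the best, pick one at random) can be sketched as follows. The function names and sample scores are illustrative assumptions; the mapping from ELO to tolerance would have to come from the testing described above and is not attempted here.

```python
import random


def playable_moves(scores_cp, tolerance_cp):
    """Return the moves whose score is within tolerance_cp of the best move.

    scores_cp maps move -> evaluation in centipawns from the mover's side.
    """
    best = max(scores_cp.values())
    return sorted(m for m, s in scores_cp.items() if best - s <= tolerance_cp)


def choose_move(scores_cp, tolerance_cp, rng=random):
    """Pick uniformly at random among the playable moves."""
    return rng.choice(playable_moves(scores_cp, tolerance_cp))


if __name__ == "__main__":
    scores = {"e4": 35, "d4": 30, "a3": -10, "g4": -80}  # made-up numbers
    print(playable_moves(scores, 50))   # moves within 50 cp of the best
    print(choose_move(scores, 50))
```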
Parent - - By lkaufman (*****) Date 2008-08-06 23:02
This would indeed reduce Elo in a fashion that could be correlated with rating, but I think it is a poor solution. As the margin is increased, the positional play of the engine would degenerate steadily, but the tactical play would remain at a high level. The unfortunate 1200 rated user might steadily outplay Rybka with the parameters set properly, only to suddenly see Rybka announce mate in ten with a spectacular combination that even Anand would not see. I think most people who use this feature want the program to roughly simulate human players at their level. I think that reducing search depth (as we do) comes much closer to doing so, although the ideal might be to combine your method with ours. I'm sure we could do better than just fixed depth for this feature, but it wasn't a priority.
Parent - - By Schiffermueller (*) Date 2008-08-07 10:45
A simple idea to reduce the playing strength in a well-defined manner is to reduce the amount of computation. A fixed depth is not a good measure because the amount of computation for a fixed depth decreases drastically in the endgame. As you say above, that leads to a weak endgame. A fixed node count is a much better measure for controlling the amount of computation. Such a level exists in the Arena interface. The depth varies depending on the position. The ELO can easily be estimated by using the CEGT rating list and the rule: 60 ELO decrease per halving of the time or node count. A still more accurate measure is the number of CPU cycles or CPU instructions executed by the engine.

What I want is not a non-transparent and obscure algorithm that reduces the playing strength somehow. What I want is a clear, systematic and well documented way of reducing the playing strength. I think the topic of handicap play has been somewhat neglected in computer chess until now. Basics like a different time control for both sides or the possibility of very short time limits for the computer are not standard. But the material handicap used in the new Rybka ChessBase/Aquarium is a good idea.
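
The rule of thumb above (about 60 ELO lost per halving of time or node count) can be written as a small calculation. This is a sketch under that assumption only: the reference rating and node count are placeholders to be taken from a rating list, and the rule is unlikely to hold at very small node counts.

```python
import math


def estimated_elo(reference_elo, reference_nodes, nodes, elo_per_halving=60.0):
    """Estimate the rating at `nodes` per move from a known reference point."""
    if reference_nodes <= 0 or nodes <= 0:
        raise ValueError("node counts must be positive")
    halvings = math.log2(reference_nodes / nodes)
    return reference_elo - elo_per_halving * halvings


def nodes_for_elo(reference_elo, reference_nodes, target_elo, elo_per_halving=60.0):
    """Invert the rule: how many nodes per move should give `target_elo`?"""
    halvings = (reference_elo - target_elo) / elo_per_halving
    return max(1, round(reference_nodes / 2 ** halvings))


if __name__ == "__main__":
    # Placeholder reference point: 3000 ELO at 1,000,000 nodes per move.
    print(estimated_elo(3000, 1_000_000, 500_000))
    print(nodes_for_elo(3000, 1_000_000, 2400))
```

Because each halving costs a fixed amount, large rating drops need very few nodes: ten halvings (600 ELO) already divide the node budget by 1024.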
Parent - - By lkaufman (*****) Date 2008-08-07 16:04
Yes, I think that fixed node count does make sense for this feature, although it might have a slight bias in the opposite direction from fixed depth, namely that the endgames might be of relatively higher quality. But I think it is much better than fixed depth in terms of approximating uniform quality of play throughout the game.
Parent - - By Vempele (Silver) Date 2008-08-07 19:34

> it might have a slight bias in the opposite direction from fixed depth, namely that the endgames might be of relatively higher quality.

How so? The nps usually increases towards the endgame.
Parent - By lkaufman (*****) Date 2008-08-07 19:53
I guess it depends on whether the human is playing repeating time controls, or something with a sudden death finish (as is more usual in tournaments). With sudden death (even with increment), the human will have to play faster in the endgame generally, so if the engine does not also speed up in the endgame the human will get outplayed. But it's not a big deal.
Parent - - By diskamyl (**) Date 2008-08-07 08:55
I believe that's how Chessmaster does it. It plays a random move in a given range, which varies according to the level of handicap. However, the problem with <1800 personalities on there is that it suddenly gives up a piece (choosing the worst move in the range), then for a while it plays like a grandmaster, and again makes several blunders, again GM, etc. I had a very frustrating experience with it once playing against a 1600 level personality. It simply gave up a piece, and I was happy, just needed to simplify etc, but then it totally outplayed me, like I was playing against the real CM with a piece handicap.
Parent - By Schiffermueller (*) Date 2008-08-07 11:21
I also had bad experiences with ELO levels. I don't want to play 100 games only to come to the conclusion that the computer's play has nothing to do with normal human play or even normal computer play, and that the set ELO number has nothing to do with the real strength. Therefore levels that weaken the computer should be clear, systematic and well documented to avoid frustration. See my post above. I only use such levels now if I am sure that I am not being kidded.
Parent - By Schiffermueller (*) Date 2008-08-07 13:59
There is a funny method for the computer to adapt to the strength of the opponent: play 50 percent of the games like god and the other games like a beginner. :-) It seems that some chess programs use this method. :-)
Parent - By tasdourian (**) Date 2008-08-05 23:37
Thanks for answering this when you must be so busy.  Very helpful, as always.
Parent - - By Sesse (****) Date 2008-08-06 23:15
I find it interesting that this is hard not only for chess, but for other AI-like tasks as well. It was commented on the Quake 3 Arena bots (not an exact citation, but I believe it comes from John Carmack): "The hard part is not making the bots hit -- anyone can make a bot that blasts you out of the sky from a mile away. The hard part is making the bots miss convincingly."

I wonder if increasing the pruning limits aggressively (a poor player is less likely to see that the queen sacrifice ends in mate-in-five) and zeroing out/reducing evaluation terms (a poor player would certainly care more about material relative to positional factors, for instance) would be good additions to make a more convincing less-than-godly chess player.
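
The second half of that suggestion, scaling down the non-material evaluation terms, might look roughly like this. It is a toy sketch, not how any particular engine structures its evaluation; the term names and values are invented.

```python
def handicapped_eval(terms_cp, positional_weight):
    """Sum evaluation terms, keeping material at full weight.

    terms_cp maps a term name to its value in centipawns. Every term other
    than "material" is scaled by positional_weight (0 = ignore positional
    factors entirely, 1 = full-strength evaluation).
    """
    return sum(
        value if name == "material" else positional_weight * value
        for name, value in terms_cp.items()
    )


if __name__ == "__main__":
    terms = {"material": 100, "king_safety": -40, "pawn_structure": 20}
    for w in (1.0, 0.5, 0.0):
        print(w, handicapped_eval(terms, w))
```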

/* Steinar */
Parent - - By lkaufman (*****) Date 2008-08-07 05:13
Shortening the search depth (as we do) has a similar effect to increasing pruning limits, because as you get closer to the end of the search, you prune more. So we are already doing the first half of your proposal.
Parent - - By Sesse (****) Date 2008-08-07 10:54
Well, part of it -- you are doing only the sound pruning. I'm talking about making the pruning unsound. :-)

/* Steinar */
Parent - By lkaufman (*****) Date 2008-08-07 16:06
Rybka (and probably most other programs) does plenty of unsound pruning near the leaves, so I don't see the distinction you are making.
Parent - - By Vasik Rajlich (Silver) Date 2008-08-07 21:15
Actually, I had pretty much exactly this idea.

You could just randomly remove legal moves from move lists inside the search. The closer you are to the root, the fewer moves would be removed.

To make this even more sophisticated, certain types of moves would be deleted more often than others. For example, maybe long queen moves would be deleted more often. Or moves of pieces which moved since the root position would be deleted more often. Etc.

This might be quite similar to what humans actually do when they overlook something.
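
A rough sketch of the basic version of this idea: inside the search, each move is dropped from the list with a probability that grows with distance from the root, so nothing is overlooked at the root itself. The probabilities and function names are arbitrary assumptions, and the per-move-type weighting (long queen moves and so on) is left out.

```python
import random


def drop_probability(ply, per_ply=0.05, cap=0.5):
    """Chance of 'overlooking' a move at a given distance from the root."""
    return min(cap, per_ply * ply)


def thin_move_list(moves, ply, rng=random):
    """Randomly remove moves; keep at least one so the node stays searchable."""
    p = drop_probability(ply)
    kept = [m for m in moves if rng.random() >= p]
    return kept or moves[:1]


if __name__ == "__main__":
    moves = ["Qd1-h5", "Ng1-f3", "e2-e4", "a2-a3"]
    for ply in (0, 4, 10):
        print(ply, thin_move_list(moves, ply))
```

A real implementation would also need to make sure that forced replies (for example the only legal move out of check) are never removed.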

Parent - - By Akorps (**) Date 2008-08-08 06:55
Speaking of long queen moves, something odd occurred to me just now when analyzing one of my blitz games. Maybe moves of a piece or pawn which has just moved, and which could have been made instead of the move which was made, should be given a lower priority in the search.

Example, I moved a pawn c7-c6. When I analyzed the game it looked like c7-c5 was better. So even on the next move c6-c5 was one of the high rated moves. But if I wanted to move it to c5, I already missed my best chance and it might be waste of tempo to move that pawn again.

So if a queen has just moved from d1 to d2, for example, you might not want to look as much at moving from d2 to d3 on the next move, since you could have moved d1 to d3 directly on the previous move.

Just an odd idea that never crossed my mind before.
Parent - By Vasik Rajlich (Silver) Date 2008-08-08 18:34
You can look at the games of Karpov for the refutation of this theory :)


ps. Actually, you're thinking the right way, but this heuristic is definitely wrong.
Parent - - By Roland Rösler (****) Date 2008-08-07 16:43
> because as you get closer to the end of the search, you prune more.

Okay, sometimes you prune extremely at the beginning of the search. I think in this position you prune all the root moves besides one. If you go to 2-variants mode, the best move appears instantly.

k2N4/1qpK1p2/1p6/1P4p1/1P4P1/8/8/8 w - - 0 5
Parent - - By lkaufman (*****) Date 2008-08-07 21:27
This appears to be a case of failure to consider Zugzwang. Since I don't work on the search, I don't really know, but my guess would be that Rybka does not consider the possibility of Zugzwang when the player on move still has his queen. At least, this would be a highly reliable rule (which this problem shows is not perfect) that would explain Rybka's failure to see the Zugzwang at any depth (if that's the case, I can't leave my computer searching to depth infinity to find out!).
Parent - - By Vasik Rajlich (Silver) Date 2008-08-07 21:38
Rybka always handles zugzwang properly (with enough depth of course).

Parent - - By Roland Rösler (****) Date 2008-08-07 22:40
Bravo! :-) I see, it's a secret. Pruning?
Parent - By Vasik Rajlich (Silver) Date 2008-08-08 18:35
Top secret! :)

Parent - By BB (****) Date 2008-08-09 04:49

>Rybka always handles zugzwang properly (with enough depth of course).

(And assuming there is no Bishop underpromotion :)). The Selesniev classic:
rk6/p2p4/KPRp4/8/3P4/8/8/8 w - - 0 0

1. Rc8+ Kxc8 2. b7+ Kb8 3. d5 is Zugzwang only if Kc7 can be met by bxa8=B.
Parent - By Roland Rösler (****) Date 2008-08-07 22:39
She sees the zugzwang immediately in the second variant! So this can't be the problem. And R2 sees it in first variant.
Parent - - By diskamyl (**) Date 2008-08-07 09:00 Edited 2008-08-07 09:19
I think it should somehow have to do with evaluation. It would be a very good idea to take a serious approach to this from the beginning, planning to make changing the evaluation something very easy, etc. Aside from tactics, a 1400 rated player for example should have very little knowledge about bringing the rooks into the game, on which files they would be needed, when a bishop is actually bad, about exchanges, etc.
Parent - - By lkaufman (*****) Date 2008-08-07 16:07
On average you are probably right, but I have known very low rated players with an excellent understanding of chess fundamentals, they just can't see that their queen is attacked!
Parent - By diskamyl (**) Date 2008-08-08 22:32
Yes, I guess you're right, but it would result in a more persistent level of play if a significant amount of handicapping came from the evaluation. I think chessmaster had such a great potential in that area, they just stopped developing the program after CM 9. (I have got 10th and 11th, and the work that has gone into those two versions is really negligible, except that of Waitzkin.)
Parent - - By Vasik Rajlich (Silver) Date 2008-08-07 09:44
Part of the problem (at least in computer chess) is that it has never been a really top priority.

Parent - - By Sesse (****) Date 2008-08-07 10:52
Yes. You have to pick your battles, as developer time is finite. :-)

/* Steinar */
Parent - By Vasik Rajlich (Silver) Date 2008-08-07 11:00
Actually I'd remove the 's' from battles :)

Parent - - By tasdourian (**) Date 2008-08-07 14:41
I agree completely, but I think it goes deeper than has been discussed here previously, and as a result the potential market for a program that could do ELO limiting/handicaps has been considerably underestimated.

Weaker players (1200-2000) may not be aware consciously of what a program is doing to accurately simulate a lower ELO or to play well with a knight handicap, but if such a program existed people would want to play it much more than other programs out there.  Precisely because current efforts (I don't know about Rybka 3, I'm talking about other programs) are not convincing (they play alternately too weak and then too strong on lower ELO, or in handicap play they don't try to win but instead try to draw), people don't use those features that much.

When those features start to feel natural, people will use them as a way to improve their chess, especially newer players-- it would feel fun and enjoyable, rather than gimmicky.  That is when real word of mouth happens. 

I know that Rybka is designed for power users, but I have used Fritz and Shredder and etc., and despite the bells and whistles, there are not available playing styles that are fun and realistic for weaker players.  No one is going to buy Fritz for those features or tell a friend about them-- no one is going to say how much better they became on those levels.

Chessmaster is Chessmaster, and no one is arguing you should be going there.  What I am saying however is that there is a much larger, untapped audience for an intelligent approach to the above issues, and that the reason people assume there isn't is because it has never been done well.  It's like the Bill James Baseball Abstracts-- every publisher told him there was no market for books that looked at baseball through a mathematical/statistical lens, but when Ballantine finally published him, he brought in a whole new kind of reader and it easily became the best selling baseball book every year.  That is, he wasn't appealing to the "Chessmaster" crowd, he was attracting a different crowd that didn't even know how much it would love baseball analysis when done correctly.

Thanks again, Vas and Larry, for all of your work.  
Parent - By Schiffermueller (*) Date 2008-08-08 11:31
I agree with you. I miss transparency, documentation and evidence, or simple seriousness, in a lot of handicap levels (not Rybka!). There are personality levels with impressive names like Kasparov, Karpov, Beginner. There are ELO levels... But what's behind all this? Something intelligent or something stupid? The problem is that the user cannot easily test it. After a lot of games he may be frustrated. It would be very motivating to know beforehand what he is using. For example, it would be very motivating if the strength of the ELO level were validated by computer tournaments. Then I can say without doubt: I have beaten an 1800 opponent. Tomorrow I try 1900.
Parent - By Schiffermueller (*) Date 2008-08-08 07:41
It is hard to let the computer play human-like. But it is not hard to let the computer just play weaker. I think before we try to invent sophisticated algorithms for human-like play we should do our homework: just simulate a slow computer.
I imagine a level where I can adjust the simulated speed directly, or indirectly by ELO. In the second case the map from ELO to speed should be transparent, because nobody wants to be kidded.
Parent - - By BB (****) Date 2008-08-09 04:35

>The hard part is making the bots miss convincingly.

Steven Lopez discussed the problem of losing intelligently in a 2005 ChessBase article.
Parent - By BB (****) Date 2008-08-12 07:14

>Steven Lopez discussed the problem of losing intelligently in a 2005 ChessBase article.

Oops, this was already given as the source of information from the original poster... Why do I never read the first post in a thread??
Parent - - By Zarkon (***) Date 2008-08-09 01:47
I think the UCI_LimitStrength implementation in Rybka would be improved if there were a time pause before response or some other form of slowdown (as with Hiarcs, Shredder and other engines). As it is, Rybka responds almost instantly. This is no fun to play against! And it is not like playing someone with the limited strength.

Parent - - By tasdourian (**) Date 2008-08-09 02:45
I agree. That "instantaneous response" is one of the most unnerving aspects of chess software. When strength is supposed to be a maximum, it's fine.  But when playing against a "1500" player, a decent pause, of 15-30 seconds or so, would be more civilized.  Perhaps an optional "wait time" setting in Aquarium or Rybka 4?
Parent - - By Vasik Rajlich (Silver) Date 2008-08-09 16:09
To really do this right, you'd also need the computer to start talking trash, maybe smelling a bit, etc :)

Parent - - By Zarkon (***) Date 2008-08-10 03:10 Edited 2008-08-10 03:14
lol :)

But seriously, I like to use the limit UCI option for playing against, but without the slowdown I won't use it (I'm not trying to be contrary here - I really do find it unusable!).

I suspect when designing the strongest possible engine introducing variable time lags for moves is not your highest priority! :)
Parent - - By Vasik Rajlich (Silver) Date 2008-08-10 07:55
Actually, this could be done, it's not a lot of work.

Of course, some users will then complain that the engine should just play immediately and not waste their time, and I don't think that this deserves an engine parameter :)

Parent - - By Zarkon (***) Date 2008-08-10 10:03
You can't please everyone :) Perhaps if you decide to look at it you could do a poll (among users that use the option) to see if there is a strong consensus in favour of adding lag.

I wonder if it's also possible to emulate the "styles" of grandmasters more scientifically than has previously been done. I mean wouldn't it be interesting to go back in time and play against Capablanca? I guess that's another thread though... 
Parent - By Vasik Rajlich (Silver) Date 2008-08-11 20:37

> You can't please everyone :) Perhaps if you decide to look at it you could do a poll (among users that use the option) to see if there is a strong consensus in favour of adding lag

If we have a bit more time before Rybka 4, we can consider these sorts of bells and whistles.

> I wonder if it's also possible to emulate the "styles" of grandmasters more scientifically than has previously been done. I mean wouldn't it be interesting to go back in time and play against Capablanca? I guess that's another thread though... 

These would be for Larry. So far we've been concentrating on "serious analysis" stuff, but maybe we'll expand a bit at some point.

Parent - - By Schiffermueller (*) Date 2008-08-11 12:52
It should not be an engine parameter but a parameter of the GUI. In Fritz GUI it is.

Also the UCI_LimitStrength parameter should not be an engine parameter, I think. There are engine parameters to control the strength : fixed depth, fixed nodes, time control. An ELO level should be part of the GUI instead. The GUI maps then the ELO to these parameters.
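
The three limiters mentioned here correspond to standard arguments of the UCI `go` command: `depth`, `nodes` and `movetime` (milliseconds). A GUI-side ELO level would end up sending one of these. The sketch below only builds the command string; the helper name `go_command` is hypothetical, and how a GUI picks the value for a given ELO is left open.

```python
def go_command(limit, value):
    """Build a UCI `go` line for one of the standard search limiters.

    limit is "depth" (plies), "nodes" (node count) or "movetime" (ms).
    """
    allowed = {"depth", "nodes", "movetime"}
    if limit not in allowed:
        raise ValueError(f"limit must be one of {sorted(allowed)}")
    if int(value) <= 0:
        raise ValueError("value must be positive")
    return f"go {limit} {int(value)}"


if __name__ == "__main__":
    print(go_command("depth", 3))
    print(go_command("nodes", 5000))
    print(go_command("movetime", 200))
```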
Parent - By Vasik Rajlich (Silver) Date 2008-08-11 20:38
Yes, of course this is the ideal way. This was the idea behind UCI2, but it's not widely accepted by the interfaces.
