Rybka Chess Community Forum
Rybka Support & Discussion / Rybka Discussion / Naum 3 against Rybka 2.3.2 for CEGT 40/120 Quad List (final)
- - By Heinz van Kempen (***) [de] Date 2008-02-22 11:03
Hi all  :)

a good match for Naum 3 here:

[code]CEGT Quad tournament time control  2008

1   Rybka 2.3.2a 64 4CPU  1½½½½1½½½½11½½½½½½01½101½½½½1½10½1½½1½½½½0½½½½½½½½ 28.5/50
2   Naum 3 x64 4CPU       0½½½½0½½½½00½½½½½½10½010½½½½0½01½0½½0½½½½1½½½½½½½½ 21.5/50[/code]

43% and best result from all engines so far against Rybka, a tiny bit better than Zappa Mexico II in the match by Clemens. Naum 2.2 got 30% against Rybka.
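
For readers who want to check the arithmetic, the result strings in the table can be tallied mechanically. The following is only a minimal Python sketch; the function name `summarize` is made up for illustration, and the string is Naum's line copied from the table above.

```python
def summarize(results: str) -> dict:
    """Tally a result string of '1', '0' and '½' from one player's point of view."""
    wins = results.count("1")
    losses = results.count("0")
    draws = results.count("½")
    games = wins + draws + losses
    if games != len(results):
        raise ValueError("unexpected character in result string")
    score = wins + draws / 2
    return {
        "games": games,
        "wins": wins,
        "draws": draws,
        "losses": losses,
        "score": score,
        "score_pct": 100 * score / games,
        "draw_pct": 100 * draws / games,
    }


# Naum 3's line from the match against Rybka 2.3.2a.
naum_vs_rybka = "0½½½½0½½½½00½½½½½½10½010½½½½0½01½0½½0½½½½1½½½½½½½½"
s = summarize(naum_vs_rybka)
print(f"{s['score']}/{s['games']} = {s['score_pct']:.0f}%, draws {s['draw_pct']:.0f}%")
```

The same function can be applied to the other result strings in this thread.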

http://husvankempen.de/nunn/phpBB2/viewtopic.php?t=938

For comparison more results from Naum 3 up to now.

[code]CEGT Quad tournament time control  2008

1   Naum 3 x64 4CPU        ½½1½½½½½1½½½½½1½1½½10½11½010½½0½½½0½½11½0½½1½½½1½½ 28.0/50
2   Zappa Mexico X64 4CPU  ½½0½½½½½0½½½½½0½0½½01½00½101½½1½½½1½½00½1½½0½½½0½½ 22.0/50[/code]

Another fine result against the first Zappa Mexico version, as Naum 2.2 got 48% against it. For Naum 2.2 against Zappa and others compare here:

http://www.husvankempen.de/nunn/40_120_ratinglist/Quad/4.html

The finished matches can be replayed and downloaded here:

http://www.husvankempen.de/nunn/Replay/matchesip.htm

Overall download update on Sunday.

Interim result:

Naum 3 fought back against Deep Shredder 11:

[code]CEGT Quad tournament time control  2008

1   Naum 3 x64 4CPU            0½½1½10½½½½00½½010101½11½½1111½0 17.5/32
2   Deep Shredder 11 x64 4CPU  1½½0½01½½½½11½½101010½00½½0000½1 14.5/32[/code]

Naum 2.2 got 50% against DS 11.

This match will be finished on Sunday morning with rating list update. After 132 games we have the ELO improvement Alex initially estimated....and of course huge error bars.
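
To give a feeling for how large such error bars are, here is a rough Python sketch. It assumes the standard logistic Elo formula and a simple normal approximation for the score; this is not necessarily how the CEGT list computes ratings, and the function names are made up for illustration. The win/draw/loss counts are tallied from Naum's result string against Rybka above.

```python
import math


def elo_diff(score_fraction: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


def score_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Approximate interval for the score fraction (normal approximation)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of results valued 1, 0.5 and 0.
    var = (wins * (1 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    half_width = z * math.sqrt(var / n)
    return mean - half_width, mean, mean + half_width


# Naum 3 against Rybka 2.3.2a: +4 =35 -11 over 50 games.
low, mean, high = score_interval(4, 35, 11)
print(f"score {mean:.1%}, Elo {elo_diff(mean):+.0f} "
      f"(range {elo_diff(low):+.0f} to {elo_diff(high):+.0f})")
```

With only 50 games the interval spans roughly a hundred Elo points, which is why more games are needed before drawing conclusions.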

Anyway, a fine start in all matches. Next opponents will be Hiarcs 11.2, Deep Fritz 10.1 and Loop M1-P.

Best Regards
Heinz
Parent - By Werewolf (*****) [gb] Date 2008-02-22 12:51
Naum is stronger than I first thought
Parent - - By turbojuice1122 (Gold) [us] Date 2008-02-22 18:55
Wow, 67% draw rate against Rybka and Zappa...

I recently started testing on 1 CPU and have noticed something similar--about 60-70% draws.  This is astoundingly high.  It's interesting, though in retrospect, I'm not so happy that I bought the engine...
Parent - - By Harvey Williamson (*****) Date 2008-02-22 19:09
I wonder if anyone playing ponder on matches can give us a % of ponder hits?
Parent - By Heinz van Kempen (***) [de] Date 2008-02-22 19:19 Edited 2008-02-22 19:32
Hi Harvey,

you will soon see that Naum is not closer to Rybka regarding choice of moves than other top engines including Hiarcs.

Here is a copy from another post where I already announced that I am having some Linares games analyzed by five top engines. This means every single move of the GM games is analyzed by all five, including the full analysis line, with 3 minutes for each move.

Hi Per,

thanks, it really looks as if we will have a new second-best engine, and I doubt that Zappa Mexico II will be better than Naum 3.

Toga is on top of the waiting list, and we are also waiting for Fruit mp, which has not been announced so far as far as I know.

Naum 3 will be finished in two weeks and Toga could start then, if no new Rybka or Deep Fritz is released before that, which I do not expect. If that happens, two engines can also be tested at the same time, which would take six weeks instead of three.

It could also be Toga's turn because Zappa Mexico II only shows the minor improvements expected by Anthony, and the new Zappa finished the 2nd CEGT Quad Marathon Championship 40/400 after 42 games with a disappointing result. Werner will give links to this one with the rating update on Sunday, once I have posted in the CEGT and Rybka forums, which will probably happen tomorrow. I will also give remarks about openings and novelties in certain lines based on "The Week in Chess" and other sources from which I collected a top database. The 3rd Quad Marathon match, Rybka 2.3.2a against Naum 3, has started. The first games will also be available on Sunday. The change here is that I will play with selected sets from Harry's new book, giving priority to lines en vogue and to a frequency of variations matching current GM practice. I am doing this in order to see more novelties. But classic variations from the past century, when top players were still not influenced by the strong engines, will also be included.

Your efforts with positional and gambit rating lists show that a lot of things can still be done, so this weekend you will also get recently finished games from Linares with 2700+ Elo players. They are fully analyzed by the mp versions of the top engines Rybka 2.3.2a, Naum 3, Zappa Mexico II, Deep Shredder 11 and Hiarcs 11.2, with 3 minutes per move for each engine consecutively. I will also give statistical data on this. Annotating a game this way takes more than 24 hours on average, but who cares.
Parent - - By Debaser (***) Date 2008-02-22 20:01
Only a few games, but it is "blue" with Rybka and Fritz 11, Fritz 11 started this trend and is quite different relative to Fritz 10
          
http://www.computerchess.org.uk/ccrl/4040.live/cgi/engine_details.cgi?match_length=30&print=Details&eng=Naum%203%2064-bit#Naum_3_64-bit

Still Rybka is beating Naum 3     +12−1=13
Parent - - By Uri Blass (*****) [il] Date 2008-02-22 20:42
You chose the worst conditions for Naum (64 bits, from which Rybka gains more than Naum, but only 1 CPU).

If you choose 4 CPUs the result is more balanced (+7 -2 =23 for Rybka).
Maybe with 4 CPUs and 32 bits Naum has better chances, but testers do not test this way.

http://computerchess.org.uk/ccrl/4040.live/cgi/engine_details.cgi?print=Details&eng=Rybka%202.3.2a%2064-bit%204CPU#Rybka_2_3_2a_64-bit_4CPU
Parent - - By Debaser (***) Date 2008-02-22 20:53
Hi Uri

I was not trying to underestimate Naum, just to say that although the CCRL statistics (Pond hit and Eval diff) show that Naum and Fritz are getting similar to Rybka, Rybka is still better.

A. Naumov already announced it months before the release

http://64.68.157.89/forum/viewtopic.php?topic_view=threads&p=159658&t=17851
Parent - - By Heinz van Kempen (***) [de] Date 2008-02-22 21:15
Hola Debaser :-),

CCRL plays only ponder off games, so we had better think about what these ponder hit stats really are.

I think it is better, as I am doing now, to let engines analyse complete top GM games with the same long time (3 minutes per move) and also look at the complete main line given. I will give the exact number of hits for identical moves, but I can already tell you that the main lines are mostly different, unlike what I saw when comparing Strelka and Rybka 1.0 beta some time ago, when almost all main lines were the same.

Hopefully we will not see that Anand, Kramnik and others are already Rybka clones :-).

Un muy cordial saludo
Heinz
Parent - - By Debaser (***) Date 2008-02-22 21:30
Hola Heinz

Well, I am a bit lost now. What are those CCRL statistics pointing out, then? And the change in the numbers from Naum 2.2 to Naum 3 and from Fritz 10 to Fritz 11?

Notice that I did not use the word clone ;)

That test about analyzing Morelia-Linares with different engines is very interesting, waiting for the results ;)

Un saludo Heinz
Parent - - By Heinz van Kempen (***) [de] Date 2008-02-22 22:35 Edited 2008-02-22 23:07
Hola Debaser :-),

yes, it seems a bit strange to have ponder stats when no ponder-on games are played the way the Swedes play them. As far as I know they take the games with comments and check the main lines to see which move is expected, that is, one move earlier. Real ponder on would mean seeing that an engine is already on ply 18, for example, when it is its turn again in a match where two computers are linked with a cable.

To show what I am doing, take a look here at a first completed game, just finished in Linares, where Anand lost with the White pieces against Aronian in the Marshall Attack. There are more games in progress and I will give an update on Sunday or Monday. To draw conclusions and gain interesting insight, we will need many dozens of these high-end games, and of all kinds: positional, tactical, endgames and so on. If you would like to have a particular game analyzed, just post it and I will add it. It might also be a classical one with top players. The test is done with the complete analysis feature in the Zappa 64-bit GUI, where one engine after the other calculates the same position (not all at once). I use a dual-core AMD X64 4200+ machine with 1024 MB hash and 3 minutes per engine for each ply, and the engines are 64-bit versions. This is because the Quad machines are busy with the CEGT 40/120 quad rating list and the 40/400 Marathon matches. But maybe I will use Quad machines later.

http://www.husvankempen.de/nunn/Replay/gm.htm

We have 68 plies in this game (category tactical) and you have to bear in mind, that many moves in the end were forced, leading to mate.

Statistically still insignificant, but just for fun here are the first stats in the tactical category:

Engines choosing the same move as Rybka 2.3.2a mp after three minutes:

Hiarcs 11.2 mp -- 46 identical moves

Zappa Mexico II mp -- 45 identical moves

Naum 3 mp -- 43 identical moves

Deep Shredder 11 -- 40 identical moves

When you check the main lines given, I think that so far none of those engines seems to have output similar to Rybka's. I will add Deep Fritz 11 to these tests as soon as it is out.
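
Expressed as percentages, the counts above look like this. A small Python sketch, assuming that all 68 plies of the game were analysed by every engine (the forced moves near the end inflate the agreement, as mentioned):

```python
TOTAL_PLIES = 68  # plies in the Anand vs. Aronian game, as stated above

# Moves identical to Rybka 2.3.2a mp after three minutes, from the list above.
identical_moves = {
    "Hiarcs 11.2 mp": 46,
    "Zappa Mexico II mp": 45,
    "Naum 3 mp": 43,
    "Deep Shredder 11": 40,
}

agreement = {name: 100 * n / TOTAL_PLIES for name, n in identical_moves.items()}
for name, pct in sorted(agreement.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {pct:5.1f}%")
```

One game is far too little to rank the engines by similarity; the spread between first and last is only six moves.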

If you enjoy doing so, you could also check, after more games, which GMs play most similarly to which engines.

Best Wishes
Heinz
Parent - - By Debaser (***) Date 2008-02-22 23:10
Hi again Heinz. Thanks ;)

I understand what you say about the CCRL method not being completely correct: depth d is not the same as d-1 or d-2. But does this mean that the stats are not right and cannot be taken seriously?

For example, the highest correlation shown there so far was

Strelka 1.8  vs  Rybka 1.0 beta x64    71.4% 0.23

http://www.computerchess.org.uk/ccrl/4040.live/cgi/engine_details.cgi?match_length=30&print=Details&eng=Strelka%201.8#Strelka_1_8
Parent - - By Heinz van Kempen (***) [de] Date 2008-02-22 23:16 Edited 2008-02-22 23:26
Hi :-),

I think they can be taken seriously like CEGT also.

Presentation of stats is good, but when they are based on few games the stats might be misleading. CCRL is aware of this, but many readers are not. People will never stop drawing premature conclusions, but it is better to wait for many games.

Regarding real ponder on stats, SSDF could give them, but they seem to have a problem with many testers in the past years, so they are not too much up-to-date.

Best Regards
Heinz
Parent - - By Debaser (***) Date 2008-02-22 23:29
With "take them seriously" I was talking about those "dirty" statistics of course, and the conclusions we can draw from them, not the rating list itself.

About SSDF statistics, I cannot see them at the site, do you think that they will post them when they update the list with engines on quad results?

Saludos Heinz
Parent - By Heinz van Kempen (***) [de] Date 2008-02-22 23:38
Hola :-),

yes I understood and agree. People really might think that they run ponder-on games giving stats called ponder stats.

Regarding SSDF: this was my favorite many years ago, but I completely lost interest because too few important versions are tested there, so I do not even know exactly what is on their website or where we could find more details. When they announced the upgrade to Quad I thought it would become more interesting again. But now I see that they test Quad against very old hardware, which on the other hand is understandable in order to keep consistency with old results. It is really a pity that they do not have more resources and testers, as many people still like those ponder-on games.

Saludos
Heinz
Parent - By Henrik Dinesen (***) [dk] Date 2008-02-23 08:23
In fact Heinz referred to the "Swedes" - that's not CCRL, but SSDF ;)
Parent - By Shaun Brewer (****) [gb] Date 2008-03-03 22:48
Hi,

in an attempt to clarify ;)

Currently all CCRL games are ponder off, however we take note of the expected move (the move that would have been pondered) when provided by the GUI.

This expected move is used to calculate the ponder hit stats we provide...

All the best

Shaun
Parent - By Ray (****) Date 2008-03-03 22:49 Edited 2008-03-03 22:55
"CCRL plays only ponder off games, so we better think about what this ponder hit stats really are."

It's quite simple. Perhaps "ponder hit" is badly worded. What it means is "expected move hit". Maybe we should re-name it.

In the ChessBase GUI for example, when Engine A makes a move, the GUI records in the PGN the move Engine A expects Engine B to respond with. Likewise, when Engine B makes a move, the GUI records in the PGN the move Engine B expects Engine A to respond with. The CCRL programming pulls this information out and presents it. The stats are thus completely valid and very meaningful. And the stats are gathered over potentially hundreds of games and thousands of moves (although games per engine pair are < 100). So they are good stats, very good, and careful thought has gone into them. Of course, there is always room for improvement.

The website has this to say:

Here you can see statistics of expected moves, also called "ponder hit", in CCRL games. When two engines can predict most of the moves in their match, it means that they share similar understanding of chess, similar thinking. Ponder hit statistics shows how exactly similar they are. This data can be collected from simply a database of played games, so it is convenient way to find what engines are similar or different from each other.
It looks simple — just count the predicted moves and divide by number of all moves. It is simple, just there are a few things to consider. First, there are opening moves, where engines don't think. We don't count such moves in this experiment. Second, there are forced moves, where there is no other choice. Such moves should not be counted too. We detect such moves by the time spend on them, so all moves made in 0:00 seconds are not used for this analysis.

Then, there are tablebase moves and mating lines. Such lines are characterized by many forced moves, but they also have many situations where it does not matter what to play. The result is that ponder hit statistics is not so meaningful in such lines. Ponder hit statistics is much more interesting in middlegame positions, where the move choice actually shows engine playing style and understanding. To limit this experiment to middlegame only we exclude all moves made with evaluation of ±9 pawns or more.

Finally, there are boring 50-move lines where engines don't know what to do, but still trying to avoid draw. In those lines engines play shuffle chess and any ponder hit analysis is meaningless. What's worse, just on the 50-th move they will move a pawn to avoid draw, and the shuffle chess continues for another 50 moves. Such cases are difficult to detect automatically, so after few experiments we decided to just ignore the drawn games completely. So, only decided games are used for correlation analysis in our study.
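
The filtering rules quoted above can be sketched in a few lines of Python. This is only an illustration of the described rules, not CCRL's actual code: the `Move` and `Game` records and the function name are hypothetical, and skipping moves without a recorded expected move is my own assumption. It also pools all moves together instead of keeping separate counts per engine pair.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Move:
    played: str              # move actually played, e.g. "Nf3"
    expected: Optional[str]  # move the opponent predicted, None if not recorded
    from_book: bool          # True for opening-book moves
    seconds: float           # time spent on the move
    eval_pawns: float        # evaluation in pawns


@dataclass
class Game:
    result: str              # "1-0", "0-1" or "1/2-1/2"
    moves: List[Move]


def ponder_hit_rate(games: Iterable[Game], max_eval: float = 9.0) -> Optional[float]:
    """Fraction of counted moves that matched the expected move."""
    hits = counted = 0
    for game in games:
        if game.result == "1/2-1/2":        # drawn games are ignored completely
            continue
        for mv in game.moves:
            if mv.from_book or mv.expected is None:
                continue                     # no thinking, or nothing to compare
            if mv.seconds == 0:
                continue                     # treated as a forced move
            if abs(mv.eval_pawns) >= max_eval:
                continue                     # outside the middlegame window
            counted += 1
            hits += mv.played == mv.expected
    return hits / counted if counted else None


decided = Game("1-0", [
    Move("e4", None, True, 0.0, 0.2),        # book move, skipped
    Move("Nf3", "Nf3", False, 12.0, 0.3),    # counted, hit
    Move("Bb5", "Bc4", False, 8.0, 0.4),     # counted, miss
    Move("Qh5", "Qh5", False, 0.0, 0.5),     # 0 seconds, skipped
    Move("Qxf7", "Qxf7", False, 3.0, 12.0),  # evaluation too large, skipped
])
drawn = Game("1/2-1/2", [Move("d4", "d4", False, 5.0, 0.0)])
print(ponder_hit_rate([decided, drawn]))
```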
Parent - - By noctiferus (***) [it] Date 2008-02-22 19:10
I didn't buy it, yet.
Could you be more explicit, please? Is it a sort of safe-playing, drawish attitude?
Parent - - By turbojuice1122 (Gold) [us] Date 2008-02-22 20:51
I have watched some of the games quite closely.  It basically plays as if it has internal negative contempt--it is the Kramnik of the computer chess world (though 250 elo points stronger--though if Naum 3 played Kramnik in a match of 10 games, I would expect 8 games to be draws).  While other "drawish" engines are drawing 45% of their games, here we have Naum 3 drawing 60%.

I am actually cutting off the tests early at 54 games so that I can free up my CPU for other stuff.  The results are +2-4=12 against Rybka 2.3.2a, +2-2=14 against Fritz 11, and +5-4=9 against Shredder 11.  This is all on 1 CPU with a 32-bit operating system at CEGT 40/20.  If you have Rybka 2.3.2a and Zappa Mexico, there is absolutely no reason to buy Naum 3.  While Naum is better at finding draws than Rybka, I think that Zappa Mexico will also find those draws, and Zappa Mexico is also a bit more tactical (if only slightly weaker) than Naum 3. 
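
The three records above can be added up mechanically. A short Python sketch; the regular expression and the function name `parse_record` are my own, written so that they also accept the spaced form and the Unicode minus sign used elsewhere in this thread:

```python
import re

# Accepts "+2-4=12", "+7 -2 =23" and the Unicode minus in "+12−1=13".
RECORD = re.compile(r"\+\s*(\d+)\s*[-\u2212]\s*(\d+)\s*=\s*(\d+)")


def parse_record(text: str):
    """Return (wins, losses, draws) from a +W-L=D record."""
    m = RECORD.search(text)
    if m is None:
        raise ValueError(f"not a +W-L=D record: {text!r}")
    wins, losses, draws = map(int, m.groups())
    return wins, losses, draws


# Naum 3's records from the post above (1 CPU, 32-bit, 40/20).
records = {
    "Rybka 2.3.2a": "+2-4=12",
    "Fritz 11": "+2-2=14",
    "Shredder 11": "+5-4=9",
}

wins, losses, draws = (sum(col) for col in zip(*map(parse_record, records.values())))
games = wins + losses + draws
print(f"{games} games, draws {100 * draws / games:.1f}%, "
      f"score {100 * (wins + draws / 2) / games:.1f}%")
```

This gives 54 games in total, matching the number mentioned in the post.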
Parent - - By Heinz van Kempen (***) [de] Date 2008-02-22 21:25
Hi Turbo :-),

thanks for the results. When you compare them with those the CEGT 40/20 testers have got with Naum 3 X64 4 CPU so far, you will probably agree that, for "power users" at least, Naum 3 is a good option.

http://husvankempen.de/nunn/phpBB2/viewtopic.php?t=936

Regarding Kramnik you may be correct regarding playing style, but I doubt that Vladimir could get 8 draws against Naum under equal conditions and with no handicap for Naum, not even against the single 32-bit version. Maybe I underestimate human top players, but they are prone much more to tactical mistakes than top engines.

Best Wishes
Heinz
Parent - By Uly (Gold) [mx] Date 2008-02-22 21:35

> Maybe I underestimate human top players, but they are prone much more to tactical mistakes than top engines.


But if neither Vladimir nor Naum do anything to try to win the game and they stay happy with a draw, game after game, then being able to draw these 8 games sounds likely.
Parent - By turbojuice1122 (Gold) [us] Date 2008-02-23 00:49
I'm referring exactly to what Vytron mentions here: Naum does not seem to "try" to win the game: if it is slightly ahead, it is very content with getting a draw (and actually seems to search for the draw).  It is as if it plays for the draw, hoping that the opponent will make a mistake in attempting to avoid the draw (it has been getting some of its wins by this method)--that is exactly Kramnik's style.  When you match together two opponents in which this is the case, a draw is far and away the most likely result in such games.  In other words, I don't think that Naum on a quad would be over 2900 elo if we restrict its play to top grandmasters, while at the same time, Rybka and Zappa (and probably Hiarcs, too) would be above 3000.
Parent - - By Eelco de Groot (***) Date 2008-02-22 21:59
Well, I think that a chess program that is 250 points stronger than Kramnik cannot be that bad :) A nice advertisement, although I don't think that Kramnik would necessarily agree with that. I just bought the Naum SP version; so far the 32-bit version even has a higher rating in the CCRL tests than the Naum SP 64-bit version, which I like as well, being the owner of a 32-bit operating system only. It would suggest that Naum is not a bitboard engine, although I'm not 100% sure; maybe Alex has written about it somewhere. There are not many games yet on the CCRL pages

http://www.computerchess.org.uk/ccrl/4040.live/cgi/engine_details.cgi?match_length=30&print=Details&eng=Naum%203%2032-bit#Naum_3_32-bit

but beating Fritz 11 4½-2½ and Toga II 1.4 beta 5c 4½-1½ is not a bad start, with zero losses! It must be doing something right... On CCC somebody who had said before he would not buy the engine, came back from this decision watching some of Naum's games and he compared the playing style to Rebel. An improved Rebel is not a bad program!

Regards, Eelco
Parent - By NATIONAL12 (Gold) [gb] Date 2008-02-22 22:06
Alex has said that the speedup on 64-bit is only 10%.
Parent - - By Uly (Gold) [mx] Date 2008-02-22 22:28

> On CCC somebody who had said before he would not buy the engine, came back from this decision watching some of Naum's games and he compared the playing style to Rebel. An improved Rebel is not a bad program!


It is only his opinion. As a Pro Deo fan by heart, I can say that Naum doesn't play like Pro Deo.
Parent - - By Eelco de Groot (***) Date 2008-02-22 22:53
Sure Ulysses, Rebel was and stays unique, there is no other program like it I hope, we  need different programs, not programs that all play the same moves. But my point is that I think it is very hard to achieve these kind of results, especially good results against Rybka. Making a program that plays like a coffee house chess monster is not all that difficult but making the same program strong enough to beat all but the best is something entirely different in my opinion. I just want to say that if Naum plays different from Rybka, HIARCS, Toga and Shredder and it is as strong as the rating lists seem to show, I'm not asking my money back. It is after all just the first version of Naum 3, if the customers are not happy with it maybe Alex can make an update although he does not have to do that barring any bugs there might be, not that I have heard of those yet.

Regards, Eelco
Parent - - By Uly (Gold) [mx] Date 2008-02-22 23:14

> But my point is that I think it is very hard to achieve these kind of results, especially good results against Rybka.


I don't care about their scores against Rybka, but about their play style. If Naum has a very drawish playing style, I'm not interested (Doesn't it have some kind of configurable contempt so by changing it Naum tries to avoid draws? It could be interesting.)
Parent - By Heinz van Kempen (***) [de] Date 2008-02-22 23:25
Hi Vytron :-),

the percentage of draws seems to depend on the opponents. There are many against Rybka and Zappa, so you can also blame those two. As Uri explained, I guess this is because of the very high level in these matches between the top three engines. There are also many draws between Rybka and Zappa, by the way.

There are fewer draws against Deep Shredder 11 and so far none against Deep Fritz 10.1 after the first games, and when you check all games you will find a lot of highly interesting tactical battles, even though there might be a draw in the end. But you will see more tactics and devastating attacks between 2500 ELO engines or when you have matches between engines of very different strength.

Best Regards
Heinz
Parent - By Eelco de Groot (***) Date 2008-02-23 13:10 Edited 2008-02-23 13:31
I just got my Naum 3 in the mail and there are indeed some options to try: there is a Draw_Contempt_Score that can be increased, and the value of Material can be lowered, which is usually a good way to get more enterprising play and more sacrifices. Also, some of the material imbalances in the endgame could be tuned. Making changes will probably not gain many Elo points, but you can influence the style of play with these options, if you want.

Here is the text of Naum's configuration file if you use Naum 3 as a Winboard engine:

##########################################################
# Winboard configuration file for Naum chess engine
# (needs to be in the same directory as naum.exe)
# This file is not used when the engine is in the UCI mode
##########################################################

# Set to 1 to enable pondering (thinking on opponents move)
PONDER = 0

# Set to 1 to enable book learning.
# Note that Naum saves learned info in the book file, so if
# you replace the book file, all the learned info will be lost.
# You should use a separate book file for blitz test tournaments,
# if you don't want less accurate blitz learned info to influence
# openings Naum plays under regular time controls.
LEARN = 1

# Set to 1 to allow engine to resign a game
RESIGN = 1

# If set to 1, engine will not clear the hash tables when position
# on board is changed. This is usefull for analysis, because
# the engine will keep hash entries when user goes backwards
# or forwards on the move list while the engine is in the analysis mode.
# When playing a game this option is ignored, because the engine
# will always keep the hash values.
# Also when the 'new game' command is issued, hash is always cleared.
PRESERVE_HASH = 0

# Transposition table size in megabytes (min 8MB, max 1GB, default 64MB)
TT_SIZE = 64

# Path to endgame tablebase files
EGTB_PATH =

# EGTB cache size in MB (min 1, max 128, default 32)
EGTB_CACHE_SIZE = 32

# Maximum number of threads (CPUs) to use (min 1, max 8, default 1)
MAX_THREADS = 1

# Use positive value to tell Naum the draw is bad, or negative value to indicate the draw is good
# Warning! Using contempt may reduce the playing strength, but it might be good against humans
DRAW_CONTEMPT_SCORE = 0

# Smallest depth in the search tree at which to probe EGTBs.
# Increase this value if the EGTB probing is slowing down the engine.
MIN_EGTB_DEPTH = 3

# Use a positive value to increase the importance of material in the position evaluation.
# Use a negative value to decrease the importance of material compared to the other
# positional factors and king safety.
MATERIAL_IMPORTANCE = 0

# Configurable material evaluation parameters.
# Parameter values are in centipawns and will be added to the default value.
#
MINOR_VS_PAWNS_SCORE = 0
ROOK_VS_PAWNS_SCORE = 0
ROOK_VS_MINOR_SCORE = 0
TWO_MINORS_VS_ROOK_SCORE = 0
THREE_MINORS_VS_QUEEN_SCORE = 0
TWO_ROOKS_VS_QUEEN_SCORE = 0
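
If you want to try several settings in a row, the file is simple enough to edit from a script. The following Python sketch only assumes the `KEY = VALUE` format with `#` comments shown above; the helper names are made up, the two changed values are purely illustrative and not recommendations, and the simple `render` helper drops the comments.

```python
def parse_naum_config(text: str) -> dict:
    """Parse KEY = VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY = VALUE, got: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def render(settings: dict) -> str:
    """Write the settings back as KEY = VALUE lines (comments are not kept)."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


sample = """\
# excerpt of the file quoted above
MAX_THREADS = 1
EGTB_PATH =
DRAW_CONTEMPT_SCORE = 0
MATERIAL_IMPORTANCE = 0
"""

cfg = parse_naum_config(sample)
cfg["DRAW_CONTEMPT_SCORE"] = "20"    # illustrative: positive means "draw is bad"
cfg["MATERIAL_IMPORTANCE"] = "-10"   # illustrative: negative lowers material weight
print(render(cfg))
```

Keep in mind the warning in the file itself: using contempt may reduce the playing strength.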

Naum 3 SP 32-bit has lost just one rating point since yesterday in CCRL 40/40 but with 2981 (+44/-43) after 157 games it is still ahead of Naum 3 SP 64-bit with 2949 (+48/-48) elo. Not without losses anymore against Fritz 11 and Toga II 1.4 beta5c. Thanks to CCRL, Graham I think for the updated list, nice to watch the progress of a program this way if possible!
Parent - - By Uri Blass (*****) [il] Date 2008-02-22 19:19
A 67% draw rate is not astonishingly high.
It is logical to expect more draws when the level is higher, and
there are often more draws in world championship matches between humans.

The first match between Kasparov and Karpov had 40 draws out of 48 games.
Considering that computers today are stronger than humans, I wonder what the reason was for the large number of draws
between humans, even in world championship matches, where a draw with White is a bad result so there is no reason for short GM draws.

Note that even Emanuel Lasker had 8 draws out of 10 games in one of his matches (against Carl Schlechter), and it is clear that Lasker had good reasons to try to win because he lost game 5 of the match, but he could get only draws in games 6-9.

http://he.wikipedia.org/wiki/%D7%AA%D7%9E%D7%95%D7%A0%D7%94:Chess_Lasker_-_Schlechter_Title_Match.jpg

Uri
Parent - - By turbojuice1122 (Gold) [us] Date 2008-02-22 20:45
But this is quite high for computer matches, even at the top level, where percentages over 40% are noted as being higher than typical, and over 50% being quite high.  What's more, this seems to be somewhat consistent versus the various engines, except for Shredder (I am getting that result, too).
Parent - - By Uri Blass (*****) [il] Date 2008-02-22 21:36
Percentages over 40% are not higher than typical at the high level (40/120 time control), and I remember that the percentage of draws was even higher in the 40/400 marathon games between Rybka and Zappa.

http://www.husvankempen.de/nunn/40_120_ratinglist/Quad/qratinglist.html

450 games (40% * 450 = 180)

Zappa       222 draws
Shredder    204 draws
Naum        229 draws
Hiarcs      203 draws
Fritz       201 draws
Loop        199 draws
Glaurung    190 draws

Only 3 programs have less than 40%:
Junior      169 draws
Rybka       168 draws
Bright      151 draws
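
The same comparison as percentages, in a short Python sketch. It assumes, as stated above, that each engine has 450 games in the list:

```python
GAMES = 450  # games per engine, as stated above

# Draw counts copied from the list above.
draws = {
    "Zappa": 222, "Shredder": 204, "Naum": 229, "Hiarcs": 203, "Fritz": 201,
    "Loop": 199, "Glaurung": 190, "Junior": 169, "Rybka": 168, "Bright": 151,
}

draw_pct = {name: 100 * d / GAMES for name, d in draws.items()}
below_40 = [name for name, pct in draw_pct.items() if pct < 40]

for name, pct in sorted(draw_pct.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {pct:4.1f}%")
print("below 40%:", ", ".join(below_40))
```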

Note that Naum 3 does not seem to draw more than Naum 2.2.
Notice that 19 out of 32 games of Naum 3 against Shredder 11 were not drawn.

CEGT Quad tournament time control  2008

1   Naum 3 x64 4CPU            0½½1½10½½½½00½½010101½11½½1111½0 17.5/32
2   Deep Shredder 11 x64 4CPU  1½½0½01½½½½11½½101010½00½½0000½1 14.5/32

Uri
Parent - By turbojuice1122 (Gold) [us] Date 2008-02-23 00:50
Okay, but that's at tournament time control, where the draw percentage for CEGT testing is a bit higher.  Mine are tests at 40/20 time control.
Parent - By NATIONAL12 (Gold) [gb] Date 2008-02-22 22:09
See my results against Toga II 1.4.1 SE at 40/40 in the Toga lovers thread, where draws are 62.5%.
Parent - - By Banned for Life (Gold) Date 2008-02-22 21:16
It seems to me that the ultra-selective engines, modeled after Rybka, are good at finding good moves and not playing bad moves, but don't do such a good job of finding the best move. If this is true, both avoiding bad moves and not finding the best move will contribute to a higher draw percentage. Some will misinterpret this to mean that engines are approaching perfect play, which I think is far from the truth.

Regards,
Alan
Parent - By Uly (Gold) [mx] Date 2008-02-22 21:31

> It seems to me that the ultra-selective engines, modeled after Rybka, are good at finding good moves and not playing bad moves, but don't do such a good job of finding the best move. If this is true, both avoiding bad moves and not finding the best move will contribute to a higher draw percentage.


I think you're right, and that this approach to the game also produces some boring games, whereas engines that don't follow Rybka play more lively chess. That is because best moves are usually aggressive, active, or at least unbalance the position; other engines play such moves more often, but they perform worse because they also play bad moves more often.
Parent - - By Uri Blass (*****) [il] Date 2008-02-22 21:40
I do not think that you are right; I think that more draws are simply the result of a higher level.
Test suites where the best move is a sacrifice are misleading; in most cases the best move is not a sacrifice.

Uri
Parent - - By Ty Nance (**) [us] Date 2008-02-22 21:49 Edited 2008-02-22 21:52
While we may theorize that perfectly played chess leads to a drawn game, I think that a person or a computer can play for a draw, and succeed, without playing perfect chess. If Naum 3 has been optimized to play for a draw, you should consider both options.

In a competitive field where no one can consistently beat Rybka, perhaps the best way to gain Elo points is to consistently draw Rybka while Rybka consistently beats everyone else.

ty
Parent - - By Uri Blass (*****) [il] Date 2008-02-22 22:04
There is one problem with your theory:
Naum does not consistently draw Rybka.

Naum 3 32-bit lost 20-10 against Rybka 32-bit in one CCRL match, while it beat Rybka 2.5-0.5 in another tournament by Graham Banks,
so the total result of Rybka 32-bit against Naum 3 32-bit is 22.5-12.5

Uri
Parent - - By Roland Rösler (****) [de] Date 2008-02-22 23:22
Why not 20.5-12.5 ? :-)
Parent - By Uri Blass (*****) [il] Date 2008-02-23 02:04
You are right.
It was my mistake when I wrote 22.5.
Parent - - By Banned for Life (Gold) Date 2008-02-23 01:07
I don't use test suites so I don't know much about them and I certainly don't spend a lot of time looking for sacrifices, which are frequently played for aesthetic reasons rather than because they were the best move. My conclusions are based on CC games I've played where I've run Rybka to high depths from the root (usually 28+), and then worked backwards along the PV. The backward analysis reveals a better move a reasonably high percentage of the time.

Alan
Parent - - By Uly (Gold) [mx] Date 2008-02-23 01:37

> The backward analysis reveals a better move a reasonably high percentage of the time.


What I do is match the Rybka analysis against the analysis of a more active engine (Like Toga.) The first time I was expecting that Rybka was going to convince the other engine that Rybka's moves were best, but I was surprised that this engine was convincing Rybka that her moves weren't best, and that there were better alternatives. Of course, going back some moves and asking the engine for another move kept the analysis going, and Rybka and the other engine were finding better alternatives to their previous moves. These alternatives were playable moves that put more pressure on the opponent.

Rybka was thinking that the position was a draw, but the moves from the other engine were raising Rybka's scores and Rybka had to accept that her moves weren't best. I found richness in the position that I couldn't have found if I used Rybka alone.
Parent - - By Banned for Life (Gold) Date 2008-02-23 01:43
Yes, I have found the same thing. Rybka's strength seems to be mainly associated with not making bad moves, as opposed to making best moves. I see this behavior to be more conducive to drawing because it relies on the opposing engine to make bad moves. If both engines avoid bad moves and neither is concerned with best moves, a draw is a likely outcome.

Alan
Parent - - By Vasik Rajlich (Silver) [hu] Date 2008-03-03 18:51

> Yes, I have found the same thing. Rybka's strength seems to be mainly associated with not making bad moves, as opposed to making best moves.


Playing the best move in a position is really hard. I doubt that Rybka with an hour on a quad plays the best move even let's say half the time. This isn't some sort of design decision - it's just that chess is hard.

Vas
Parent - - By Banned for Life (Gold) Date 2008-03-03 22:22
It's certainly not a design flaw and seems to work quite well. I was thinking along the lines of comparing "normal" mode with "ultra optimistic" mode. Ultra optimistic mode might play better moves occasionally at the expense of also playing some inferior moves. The more consistent normal mode is stronger.

I attribute the increased draw percentage in engine games to the general decrease in engine bad moves while most seem to attribute it to engines making best moves.

Alan
Parent - By Vasik Rajlich (Silver) [hu] Date 2008-03-08 15:46
Ok, I see what you mean. Yes, this is true. If you take any chess player and change him so that half of his moves are one rank worse and the other half one rank better, he'll play worse overall.

It's not really a matter of playing best moves, although the changed player will play more of those.
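A small numeric sketch of why that can hold, under the assumption (not stated in the thread) that the cost of a move grows faster than linearly as its rank drops. The loss figures below are invented; only their convex shape matters.

```python
# Hypothetical average loss (in centipawns) for playing the move of a given
# rank, rank 1 being the best move. Invented numbers with a convex shape.
LOSS_BY_RANK = {1: 0, 2: 10, 3: 40}


def average_loss(rank_shares):
    """Expected loss per move for a player who plays each rank with the given share."""
    return sum(share * LOSS_BY_RANK[rank] for rank, share in rank_shares.items())


steady = average_loss({2: 1.0})           # always the second-best move
erratic = average_loss({1: 0.5, 3: 0.5})  # half one rank better, half one rank worse

print(steady, erratic)  # 10.0 20.0
```

The erratic player finds the best move half the time, yet loses twice as much per move on average, because the step from rank 2 to rank 3 costs more than the step from rank 2 to rank 1 gains.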

> I attribute the increased draw percentage in engine games to the general decrease in engine bad moves while most seem to attribute it to engines making best moves.


Ok - at the moment I don't think that there is much we can attribute to playing tons of best moves. Maybe in ten years.

Vas
Parent - - By Uri Blass (*****) [il] Date 2008-02-23 02:26
Some comments:
1) We do not know if Rybka is best at the very slow time control that you use. The slowest time control at which Rybka is tested is 400/40, and you use an even slower one.
2) I do not know if you can trust Rybka's PV to contain the moves that Rybka is going to play later. Rybka may have a bug in finding PV moves; fortunately, in games she reaches greater depth on the next move, so she is not going to play them.
If Rybka has the PV 1.xx yy 2.zz at depth 28, it does not mean that Rybka is going to play 2.zz in a game at depth x<28, so if 2.zz is not the best move, that proves nothing if Rybka does not play this move.
3) I do not know if other engines are more often correct in finding the best move.
Note that I also felt that Rybka was stupid in some correspondence games that I played (I no longer play correspondence games), but the fact that other engines can help does not mean that they are better at finding the best move. It is possible that they find the best move even less often.

Uri
Parent - - By Banned for Life (Gold) Date 2008-02-23 04:03
1) True, I use about 2^6 times the 10 minutes per move allowed by 400/40.
2) If the PV is wrong (which is not unusual far from the root), this should show up in the backward analysis. The new move is then followed to the same depth prior to starting backward analysis. This is not really critical though. The interesting thing is that backward analysis not infrequently finds a better move at the root. Rybka will stick with this "better" move at the same depths as the original move was calculated to, and with a better eval.
3) True. I have no hard evidence that Rybka is more prone to this behavior than other engines. I have only the impression that Rybka is better at finding bad moves and other engines may be better at finding the best move.
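For scale, the figure in point 1 works out as follows, reading 400/40 as 400 minutes for 40 moves, as the post does:

```python
# 400/40 is read here, as in the post, as 400 minutes for 40 moves.
minutes_per_move_400_40 = 400 / 40                    # 10.0 minutes per move
factor = 2 ** 6                                       # 64
minutes_per_move = factor * minutes_per_move_400_40   # 640.0

hours, minutes = divmod(int(minutes_per_move), 60)
print(f"{minutes_per_move:.0f} minutes per move = {hours} h {minutes} min")
# prints "640 minutes per move = 10 h 40 min"
```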

I do know that in a significant fraction of the CC games I play, Rybka will play into a lot of pawn-up draws if I just follow her recommendations. Once again, I don't have hard evidence that Rybka is worse here than other engines.

Alan