Up Topic Rybka Support & Discussion / Rybka Discussion / Rybka and Houdini at 40/120 Timecontrol?
Parent - - By Kappatoo (****) [de] Date 2012-02-19 17:56

> The consensus a few years ago was 150 Elo. This was expected to decline over time. I suspect that it's still 100 Elo or so, which would qualify as significant in my book!


I agree that 100 Elo is significant. It's just that I am not a part of that consensus.

I was talking about cases where the hardware used is equivalent, so to a certain extent we may be talking past each other. I assume the case you have in mind is one where a freestyle player (or team) has more than one computer?
It would be interesting to see how well one could make use of this as a centaur. But 1) I would be surprised if one could turn this advantage into 100 Elo points and 2) given some practice, I think a very strong player will never be worse at this than a weak player.

I'm curious to hear your view on the following: Do you think that the decision process of centaur teams could in principle be automated?
Parent - - By Banned for Life (Gold) Date 2012-02-19 18:12
A typical freestyle team might have two or three members, each with three or more computers. More computers allow running different engines, or the same engine looking at a number of different root positions simultaneously. Many teams will have software that distributes the moves in the game as they are made, to prevent wasting time when the opponent moves. Of course the typical engine can't take advantage of multiple computers since cluster engines are not readily available.

I would guess that the advantage of the centaur under the conditions above would still be 100 Elo or so. I believe the strong player has an advantage all else being equal, but all else may not be equal. There are team and database management and bookkeeping issues that come into play, and the strong player may not have these skills.

In theory, the process could be automated (except for the part where two or more moves give similar engine eval, and the tie is broken by someone with a great degree of chess knowledge). This has not been done to date because it would be a lot of effort for little return.
Parent - - By Kappatoo (****) [de] Date 2012-02-19 18:56
Wow, that's a lot of resources. I agree that it takes quite some skill if one wants to make use of these resources. (I partially participated in the forum matches, and I think we never really managed to find an efficient analysis procedure. [Edit: Mainly because no-one put the required effort into implementing one, not because of universal incompetence. But still ...])

> In theory, the process could be automated (except for the part where two or more moves give similar engine eval, and the tie is broken by someone with a great degree of chess knowledge).


I am puzzled by the part in brackets. What about good old-fashioned interactive analysis? Is there no place for it in a freestyle team?
Parent - - By Banned for Life (Gold) Date 2012-02-19 19:20
It is assumed that a great degree of interactive analysis will be performed, but that there will still be frequent occasions where it is not possible to settle on one move. At this point you either have to have someone look at the lines and choose, or choose randomly...
Parent - - By Kappatoo (****) [de] Date 2012-02-19 20:58
Okay. Would you agree then that interactive analysis in itself requires (or let's say, its quality correlates with) chess playing strength?
Parent - - By Banned for Life (Gold) Date 2012-02-19 21:13
Certainly. There are a number of cases where chess playing strength comes into play. One would be the aforementioned choosing between equal moves that lead to different games. A second would be pressing interactive analysis on lines that the engine doesn't like. This second rationale is becoming less frequent, but it's comforting to think it's still of some importance...
Parent - - By Uly (Gold) [mx] Date 2012-02-19 21:28 Edited 2012-02-19 21:40

> A second would be pressing interactive analysis on lines that the engine doesn't like. This second rationale is becoming less frequent, but it's comforting to think it's still of some importance...


But I don't think very high OTB strength is needed for that. With my 1400 Elo I've managed to force my ideas into the engines to show that they're better than, or at least equal to, their ideas, especially in positions they have trouble with (when there's nothing to do and they just shuffle pieces around), positions where they don't find the winning moves (e.g. if they trade down the Bishop pair for no reason or go into an opposite Bishop ending), or positions where there's a thematic key move that they showed me before in a different, similar variation, and that they don't find in this one even though it also works.

Discussion here makes it sound like GM strength is needed for great influence, but I think years of experience with chess engines can replace that even if understanding of what is going on at OTB level is not the same.

(this is about corr chess but I think the general ideas apply)
Parent - By Banned for Life (Gold) Date 2012-02-19 22:07
Your argument has several components. First, it suggests that engines still do stupid things (trade down better positions into drawn endgames, or drawn positions into lost endgames, or shuffle pieces around rather than doing useful things). Second, it suggests that engines miss thematic elements on a regular basis. These are all valid points, where perhaps great chess playing ability isn't paramount. I was trying to focus on evaluation of quiet positions, where a GM's pattern recognition skills may pick out nuances missed by Larry K.'s chess by numbers approach, with or without the multi-ply search (with would be the case where the PVs at some depth for two positions are evaluated by the GM).

The question for you would be:

Would it be more difficult for the GM to learn your techniques, or for you to add 1200 Elo? :razz:

This is a much different question than: 'Do top GMs know better methods of analyzing with engines?', which Harvey alluded to above (which I am very skeptical about).
Parent - By turbojuice1122 (Gold) [us] Date 2012-02-19 17:19
I think that there are many valid points in the following discussion, and they all support Shahar's statement as probably having quite a lot of truth, in stark contrast to Rob's typical "shoot from the hip" reaction.
Parent - - By Sedat Canbaz (****) [tr] Date 2012-02-18 19:48 Edited 2012-02-18 21:19
Actually, it's not so hard to estimate what the Elo difference between humans and engines will be.

And I can give you another example on the current issue.

I think the single-processor engine below is at around the same level as a top human GM of 2700-2800 Elo, right?
Chess Tiger 2007

So, in other words, suppose we test the Chess Tiger 2007 engine against Rybka, Houdini and Critter
(I mean running the top MP engines on the latest i7 6-core or 12-core machines, via Auto232 mode or online).

After such a test at 40/120, can we expect the Elo difference to be less than 500 Elo?
For example, an Auto232 engine match between Houdini 2.0c x64 6 core and Chess Tiger 2007 w32 1c.

Let's take the SSDF rating as an example, which is very useful for such a comparison:
http://ssdf.bosjo.net/list.htm

Engine / hardware                          Rating    +    -  Games  Won  Oppo
Deep Rybka 4 x64 2GB Q6600 2,4 GHz           3216   32  -29    642  78%  3001
Chess Tiger 2007 256MB Athlon 1200 MHz       2786   22  -23    966  39%  2862


Note: the current SSDF Elo difference is 430 Elo. What will Rybka's MP Elo difference be on a 3 times faster machine?
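As a rough sanity check on the two SSDF rows above, the standard logistic Elo formula can be applied to each engine's score percentage and average opponent rating. This is only a sketch: it assumes the last three columns are games, score percentage and average opponent rating, and the function names are illustrative. SSDF computes its ratings over the whole pool, so only approximate agreement should be expected.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Expected score (0..1) for the side that is rating_diff Elo stronger."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def performance_rating(avg_opponent: float, score: float) -> float:
    """Rating at which `score` is the expected result against avg_opponent."""
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))


# Figures copied from the two SSDF rows quoted in the post.
rybka = performance_rating(3001, 0.78)
tiger = performance_rating(2862, 0.39)
print(round(rybka), round(tiger))

# Expected score for the higher-rated engine at a 430 Elo gap.
print(round(expected_score(3216 - 2786), 3))
```

Both performance figures land within a few points of the listed ratings, and a 430 Elo gap corresponds to an expected score of a little over 90% for the stronger engine.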

For example, suppose Rybka, Houdini and Critter are run on decently fast hardware, e.g. an i7 990X @ 4.60GHz.
Note that when running MP engines, an i7 990X @ 4.60GHz is approx. 3 times faster than a Q6600 @ 2.40GHz.

That means you will get approx. 130 extra Elo points.

In other words, I expect the SSDF ratings to be:
1. Houdini 2.0c Pro x64 6 core i7 990X @4.60 GHz   3400 Elo
2. Deep Rybka 4.1 x64 6 core i7 990X @4.60 GHz     3350 Elo
3. Critter 1.4 x64 6 core i7 990X @4.60 GHz        3350 Elo
4. Chess Tiger 2007 256MB Athlon 1200 MHz          2786 Elo


Btw, SCCT Auto232's rating has almost the same Elo points :)

I really wonder, too, what would happen if the top human GMs (e.g. Kasparov, Kramnik, Carlsen, Topalov, Anand...) took part in the SSDF rating list.
I expect the top human players would probably be rated around 2700-2850 SSDF Elo points!

Even more than 10 years ago, we noticed that the top engines had started to play at GM level. Here is more evidence:



Some Notes:
-The above 'Red' table (based on results from around the year 2000) does not include the Hydra chess engine
-Now we are in 2012, which means the current top engines play much stronger than the engines of around 2000
-Many of the top engines have improved by approx. 200-300 Elo points in that period of time
-Plus, 10 years ago the processors were much slower, e.g. in those years Fritz Benchmark values were around 500-1000 kns
-But nowadays (on the latest fast i7 machines) the Fritz Benchmark tool generates values of around 20,000 kns

Best,
Sedat
Parent - - By lkaufman (*****) Date 2012-02-20 00:12
I don't think you understand my main point. I'll take your word for it that "Chess Tiger" would rate about 2750 against humans (I don't know that program myself). Then you show that if we set CT to 2750 the top engines might come out at 3400 on engine vs engine rating lists. But that does not tell us what the top engines would rate against humans. It is almost surely just an upper bound. Engines never live up to the ratings expected based on these engine vs engine contests, that's why the Swedish list had to be repeatedly lowered over the years. Based on the stats from those lists, I concluded that you must lop off 25% of any elo gain observed on engine vs engine lists if you want to estimate ratings vs. humans. So in your example we must lop off 25% of 650 leaving a predicted rating of 3237.5. Not bad of course but it shows why the existing CCRL and CEGT rating lists needed a downward adjustment, if the top ratings are to be correct.
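The arithmetic in the post above can be written out as a small sketch. The function name and the `discount` parameter are illustrative; the 25% figure, the 2750 anchor and the 3400 list rating are the ones quoted in the post.

```python
def rating_vs_humans(anchor_rating: float,
                     engine_list_rating: float,
                     discount: float = 0.25) -> float:
    """Shrink the engine-vs-engine gain over the anchor by `discount`."""
    gain = engine_list_rating - anchor_rating
    return anchor_rating + (1.0 - discount) * gain


# Chess Tiger anchored at 2750; top engine at 3400 on an engine-vs-engine list.
print(rating_vs_humans(2750, 3400))  # 3237.5
```

With `discount=0.0` the function simply returns the engine-list rating, which is the upper bound mentioned in the post.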
Parent - - By Sedat Canbaz (****) [tr] Date 2012-02-20 01:18 Edited 2012-02-20 01:31
It seems you missed the red crosstable (based on results from around the year 2000) or did not read all my notes.

Please check the results again.

It does not matter; take any other engine (instead of Chess Tiger 2007) which is rated about 2700-2850 Elo points (based on SSDF, CCRL, SCCT... ratings).
Its Elo performance is expected to be at almost the same level as the top GMs.

I'd just like to mention again that 10-12 years ago, the top engines' Elo performance was around 2650-2750 Elo points.

Nowadays, newer versions of the same engines have improved by at least 200-300 Elo points.

Over a period of 10 (ten) years, the processors have become at least 15-20 times faster, which means you will get at least 350-400 extra Elo points.
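The two hardware estimates in this thread (3 times faster giving about 130 Elo, and 15-20 times faster giving 350-400 Elo) can be compared with a short sketch. It assumes a constant Elo gain per doubling of speed, which is a common rule of thumb and not something stated in the posts; the function names are illustrative.

```python
import math


def elo_per_doubling(speedup: float, elo_gain: float) -> float:
    """Elo per speed doubling implied by one (speedup, gain) estimate."""
    return elo_gain / math.log2(speedup)


def elo_gain(speedup: float, per_doubling: float) -> float:
    """Elo gain for a given speedup, assuming a constant gain per doubling."""
    return per_doubling * math.log2(speedup)


# Calibrate on the earlier estimate: 3x faster hardware is worth ~130 Elo.
per_doubling = elo_per_doubling(3.0, 130.0)
print(round(per_doubling, 1))

# Extrapolate to the 15x-20x speedup claimed for the last ten years.
print(round(elo_gain(15.0, per_doubling)), round(elo_gain(20.0, per_doubling)))
```

Under this assumption the 3x estimate implies roughly 82 Elo per doubling, which extrapolates to roughly 320-355 Elo for a 15-20x speedup, so the 350-400 figure sits at the optimistic end of the same scaling.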

Just imagine what the Elo difference between the top MP engines and GMs will be in 2012.

Personally, I expect to see a difference of approx. 500-600 Elo.

Greetings,
Sedat
Parent - - By lkaufman (*****) Date 2012-02-20 01:30
I just don't see how anything you are saying has anything to do with my claim that the rating improvements of engines estimated by methods other than playing humans overstate the ratings they would get against humans. I know quite well that engines had high GM level a decade ago, how does that relate to my claim?
Parent - By Sedat Canbaz (****) [tr] Date 2012-02-20 01:42 Edited 2012-02-20 01:48
Actually, our comments are just estimates...

The best answer: it would be great if there were a serious match (with many games played), Man vs Machine 2012.

And it would be a BIG surprise for me if the top engines performed less than 500 Elo above the humans.
Parent - By Jedi_Knight (*) [nz] Date 2012-10-24 11:03
"Your data indicates that against humans averaging around 2700 (I think) the top engines performed around 2900 several years ago. Many of these games were by Hydra whose strength is unknown relative to today's software."

I would think it would be pretty easy to gain some insight into Hydra's strength. Just run a simple test. Have two machines: one with Shredder 8 running at an average of over 2000kNps (which is the equivalent speed used against Hydra in 2004), and the other machine with a modern engine (Rybka, Stockfish, Houdi, Komodo, Strelka, Fritz, Shredder 12 etc., take your pick) running at a mere fraction of the power that fueled Hydra's algorithms. Make a pgn file with all eight of the Hydra-Shredder games, with all moves stripped after Shredder's last book move in each game. Then put whatever modern engine you want to compare with Hydra's performance in Hydra's place, starting out as white after 17...Qa1+. Ponder ON for both engines. => Now you get to see how a modern engine performs in comparison to Hydra's performance against Shredder 8. I would think that if the modern engine has Shredder 8 showing higher negative scores earlier than it did against Hydra in games 1, 2 and 7, and if the modern engine manages to win some of the games which Shredder was able to draw against Hydra, then it's probably reasonably strong evidence that the modern engine is stronger than Hydra.
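The "strip all moves after the last book move" step could be sketched roughly as follows. This is a minimal pure-Python illustration, not a full PGN parser: `truncate_movetext` is a hypothetical helper, it ignores variations and NAGs, and the ply at which each game leaves book is not given in the post, so it has to be supplied by hand.

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def truncate_movetext(movetext: str, last_ply: int) -> str:
    """Keep only the first `last_ply` half-moves of a PGN movetext string."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)  # drop {comments}
    text = re.sub(r"\d+\.+\s*", " ", text)      # drop "17." and "17..." markers
    moves = [tok for tok in text.split() if tok not in RESULTS]

    out = []
    for i, san in enumerate(moves[:last_ply]):
        if i % 2 == 0:                          # White's move: add the move number
            out.append(f"{i // 2 + 1}.")
        out.append(san)
    return " ".join(out + ["*"])                # "*" marks an unfinished game


# Hypothetical example: keep the first 4 plies of a short game.
print(truncate_movetext("1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 1/2-1/2", 4))
# 1. e4 e5 2. Nf3 Nc6 *
```

A position reached after a Black move such as 17...Qa1+ corresponds to `last_ply=34`, since each full move is two plies.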
Parent - By Sedat Canbaz (****) [tr] Date 2012-02-17 15:41 Edited 2012-02-17 16:03
I've just created another useful crosstable:


Some Notes:
-The standings are based on Computer vs Human GM games played in 2004, 2005 and 2006
-For a better conclusion more games are needed, but I am impressed by the performance of the top chess programs
-Even 7-8 years ago (when the processors were much slower and the engines were much weaker) the top engines performed approx. 200 Elo better than the top GMs
-The current top MP engines are expected to perform (on the latest fast hardware) at least 500 Elo points stronger than the top human players

For more details:
http://en.wikipedia.org/wiki/Human-computer_chess_matches

Best Wishes,
Sedat
Parent - - By Rebel (****) Date 2012-02-29 16:54
1r4k1/p4ppp/3P3q/4P2P/3b1P2/1r6/pPQ3B1/K1B2R2 b - -


Argh, that nice game Rebel played against Anand and then ruining a won position by capturing too early on "b2" thinking it had a won ending.

All well today :eek::eek::eek::eek:

00:00:29  15.00  2.33  1..Bxb2 2.Bxb2 Rxb2 3.Qxb2 Rxb2 4.Kxb2 Qxh5 5.Rc1 h6 6.d7 Qe2 7.Ka1 Qd3 8.Bh3 Qd4 9.Kxa2 Qd5 10.Kb2 Qb5 11.Kc2
00:00:32  15.01  2.33  1..Qe6
00:00:47  15.01  3.29  1..Qe6 2.Bc6 R3b4 3.Re1 Rc4 4.Qd2 Qf5 5.Be4 Rxc1 6.Rxc1 Qxe4 7.d7 Rd8 8.Qb4 Qd5 9.Qe7 Qxd7
Parent - By Sedat Canbaz (****) [tr] Date 2012-03-01 10:28

>Argh, that nice game Rebel played against Anand and then ruining a won position by capturing too early on "b2" thinking it had a won ending.


Agreed... it seems GM Anand was too lucky (against machines from around the year 2000) :wink:

Hardware:QX950 @3.66GHz:
Parent - - By AWRIST (****) Date 2012-02-21 09:28
I expect Chess Tiger 2007.1's real Elo rating should be at least 2700 Elo (in human terms)
Note:i mean those calculations (where ChessTiger 2007.1 is rated at 2550 Elo) are wrong, which are based on Shredder 12 1c's 2800 Elo
But however, CEGT, SWCR, Clemens (which are based on Shredder 12 1c's 2800 Elo) are doing great job -BIG thanks for their works,efforts... !


Observer's view:

Since we have actually a veritable GM Larry, whatever that means in human terms because he declares even the ipon list reliable without any reported game scores, we can further find arguments in Sedat et al who help to keep some mummies alive, e.g. Virtual Chess and CSTal, who are basically on 2900 level too if we recognize it in human terms and a free definition of Shredder on a fixed Elo in human terms of 2800 Larry terms wise. To my surprise I discovered that Junior as ex-WCh could well be lifted on Elo 4500 human terms Kasparov wise.

Overall it was an ethically brilliant move to outlaw Rybka because how silly all these attempts would look if Rybka had continued to win all the tournaments what would result in a top score of (Larry human terms wise) of 7120! That would be absolutely unacceptable for Fabien, SMK or Christophe Th., also considering the depressing sales quotes right now.:lol:

P.S. It's a must to mention the forgotten SSDF side by side with Ipon at least 5 times a week. Next task: Who is stronger Hydra, Fritz or Rebel or Komodo MP? Is BELLE still competing?
Parent - - By Sedat Canbaz (****) [tr] Date 2012-03-01 10:53 Edited 2012-03-01 11:20

>Since we have actually a veritable GM Larry, whatever that means in human terms because he declares even the ipon list reliable without any reported game scores, we can further find arguments in Sedat et al who help to keep some mummies alive, e.g.


By lkaufman:
the IPON list is probably the most reliable rating.

By lkaufman:
The IPON list has the best conditions regarding uniformity of opposition, books, hardware etc.

No, no, no... I have the best rating list, 40/40: :twisted:

Can anybody prove whether my rating list is the best (true or not)???

Rank Name                          Elo   +    -    games  score oppo. draws

   1 Houdini 2.0c Pro x64 6c      3443   21   21   10000   61%  3380   53%
   2 Critter 1.4 x64 6c           3380   22   22   10000   49%  3384   59%
   3 Deep Rybka 4.1 x64 6c        3366   16   16   10000   60%  3309   55%
   4 Stockfish 2.2.2 JA x64 6c    3364   23   23   10000   52%  3353   58%
   5 Ivanhoe 46hm x64 6c          3356   22   22   10000   50%  3358   66%
   6 Robopolito 0.10 x64 6c       3355   23   23   10000   58%  3304   62%
   7 Fire 2.2b xTreme GH x64 6c   3344   23   23   10000   53%  3327   65%
   8 Vitruvius 1.0C HEM x64 6c    3334   24   23   10000   56%  3297   54%
   9 Naum 4.2 x64 6c              3296   29   29   10000   54%  3267   46%
  10 Strelka 5.1 x64 1c           3248   15   15   10000   56%  3212   52%
  11 Chiron 1.1a x64 6c           3235   27   27   10000   48%  3247   50%
  12 Deep Shredder 12 x64 6c      3233   26   26   10000   50%  3236   44%
  13 Deep Fritz 12 w32 6c         3226   25   25   10000   45%  3254   49%
  14 Hiarcs 13.2 w32 6c           3223   28   28   10000   50%  3222   44%
  15 Deep Junior 13 x64 6c        3222   28   28   10000   50%  3225   48%
  16 Protector 1.4.0 x64 JA 6c    3212   31   31   10000   52%  3202   41%
  17 Spike 1.4 Leiden w32 6c      3200   25   25   10000   40%  3263   44%
  18 Spark 1.0 x64 6c             3196   25   25   10000   44%  3234   50%
  19 Komodo 3.0 x64 1c            3194   12   12   10000   50%  3169   39%
  20 Zappa Mexico II x64 6c       3171   26   26   10000   43%  3219   44%
Parent - By Uly (Gold) [mx] Date 2012-03-01 23:26

> Can anybody prove whether my rating list is the best (true or not)???


It can't be proven because you don't provide any games, just like the IPON! :lol:
Parent - By Sedat Canbaz (****) [tr] Date 2012-02-16 10:48
SCCT's 3 Type Tournaments (same conditions except for the books; see the Elo differences):
http://sedatchess.110mb.com/index.php?p=1_32

SCCT Hardware Tournament (same conditions except for hardware speed; see the Elo differences):
http://sedatchess.110mb.com/index.php?p=1_31

SCCT - Ponder OFF/ON (same conditions except for Ponder OFF/ON; see the rank differences):
http://sedatchess.110mb.com/index.php?p=1_66

Hope this helps...

Kind Regards,
Sedat
Parent - - By Arrière Pensée (Gold) Date 2012-02-12 19:33
CEGT's credibility has gone south of cheese!
Parent - - By lkaufman (*****) Date 2012-02-12 20:16
Why do you say this?
Parent - - By Werewolf (*****) [gb] Date 2012-02-12 20:19
Larry,
It's great to hear what you and Don are doing and the confidence you have in komodo. Are you expecting to release now in March?
Parent - - By lkaufman (*****) Date 2012-02-12 20:24
Don got MP to compile and now has to debug it. Whether that will take a day or a month I can't say. Probably we will release next version about a week after I get a debugged MP version.
Parent - - By Werewolf (*****) [gb] Date 2012-02-12 20:38

> Probably we will release next version about a week after I get a debugged MP version.


Ah ok, fair enough. I was under the impression from some comments at Talkchess that you were going to wait until after the Peter Skinner Tournament though.

By the way, when you did your recent opening book, how many cores did you have for the IDeA stuff? I've got a poxy 8 and it takes ages and I'm not letting it think for as long as you did.
Parent - By lkaufman (*****) Date 2012-02-12 21:27
I used my old octal for much of the book and my new twelve core for some of it. If you are using 8 cores for IDeA, one per position, you should get thru as many positions per day as I did on my Octal if you used the same time limit, regardless of CPU speed, as long as you keep it busy 24/7. Of course it took most of a year, and the moves in the tails of many long lines were added "on the fly", using Houdini MP since Komodo MP did not exist.
Parent - - By Werewolf (*****) [gb] Date 2012-02-12 20:20

> CEGT's credibility has gone south of cheese!


I love CEGT and I understand their decision but it needs a corrector of +100 elo I think.
Parent - By Arrière Pensée (Gold) Date 2012-02-12 21:39
It would also help if they updated their list.
- By rocket (***) [se] Date 2012-02-12 19:11
Houdini 1.5 pounded by a human here...

One thing I hate about the engine is that it's oblivious to kingside attacks and blockades, as you can see here, where it allows f5 against an amateur... when it must exchange on f4, as Rybka knows...

http://www.youtube.com/watch?v=gb6hqRBuwf8

There was another win against H2, which is not there anymore, where the same trick was used.