Rybka Chess Community Forum
The Rybka Lounge / Computer Chess / Does the human Elo system really work for chess engines?
- - By MrKris (****) Date 2021-06-07 20:05
I would say no: at similar Elo differences, the engine Win/Loss ratios are about 10 and 15 (** below), while the human W/L ratios are only slightly below and above 2 (* below).

Without further checking I would guess that human rating differences of about 200 to 300 would be needed to see W/L ratios as high as the engines here.
(Note: excluding colors-reversed game pairs that ended in two draws or in two wins for the same color would significantly increase the engine rating differences below.)
Also perhaps worth checking further: I think I read somewhere that the LOS (Likelihood of Superiority) depends heavily on wins and losses.
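
On the LOS point, here is a minimal Python sketch of the normal approximation that is commonly quoted for LOS. Treat it as an assumption rather than as the method any particular rating list uses; the function name `los` is just illustrative. In this approximation only wins and losses enter the formula, and draws drop out completely.

```python
import math


def los(wins: int, losses: int) -> float:
    """Approximate likelihood of superiority from decisive games only.

    Assumption: the common normal approximation
    LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))).
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, so no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


# Draw counts never enter the formula, so each head-to-head record
# from this post is passed as (wins, losses) only.
for label, w, l in [
    ("Carlsen vs Caruana", 25, 14),
    ("Carlsen vs Radjabov", 18, 8),
    ("Stockfish-family vs Dragon 2", 31, 3),
    ("Stockfish 210601/04 vs Dragon 2", 488, 33),
]:
    print(f"{label}: LOS = {los(w, l):.4f}")
```

These head-to-head values need not match an "Ave. LOS" figure taken from a rating list, which may be computed differently.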

Carlsen, Magnus (2847) vs Caruana, Fabiano (2820) = +27 Elo (June 2021 FIDE)
44.5/82: +25 -14 =43  =  54.27%  [52.44% draws]
Carlsen Win/Loss Ratio = 1.79  (1.79 wins per loss*)

41: Carlsen, Magnus white: +13 -5 =23  24.5/41 = 59.76% (white wins 32%)
41: Carlsen, Magnus black: +9 -12 =20  20.0/41 = 48.78% (black wins 22%)
                                                                          from a free online DB
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -

Carlsen, Magnus (2847) vs Radjabov, Teimour (2765) = +82 Elo (June 2021 FIDE)
32.0/54:  +18 -8 =28 = 59.26%  [51.85% draws]
Carlsen Win/Loss Ratio = 2.25  (2.25 wins per loss*)

28: Carlsen, Magnus white: +12 -4 =12  18.0/28 = 64.29% (white wins 43%)
26: Carlsen, Magnus black:  +6 -4 =16  14.0/26 = 53.85% (black wins 23%)


Stockfish-family (ave. 3496) vs Ko. Dragon 2 (3467) = +29 Elo (CCRL 40/15 5june2021)
118.0/208: +31 -3 =174 = 56.73%  [83.65% draws]  Ave. LOS = 96.35%
Stockfish-family Win/Loss Ratio = 10.33  (10.33 wins per loss**)

-          -           -            -             -              -            -

Stockfish 210601/04 (ave. 3734) vs Ko. Dragon 2 (3651) = +83 Elo (6june2021)
1227.5/2000: +488 -33 =1479  = 61.38%  [73.95% draws]
Stockfish 210601/04 Win/Loss Ratio = 14.79  (14.79 wins per loss**)

(Of course, on a side note, all engines fail the Turing Test abysmally with their near-zero black win rate.
They also strikingly fail with their high draw rate; Pohl and Noomen have been working on that, but engines cannot prepare for opponents like humans can.)
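
For anyone who wants to recheck the arithmetic in the records above, here is a minimal Python sketch. The function name `summarize` and the dictionary keys are made up for illustration. The `implied_elo_diff` field assumes the conventional logistic Elo curve with a 400-point scale, which is not necessarily the model FIDE or CCRL actually fit.

```python
import math


def summarize(wins: int, losses: int, draws: int) -> dict:
    """Basic head-to-head statistics from a win/loss/draw record."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("need at least one game")
    score = (wins + 0.5 * draws) / games  # a draw counts as half a point
    if 0.0 < score < 1.0:
        # Assumption: logistic Elo curve with a 400-point scale.
        implied = 400.0 * math.log10(score / (1.0 - score))
    else:
        # A 0% or 100% score has no finite Elo equivalent.
        implied = math.copysign(math.inf, score - 0.5)
    return {
        "games": games,
        "score_pct": 100.0 * score,
        "draw_pct": 100.0 * draws / games,
        "wl_ratio": wins / losses if losses else math.inf,
        "implied_elo_diff": implied,
    }


# (wins, losses, draws) as quoted in this post
records = {
    "Carlsen vs Radjabov": (18, 8, 28),
    "Stockfish-family vs Dragon 2": (31, 3, 174),
    "Stockfish 210601/04 vs Dragon 2": (488, 33, 1479),
}
for label, wld in records.items():
    s = summarize(*wld)
    print(
        f"{label}: {s['games']} games, {s['score_pct']:.2f}% score, "
        f"{s['draw_pct']:.2f}% draws, W/L {s['wl_ratio']:.2f}, "
        f"implied diff {s['implied_elo_diff']:+.0f}"
    )
```

The W/L ratio ignores draws entirely, while the score percentage counts each draw as half a point, so the two figures can move very differently when the draw rate is high.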
Parent - - By rocket (****) Date 2021-06-21 14:36 Upvotes 1
It doesn't. Engines have different Elos depending on position. Kasparov lost to Chess Genius due to it being a completely concrete position with just number crunching. The problem with trying to take engines into endgames is that endgames are both a weakness (depth) and a strength (number crunching) for them.
Parent - By MrKris (****) Date 2021-06-21 20:53

> It doesn't. Engines have different elos depending on position.

And engines never vie with each other from the initial position; they both just get the (same, of course) book end to start with.

Mini-Match: Leela vs Dragon, Muzio Gambit (10|3)
All games 1.e4 e5 2.f4 exf4 3.Nf3 g5 4.Bc4 g4 5.O-O
Lc0 56.5/94
Dragon 37.5/94

The Hillbilly Attack (10|2) - in progress
All games 1.e4 c6 2.Bc4
Lc0 75.5/126
Dragon 73.0/126
(Stockfish 69.0/126
Stoofvlees 69.0/126)
Parent - - By user923005 (****) Date 2021-06-21 19:36
You cannot project Elo figures from one pool of contestants to another pool.  The actual numbers are totally arbitrary.  The only meaningful data from a list of Elo figures is the distance between opponents, and then it only matters for people/engines in the same player pool.  You cannot compare Fischer's FIDE Elo to Komodo's CEGT Elo because they do not have the same meaning.  This is something of a problem because the Elo figures look similar.

Does the algorithm work on machines?  Sure, exactly the same as it works on people, which is just OK.  It is well known that it is somewhat unpredictable for large Elo differences.

The Elo for CCRL or CEGT has nothing to do with the Elo for FIDE or USCF, except that we know players with a bigger Elo number in a pool are more likely to win points than players with a lower Elo.

Even with USCF and FIDE and BCF ratings, the values do not mean exactly the same thing.  But with those at least we have enough cross-pollination to make a pretty good stab at a conversion factor.

With something like CCRL, we do not know how those Elo values compare to FIDE Elo values in a measured way.
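
To illustrate why only the distance between two ratings carries information, here is a small Python sketch. It assumes the conventional logistic expected-score formula with a 400-point scale; the function name `expected_score` is illustrative, and individual rating lists may fit a different model.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B.

    Assumption: conventional logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# The same 83-point gap at two different absolute levels. The second pair
# is a hypothetical pool anchored 1000 points lower.
print(expected_score(3734, 3651))
print(expected_score(2734, 2651))
```

Both calls return the same value, because the formula only ever sees the difference between the two ratings. The absolute level is set by however the pool happens to be anchored.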
Parent - - By rocket (****) Date 2021-06-23 14:51
Elo varies for humans depending on position too, but take default Stockfish: it has a 1500-Elo understanding of closed strategy.  It does not understand blockades.

Positionally, however, it is in the 3000 realm these days. Although it's hard to know where positional knowledge ends and search begins.
Parent - - By MarshallArts (***) Date 2021-07-15 02:41
Maybe 6-7 years ago that was true, but no longer today - or have you missed the intervening developments?
Parent - - By MrKris (****) Date 2021-07-15 03:21

> - or have you missed the intervening developments?

Or have you missed the fact that "intervening developments" have only upped the win rate of the top engines vs. the rest, the relatively flawed engines?
And, that they have added virtually nothing related to chess skill, just more quasi-chess "1500 on multi-GHz and multi-GB RAM"?

8/5bb1/N1r2k2/3p1p1p/p1pPpP2/PpP1P2P/1P5P/6RK w - - 0 1

It's still the same order:

1. Stockfish/strong engines
2. FIDE chess players


1. human composer-level chess problemists (many deceased decades before any computers)
2. Stockfish/strong engines
Parent - - By MarshallArts (***) Date 2021-07-15 23:21
Even the top engines used to be almost hopeless in closed positions. The improvement in that area has been very tangible as far as I can see. I call it the 'closed game barrier', and I think it has been crossed. All that remains perhaps is the 'fortress barrier', and maybe also the 'risk-barrier' (see below).

Saying they added close to nothing to chess skill is to ignore a whole lot of real progress. Of course, one can always produce isolated examples of dumb engine play. I like to watch all the games when I'm testing, and also test select openings I'm familiar with, and I can only conclude that the level and quality of top engine chess is so much higher than 10 years ago.

There is one more (new) problem that the top NNUE engines now have, however, and that is a tendency to play too safely with black, resulting in too many draws against weaker opponents. This is a clear problem that hasn't been overcome yet.
Parent - - By MrKris (****) Date 2021-07-16 00:01
When you posted your 1 sentence here:

I was probably making my attempt (not to say I succeeded) at something interesting for readers posted 16 minutes later:
Parent - - By MarshallArts (***) Date 2021-07-16 02:14
Yes, strong engines can still make mistakes or strange moves, sometimes inexplicably so. They are not perfect or else they would never lose (except from busted openings, of course).

By the way, my one liner was not aimed at you, as it's plain to see. :grin: I know you post a lot of interesting games/positions, and I salute that.
Parent - By MrKris (****) Date 2021-07-16 07:25
Parent - By Vegan (****) Date 2021-07-14 00:14
The only way to estimate the Elo of a human and a computer is to consider the performance of the computer, etc.

Faster computers have improved the ability to play chess somewhat.