- - By ssace (**) Date 2010-02-06 01:38
I have Arena w/Stockfish quad x64, Rybka quad x64, and a few other engines.

I'm trying to find an introduction (reading, how-tos, etc.) on running engine vs. engine matches.  I know how to load engines and launch matches in Arena.
I have HS Mainbook and 3-4-5 endgame tablebases installed.  I'm not sure what I'm doing wrong, but every time I run matches, the engine ratings in ELOstat are never consistent and don't look right.  For example, I ran a Rybka vs. Stockfish tournament last night and ELOstat said Rybka's Elo was 2700 and Stockfish's was 2100.
I know these engines should be 3000+ on my machine.  Weird numbers like 2356, 2100, 2200 and 2700 don't seem right.

What is the best way to set things up to get the most Elo?  What are the best parameters for setting up and testing engine Elo?  How many games need to be played to see an Elo of 3000+?
Parent - - By Uly (Gold) Date 2010-02-06 01:56
ELOstat has no idea of the absolute strength of the engines; it can only tell you their relative strengths. If there's a 200 Elo difference between two engines, it makes no difference whether ELOstat shows "2300 and 2500", "2500 and 2700" or "3200 and 3400": they are equivalent, because what matters is the strength difference.
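To see why only the gap matters, here is a small sketch assuming the standard logistic Elo curve (I'm not claiming this is exactly the formula ELOstat uses internally). The expected score depends only on the difference between the two ratings, so adding the same constant to both changes nothing:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Three pairs with the same 200-point gap give exactly the same expectation.
for a, b in [(2500, 2300), (2700, 2500), (3400, 3200)]:
    print(a, b, round(expected_score(a, b), 3))
```

All three lines print the same expected score (roughly 0.76 for the stronger side), which is the point: "2300 vs 2500" and "3200 vs 3400" describe the same match.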

What the rating lists do is "calibrate" the rating output. Say (this is a made-up example) they decide that "Fritz 8 Bilbao should be at 2800 Elo"; they then run all the games between all the engines, and if at the end Fritz 8 Bilbao ends up at 2500 Elo, they add 300 to all the ratings.
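The calibration step described above is just a constant shift. A minimal sketch, reusing the made-up Fritz 8 Bilbao numbers; "Engine A" and "Engine B" and their ratings are hypothetical placeholders, not real list values:

```python
from __future__ import annotations


def calibrate(ratings: dict[str, float], anchor: str, anchor_target: float) -> dict[str, float]:
    """Shift every rating by the same offset so `anchor` lands on `anchor_target`."""
    offset = anchor_target - ratings[anchor]
    return {name: r + offset for name, r in ratings.items()}


# Hypothetical raw output from a rating tool (made-up numbers).
raw = {"Fritz 8 Bilbao": 2500.0, "Engine A": 2650.0, "Engine B": 2400.0}
print(calibrate(raw, "Fritz 8 Bilbao", 2800.0))
```

Every engine moves up by the same 300 points, so all the differences (and therefore all the predicted results) stay exactly as they were.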

That's how ratings work.

Anyway, make sure you're running your matches with Ponder/Permanent Brain OFF. You may also want to set Rybka 3 to 64 MB of RAM unless you monitor your memory to make sure there's no swapping, because Rybka is greedy and likes to use much more RAM than specified.
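For reference, "Hash" (in MB) and "Ponder" are standard option names in the UCI protocol, and a GUI like Arena normally sends them for you from its engine settings. This sketch only builds the command strings a UCI GUI would typically send for the settings suggested above. It is not Arena's own mechanism, and turning Permanent Brain off in the GUI is what actually stops pondering:

```python
from __future__ import annotations


def uci_setup_commands(hash_mb: int = 64, ponder: bool = False) -> list[str]:
    """Build UCI 'setoption' lines for hash size (MB) and pondering."""
    if hash_mb <= 0:
        raise ValueError("hash size must be positive")
    return [
        f"setoption name Hash value {hash_mb}",
        f"setoption name Ponder value {'true' if ponder else 'false'}",
    ]


for line in uci_setup_commands():
    print(line)
```

As noted above, Rybka may still use more memory than the Hash value, so keep an eye on swapping either way.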
Parent - - By dragon49 (****) Date 2010-02-06 03:19
When running ELOstat, it asks for a "start elo" number.  I understand the numbers are relative, but what is a good number to put there that would give engines ratings close to their currently accepted ones (assuming the PGN databases contain many games in which the top engines play the same number of games against each other)?
Parent - By Uly (Gold) Date 2010-02-06 05:43
There is no good number. If you don't mind looking at low ratings, 2500 should be fine; if you like looking at high ratings, you can tell it to use 3000. You can even start at 0 and get negative ratings.
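A simplified two-engine sketch of what the start value does, assuming it acts as the average rating of the pool (the real ELOstat fits all games in the database at once; this only handles one head-to-head match under the logistic model). The start value moves both ratings together, while the gap comes only from the score:

```python
from __future__ import annotations

import math


def match_ratings(score: float, games: int, start_elo: float) -> tuple[float, float]:
    """Ratings for A and B from A's total score, centred on start_elo."""
    p = score / games
    if not 0.0 < p < 1.0:
        raise ValueError("a 0% or 100% score has no finite Elo difference")
    diff = -400.0 * math.log10(1.0 / p - 1.0)  # rating of A minus rating of B
    return start_elo + diff / 2.0, start_elo - diff / 2.0


# Same hypothetical 75/100 result, three different start values.
for start in (2500, 3000, 0):
    a, b = match_ratings(75, 100, start)
    print(start, round(a, 1), round(b, 1))
```

The gap is about 191 points every time; with a start value of 0 the weaker engine simply ends up with a negative rating.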

You can calibrate them to match the CCRL rating list or another one, but that doesn't have any benefit.
