- By user923005 (****) [us] Date 2014-04-18 21:14 Edited 2014-04-18 21:22
The opening set is generated by this SQL query from my database:
Epd + ' ' +
dbo.opcode_format('acd', acd) +
dbo.opcode_format('acs', acs) +
dbo.opcode_format('cce', round(coef * 444.0,0))+
dbo.opcode_format('ce', ce) +
dbo.opcode_format('pm', pm) +
dbo.opcode_format('pv', pv) +
dbo.opcode_format('white_wins', white_wins) +
dbo.opcode_format('black_wins', black_wins) +
dbo.opcode_format('draws', draws) +
dbo.opcode_format('Opening', Opening)

WHERE games > 200 -- if nobody plays the position, how interesting is it? Not very.
AND (white_wins + black_wins) > draws * 1.5 -- Does anyone want to watch a match that goes draw, draw, draw, draw...? I know I don't.
AND len(Epd) >= 44 -- The position should be (within reason) an opening position
AND abs(coef) < 0.1 -- the actual outcomes should not be lopsided (see white_wins, black_wins and draws for better understanding of coef)
AND abs(ce) < 35 -- The analysis score should not be lopsided if we are trying to determine engine strength
AND abs(ce) > 1 -- no "hardwired" draws please. Some engines score draws as +/- 1. Don't ask me why.
AND len(pm) >= 30 -- There should be lots of alternative moves. Only one or two sensible choices means too many highly similar games
AND acd >= 30 -- The position must have been analyzed to at least 30 plies for the ce analysis to be fully trusted
AND acs >= 180 -- We must have analyzed the position for at least 3 minutes (40 moves in 2 hrs is 3 minutes per move on average)
AND NOT Opening is null -- This position should come from some known opening, rather than some off-the-wall starting point so that people can look up the theory on it
ORDER BY abs(coef) -- The most important factor in selection is the actual outcome in games played. Wins for white and black should be fairly close to the same number.
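
For anyone who wants to apply the same selection outside of SQL, here is a minimal sketch of the WHERE and ORDER BY logic in Python. The Position class, is_fair_opening and select_fair are names made up for illustration; only the field names and thresholds come from the query above. It assumes every numeric column is populated (in SQL a NULL in any of them would drop the row) and uses Python's len(), which unlike T-SQL's counts trailing spaces.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Position:
    # Hypothetical record type; fields mirror the columns used in the query.
    epd: str                # position in EPD form
    games: int              # number of games that reached the position
    white_wins: int
    black_wins: int
    draws: int
    coef: float             # outcome balance figure from the database
    ce: int                 # centipawn evaluation
    pm: str                 # predicted/alternative moves as text
    acd: int                # analysis depth in plies
    acs: int                # analysis time in seconds
    opening: Optional[str]  # opening name, None if unknown


def is_fair_opening(p: Position) -> bool:
    """Apply the same thresholds as the WHERE clause."""
    return (
        p.games > 200                                       # actually played
        and (p.white_wins + p.black_wins) > p.draws * 1.5   # not too drawish
        and len(p.epd) >= 44                                # still an opening position
        and abs(p.coef) < 0.1                               # outcomes not lopsided
        and 1 < abs(p.ce) < 35                              # eval balanced, no hardwired draws
        and len(p.pm) >= 30                                 # plenty of alternative moves
        and p.acd >= 30                                     # at least 30 plies deep
        and p.acs >= 180                                    # at least 3 minutes of analysis
        and p.opening is not None                           # comes from a known opening
    )


def select_fair(positions: Iterable[Position]) -> List[Position]:
    """Filter, then order by abs(coef) ascending, like the ORDER BY."""
    fair = [p for p in positions if is_fair_opening(p)]
    return sorted(fair, key=lambda p: abs(p.coef))
```

Calling select_fair(rows) on a list of such records returns the surviving positions with the most evenly balanced outcomes first.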

I should point out that the purpose of the file is neither to have a collection of winning openings nor a collection of drawing openings, but to have a set of openings that is extremely fair for engine analysis. I have found that the books used to determine engine strength, even at the major testing sites, contain thousands of positions that are already totally won or lost right at the point where you fall out of the book.
Attachment: UFO.EPD - Ultra fair openings EPD file (37k)
Attachment: UFO-DECORATED.EPD - UFO file decorated with analysis (211k)
Attachment: ultra.pgn - PGN version of the file; some of the transpositions may not be the most popular path to the position (102k)
