The second edition of the Tough Tactical Test: 100 varied positions for testing engines. Thanks to Dann Corbit for all his help.
Conditions:
Time limit: 15 seconds per position
CPU: i5-3570 (MP engines: 4x3600 MHz, SP engines: 1x3800 MHz)
Tablebases: 6-piece Syzygy, 5-piece Gaviota, and 5-piece Nalimov (plus ~80 GB of the most popular 6-piece files), all on SSD
The strongest build of each engine is used: 64-bit, POPCNT, SSE4.2, etc.
256 MB hash, no books, no learning, 50-move rule enabled
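Whatever GUI runs the suite, it ultimately drives each engine over the UCI protocol. Below is a minimal sketch of the commands it might send for one position under the conditions above. It assumes the Stockfish-style option names `Hash` and `Threads`. Other engines may name these differently, and tablebase paths are left out.

```python
# Sketch only: the UCI lines a GUI might send to test one position under the
# conditions above. "Hash" and "Threads" are Stockfish-style option names;
# other engines may differ.

def uci_commands(fen: str, hash_mb: int = 256, threads: int = 4,
                 movetime_ms: int = 15_000) -> list[str]:
    """Return the UCI commands for one test position, in sending order."""
    return [
        "uci",                                      # handshake; engine replies "uciok"
        f"setoption name Hash value {hash_mb}",     # 256 MB hash
        f"setoption name Threads value {threads}",  # 4 for MP engines, 1 for SP
        "ucinewgame",                               # reset engine state between positions
        "isready",                                  # wait for "readyok"
        f"position fen {fen}",
        f"go movetime {movetime_ms}",               # fixed 15 s per position
    ]


if __name__ == "__main__":
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    for command in uci_commands(start):
        print(command)
```

EPD records carry only the first four FEN fields. Some engines expect all six, so it may be safer to append move counters such as `0 1` before sending the position.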
Results:
80 out of 100 = 80% - Black Diamond XI-NN (nn-62ef826d1a6d; Tactical=2)
68 out of 100 = 68% - Black Diamond XI-NN (nn-62ef826d1a6d)
64 out of 100 = 64% - Crystal 190121
64 out of 100 = 64% - Bluefish XI-LP FD (Tactical=2; defensive=off)
52 out of 100 = 52% - SF NNUE halfkp-256 090720 (1403)
51 out of 100 = 51% - Bluefish XI-LP FD
44 out of 100 = 44% - Stockfish 110221
23 out of 100 = 23% - Eman 4.00
23 out of 100 = 23% - Black Diamond XR7
21 out of 100 = 21% - CorChess 3.1 260819
20 out of 100 = 20% - MateFinder 260819
18 out of 100 = 18% - Crystal 260819
15 out of 100 = 15% - Crystal-Honey X5i
14 out of 100 = 14% - Honey X5i
13 out of 100 = 13% - Bluefish FD 100919
13 out of 100 = 13% - AsmFishWCP_2019-07-23
12 out of 100 = 12% - SugaR-NN 130819
11 out of 100 = 11% - Cfish 240719
10 out of 100 = 10% - Komodo 11.01
8 out of 100 = 8% - Sting 18
8 out of 100 = 8% - Stockfish 10
7 out of 100 = 7% - Sting 14
5 out of 100 = 5% - Andscacs 0.95
4 out of 100 = 4% - Spark-1.0
4 out of 100 = 4% - Deep Rybka 4
3 out of 100 = 3% - IvanHoe 9.46b
2 out of 100 = 2% - Stockfish 5
2 out of 100 = 2% - Critter 1.6a
2 out of 100 = 2% - Equinox 3.30
2 out of 100 = 2% - Sting SF 3 VE
2 out of 100 = 2% - Xiphos 0.5.6 SSE
2 out of 100 = 2% - Houdini 1.5a
2 out of 100 = 2% - Rybka WinFinder 2.2
1 out of 100 = 1% - Rybka 3 Dynamic
1 out of 100 = 1% - Naum 4.6
1 out of 100 = 1% - Xiphos 0.5.3
1 out of 100 = 1% - Nirvanachess 2.4
1 out of 100 = 1% - Xiphos 0.6
1 out of 100 = 1% - RubiChess 1.6
1 out of 100 = 1% - Wasp 3.75
1 out of 100 = 1% - Fire 2.2+ xTreme GH
1 out of 100 = 1% - Spike 1.4
1 out of 100 = 1% - Arasan 21.3
1 out of 100 = 1% - Texel 1.08a11
1 out of 100 = 1% - Marvin 3.4.0
1 out of 100 = 1% - Booot 6.3.1
0 out of 100 = 0% - SmarThink 1.98 (1CPU)
0 out of 100 = 0% - Alfil 13.1
0 out of 100 = 0% - Gull 3
0 out of 100 = 0% - Nemorino_5.00
0 out of 100 = 0% - RubiChess 1.5
0 out of 100 = 0% - Fire 7.1
0 out of 100 = 0% - Wasp 3.60
0 out of 100 = 0% - rofChade 2.2
0 out of 100 = 0% - Topple 0.7.3
0 out of 100 = 0% - Pedone 1.9
0 out of 100 = 0% - Rodent III 0.273
0 out of 100 = 0% - Protector 1.9.0
0 out of 100 = 0% - Fritz 11 SE (1CPU)
0 out of 100 = 0% - Laser 1.7
0 out of 100 = 0% - Senpai 2.0
0 out of 100 = 0% - ICE 3.0 (1CPU)
0 out of 100 = 0% - Amoeba 3.0
0 out of 100 = 0% - Bobcat 3.25
0 out of 100 = 0% - RofChade 2.1
0 out of 100 = 0% - Deep iCE 4.0.853
Download: https://www.mediafire.com/file/ehjqnb6jbnwtpin/Tough_Tactical_Test_2_20.02.2021.zip/file
Hi, apologies for being a noob at this. Despite being into computer chess for well over 10 years, I have never run test suites like this. How is it done? Through a GUI? I have the Shredder GUI (SMK) plus a selection of the freely available ones.
It is possible to perform analysis without a GUI, but it is much more commonly done using a GUI.
Pretty much every modern chess GUI has the ability to process an EPD test suite.
That includes free GUIs like Arena and professional GUIs like ChessAssistant.
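To make the GUI's job concrete: each line of an EPD suite holds the first four FEN fields, followed by semicolon-terminated operations such as `bm` (best move) and `am` (avoid move). One or more solution moves may be listed. Below is a minimal sketch of parsing such a line and scoring an engine's answer. It assumes the move has already been converted to SAN; engines report moves in UCI coordinate notation (e.g. `d1d8`), so a real tool converts them first. It also assumes no quoted operand contains a semicolon. The sample position and `id` are made up for illustration.

```python
import shlex


def parse_epd(line: str) -> tuple[str, dict[str, list[str]]]:
    """Split an EPD record into its four FEN fields and its opcodes."""
    fields = line.strip().split(maxsplit=4)
    if len(fields) < 4:
        raise ValueError("an EPD record needs four position fields")
    position = " ".join(fields[:4])       # placement, side, castling, e.p.
    ops: dict[str, list[str]] = {}
    rest = fields[4] if len(fields) == 5 else ""
    for op in rest.split(";"):            # operations end with ';'
        tokens = shlex.split(op)          # keeps quoted operands together
        if tokens:
            ops[tokens[0]] = tokens[1:]   # e.g. "bm" -> ["Rd8#"]
    return position, ops


def is_solved(move_san: str, ops: dict[str, list[str]]) -> bool:
    """True if the move matches a 'bm' solution and avoids every 'am' move."""
    def norm(m: str) -> str:
        return m.rstrip("+#")             # ignore check/mate suffixes

    move = norm(move_san)
    best = {norm(m) for m in ops.get("bm", [])}
    avoid = {norm(m) for m in ops.get("am", [])}
    if best and move not in best:
        return False
    if avoid and move in avoid:
        return False
    return bool(best or avoid)            # no solution defined -> not scored


if __name__ == "__main__":
    pos, ops = parse_epd('6k1/5ppp/8/8/8/8/5PPP/3R2K1 w - - bm Rd8#; id "sample.001";')
    print(pos, ops, is_solved("Rd8#", ops))
```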
From the Shredder Manual:
"Analyse Positions Shredder can automatically analyze all positions in an EPD, PGN or CBF file. For every position one or more solutions can be defined. When the analysis is completed Shredder displays a statistics with the solution times for all positions. Also two files with the statistics will be generated in the Shredder directory. One is in EPD format and one in CSV format which can be directly imported in many spread sheet programs for further examination."
Thanks, I'll try it.
Thanks.
This isn't quite what you intended but I wanted to test this suite at longer TC and faster hardware.
Ryzen 3950X
30 threads used for the engines.
2 minutes per position.
Stockfish 13
88/100 solved
Average solve time = 12.83s
Black Diamond 13 (Tactical = 2)
90/100 solved
Average solve time = 9.90s
I'll now try Houdini 6 Tactical.
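A "solved / average solve time" summary like the one above can be reproduced from per-position results. Below is a small sketch. It assumes the average is taken over solved positions only. The thread doesn't say how the GUI defines it, and charging unsolved positions the full 2 minutes would give a different figure. The sample times are invented.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize(solve_times: Sequence[Optional[float]]) -> tuple[int, int, float]:
    """Return (solved, total, mean solve time over solved positions only)."""
    solved = [t for t in solve_times if t is not None]  # None = not solved
    average = mean(solved) if solved else float("nan")
    return len(solved), len(solve_times), average


if __name__ == "__main__":
    # Hypothetical per-position times in seconds; None means unsolved.
    times = [3.0, None, 9.0, 12.0]
    solved, total, average = summarize(times)
    print(f"{solved}/{total} solved")
    print(f"Average solve time = {average:.2f}s")
```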
You should try the same version of Black Diamond I used. It will solve about 95-96 positions with your conditions.
Do you have a download link for that engine? I found one, but it only gets 10%, so something is wrong. I was wondering if you have a better link, please.
Also, do you set Tactical = 2?
https://github.com/MichaelB7/Stockfish/releases/download/NN/Windows-AVX2-NN.zip
After unzipping, create a folder named eval in the engine's directory and put this net inside it: https://tests.stockfishchess.org/api/nn/nn-62ef826d1a6d.nnue
When you create the UCI engine, replace nn.bin with nn-62ef826d1a6d.nnue in the engine settings and set Tactical=2 for the best results in test suites.
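In UCI terms, these GUI steps amount to two `setoption` commands. Below is a minimal sketch. It assumes the net option is called `EvalFile`, as in Stockfish. The post only says to replace `nn.bin` in the engine settings, so check the engine's `uci` output for the actual name. `Tactical` is the option named in the post.

```python
def setoption_lines(options: dict[str, object]) -> list[str]:
    """Format one UCI 'setoption' command per engine option."""
    return [f"setoption name {name} value {value}" for name, value in options.items()]


# Option names are assumptions: "Tactical" is named in the post, while
# "EvalFile" is Stockfish's NNUE option and may differ in Black Diamond.
black_diamond_options = {
    "EvalFile": "nn-62ef826d1a6d.nnue",  # replaces the default nn.bin
    "Tactical": 2,                       # recommended for test suites
}

if __name__ == "__main__":
    for line in setoption_lines(black_diamond_options):
        print(line)
```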
Thanks.
Something is very wrong: it's coming up with nonsensical moves and very weak play. I'm wondering if this version has somehow become corrupted.
Under the conditions quoted above by me, Black Diamond XI with your net gets:
93 / 100
average solve time = 5.82s
This is incredibly fast, and I'm wondering what's so special about that net.
This test suite is fascinating and very well made. However, I wouldn't call it a "tactical" test suite in the traditional sense, since those test suites usually focused on sacrifices leading to mate or some other obvious advantage. A lot of the positions in this test suite seem to be about fortresses or an advanced understanding of blocked positions that older engines lack.
And the same test again, same engine, everything the same, except with 16 threads on my 3950X:
97/100
Average solve time = 6.79s
I just don't get this.
Black Diamond XI-NN seems to prefer only physical cores.
By the way, nice result. I estimated 95-96.
> Black Diamond XI-NN seems to prefer only physical cores.
That does seem strange, though, because even with Intel's inferior hyper-threading, modern engines show a slight improvement. With AMD's latest Ryzen it should be clear-cut, especially as Black Diamond is based so heavily on Stockfish.
Thanks for this new set, dorszcz!!!
Have fun!