Rybka Chess Community Forum
Up Topic The Rybka Lounge / Test Positions / Tough Tactical Test 2
- - By dorszcz (**) Date 2021-02-20 18:19 Upvotes 1
The second edition of the Tough Tactical Test, with 100 varied positions for testing engines. Thanks to Dann Corbit for all his help.

Conditions:
Time limit: 15 seconds per position
CPU: i5-3570 (MP engines: 4x3600 MHz, SP engines: 1x3800MHz)
Tablebases: 6-piece Syzygy, 5-piece Gaviota, 5-piece (and ~80 GB of the most popular 6-piece) Nalimov, all on SSD
The strongest build of each engine is used: 64-bit, POPCNT, SSE4.2 etc.
256 MB hash, no books, no learning, 50-move rule enabled

Results:
80 out of 100 = 80%  -  Black Diamond XI-NN (nn-62ef826d1a6d; Tactical=2)
68 out of 100 = 68%  -  Black Diamond XI-NN (nn-62ef826d1a6d)
64 out of 100 = 64%  -  Crystal 190121
64 out of 100 = 64%  -  Bluefish XI-LP FD (Tactical=2; defensive=off)
52 out of 100 = 52%  -  SF NNUE halfkp-256 090720 (1403)
51 out of 100 = 51%  -  Bluefish XI-LP FD
44 out of 100 = 44%  -  Stockfish 110221
23 out of 100 = 23%  -  Eman 4.00
23 out of 100 = 23%  -  Black Diamond XR7
21 out of 100 = 21%  -  CorChess 3.1 260819
20 out of 100 = 20%  -  MateFinder 260819
18 out of 100 = 18%  -  Crystal 260819
15 out of 100 = 15%  -  Crystal-Honey X5i
14 out of 100 = 14%  -  Honey X5i
13 out of 100 = 13%  -  Bluefish FD 100919
13 out of 100 = 13%  -  AsmFishWCP_2019-07-23
12 out of 100 = 12%  -  SugaR-NN 130819
11 out of 100 = 11%  -  Cfish 240719
10 out of 100 = 10%  -  Komodo 11.01
8 out of 100 = 8%  -  Sting 18
8 out of 100 = 8%  -  Stockfish 10
7 out of 100 = 7%  -  Sting 14
5 out of 100 = 5%  -  Andscacs 0.95
4 out of 100 = 4%  -  Spark-1.0
4 out of 100 = 4%  -  Deep Rybka 4
3 out of 100 = 3%  -  IvanHoe 9.46b
2 out of 100 = 2%  -  Stockfish 5
2 out of 100 = 2%  -  Critter 1.6a
2 out of 100 = 2%  -  Equinox 3.30
2 out of 100 = 2%  -  Sting SF 3 VE
2 out of 100 = 2%  -  Xiphos 0.5.6 SSE
2 out of 100 = 2%  -  Houdini 1.5a
2 out of 100 = 2%  -  Rybka WinFinder 2.2
1 out of 100 = 1%  -  Rybka 3 Dynamic
1 out of 100 = 1%  -  Naum 4.6
1 out of 100 = 1%  -  Xiphos 0.5.3
1 out of 100 = 1%  -  Nirvanachess 2.4
1 out of 100 = 1%  -  Xiphos 0.6
1 out of 100 = 1%  -  RubiChess 1.6
1 out of 100 = 1%  -  Wasp 3.75
1 out of 100 = 1%  -  Fire 2.2+ xTreme GH
1 out of 100 = 1%  -  Spike 1.4
1 out of 100 = 1%  -  Arasan 21.3
1 out of 100 = 1%  -  Texel 1.08a11
1 out of 100 = 1%  -  Marvin 3.4.0
1 out of 100 = 1%  -  Booot 6.3.1
0 out of 100 = 0%  -  SmarThink 1.98 (1CPU)
0 out of 100 = 0%  -  Alfil 13.1
0 out of 100 = 0%  -  Gull 3
0 out of 100 = 0%  -  Nemorino_5.00
0 out of 100 = 0%  -  RubiChess 1.5
0 out of 100 = 0%  -  Fire 7.1
0 out of 100 = 0%  -  Wasp 3.60
0 out of 100 = 0%  -  rofChade 2.2
0 out of 100 = 0%  -  Topple 0.7.3
0 out of 100 = 0%  -  Pedone 1.9
0 out of 100 = 0%  -  Rodent III 0.273
0 out of 100 = 0%  -  Protector 1.9.0
0 out of 100 = 0%  -  Fritz 11 SE (1CPU)
0 out of 100 = 0%  -  Laser 1.7
0 out of 100 = 0%  -  Senpai 2.0
0 out of 100 = 0%  -  ICE 3.0 (1CPU)
0 out of 100 = 0%  -  Amoeba 3.0
0 out of 100 = 0%  -  Bobcat 3.25
0 out of 100 = 0%  -  RofChade 2.1
0 out of 100 = 0%  -  Deep iCE 4.0.853

Download: https://www.mediafire.com/file/ehjqnb6jbnwtpin/Tough_Tactical_Test_2_20.02.2021.zip/file
Parent - - By Ray (****) Date 2021-02-21 03:57
Hi, apologies for being a noob at this. Despite being in computer chess for well over 10 years, I have never run test suites like this. How is it done? Through a GUI? I have Shredder GUI (SMK) plus a selection of the freely available ones.
Parent - - By user923005 (****) Date 2021-02-21 06:03
It is possible to perform analysis without a GUI, but it is much more commonly done using a GUI.
Pretty much every modern chess GUI has the ability to process an EPD test suite.
That includes free GUIs like Arena and professional GUIs like ChessAssistant.
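
For a sense of what a GUI does with such a suite, here is a minimal sketch in Python. It assumes each EPD line holds the four position fields followed by semicolon-terminated operations such as `bm` (best move) and `am` (avoid move), and that the engine's chosen moves have already been converted to the same SAN notation the suite uses. The names `parse_epd_line`, `is_solved` and `score` are made up for illustration, and the sample position is a textbook Scholar's Mate, not one taken from this test.

```python
from dataclasses import dataclass, field


@dataclass
class EpdRecord:
    position: str                             # board, side to move, castling, en passant
    ops: dict = field(default_factory=dict)   # opcode -> list of operands


def parse_epd_line(line: str) -> EpdRecord:
    """Split one EPD line into its position fields and its operations.

    Simplification: assumes quoted operands contain no semicolons.
    """
    parts = line.strip().split(None, 4)
    position = " ".join(parts[:4])
    ops = {}
    if len(parts) == 5:
        for chunk in parts[4].split(";"):
            chunk = chunk.strip()
            if not chunk:
                continue
            opcode, _, operand = chunk.partition(" ")
            operand = operand.strip()
            if operand.startswith('"'):
                ops[opcode] = [operand.strip('"')]   # quoted string, e.g. id
            else:
                ops[opcode] = operand.split()        # move list, e.g. bm / am
    return EpdRecord(position, ops)


def is_solved(record: EpdRecord, engine_move: str) -> bool:
    """A position counts as solved if the move is a 'bm' and not an 'am'."""
    if engine_move in record.ops.get("am", []):
        return False
    if "bm" in record.ops:
        return engine_move in record.ops["bm"]
    return "am" in record.ops  # only 'am' given: avoiding it is enough


def score(records, engine_moves):
    """Return (solved, total, percent) for a whole suite."""
    solved = sum(is_solved(r, m) for r, m in zip(records, engine_moves))
    total = len(records)
    percent = (100.0 * solved / total) if total else 0.0
    return solved, total, percent


if __name__ == "__main__":
    demo = ('r1bqkb1r/pppp1ppp/2n2n2/4p2Q/2B1P3/8/PPPP1PPP/RNB1K1NR '
            'w KQkq - bm Qxf7#; id "demo.1";')
    rec = parse_epd_line(demo)
    print(score([rec], ["Qxf7#"]))
```

A real GUI additionally drives the engine for the configured time per position and records the solution times; the sketch only covers the bookkeeping.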

From the Shredder Manual:
"Analyse Positions: Shredder can automatically analyze all positions in an EPD, PGN or CBF file. For every position one or more solutions can be defined. When the analysis is completed Shredder displays a statistics with the solution times for all positions. Also two files with the statistics will be generated in the Shredder directory. One is in EPD format and one in CSV format which can be directly imported in many spread sheet programs for further examination."
Parent - By Ray (****) Date 2021-02-21 06:20
Thanks, I'll try it.
Parent - - By Carl Bicknell (*****) Date 2021-02-21 14:13
Thanks.

This isn't quite what you intended, but I wanted to test this suite at a longer time control and on faster hardware.

Ryzen 3950X
30 Threads used for engines.
2 minutes per position.

Stockfish 13
88/100 solved
Average solve time = 12.83s

Black Diamond 13 (Tactical = 2)
90/100 solved
Average solve time = 9.90s

I'll now try Houdini 6 Tactical
Parent - - By dorszcz (**) Date 2021-02-21 14:58
You should try the same version of Black Diamond I used. It will solve about 95-96 positions with your conditions.
Parent - - By Carl Bicknell (*****) Date 2021-02-21 18:22
Do you have a download link for that engine? I found it, but it only gets 10%, so something is wrong. Just wondering if you have a better link, please.

Also, do you set tactical=2?
Parent - - By dorszcz (**) Date 2021-02-21 18:44
https://github.com/MichaelB7/Stockfish/releases/download/NN/Windows-AVX2-NN.zip
After unzipping, create a folder named eval in the engines directory and put this net inside: https://tests.stockfishchess.org/api/nn/nn-62ef826d1a6d.nnue
When you set up the UCI engine, replace nn.bin with nn-62ef826d1a6d.nnue in the engine settings and set tactical=2 for the best results in test suites.
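
If you prefer to set this outside a GUI, the same settings can be sent as plain UCI commands. The sketch below only builds the command strings; the `setoption name ... value ...` syntax is standard UCI, but the option names `EvalFile` and `Tactical` and the relative path are assumptions on my part, so check the list the engine prints after the `uci` command for the exact spellings. The helper name `uci_setoption` is made up for illustration.

```python
def uci_setoption(name, value):
    """Format one standard UCI 'setoption' command."""
    return f"setoption name {name} value {value}"


# Assumed option names; verify them against the engine's 'uci' output.
commands = [
    uci_setoption("Hash", 256),                               # 256 MB hash, as in the test conditions
    uci_setoption("EvalFile", "eval/nn-62ef826d1a6d.nnue"),   # assumed name and path for the net
    uci_setoption("Tactical", 2),                             # assumed name for the tactical setting
]

if __name__ == "__main__":
    for command in commands:
        print(command)
```

Each printed line can be pasted into the engine's console window after it has answered `uciok`.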
Parent - By Carl Bicknell (*****) Date 2021-02-21 19:22
Thanks.

Something is very wrong because it's coming up with nonsensical moves and very weak play. I'm wondering if this version has become corrupted somehow.
Parent - - By Carl Bicknell (*****) Date 2021-02-23 19:24
Under the conditions I quoted above, Black Diamond XI with your net gets:

93 / 100
average solve time = 5.82s

This is incredibly fast and I'm wondering what's so special about that net.

This test suite is fascinating and very well made. However, I wouldn't call it a "tactical" test suite in the traditional sense, since those suites usually focused on sacrifices leading to mate or some other obvious advantage. A lot of the positions in this test suite seem to be about fortresses or an advanced understanding of blocked positions that the older engines don't have.
Parent - - By Carl Bicknell (*****) Date 2021-02-24 11:01
And the same test again, same engine, everything the same, except with 16 threads on my 3950X:

97/100 solved
Average solve time = 6.79s

I just don't get this.
Parent - - By dorszcz (**) Date 2021-02-24 14:54
Black Diamond XI-NN seems to prefer only physical cores.
By the way, nice result. I estimated 95-96.
Parent - By Carl Bicknell (*****) Date 2021-02-24 15:19

> Black Diamond XI-NN seems to prefer only physical cores.


That does seem strange though, because even with Intel's inferior Hyper-Threading, modern engines show a slight improvement. With AMD's latest Ryzen it should be clear-cut, especially as Black Diamond is based so much on Stockfish.
Parent - - By Vinvin (***) Date 2021-02-21 19:19
Thanks for this new set, dorszcz!!!
Parent - By dorszcz (**) Date 2021-02-21 21:00
Have fun!