Rybka Chess Community Forum
Topic: The Rybka Lounge / Computer Chess / Houdini 20 Kayra (K112), +54 elo
- - By eren1921 (**) Date 2012-09-14 20:56
Time Control: 2'+2'', Gui: Arena 3.0,
Cpu: intel t4300, 2.1 ghz, 2 cpu
Os: Windows 7 64 bit, 2 gb ram
Opening Book (Perfect 2012 book),
2 cpu, No ht, Tablebase off, Ponder off, Hash: 128

Houdini 20 Kayra - Houdini 20b Pro x64     : 200 (+ 68,= 89,- 43), 56.3 %
Houdini 20 Kayra - Houdini 20 T3 Pro x64   : 200 (+ 57,= 99,- 44), 53.3 %
Houdini 20 Kayra - Houdini 1.5a x64        : 200 (+ 61,= 95,- 44), 54.3 %
Houdini 20 Kayra - Critter 1.6 x64         : 150 (+ 53,= 73,- 24), 59.7 %
Houdini 20 Kayra - Stockfish VE09 x64      :  50 (+ 22,= 18,- 10), 61.5 %
Houdini 20 Kayra - Stockfish 222 ja x64    :  50 (+ 24,= 21,-  5), 69.0 %
Houdini 20 Kayra - Ivanhoe B47f0.2 x64     :  50 (+ 22,= 23,-  5), 67.0 %
Houdini 20 Kayra - Houdini 20 Z Pro x64    : 220 (+ 51,=124,- 45), 51.4 %

All engines used 2 CPUs and were x64 versions.
Kayra, T3 and Z are tuned versions of Houdini 20b Pro x64. Only one game ran at a time, and no other programs were running during the tournaments.

These results (+54 Elo) are valid only for my hardware and test setup.
I don't know how it performs on stronger hardware or at longer time controls.
Maybe it's weaker than Houdini 2.0 at different time controls or on different hardware (who knows).

Its performance against Houdini 2.0b Pro was even more impressive at a 30"+0.5" time control (Intel T8100, 2.1 GHz, 3 GB RAM, 2 CPUs), but I became suspicious of that time control. I didn't believe any setting could make such a large Elo difference, so I stopped the 30"+0.5" test after 1900 games.

Here are the pgn files
https://hotfile.com/dl/172113264/5653756/MM2.rar.html

https://hotfile.com/dl/172112804/bcb19cd/T100.rar.html
Parent - - By Geomusic (*****) Date 2012-09-14 21:31 Edited 2012-09-14 21:34
Good start, Eren. Now give us 1500 games apiece instead of 200 (or, for some, only 50?!). Also make sure to delete all duplicate and flagged games.

This is just speculation, of course, but if it's still +54 Elo I'm sure Stonehenge would hire you as his full-time assistant. LOL
Parent - - By eren1921 (**) Date 2012-09-14 22:14 Edited 2012-09-14 23:15
No, I don't think the real difference is as large as this (because of hardware, time control, error bars, ...). I have tested over one hundred tuned versions this year, and most of them got good results against Houdini 20 and 1.5a. Seven of them beat all the Houdini versions (2.0, 1.5a, T3, Z, Baracuda) over at least 200-300 games each at a 1'+1'' time control.
Even a 1 Elo point increase is enough to make me happy.
Parent - - By Geomusic (*****) Date 2012-09-15 04:40
You don't understand: 200 games gives an Elo variance of about 80 Elo... You need at least 5 times that many games to get a reliable figure.
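For readers who want to check the size of the error bars on a 200-game match, here is a minimal sketch. It assumes the usual logistic Elo formula and a normal approximation for the 95 % interval; the function names are made up for this illustration and are not taken from any rating tool used in this thread.

```python
import math

def elo_from_score(score):
    """Convert a score fraction (0 < score < 1) into an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, margin); margin is the approximate 95 % half-width."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(var / games)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), (high - low) / 2.0

# First match of the opening post: Kayra vs. Houdini 20b, +68 =89 -43.
elo, margin = elo_with_margin(68, 89, 43)
print(f"{elo:+.0f} Elo, roughly +/- {margin:.0f}")
```

Under these assumptions the 200-game match comes out near +44 Elo with a margin in the mid-30s, so the whole interval is roughly 70 Elo wide.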
Parent - By eren1921 (**) Date 2012-09-15 05:06 Edited 2012-09-15 05:13
I understood very well; I know about error bars. I am not saying 200 games are enough. If you read my first message, I didn't test only at 2'+2''; I also ran a test against Houdini 2.0 at 30''+0.5'', 1900 games. Even those 1900 games aren't enough for me to be sure about Houdini 20 Kayra's Elo, because I am suspicious of very fast time controls.
Parent - By Master Om (Bronze) Date 2012-09-15 01:56
What are the settings?
If possible, please PM me. I will test it too.
- - By eren1921 (**) Date 2012-09-15 08:20 Edited 2012-09-15 08:23
A lot of people have asked me for the values of Houdini 20 Kayra. I would like to thank all of them for their interest. (I understand English very well, but I have some difficulty speaking and writing it; writing this message is good practice for me.)
     I have been following chess forums for years, but I hadn't posted any messages until the last month or two. I have been interested in computer chess since the 80s, when I was a small child.
     I was very excited whenever I read news about Cray Blitz, Hitech, Belle, Deep Thought or other chess computers in those years. Now chess programs like Houdini, Rybka, Komodo, Critter, Stockfish and Ivanhoe excite me.
     I wrote my first chess program in 1992, when I was in high school (for the Amstrad CPC6128, in BASIC). It was a very small, weak program, but it ran very well. It took me two years to write. I never published it or the others, but in those years I learned how difficult it is to write or develop a chess program.
     I have no problem giving out the values of Houdini 20 Kayra if its Elo is lower than Houdini 20c's, or higher by up to 20-25 points. (In the SCCT ratings, the T3 and Z versions are approximately 20 points higher than the original.)
     But what if its Elo rating is 30, 40 or more points higher than Houdini 20c's?
     Houdini 3 will be released this month or in October.
     If Houdini 20 Kayra's Elo is close to Houdini 3's, won't publishing the values of Houdini 20 Kayra decrease the sales of Houdini 3? If it does, what becomes of Robert Houdart's labor? I'm sure he works very hard to develop the Houdini engine, and wouldn't that delay the release of Houdini 4?
     I asked Sedat Canbaz (one of my favourite testers) to test my tuned Houdini. He said he was busy but would test it in 3-4 weeks at the earliest.
     I also sent my values to another good tester, but he had a problem using Houdiniconfig; I will wait for his test results. After his test I will decide whether to give the values only to testers, on condition of privacy.
     I also plan to play a new match against Houdini 20b at the same time control (2'+2''), but with at least 500 games.
Parent - - By Stonehenge (***) Date 2012-09-15 10:22
Don't let the future sales of Houdini 3 prevent you from publishing these values.
It is very unlikely that any fiddling with the parameters would increase the objective strength of an engine like Houdini by more than 10 Elo. If you think you have achieved that, please feel free to publish the values so that others can verify them.
Parent - - By eren1921 (**) Date 2012-09-15 17:27 Edited 2012-09-15 17:30
I don't think so, Mr. Houdart (I have a lot of respect for your work).
    I'm sure that perfect settings make more Elo difference than you might think. I don't claim to have found the perfect settings (that is really very, very difficult).
    Probably my test conditions, my hardware and the small number of games led me to the wrong conclusion.
    But to anyone who thinks that changing the parameters cannot change an engine's Elo by much (max. 10-20 Elo), I say: please make a tuned engine, set the pawn value to its minimum and the other piece values to their maximum (or give the pawn the maximum value and the other pieces the minimum), play a match against the original version (minimum 100 games, 1'+1'' or slower time controls), and see the huge Elo difference with your own eyes.
Parent - - By Stonehenge (***) Date 2012-09-15 17:52
It is very easy to *decrease* the strength by 50 points by using completely random parameter values.
It is very difficult to *increase* the strength by more than 10 points in this way.
Parent - By Geomusic (*****) Date 2012-09-16 06:34
Stonehenge, do you plan to add a Monte Carlo search to your engine within the next couple of versions?
- By eren1921 (**) Date 2012-09-15 18:53
The testers will give the final verdict.
  I have been playing a match against Houdini 20 at a 2'+2'' time control. I wanted to play at least 500 games, but I decided to stop the match at the end of 100 games, because very good testers are waiting to test Houdini 20 Kayra.
   My test will finish in approximately 3-4 hours (23:00-24:00 GMT). When it finishes (my last test with this version), I will immediately send the values of Houdini 20 Kayra to those who sent me a private message asking to test it.
- By eren1921 (**) Date 2012-09-16 14:47
The first results have come from Slankamen (thanks; his test continues). They are very different from my test. I think it's time to change my hardware, my time controls and my mind before making a new tuned Houdini engine :smile:

Intel® Core(TM) i7 Q -870@ 2.93 GHz 8.00 MB RAM
Windows 7.x64 Home Premium
Fritz Benchmarks:
Speed: 19.22
KNS: 9225
GUI: CB Rybka 3
Hash: 128
Book: Perfect 2012a-10 moves
GTB-RB-TB: ON
Ponder: ON
Blitz:5' 0
One core each

Code:
1   Houdini 2.0c Pro x64 Kayra  +34/-33/=33 50.50%   50.5/100
2   Houdini 2.0c Pro x64        +33/-34/=33 49.50%   49.5/100
- - By eren1921 (**) Date 2012-09-16 16:28 Edited 2012-09-16 16:43
And I received this message from Salva: "I'm running a match between your tuned version and my S2 settings, and it doesn't look promising for you; after about 250 games, my S2 settings are scoring about 52 %.
Do you really want me to make those results public when the match ends?"

  I understand that this is not good news. Thanks, Salva, for your work. You and Tennison are the last hope for us; continue down the road without me. :smile:
Parent - - By Barnard (Bronze) Date 2012-09-16 17:44

>You and Tennison are the last hope for us, continue the road  without me.:smile:


Don't talk nonsense... you are only just starting to tune Houdini, and at the beginning it is always harder to get results.

Simply ask people how to tune it and which parameters can give the best results, and I'm sure people will help you with this.

If you leave so early, you will never know whether you are able to do it or not... so don't leave; keep trying, and ask for help if you need it :smile:

Regards
Parent - - By eren1921 (**) Date 2012-09-16 19:01 Edited 2012-09-17 03:02
No, dear Salva, I'm not really thinking of leaving; maybe I'll take a break.
  I didn't make only one tuned engine. This year I made 104 tuned Houdini engines and tested them. I tried a lot of possibilities.
  In total I played over 24000 test games (30''+0.5'', 1'+1'', 2'+2'' time controls) on two notebooks (Asus, T4300, 2.1 GHz, 3 GB RAM, 2 CPUs; and Asus, T8100, 2.1 GHz, 2 GB RAM, 2 CPUs).
  In my tests, 74 of my tuned versions beat Houdini 20b (over 100, 200, 300, ... games).
  The total score of my tuned engines against Houdini 20b Pro x64 is 52.61 % (14245 games).
  The last 10 tuned versions scored 55.08 % against Houdini 20b Pro x64 (3860 games, 2'+2'' time control).

Something is wrong with my hardware, but I don't know what. Maybe the K112 version was not a good choice. I had planned to release the K107 version (under the name Houdini 20 Kayra). I played a lot of games with K107, but I changed my mind and put out a different tuned version (K112, which was not tested as thoroughly as K107).
  I plan to run my new tests on stronger hardware and with many more games. Thanks for your work.
Parent - By Barnard (Bronze) Date 2012-09-16 22:58
You are welcome.

But the results (I mean the % of points) are supposed to be independent of the hardware WHEN USING THE SAME ENGINE, so it makes no difference whether you run a match between two Houdinis on a weak or a strong computer (in the latter case, you will only have a few more draws).

>  made totaly over 24000 games test (30"+0.5, 1'+1',2'+2'')


I, and the people who help me, have played more than 200,000 games to test and verify all my different tunings... It is not so easy, so take your time and focus on the parameters that can give you better results.
- - By eren1921 (**) Date 2012-09-16 16:35
I think that, a little later, I will have to change the topic title to "Houdini 20 Kayra (K112), -54 elo" :smile:
Parent - By Barnard (Bronze) Date 2012-09-16 17:45
Read my post above...

Don't be so pessimistic, and keep trying :smile:
- By eren1921 (**) Date 2012-09-16 20:22
Test results from Slankamen,

Intel(R) Core(TM) i7 990 x6 3.47 GHz @ 4.53 GHz 12.00 MB
Windows 7 Professional (Build 7600)
Fritz Benchmarks:
Speed: 39.95
KNS: 19176
GUI: CB- Fritz 13
Book: Immortal 2012a-7 moves
Hash: 128
GTB-RB-TB: ON
Ponder: OFF
Blitz 5m 0

Code:
                                     
1   Houdini 2.0c Pro x64 Kayra   +32  +33/=54/-23 54.55%   60.0/110
2   Houdini 2.0c Pro x64         -32  +23/=54/-33 45.45%   50.0/110
- - By eren1921 (**) Date 2012-09-18 18:27
These are the test results for Houdini 20 Kayra (K112) that were sent to me by private message. In short, they said:

a) +20 Elo vs. Houdini 20c (200 games, 4'+2'', i7 2600K, 3.4 GHz, 4 cores, TB on, ponder off, HT off)

b) "I tested it on Playchess and got badly beaten. It plays a lot weaker than the default."

c) "I can't send you the final results, because I stopped the match when it was slightly more than 51 % for the S2 settings (if I remember well, it was 51.1 % for S2), and I never posted the results in public because they aren't good for you."

d) +23 Elo vs. Houdini 20c (300 games, 5'+3'', i7 920 @ 3.0 GHz, 4 cores, TB on, ponder off, HT off)

e) -13 Elo vs. Houdini 20c (250 games, 2'+1'', Athlon X4 630, 2.8 GHz, 4 cores, TB on, ponder off): "Sorry, your version isn't good."

f) +35 Elo vs. Houdini 20c (300 games, 3'+2'', i7 2670, 2.2 GHz, 4 cores, TB off, ponder off)

g) "It isn't as good as Houdini 20c." (no further information)

h) "Great engine." (no further information)

ı) "With your settings Kayra is in 22nd place on my list at the moment. I don't want to disappoint you, and I hope you keep trying to find something better."

i) +19 Elo vs. Houdini 20c (200 games, 3'+1'' (3-minute test), i7 980X @ 4.33 GHz, 6 cores, TB on, ponder on, HT off)

j) "I have not tested it; the results from users don't seem good."

k) and Slankamen (you know his tests; immortalchess.net)

                                       Thanks to all of them,
                                                      Mehmet
Parent - - By Barnard (Bronze) Date 2012-09-18 18:30
Did you try what I suggested for tuning the engine?
Parent - - By eren1921 (**) Date 2012-09-18 18:44
I don't plan to make a new tuned engine. I have been looking through all the results of all my tuned Houdini engines (104) for three days, and I'm very hopeful.
Parent - By Barnard (Bronze) Date 2012-09-18 18:52

>I don't want to plan to  make a new tuned engine


That is sad; you must not stop your tuning just because the first results are a bit disappointing.
- By eren1921 (**) Date 2012-09-18 18:50
I ran some more tests against Houdini 20b Pro x64 to see what was wrong in my test (same tournament conditions, same hardware:
   Intel T4300, 2.1 GHz, 2 CPUs, 2 GB RAM, Windows 7, 2'+2'' time control, 2 cores, ponder off, tablebase off, Perfect opening book).
(Houdini 20 Kayra (K112) is a tuned version of Houdini 20b Pro x64.)

Houdini 20 Kayra (K112) - Houdini 20b Pro x64 : 200 (+ 68,= 89,- 43), 56.3 %   (first published score)
Houdini 20 Kayra (K112) - Houdini 20b Pro x64 : 200 (+ 54,= 96,- 50), 51.0 %   (second 200-game test)
Houdini 20 Kayra (K112) - Houdini 20b Pro x64 : 200 (+ 60,= 94,- 46), 53.5 %   (third 200-game test)

     The total score after 600 games is 53.6 % (+25 Elo) against Houdini 20b Pro x64, which is very different from my first percentage (first 200 games).
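The 600-game total can be reproduced by pooling the three 200-game runs. This is a minimal sketch that assumes the standard logistic Elo formula; the rating tool actually used for the figures in this thread is not stated, so small differences are possible.

```python
import math

# (wins, draws, losses) for Kayra (K112) in the three 200-game runs above.
runs = [(68, 89, 43), (54, 96, 50), (60, 94, 46)]

wins = sum(r[0] for r in runs)
draws = sum(r[1] for r in runs)
losses = sum(r[2] for r in runs)
games = wins + draws + losses

# A draw counts as half a point.
score = (wins + 0.5 * draws) / games
elo = 400.0 * math.log10(score / (1.0 - score))

print(f"{games} games, {100 * score:.1f} %, {elo:+.0f} Elo")
```

Pooling the raw win/draw/loss counts, rather than averaging the three percentages or three Elo figures, keeps every game weighted equally.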
- By eren1921 (**) Date 2012-09-18 18:52 Edited 2012-09-18 18:56
Houdini 20 Kayra (K112) = +54 Elo (1400 games in total; 56.25 % vs. Houdini 20b Pro x64 over 200 games)

Houdini 20 Kayra (K107) = +41 Elo (2750 games in total; 55.24 % vs. Houdini 20b Pro x64 over 630 games)

     Yes, I made a mistake. The Elo of Houdini 20 Kayra (K112) was higher than that of Houdini 20 Kayra (K107), and its percentage score against Houdini 20b Pro x64 was higher, so I decided it was my strongest tuned version and gave its values to the testers. But I neglected the fact that it had not been tested as thoroughly as K107.
(I have been making some small changes to K107. I don't want to be disappointed again, so I will wait a little before publishing it.)
- By eren1921 (**) Date 2012-09-18 21:57 Edited 2012-09-18 22:06
Here are the latest results from Slankamen. Not bad, but they differ from my test (because he's a very good tester and my hardware is too slow compared to his). The final result of my own test is +45 Elo after 2000 games (-9 Elo compared with the result after 1400 games). I stopped the test. I'm now sure that my tuned version Houdini 20 Kayra (K107) is much stronger than Houdini 20 Kayra (K112). Houdini 20 Kayra (K107) hasn't been tested by the testers yet.

Intel(R) Core(TM) i7 990 x6 3.47 GHz @ 4.53 GHz 12.00 MB
Windows 7 Professional (Build 7600)
Fritz Benchmarks:
Speed: 39.95
KNS: 19176
GUI: CB- Fritz 13
Book: Immortal 2012a-7 moves
Hash: 128
GTB-RB-TB: ON
Ponder: OFF
Blitz 5m 0


Code:
1   Houdini 2.0c Pro x64 Kayra   +27  +90/=182/-64 53.87%  181.0/336
2   Houdini 2.0c Pro x64         -27  +64/=182/-90 46.13%  155.0/336
- By eren1921 (**) Date 2012-09-19 16:19 Edited 2012-09-19 16:23
Results after 25 games:
Match Houdini 2.0c Kayra - Houdini 2.0c PRO

TC: 5' + 10'', no flag loss (ie crash will not run result)
Book - Perfect 2012a
Use Nalimov and Gaviota TB's
Caches: 256MB
Nalimov caches: 256MB
Gaviota caches: 256MB
CPU: Intel Core 2 Quad Q9505 @2,83 GHz, 4 cores
OS: Windows 7 x64 SP1
GUI: Houdini Aquarium 5.1.0 (build 490)
Code:
Houdini 2.0c PRO Kayra ====11====111=1=1=====111 17,5/25
Houdini 2.0c PRO ====00====000=0=0=====000 7,5/25


Dear ofry, your first results are very interesting, but the number of games is too small to draw any conclusions.
- - By eren1921 (**) Date 2012-09-21 12:30 Edited 2012-09-21 17:40
These are the latest test results for Houdini 20 Kayra. Thanks to all the testers.

a) +28 Elo vs. Houdini 20c (500 games, 4'+2'', i7 2600K, 3.4 GHz, 4 cores, Fritz GUI, TB on, ponder off, HT off)

b) "I tested it on Playchess and got badly beaten. It plays a lot weaker than the default."

c) "I can't send you the final results, because I stopped the match when it was slightly more than 51 % for the S2 settings (if I remember well, it was 51.1 % for S2), and I never posted the results in public because they aren't good for you."

d) +23 Elo vs. Houdini 20c (300 games, 5'+3'', i7 920 @ 3.0 GHz, 4 cores, Arena GUI, TB on, ponder off, HT off)

e) +6 Elo vs. Houdini 20c (400 games, 2'+1'', Athlon X4 630, 2.8 GHz, 4 cores, Shredder GUI, TB on, ponder off): "Your version isn't good."

f) +35 Elo vs. Houdini 20c (560 games, 3'+2'', i7 2670, 2.2 GHz, 4 cores, Fritz GUI, TB off, ponder off)

g) "It isn't as good as Houdini 20c." (no further information)

h) "Great engine." (no further information)

ı) "With your settings Kayra is in 22nd place on my list at the moment. I don't want to disappoint you, and I hope you keep trying to find something better."

i) +29 Elo vs. Houdini 20c (600 games, 3'+1'' (3-minute test), i7 980X @ 4.33 GHz, 6 cores, Fritz GUI, TB on, ponder on, HT off)

j) "I have not tested it; the results from users don't seem good."

k) +45 Elo vs. Houdini 20c T3 (100 games, 3'+3'', i7-3960X @ 4.8 GHz, 1 core, ponder off, TB off) and
   -7 Elo vs. Houdini 20c (100 games, 3'+3'', i7-3960X @ 4.8 GHz, 1 core, ponder off, TB off)

l) +22 Elo vs. Houdini 20c (566 games, 5-minute test, 3 different hardware setups, 6-core match)
   ±0 Elo vs. Houdini 20c (200 games, 5-minute test, 2 different hardware setups, 1-core match)
   -5 Elo vs. Critter 1.4 (140 games, 5-minute test, i7 3930K @ 4.68 GHz, Fritz GUI, TB on, ponder off)

     I didn't post results where the number of games was under 100.
     If any tester played a match of over 100 games and doesn't see his result in this list, I'd like him to post it in this topic (maybe I overlooked his message).
Parent - - By Stonehenge (***) Date 2012-09-21 12:47
That's a lot of data with relatively low information content - as so often people simply don't play enough games to make any real conclusions.

If you're interested, you can publish the values or send them by PM, and I'll run an 8.000 game test at 1'+1" and publish the results here.
Parent - - By Stonehenge (***) Date 2012-09-23 09:08
The 8000 game 1'+1" test has started, the run should take about 20 hours.
Parent - - By Barnard (Bronze) Date 2012-09-23 09:17
Why am I guessing that your results will indicate that the settings fail, like all the tests that you ran, in contrast to all the independent tests that everyone else ran... :roll: :lol:
Parent - - By jammy (***) Date 2012-09-23 10:46
I know that you have done a great deal of work trying to improve Houdini. I also think you would be the first person to agree that this engine is the strongest on the market, yet it seems that when the person who made (for want of a better word) this engine tries to help you and others by running tests that you are unable to run, you doubt his motives? :confused:
Just from my own point of view, if you think you can make or programme a better engine yourself, then why don't you just do that, call it Barnard 1, take it to Chessbase, they put it on the market and then you make loads of money. Simple :smile:
Parent - By Barnard (Bronze) Date 2012-09-23 11:45
Hi jammy,

Thanks for your words.

Now I'm going to be very clear: I never doubted Eren; I DOUBT the impartiality of Houdart giving faked results... can I be any clearer?

>Just from my own point of view, if you think you can make ,programme a better engine yourself then why don't you just do that, call it Barnard 1, take it to Chessbase, they put it on the market and then you make loads of money. Simple :smile:


I'm not a guy who is interested in making money by, in your example, developing an engine. I think that knowledge must be free and must be shared with everyone for free.

Of course, I know that not all people want to share their work for free, and they have the right to do that; I'm just saying that even if I could program an engine 50 Elo points stronger than Houdini, I wouldn't sell it; I would give it to everyone for free.

Regards
Parent - - By Stonehenge (***) Date 2012-09-23 16:05 Edited 2012-09-23 20:53

> The 8000 game 1'+1" test has started, the run should take about 20 hours.


After 3000 games the standing is Kayra v default: +877 -817 =1306 (+7 Elo ± 10 Elo).
A good start, let's see how it evolves with more games.

EDIT:
After 5000 games the standing is Kayra v default: +1420 -1372 =2208 (+3 Elo ± 7 Elo).
Parent - - By Stonehenge (***) Date 2012-09-24 10:00
The final standing is Kayra v default: +2249 -2240 =3511 (+ 0 Elo ± 6 Elo).
The good start turned out to be elusive - if the test would only have been 1200 games we would have concluded an Elo difference of (+12 Elo ± 16 Elo).

From this test, and the previous one with the T3 and Z settings, it becomes clear that in my testing framework used for the Houdini development it's difficult to improve on the default parameters - as should be expected as they have been optimized in that framework. It is in fact surprising that the T3, Z and Kayra settings even produce results that are about as good as the default version. This suggests that the real optimum could lie somewhere between the default and the different settings.

I have no idea how to explain the consistent better results reported by others (albeit always within the error margins of the test). It could be related to the choice of opening positions or opening book, that would benefit the more aggressive settings of Kayra or T3. Or it could partially be a "contempt" effect - more aggressive works better against weaker opponents.
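The way the margin narrows across the 3000-, 5000- and 8000-game standings of this test can be checked with a short calculation. This is only a sketch: it assumes the logistic Elo formula and a first-order (delta-method) 95 % margin, which may differ slightly from whatever tool produced the reported figures.

```python
import math

def elo_and_margin(wins, draws, losses, z=1.96):
    """Elo difference and approximate 95 % margin via the delta method."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / games
    stderr = math.sqrt(var / games)
    elo = 400.0 * math.log10(score / (1.0 - score))
    # Slope of the Elo curve: d(elo)/d(score) = 400 / (ln 10 * s * (1 - s)).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return elo, z * stderr * slope

# Kayra v default standings reported during the 8000-game test.
standings = {
    3000: (877, 1306, 817),
    5000: (1420, 2208, 1372),
    8000: (2249, 3511, 2240),
}
report = {n: elo_and_margin(*wdl) for n, wdl in standings.items()}
for n, (elo, margin) in report.items():
    print(f"{n} games: {elo:+.1f} Elo, +/- {margin:.1f}")
```

Because the margin shrinks only with the square root of the number of games, quadrupling the games roughly halves the error bar, which is why a few hundred games cannot separate settings that differ by 10 Elo or less.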
Parent - By Adam Hair (**) Date 2012-09-25 11:09

> From this test, and the previous one with the T3 and Z settings, it becomes clear that in my testing framework used for the Houdini development it's difficult to improve on the default parameters - as should be expected as they have been optimized in that framework. It is in fact surprising that the T3, Z and Kayra settings even produce results that are about as good as the default version. This suggests that the real optimum could lie somewhere between the default and the different settings.


I have seen some optimization results for one engine and I have actively tried tuning another, well-tuned engine with CLOP. My experience is that, for at least some parameters, there is a plateau of values that are all approximately optimal. So, I am only a little surprised that someone found settings that are approximately equal with Houdini's defaults. Given my belief that you have good tuning and testing methods to go along with your programming abilities, I would be shocked if someone found better settings.
- By eren1921 (**) Date 2012-09-23 13:02 Edited 2012-09-24 05:56
Here are the results of Trap's test (his tournament continues, and he has published his results at immortalchess.net).

Intel Core i7-3960X @ 4800 MHz, W7x64, 32 Gb RAM
1 core, 64 Mb, Ponder off, no TBs
1-core CPU Fritz benchmark:3597 kn/s
50x2 EPD openings from "Salvo's Opening Suite 01.04.2010
TC :3'+3"

Houdini 2.0c Pro x64 Kayra - Houdini 2.0c Pro x64 T3
Code:

1.  Houdini 2cKr Pro x64      56.5/100  35-22-43  tpm=4518.8 d=20.35 nps=4010659
2.  Houdini 2cT3 Pro x64      43.5/100  22-35-43  tpm=4564.1 d=20.39 nps=3994814

Houdini 2.0c Pro x64 Kayra - Houdini 2.0c Pro x64
Code:

1.  Houdini 2cKr Pro x64     49.0/100  30-32-38  tpm=4604.8 d=20.44 nps=4083410
2.  Houdini 2.0c Pro x64     51.0/100  32-30-38  tpm=4574.3 d=20.07 nps=4059335

Houdini 2.0c Pro x64 Kayra - Houdini 1.5a x64
Code:

1.  Houdini 2cKr Pro x64     55.0/100  30-20-50  tpm=4487.4 d=20.71 nps=4098717
2.  Houdini 1.5a x64         45.0/100  20-30-50  tpm=4501.5 d=20.48 nps=4094914

          These three matches aren't included in the tournament results.

  
   Program                    :  Elo    +   -  Games   Score   Av.Op.   Draws

 1 Houdini 2c Kayra Pro x64   : 3021   19  19    700   59.2 %   2956   43.6 %
 2 Houdini 2c T3 Pro x64      : 3007   16  16   1000   59.5 %   2940   46.6 %
 3 Houdini 2.0c Pro x64       : 2998   16  16   1000   58.2 %   2940   45.3 %
 4 Houdini 1.5a x64           : 2989   14  14   1200   57.5 %   2936   47.8 %
 5 Strelka 5.5 x64            : 2984   12  12   1500   54.7 %   2951   55.4 %
 6 Critter 1.6a 64-bit        : 2980   12  12   1500   54.1 %   2951   56.7 %
 7 Komodo 5 64-bit            : 2976   14  14   1300   53.6 %   2952   46.2 %
 8 Komodo 5 x64 Tweaked       : 2962   15  15   1100   50.6 %   2958   50.0 %
 9 Komodo64 SSE Version 4     : 2941   14  14   1300   48.4 %   2952   47.9 %
10 Deep Rybka 4.1 SSE42 x64   : 2925   12  12   1500   45.8 %   2955   52.2 %
11 IvanHoe 9.46b x64          : 2925   12  12   1500   45.8 %   2955   55.8 %
12 SFish 120902 64bit SSE4.2  : 2918   15  15   1000   46.5 %   2943   54.0 %
13 LEOpard 0.7c               : 2915   14  14   1000   46.0 %   2943   57.1 %
14 Stockfish 2.2.2 JA SSE42   : 2914   14  14   1200   43.9 %   2957   48.5 %
15 Vitruvius 1.12c.HEM_x64    : 2899   12  12   1400   42.5 %   2952   53.5 %
16 Gull II beta x64           : 2894   12  12   1400   41.7 %   2952   56.1 %

1 Houdini 2c Kayra Pro x64      : 3021  700 (+262,=305,-133), 59.2 %

Deep Rybka 4.1 SSE42 x64      : 100 (+ 44,= 37,- 19), 62.5 %
Komodo64 SSE Version 4        : 100 (+ 43,= 35,- 22), 60.5 %
Komodo 5 x64 Tweaked          : 100 (+ 35,= 42,- 23), 56.0 %
Critter 1.6a 64-bit           : 100 (+ 31,= 51,- 18), 56.5 %
Strelka 5.5 x64               : 100 (+ 28,= 58,- 14), 57.0 %
IvanHoe 9.46b x64             : 100 (+ 36,= 51,- 13), 61.5 %
Komodo 5 64-bit               : 100 (+ 45,= 31,- 24), 60.5 %

2 Houdini 2cT3 Pro x64      : 3007  1000 (+362,=466,-172), 59.5 %

Deep Rybka 4.1 SSE42 x64      : 100 (+ 39,= 42,- 19), 60.0 %
Komodo64 SSE Version 4        : 100 (+ 33,= 41,- 26), 53.5 %
Komodo 5 x64 Tweaked          : 100 (+ 23,= 59,- 18), 52.5 %
Critter 1.6a 64-bit           : 100 (+ 27,= 54,- 19), 54.0 %
Strelka 5.5 x64               : 100 (+ 20,= 64,- 16), 52.0 %
IvanHoe 9.46b x64             : 100 (+ 42,= 42,- 16), 63.0 %
Vitruvius 1.12c.HEM_x64       : 100 (+ 45,= 44,- 11), 67.0 %
Komodo 5 64-bit               : 100 (+ 35,= 43,- 22), 56.5 %
Stockfish 2.2.2 JA SSE42      : 100 (+ 45,= 36,- 19), 63.0 %
Gull II beta x64              : 100 (+ 53,= 41,-  6), 73.5 %

3 Houdini 2.0c Pro x64      : 2998  1000 (+356,=453,-191), 58.2 %

Deep Rybka 4.1 SSE42 x64      : 100 (+ 34,= 49,- 17), 58.5 %
Komodo64 SSE Version 4        : 100 (+ 36,= 47,- 17), 59.5 %
Komodo 5 x64 Tweaked          : 100 (+ 30,= 41,- 29), 50.5 %
Critter 1.6a 64-bit           : 100 (+ 27,= 51,- 22), 52.5 %
Strelka 5.5 x64               : 100 (+ 19,= 57,- 24), 47.5 %
IvanHoe 9.46b x64             : 100 (+ 44,= 40,- 16), 64.0 %
Vitruvius 1.12c.HEM_x64       : 100 (+ 43,= 43,- 14), 64.5 %
Komodo 5 64-bit               : 100 (+ 36,= 41,- 23), 56.5 %
Stockfish 2.2.2 JA SSE42      : 100 (+ 43,= 43,- 14), 64.5 %
Gull II beta x64              : 100 (+ 44,= 41,- 15), 64.5 %
- - By eren1921 (**) Date 2012-09-24 11:33 Edited 2012-09-24 11:59
Yes, an 8000-game match is a very good way to avoid large Elo error bars. But is it enough to find the stronger version?
     Maybe Kayra is stronger than the 20c version; maybe the 20c version is stronger than Kayra.
     Suppose the score favours Kayra by +20 Elo against 20c, or favours 20c by +20 Elo against Kayra.
     Is that enough to find the stronger version? What about the other engines: Critter, Stockfish, Komodo, Rybka?
     Isn't it a good method to play enough games (1000, 2000, ...) with Kayra against other engines and compare the total results of the two engines?
     Yes, I set Kayra's values to be more aggressive. Because of that, I'm sure that its performance against other engines is better than Houdini 20c's. (If there is no engine stronger than Houdini, isn't that a good method?)
     And I'm sure that its performance is better in longer games than in blitz games.

     Look at Kayra's one-core match results (which I published on 2012-09-21):
±0 Elo vs. Houdini 20c (200 games, 5-minute test, 2 different hardware setups, 1-core match),
-7 Elo vs. Houdini 20c (100 games, 3'+3'', i7-3960X @ 4.8 GHz, 1 core, ponder off, TB off)
Total: -2 Elo vs. Houdini 20c (300 games, 1-core matches)

    And look at Kayra's MP match results (which I published on 2012-09-21):

Total: +24 Elo vs. Houdini 20c (2926 games, different hardware, 4-6 core matches)
(I didn't include my own test results in it: +25 Elo, 600 games vs. Houdini 20c, 2 cores)

    Can the results be explained by Elo error bars alone? Maybe.
    I still think that Houdini 20 Kayra is stronger than Houdini 20c. But how many Elo stronger, I don't know yet.
    Maybe +1 Elo, maybe +10, +20... Only the testers will give an answer.

    Dear Houdart, thanks for the match.
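The one-core and multi-core totals quoted in this post look like game-weighted averages of the individual results from the 2012-09-21 list. The sketch below reproduces them under that assumption. Which entries went into the multi-core total is my reading (the six multi-core results against Houdini 20c, whose game counts add up to 2926), and averaging Elo figures is only an approximation to pooling the underlying games.

```python
# (Elo vs. Houdini 20c, games) taken from the 2012-09-21 list:
# entries a, d, e, f, i and the 6-core result under l.
multi_core = [(28, 500), (23, 300), (6, 400), (35, 560), (29, 600), (22, 566)]
# The two 1-core results (under l and k).
single_core = [(0, 200), (-7, 100)]

def weighted_elo(results):
    """Return (game-weighted mean Elo, total games)."""
    total_games = sum(games for _, games in results)
    weighted_sum = sum(elo * games for elo, games in results)
    return weighted_sum / total_games, total_games

mp_elo, mp_games = weighted_elo(multi_core)
sp_elo, sp_games = weighted_elo(single_core)
print(f"multi-core: {mp_elo:+.1f} Elo over {mp_games} games")
print(f"1-core:     {sp_elo:+.1f} Elo over {sp_games} games")
```

Under these assumptions the multi-core average lands between +24 and +25 Elo and the one-core average slightly below -2 Elo.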
Parent - By Stonehenge (***) Date 2012-09-24 12:02

>   And look at Kayra’s mp matches results  (that  ı publiced 2012-09-21
>
> Total (+24 elo vs. Houdini 20c, 2926 games, different hardwares, 4-6 core matches, )
> (I didn’t include my test results in it)


Yes, definitely interesting.
Note that your average is statistically biased, as you only include the results that are positive. In your average you don't take into account the results that were reported as, for example, "g) It isn’t as good as  Houdini 20c  (no any  information)".

It would be very interesting to get more details about, for example, the following tests you report:

f) +35 elo according to Houdini 20c. (560 games, 3’+2’’ test, i7 2670, 2.2 ghz, 4 cores, fritz gui, tb off, ponder off) 
i) +29 elo according to Houdini 20c  (600 games,  3’+1’’ test , i7 980x,@ 4.33 ghz, 6 cores,fritz gui, 3 minute test, tb on, ponder on, ht off)

What opening book or set of opening positions were used? Reverted positions? Is it possible to get the games?
Note that for parameter tuning it's irrelevant to play with multiple cores, the "tuned" versions exhibit exactly the same SMP speed-up as the original. So you might as well play games with 1 core as this will provide more efficient usage of the CPU time.
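The statistical bias mentioned in this post can be illustrated with a small simulation. It is only a sketch under made-up assumptions: two engines of exactly equal strength, a fixed 28 % win / 44 % draw / 28 % loss split per game, and 400 independent 300-game matches. None of these numbers come from the testers' data.

```python
import math
import random

def simulate_match(games, rng, p_win=0.28, p_draw=0.44):
    """Play `games` games between two equal engines; return the measured Elo."""
    points = 0.0
    for _ in range(games):
        r = rng.random()
        if r < p_win:
            points += 1.0
        elif r < p_win + p_draw:
            points += 0.5
    score = points / games
    return 400.0 * math.log10(score / (1.0 - score))

rng = random.Random(2012)  # fixed seed so the run is repeatable
results = [simulate_match(300, rng) for _ in range(400)]

# Average over every match, and over only the matches that came out positive.
positive = [elo for elo in results if elo > 0]
mean_all = sum(results) / len(results)
mean_positive = sum(positive) / len(positive)

print(f"all matches:   {mean_all:+.1f} Elo")
print(f"positive only: {mean_positive:+.1f} Elo")
```

Even though the true difference in this simulation is zero, averaging only the matches that happened to come out positive gives a clearly positive figure, which is the effect described above.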
- - By eren1921 (**) Date 2012-09-24 13:02
I didn't ask any of the testers for the PGN files of their games, because I don't intend to analyze the games.
Asking for the PGN files when I'm not going to analyze the games would, to me, mean questioning their honesty.
Someone could tune the opponent engine with very bad parameters and play a match. Would the PGN files then show the real score?
Yes, you may ask how you can trust these results. It's up to you.
Some of the testers (Slankamen, Ofry, Trap...) ran public matches, published their results, and posted the PGN games at immortalchess.net.

For example, look at Slankamen's tests:
                 +22  elo vs. Houdini 20c (566 games, 5-minute test, 3 different hardware setups, 6-core matches)
           ….  +-0  elo vs. Houdini 20c (200 games, 5-minute test, 2 different hardware setups, 1-core matches)
His book: immortalchess 2012a-7moves. The two matches you asked about used the Perfect2012a opening book.

Some testers said things like: "It isn't as good as Houdini 20c".
I sent that tester private messages to ask for his results, but he didn't answer. He wrote only this and nothing else in his message. Maybe he isn't a tester at all and only said "I tested it" to get the Kayra values. I don't know.
And at the bottom of the published test results I wrote: “If any tester who played a match of over 100 games doesn't see his result in this list, I'd like him to post his results in this topic on this forum (maybe I missed his message).”
Parent - - By Stonehenge (***) Date 2012-09-24 13:29
My request for the PGN is to get an idea of the openings played and the depth of the opening book.

I've noticed the evolution of Trap's results:

After 400 games +30 Elo ± 30 Elo:
  1 Houdini 2cKr Pro x64         : 3029   25  25   400    57.5 %   2977   45.5 %
  2 Houdini 2cT3 Pro x64        : 3008   16  16  1000    59.5 %   2941   46.6 %
  3 Houdini 2.0c Pro x64         : 2999   16  16  1000    58.2 %   2941   45.3 %

After 700 games +23 Elo ± 25 Elo:
  1 Houdini 2cKr Pro x64         : 3021   19  19   700    59.2 %   2956   43.6 %
  2 Houdini 2cT3 Pro x64        : 3007   16  16  1000    59.5 %   2940   46.6 %
  3 Houdini 2.0c Pro x64         : 2998   16  16  1000    58.2 %   2940   45.3 %

After 1000 games +12 Elo ± 22 Elo:
  1 Houdini 2cKr Pro x64         : 3010   16  16  1000    59.4 %   2944   44.4 %
  2 Houdini 2cT3 Pro x64        : 3007   16  16  1000    59.5 %   2940   46.6 %
  3 Houdini 2.0c Pro x64         : 2998   16  16  1000    58.2 %   2940   45.3 %

It's another strong reminder of the reality of statistical variability of engine testing, very similar to my 8000-game 1'+1" test.
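
A small sketch of the arithmetic behind the three snapshots above. It assumes the "± Elo" quoted for the difference is obtained by combining the two engines' individual error bars in quadrature; the thread does not say how those figures were computed, and the rating list's error bars are not strictly independent, so treat this as an approximation.

```python
import math


def diff_margin(margin_a, margin_b):
    """Error bar of a rating difference, assuming independent error bars
    combined in quadrature (an assumption, not stated in the thread)."""
    return math.hypot(margin_a, margin_b)


# (games played by Kayra, Kayra Elo, Kayra +/-, default Elo, default +/-)
# taken from the three rating lists quoted above
snapshots = [
    (400, 3029, 25, 2999, 16),
    (700, 3021, 19, 2998, 16),
    (1000, 3010, 16, 2998, 16),
]

rows = []
for games, elo_kr, err_kr, elo_def, err_def in snapshots:
    rows.append((games, elo_kr - elo_def, diff_margin(err_kr, err_def)))

for games, diff, margin in rows:
    print(f"after {games:4d} games: {diff:+d} Elo, margin about {margin:.1f}")
```

Under that assumption the margins come out close to the quoted ±30, ±25 and ±22, and at every stage the measured difference stays within its own error bar.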
Parent - - By eren1921 (**) Date 2012-09-24 13:36 Edited 2012-09-24 17:41
Dear Houdart, maybe you are right, I don't know.

  For a strong Houdini 3, let's run a test with Kayra vs. other engines.

  In 20 hours we will learn the truth about which is the strongest.
Parent - - By Gaмßito (****) Date 2012-09-24 17:38
Talking about Houdini 3, it would be very interesting to know whether Robert has also run some tests at slower time controls.

In both CEGT and CCRL lists and at slow time controls (40/20 and 40/40), Houdini 2.0c shows a very small improvement over Houdini 1.5a (less than 10 Elo points).
This is not a negative statement, but I am curious whether this could also happen with Houdini 3. Naturally, the Elo gain tends to shrink at slow time controls, but this time I hope the reduction is as small as possible. I'd like to hear Robert's opinion about this.

Regards,
Gaмßito.
Parent - - By Stonehenge (***) Date 2012-09-24 21:38

> Talking about Houdini 3, it would be very interesting to know whether Robert has also run some tests at slower time controls.


The slowest TC at which I've seriously tested Houdini 3 so far is 2'+2" (several 8000-game matches).

To satisfy your and my curiosity, I'm ready to play a long TC match with a total time that is equivalent to 8000 times 2'+2".
This could be, for example, 400 games at 40'+40" or 200 games at 120'+30".
Hardware is a (relatively slow) AMD server that produces about 1 MN/sec per core. The test will take about 2 days to run.
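
As a rough check of the "equivalent total time" above, here is a sketch that assumes every game lasts 60 moves per side and that both engines use all of their available time. Both are simplifying assumptions that are not stated in the thread.

```python
ASSUMED_MOVES_PER_GAME = 60  # assumption, not a figure from the thread


def game_seconds(base_minutes, increment_seconds, moves=ASSUMED_MOVES_PER_GAME):
    """Clock time for one game if both sides play `moves` moves and use
    their whole base time plus every increment."""
    per_side = base_minutes * 60 + increment_seconds * moves
    return 2 * per_side


def total_hours(games, base_minutes, increment_seconds):
    """Total single-game clock time for a whole match, in hours."""
    return games * game_seconds(base_minutes, increment_seconds) / 3600


options = [
    ("8000 games at 2'+2\"", 8000, 2, 2),
    ("400 games at 40'+40\"", 400, 40, 40),
    ("200 games at 120'+30\"", 200, 120, 30),
]

for label, games, base, inc in options:
    print(f"{label:24s} {total_hours(games, base, inc):8.1f} hours")
```

With this assumed game length the first two options cost the same total time and the third is slightly cheaper. A different assumed number of moves shifts the comparison, because the increment matters more at 2'+2" than at 120'+30".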

If you're interested please create a new thread for this, it's not right to discuss this here in the topic about the Kayra settings.
Please let me know what TC against which opponent(s) you would like to see, and with which opening suite? If you pick a publicly available opening suite I can publish the games.
Parent - By Gaмßito (****) Date 2012-09-24 23:50
Thanks Robert. The thread is already open here: http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=25731

Regards,
Gaмßito.
Parent - By AWRIST (****) Date 2012-09-25 15:44
> It's another strong reminder of the reality of statistical variability of engine testing, very similar to my 8000-game 1'+1" test.

Right. To benefit from the law of large numbers, the engines must remain the same. You cannot pool many 100-game results, each played against different opponents, and then compare the (significant) difference between two players, but this is what many here are doing.
- By eren1921 (**) Date 2012-09-25 07:51
Here are the latest results of Trap's test (his tournament is continuing, and he published his results at immortalchess.net):

Intel Core i7-3960X @ 4800 MHz, W7x64, 32 Gb RAM
1 core, 64 Mb, Ponder off, no TBs
1-core CPU Fritz benchmark:3597 kn/s
50x2 EPD openings from "Salvo's Opening Suite 01.04.2010"
TC :3'+3"

Houdini 2.0c Pro x64 Kayra - Houdini 2.0c Pro x64 T3

1.  Houdini 2cKr Pro x64      56.5/100  35-22-43  tpm=4518.8 d=20.35 nps=4010659
2.  Houdini 2cT3 Pro x64      43.5/100  22-35-43  tpm=4564.1 d=20.39 nps=3994814

Houdini 2.0c Pro x64 Kayra - Houdini 2.0c Pro x64

1.  Houdini 2cKr Pro x64     49.0/100  30-32-38  tpm=4604.8 d=20.44 nps=4083410
2.  Houdini 2.0c Pro x64     51.0/100  32-30-38  tpm=4574.3 d=20.07 nps=4059335

Houdini 2.0c Pro x64 Kayra - Houdini 1.5a x64

1.  Houdini 2cKr Pro x64     55.0/100  30-20-50  tpm=4487.4 d=20.71 nps=4098717
2.  Houdini 1.5a x64         45.0/100  20-30-50  tpm=4501.5 d=20.48 nps=4094914

          These three matches aren't included in the tournament results.

  
   Program                        Elo    +   -   Games   Score   Av.Op.  Draws

 1 Houdini 2c Kayra Pro x64     : 3012   15  15   1200   60.8 %   2936   44.0 %
 2 Houdini 2c T3 Pro x64        : 3005   15  15   1100   59.5 %   2938   46.9 %
 3 Houdini 2.0c Pro x64         : 3001   15  15   1100   59.1 %   2937   45.6 %
 4 Houdini 1.5a x64             : 2989   14  14   1200   57.5 %   2936   47.8 %
 5 Strelka 5.5 x64              : 2983   12  12   1500   54.7 %   2950   55.4 %
 6 Critter 1.6a 64-bit          : 2979   12  12   1500   54.1 %   2950   56.7 %
 7 Komodo 5 64-bit              : 2976   14  14   1300   53.6 %   2951   46.2 %
 8 Komodo 5 x64 Tweaked         : 2961   15  15   1100   50.6 %   2957   50.0 %
 9 Komodo64 SSE Version 4       : 2940   14  14   1300   48.4 %   2951   47.9 %
10 Deep Rybka 4.1 SSE42 x64     : 2925   12  12   1500   45.8 %   2954   52.2 %
11 IvanHoe 9.46b x64            : 2925   12  12   1500   45.8 %   2954   55.8 %
12 SFish 120902 64bit SSE4.2    : 2923   13  13   1200   45.8 %   2953   53.3 %
13 Stockfish 2.2.2 JA SSE42     : 2917   14  14   1300   43.8 %   2960   47.5 %
14 LEOpard 0.7c                 : 2911   13  13   1200   43.9 %   2953   56.1 %
15 Vitruvius 1.12c.HEM_x64      : 2897   12  12   1500   41.6 %   2956   52.3 %
16 Gull II beta x64             : 2894   12  12   1500   41.2 %   2956   55.5 %

1 Houdini 2c Kayra Pro x64     : 3012  1200 (+466,=528,-206), 60.8 %
Deep Rybka 4.1 SSE42 x64      : 100 (+ 44,= 37,- 19), 62.5 %
Komodo64 SSE Version 4        : 100 (+ 43,= 35,- 22), 60.5 %
Komodo 5 x64 Tweaked          : 100 (+ 35,= 42,- 23), 56.0 %
Critter 1.6a 64-bit           : 100 (+ 31,= 51,- 18), 56.5 %
Strelka 5.5 x64               : 100 (+ 28,= 58,- 14), 57.0 %
IvanHoe 9.46b x64             : 100 (+ 36,= 51,- 13), 61.5 %
Vitruvius 1.12c.HEM_x64       : 100 (+ 53,= 36,- 11), 71.0 %
Komodo 5 64-bit               : 100 (+ 45,= 31,- 24), 60.5 %
Stockfish 2.2.2 JA SSE42      : 100 (+ 40,= 36,- 24), 58.0 %
Gull II beta x64              : 100 (+ 41,= 48,- 11), 65.0 %
SFish 120902 64bit SSE4.2     : 100 (+ 31,= 50,- 19), 56.0 %
LEOpard 0.7c                  : 100 (+ 39,= 53,-  8), 65.5 %

2 Houdini 2cT3 Pro x64      : 3005  1100 (+397,=516,-187), 59.5 %

Deep Rybka 4.1 SSE42 x64      : 100 (+ 39,= 42,- 19), 60.0 %
Komodo64 SSE Version 4        : 100 (+ 33,= 41,- 26), 53.5 %
Komodo 5 x64 Tweaked          : 100 (+ 23,= 59,- 18), 52.5 %
Critter 1.6a 64-bit           : 100 (+ 27,= 54,- 19), 54.0 %
Strelka 5.5 x64               : 100 (+ 20,= 64,- 16), 52.0 %
IvanHoe 9.46b x64             : 100 (+ 42,= 42,- 16), 63.0 %
Vitruvius 1.12c.HEM_x64       : 100 (+ 45,= 44,- 11), 67.0 %
Komodo 5 64-bit               : 100 (+ 35,= 43,- 22), 56.5 %
Stockfish 2.2.2 JA SSE42      : 100 (+ 45,= 36,- 19), 63.0 %
Gull II beta x64              : 100 (+ 53,= 41,-  6), 73.5 %
SFish 120902 64bit SSE4.2     : 100 (+ 35,= 50,- 15), 60.0 %

3 Houdini 2.0c Pro x64      : 3001  1100 (+399,=502,-199), 59.1 %

Deep Rybka 4.1 SSE42 x64      : 100 (+ 34,= 49,- 17), 58.5 %
Komodo64 SSE Version 4        : 100 (+ 36,= 47,- 17), 59.5 %
Komodo 5 x64 Tweaked          : 100 (+ 30,= 41,- 29), 50.5 %
Critter 1.6a 64-bit           : 100 (+ 27,= 51,- 22), 52.5 %
Strelka 5.5 x64               : 100 (+ 19,= 57,- 24), 47.5 %
IvanHoe 9.46b x64             : 100 (+ 44,= 40,- 16), 64.0 %
Vitruvius 1.12c.HEM_x64       : 100 (+ 43,= 43,- 14), 64.5 %
Komodo 5 64-bit               : 100 (+ 36,= 41,- 23), 56.5 %
Stockfish 2.2.2 JA SSE42      : 100 (+ 43,= 43,- 14), 64.5 %
Gull II beta x64              : 100 (+ 44,= 41,- 15), 64.5 %
LEOpard 0.7c                  : 100 (+ 43,= 49,-  8), 67.5 %
- - By eren1921 (**) Date 2012-09-26 06:16 Edited 2012-09-26 06:34
Final results of Robert Houdart
  Kayra v default: +2249 -2240 =3511 (+ 0 Elo ± 6 Elo).
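
For readers who want to check the figures above, here is a minimal sketch that converts a win/loss/draw record into an Elo difference with an approximate 95% margin. It uses the standard logistic Elo formula and a normal approximation of the score's standard error; this is not necessarily the tool that produced the numbers in this thread.

```python
import math


def to_elo(score):
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo, margin) for a match result.

    The margin is half the width of the interval obtained by moving the
    score by z standard errors in each direction (normal approximation).
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # per-game variance of the result (1, 0.5 or 0) around the mean score
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_err = math.sqrt(variance / games)
    elo = to_elo(score)
    margin = (to_elo(score + z * std_err) - to_elo(score - z * std_err)) / 2.0
    return elo, margin


# Kayra v default: +2249 -2240 =3511
elo, margin = elo_with_margin(2249, 2240, 3511)
print(f"Elo difference: {elo:+.1f}, margin: {margin:.1f}")
```

The result lands close to the reported figures. Because the margin shrinks only with the square root of the number of games, the same function gives a much wider margin for a 100-game match.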

   During the match, not after it, I wrote to him: "OK, your 8000-game match is a very good way to avoid the Elo error bars. But is it enough to find the strongest version?
8-9 years ago I learned a lesson. My Hiarcs 9 beat Hiarcs 8 Bareev with a 55-60% score, but Hiarcs 8 Bareev always performed better than Hiarcs 9 against the other engines. If Houdini Kayra scores +10 Elo or more, or Houdini 20c scores +10 Elo or more, is that enough to find the strongest engine? What about the other engines?
Besides this match, wouldn't playing 1000 games against 8 other engines, or 2000 games against 4 other engines, and comparing the results with Houdini 20c's results be a good method for finding the strongest engine? (It's only my opinion.)"

  But he made his decision about Kayra, and after the match he didn't reply to any of my messages. (If Kayra's performance had been better, maybe he would have replied.)
Yes, it is hard for me to be objective about the Kayra version, because the settings are mine. And Kayra's tests are continuing. Maybe it is too early to talk about Kayra.
 
       But what about Z and T3 settings?

  Robert said there is no Elo difference with these versions. Is playing a match only against the default version (8000 games or 100000 games) enough to find the strongest version? I don't think so.

  All independent testers said that the Z and T3 versions are stronger than Houdini 20c. Are Salva or Tennison paying all the testers lots of money? :lol: (The results can't be explained by the Elo error bars, because the testers played over ten thousand games in total with these versions.)

     I wonder about one thing in Houdini 3. What are the values of Pawn storm, outer, inner files in Houdini 3? (I'm sure that they aren't 28, 30, 32.)
Parent - By Stonehenge (***) Date 2012-09-26 08:36

> But he made his decision about Kayra, and after the match he didn't reply to any of my messages.


I replied to you: You are correct. The self-test is only the first step, if results are positive normally I would then play the engine against my set of 9 tuning engines (9 x 3000 games) which would decide whether the change is really an improvement.

After the auto-test results that show no progress, there is little value in running this second test.

> I guess one thing about Houdini 3. What are the values of Pawn storm, outer, inner files of Houdini 3? (I'm sure that they aren't 28,30,32)


In the current Houdini 3 version these parameters are at the same values as in Houdini 2.
Why would I make changes that don't produce any measurable progress in my testing framework?