This is interesting: Stockfish has won 4 of the first 6 games, with 2 draws, in 30-minute games.
What is going on here?
It would be interesting to hear what kind of hardware and settings you are using. 30-minute games really don't tell you very much; try (90 m 30 s)/40+(15m +30s), and certainly stay away from default settings.
I suspect something is wrong with your setup. Please check out the R4 & SF long time control results here: http://www.talkchess.com/forum/viewforum.php?f=6
Wayne
But Sedat Canbaz has strong evidence here:
http://sedatchess.110mb.com/index.php?p=1_62
Strong evidence? After 18 games? You must be kidding :)
I would like to see a graph of Rybka v. Stockfish ELO over the past few years to see which is increasing faster. There might be evidence that Stockfish's strength is increasing faster and will surpass R4.
Results from the past are no guarantee for the future...
I'm not looking for guarantees, just evidence.
One could reasonably suspect that:
1) It is more difficult to make the same Elo improvement at a higher level, and
2) It is more difficult to make the same Elo improvement when you are already at the highest level.
For these reasons, the Elo improvement rate is unlikely to provide any indication of future outcomes. Note that the simplest way for Glaurung/Stockfish to improve its growth rate would have been to start off weaker than it did, and this would not be indicative of better future performance.
> It is more difficult to make the same Elo improvement when you are already at the highest level.
Rybka 2.3.2a was at the highest level, yet Rybka 3 had a higher elo jump than the rest of the competition. I still agree with you that past results can't tell anything about the future.
Which do you see as a more significant accomplishment for a chess engine author?
A) Improve the chess engine's performance from 2200 to 2300, or
B) Improve the chess engine's performance from 3200 to 3300?
If you agree that the answer is B, I rest my case.
Yeah, but Vas made the most significant achievement with relative ease.
While hundreds of far less skilled engine developers made the transition from 2200 to 2300 with ease...
Yes, but Fritz, say, couldn't make a jump at all: Deep Fritz 12 is 4 Elo weaker than Deep Fritz 11, even though that engine is at 3096 and Rybka 2.3.2a is at 3127. I think it has more to do with the skill of the programmer than with what Elo you already have.
And Vas could just as well have added an extra 100 Elo to his 3229 engine, again with ease, but decided to release only 60 Elo of the improvement, which turned out to be only 30 for unknown reasons.
If you are improving a weak engine, you don't have to develop anything new. You can just add in new features from strong open source engines. If you are trying to improve a top engine, you have to be original. It's always going to be more difficult to be an original thinker than to copy other people's work.
I think the change from Fritz 11 to Fritz 12 is a very poor counter example because Chessbase hasn't seemed serious about Fritz since the days of DF8 (which wasn't much better than DF7).
Despite everyone being supportive of Stockfish, I am very wary of it. It appears to me to have a lot of R3 or whatever in it. Its evals are crazy. It may look good in 40/40 games, but I just cannot trust it in corr games. I would trust engine X and R4 much more. This is only my opinion, I hasten to add.
> it appears to me to have a lot of R3 or whatever in it.
Like what? I've relied on it in several of my correspondence games (a batch of won games after Stockfish 1.5 was released and before Rybka 4 was released), and there hasn't been a single thing that has made me suspicious. The two engines are worlds apart.
It's the evals that bugger me up. Every time I have looked down a Stockfish line I find R4 AN better. However, you may disagree.
I agree about the evals being unreliable, but if you have an analysis method that focuses on move choices, Stockfish is a must check. And, anyway, I've only relied on Stockfish in "winning" positions, where Stockfish is a beast and very good at killing other engines' evaluation scores, but it's also a bad defender.
The covariance testing that I did on BB's one million positions showed that Stockfish's static eval function is very different from Rybka's, while the clones' is (not surprisingly) rather similar. Having a different eval function with similar strength should be useful, but I agree it's hard to use Stockfish for analysis because the eval jumps around so much.
Unlike Vytron, I have never trusted Stockfish in corr chess. I do admit to using the engine. It is very fast in giving you a line, esp. on a Skully. I think my results in corr games on the forum speak for themselves. I have won a few and lost zero so far. No doubt in due course someone will put me in my place.
> I suspect some thing wrong with your setup
What are you talking about? You addressed your comment to me, so are you trying to tell me that (90 m 30 s)/40+(15m +30s) will not get you accurate results? 120 minutes for the first 40 moves, 60 minutes for the next 20 moves, and 15 minutes for the rest is probably going to get you similar results. If I am reading his data correctly, his results aren't that much different from what I came up with.
R4 is still top fish in this cyber sea, but not by much. One or two more updates of Stockfish might clinch it.
HOWEVER, Vas's update and, if I read him right, tweaks "might" put R4 in an incontestable position for a while longer.
I am currently running a match, R4 vs Stockfish 1.8
5min+1s
So far results are very close:
+8 =9 -11
But keep in mind that this is absolutely not statistically significant! For example: 2 wins or fewer in 10 games (0.054) has about the SAME probability as 41 wins or fewer in 100 games (0.044). What matters is statistical significance, not the raw number of games. E.g. a result of 10 wins and 0 losses is almost as unlikely as 34 wins and 66 losses when we expect both engines to be about the same strength (easily checked in Excel).
I run matches on my 1.6 GHz computer with 4-piece tablebases, a random 3-ply book, learning off and various time limits: 1min, 1min+1s, 2min, 2min+2s, 4min, 5min, 5min+5s, 10min .... or anything I make up.
So far it seems there may be a slight edge to Rybka but not much.
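The tail probabilities quoted above can be reproduced without Excel. This is only a minimal sketch, assuming two equally strong engines (win probability 0.5) and counting decisive games only; draws are ignored, which is a simplification, and the helper name `binom_cdf` is made up for illustration.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """Probability of at most k wins in n decisive games (binomial model).

    Assumption: every game is an independent win/loss with win chance p.
    Draws are not modelled here.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    # The two cases compared in the post.
    print("<= 2 wins in 10 games:  ", round(binom_cdf(2, 10), 3))
    print("<= 41 wins in 100 games:", round(binom_cdf(41, 100), 3))
    # The running match (+8 =9 -11) has 19 decisive games, 8 of them wins.
    print("<= 8 wins in 19 games:  ", round(binom_cdf(8, 19), 3))
```

The last line applies the same check to the decisive games of the running match; a value far above the usual 0.05 threshold would support the point that the result is not significant.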
Update match
R4 vs Stckf 1.8
+11 =12 -14
Very interesting, it seems that engines are almost the same strength.
I am looking forward to tomorrow morning's results....
> Very interesting, it seems that engines are almost the same strength.
Not quite! But close. Rybka still has the edge, but that might certainly end with the next Stockfish update. However, like I say, it also depends on what Vas does with his bug fix and whether he adds a few tweaks.
I think we should adopt the new name RYBAK; it has a sting to it. LOL
...or simply: http://en.wikipedia.org/wiki/Alexander_Rybak
Ahmmmmm....... are you actually waiting for a R4+ fix?
Ahhhaaa hah ha ha haha haahaaaahh ahhaahha....... I gotta tell my wife that one!
She refuses to work for you because you don't spell her name right :-(
> She refuses to work for you because you don't spell her name right
That is Norm's new clone " RYBAK 4"! That's why Stockfish is killing that guppy! It probably bellies up on 8cores!
I think you are referring to Rybklone...
> I think you are referring to Rybklone...
Typo, it's Rybaklone.
Yes, Rybaklone it is!
That is it- you are on the money!
Large test series show that R4 is 35 or 40 Elo stronger than SF 1.8, which corresponds to a 55% : 45% score.
The likelihood of "R4 gets 1 point or less in 6 games" is less than 2%, and we could quickly say "Uh, something strange happened!"
But I assume such very, very short test series are run very often.
And maybe every 60th test series will show this result.
By the way: maybe every 60th test series will show R4 with 5.5 or 6 points!
And this is then posted in forums and discussed as a strange event.
But statistics say: this should appear. Otherwise it would be strange.
Quap
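The "less than 2%" figure depends on how many games are drawn, which the post does not state. The sketch below enumerates all win/draw/loss sequences for a 6-game series; the 35% win / 40% draw / 25% loss split is purely an assumption chosen to give R4 a 55% expected score, and `prob_points_at_most` is a hypothetical helper name.

```python
from itertools import product


def prob_points_at_most(max_points, games, p_win, p_draw):
    """Probability of scoring at most max_points in a short series.

    Assumption: games are independent, with fixed win/draw probabilities.
    All 3**games result sequences are enumerated, fine for short series.
    """
    p_loss = 1.0 - p_win - p_draw
    outcomes = {1.0: p_win, 0.5: p_draw, 0.0: p_loss}
    total = 0.0
    for seq in product(outcomes, repeat=games):
        if sum(seq) <= max_points:
            prob = 1.0
            for result in seq:
                prob *= outcomes[result]
            total += prob
    return total


if __name__ == "__main__":
    # Assumed split: 35% wins, 40% draws, 25% losses (55% expected score).
    print(round(prob_points_at_most(1.0, 6, 0.35, 0.40), 4))
    # Same 55% expected score with no draws at all, for comparison.
    print(round(prob_points_at_most(1.0, 6, 0.55, 0.0), 4))
```

Under the assumed 40% draw rate the result comes out at roughly 1.4%, in line with "less than 2%" and with "every 60th series". With no draws at all it rises to about 7%, so the draw rate matters a lot for such short series.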
Anyone who thinks Stockfish 1.8 is stronger than R4 (or equal) needs to either check their settings or run more games. Rybka 4 is still the strongest. My own R4 settings (queen cp +7 for both black and white, 75 rook endgame scaling) absolutely demolish Stockfish 1.8 in most games.
Stockfish 1.8 is no doubt an improvement over 1.7.1, but it's a small improvement, just like R4 wasn't that huge an improvement over R3.
How are those settings working at long time controls against SF?
I don't test with long time controls since my CPU time is limited. Also, I'm busy running other tournaments, but at short TCs, R4 was pulling ahead.
Like I said, you can test it yourself with my settings (Queen cp 7 for both black and white, rook endgame scaling 75). It's about 10-12 Elo stronger than the default Rybka 4 settings.
Perhaps "absolutely demolishes" means something different to you than it does to me. If your settings are 10-12 ELO stronger than the default Rybka 4 settings, that means that Stockfish 1.8 is something like 60 ELO weaker than Rybka 4 with your settings. Sure, 60 ELO is noticeable but "absolutely demolishes"? To me, that would mean something like it would win 3x more than it loses (75% score) and that would indicate something like 190-ish ELO stronger.
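The conversion between score percentage and Elo difference used above follows the standard logistic Elo formula. A small sketch for checking the numbers (the function names are illustrative only):

```python
from math import log10


def expected_score(elo_diff):
    """Expected score (0..1) for the side that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_difference(score):
    """Elo difference implied by a long-run score strictly between 0 and 1."""
    return 400.0 * log10(score / (1.0 - score))


if __name__ == "__main__":
    print(round(elo_difference(0.75)))     # Elo gap implied by a 75% score
    print(round(expected_score(60.0), 3))  # expected score with a 60 Elo edge
```

By this formula a 75% score corresponds to roughly 190 Elo, while a 60 Elo edge is only worth a score somewhere between 58% and 59%.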
Assume I said "is clearly stronger" instead then :)
Don't forget that Stockfish is FREE!
AND it is continuously being updated (for FREE).
Yah! And no promised Stockfish 3 + either!!
I have the perfect idea: if we all want a stronger engine, let's all just not buy Rybka 5 unless an Elo gain of 70 or more is stated, since it is not much of a difference anyway. I can play S-9 vs. Rybka 4 and match Rybka just by occasionally going over Shredder's moves and making my own improvements here and there, by understanding the basic principles of development, especially that when your pieces are close together they are stronger and protect one another.
I have created a new system of play that is simple yet effective against engines.
I think the big disappointment of Rybka 4 should be answered by not purchasing unless major improvements are made (or else we can keep expecting shitty results in R5 and R6 if we keep buying them when they are only a bit stronger!!!)
There probably won't be a Rybka 5; there will only be "The Cloud".
What is the cloud? You mean something even more bogus?
I'm going to be coming into some major cash soon and I will give people a system that will beat Rybka, or get programmers to show the true Elos. It will be easy for me to improve over many programmers as my Elo is 2800; even so, I am not interested in playing professionally.
> what is the cloud?
Vas's new model: instead of buying Rybka, you rent her, and the hardware she runs on, online.
>i will give people a system that will beat rybka, or, get programmers to show the true elos- it will be easy for me to improve over many programmers as my elo is 2800, even being so i am not interested in playing professionally
???? Can you clarify what you mean by this? I do not understand at all :/
He probably means he has no clue what he is talking about and likely needs medical attention.