- - By Ghengis-Kann (***) [us] Date 2018-03-08 02:19
When I tell Chess Assistant 16 to find duplicates, I expect it to build a database with one copy of each game. Instead it creates 2 new databases: one contains both copies of a duplicated game, and the other does not have the game at all.

This is obviously useless.

What am I doing wrong?
Parent - - By ventura07 (***) [ca] Date 2018-03-13 21:49
Which method are you using under Criteria?

Which Rule are you using under Results | Rule?

I usually use the Duplicate search when I have added the same file twice to my database. I pick exact match under both Criteria (Compare Headers and Compare Game Bodies) and end up with two datasets, essential and discardable. The discardable one can be deleted.
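
To illustrate the idea only (this is not how Chess Assistant works internally, which I don't know), here is a minimal Python sketch of an exact-match split into an essential set and a discardable set. The `HEADER_KEYS` list, the dict layout of a game, and the function names are all assumptions made up for this example.

```python
# Hypothetical header fields compared in the "exact match on headers" step.
HEADER_KEYS = ("White", "Black", "Date", "Event", "Result")


def fingerprint(game):
    """Build an exact-match key from the headers and the full move list."""
    headers = tuple(game.get(key, "") for key in HEADER_KEYS)
    return headers + (tuple(game["moves"]),)


def split_duplicates(games):
    """Return (essential, discardable).

    essential holds the first copy of every game; discardable holds
    each later copy whose headers and moves match exactly.
    """
    seen = set()
    essential, discardable = [], []
    for game in games:
        key = fingerprint(game)
        if key in seen:
            discardable.append(game)
        else:
            seen.add(key)
            essential.append(game)
    return essential, discardable


# Made-up sample data: the same game added twice, plus one different game.
g1 = {"White": "A", "Black": "B", "Date": "2018.01.01", "Event": "X",
      "Result": "1-0", "moves": ["e4", "e5", "Nf3"]}
g2 = dict(g1, moves=["d4", "d5"])
essential, discardable = split_duplicates([g1, dict(g1), g2])
print(len(essential), len(discardable))  # 2 1
```

With a split like this, every game survives exactly once in the essential set, so deleting the discardable set is safe.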

I haven't had to run it for several months but I'll try it tonight as a test to make sure of the steps I mentioned.
Parent - - By Ghengis-Kann (***) [us] Date 2018-03-14 15:10
Thanks for your reply.

I have been having so many problems with it crashing that I can't even test anything.

Is there a limit to the size of the files it can handle?
Parent - - By ventura07 (***) [ca] Date 2018-03-14 16:17
I don't think there is a limit. I ran it once with a file which combined Hugebase and Megabase although I didn't try to deduplicate the results as it would have been too much work!

I tend to use it on small files where I have accidentally added the same file twice, so the search is very specific: exact matches.

Is CA crashing generally, or only when you run the duplicate function?
Parent - By Ghengis-Kann (***) [us] Date 2018-03-19 16:59
I filtered Megabase, Hugebase, and the correspondence database that came with AQ2018 to select the games where both players are rated over 2100 Elo. I then converted them all to CDP format using AQ2012, because that is the last version of Aquarium where the record number is shown in the database window.
It is possible that the problem is actually related to AQ2012.
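
For anyone wanting to reproduce the rating filter outside the GUI, a rough Python sketch of the "both players over 2100" test follows. It assumes each game is a dict of PGN-style header strings using the standard `WhiteElo` and `BlackElo` tags; the function name `both_over` and the sample games are made up for illustration.

```python
def both_over(game, threshold=2100):
    """Return True when both players have a numeric rating above threshold.

    Missing or non-numeric ratings (for example "?" or "") count as a fail.
    """
    try:
        white = int(game.get("WhiteElo", ""))
        black = int(game.get("BlackElo", ""))
    except (TypeError, ValueError):
        return False
    return white > threshold and black > threshold


# Made-up sample headers.
sample = [
    {"WhiteElo": "2450", "BlackElo": "2310"},  # kept
    {"WhiteElo": "2200", "BlackElo": "1980"},  # Black is too low
    {"WhiteElo": "2500", "BlackElo": "?"},     # unknown rating
]
strong = [game for game in sample if both_over(game)]
print(len(strong))  # 1
```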

I can combine the data sets in CA2016 with no problem, but there is something like a 90% overlap between Hugebase and Megabase at that level.
Attempting to find duplicates results in either a memory access error or an array out of bounds error that causes the program to sit there doing nothing while the progress timers tick in unison (the time spent and the predicted time to completion increment in lockstep).
At that point I kill the process.

I'm at the point of just giving up and using Hugebase since it is so similar to Megabase at the highest levels, and I suspect that the new professional level games being added to each are almost all duplicates.

I don't know if CA would crash with other functionality because the only thing I have tried to do with it is remove these duplicates.
