We have now finished loading records from the University of Birmingham Library catalogue. The one million records took us about two weeks to merge into the existing records on Copac. The overall size of the Copac database stayed pretty much the same — around 30 million records.
The final tranche of records from Cardiff University Library made it into the Copac database this morning. Altogether, we loaded 700,000 Cardiff records (in between applying updates from the other contributing libraries) in about 10 days. The de-duplication process slows down the loading rate — if we had loaded the records without the de-duplication they would have loaded in a single day.
We have now started loading the records from Oxford University into the new Copac MODS XML database. As there are approximately 5.5 million records to load, it is going to take some time to complete.
The Oxford records are being consolidated into the database as they are loaded. In other words, each incoming record is de-duplicated against the existing records at load time rather than in a separate pass afterwards.
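To give a feel for what consolidating on load means, here is a minimal sketch in Python. It is not the Copac loader: the `Record` class, the single `key` field used for matching and the `load_consolidating` function are all hypothetical, and real record matching is far more involved than comparing one key. The idea it illustrates is simply that each incoming record absorbs every existing record it matches, so the holdings end up on one merged record.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    key: str  # hypothetical match key; real matching uses much more than this
    holdings: set[str] = field(default_factory=set)


def load_consolidating(database: list[Record], incoming: list[Record]) -> list[Record]:
    """Load records one at a time, merging each with any existing matches."""
    for new in incoming:
        matches = [r for r in database if r.key == new.key]
        merged = Record(new.key, set(new.holdings))
        for match in matches:
            merged.holdings |= match.holdings  # carry over the other libraries' holdings
        # The matched records are replaced by the single merged record.
        database = [r for r in database if r.key != new.key]
        database.append(merged)
    return database


# Two unconsolidated duplicates of "a" already in the database, plus one other record.
existing = [Record("a", {"LibX"}), Record("a", {"LibY"}), Record("b", {"LibX"})]
oxford = [Record("a", {"Oxford"}), Record("c", {"Oxford"})]
result = load_consolidating(existing, oxford)
```

In this toy run the database starts with three records and ends with three, even though two records were loaded, because the incoming "a" record pulled the two existing duplicates together.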
Going from the first handful of batches loaded, each Oxford record is matching, on average, with 1.3 existing Copac records. Each of those existing records contains holdings from approximately 5 other libraries. This means that each record finally added to the database contains, on average, holdings from 6 libraries.
If this consolidation rate continues for all the Oxford records, the database will be smaller at the end of the loading process than it was when we started loading Oxford, because each record added replaces, on average, 1.3 existing ones.
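The arithmetic behind that can be sketched in a few lines of Python. This is a back-of-envelope estimate only. It assumes the early average of 1.3 matches holds for every one of the roughly 5.5 million Oxford records and that all matched records are absorbed into the one record added, which the next batches may well not bear out. The function name `net_change` is ours, purely for illustration.

```python
OXFORD_RECORDS = 5_500_000  # approximate number of Oxford records to load
MATCHES_PER_RECORD = 1.3    # average from the first handful of batches


def net_change(records_loaded: int, matches_per_record: float) -> float:
    """Each load adds one consolidated record and removes the records it absorbed."""
    return records_loaded * (1 - matches_per_record)


change = net_change(OXFORD_RECORDS, MATCHES_PER_RECORD)
print(f"Estimated net change: {round(change):,} records")
```

Under those assumptions the database would end up about 1.65 million records smaller. Any rate above one match per record shrinks the database, and any rate below one grows it.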
One would expect Oxford to hold many unique items in its collections and so the consolidation rate is likely to vary widely over all the load batches.
Not all the libraries were consolidated as we loaded them, as it would have taken too long. We saved loading Oxford until now in the expectation that its sheer size and coverage will bring together many of the duplicate records that resulted from loading other libraries unconsolidated.