Catalogue of the Leadhills Miners’ Library added to Copac

We’re pleased to announce that the records of the Leadhills Miners’ Library have been added to Copac.

Leadhills Miners’ Library is the principal collection of the Leadhills Heritage Trust, which manages the library. The library was founded in 1741 as the Leadhills Reading Society and is the earliest subscription library founded in Britain, as well as the world’s first library for working people. Its stock peaked at around 4000 volumes in the early 20th century, and today its 2500 surviving volumes represent a history of working-class reading from the early 18th century until the 1930s. The collection demonstrates the development of working-class reading: initially focusing on religion, before expanding to cover secular non-fiction (including history, voyages and travel, and biography) and then fiction. It includes 600 volumes purchased with grants from the Ferguson Bequest Fund between about 1870 and 1930, and is the largest surviving collection of its kind. The collection also includes local imprints, such as that of John Wilson of Kilmarnock, Robert Burns’s first publisher. The library functioned as a lending library until the 1960s and is now a closed reference and research collection.

The library is closely linked to the early lifelong learning ideology of mutual improvement and was the first library in Europe to make this connection, following the development of the idea in Philadelphia by Benjamin Franklin in 1731. It therefore played a key role in the development of information ideology in Europe. The principal users of the library were lead miners, whose favourable working conditions and high levels of literacy gave them time to read. The foundation of the library was linked to a programme of reforms in the village, originated by the mine manager and Jacobite intellectual, James Stirling of Garden (1692-1770).

The Leadhills library banner (c 1820)

The Library’s collections also include the earliest library banner in Britain (c 1820), which featured on the Antiques Roadshow in June 2017, and the largest collection of Bargain Books in Scotland. These record the short-term contracts made between the mine managers and teams of miners. The collection has recently been digitised. The Library also possesses the only known example of a library pulpit, where the library president (preses) sat while presiding over the monthly loan and return meetings. Some examples of printed catalogues are also held, including the last major catalogue of the library collection itself, listing 3800 volumes and printed in 1904. A modern catalogue was compiled in the 1980s and has formed the basis for the records now available through Copac.

Leadhills also holds a collection of library artefacts, including ballot boxes for voting on accepting new members, membership certificates and printing plates for printing off membership certificates, and a printing plate for printing copies of the Library bookplate.

The Library had its own building prior to 1791 but its location is not known. The current building was erected in 1791 and is one of the oldest public library buildings in Scotland. It is essentially a miners’ cottage without internal divisions and demonstrates the influence of domestic architecture on library design. It is shelved on three sides with the fourth, north-facing, long wall providing fenestration and a door. It is mentioned in the Old Statistical Account.

The Library is open to the public on Saturday and Sunday afternoons, May to September, 2-4 pm. Access at other times by appointment. Tours of the library and village are available for groups on request.

In the mutual improvement tradition the Library offers a monthly programme of lectures during the winter months and also occasional special lectures. Local community groups also meet in the Library.

John Crawford
Chair, Leadhills Heritage Trust

You can find out more about the library, including contact details, on their Copac information page. To browse the library’s records, select the Main Search tab on Copac and choose ‘Leadhills Miners’ Library’ from the list of libraries.

Birkbeck, University of London Library catalogue added to Copac

We’re pleased to announce that the holdings of Birkbeck, University of London Library have been added to Copac.

Photo of Birkbeck, University of London Library

Spread across five floors of the main Birkbeck building in Bloomsbury, central London, this richly resourced library holds more than 300,000 items covering subjects including applied linguistics, economics, mathematics and statistics, law, psychology and Victorian studies, as well as offering a wealth of online resources.

You can find out more about the library on their Copac information page, and see descriptions of their archival collections at the Archives Hub.

To browse or limit your search to its holdings, select the Main Search tab in Copac and choose ‘Birkbeck, University of London Library’ from the list of libraries.

Catalogues of Bangor University Library and Northumbria University Library added to Copac

We’re pleased to announce that the holdings of Bangor University Library and Northumbria University Library have been added to Copac.

Photograph of Bangor University Library

Bangor University Library

Bangor University Library holds an extensive range of print and electronic resources, and also has one of the largest university-based archives in the UK, cared for by its internationally recognised Archives and Special Collections department.

The service is housed over four libraries:

  • Main Library – holds collections for Arts, Languages, Humanities, Social Sciences, Music, Law, the Welsh Library, and is the location of the Archives and Special Collections department.
  • Deiniol Library – holds collections for Sciences, Psychology and Healthcare Sciences. Also kept here is a large collection of Ordnance, Soil and Geological Survey maps.
  • Normal Library – holds materials related to Education, Sport, Health and Exercise Sciences and collections of children’s books.
  • Wrexham Library – holds materials on Nursing, Midwifery, Radiography and Healthcare Sciences.

You can find out more about the library on their Copac information page, and see descriptions of their archival collections at the Archives Hub.

Northumbria University Library holds comprehensive digital and print collections, comprising over 550,000 print books, over 822,000 ebooks, and over 108,000 online and print journals.

The service is housed over three libraries:

  • City Campus Library – the largest of the libraries, housing collections that support all of the subjects taught at City Campus. The library has been extensively refurbished since 2013.
  • Coach Lane Library – based on the east side of Coach Lane Campus, housing collections that support all of the subjects taught at Coach Lane Campus.
  • Law Practice Library – situated on the first floor of the CCE1 Building (Business and Law) at City Campus East, housing a reference collection of law resources, including textbooks, law reports and journals.

You can find out more about the library on their Copac information page.

To browse or limit your search to the holdings of either library, select the Main Search tab in Copac and choose the library name from the list of libraries.

Institute of Ismaili Studies & ISMC Library: catalogue added to Copac

We’re pleased to announce that the holdings of the Institute of Ismaili Studies & ISMC Library have been added to Copac.

Photograph of a Qur’an at the IIS & ISMC Library

Qur’an at the IIS & ISMC Library. Image copyright: IIS & ISMC Library

The Institute of Ismaili Studies & ISMC Library (IIS-ISMC) aims to serve scholarship in areas and languages of interest to the Institute for the Study of Muslim Civilisations and the Institute of Ismaili Studies by emphasising the development of collections of primary and secondary resources, and published and rare materials, in the fields of Islamic studies and Muslim civilisation, as well as in the Humanities in general.

The library gathers materials published in and about the major areas of the Muslim world and their diasporas, as well as their history and evolution, while specifically paying attention to topics such as Ismaili studies, wider Shi’i studies, and Qur’anic studies and education.

Among the special collections, library highlights include the donation of the personal library of Professor Annemarie Schimmel, an important collection focusing on the Indo-Muslim communities and cultures, which contains several out of print works in Sindhi, Persian and Urdu; and also the donation of part of the library and personal archive of Professor Mohammed Arkoun, including his professional correspondence, notes, offprints of his articles and over 200 theses on Islamic thought, history and culture.

The library also houses an important collection of books in Ottoman Turkish that mainly includes works of literature from the Tanzimat and post-Tanzimat period, particularly novels, poetry and dramas, as well as travel literature, language materials and historical works, dating from the 18th to the early 20th centuries.

To browse or limit your search to its holdings, select the Main Search tab in Copac and choose ‘Institute of Ismaili Studies & ISMC Library’ from the list of libraries.

New Copac database and revised interface

We’ve released a new Copac database and made a number of revisions to the interface. The most visible changes are:

  • An updated look which will work better with mobile devices.
  • Increased deduplication, including all pre-1800 materials.
  • Clearer indication of document format (eg. print vs electronic).
  • Options to expand merged records. You can look ‘under the bonnet’ of a merged record to see the original individual records supplied by each library, or just a subset of the original records eg. just those for printed materials.

We have temporarily removed the options for sorting search results, one of a number of changes we have made whilst we assess how the new database performs now it’s in service. We will reintroduce the sort options once we have a better sense of the overall system performance. We are also looking to move off our old hardware in the near future, with one aim being to improve response times.

Changes to the database and interface have been made in response to feedback, in particular balancing concerns about duplicate records against the desire not to lose access to the original records from each library for early printed materials. We’ve recently been working with Copac users on the interface changes and we’re continuing with interface testing and development later this year, so any feedback you have on the interface will be valuable input to that ongoing development.

Note: The document format identification and deduplication are not perfect; both are affected by the variability of the data. Deduplication of records for early printed materials has raised particular issues. We have a range of checks to try to deal with some of the record variation in both these areas, but we will be looking further at these in the future.

Missing catalogues:

Four of our contributors changed to a new library system last year, so to ensure we can continue to update their data we need a complete catalogue reload. They have had difficulties exporting their data successfully, so four catalogues are currently missing from Copac. We have been working with one of the libraries and their system supplier to help resolve problems with their data export. This has taken some time, but we should begin the load of the York catalogue shortly. If this goes well we will be aiming to load the other missing catalogues as soon as possible. The libraries affected are:

  • Imperial College London
  • University of Manchester
  • University of Sheffield
  • University of York (including NRM and York Minster)

Ongoing development

The new database and revised interface have involved major changes behind-the-scenes to provide us with a stable base for continued service expansion, as well as the potential to introduce new facilities in the future. We have some ongoing system issues and we’re working to mitigate these in the short term, whilst at the same time planning a move from our old hardware onto a new cloud platform, with a focus on response times.

Keeping in touch

You can stay in touch with Copac activity through:

You can also provide feedback on the service at any time through the Copac helpdesk, as well as by filling in our annual user survey. We really appreciate your feedback, and the comments we get help guide the development of the service.

Copac deduplication

Over 60 institutions contribute records to the Copac database. We try to de-duplicate those contributions so that records from multiple contributors for the same item are “consolidated” together into a single Copac record. Our de-duplication efforts have reduced over 75 million records down to 40 million.

Our contributors send us updates on a regular basis which results in a large amount of database “churn.” Approximately one million records a month are altered as part of the updating process.

Updating a consolidated record

Updating a database like Copac is not as immediately intuitive as you may think. A contributor sending us a new record may result in us deleting a Copac record. A contributor who deletes a record may result in a Copac record being created. A diagram may help explain this.

A Copac consolidated record created from 5 contributed records. Lines show how contributed records match with one another.

The above graph represents a single Copac record consolidated from five contributed records: a1, a2, a3, b1 & b2. A line between two records indicates that our record matching algorithm thinks the records are for the same bibliographic item. Hence, records a1, a2 & a3 match with one another; b1 & b2 match with each other; and a1 matches with b1.

Should record b1 be deleted from the database, then as b2 does not match with any of a1, a2 or a3 we are left with two clumps of records. Records a1, a2 & a3 would form one consolidated record and b2 would constitute a Copac record in its own right as it matches with no other record. Hence the deletion of a contributed record turns one Copac record into two Copac records.
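
This clump behaviour can be sketched as connected components in a graph, with records as nodes and matches as edges. The code below is an illustration of the idea, not Copac’s actual implementation:

```python
from collections import defaultdict


def clumps(edges, records):
    """Return the connected components, i.e. the consolidated records."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for rec in records:
        if rec in seen:
            continue
        stack, comp = [rec], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(graph[node])
        seen |= comp
        components.append(comp)
    return components


records = {"a1", "a2", "a3", "b1", "b2"}
edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"), ("b1", "b2"), ("a1", "b1")]

# All five contributed records form one consolidated Copac record...
print(len(clumps(edges, records)))  # 1

# ...but deleting b1 (and its matches) leaves two clumps: {a1, a2, a3} and {b2}.
records.discard("b1")
edges = [(a, b) for a, b in edges if "b1" not in (a, b)]
print(len(clumps(edges, records)))  # 2
```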

I hope it is clear that the inverse can happen — that a new contributed record can bring together multiple Copac records into a single Copac record.

The above is what would happen in an ideal world. Unfortunately the current Copac database does not save a log of the record matches it has made and neither does it attempt to re-match the remaining records of a consolidated set when a record is deleted. The result is that when record b1 is deleted, record b2 will stay attached to records a1, a2 & a3. Coupled with the high amount of database churn this can sometimes result in seemingly mis-consolidated records.

Smarter updates

As part of our forthcoming improvements to Copac we are keeping a log of records that match. This makes it easier for the Copac update procedures to correctly disentangle a consolidated record and should result in fewer mis-consolidations.

We are also trying to make the update procedures smarter and have them do less work. For historical reasons the current Copac database is really two databases: a database of the contributors’ records and a database of consolidated records. The contributors’ database is updated first and a set of deletions and additions/updates is passed on to the consolidated database. The consolidated database doesn’t know if an updated record has changed in a trivial way or now represents another item completely. It therefore has no choice but to re-consolidate the record, and that means deleting it from the database and then adding it back in (there is no update functionality). This is highly inefficient.

The new scheme of things tries to be a bit more intelligent. An updated record from a contributor is compared with the old version of itself and categorised as follows:

  • The main bibliographic details are unchanged and only the holdings information is different.
  • The bibliographic record has changed, but not in a way that would affect the way it has matched with other records.
  • The bibliographic record has changed significantly.

Only in the last case does the updated record need to be re-consolidated (and in future that will be done without having to delete the record first!) In the first two cases we would only need to refresh the record that we use to create our displays.
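
A sketch of how that three-way classification might be implemented, comparing fingerprints of the full bibliographic details against fingerprints of just the fields used for matching. The field names and the fingerprinting scheme here are assumptions for illustration, not Copac’s real schema:

```python
import hashlib
import json

MATCH_FIELDS = ("isbn", "issn", "title_key", "date")  # assumed matching fields
HOLDINGS_FIELDS = ("holdings",)


def fingerprint(record, fields):
    """Stable hash of the named fields of a record."""
    data = json.dumps({f: record.get(f) for f in fields}, sort_keys=True)
    return hashlib.sha1(data.encode()).hexdigest()


def classify_update(old, new):
    bib_fields = sorted((set(old) | set(new)) - set(HOLDINGS_FIELDS))
    if fingerprint(old, bib_fields) == fingerprint(new, bib_fields):
        return "holdings-only"    # just refresh the holdings display
    if fingerprint(old, MATCH_FIELDS) == fingerprint(new, MATCH_FIELDS):
        return "bib-changed"      # refresh the display record; matching unaffected
    return "significant"          # needs re-consolidation


old = {"title_key": "rightsofman", "date": "1791", "isbn": None,
       "note": "1st ed.", "holdings": ["MLIB"]}

print(classify_update(old, {**old, "holdings": ["MLIB", "STORE"]}))  # holdings-only
print(classify_update(old, {**old, "note": "First edition."}))       # bib-changed
print(classify_update(old, {**old, "date": "1904"}))                 # significant
```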


An analysis of an update from one of our contributors showed that it contained 3818 updated records: 954 had unchanged bibliographic details and only 155 had changed significantly enough to need reconsolidating. The saving is substantial: in the current Copac database we have to re-consolidate all 3818 records, whereas in the new version of Copac we only need to re-consolidate 155. This will reduce database churn significantly, result in updates being applied faster and allow us to take on more contributors.

Example Consolidations

Just for interest and because I like the graphs, I’ve included a couple of graphs of consolidated records from our test database. The first graph shows a larger set of records. There are two records in this set, either of which, if deleted, would result in the set being broken up into two smaller sets.

The graph below shows a smaller set of records where each record matches with every other record.

Performance improvements

The run-up to Christmas (or Autumn term if you prefer) is always our busiest time of year as measured by the number of searches performed by our users. Last year the search response times were not what we would have liked, and we have been investigating the causes of the poor performance and ways of improving it. Our IT people determined that at our busiest times the disk drives in our SAN were being pushed to their maximum performance and just couldn’t deliver data any faster. So, over the summer we have installed an array of Solid State Disks to act as a fast cache for our file-systems (for the more technical, I believe it is actually configured as a ZFS Level 2 cache).

The SSD cache was turned on during our brief downtime on Thursday morning and so far the results look promising. I’m told the cache is still “warming up” and that performance may improve still further. The best performance indicator I can provide is the graph below. We run a “standard” query against the database every 30 minutes and record the time taken to run the query. The graph below plots the time (in seconds) to run the query since midnight on the 23rd August 2011. I think it is pretty obvious from looking at the graph exactly when the SSD cache was configured in.

It all looks very promising so far and I think we can look forward to the Autumn with less trepidation and hopefully some happier users.

Hardware move

The hardware move has gone relatively smoothly today. We’ve had some configuration issues that prevented some Z39.50 users from pulling back records and another configuration problem that meant a small percentage of the records weren’t visible. That should all be fixed now, but if you see something else that looks like a problem, then please let us know.

The DNS entry was changed at about 10am this morning. At 4pm we’re still seeing some usage on the old hardware. However, most usage started coming through to the new machine very soon after the DNS change.

The changeover to the new hardware has involved a lot of preparation over many weeks. Now it’s done, we can get back to re-engineering Copac… a new database backend and new search facilities for the users.

Behind the Copac record 2: MODS and de-duplication

We left the records having been rigorously checked for MARC consistency, and uploaded to the MARC21 database used for the RLUK cataloguing service. Next they are processed again, to be added to Copac.

One of the major differences between Copac and the MARC21 database is that the Copac records are not in MARC21. They’re in MODS XML, which is

an XML schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. It is a derivative of the MARC 21 bibliographic format (MAchine-Readable Cataloging) and as such includes a subset of MARC fields, using language-based tags rather than numeric ones.

Copac records are in MODS rather than MARC because Copac records are freely available for anyone to download, and use as they wish. The records in the MARC21 database are not – they remain the property of the creating library or data provider. We couldn’t offer MARC records on Copac without getting into all sorts of copyright issues. Using MODS also means we have all the interoperability benefits of using an XML format.
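
As a small illustration of the “language-based tags” point, here is a minimal sketch (not the actual Copac converter, which handles far more fields) mapping two MARC fields into their MODS equivalents: the 245 title statement becomes `titleInfo/title`, and the 100 personal-name main entry becomes `name[@type="personal"]/namePart`:

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def marc_to_mods(marc):
    """marc: dict of MARC tag -> value, e.g. {'245': 'Title', '100': 'Author'}."""
    ET.register_namespace("", MODS_NS)
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    if "245" in marc:  # title statement -> titleInfo/title
        title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
        ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = marc["245"]
    if "100" in marc:  # main entry, personal name -> name/namePart
        name = ET.SubElement(mods, f"{{{MODS_NS}}}name", type="personal")
        ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = marc["100"]
    return ET.tostring(mods, encoding="unicode")


print(marc_to_mods({"245": "Rights of man", "100": "Paine, Thomas"}))
```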

Before we add the records to Copac we check local data to ensure we’re making best use of available local holdings details, and converting local location codes correctly. Locations in MARC records will often be in a truncated or coded form, eg ‘MLIB’ for ‘Main Library’. We make sure that these will display in a format that will be meaningful to our users.
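
A sketch of the kind of lookup involved; the codes and expanded names here are invented examples rather than any contributor’s real table:

```python
# Map a contributor's local location codes to reader-friendly names.
LOCATION_MAP = {
    "MLIB": "Main Library",
    "STORE": "Library Store",
    "SPCOLL": "Special Collections",
}


def expand_location(code):
    # Fall back to the raw code so an unmapped location is never lost.
    return LOCATION_MAP.get(code, code)


print(expand_location("MLIB"))  # Main Library
print(expand_location("XYZ"))   # XYZ (unmapped, left as-is)
```
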
It is also at this point that we do the de-duplication of records for Copac. Now, Copac de-duplication garners very mixed reactions: some users think we aren’t doing enough de-duplication, and occasionally we get told that we’re doing too much! We can’t ever hope to please everyone, but we’re aware that the process isn’t perfect, and we’ll be reviewing and updating deduplication during the reengineering. We will also be exploring FRBR work-level deduplication.

As I’ve mentioned in an earlier blog post, we don’t de-duplicate anything published pre-1801. So what do we do with the post-1801 records?

As new records come in we do a quick and dirty match against the existing records using one or more of ISBN, ISSN, title key and date. This identifies potential matches, which then go through a range of other exact and partial field matches. The exact procedure will vary depending on the type of material, so journals (for instance) will go through a slightly different process than monographs.
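
A rough sketch of what such a quick and dirty match key might look like; the normalisation rules and key structure are assumptions for illustration, not Copac’s actual procedure:

```python
import re


def title_key(title, length=20):
    """Lower-case the title, strip punctuation and spaces, truncate."""
    return re.sub(r"[^a-z0-9]", "", title.lower())[:length]


def match_keys(record):
    """Candidate keys for the quick first-pass match."""
    keys = set()
    if record.get("isbn"):
        keys.add(("isbn", record["isbn"].replace("-", "")))
    if record.get("title") and record.get("date"):
        keys.add(("title+date", title_key(record["title"]), record["date"]))
    return keys


# Two records that differ only in punctuation and case still share a key:
a = {"title": "Rights of Man.", "date": "1791"}
b = {"title": "RIGHTS OF MAN", "date": "1791"}
print(match_keys(a) & match_keys(b))  # a shared key -> candidate match
```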

Records that are deemed to be the same are merged, and for many fields the unique data from each record is indexed. This provides enhanced access to materials, eg. a wider range of subject headings than would be present in any one of the original records. The deduplication process can thus result in a single enhanced record containing holdings details for a range of contributing libraries.
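
A simplified sketch of that merge step, taking the union of the unique subject headings and combining the holdings (the field names are illustrative, not the real record structure):

```python
def merge_records(records):
    """Merge matched records into one consolidated record."""
    merged = dict(records[0])  # bibliographic details from the first record
    subjects, holdings = [], []
    for rec in records:
        for s in rec.get("subjects", []):
            if s not in subjects:  # keep unique values, preserving order
                subjects.append(s)
        holdings.extend(rec.get("holdings", []))
    merged["subjects"] = subjects
    merged["holdings"] = holdings
    return merged


r1 = {"title": "Rights of man", "subjects": ["Political science"],
      "holdings": ["Library A"]}
r2 = {"title": "Rights of man", "subjects": ["Political science", "Human rights"],
      "holdings": ["Library B"]}

merged = merge_records([r1, r2])
print(merged["subjects"])  # ['Political science', 'Human rights']
print(merged["holdings"])  # ['Library A', 'Library B']
```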

As we create the Copac records we also check for the availability of supplementary content information for each document, derived from BookData. We incorporate this into the Copac record, further enhancing record content for both search and display, eg. a table of contents, abstract or reviews.

Because the deduplication process is fully automated it needs to err on the side of caution, otherwise some materials might disappear from view, subsumed into similar but unrelated works. This can mean that records which appear to a searcher to be self-evident duplicates remain separate on Copac because of minor differences in the records. Changes made to solve one problem case could result in many other records being mis-consolidated. It’s a tricky balance.

However, there is another issue: the current load and deduplication is a relatively slow process. We have large amounts of data flowing onto the database every day and restricted time for dealing with updates. Consequently, where a library has been making significant local changes to their data and we get a very large update (say 50,000 records), this will be loaded straight onto Copac without going through the deduplication process.

This means that the load will almost certainly result in duplicate records. These will disappear gradually as they are pulled together by subsequent data loads, but it is this bypassing of the deduplication procedure in favour of timeliness that results in many of the duplicate records visible on Copac. One of the aims of the reengineering is to streamline the dataload process, to avoid this update bottleneck and improve overall duplicate consolidation levels.

So, that’s the Copac record, from receipt to display. We hope you’ve enjoyed this look behind the Copac records. Anything else you’d like to know about? Tell us in the comments!

Thanks to Shirley Cousins for the explanation of the de-duplication procedures

Behind the Copac record

We’re going to be talking quite a lot about the Copac reengineering, including the move to FRBRise Copac, and in order for you to have some idea of how this is going to change what we do, you need to know what we do now.  So here’s a brief background on the life of a Copac record.

Records are sent to us by the contributing institutions, usually in MARC exchange format, which looks like this:

An unprocessed MARC exchange file

We then run this through programs created by our wonderful programmers (and about which I know very, very little, except that they’re fantastic and save both my eyes and my sanity), which create records that look like this:

A processed MARC file

This is much easier on the eye, which is fortunate, as this is the stage where I use the warning file (also generated by the program) to look through and track down any possible errors. This is mainly done when loading a new library – once a library has been loaded, we just keep an eye on their updates to identify any changes or new issues that arise.

For instance, the warning file might say ‘WARNING: LONG NAME IN 100 MAY NOT BE PERSONAL NAME  REC 92765’.  I would then look up that record, and check whether the long name in the 100 is, in fact, a personal name, or if it is a corporate name and needs to be in a 110.
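
A toy version of the kind of heuristic that might lie behind that warning (the threshold and the exact logic are my guesses; I haven’t seen the real load programs):

```python
MAX_PERSONAL_NAME = 50  # an assumed length threshold


def check_100(rec_id, field_100):
    """Flag a suspiciously long 100 field for human review."""
    warnings = []
    if len(field_100) > MAX_PERSONAL_NAME:
        warnings.append(
            f"WARNING: LONG NAME IN 100 MAY NOT BE PERSONAL NAME REC {rec_id}"
        )
    return warnings


# A corporate body in the 100 field trips the length check...
print(check_100(92765, "Royal Commission on the Ancient and Historical "
                       "Monuments of Scotland"))
# ...while an ordinary personal name passes silently.
print(check_100(12345, "Paine, Thomas, 1737-1809"))  # []
```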

This program has been evolving ever since the start of Copac, and it’s now able to handle most changes with very little need for human intervention. Therefore, when I see a 700 field ‘1 $aDaille, Jean, 1594-1670.’ changed to ‘1 $aDaille, Jean,$d1594-1670.’, I know that I don’t need to do anything – that change is correct.

Some warnings do need looking at in more depth. If I see a warning that says something along the lines of ‘WARNING: NO 245 IN REC 76932. 240 CONVERTED TO 245’, then I will look at the original record and the altered record to see if that change is correct.

At this stage we’ll also check whether any generic fields are being used in a local way, that notes are in the correct notes fields, and that all records have holdings information. Note that we’re largely not in a position to assess the quality of the data in the fields – purely that the right sort of data is in the right fields. We wouldn’t, for example, correct typos in authors’ names or incorrect publication dates. Quite apart from the fact that doing so would require making judgements, and would make the whole process simply unmanageable, the data on Copac belongs to the contributing libraries, and so they are the ones who would need to make any corrections to the content. Thus, in general, the only changes we would make are to the MARC structure (or occasionally to the encoding of special characters), to try to ensure standardised data for record sharing and for building Copac. The data content of the fields we leave exactly as it is.

Once we’re satisfied that all this is correct, the data is loaded onto the RLUK shared cataloguing database in MARC21 format, where it is available for use by RLUK members and customers.  Back in the Copac office, it’s time for another round of processing, before the data is loaded onto Copac.  More on that next time!