Surfacing the Academic Long Tail — Announcing new work with activity data

We’re pleased to announce that JISC has funded us to work on the SALT (Surfacing the Academic Long Tail) project, which we’re undertaking with the John Rylands University Library (JRUL) at the University of Manchester.

Over the next six months the SALT project will be building a recommender prototype for Copac and the JRUL OPAC interface, which will be tested by the communities of users of those services. Following on from the invaluable work undertaken at the University of Huddersfield, we’ll be working with more than ten years of aggregated and anonymised circulation data amassed by JRUL. Our approach will be to develop an API onto that data, which in turn we’ll use to develop the recommender functionality in both services. We’re indebted to the knowledge acquired at Huddersfield, and the SALT project will work closely with colleagues there (Dave Pattern and Graham Stone) to see what happens when we apply this concept in the research library and national library service contexts.
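To make that concrete, here’s a minimal sketch of what a client of such an API might look like. Everything in it (the endpoint URL, the parameter names, the response shape) is a hypothetical placeholder rather than a designed interface:

```python
import json
import urllib.request

# Hypothetical endpoint: the real SALT API URL, parameters and
# response format are still to be designed during the project.
SALT_API = "https://example.org/salt/suggestions"

def fetch_suggestions(isbn, limit=5):
    """Ask the (hypothetical) SALT API which items were most often
    borrowed alongside the item with the given ISBN."""
    url = f"{SALT_API}?isbn={isbn}&limit={limit}"
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    # Assumed response shape:
    # {"suggestions": [{"isbn": "...", "title": "...", "score": 42}, ...]}
    return payload.get("suggestions", [])
```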

Our overall aim is that by working collaboratively with other institutions and Research Libraries UK, the SALT project will advance our knowledge and understanding of how best to support research in the 21st century. Libraries are a rich source of valuable information, but sometimes the sheer volume of materials they hold can be overwhelming even to the most experienced researcher — and we know that researchers’ expectations of how to discover content are shifting in an increasingly personalised digital world. We know that library users — particularly those researching niche or specialist subjects — are often seeking content based on a recommendation from a contemporary, a peer, a colleague, or an academic tutor. The SALT project aims to give libraries the ability to provide users with that information. Similar to Amazon’s ‘customers who bought this item also bought…’, the recommendations will appear on a local library catalogue and on Copac, and will be based on circulation data gathered over the past ten years at the University of Manchester’s internationally renowned research library.
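Under the hood, the simplest version of this is plain item-to-item co-occurrence counting over the anonymised loan records: every time two items have been borrowed by the same (anonymised) borrower, their pairing gets a point. A minimal sketch, assuming the circulation data has already been reduced to (borrower_id, item_id) pairs:

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(loans):
    """loans: iterable of (borrower_id, item_id) pairs drawn from
    anonymised circulation records.
    Returns {item: {co_borrowed_item: count}}."""
    items_by_borrower = defaultdict(set)
    for borrower, item in loans:
        items_by_borrower[borrower].add(item)

    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_borrower.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(counts, item, n=5):
    """Top-n 'borrowers of this item also borrowed...' suggestions."""
    neighbours = counts.get(item, {})
    return sorted(neighbours, key=neighbours.get, reverse=True)[:n]
```

A real implementation would also want to damp the scores of ubiquitously borrowed items, so that a handful of core textbooks don’t dominate every suggestion list.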

How effective will this model prove to be for users — particularly humanities researchers?

Here’s what we want to find out:

  • Will researchers in the field of humanities benefit from receiving book recommendations, and if so, in what ways?
  • Will the users go beyond the reading list and be exposed to rare and niche collections — will new paths of discovery be opened up?
  • Will collections in the library, previously undervalued and underused, find a new appreciative audience — will the Long Tail be exposed and exploited for research?
  • Will researchers see new links in their studies, possibly in other disciplines?

We also want to consider whether there are other potential beneficiaries. By highlighting rarer collections, valuing niche items, and bringing to the surface less popular but nevertheless worthy materials, libraries will have the leverage they need to ensure the preservation of these rich materials. Can such data or services assist in decision-making around collections management? We will be consulting with Leeds University Library and the White Rose Consortium, as well as UKRR, in this area.

And finally, as part of our sustainability planning, we want to look at how scalable this approach might be for developing a shared aggregation service of circulation data for UK university libraries. We’re working with potential data contributors such as Cambridge University Library, the University of Sussex Library, and the M25 consortium, as well as RLUK, to trial and provide feedback on the project outputs, with specific attention to the sustainability of an API service as a national shared service for HE/FE that supports academic excellence and drives institutional efficiencies.

Perspectives on Goldmining

Last Friday, Shirley and I headed down to London for the TiLE workshop, ‘“Sitting on a gold mine” — Improving Provision and Services for Learners by Aggregating and Using Learner Behaviour Data.’ The aim of the workshop was to take a ‘blue skies’ (but also practical) view of how usage data can be aggregated to improve resource discovery services on a local and national (and potentially global) level. Chris Keene from the University of Sussex Library has written a really useful and comprehensive post about the proceedings (I had no idea he was feverishly live-blogging across the table from me — but thanks, Chris!)

I was invited to present a ‘Sector Perspective’ on the issue, and specifically the ‘Pain Points’ identified around ‘Creating Context’ and ‘Enabling Contribution.’ The TiLE project suggests a lofty vision where, with a sufficient amount of context data about a user (derived from goldmines such as attention-data pools and profile data stored within VLEs, library service databases, and institutional profiles — you know, simple enough ;-) services could become much more Amazon-like. OPACs could suggest to users, ‘First-year history students who used this textbook also highly rated this textbook…’ and such. The OPAC is thus transformed from a relic of the past into a dynamic online space enabling robust ‘architectures of participation.’
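As a sketch of how that cohort-specific flavour might work: if each anonymised borrower could (hypothetically) be mapped to a coarse profile label from a VLE or student registry, the co-occurrence counting from the earlier sketch could simply be restricted to one cohort’s loans before making suggestions:

```python
def cohort_cooccurrence(loans, profiles, cohort):
    """Restrict co-occurrence counting to one cohort of borrowers.
    profiles: {borrower_id: cohort_label} is hypothetical profile
    data of the kind a VLE or student registry might supply.
    Reuses build_cooccurrence() from the earlier sketch."""
    cohort_loans = ((b, i) for b, i in loans if profiles.get(b) == cohort)
    return build_cooccurrence(cohort_loans)

# e.g. suggestions scoped to first-year history students:
# counts = cohort_cooccurrence(loans, profiles, "history-year-1")
```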

This view is very appealing, and certainly at Copac we’re doing our part to really interrogate how we can support *effective* adaptive personalisation. Nonetheless, as a former researcher and teacher, I’ve always had my doubts as to whether the library catalogue, per se, is the right ‘place’ for this type of activity.

We might be able to ‘enable contribution’ technically, but will it make a difference? An area that perhaps most urgently needs attention is research on the social component of, and drivers for, contributing user-generated content. As the TiLE project has identified, the ‘goldmine’ that could galvanise such contribution is ‘context’ or usage data. But is it enough, especially in the context of specialised research?

As an example of the potential ‘cultural issues’ that might emerge, the TiLE project cites the questionably nefarious tag ‘wkd bk m8’ submitted against a record. They ask, “Is this a low-quality contribution, or does it signal something useful to other users, particularly to users who are similar to the contributor?”

I’d tend to agree with the latter, but would also say that this is just the tip of the iceberg when it comes to rhetorical context. For example, consider the user-generated content that might arise around contentious works on the ‘State of Israel.’ The fact that Wikipedia has multiple differing and ‘sparring’ entries on this topic is a good indicator of the complexity that emerges. I would say that this is incredibly rich complexity, but on a practical level it is potentially very difficult for users to negotiate. Which UGC-derived ‘context’ is relevant for which users? Will our user model be granular or precise enough to adjust accordingly?

One of the challenges of accommodating a system-wide model is tackling semantic context. Right now, for instance, Mimas and EDINA have been tasked with coming up with a demonstrator for a tag recommender that could be implemented across JISC services. This seems like a relatively simple proposition, but as soon as we start thinking about semantic context, we are immediately confronted with the question of which concept models or ontologies to draw from.
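For contrast, the naive statistical approach sidesteps ontologies entirely: just suggest whichever tags most often co-occur with the ones a user has already entered. A minimal sketch (illustrative only, not the Mimas/EDINA demonstrator):

```python
from collections import defaultdict

def build_tag_cooccurrence(tagged_records):
    """tagged_records: iterable of tag sets, one set per record.
    Returns {tag: {co_tag: count}}."""
    co = defaultdict(lambda: defaultdict(int))
    for tags in tagged_records:
        for t in tags:
            for u in tags:
                if u != t:
                    co[t][u] += 1
    return co

def suggest_tags(co, entered, n=5):
    """Rank candidate tags by how often they co-occur with the
    tags already entered. No ontology involved."""
    scores = defaultdict(int)
    for t in entered:
        for u, count in co.get(t, {}).items():
            if u not in entered:
                scores[u] += count
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

The trouble, of course, is that pure co-occurrence has no notion of discipline: ‘morphology’ keeps very different company in linguistics than it does in biology, which is exactly the drift problem described below.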

Semantic harvesting and text-mining projects such as the Intute Repository Search have pinpointed the challenge of ‘ontological drift’ between disciplines and levels. As we move into this new terrain of Library 2.0, this drift will likely become all the more evident.

Is the OPAC too generic to facilitate the type of semantic precision needed to enable meaningful contribution? I have a hunch it is, as did other participants when we broke out into discussion sessions.

But perhaps the goldmine of context data, that ‘user DNA,’ will provide us with new ways to tackle the challenge, and there was also a general sense that we need to forge forward on this issue — try things out and experiment with attention data. A service that aggregates both user-generated and attention/context data would be of tremendous benefit, and Copac (and other services like it) could potentially move to a model where adaptive personalisation is supported. Indeed, Copac as a system-wide service has great potential as an aggregator in this regard.

There is risk involved around these issues, but there are some potential ‘quick wins’ that are of clear immediate benefit. Another speaker on Friday was Dave Pattern, who, within a few minutes of ‘beaming to us live via video from Huddersfield’, had released the University of Huddersfield’s book usage data (check it out).

This is one goldmine we’re only too happy to dig into, and we’re looking forward to collaborating with Dave over the next year to find ways to exploit and further his work in a national context. We want to implement recommender functions in Copac, but also (more importantly) to work at Mimas on developing a system for storing and sharing usage data from multiple UK libraries (any early volunteers?!). The idea is that this data can also be reused to improve services on a local level. We’re just at the proposal stage in this whole process, but we feel very motivated, and the energy of the TiLE workshop has only motivated us more.
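As a footnote on what ‘storing and sharing’ might look like in practice, here is one hypothetical shape for a normalised, privacy-preserving circulation event that multiple libraries could contribute to a shared store; every field here is an assumption for illustration, not a spec:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoanEvent:
    """One hypothetical normalised circulation event for a shared
    national store. Deliberately coarse: no names, no exact
    timestamps, nothing personally identifying."""
    library_id: str     # contributing institution, e.g. "huddersfield"
    borrower_hash: str  # one-way hash, consistent only within a library
    item_id: str        # ISBN or other identifier shared across libraries
    year_month: str     # coarse date bucket, e.g. "2008-11"
```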