A significant step towards open bibliographic data was made in Copenhagen this week at the 25th anniversary meeting of the Conference of European National Librarians (CENL) hosted by the Royal Library of Denmark. From the CENL announcement:
…the Conference of European National Librarians (CENL), has voted overwhelmingly to support the open licensing of their data. What does that mean in practice? It means that the datasets describing all the millions of books and texts ever published in Europe – the title, author, date, imprint, place of publication and so on, which exists in the vast library catalogues of Europe – will become increasingly accessible for anybody to re-use for whatever purpose they want. The first outcome of the open licence agreement is that the metadata provided by national libraries to Europeana.eu, Europe’s digital library, museum and archive, via the CENL service The European Library, will have a Creative Commons Universal Public Domain Dedication, or CC0 licence. This metadata relates to millions of digitised texts and images coming into Europeana from initiatives that include Google’s mass digitisations of books in the national libraries of the Netherlands and Austria. ….it will mean that vast quantities of trustworthy data are available for Linked Open Data developments
There is much to be welcomed here: firstly, that the vote was overwhelming; secondly, that the open licence chosen for this data is Creative Commons CC0, which enables reuse for any purpose. You cannot expect such a vote to cover all the detail, but the phrase ‘trustworthy data are available for Linked Open Data developments’ does give rise to some concerns for me. My concern is not that this implies the data will need to be published as Linked Data; that too should be welcomed. It comes from some of the library-focused Linked Data conversations, presentations and initiatives I have experienced over the last few months and years. Many in the library community who have worked with Linked Data lean towards using Linked Data techniques to reproduce the finely detailed structure and terminology of their bibliographic records, as a representation of those records in RDF (the Linked Data data format). Two examples of this come to mind:
- The recent release of an RDF representation of the MARC21 elements and vocabularies by MMA. This is possibly of internal use only to someone transforming a library’s MARC record collection, as a way to identify concepts and entities to then describe as Linked Data, but it is mind-numbingly impenetrable for anyone who is not a librarian and is looking for useful data.
- The Europeana Data Model (EDM). An impressive and elegant Linked Data RDF representation of the internal record structure and process concerns of Europeana. However, it is again not modelled in a way that makes it easy for those outside the [Europeana] library community to engage with it, understand it and extract meaning from it.
The fundamental issue I have with the first of these, and with other examples, is that their authors have approached the task from the direction of wishing to encode their vast collections of bibliographic records as Linked Data. They would have ended up with a result more open [to the wider world] if they had instead used the contents of their records as a rich resource from which to build descriptions of the resources they hold. In that way you end up with descriptions of things (books, authors, places, publishers, events, etc.) as opposed to descriptions of records created by libraries.
Fortunately there is an excellent example of a national library publishing Linked Data which describes the things it holds. The British Library have published descriptions of 2.6 million items they hold in the form of the British National Bibliography. I urge those within Europeana and the European national libraries community who will be involved in this opening-up initiative to take a close look at the evolving data model that the BL have shared, to kick-start the conversation on the most appropriate [Linked Data] techniques to apply to bibliographic data. For more detail see this Overview of the British Library Data Model.
This opening up of data is a great opportunity, one that should not be missed, for trusted, librarian-curated data to become a core part of the growing web of data. We must be aware of previous missed opportunities, such as the way MARCXML just slavishly recreated an old structure in a new format. Otherwise we could end up with what could be characterised, in web integration terms, as a significant open data white elephant.
Nevertheless I am optimistic. With examples such as the British Library BNB backing up this enthusiastic move to open up a vast collection of metadata in a useful way that will stimulate Linked Data development, I have some confidence in a good outcome.
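
The distinction drawn above between describing records and describing things can be made concrete with a small sketch. It is an illustration only, under stated assumptions: every URI lives under the placeholder `http://example.org/`, the predicate names (`marc:field100a`, `creator`, `title`, `name`) are invented for the example, and the helper `titles_by` is hypothetical. None of it is taken from the British Library model, EDM or the MMA vocabularies. Triples are modelled as plain Python tuples so that no RDF library is needed.

```python
# Illustrative sketch only: all URIs and predicate names are assumptions,
# not taken from the British Library data model, EDM or the MMA vocabularies.
EX = "http://example.org/"

# Record-centric: one node per catalogue record, with each field copied
# across as a literal string. The author exists only as text in a field.
record_triples = [
    (EX + "record/001", "marc:field100a", "Austen, Jane"),
    (EX + "record/001", "marc:field245a", "Pride and Prejudice"),
    (EX + "record/002", "marc:field100a", "Austen, Jane"),
    (EX + "record/002", "marc:field245a", "Emma"),
]

# Thing-centric: the author and each book get their own URI, and the
# books point at the author node rather than repeating a name string.
AUSTEN = EX + "person/jane-austen"
PRIDE = EX + "book/pride-and-prejudice"
EMMA = EX + "book/emma"

thing_triples = [
    (AUSTEN, "name", "Jane Austen"),
    (PRIDE, "title", "Pride and Prejudice"),
    (PRIDE, "creator", AUSTEN),
    (EMMA, "title", "Emma"),
    (EMMA, "creator", AUSTEN),
]


def titles_by(creator_uri, triples):
    """Return the sorted titles of every book whose creator is creator_uri."""
    books = {s for s, p, o in triples if p == "creator" and o == creator_uri}
    return sorted(o for s, p, o in triples if p == "title" and s in books)


def subjects(triples):
    """Return the set of nodes that are described, i.e. can be linked to."""
    return {s for s, _, _ in triples}


if __name__ == "__main__":
    print(titles_by(AUSTEN, thing_triples))
    print(AUSTEN in subjects(record_triples))
```

In the thing-centric list an outside dataset can link straight to the author node and follow it to her books. In the record-centric list the only addressable nodes are the records themselves, so a consumer first has to know which field holds the author and then match on the name string.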
Disclosure: Bibliographic domain experts from the British Library worked with Linked Data experts from the Talis team in the evolution of the BNB data model, something that could be extended and/or repeated with other national and international library organisations.
This post was also published on the Talis Consulting Blog.