
Linked Data in the Context of Digital Libraries

Next week I am giving a talk on Linked Data in the Context of Digital Library Systems at SWIB09 in Cologne. So it is time to catch up with recent developments in that area. Some projects I am mentioning here will be presented in separate talks. Nevertheless I would like to briefly sketch their main contributions.

For each project, I try to find the answers to the following questions, which might be important for future adopters of Linked Data technology:

  1. What kind of data are exposed as Linked Data?
  2. How did they implement the Linked Data Principles?
  3. What was the motivation of the institutions to consider Linked Data as a way for sharing data?

The Library of Congress Subject Headings (LCSH)

The Library of Congress was one of the early adopters of Linked Data. Last year (or even earlier?) Ed Summers and his colleagues started to build a first prototype and exposed the approximately 260,000 authority records held at the Library of Congress according to the Linked Data Principles. They converted the LCSH, which were originally available as MARCXML, into SKOS and used the Library of Congress Control Number to mint the URIs of the exposed concepts. On May 1st 2009 the experimental service went into production (see http://lcsh.info/) and the LCSH are now available at http://id.loc.gov/, following the scheme http://id.loc.gov/authorities/{lccn}#concept. Here is an example: http://id.loc.gov/authorities/sh85058486#concept. Dereferencing this URI using the HTTP Accept Header “Accept: application/rdf+xml” returns:



  
    
    Hallstatt period
    1986-02-11T00:00:00-04:00
    1996-09-11T10:10:33-04:00
    Mounds--Rhine River Valley
    Iron age
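
The URI scheme and the Accept header described above can be sketched in a few lines of Python. This is only an illustration: the helper names `lcsh_concept_uri` and `rdf_request` are my own, the request is built but not sent, and the live service may respond differently today.

```python
from urllib.request import Request

# Base of the URI scheme quoted above: http://id.loc.gov/authorities/{lccn}#concept
LCSH_BASE = "http://id.loc.gov/authorities/"


def lcsh_concept_uri(lccn: str) -> str:
    """Mint the concept URI for a Library of Congress Control Number."""
    return f"{LCSH_BASE}{lccn}#concept"


def rdf_request(uri: str) -> Request:
    """Build (but do not send) a GET request that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request(lcsh_concept_uri("sh85058486"))
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
# To actually dereference the URI: urllib.request.urlopen(req).read()
```

Note that the `#concept` fragment is never sent to the server; it only distinguishes the concept from the document that describes it.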
  

The Swedish Union Catalogue (LIBRIS)

The Swedish Union Catalogue was the other Linked Data service presented at DC2008 (here is the paper). The catalogue comprises data from about 175 libraries and contains six million records following the MARC21 standard. Today, records describing various types of resources, including persons, authors, subjects, and organizations, are accessible at http://libris.kb.se/. This example shows a record about a book: http://libris.kb.se/bib/10542240. When you dereference this URI, you get a 303 See Other response with Location http://libris.kb.se/data/bib/10542240?format=application%2Frdf%2Bxml. This in turn returns the following record:

    Hallstatt textiles : technical analysis, scientifc investigation and experiment on Iron Age textiles /
    2005
    text
    Anton Kern
    Kern, Anton 1947-
    Hallstatt : eine Einleitung zu einem sehr bemerkenswerten Ort

The records are made available by building a simple RDF wrapper on top of the integrated library system. As with the LCSH, persistent URIs were created using the record’s unique number. So the URIs follow the pattern http://libris.kb.se/resource/bib/{number} for bibliographic records and http://libris.kb.se/resource/auth/{number} for authority records. Dublin Core (http://dublincore.org/) is used as the vocabulary for bibliographic data, FOAF for persons and organizations. Currently the returned data are not interlinked with any other data source.

The motivation for LIBRIS to make their bibliographic data available as Linked Data was that the standard way to access bibliographic data is still through search-retrieve protocols such as SRU/W or Z39.50. There is currently no way to address records directly, and there are hardly any links between records. Since the LIBRIS developers had to create a new Web interface anyway, they decided to make the data available to machines as well, i.e., to expose them following the Linked Data principles. With minor effort, they are now opening up data that was previously available only to the library community and/or to people familiar with the complexity of the existing specifications.
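
To make the LIBRIS URI patterns and the 303 redirect a bit more concrete, here is a small Python sketch. The function names are hypothetical, and the redirect target is simply reconstructed from the one example quoted above; I do not know how LIBRIS builds it internally.

```python
from urllib.parse import urlencode

LIBRIS_BASE = "http://libris.kb.se"
RECORD_KINDS = ("bib", "auth")  # bibliographic and authority records


def _check_kind(kind: str) -> None:
    if kind not in RECORD_KINDS:
        raise ValueError(f"unknown record kind: {kind!r}")


def resource_uri(kind: str, number: str) -> str:
    """Persistent URI pattern: /resource/{bib|auth}/{number}."""
    _check_kind(kind)
    return f"{LIBRIS_BASE}/resource/{kind}/{number}"


def see_other_location(kind: str, number: str,
                       media_type: str = "application/rdf+xml") -> str:
    """Location of the 303 See Other response, inferred from the example above."""
    _check_kind(kind)
    query = urlencode({"format": media_type})  # percent-encodes "/" and "+"
    return f"{LIBRIS_BASE}/data/{kind}/{number}?{query}"


print(resource_uri("bib", "10542240"))
print(see_other_location("bib", "10542240"))
# second line: http://libris.kb.se/data/bib/10542240?format=application%2Frdf%2Bxml
```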
    
    
  

RAMEAU Subject Headings

Within the TELplus project, European institutions started to work on a service (http://www.cs.vu.nl/STITCH/rameau/) that exposes a SKOSified version of the RAMEAU subject headings as open linked data. RAMEAU is the main subject vocabulary at the French national library (BnF) and other French institutions. It contains approx. 160,000 concepts including common nouns and geographic names. The concepts are interlinked with the LCSH concepts based on 60,000 available manual mappings. The service was announced in April 2009 and is still experimental. Here is the announcement mail.

Depending on a client’s preference, the service can return RDF/XML or HTML descriptions containing RDFa markup. Here is the sample response returned when dereferencing the resource http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11942233p (which is linked in the above LCSH record) and following the 303 See Other response:






   FRBNF119422333
   Civilisation de Hallstatt
   Civilisation hallstattienne
   Culture de Hallstatt
   Culture hallstatienne
   Hallstatt, Civilisation de
   Hallstattien
   Hallstattkultur
   Osthalstattkultur
   Premier âge du fer
   Civilisation du premier âge du fer en Europe
   Source : Dict. de la préhistoire / A. Leroi-Gourhan, 1994. - Les sociétés de la préhistoire / J.-P. Mohen, Y. Taborin, 1998. - Les Celtes / V. Kruta, 2000. - La préhistoire / D. Vialou, 2004
   Domaine : 930

   http://www.w3.org/2004/02/skos/core#closeMatch
   1.0

   http://www.w3.org/2004/02/skos/core#closeMatch
   1.0



   


Dewey Decimal Classification (DDC) Summaries

In August 2009 Michael Panzer from OCLC announced that the Dewey Decimal Classification (DDC) Summaries would be published as linked data. From http://dewey.info one can retrieve the top 1000 classes of the Dewey Decimal Classification in nine languages. Like all the other services mentioned before, it uses content negotiation to determine whether to deliver RDF/XML, XHTML + RDFa, or other serialization formats (Turtle, JSON). The service is still experimental, and the data are not interlinked with others. The technical details about this service are described here, including the promising statement that OCLC plans to conduct further development in this direction.
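
Content negotiation itself is easy to sketch. The following Python snippet shows how a server could pick a serialization from the client's Accept header. It is not how dewey.info is implemented; the media type strings, the default, and the function names are my assumptions, and wildcards such as `text/*` are not handled.

```python
# Media types a hypothetical service offers (assumed names for the formats
# mentioned above: RDF/XML, XHTML + RDFa, Turtle, JSON).
SUPPORTED = ["application/rdf+xml", "application/xhtml+xml",
             "text/turtle", "application/json"]


def parse_accept(header):
    """Split an Accept header into (media type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_format(header, supported=SUPPORTED, default="application/xhtml+xml"):
    """Return the best supported media type, or the default if none matches."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return default
        if media_type in supported:
            return media_type
    return default


print(choose_format("application/rdf+xml"))
print(choose_format("text/html;q=0.9, text/turtle;q=0.8"))
```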

I couldn’t find any information about the effort it took to implement that service. It seems that the design of the URI patterns was far from trivial: the DDC Summary service supports things like versioning and allows clients to retrieve the changes that happened to concepts over time.
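
To illustrate what such a versioned URI carries, here is a Python sketch that takes apart the example URI http://dewey.info/class/943/2009/08/about.en. The pattern (class notation, year, month, language suffix) is only my guess from this single example; the real URI design certainly covers more cases, and the names `DeweyRef` and `parse_dewey_uri` are made up.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class DeweyRef(NamedTuple):
    notation: str   # class number, e.g. "943"
    year: int       # year of the version
    month: int      # month of the version
    language: str   # language of the description, e.g. "en"


# Guessed pattern: /class/{notation}/{year}/{month}/about.{language}
_PATH = re.compile(
    r"^/class/(?P<notation>[^/]+)/(?P<year>\d{4})/(?P<month>\d{2})"
    r"/about\.(?P<lang>[A-Za-z-]+)$"
)


def parse_dewey_uri(uri: str) -> Optional[DeweyRef]:
    """Return the parts of a versioned class URI, or None if it does not match."""
    match = _PATH.match(urlparse(uri).path)
    if match is None:
        return None
    return DeweyRef(match["notation"], int(match["year"]),
                    int(match["month"]), match["lang"])


print(parse_dewey_uri("http://dewey.info/class/943/2009/08/about.en"))
```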

A very interesting aspect of the DDC Summaries Linked Data Service is that the exposed data include licensing information, which explicitly permits reuse of the exposed data under the terms of the Creative Commons BY-NC-ND license. This means that you can use the exposed DDC Summaries under the conditions that (i) you attribute the work in the manner specified by the author, (ii) you do not use the data for commercial purposes, and (iii) you do not alter, transform or build upon this work. I am not a lawyer, but in my interpretation linking from your data sets to the DDC Summaries should be perfectly legal. Using the DDC Summaries data for organizing assets in your own commercial library system is not.

Here is a sample resource exposed by this service retrieved from http://dewey.info/class/943/2009/08/about.en:




  
    
    
    OCLC Online Computer Library Center, Inc.
    en
    943
    Central Europe; Germany
    
    
  


VIAF

Another very interesting project highly relevant for the digital libraries domain is VIAF, the Virtual International Authority File (http://viaf.org/). It is a joint project of more than ten national libraries, implemented and hosted by OCLC. Its goal is to match and link the authority files of national libraries and then make that information available on the Web.

Unfortunately, there is not much information available on this project besides a set of slides saying that Semantic Web building blocks are used within this service. But dereferencing a given URI (e.g., http://viaf.org/viaf/56611857) with the HTTP Accept header application/rdf+xml clearly shows that the VIAF service also takes the Linked Data principles into account:


	
	
	
		קפקה, פרנץ, 1883-1924
		Kafka, Franz, 1883-1924
		
		
		
	

Summary

So what were the main motivations of all these developments?

First of all, several people within library institutions realized that libraries should open their data to the context of a globally interlinked information network, which is the Web. To achieve that, they must integrate their vocabularies and data with the Web environment (or actually the Web architecture) so that their data can be used and integrated with any other Web application, including those in other communities. Linked Data is one possible technical realization of that idea.

The problem with existing library-data exchange protocols is that (i) although they use the Internet infrastructure for exchanging data, they do not really integrate with the Web architecture, and (ii) they are very specific to the digital libraries community and difficult to adopt in other domains. The common building blocks of the Web (e.g., URI, HTTP), in contrast, are widely accepted across domains.

The effort for implementing a Linked Data service on top of existing systems was apparently rather low: LIBRIS implemented their service as part of their Website reorganization; the first LCSH prototype was implemented by a single person (?).

Interestingly the motivation to adopt Linked Data within library institutions was – as far as I know – always bottom-up, driven by a few technical enthusiasts. This nicely reflects the beginning of developments in other areas (e.g., the Web, Open Source Software, Wikipedia,…). Hopefully we can see similar developments in the digital libraries domain.
