Mark Hedges: Data grid storage for digital libraries and archives based on iRODS


View the presentation slides PDF (1.1 Mb)

Abstract

Digital library software can provide a powerful and flexible infrastructure for managing and delivering complex digital resources and metadata, and as such forms a key component in emerging e-Research infrastructures. However, while such systems are proving successful in managing complexity, issues can arise in dealing with the very large, distributed data files that may constitute the resources they manage.

Data grid middleware, such as Storage Resource Broker (SRB), has been widely and successfully used for storing and managing large distributed data sets. In particular it has been implemented as a storage layer for digital libraries and archives, in the UK, Australia and elsewhere, providing efficient support for managing and processing the underlying distributed data objects, which may be very large. However, such middleware offers only limited support for implementing the complex metadata and processing required in digital library and archive environments.

iRODS, developed by the San Diego Supercomputer Center as the successor to SRB, shows great promise in addressing some of the limitations of these earlier systems. In particular, iRODS incorporates a Rule Engine that allows complex workflows and data management policies to be integrated at the data level, offering improved performance and isolating users from functionality that does not need to be visible to them. These workflows and policies are represented in terms of rules, which allow pre-defined sequences of actions to be triggered in particular circumstances. As the processing included in rules is more or less arbitrary, this facility also allows more sophisticated integration of digital library and archive systems with the underlying data grid, over and above simply using it as a storage layer, for example by accessing the more advanced metadata management facilities of digital repository systems or triple stores.
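The idea of rules as pre-defined sequences of actions triggered in particular circumstances can be sketched in a few lines of Python. This is a toy model only: it does not use the iRODS rule language or any iRODS API, and the class, event and action names (`ToyRuleEngine`, `after_put`, `record_size`, `mark_for_replication`) are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# An action receives the data object's record (a plain dict here) and updates it.
Action = Callable[[dict], None]


class ToyRuleEngine:
    """Toy model of a rule engine: an event name maps to an ordered action list."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Action]] = defaultdict(list)

    def on(self, event: str, *actions: Action) -> None:
        """Register actions to run, in order, whenever `event` is triggered."""
        self._rules[event].extend(actions)

    def trigger(self, event: str, obj: dict) -> dict:
        """Run every action registered for `event` against `obj`."""
        for action in self._rules.get(event, []):
            action(obj)
        return obj


def record_size(obj: dict) -> None:
    # Hypothetical action: capture simple technical metadata at ingest.
    obj["size"] = len(obj["data"])


def mark_for_replication(obj: dict) -> None:
    # Hypothetical action: flag the object so a later step can copy it elsewhere.
    obj["replicate"] = True


engine = ToyRuleEngine()
engine.on("after_put", record_size, mark_for_replication)

# Storing an object fires the rule; the caller never invokes the actions directly.
stored = engine.trigger("after_put", {"name": "scan-001.tiff", "data": b"\x00" * 1024})
```

The caller only stores the object; the policy runs as a side effect of the event, which is the sense in which rules isolate users from functionality that does not need to be visible to them.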

The Centre for e-Research, King’s College London, is taking an approach that combines the Fedora digital library software with a storage layer implemented using iRODS. This approach allows us to use Fedora’s flexible architecture to manage the structure of resources and to provide application-layer services to users, while the grid-based storage layer and rule system support efficient management and processing of the underlying data objects. In particular we will present a number of practical examples illustrating the utility and importance of the iRODS rule-based approach for digital library and archive systems. These examples will include: (i) the encoding of digital preservation strategies and policies as rules, embedding automated curation within the storage layer; (ii) the use of rules to perform efficient manipulation and analysis of large data objects, such as video and high-resolution images, in digital libraries; and (iii) fine-grained, role-based distributed access management (using Shibboleth).
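As an illustration of example (i), one common preservation policy is a fixity check: record a checksum when an object is ingested, then periodically recompute it to detect corruption. The sketch below shows the general technique in plain Python using the standard `hashlib` module. It is not the implementation presented in the talk and does not use iRODS or Fedora; the function names and the in-memory `catalogue` are assumptions made for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def ingest(catalogue: dict, name: str, data: bytes) -> None:
    """Store an object together with the checksum computed at ingest time."""
    catalogue[name] = {"data": data, "checksum": sha256_hex(data)}


def audit(catalogue: dict) -> list:
    """Return the sorted names of objects whose content no longer matches its checksum."""
    return sorted(
        name
        for name, record in catalogue.items()
        if sha256_hex(record["data"]) != record["checksum"]
    )


# Hypothetical in-memory catalogue standing in for the storage layer.
catalogue = {}
ingest(catalogue, "a.tiff", b"image bytes")
ingest(catalogue, "b.mov", b"video bytes")

clean = audit(catalogue)                      # nothing has changed yet
catalogue["b.mov"]["data"] = b"video bytez"   # simulate silent corruption
damaged = audit(catalogue)                    # the altered object is reported
```

In a rule-based storage layer, a check of this kind would be attached to ingest and to a scheduled audit, so that curation happens without the application layer having to request it.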

About the speakers

Mark Hedges is Deputy Director of the Centre for e-Research at King’s College London, and before this was Technical Manager of the Arts and Humanities Data Service. At both of these institutions he has worked in the fields of data and information management, digital repositories, digital libraries and e-research infrastructures. Prior to this, he was employed for 17 years in the software industry, taking the lead on a number of large-scale development projects for industrial and commercial clients. His academic background is in mathematics and philosophy (he has a Ph.D. in mathematics) and, more recently, in Byzantine studies.

John Byron has been Executive Director of the Australian Academy of the Humanities since August 2003. He has worked in higher education and research policy in different capacities since 1997, including a stint in 2001 as national President of the Council of Australian Postgraduate Associations. He has a PhD from the University of Sydney, a first-class honours degree in English, and a university medal from the University of Adelaide. His doctoral dissertation was on recent movies dealing with questions of memory, reality, identity and authenticity. He is Secretary of the Association for the Medical Humanities (ANZ), and serves on several other boards including the Governing Council of Old Parliament House and the Council for the Humanities, Arts and Social Sciences. He is a member of the Australian Research Council’s Humanities subcommittee to the Indicators Development Group for the Excellence in Research for Australia initiative. He is an Adjunct Research Fellow of the Research School of Humanities Research Centre at the Australian National University.