XDMS: The ARCHER data management system


Poster by: Nick Nicholas, Link Affiliates. 

As e-research practitioners know, the assets produced through e-research need to be managed to remain accessible, discoverable, and reusable over any period of time. Data management does not just happen, especially after the conclusion of a project, and cannot be the responsibility of the researcher alone. This has led to the increasing adoption of data management plans (DMPs), as a partnership between institutional IT and the researcher, to share the burden of curating and maintaining research data for the wider community.

Persistent identifiers are widely used to provide ongoing citation and access to data, unaffected by the changes in storage and services that data might undergo over its lifespan. Creating persistent identifiers, and keeping them up to date and usable, is part of the responsibility for long-term curation of research data. Identifiers must therefore be integrated into any comprehensive data management strategy.

The PILIN project modelled identifier usage and policy in 2007, and in 2008 the PILIN Transition Project, as part of the ANDS Establishment project, has been looking at formulating policies and guidelines for identifier use in e-research. The project has identified DMPs as an appropriate mechanism for promoting persistent identifiers in e-research, and for providing guidance on good identifier practice. To that end, we have prepared two documents that can be consulted when preparing DMPs; we outline both in this presentation.

The “Information Modelling Guide for Identifiers in e-research” is intended for researchers, to help them formulate an information model of which entities in the data should be persistently identified. This is the most critical contribution researchers make to identifier planning, as experts in the subject matter of the data. Researchers can identify which groupings, versions, and representations of data make conceptual sense to the likely users of that data; this in turn determines which citations of the data those users will expect to discover and to resolve to the data itself.

Beyond the information model, the guide helps researchers establish priorities for which identifiers should be maintained in the long term, past the conclusion of the project, and how identifiers should be resolved: whether directly to the data for downloading, or indirectly, e.g. to metadata descriptions. The guide also introduces researchers to the concept of the data lifecycle, including data curation, repurposing, archiving, and deletion, to inform their long-term data management choices.
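
The two planning choices described above (whether an identifier is maintained past the end of the project, and whether it resolves directly to the data or indirectly to a metadata description) can be recorded per identifier. The following Python sketch is illustrative only and is not taken from the guide: the record layout, field names, identifier string, and example.org URLs are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ResolveTo(Enum):
    DATA = "data"          # resolve directly to the downloadable data
    METADATA = "metadata"  # resolve indirectly, to a metadata description


@dataclass
class IdentifierRecord:
    # All field names are hypothetical, chosen for illustration only.
    identifier: str            # the persistent identifier string
    data_url: str              # current location of the data
    metadata_url: str          # current location of the metadata description
    resolve_to: ResolveTo      # resolution choice recorded in the plan
    keep_after_project: bool   # maintain past the conclusion of the project?


def resolve(record: IdentifierRecord) -> str:
    """Return the URL that the identifier currently resolves to."""
    if record.resolve_to is ResolveTo.DATA:
        return record.data_url
    return record.metadata_url


# Hypothetical example: one version of a dataset, resolved via its metadata.
survey = IdentifierRecord(
    identifier="example:survey-2008/v1",
    data_url="https://data.example.org/survey-2008/v1.csv",
    metadata_url="https://data.example.org/survey-2008/v1/about",
    resolve_to=ResolveTo.METADATA,
    keep_after_project=True,
)
print(resolve(survey))
```

Keeping the resolution choice as a field of the record, rather than encoding it in the identifier string, means the choice can be revisited later without changing any citation of the data.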

“Incorporating Persistent Identifiers into a Data Management Plan” is intended for the IT managers responsible for formulating DMP templates. It outlines how a DMP can take into account the requirements for persistent identifiers. Beyond the requirement for an adequately complete information model, this also includes establishing the likely information management processes for the data, and leveraging identifiers for those processes. This involves data access processes, but also identifier creation and updating, to ensure identifiers are kept up to date and decoupled from data through the lifespan of the data, and beyond. Identifiers are pervasive in data management, and the document also explores the implications of persistent identification for external obligations (e.g. auditing), usage tracking, data access policies, and data sanitisation.
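
The identifier creation, updating, and usage tracking processes mentioned above can be illustrated with a toy in-memory registry. This is a minimal sketch under stated assumptions, not the approach of the document or of any real identifier service: the class, its method names, the identifier string, and the example.org locations are all hypothetical. The point it shows is that the identifier stays fixed while the location behind it is updated.

```python
class IdentifierRegistry:
    """Toy registry: identifiers stay stable while data locations change."""

    def __init__(self):
        self._locations = {}          # identifier -> current data location
        self._resolution_counts = {}  # identifier -> number of resolutions

    def create(self, identifier, location):
        """Register a new identifier; refuse to reassign an existing one."""
        if identifier in self._locations:
            raise ValueError(f"identifier already exists: {identifier}")
        self._locations[identifier] = location
        self._resolution_counts[identifier] = 0

    def update(self, identifier, new_location):
        """Point an existing identifier at the data's new location."""
        if identifier not in self._locations:
            raise KeyError(f"unknown identifier: {identifier}")
        self._locations[identifier] = new_location

    def resolve(self, identifier):
        """Return the current location and count the access."""
        location = self._locations[identifier]
        self._resolution_counts[identifier] += 1
        return location

    def usage(self, identifier):
        """Return how many times the identifier has been resolved."""
        return self._resolution_counts[identifier]


# Hypothetical example: the data moves to an archive, the identifier does not.
registry = IdentifierRegistry()
registry.create("example:dataset-42", "https://old-store.example.org/dataset-42")
registry.update("example:dataset-42", "https://archive.example.org/dataset-42")
print(registry.resolve("example:dataset-42"))
```

Because every access goes through `resolve`, the same registry that decouples identifiers from storage is also a natural place to count usage.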