Authors
Steve Androulakis, Mark Bate, Anthony Beitz, Wojtek Goscinski, and Ashley Buckle (Monash University)
Abstract
There is an increasing need for researchers to describe the data sets associated with published scientific results and to share them with the wider scientific community. This need comes not only from scientific journals, which request that data sets be made available alongside results for verification purposes, but also from the research communities themselves. Open access to the data sets behind publications allows researchers to learn more about others’ findings and to process the data to further their own research goals.
Several challenges arise when attempting to share large data sets over the internet. There is usually no central location with the capacity and funds to store the world’s raw data for any one scientific discipline. Data sets, often several gigabytes if not terabytes in size, are frequently too large to fit within the software constraints of traditional digital repositories, which were created with the storage of papers and media clips in mind.
Technological challenges aren’t the only ones encountered when attempting to fill the data commons. Ease of data set deposition, description and citation must all be considered when trying to create a system that will be widely adopted. Many labs don’t have direct access to computing expertise, so data set deposition and repository setup must be as simple as possible.
We have created a solution that takes these challenges into account, using common web technologies together with data annotation and deposition tools that can be deployed at any site cheaply and easily. TARDIS gives the protein crystallography community the means to easily describe their data sets, deposit them into a simple repository and have them displayed on the web. Data is shown alongside rich, searchable metadata and is freely downloadable. The TARDIS framework is designed to be easily adaptable to a wide range of scientific disciplines.
About the speaker
Steve Androulakis is a software developer based at Monash University. His areas of work include eResearch and bioinformatics solutions relating to digital curation, the sharing and collaboration of research data, and applying high performance computing approaches to research problems.
