Rodney McDuff: The Research and Education Distributed Data Storage (REDDS) Project


View the presentation slides PDF (977 Kb)

Abstract

We are on the cusp of an explosion of data storage. Worldwide, in both the commercial and the higher education and research sectors, the volume of data to be stored is growing exponentially. For instance, JPMorgan Chase currently has more than 14 petabytes of active data storage, with plans to add “several more petabytes” in the coming year. Research institutions like NASA, the San Diego Supercomputer Centre and Lawrence Livermore National Laboratory currently have multi-petabyte online storage, with an order of magnitude more in off-line storage. Additionally, when experiments such as ATLAS (based at CERN) start running at their full trigger data rate of 750 MB/s, they will be accruing data at a rate of about 2 petabytes/month. Well aware of these trends, the data storage industry has forecast that in 2011 there will be approximately 1.8 zettabytes (1 zettabyte = 10^21 bytes) of stored data in existence and that by 2013 the industry as a whole will be shipping a yottabyte (10^24 bytes) of storage per year.

Australia will not be isolated from these trends, especially in our higher education and research sector. We too have our fair share of data-intensive research projects, such as the Monash Centre for Synchrotron Science, the SkyMapper project and AuScope, to name a few. This ties us into another emerging global trend: Big Science is usually found in the company of Big Data Storage. Likewise, Small Science often needs Big Data Storage as well. On conservative estimates, the data accrued by all researchers in the Australian Higher Education and Research sector easily exceeds a petabyte, and it is growing.

While this preponderance of data is pushing the envelope of conventional data storage technologies, it is also creating secondary issues. Without the means to create and maintain relevant metadata, to “discover” efficiently where data is stored, to curate it digitally and to access it in a federated and trusted manner, the data will be, for all intents and purposes, lost beyond an event horizon of inaccessibility. The Australian National Data Service (ANDS) project has been created to mitigate these metadata, discovery, access and digital curation problems by creating standards in these areas. However, ANDS will not provide massive data storage resources itself, leaving this responsibility and burden to the research institutions and universities themselves.

To help relieve this burden, the REDDS project will focus on unconventional methods of providing massive data storage that is highly fault- and disaster-tolerant, high-performance, secure and cost-effective. The prime focus will be the class of Distributed Data Storage systems, of which there are many examples in both the open source and commercial worlds. This class of data storage systems has the potential to revolutionize data storage in the Australian Higher Education and Research sector, providing highly redundant and secure data storage more cost-effectively than conventional data storage paradigms. The REDDS project will “blaze the trail” for research institutions and universities wishing to deploy such storage technologies.

About the speaker

Rodney McDuff is Manager of the Strategic Technologies Group within Information Technology Services (ITS) at The University of Queensland (UQ) and works on various projects, including the Australian Access Federation project. Rodney is also the Australian liaison to the Internet2 Middleware Architecture Committee for Education (MACE) and a member of the Internet2 Presence and Integrated Communications (PIC) Working Group. Whilst at ITS, Rodney has been responsible for the planning, implementation and maintenance of core IT infrastructure services, applications and systems at UQ. He previously worked as a Research Fellow at the Advanced Computational Modelling Centre.