Robert Bell, Jeroen van den Muyzenberg and Peter Edwards: Your data, our responsibility


Extended abstract PDF

Authors

Robert Bell, Jeroen van den Muyzenberg and Peter Edwards (CSIRO)

Abstract

High Performance Computing (HPC) centres become holders of large volumes of data. The facilities they provide and the policies they apply to users' data affect both the productivity of users and their perception of the service. Users would like unlimited, high-performance storage that is visible from everywhere and secure from loss, all at zero cost. Centres have to balance factors such as capacity and performance, and make backups of files to provide some protection against loss. This paper outlines some issues with data storage at HPC centres, then addresses the need for backup and the various techniques used. It finishes by showing how the rsync command can be used to provide full backups on every cycle at the cost of incremental backups, and how the Tower of Hanoi scheme can be used to automatically manage backup sets at appropriate intervals back in time. Together, the techniques reduce the storage needed for backups by a factor of about five compared with conventional schemes.
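The Tower of Hanoi rotation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `hanoi_set` and `retained_cycles`, the 0-based set numbering, and the rule that the last set absorbs all higher levels are assumptions made here for illustration. The idea is that set 0 is overwritten every second cycle, set 1 every fourth, set 2 every eighth, and so on, so the retained backups end up spaced at roughly doubling intervals back in time.

```python
def hanoi_set(cycle: int, num_sets: int) -> int:
    """Return the 0-based backup set to overwrite on a given 1-based cycle.

    The set index is the number of trailing zero bits in the cycle number
    (the "ruler" sequence 0, 1, 0, 2, 0, 1, 0, 3, ...). Levels beyond the
    last set are folded into the last set (an assumption of this sketch).
    """
    if cycle < 1 or num_sets < 1:
        raise ValueError("cycle and num_sets must both be at least 1")
    level = (cycle & -cycle).bit_length() - 1  # count of trailing zero bits
    return min(level, num_sets - 1)


def retained_cycles(current_cycle: int, num_sets: int) -> dict[int, int]:
    """Map each backup set to the most recent cycle that was written to it."""
    latest: dict[int, int] = {}
    for cycle in range(1, current_cycle + 1):
        latest[hanoi_set(cycle, num_sets)] = cycle
    return latest


if __name__ == "__main__":
    # Which set is reused on each of the first eight cycles, with three sets.
    print([hanoi_set(c, 3) for c in range(1, 9)])
    # -> [0, 1, 0, 2, 0, 1, 0, 2]

    # After 16 cycles with four sets, the retained backups are from
    # cycles 16, 15, 14 and 12, i.e. 0, 1, 2 and 4 cycles old.
    print(retained_cycles(16, 4))
    # -> {0: 15, 1: 14, 2: 12, 3: 16}
```

With each set holding a full backup (for example, one produced by rsync with unchanged files shared between sets, as the abstract describes), a small number of sets covers a long span of history without keeping every intermediate cycle.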