View the presentation slides (2 MB)
Abstract
The distributed relational database architecture currently used by BioGrid enables authorized researchers to access clinical data from various institutions for research. Both the inherent tight coupling between data sources and analysis systems and the centralized knowledge of each source's data structure present operational and administrative challenges when scaling the system to large numbers of institutions. We will present an alternative architecture and demonstrator system that is simpler, secure, resilient to change, and scalable. This service-oriented approach allows a loosely coupled federation of service and data providers that have no knowledge of, or dependence on, the structure of information held by any other provider. Data is exchanged using XML, and the system adapts to changes in the structure of that data. Control of the data, its availability, and its form remains entirely within the jurisdiction of the provider, and the form of the data returned may change depending on the identity and authority of the person making the request. This service-oriented approach will allow BioGrid:
- To significantly increase the number of data sources and researchers with minimal administrative overhead.
- To let researchers use standard web-based interfaces from their desktops to securely locate and access data holdings.
- To let researchers use a common ontology regardless of the ontology of the data in the source repositories.
- To support multiple independent research organizations using the same data sources.
- To give each data provider complete control over every fragment of data that leaves its institution. The form and amount of the data may change as a function of the authority of the person making a request.
- To alter the amount and structure of data available at each source without requiring changes to the rest of the federation.
- To utilize a broad range of analytical processing tools and services.
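As a rough illustration of two of the ideas above (provider-controlled release of data and tolerance to structural change), the following minimal Python sketch may help. It is not the BioGrid or Mediaflux implementation: the roles, field names, `RELEASE_POLICY` table, and both functions are hypothetical, chosen only to show the pattern.

```python
import xml.etree.ElementTree as ET

# Hypothetical provider-side policy: the fields each role may receive.
# Nothing here is taken from BioGrid; it only illustrates the idea.
RELEASE_POLICY = {
    "clinician": {"patient_id", "diagnosis", "date_of_birth"},
    "researcher": {"diagnosis"},
}


def provider_response(record, role):
    """Serialize to XML only the fields the requester's role may see."""
    allowed = RELEASE_POLICY.get(role, set())  # unknown roles get nothing
    root = ET.Element("record")
    for name, value in record.items():
        if name in allowed:
            ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def consumer_read(xml_text, wanted):
    """Extract the fields the consumer asks for, skipping absent ones.

    Elements the consumer does not know about are ignored, so a provider
    can add or withhold fields without breaking this reader.
    """
    root = ET.fromstring(xml_text)
    return {
        name: root.findtext(name)
        for name in wanted
        if root.find(name) is not None
    }


if __name__ == "__main__":
    record = {
        "patient_id": "P-001",
        "diagnosis": "glioma",
        "date_of_birth": "1960-01-01",
    }
    for role in ("clinician", "researcher"):
        xml_text = provider_response(record, role)
        print(role, consumer_read(xml_text, ["diagnosis", "stage"]))
```

In this sketch the decision about what leaves the institution sits entirely in the provider's function, while the consumer depends only on the element names it asks for rather than on the full structure of the provider's data.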
Whilst developed for BioGrid, the architecture presented will be of interest to any research effort dealing with distributed heterogeneous data.
About the speakers
Jason Lohrey is the CTO of Arcitecta and the conceiver and architect of Mediaflux™. Jason has a degree in Physics and Computer Science augmented with Fine Arts from the University of Tasmania and has worked in the IT industry for approximately 16 years. His background includes industrial, commercial, scientific, and creative applications for computing. For most of his working career, Jason has focused on research, design, and development of digital asset management and database systems with companies including Kodak (with the Academy Award-winning non-linear editing and compositing system, Cineon), Discreet Logic, and Silicon Graphics. Five years ago, while in residence at painter Arthur Boyd’s property at Bundanon, New South Wales, he penned the first lines of software for Mediaflux™.
Steve Melnikoff is the deputy director of VeRSI and a senior lecturer at the University of Melbourne. He collaborates with Assoc. Prof. Gary Egan and Dr Neil Killeen at Melbourne University’s Centre for NeuroScience (in conjunction with the Florey Neuroscience Institutes) in developing informatics tools. His past work includes developing Grid computing infrastructure, Struts-based interfaces for high-energy physics data analysis, and one of the earliest GridSphere portal applications for astrophysics simulations. Steve and his colleagues are currently establishing a ‘Research Data Capabilities Group’ at Melbourne University to support Australian scientists with leading eResearch technologies.