View the presentation slides (1.1 MB)
Abstract
Current protein-protein interaction (PPI) data is distributed across a wide range of disparate, large-scale, publicly available databases and repositories. Semantic Web technologies such as RDF, OWL ontologies and the SPARQL query language appear to provide solutions to the data integration challenge. However, existing RDF triple stores suffer from limited scalability and poor querying performance. In this paper we present a novel approach that combines Google’s MapReduce distributed processing architecture with Semantic Web technologies to enable high-speed querying and reasoning across large-scale PPI datasets. We describe the system architecture, its implementation, and the results of performance evaluations based on queries, specified by molecular biologists, across integrated PPI data.
About the speaker
Andrew Newman currently works for the University of Queensland’s eResearch centre and recently completed his Honours. He has previously worked on Kowari and continues to actively support JRDF, the RDF API for Java. His current interests include scale-out systems, the Semantic Web, defeasible logic, agile databases, ontology development, and software development methodologies.