View the presentation slides (1.8 MB)
Abstract
Bioinformaticians have a large range of tools and services at their disposal, each of which may be useful alone but is more powerful when integrated with others to perform a more complex task. Routine workflows are a critical aspect of modern science, but the setup they require may act as a barrier to free exploration. Web mashups provide one solution to this problem, rapidly combining data and services from multiple sources into a single facility through a web-based application.
Mashups are growing in popularity in the life sciences but, like their mainstream equivalents, they are generally written from scratch and independently hosted. Mashups are small, focused applications with limited scope, and their appeal depends on combining the services critical to solving a given problem. Typically, these types of applications are implemented as desktop programs, frequently used only by the program author. This specificity of bioinformatic applications and the absence of suitable component libraries may act as barriers to mashup sharing and broader adoption. Moreover, many active biologists lack the programming skills necessary to integrate novel mashups, and greater support is needed.
Recently, a number of sophisticated frameworks for developing and hosting mashups have emerged, each markedly simplifying the mashup creation and sharing process. We believe that these frameworks may in time address some of these concerns, making novel mashups accessible to less technical users.
To create a mashup, the author creates a wrapper around each of the separate tools and services to be composed. The wrappers have interfaces known to the author, so the wrapped tools and services can interact with each other. Frameworks promote a standard means of building these wrappers, supporting a common interface between different service components. Over time, adoption of suitable frameworks within the scientific community is likely to lead to a community-based component library supporting complex bioinformatic exploration. However, this adoption depends on a broader awareness of the capability of the mashup approach, and perhaps on a number of groups seeding the environment with components of sufficient utility.
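The wrapper idea can be illustrated with a minimal Python sketch. It is not the Popfly block format or any framework's actual API: the `wrap` and `compose` helpers, the two stand-in tools, and the lookup data are all hypothetical, chosen only to show how a shared interface (here, a dictionary in and a dictionary out) lets otherwise unrelated tools be chained.

```python
from typing import Any, Callable, Dict

# Assumed common interface: every component takes a dict of named
# values and returns a dict of named values.
Component = Callable[[Dict[str, Any]], Dict[str, Any]]


def wrap(tool: Callable[[Any], Any], input_key: str, output_key: str) -> Component:
    """Wrap a single-argument tool so that it speaks the common interface."""
    def component(data: Dict[str, Any]) -> Dict[str, Any]:
        result = dict(data)  # copy, so earlier values stay available downstream
        result[output_key] = tool(data[input_key])
        return result
    return component


def compose(*components: Component) -> Component:
    """Chain wrapped components into one mashup, applied left to right."""
    def mashup(data: Dict[str, Any]) -> Dict[str, Any]:
        for component in components:
            data = component(data)
        return data
    return mashup


# Two hypothetical stand-in tools with unrelated native signatures.
def lookup_gene_name(accession: str) -> str:
    table = {"ACC001": "geneA"}  # made-up data, for illustration only
    return table.get(accession, "unknown")


def build_search_phrase(gene_name: str) -> str:
    return f"{gene_name} AND review"


annotate = compose(
    wrap(lookup_gene_name, "accession", "gene"),
    wrap(build_search_phrase, "gene", "query"),
)

print(annotate({"accession": "ACC001"}))
```

Because each wrapper only reads and writes named keys, a component written by one group could in principle be dropped into another group's mashup without either side knowing the other's internals, which is the property a shared component library would rely on.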
In this presentation, we consider these issues in some detail and offer an example of the emerging mashup capability using the Popfly environment from Microsoft. Three demonstration mashups are provided. The first simply maps UniProt protein accession IDs to their GenBank equivalents. The second retrieves articles from the PubMed database that reference a GenBank accession ID given by the user. This mashup is immediately useful to a researcher who has a specific gene of interest and wants to find all known information about it, and it can also serve as a component of a more elaborate annotation mashup. The final demo is based on a fairly elaborate undergraduate teaching exercise from QUT, in which the student is given a protein-coding nucleotide sequence and is asked first to determine the gene family and then to perform an NCBI Entrez search to obtain more information about the gene.
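As a rough illustration of what the PubMed retrieval step involves, the Python sketch below builds a query URL for the NCBI E-utilities `esearch` service. It is not the Popfly implementation presented in the talk. The function name `pubmed_search_url` and the accession `AB000001` are placeholders, and searching on the accession as a plain free-text term is an assumption; a production component might restrict the search to a specific field or follow database links instead.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(accession: str, max_results: int = 20) -> str:
    """Build (but do not send) a PubMed search request for an accession ID."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": accession,      # assumption: accession used as a free-text term
        "retmax": max_results,  # upper limit on the number of IDs returned
        "retmode": "json",      # ask for a JSON response
    }
    return ESEARCH_URL + "?" + urlencode(params)


# "AB000001" is a placeholder accession used only to show the URL shape.
print(pubmed_search_url("AB000001"))
```

The sketch stops at constructing the URL so that it runs without network access; fetching the URL and parsing the returned identifiers would be the job of the wrapped component.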
We conclude by exploring strategies for enhancing uptake and sharing across the community.
About the speaker
James M. Hogan is an Associate Professor at QUT and a project leader within the Microsoft QUT eResearch Centre, where he heads the Smart Tools for Bioinformatics initiative. Most of his research is focused on problems related to gene regulation, and on novel machine learning methods and software tools that allow the discovery and management of promoter and transcription factor binding sites and help explain the relationships between them. Among other achievements, his group has produced state-of-the-art methods for promoter prediction in bacteria, the SilverGene genome browser, the BioPatML pattern description language and tools, and more recently a series of mashup components supporting bioinformatic exploration. See www.mquter.qut.edu.au for details, demos and downloads.