Kerry Taylor: Data Ingestion for Water e-Research


Extended abstract PDF

Authors

Kerry Taylor, Yanfeng Shu, and Li LI (CSIRO ICT Centre)

Abstract

The Bureau of Meteorology is required to collect, hold, manage, interpret and disseminate Australia's water information as specified under the Commonwealth Water Act 2007. This information is necessary for policy development, planning and enforcement to manage our scarce water resources, and will provide an excellent platform for water e-research. The Act allows the Bureau to collect data from over 240 independent organisations across Australia as listed in the Water Regulations 2008. While CSIRO and the Bureau are developing a uniform water data transfer format (WDTF), there is an urgent need for robust and evolvable data ingestion tools to deal with the diversity of data models, differing vocabularies and information systems currently in use by these agencies.

We propose an approach that makes extensive use of rich semantic representations, embedded in a reusable integration tool for data-intensive e-research applications. First, we develop a SKOS-based thesaurus of terms collected from expert documents on hydrology, water resources, and geography. Then we develop an OWL 2.0 conceptual model to bridge the semantic gaps between data from organisations, WDTF and the Regulations. Next, we use a suite of ontology-enabled matching and mapping techniques to derive executable mappings between the original data files and the ontology model. Finally, the mappings are embedded in a general-purpose composition tool (CSIRO’s Semantic Service Architecture), which enables automatic production of valid WDTF documents from the original files.
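The idea of thesaurus-driven mapping can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: a plain dictionary stands in for the SKOS thesaurus, simple label lookup stands in for the ontology-enabled matching techniques, and the XML element names (`observations`, `observation`, `value`) are hypothetical and do not follow the WDTF schema. The terms, column headers, and data values are likewise invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical SKOS-style thesaurus: preferred label -> alternative labels.
# The terms are illustrative and not taken from the authors' thesaurus.
THESAURUS = {
    "streamflow": ["discharge", "flow rate", "river flow"],
    "water level": ["stage", "gauge height"],
}


def build_label_index(thesaurus):
    """Map every label (preferred or alternative), lower-cased, to its preferred label."""
    index = {}
    for pref, alts in thesaurus.items():
        index[pref.lower()] = pref
        for alt in alts:
            index[alt.lower()] = pref
    return index


def map_columns(headers, index):
    """Return {source header: preferred label} for headers the thesaurus recognises."""
    mapping = {}
    for header in headers:
        key = header.strip().lower()
        if key in index:
            mapping[header] = index[key]
    return mapping


def to_xml(rows, mapping):
    """Serialise mapped values to generic XML (element names are NOT the WDTF schema)."""
    root = ET.Element("observations")
    for row in rows:
        obs = ET.SubElement(root, "observation")
        for header, concept in mapping.items():
            value = ET.SubElement(obs, "value", {"concept": concept})
            value.text = str(row[header])
    return ET.tostring(root, encoding="unicode")


# Usage: an agency file uses its own column names; unmapped columns are skipped.
index = build_label_index(THESAURUS)
mapping = map_columns(["Discharge", "Gauge Height", "Comment"], index)
xml_text = to_xml(
    [{"Discharge": 12.5, "Gauge Height": 1.8, "Comment": "ok"}], mapping
)
```

Keeping the mapping as data separate from the serialisation step mirrors the declarative style described above: when an agency changes its vocabulary, only the thesaurus entries need to change, and the same ingestion code can be rerun.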

We propose that this technology, which makes extensive use of domain knowledge in the form of declarative semantics, is widely applicable to large data ingestion problems in e-research, where large numbers of files must be ingested in a repeatable and transparent way.

About the speaker

Kerry Taylor works in the Information Engineering Laboratory of the CSIRO ICT Centre. Her research has focused on developing innovative information technologies to underpin and apply scientific research in natural resource management, including water, biodiversity, agriculture, marine, and biofuels, as well as human population health. In 2009 Kerry is co-chairing the Australasian Ontology Workshop at the Australasian AI conference in Melbourne and the Semantic Sensor Networks Workshop at the International Semantic Web Conference in Washington DC. She also co-chairs the W3C incubator group that is developing standards for semantic sensor networks (SSN-XG). Kerry holds an Honours degree in computer science from UNSW and a PhD in computer science from ANU.