Simon Cox: An information model and transfer encoding for observations and sampling


Abstract

Observations are at the core of natural sciences. Their results provide the evidence that forms the basis for the description of features, from which interpretations and inferences are made, and underlie the development of models that describe the processes that govern the natural universe. Management and exchange of observation data is thus central to eResearch.

Many investigations of the natural environment and natural resources require access to observational data originally collected and maintained for disparate purposes. For example, water resources investigations will commonly use observations from hydrology, meteorology and climate science, geology, geophysics, ecology etc. However, the different disciplines use differing practices and assumptions in data collection, and typically encode their data using different technologies or file formats. Even if data serialization is nominally standardized (e.g. Excel spreadsheet, Access database, SQL dump), column order, labels or field names are unlikely to be shared.

We have developed a generic information model for observations and sampling. The aim is to define a number of terms used for measurements, and the relationships between them. These include observation, measurement, result, procedure, feature of interest, observed property, property type, coverage and related terms. The model has been tested against applications in a variety of earth and environmental sciences, and appears to be capable of capturing all the necessary information, when combined with domain-based feature-types and specialized observation procedures where necessary.
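As a rough illustration of how the terms listed above relate to one another, the following Python sketch models an observation as an act in which a procedure yields a result for an observed property of a feature of interest. It is a minimal sketch only: the class and field names, the `describe` helper and the example values ("Bore 42", "thermistor probe") are assumptions made for illustration, not the UML or XML Schema of the published model.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Feature:
    """A domain feature, e.g. a water bore or an ore body (illustrative)."""
    name: str
    feature_type: str  # hypothetical domain feature-type label


@dataclass
class Observation:
    """An act in which a procedure yields a result for a property of a feature."""
    feature_of_interest: Feature
    observed_property: str  # the property type being estimated
    procedure: str          # instrument, sensor or method used
    result: Any             # a number, a category, a coverage, ...


@dataclass
class Measurement(Observation):
    """An observation whose result is a numeric quantity with a unit."""
    unit: str = ""


def describe(obs: Observation) -> str:
    """Hypothetical helper: one-line summary of an observation."""
    return f"{obs.observed_property} of {obs.feature_of_interest.name} = {obs.result}"


# Example values are invented for illustration.
bore = Feature(name="Bore 42", feature_type="WaterWell")
reading = Measurement(
    feature_of_interest=bore,
    observed_property="water temperature",
    procedure="thermistor probe",
    result=18.5,
    unit="Cel",
)
print(describe(reading))
```

Keeping the feature, the property and the procedure as separate fields is what would let observations from different disciplines be compared without relying on shared column names.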

In many practical cases, observations are not made on the feature of ultimate interest to an investigation, because:

a. The feature is inaccessible (e.g. concealed, or too large for exhaustive observation)
  •  This introduces the concept of sampling, whereby observations are made on a subset of the complete feature, with the intention that the sample represents the whole
b. The properties are not directly observable (e.g. the feature is remote, or for other reasons does not provide a direct physical signal)
  •  However, there are sensible properties that may be combined and/or further processed to obtain an estimate of the property of interest

Both of these challenges may be overcome by using a proximate "sampling feature" for initial observations. The sampling feature is accessible and has properties that are sensible. Similar kinds of sampling features have been used to investigate spatial features across various application domains. They may be organized on the basis of the dimensionality of their shape: a SamplingPoint samples its target at a point (0-manifold); a SamplingCurve along a curve (1-manifold); a SamplingSurface on a surface (2-manifold); and a SamplingSolid in an enclosed region (3-manifold). Other common patterns include the description of a specimen, processing chains, observed properties and sub-sampling.
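The classification of sampling features by dimensionality can be sketched as a simple lookup from manifold dimension to the class names given above. This is a hedged illustration: only the four class names come from the text; the `SamplingFeature` dataclass, its fields, the `kind` property and the borehole example are assumptions introduced here.

```python
from dataclasses import dataclass

# Dimension of the sampling geometry mapped to the class name used in the text.
SAMPLING_CLASS = {
    0: "SamplingPoint",    # samples its target at a point
    1: "SamplingCurve",    # samples along a curve
    2: "SamplingSurface",  # samples on a surface
    3: "SamplingSolid",    # samples an enclosed region
}


@dataclass
class SamplingFeature:
    """A proximate, accessible feature standing in for the feature of interest."""
    name: str
    dimension: int        # manifold dimension of the sampling geometry, 0 to 3
    sampled_feature: str  # the feature of ultimate interest (hypothetical field)

    @property
    def kind(self) -> str:
        """Class name implied by the dimensionality of the shape."""
        return SAMPLING_CLASS[self.dimension]


# Illustrative example: a borehole treated as a curve through an aquifer.
borehole = SamplingFeature(name="BH-1", dimension=1, sampled_feature="aquifer")
print(borehole.kind)
```

Observations would then be attached to the sampling feature, while `sampled_feature` records which feature the results are ultimately intended to characterize.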

By exploiting these common patterns, many components of information models become reusable and repeatable, making seamless integration of disparate data sets much easier. This will also encourage the development of standard interfaces for observational data, and of common processing and visualization systems.

The model has been formalized in UML and XML Schema, and published as an Open Geospatial Consortium standard. It leverages standard components for features and geometry from other OGC standards.

About the speaker

Simon Cox obtained a PhD in Geophysics from Columbia University, New York in 1987. He joined CSIRO Geomechanics to continue work on experimental rock mechanics, spent a few years lecturing at Monash University, then moved to the west to join CSIRO Exploration & Mining, initially working with the AGCRC. He now specializes in information modeling and transfer formats, and is editor or co-editor of a number of international standards issued through the Open Geospatial Consortium and ISO. In 2006 OGC awarded Simon the Kenneth D. Gardels Award for sustained contributions and leadership.