Abstract
Observations are at the core of the natural sciences. Their results provide the evidence that forms the basis for the description of features, from which interpretations and inferences are made, and underlie the development of models that describe the processes that govern the natural universe. Management and exchange of observation data is thus central to eResearch.
Many investigations of the natural environment and natural resources require access to observational data originally collected and maintained for disparate purposes. For example, water resources investigations will commonly utilize observations from hydrology, meteorology and climate science, geology, geophysics, ecology, etc. However, the different disciplines use differing practices and assumptions in data collection, and typically encode their data using different technologies or file formats. Even if data serialization is nominally standardized (e.g. Excel spreadsheets, Access databases, SQL dumps), column order, labels, and field names are unlikely to be shared.
We have developed a generic information model for observations and sampling. The aim is to define a number of terms used for measurements, and the relationships between them. These include observation, measurement, result, procedure, feature of interest, observed property, property type, coverage and related terms. The model has been tested against applications in a variety of earth and environmental sciences, and appears to be capable of capturing all the necessary information, when combined with domain-based feature-types and specialized observation procedures where necessary.
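As a rough illustration of how these terms relate, the following minimal Python sketch treats an observation as a record that ties a result to a feature of interest, an observed property, and a procedure. The class name, field names, and example values are assumptions made for illustration only; they are not the formal schema of the model.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Observation:
    """Illustrative record only; field names are hypothetical."""
    feature_of_interest: str  # the feature that the observation describes
    observed_property: str    # the property type whose value is estimated
    procedure: str            # the instrument, sensor, or method applied
    result: Any               # the value produced by the procedure


# Hypothetical example: a water-level reading from a monitoring bore.
obs = Observation(
    feature_of_interest="bore_17",
    observed_property="water_level_m",
    procedure="pressure_transducer",
    result=12.4,
)
```

Because every discipline's data can be expressed through the same four roles, a consumer can filter or join observations without knowing the column layout of the original file.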
In many practical cases, observations are not made on the feature of ultimate interest to an investigation, because:
a. The feature is inaccessible (e.g. concealed, or too large for exhaustive observation)
- This introduces the concept of sampling, whereby observations are made on a subset of the complete feature, with the intention that the sample represents the whole
b. The property of interest is not directly observable
- However, there are sensible properties that may be combined and/or further processed to obtain an estimate of the property of interest
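The sampling pattern can be sketched in the same hedged way: observations are attached to a sampling feature, which in turn points to the feature of ultimate interest. In the Python sketch below, all names are hypothetical, and the arithmetic mean is only a placeholder for whatever estimator a given discipline would actually apply.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SamplingFeature:
    """A sample (e.g. a bore or specimen) standing in for a larger feature."""
    identifier: str
    sampled_feature: str  # the feature of ultimate interest


@dataclass(frozen=True)
class SampleObservation:
    feature_of_interest: SamplingFeature
    observed_property: str
    result: float


def estimate_for_sampled_feature(observations, sampled_feature, observed_property):
    """Estimate a property of the whole feature from observations on its samples.

    Assumption: a plain mean is used here purely for illustration.
    """
    values = [
        o.result
        for o in observations
        if o.feature_of_interest.sampled_feature == sampled_feature
        and o.observed_property == observed_property
    ]
    if not values:
        raise ValueError("no matching observations")
    return mean(values)


# Hypothetical example: two bores sampling the same aquifer.
bore_a = SamplingFeature("bore_a", sampled_feature="aquifer_1")
bore_b = SamplingFeature("bore_b", sampled_feature="aquifer_1")
observations = [
    SampleObservation(bore_a, "water_level_m", 10.0),
    SampleObservation(bore_b, "water_level_m", 14.0),
]
estimate = estimate_for_sampled_feature(observations, "aquifer_1", "water_level_m")
```

Keeping the link from sample to sampled feature explicit means the estimate can always be traced back to the observations and samples that produced it.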
By exploiting these common patterns, many components of information models become reusable and repeatable, making seamless integration of disparate data sets much easier. This will also encourage the development of standard interfaces for observational data, and of common processing and visualization systems.
The model has been formalized in UML and XML Schema, and published as an Open Geospatial Consortium standard. It leverages standard components for features and geometry from other OGC standards.