MARS Data Integration and Support: ECMWF’s data archive as an OGC Service

The Meteorological Archival and Retrieval System (MARS) is the main repository of meteorological data at ECMWF. MARS hosts operational and research data, as well as data from special projects. The archive holds in excess of 200 petabytes of data, mainly in the GRIB format for meteorological fields and the BUFR format for meteorological observations. Most of the data produced daily at ECMWF is archived in MARS and is therefore available to users via its services.
The MARS archive integration aimed to demonstrate that the technologies provided by Earthserver 2 can bring content stored in the MARS archive to its users via the OGC WCS and WCPS standards. Because of their size, it is practically infeasible to ingest (i.e. copy) these datasets from the archive into a WCS system, so a different approach was needed: the solution had to centre on extracting only the subset of data required by a given WCS/WCPS operation.
Three partners, ECMWF, CITE and Rasdaman, collaborated closely to deliver an innovative solution that tackles the challenges of the integration and opens up the possibility of exposing this large dataset to its audience via well-known OGC standards.

Technical Overview

The overall architecture of the solution is depicted in the following diagram.

Key elements

The key elements of the integration are:

  • MARS archive: the ECMWF system that manages access to the meteorological data and deals with all aspects of data handling, such as storage, management and access.
  • rasdaman: the array database engine that supports all relevant OGC standards.
  • FeMME: the metadata management engine that allows retrieval of coverages according to their metadata descriptors.
  • xWCPS engine: the xWCPS language parser and executor that acts on top of FeMME and rasdaman.

Stages

The stages and concepts behind the integration and operation of the solution are as follows:

Registration: This is the first stage of the process, in which MARS “datasets” are published into the FeMME/rasdaman system. This so-called “coverage registration” may be performed by system analysts, using tools developed specifically to automate the registration of multiple coverages.

Request: The servicing stage starts when a user submits a WCS/WCPS query to the system. The request is identical to a WCS request, but it is sent to a different endpoint that handles the MARS/rasdaman engines. Two kinds of request endpoint are supported: synchronous and asynchronous.
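
As an illustration of what such a request might look like, the following minimal Python sketch builds a WCS 2.0 GetCoverage URL in key-value encoding. The endpoint URLs, the coverage identifier and the axis labels (Lat, Long) are hypothetical placeholders, not the actual names used by the service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoints; the real service URLs are not given in this text.
SYNC_ENDPOINT = "https://example.org/mars/wcs"
ASYNC_ENDPOINT = "https://example.org/mars/wcs/async"


def build_getcoverage_url(endpoint, coverage_id, subsets):
    """Build a WCS 2.0 KVP GetCoverage URL.

    `subsets` maps an axis label to a (low, high) pair; each pair becomes
    one `subset=Axis(low,high)` parameter.
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    for axis, (low, high) in subsets.items():
        params.append(("subset", f"{axis}({low},{high})"))
    return f"{endpoint}?{urlencode(params)}"


# Same query, two endpoints: only the base URL differs.
subsets = {"Lat": (40, 50), "Long": (0, 10)}
sync_url = build_getcoverage_url(SYNC_ENDPOINT, "temperature_2m", subsets)
async_url = build_getcoverage_url(ASYNC_ENDPOINT, "temperature_2m", subsets)

# Decode the query string again to inspect the subset parameters.
print(parse_qs(urlsplit(sync_url).query)["subset"])
```

The point of the sketch is that the query itself does not change between the synchronous and asynchronous modes; the caller only chooses which endpoint to address.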

Process: At this stage, the service endpoint parses the WCS/WCPS request in full to extract its “subsetting” expressions, so that it can form sufficiently restricted queries to issue to the MARS endpoint. When it receives the MARS results (which arrive asynchronously), it ingests the retrieved GRIB file into rasdaman. Once this second operation has completed, a new query is issued to rasdaman to produce the results to be returned to the caller. For an asynchronous request, the response is cached locally so that it can be returned to the initial caller on demand, and a response-polling endpoint is returned instead of the data itself. Otherwise, the response directly follows the processing stage and contains the result payload.
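
The translation from subsetting expressions to a restricted MARS query can be sketched as follows. This is not the project's implementation: the axis labels (Lat, Long, ansi) and the base request values are assumptions made for illustration. The sketch relies only on the standard MARS keywords `area` (North/West/South/East) and `date`, and it does not handle boxes that cross the antimeridian.

```python
import re

# Matches 'Axis(low,high)' or 'Axis(point)'.
SUBSET_RE = re.compile(
    r"^\s*(\w+)\(\s*([^,()]+?)\s*(?:,\s*([^,()]+?)\s*)?\)\s*$"
)


def parse_subset(expr):
    """Parse a WCS subset expression into (axis, low, high)."""
    match = SUBSET_RE.match(expr)
    if match is None:
        raise ValueError(f"not a subset expression: {expr!r}")
    axis, low, high = match.groups()
    if high is None:  # a slice at a single point
        high = low
    return axis, low.strip('"'), high.strip('"')


def subsets_to_mars_request(subsets, base_request):
    """Restrict a MARS request dict using WCS subset expressions.

    Axis labels 'Lat', 'Long' and 'ansi' are assumed for this sketch.
    """
    request = dict(base_request)
    bounds = {}
    for expr in subsets:
        axis, low, high = parse_subset(expr)
        bounds[axis] = (low, high)

    if "Lat" in bounds and "Long" in bounds:
        south, north = sorted(float(v) for v in bounds["Lat"])
        west, east = sorted(float(v) for v in bounds["Long"])
        # MARS expects the bounding box as North/West/South/East.
        request["area"] = f"{north:g}/{west:g}/{south:g}/{east:g}"

    if "ansi" in bounds:
        start, end = bounds["ansi"]
        request["date"] = start if start == end else f"{start}/to/{end}"

    return request


# Hypothetical base request describing the registered coverage.
base = {"class": "od", "type": "an", "param": "2t"}
mars_request = subsets_to_mars_request(
    ['Lat(40,50)', 'Long(0,10)', 'ansi("2017-01-01","2017-01-03")'],
    base,
)
print(mars_request)
```

Only the restricted request is sent to MARS, so the GRIB file that is later ingested into rasdaman covers just the region and time range the caller asked for.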

Response: Unless the result was delivered via a synchronous operation, there is a discrete response stage, in which the caller uses the system services to retrieve the query results / data of a prior operation.
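
On the caller's side, the asynchronous case amounts to polling until the cached result is ready. The sketch below shows one possible client loop. The polling protocol itself is not described here, so the `fetch` callable, which returns a `(done, payload)` pair, is a hypothetical stand-in for a request to the response-polling endpoint.

```python
import time


def poll_for_result(fetch, interval=1.0, max_attempts=30, sleep=time.sleep):
    """Call `fetch()` until it reports completion, then return the payload.

    `fetch` is a hypothetical callable returning (done, payload); in a real
    client it would query the response-polling endpoint.
    """
    for _ in range(max_attempts):
        done, payload = fetch()
        if done:
            return payload
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"no result after {max_attempts} attempts")


def make_fake_fetch(ready_after, payload):
    """Simulate a result that becomes available after a few polls."""
    calls = {"count": 0}

    def fetch():
        calls["count"] += 1
        if calls["count"] > ready_after:
            return True, payload
        return False, None

    return fetch


# Simulated run: the result is ready on the third poll; sleeping is skipped.
result = poll_for_result(
    make_fake_fetch(ready_after=2, payload=b"GRIB-derived result"),
    sleep=lambda seconds: None,
)
print(result)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting; a production client would also need to handle failed or expired requests.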