What to Find Here

EarthServer is fully committed to standards, most importantly the datacube standards of OGC, ISO, and European INSPIRE, which center around coverages. This tutorial offers a one-stop shop of information about datacube standards: a primer, interactive demos, slides, standards, publications...you name it!

Actionable datacubes provide a powerful, yet simplifying paradigm for gaining insight from massive spatio-temporal data. The concept, which generalizes seamless maps from 2-D to n-D, is based on preprocessing incoming data so as to integrate all data from one sensor into one logical array, say 3-D x/y/t for image timeseries or 4-D x/y/z/t for weather forecasts. This enables spatial analysis (both horizontally and vertically) and multi-temporal analysis simultaneously. Adequate service interfaces enable “shipping code to the data” to avoid excessive data transport. In standardization, datacubes belong to the category of coverages as established by ISO and OGC. Datacube services based on these standards have been in operational use for many years now, serving multi-Petabyte assets. Below is a kaleidoscope of coverage-enabled services, with their individual portals and clients supported.

However, despite these accepted advantages many questions reach us on datacubes - not surprisingly, as the concept is new (and not yet taught comprehensively at universities). This EarthServer tutorial provides answers and links to get you from zero (or above) to hero!

Coverages, explained:

Still have questions?



Like it? Cite it! If you find this tutorial useful, please cite it! The material is based on (and extended from):
P. Baumann, D. Misev, V. Merticariu, B. Pham Huu: Datacubes: Towards Space/Time Analysis-Ready Data. In: J. Doellner, M. Jobst, P. Schmitz (eds.): Service Oriented Mapping - Changing Paradigm in Map Production and Geoinformation Management, Springer Lecture Notes in Geoinformation and Cartography, 2018

In a Nutshell:
What is a Coverage, and what can I do with it?

Coverage? Huh?

Coverages are the worldwide accepted concept for representing raster data and datacubes, such as: 1-D sensor timeseries, 2-D images and DEMs, 3-D x/y/t image timeseries and x/y/z geophysics data, 4-D x/y/z/t atmospheric data, etc. Generally, coverages represent regular and irregular grids, point clouds and general meshes, although today raster data and datacubes prevail.

The Coverage Data Model

A coverage consists of

  • a domain set ("where are the data located?"),
  • a range set (the pixel payload),
  • a range type ("what is the semantics of these data?"), and
  • optionally arbitrary metadata ("what else should I know about these data?").

OGC, ISO, and EU INSPIRE agree on a common interoperable Coverage Implementation Schema which comes with broad format encoding support, such as XML, JSON, RDF, GeoTIFF, NetCDF, etc. [more on coverage data]

The Coverage Service Model

The OGC Web Coverage Service (WCS) standards suite establishes a structured, modular collection of increasingly powerful services. WCS Core defines basic functionality: coverage subsetting and format encoding. Any implementation claiming to support WCS must support this Core functionality (which is not hard - it is deliberately kept simple) and may additionally support any of the WCS extensions defined, such as the Earth datacube analytics and fusion language, Web Coverage Processing Service (WCPS). [more on coverage services]

The OGC/ISO Coverage Standards Ecosystem

The authoritative source for coverage data is OGC CIS, which is also an ISO standard (ISO 19123-2:2018). OGC WCS, together with its datacube analytics language WCPS, is the accepted way to offer flexible, easy-to-use services on spatio-temporal coverages, in particular: datacubes.

OGC provides a rigorous conformance test suite validating implementations down to pixel level. Any implementation can test itself by downloading and executing this free and open test suite.

Intro video:

Curious for More?

...read on below, get "from zero to hero"!


Coverage Data:
OGC/ISO Coverage Implementation Schema (CIS)

This section answers the key question: How can a coverage represent a datacube?

Abstractly, a coverage is a function mapping location (i.e., coordinates) to data values - in plain words: a coverage offers some value (such as a color pixel) for each of the coordinates it covers. These coordinates are called direct positions - only at those direct positions does the coverage store a value; via interpolation, values can also be generated for coordinates between direct positions - more about that later. To represent this, the coverage data structure technically consists of four main components (plus some details which we ignore at this level of abstraction):

  • domain set: where can I find values?
  • range set: the values.
  • range type: what do the values mean?
  • metadata: what else should I know about these data?

The UML diagram illustrates this structure; it shows the four components, of which metadata are optional, as well as the inheritance from the basic geo object type, Feature. Quite simple, isn't it?

Let us inspect these components in turn.

Domain set. The domain set consists of the direct positions where values are located. As we focus on raster data / datacubes we only consider grid coverages, where the domain set forms a (regular or irregular) grid. Such a grid, as well as its grid coordinates, can be of any number of dimensions (better said: axes), made up from spatial, temporal, and other axes such as spectral frequencies, for example. The underlying grid space of a domain set is defined by its corresponding Coordinate Reference System (CRS), more about that later.

In gridded coverages, which is what we focus on with raster data and datacubes, coordinates are obviously aligned on some grid. Still, there is an amazing variety among possible grid types: Cartesian or geo-referenced, space or time or something else, regular or irregular, etc. With growing grid complexity the description of the grid, as part of the domain set, grows as well, and so does the size of the corresponding domain set. The simplest case is a regular Cartesian or geo-referenced axis; here we simply need to store lower and upper bound as well as the resolution. An irregular axis is more involved: all the individual grid points on the axis between its lower and upper bound need to be stored explicitly. More complex grids require even more involved representations. Below is an example of a grid, in XML representation, that involves both regular axes (Lat and Long) and an irregular one (time). Note the definition of Lat and Long as regular axes with a given resolution, as opposed to the explicit enumeration of the time steps in the date axis. The underlying (Cartesian) grid, modelling the array data structure, is given by the GridLimits - but that is just a technical detail.

<DomainSet>
    <GeneralGrid srsName="http://www.opengis.net/def/crs-compound?1=http://www.opengis.net/def/crs/EPSG/0/4326&amp;2=http://www.opengis.net/def/crs/OGC/0/AnsiDate"
        axisLabels="Lat Long date" uomLabels="deg deg d">
        <RegularAxis   axisLabel="Lat"  uomLabel="deg" lowerBound="40"  upperBound="60" resolution="10"/>
        <RegularAxis   axisLabel="Long" uomLabel="deg" lowerBound="-10" upperBound="10"  resolution="10"/>
        <IrregularAxis axisLabel="date" uomLabel="d">
            <C>2015-12-01</C>
            <C>2015-12-02</C>
            <C>2016-01-12</C>
            <C>2016-05-01</C>
        </IrregularAxis>
        <GridLimits srsName="http://www.opengis.net/def/crs/OGC/0/Index3D" axisLabels="i j k">
            <IndexAxis axisLabel="i" lowerBound="0" upperBound="2"/>
            <IndexAxis axisLabel="j" lowerBound="0" upperBound="2"/>
            <IndexAxis axisLabel="k" lowerBound="0" upperBound="2"/>
        </GridLimits>
    </GeneralGrid>
</DomainSet>

Range set. The range set holds the actual values ("pixels", "voxels") of the coverage. For storage, these values need to be linearized following one of many possible schemes, but this is an implementation detail of the particular format chosen and does not affect the fact that coordinates are determined by the coverage axes. There is also the question of how the direct positions of the domain set are connected to their respective values. Actually, there are several ways of achieving this:

  • The domain set, together with a sequentialization rule which we ignore here, indicates a sequence of direct positions; the sequence in the range set follows this.
  • Domain and range set are stored interleaved, as a sequence of coordinate/value pairs so that the correspondence is clear.
  • Domain and range set are tiled, that is: partitioned into smaller parts. Inside each such tile any of the above techniques can be used.
  • Sometimes the domain set is not explicitly available, but just some information to derive the underlying grid. A typical case is a sensor model which stores Ground Control Points out of which the sensor model generates the grid coordinates for the range values.

Range type. A coverage’s range type captures the semantics of the range set values; its definition is based on SWE (Sensor Web Enablement) Common so that sensor data can be transformed into coverages without information loss, thereby enabling seamless service chains from upstream data acquisition (e.g., through OGC SOS) to downstream analysis-ready user services (such as OGC WMS, WCS, and WCPS). Notably, the range type can go far beyond just a datatype indicator (such as integer versus float): unit of measure, accuracy, nil values, the semantics (by way of a URL reference), and more can be provided with a range type, thereby accurately describing the meaning of the values. The following is an example range type definition for panchromatic optical data, encoded in XML using SWE Common:

<swe:field name="panchromatic">
    <swe:Quantity definition="http://www.opengis.net/def/property/OGC/0/Radiance">
        <swe:description>panchromatic sensor</swe:description>
        <swe:NilValues>
            <swe:nilValue reason="http://www.opengis.net/def/nil/OGC/0/AboveDetectionRange">255</swe:nilValue>
        </swe:NilValues>
        <swe:uom code="W.m-2.sr-1.nm-1"/>
    </swe:Quantity>
</swe:field>

Metadata. This optional part is left unspecified in the standard: it can contain any number of anything, literally (xs:any, for the XML experts). In contrast to domain set and range type - the mandatory technical metadata of a coverage - these metadata are completely application dependent. Of course, the coverage itself cannot understand them, but they will duly be transported so that the connection between data and metadata is preserved. One example of such metadata is given by the European INSPIRE legal framework for a common Spatial Data Infrastructure: INSPIRE prescribes canonical metadata for each object following a specific schema, and coverage metadata are the right place to keep those. This demonstration showcases use of INSPIRE metadata. Note the "any number": different applications may add their own metadata, and each application in practice would only look at those metadata it understands.

Here is an example where three different, independent metadata records coexist in the coverage; note how they all bring along their own schema:

<GeneralGridCoverage xmlns="http://www.opengis.net/cis/1.1/gml" ...>

    <DomainSet> ...  </DomainSet>
    <RangeSet> ...  </RangeSet>
    <RangeType> ...  </RangeType>
    <Metadata>
        <el-covmd:ElevationGridCoverageMetadata xmlns:el-covmd="http://inspire.ec.europa.eu/schemas/el-covmd/4.0" ...>
            ...
        </el-covmd:ElevationGridCoverageMetadata>
        <card4l:Card4lMetadata xmlns:card4l="..." ...>
            ...
        </card4l:Card4lMetadata>
        <special:MySpecialMetadata xmlns:special="..." ...>
            ...
        </special:MySpecialMetadata>

    </Metadata>

</GeneralGridCoverage> 

Coordinates and Grids

Let's come back to CRSs and the cumbersome srsName attribute we skipped explaining in the above domain set example. Science tells us that a coordinate is meaningless as long as there is no indication of the reference system in which it is expressed. A value of 42 - is that degrees (referring to what datum?), meters, years since some epoch, or million years backwards? All of that information is provided by the CRS. As per OGC Naming Authority decision, CRSs shall be expressed as URLs, and that is what we find in the srsName attribute (which, BTW, expands to spatial reference system name - some legacy terminology stemming from GML). These URLs you can indeed resolve, by simply following http://www.opengis.net/def/crs/EPSG/0/4326 or http://www.opengis.net/def/crs/OGC/0/AnsiDate; a so-called CRS resolver service, running an open-source implementation by Jacobs University, does that trick. In response to complaints about this complexity, CIS 1.1 also allows any commonly accepted notation, including shorthands like EPSG:4326. So far, so good - but we need multi-dimensional CRSs. The EPSG catalog is large, but pre-defining all possible axis combinations is simply infeasible. Therefore, and following ISO 19111-2, CRS and axis composition is provided where the base URL ends with crs-compound, followed by an ordered list of component CRSs and axes. And that is exactly what our funny srsName does - slightly reformatted it becomes clear:

http://www.opengis.net/def/crs-compound?
    1=http://www.opengis.net/def/crs/EPSG/0/4326
  & 2=http://www.opengis.net/def/crs/OGC/0/AnsiDate

This completes the information necessary to understand a domain set. But here is one more convenience: it would slow down services considerably if, for each coverage decoded, the CRS definition first had to be retrieved. Actually, not all of it is needed anyway - most important are the axis names and units of measure. The axisLabels="Lat Long date" and uomLabels="deg deg d" attributes provide this excerpt directly, for the tool's convenience. Additionally, the axis labels define the sequence of axes; this axis sequence ambiguity actually is a recurring problem in GeoJSON and other OGC services, where the sequence is just implicitly assumed.

Format encodings

The coverage structure is represented 1:1 in the XML, JSON, and RDF encodings coming with CIS 1.1. However, while these encodings are "informationally complete" in that they carry all of a coverage's information, they are not always efficient. For 1-D diagrams, formats like JSON are ideal because the file will be small anyway and can be consumed conveniently by some JavaScript or TypeScript client and its diagramming libraries. A satellite scene, on the other hand, nobody would want to store and exchange in one monolithic XML file.

Therefore, binary formats are additionally supported for coverage encoding. A lineup of widely used binary formats is available through CIS extensions, including GeoTIFF, NetCDF, GRIB2, JPEG2000, and several more. Of course, it is well known that these formats have their individual ways of representing "metadata" that could carry domain set, range type, and coverage metadata. Actually, often one or the other component cannot be transported at all - think of JPEG, for example. Well, often this is ok, and if a user explicitly requests such a format they will know what they are doing. Citing the JPEG example again, if the coverage should just be portrayed on a screen it is perfectly acceptable to lose part of the information.

But as always in life, we want it all: informationally complete and, at the same time, efficient. For this purpose a container concept has been introduced with CIS 1.0 and extended with CIS 1.1. A coverage container consists of a "shell" that may use any suitable format that can hold files inside, such as Multipart/MIME, zip, SAFE, etc. Inside there is a canonical header in some information-lossless format, like the abovementioned XML, JSON, or RDF, followed by one or more files which typically would contain some efficient binary encoding of the range set (but could also hold metadata or the domain set, for that matter). The header describes the coverage in general, but for the "outsourced" parts it stores, instead of the data, a reference to the accompanying file(s). By the way, this mechanism can also be used to encode tiled coverages.
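
Schematically (this is not literal CIS syntax, just an illustration of the layering), a container encoded as Multipart/MIME could look like this - a canonical XML header first, followed by the range set as GeoTIFF:

Content-Type: multipart/related; boundary=coverage-parts

--coverage-parts
Content-Type: application/gml+xml

<GeneralGridCoverage>
    <!-- domain set, range type, metadata; the range set holds a
         reference to the GeoTIFF part below instead of inline values -->
</GeneralGridCoverage>

--coverage-parts
Content-Type: image/tiff

(binary GeoTIFF payload carrying the range set)
--coverage-parts--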

In summary, the common, unified structure of a coverage can be encoded in a variety of alternatives, each with individual pros and cons, offering options for all situations arising.

Abstract vs. Concrete Specifications

The coverage standards of OGC and ISO – which are kept identical – divide coverage definition into an abstract and a “concrete” level. On abstract level we find ISO 19123, which is identical to OGC Abstract Topic 6, providing coverage concepts and definitions in a very general way. This definition still allows manifold implementations which are not necessarily interoperable. This fact is underpinned by the observation that such services regularly come with their own clients, which are not interchangeable across different server implementations. Therefore, ISO and OGC provide a concretization of coverages, based on these abstract standards but establishing interoperability; again, specifications ISO 19123-2 and OGC Coverage Implementation Schema (CIS) are identical. This concretization gets manifest in OGC conformance tests which assess any Web Coverage Service (WCS) and the coverages it delivers down to the level of single pixels; hence, CIS/WCS are interoperable and can be used in any combination of clients and servers.

Standardization Status

In OGC, the originally named "GML 3.2.1 Application Schema - Coverages" (GMLCOV) was renamed to "Coverage Implementation Schema" for clarification: it is not (just) a GML encoding, but also a general, encoding-independent data structure. OGC CIS 1.0 was adopted by ISO as 19123-2. Subsequently, CIS 1.1 was adopted by OGC as a backwards compatible extension of CIS 1.0, introducing the more powerful and simpler General Grid Coverage.

Note: OGC/ISO CIS is the accepted global standard in industry, science, and government. Other definitions, such as by W3C or definitions relying solely on the abstract model of ISO 19123, are not authoritative, not interoperability-proven, not rigorously implementation-tested, and not backed by a comparable multitude of implementations. While the W3C model does not come with any suitable service concept, OGC/ISO coverages can be served via OGC WCS/WCPS (which offer the most dedicated coverage functionality), but also via OGC WMS, WPS, and SOS, among others. OAPI-Coverages, too, is based on CIS 1.1.
Further reading:

Coverage Service:
OGC Web Coverage Service (WCS)

This section answers the key question: How can coverage datacubes be served through WCS?

While coverages can be served through a variety of interfaces - including WFS, WMS, and WPS - only the Web Coverage Service (WCS) offers comprehensive, tailored functionality. Similar to the modularly organized CIS, the WCS suite has a mandatory Core around which a series of optionally implementable extensions gather. This structure is deliberate: it keeps the effort and skill level required of WCS implementers as low as possible while, at the same time, allowing advanced implementers to build compliant, yet more elaborate WCS tools.

WCS Core

WCS Core offers very simple, focused coverage access and extraction functionality:

  • subsetting, which is a summary term for two types of data extraction (both can be combined, as we will show soon):
    • trimming: "give me a cutout along an axis, defined by the lower and upper bound provided"
    • slicing: "give me a cutout at the slice point indicated, thereby extracting a result with reduced dimension as compared to the original coverage"
  • format encoding: "deliver the result in my favourite format"

Technically, WCS Core offers three request types:

  • GetCapabilities: what data does this service offer, and what are its capabilities?
  • DescribeCoverage: tell me details about this coverage!
  • GetCoverage: give me that coverage (or part of it), and in my favourite format!
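
The first two are simple one-liners; against the demo service used below they read, for instance, as follows (any WCS 2.0 endpoint works the same way):

https://ows.rasdaman.org/rasdaman/ows ? SERVICE=WCS & VERSION=2.0.1
    & REQUEST=GetCapabilities

https://ows.rasdaman.org/rasdaman/ows ? SERVICE=WCS & VERSION=2.0.1
    & REQUEST=DescribeCoverage
    & COVERAGEID=AvgTemperatureColorScaled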

We focus on GetCoverage as the central workhorse of WCS and explain it by example. The following request, using GET/KVP encoding, performs trimming in space and slicing in time, effectively extracting a 2-D image from a 3-D timeseries datacube, delivered as JPEG (we use this WCS demo service):

https://ows.rasdaman.org/rasdaman/ows ? SERVICE=WCS & VERSION=2.0.1
    & REQUEST=GetCoverage
    & COVERAGEID=AvgTemperatureColorScaled
    & SUBSET=Lat(-30,70) & SUBSET=Long(-140,140) & SUBSET=ansi(%222000-02-01T00:00:00.000Z%22)
    & FORMAT=image/jpeg

We can see the SUBSET parameters, one for each axis (yes, ansi is a strange name for a time axis, agreed). In this particular example we obtain a time slice at a particular point in time and for some particular geographic region from an x/y/t datacube, for display on screen.

Note: Actually, only subsetting axes need to be indicated; along all axes not mentioned the full extent of the coverage will be delivered.

The encoding format in which a coverage is stored in the server is called the coverage's Native Format; without any FORMAT parameter in a request the coverage will be delivered in this Native Format (also in case of subsetting). The CRS in which the coverage’s domain set coordinates are expressed is referred to as its Native CRS. A WCS Core will always interpret subsetting coordinates and deliver coverage results in the Native CRS of the coverage addressed.
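
For example, dropping the FORMAT parameter from the earlier request returns the same spatio-temporal subset, but encoded in whatever Native Format the AvgTemperatureColorScaled coverage is stored in on this server:

https://ows.rasdaman.org/rasdaman/ows ? SERVICE=WCS & VERSION=2.0.1
    & REQUEST=GetCoverage
    & COVERAGEID=AvgTemperatureColorScaled
    & SUBSET=Lat(-30,70) & SUBSET=Long(-140,140) & SUBSET=ansi(%222000-02-01T00:00:00.000Z%22)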

A central promise of WCS Core is that coverage values will be delivered guaranteed unchanged – no rounding, no resampling etc. may take place. Of course, this may change if some lossy format is used (such as JPEG with a quality factor less than 100%). Further, WCS extensions may change this behavior as we will see below.

The WCS Universe

Both CIS and WCS are organised in a modular way:

  • CIS has a core (the basic model) and several extensions, including encodings for XML, JSON, RDF, GeoTIFF, JPEG2000, GMLJP2, NetCDF, GRIB2, and more.
  • WCS has a core and several extensions, one set for functionality and another set for protocol bindings (i.e., the way requests, exceptions, etc. are dealt with).

A synoptic view of the OGC coverage data and service model is presented below. Now I hear you ask: and how do I find out which extensions a particular service supports? The answer: the extensions as well as the data formats, CRSs, and so on supported are listed in the Capabilities document of a service, to be obtained via a GetCapabilities request.

The following diagram shows the architecture of the complete WCS suite with its components and their relationships:

WCS extensions can be classified into functionality extensions and protocol bindings. The latter allow expressing any request of Core and extensions in one of GET/KVP, POST/XML, SOAP, and (still under discussion) OpenAPI.

WCS Functionality Extensions

The functionality-enhancing extensions provide a variety of useful extras over WCS Core; we look at each in turn.

  • WCS Range Subsetting extends the GetCoverage request with an extra, optional parameter to select range components (“bands”, “variables”). With RANGESUBSET a selection of bands can be extracted by band name, allowing bands to be selected and also reordered. For example, the following request extracts a false-color image from some hyperspectral scene:
     http://www.acme.com/wcs ? SERVICE=WCS & VERSION=2.1
        & REQUEST=GetCoverage & COVERAGEID=c001
        & RANGESUBSET=nir,red,green

    Generally, one or more bands can be extracted and brought into any sequence desired. For large numbers of bands, intervals can be specified, such as red:blue, based on the order defined in the range type.

  • WCS Scaling allows changing the resolution of a gridded data set; to this end, optional scaling parameters are added to the GetCoverage request. The interpolation method applied is implementation dependent unless WCS Interpolation is implemented in the server (see below). Obviously, in the presence of scaling it can no longer be guaranteed that range values remain unchanged.
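    For example, the following illustrative request rescales the coverage uniformly along all axes via the SCALEFACTOR parameter defined by this extension (further parameters allow per-axis scaling or scaling to a target size; see the extension for the exact semantics of the factor):
     http://www.acme.com/wcs ? SERVICE=WCS & VERSION=2.1
        & REQUEST=GetCoverage & COVERAGEID=c001
        & SCALEFACTOR=4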
  • WCS CRS supports reprojection of a coverage from its Native CRS into another one. With the same mechanism as before, an optional parameter OUTPUTCRS lets users indicate the target CRS in which the coverage should be delivered. Additionally, the CRS of a subsetting bbox given by one or more SUBSET parameters can be changed from the coverage's Native CRS through the optional request parameter SUBSETTINGCRS. Following OGC convention, CRSs are expressed as URLs, such as in the following example:
     http://www.acme.com/ows?SERVICE=WCS & VERSION=2.1
        & REQUEST=GetCoverage & COVERAGEID=c001
        & OUTPUTCRS=http://www.opengis.net/def/crs/EPSG/0/4326

    The CRSs supported by a server are listed in its Capabilities document which can be retrieved through a GetCapabilities request.

  • WCS Interpolation enables clients to request a particular interpolation method whenever interpolation is applied (currently: scaling or CRS reprojection). The interpolation methods supported by a server are listed in its Capabilities document which can be retrieved through a GetCapabilities request. Such methods might include nearest-neighbor, linear, or quadratic interpolation.
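    For example, an illustrative request asking for linear interpolation while reprojecting; the interpolation method is identified by an OGC URL:
     http://www.acme.com/wcs ? SERVICE=WCS & VERSION=2.1
        & REQUEST=GetCoverage & COVERAGEID=c001
        & OUTPUTCRS=http://www.opengis.net/def/crs/EPSG/0/3857
        & INTERPOLATION=http://www.opengis.net/def/interpolation/OGC/1/linear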
  • WCS Processing performs server-side processing, filtering, analytics, and fusion. To this end, it introduces a new request type, ProcessCoverages. It has a single parameter, query, which is supposed to contain a WCPS query string. See the WCPS section for details.
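    For example, an illustrative ProcessCoverages request carrying the simple "hello world" query from the WCPS section below in its query parameter:
     http://www.acme.com/wcs ? SERVICE=WCS & VERSION=2.1
        & REQUEST=ProcessCoverages
        & QUERY=for $c in (AvgLandTemp) return 42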
  • WCS Transaction (or short: WCS-T) allows modifying a server’s coverage offering. To this end, three new request types are introduced: InsertCoverage creates a new coverage in the server, DeleteCoverage removes one or more coverages, and UpdateCoverage performs a partial replacement within an existing coverage. For example, the following request generates a new coverage in the server by reading the coverage from some other server via the URL indicated:
    http://www.acme.com/wcs
        ? SERVICE=WCS & VERSION=2.1
        & REQUEST=InsertCoverage
        & COVERAGEREF=http://bcme.com/archive/hurricane.nc
  • WCS WQS (Web Query Service) stands out in that it offers XPath-based selection of items from the WCS service and coverage metadata. In the future, it is planned to merge this with WCPS.
Further reading:

WCS Protocol Extensions

Since its inception, WCS has offered several different protocol bindings, i.e., languages between client and server. These do not concern the coverage encodings themselves, but rather the way requests and responses are formatted.

  • GET/KVP uses HTTP GET requests and responses, with functionality (REQUEST=...) and parameters (SUBSET=...) encoded in the query part of the URL. This allows for a convenient single-line command that any browser can send, but is restricted in its length by built-in limits of Web client and server libraries.
  • XML/POST uses HTTP POST requests instead, with commands and parameters transmitted through an XML structure specified in the standards. This is convenient in particular when large data have to be transmitted (such as in a WCS-T request), but cannot be copy-pasted into a browser; see the sketch after this list.
  • SOAP uses the SOAP standard to exchange requests and responses. As with XML/POST, the payload is encoded in XML.
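
As referenced above, here is a sketch of an XML/POST body equivalent to the earlier GetCoverage example (element names per the WCS 2.0 XML encoding; illustrative, details may vary between versions):

<wcs:GetCoverage xmlns:wcs="http://www.opengis.net/wcs/2.0"
    service="WCS" version="2.0.1">
  <wcs:CoverageId>AvgTemperatureColorScaled</wcs:CoverageId>
  <wcs:DimensionTrim>
    <wcs:Dimension>Lat</wcs:Dimension>
    <wcs:TrimLow>-30</wcs:TrimLow>
    <wcs:TrimHigh>70</wcs:TrimHigh>
  </wcs:DimensionTrim>
  <wcs:DimensionTrim>
    <wcs:Dimension>Long</wcs:Dimension>
    <wcs:TrimLow>-140</wcs:TrimLow>
    <wcs:TrimHigh>140</wcs:TrimHigh>
  </wcs:DimensionTrim>
  <wcs:DimensionSlice>
    <wcs:Dimension>ansi</wcs:Dimension>
    <wcs:SlicePoint>"2000-02-01T00:00:00.000Z"</wcs:SlicePoint>
  </wcs:DimensionSlice>
  <wcs:format>image/jpeg</wcs:format>
</wcs:GetCoverage>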

OGC OAPI-Coverages is a draft-stage effort in OGC to unify feature, coverage, map, and processing services based on OpenAPI concepts. From a WCS perspective, this is an evolution, not a disruptive replacement: it constitutes an additional way of expressing requests to the server and handling responses. Most notably, OAPI-Coverages is based on the Coverage Implementation Schema (CIS) 1.1, just as WCS 2.1 is. Currently, OAPI-Coverages functionality is a small subset of WCS (more or less: WCS Core).

WCS Conformance Testing

For compliance testing, OGC provides a free, open test suite which examines WCS implementations and the coverages delivered down to the level of single pixels, thereby ensuring interoperability across the large and increasing ecosystem of open-source and proprietary servers and clients.

Standardization Status

OGC WCS 2 is the current version (WCS 1 has long been deprecated). WCS 2.0.1 is based on CIS 1.0; WCS 2.1, adopted in 2019, is identical except that it also knows about CIS 1.1 General Grid Coverages. The WCS 2 extensions have likewise been enhanced; some of them were at version 1.0 and have been lifted to 1.1, some already were at 2.0 and are now at 2.1. The WCPS language (which is independent from the WCS request protocol binding) started at 1.0 and has been elevated to 1.1 to incorporate the CIS 1.1 coverage structure. OGC WQS has the status of a Discussion Paper.

OAPI-Coverages has been under discussion for quite some time now (official kickoff in May 2018), but is not yet stable and still changing; recent estimates expect adoption sometime in 2022. From an implementation perspective there are the caveats that (i) the specification in places is still imprecise, which hampers interoperability, (ii) it is not implementation-proven, and (iii) there are no clients available yet.

Further reading:


Coverage Analytics:
Web Coverage Processing Service (WCPS)

Preamble: Big Earth Data Analytics demands “shipping the code to the data” – however: what type of code?
Certainly not procedural code (such as C++ or Python), which requires strong coding skills from users and developers and is also highly insecure from a service administration perspective. Rather, use some safe, high-level query language, which also enables strong server-side optimizations, including parallelization, distributed processing, and use of mixed CPU/GPU, to name a few.
Paving the way for this future-oriented perspective, OGC standardized the Web Coverage Processing Service (WCPS) geo datacube analytics language as early as 2008; today it is part of the WCS suite, linked in through the WCS Processing extension. WCPS is being used routinely in Petascale operational datacube services, e.g., in the intercontinental EarthServer federation.
This section answers the key question: How to do datacube analytics based on coverages?

My First WCPS Query

A WCPS query has several parts, the two most important ones being the for and the return clauses. The for part serves to indicate the coverage objects we want to work with, and to tie them to a variable - a shorthand, if you will - for subsequent processing directives. Say, we want to inspect coverages A, B, and C in turn using variable $c as an iterator (we note in passing that such variables start with a $ symbol). This is written as:

for $c in ( A, B, C )

This is fine, but not complete, as it does not yet give us any result. We need to add a return clause where we state what we actually want to get back. The following "hello world" example returns 42 (assuming coverage AvgLandTemp exists on the server; otherwise adapt the name accordingly):

for $c in ( AvgLandTemp )
return 42

Ahem, nothing impressive, admittedly: all we get back is the literal 42, once per coverage addressed in the for clause - with the three coverages A, B, C from above that would look as below - but hey, we need to start somewhere.

42
42
42
Try it: You can readily try out this and the further examples in the WCPS sandbox of the Earth Datacube Playground. Just select the sample query from the menu at the bottom and hit execute.
Don't be surprised by extra noise like diagram>>. This prefix is just a hint for the client to render whatever is returned as either pure text (prefix text>>), a diagram (prefix diagram>>), or an image (prefix image>>); without any prefix you are prompted for a storage location for the file received.

Not impressive, admittedly, but a first sign of life. Oh, by the way, such a simple number is returned as ASCII text. This can be printed or easily converted into any programming language's number representation. If you will, ASCII is the transfer encoding.

Note: As the knowledgeable reader may suspect already: yes, WCPS syntax is deliberately structured similarly to FLWOR expressions in XQuery; this has enabled easy integration of XPath metadata expressions in WCPS 1.1, which heralds seamless integration of data and metadata queries.

Things get more involved once we start extracting from coverages - as we all know there is a plethora of format encodings available, and everybody has their own favourite. WCPS is flexible here and lets you select your personal preference - as long as the data allow it: the server will have a hard time encoding something 4-D into PNG, which is made for 2-D only, so that won't work.

Understood, but how do we express our desired encoding format? By applying an encode() function on the result, with the coverage expression as first parameter and the encoding format as second parameter, indicated through its MIME type, like "image/tiff" for GeoTIFF. Assuming we have a small 2-D coverage (be careful, you really don't want to try this with a multi-Terabyte object!) this could read as follows:

for $c in ( A )
return encode( $c, "image/tiff" )

The above query would return one GeoTIFF file; had we indicated a list of say 3 coverages then we would get back 3 GeoTIFF files, one for each iteration.
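
Other encodings work the same way. For instance, a 1-D timeseries cut out of the AvgLandTemp demo datacube (the subsetting syntax is explained below; the slice coordinates are illustrative) can be delivered as CSV, ready for a JavaScript diagramming library:

for $c in ( AvgLandTemp )
return encode(
  $c[ Lat(53.08), Long(8.80), date("2014-01":"2014-12") ],
  "text/csv"
)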

Now we are ready to roll - let us inspect next how we can extract from coverages. After all, as mentioned, we would not want to download coverages which typically are Big Data, hence "too big to transport".

Combining Coverages: Data Fusion

Oh, by the way, the for clause can do even more for us: combining several coverages. This is achieved by listing several variable declarations in the for clause. For example, let us request a band ratio (more on that below) from bands stored in two separate coverages, say MODIS-nir and MODIS-red. Ignoring the processing magic in the return clause for now - we will address this below - we can establish fusion requests like the following:

for $nir in ( MODIS-nir ),
    $red in ( MODIS-red )
return encode( (unsigned int) 127*(($red-$nir)/($red+$nir)+1), "image/tiff" )

Generally speaking, for some lineup

for $x1 in ( X1_1, X1_2, ... ),
    $x2 in ( X2_1, X2_2, ... ),
    ...,
    $xn in ( Xn_1, Xn_2, ... )
return f( $x1, $x2, ..., $xn )

each combination of the coverages Xi_j is generated and evaluated in turn. This allows data fusion on an arbitrary number of coverages. As an example, this INSPIRE WCS service contains a 4-way fusion.

Coverage Expressions

At the heart of WCPS are coverage expressions - they actually perform the heavy lifting of Big Data Analytics. A rich set of operators is provided by the language, effectively covering a good deal of Tensor Algebra, up to about the Fast Fourier Transform. We start our tour with the most basic operation: coordinate-based extraction from a coverage.

Coverage Subsetting

The most immediate operation, subsetting, we know already from WCS trimming and slicing. In WCPS the notation is more compact: instead of a sequence of WCS SUBSET parameters, subsetting in WCPS is written in brackets, such as:

$c[ Lat(30:60), Long(40:60), date("2020-03-21") ]

The sequence of axes does not matter, as we always indicate the name of the axis addressed. As with WCS, axes left out will be delivered uncut. Needless to say, the lower bound must not be higher than the upper bound.

Induced Operations

We have already used so-called induced operations, where some operation which is known on single pixels silently gets applied to all pixels simultaneously. Any unary or binary function defined on the cell types qualifies for induced use, including all the well-known arithmetic, comparison, Boolean, trigonometric, and exponential/logarithmic operations, as well as case distinction. Talking about case distinction, the following example performs a color classification of land temperature (note the null value 99999 which is mapped to white; see the query result in the figure on the right):

for $c in ( AvgLandTemp )
return
  encode(
    switch
      case $c = 99999 return {red: 255; green: 255; blue: 255}
      case $c < 18    return {red: 0; green: 0; blue: 255}
      case $c < 23    return {red: 255; green: 255; blue: 0}
      case $c < 30    return {red: 255; green: 140; blue: 0}
      default         return {red: 255; green: 0; blue: 0},
    "image/png"
  )

Expression Optimization: Leave it to the Server!

As always, it is courageous to let the expression iterate over the full coverage. Being streetwise, we therefore add subsetting:

for $c in ( AvgLandTemp )
return
  encode( switch
    case $c[ date("2014-07"), Lat(35:75), Long(-20:40) ] = 99999 return {red: 255; green: 255; blue: 255}
    case $c[ date("2014-07"), Lat(35:75), Long(-20:40) ] < 18    return {red: 0; green: 0; blue: 255}
    case $c[ date("2014-07"), Lat(35:75), Long(-20:40) ] < 23    return {red: 255; green: 255; blue: 0}
    case $c[ date("2014-07"), Lat(35:75), Long(-20:40) ] < 30    return {red: 255; green: 140; blue: 0}
    default                                                      return {red: 255; green: 0; blue: 0},
    "image/png"
  )

By instinct we have expressed the processing steps in the sequence that is most efficient: first subsetting, then pixel computations. However, things are getting unwieldy - repeating the subset coordinates all over is onerous. We should not be forced to do it that way. Actually, a good optimizer, such as the one implemented in rasdaman, will automatically rearrange the query into the most efficient execution sequence. If we know that our WCPS server is that smart we can just as well indicate the subsetting once and for all at the end:

for $c in ( AvgLandTemp )
return
  encode(
    ( switch
        case $c = 99999 return {red: 255; green: 255; blue: 255}
        case $c < 18    return {red: 0; green: 0; blue: 255}
        case $c < 23    return {red: 255; green: 255; blue: 0}
        case $c < 30    return {red: 255; green: 140; blue: 0}
        default         return {red: 255; green: 0; blue: 0}
    ) [ date("2014-07"), Lat(35:75), Long(-20:40) ],
    "image/png")

Let Clause

And here is yet another way of making expressions crisper and easier to read: pulling parts out into separate variable definitions. This is what the let clause accomplishes by letting us give names to expressions; the above query would now read:

for $c in ( AvgLandTemp )
let $subset := [ date("2014-07"), Lat(35:75), Long(-20:40) ]
return
  encode(
    ( switch
        case $c = 99999 return {red: 255; green: 255; blue: 255}
        case $c < 18    return {red: 0; green: 0; blue: 255}
        case $c < 23    return {red: 255; green: 255; blue: 0}
        case $c < 30    return {red: 255; green: 140; blue: 0}
        default         return {red: 255; green: 0; blue: 0}
    ) [ $subset ],
    "image/png"
  )

Any number of variables can be defined, and doing so is highly recommended as it nicely structures queries for human eyes.

Multi-Band Constructor

Ready for a fun fact? We can construct entirely new coverages, cell by cell. We want to do this, for example, to support WebGL clients in doing 3-D terrain draping. The trick is to provide a PNG image where RGB is built from any source (like a satellite image) while feeding the alpha channel with the elevation data. Of course we will need to adjust the resolution as both typically differ widely; we choose to re-scale the DEM. This is what the corresponding query looks like:

for $s in (SatImage),
    $d in (DEM)
return
  encode(
    {
      red:   $s.b7;
      green: $s.b5;
      blue:  $s.b0;
      alpha: scale( $d, 20 )
    },
    "image/png"
   )

Type Casting

In passing we note that casts to different pixel types are also available. This is necessary, for example, when some arithmetic operation would return float while the target format (here: PNG) expects 8-bit unsigned quantities.
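
As an illustration (a sketch only, reusing the coverage names from the band fusion example above): the float ratio is shifted and scaled into 0..254 and then cast to 8-bit so that it can be encoded as PNG:

for $nir in ( MODIS-nir ),
    $red in ( MODIS-red )
return encode(
  (unsigned char) ( 127 * ( ($red - $nir) / ($red + $nir) + 1 ) ),
  "image/png"
)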

Note: A plain band ratio like ($red-$nir)/($red+$nir) returns float values between -1 and +1, not suitable for visual display - which is why the fusion example above shifts and scales the result before casting. This illustrates the fact that WCS and WCPS primarily are data services, not visualization services - that said, visualization is a special case of queries where the result type is single- or three-band (sometimes, as we have seen, four-band) with 8-bit channel values. In general, WCS and WCPS deliver data that can be processed further, as opposed to WMS and WMTS which deliver images solely suitable for human consumption.

Coverage Constructor

In the functional, side-effect-free language of WCPS we have already derived new coverages, to be shipped to the client in some encoding. What we manipulated were mostly the pixel values; the domain set was either given by a subsetting expression or carried over implicitly, and the range type, finally, was either forced by a cast or derived automatically. All these are special cases of a general constructor capable of building a completely new coverage with all elements controlled. This coverage constructor has three clauses: in the coverage clause a name is given to the new coverage, in the over clause the extent is defined for each axis the coverage will have, and in the values clause an expression is indicated for computing the coverage's cell values. Here is an example creating a 256x256 matrix with axes x and y containing a white-to-black grey ramp:

coverage GreyMatrix
over     $px x ( 0 : 255 ),
         $py y ( 0 : 255 )
values   (char) ( ( $px + $py ) / 2 )

Should we want to refer to another coverage then this is no problem, in both domain and range set. Auxiliary function imageCrsDomain() delivers the extent for some axis, in Cartesian coordinates so that iteration is possible:

coverage GreyMatrix
over     $px x ( imageCrsDomain( $c, Long ) ),
         $py y ( imageCrsDomain( $c, Lat ) )
values   (char) ( ( $px + $py ) / 2 )

Coverage Condenser

Still missing is some way to aggregate coverage values. In coverage speak this is called condensing values. Such a condenser function receives a coverage expression to iterate over and delivers min, max, sum, avg, and count, as well as some (is there any true value?) and all (are all values true?). For example, the following yields the maximum value in the red band of a satellite scene:

for $s in ( LandsatScene )
return max( $s.red )

Note that there is no encode() - the result, a single number, is simply returned as an ASCII string.

Very often condensers are used in conjunction with a coverage constructor. Here is an example, deriving a histogram from some satellite band:

for $s in ( LandsatScene )
return
  encode(
    coverage histogram
    over     $bucket bucket( 0 : 255 )
    values   count( $s.red = $bucket ),
    "text/csv"
   )

And just as there is a constructor generalizing induced operations, there is also a general condenser construct behind these condenser functions, which now turn out to be special cases. The general structure is, in analogy to the constructor, the keyword condense followed by the operation to be executed, an over clause indicating the region to iterate over, and finally a using clause with an expression to be evaluated at each position. Hence, the above max() example is equivalent to the following formulation:

for $s in ( LandsatScene )
return
  condense max
  over     $p in imageCrsDomain( $s )
  using    $s.red

So why should we use this much more complicated construct? Well, it has access to the coordinates, thereby allowing neighbourhood operations to be expressed. This can be seen nicely when expressing a matrix multiplication (remember, result cell c[i,j] is the sum over k of a[i,k]*b[k,j]):

for $a in ( MatrixA ),
    $b in ( MatrixB )
return
  encode(
    coverage MatMult
    over     $i i( imageCrsDomain( $a, i ) ),
             $j j( imageCrsDomain( $b, j ) )
    values   condense +
             over     $k j( imageCrsDomain( $a, j ) )
             using    $a[ i($i), j($k) ] * $b[ i($k), j($j) ],
  "text/csv" )

Coverage Filtering

Sometimes we are interested in just some of the coverages, but we do not know upfront which of them qualify before we have seen their data contents. Downloading all candidates for client-side inspection is not an option as they are "too big to transport", so we need to let the server find out. This is accomplished with the where clause, which performs server-side preselection among coverage sets by applying any criterion that you may provide; only those coverages passing the filter will be forwarded to the return processing and delivery.

Here is an example that returns a result for the Bavaria DEM only if it contains heights over 3000 m - which it doesn't, so the result will be empty (try with a smaller number!):

for $c in ( Bavaria_50_DSM )
where max( $c ) > 3000
return 1 

Finale: A More Complex Coverage Processing Example

Ready for a final effort? To show the expressive power of WCPS let us do edge detection using the Sobel operator: two 3x3 convolution kernels yield the horizontal and vertical gradients Gx and Gy, which are combined as sqrt(Gx^2 + Gy^2).

To WCPS this translates as:

for $c in (NIR)
let $kernel1 := coverage kernel1
                over     $x x (-1:1), $y y (-1:1)
                value list < 1; 0; -1; 2; 0; -2; 1; 0; -1 >,
    $kernel2 := coverage kernel2
                over     $x x (-1:1), $y y (-1:1)
                value list < 1; 2; 1; 0; 0; 0; -1; -2; -1 >,
    $cutOut := [ ${cutOut} ]
return
  encode(
    sqrt(
      pow(
        coverage Gx
        over $px1 i( imageCrsDomain( $c[$cutOut], i ) ),
             $py1 j( imageCrsDomain( $c[$cutOut], j ) )
        values
          condense +
          over     $kx1 x( imageCrsDomain( $kernel1, x ) ),
                   $ky1 y( imageCrsDomain( $kernel1, y ) )
          using    $kernel1[ x($kx1), y($ky1) ] * $c.${selectedBand}[ i($px1 + $kx1), j($py1 + $ky1) ] ,
        2.0
      )
      +
      pow(
        coverage Gy
        over $px2 i( imageCrsDomain( $c[$cutOut], i ) ),
             $py2 j( imageCrsDomain( $c[$cutOut], j ) )
        values
          condense +
          over     $kx2 x( imageCrsDomain( $kernel2, x ) ),
                   $ky2 y( imageCrsDomain( $kernel2, y ) )
          using    $kernel2[ x($kx2), y($ky2) ] * $c.${selectedBand}[ i($px2 + $kx2), j($py2 + $ky2) ] ,
        2.0
      )
    ),
    "image/jpeg"
  )

When applied in this demo to a false color image (left) the server returns this image (right):

Note: The WCPS syntax has been designed to be deliberately close to XQuery/XPath so as to allow integration with XML or JSON based metadata retrieval; Kakaletris et al. have implemented a coupling of WCPS with XPath, leading to unified data/metadata queries, and the integrated data/metadata retrieval language WQS is prepared for this.

This concludes our primer on WCPS; for more introductory and detailed information on various levels refer to the list below.

Further reading:

Articles & Links

Even more reading:

High-Performance Datacube Engine:
rasdaman

The rasdaman engine has pioneered Actionable Datacubes® and Array Databases. With its enabling approach of a high-level datacube analytics language - adopted into ISO SQL - underpinned by a powerful datacube architecture, rasdaman remains the gold standard for modern multi-dimensional raster data services.

More information:

Questions? Answers!

We gladly share our experience to answer any questions you may have, from strategic issues down to any technical depth. This can be discussed based on your own data, your own ecosystem requirements, and of course under strict confidentiality (such as under an NDA). Webinars as well as on-site meetings are possible.

Contact us - we gladly share our experience and insight from many years of writing, implementing, and deploying OGC, ISO, and INSPIRE standards, from Raspberry Pi to dozens-of-Petabytes DIAS archives.