publications

Datacubes in Action: Array Databases as Enabling Innovation

A paradigm shift is becoming reality. We begin to see the datacubes behind the millions of files. We start combining heterogeneous datacubes in an ad-hoc fashion. And we begin to overcome the age-old, technology-imposed divide between data and metadata, supported by query languages like OGC WCPS for geo datacubes and ISO SQL/MDA for general multi-dimensional arrays.

Big Earth Data at Your Fingertips

We plan to showcase rasdaman to demonstrate the flexibility and scalability of Array Databases. Flexibility arises from the query language, which lets users formulate both simple and complex ad-hoc requests; these can refer to both local data and data residing elsewhere in the peer network, thereby allowing fusion across data centers. Scalability arises from the intelligent optimizations that can be applied to datacube queries, from local parallelization (across CPU cores, GPUs, and other new hardware), and from splitting queries across different nodes in a cloud or across clouds.
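The idea of splitting one datacube query into tile-wise subqueries can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not rasdaman's actual implementation: the cube is a small synthetic nested list, worker threads stand in for CPU cores or remote nodes, and all names (`make_cube`, `partial_sum`, `split`, `tiled_average`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_cube(nt, ny, nx):
    """Build a small synthetic datacube as nested lists: cube[t][y][x]."""
    return [[[t + y + x for x in range(nx)] for y in range(ny)]
            for t in range(nt)]


def partial_sum(cube, t_range, y_range, x_range):
    """Sum and count the cells of one tile (the work one worker would do)."""
    total, count = 0, 0
    for t in range(*t_range):
        for y in range(*y_range):
            row = cube[t][y][x_range[0]:x_range[1]]
            total += sum(row)
            count += len(row)
    return total, count


def split(lo, hi, step):
    """Cut the half-open interval [lo, hi) into chunks of at most `step`."""
    return [(a, min(a + step, hi)) for a in range(lo, hi, step)]


def tiled_average(cube, t_range, y_range, x_range, tile=4, workers=4):
    """Average over a subcube by combining per-tile partial results."""
    tiles = [(tr, yr, xr)
             for tr in split(*t_range, tile)
             for yr in split(*y_range, tile)
             for xr in split(*x_range, tile)]
    # Threads are a stand-in for cores or nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda tl: partial_sum(cube, *tl), tiles))
    total = sum(p[0] for p in parts)
    count = sum(p[1] for p in parts)
    return total / count


if __name__ == "__main__":
    cube = make_cube(6, 10, 10)
    print(tiled_average(cube, (0, 6), (2, 9), (1, 10)))
```

Because sums and counts combine associatively, the tile-wise result equals the average computed in one pass; the same property is what allows an aggregation to be distributed at all.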

Datacubes as the New Virtual Research Environment Paradigm

A paradigm shift is becoming reality. We begin to see the datacubes behind the millions of files. We start combining heterogeneous datacubes in an ad-hoc fashion. And we begin to overcome the age-old, technology-imposed divide between data and metadata. This is supported by query languages like SQL/MDA, the forthcoming extension of the ISO SQL standard with massive multi-dimensional arrays, but also by increasing support for large-scale point cloud and mesh handling.

Science SQL: Advancing from Data to Service Stewardship. LSDMA Symposium “The Challenge of Big Data in Science”.

In today's science archives, data are typically managed separately from the metadata, and with different, restricted retrieval capabilities. While databases are good at metadata modelled in tables, XML hierarchies, and RDF graphs, they traditionally do not support "the data", in particular multidimensional arrays. Consequently, file-based solutions let users "drown in data files" rather than presenting just a few datacubes for dissection and rejoining with other cubes.

Agile retrieval of Big Data with EarthServer

With the unprecedented increase of orbital sensor, in-situ measurement, and simulation data, there is a rich yet untapped potential for gaining insights from dissecting datasets and rejoining them with other datasets. The goal is to allow users to "ask any question, any time", thereby enabling them to "build their own product on the go".

The rasdaman Open-Source Array DBMS.

Rasdaman ("raster data manager") has pioneered Array Databases by adding massive multi-dimensional gridded data, an information category long missing in databases, to scalable data management and analysis. Its declarative query language, rasql, extends SQL with array operators which are optimized and parallelized on the server side. Installations can easily be mashed up securely, thereby enabling large-scale, location-transparent query processing in federations. Domain experts value the integration with their commonly used tools, which makes for a short learning curve.

Big Earth Data at Your Fingertips

The term "Big Data" is a contemporary shorthand characterizing data which are too large, fast-lived, heterogeneous, or complex to be understood and exploited. Technologically, this is a cross-cutting challenge affecting both storage and processing, data and metadata, servers and clients as well as mashups. Further, making new, substantially more powerful tools available for simple use by non-experts while not constraining complex tasks of experts adds to the complexity. All this holds for many application domains, but especially for the field of Earth Observation (EO).

EarthServer: Big Earth Datacubes at Your Fingertips

"Big Data" is a shorthand for data too large, fast-lived, heterogeneous, or complex to be understood and exploited. Technologically, this affects storage and processing, data and metadata, servers and clients as well as mashups. Further, more powerful tools must be available for simple regular use while not constraining experts. The challenge is to allow users to "ask any question, any time", enabling them to "build their own product".

Achieving Interoperability with Big Geo Data Standards

With OGC coverages, a concrete, interoperable data model has been established which unifies n-D spatio-temporal regular and irregular grids, point clouds, and meshes, that is, the main contributors to today's Big Geo Data. While coverages can be served through many OGC services, the Web Coverage Service (WCS) suite provides versatile, streamlined coverage functionality ranging from simple access to flexible spatio-temporal analytics. The flexibility and scalability of the WCS suite have been demonstrated in practice through services with 130+ TB of space/time datacubes.
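As an illustration of "simple access", a WCS 2.0 GetCoverage request in key-value-pair encoding can be assembled as below. This is a sketch under assumptions: the endpoint URL, the coverage name `AvgLandTemp`, and the axis names `Lat` and `Long` are hypothetical placeholders, and the helper `get_coverage_url` is not part of any library; only the request parameters (`service`, `version`, `request`, `coverageId`, `subset`, `format`) follow the WCS 2.0 KVP protocol.

```python
from urllib.parse import urlencode


def get_coverage_url(endpoint, coverage_id, subsets, fmt="image/tiff"):
    """Assemble a WCS 2.0 GetCoverage URL that trims a coverage per axis.

    `subsets` maps an axis name to a (low, high) pair; axis names depend
    on the coverage being served and are assumptions in this sketch.
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    # One `subset` parameter per trimmed axis, e.g. subset=Lat(40,50).
    for axis, (low, high) in subsets.items():
        params.append(("subset", f"{axis}({low},{high})"))
    params.append(("format", fmt))
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint and coverage; replace with a real service.
    print(get_coverage_url(
        "https://example.org/ows",
        "AvgLandTemp",
        {"Lat": (40, 50), "Long": (10, 20)},
    ))
```

The function only builds the URL; sending it to a server and decoding the returned coverage are left out on purpose.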

WCS for INSPIRE: Analyzing Massive Spatio-Temporal Datacubes.

The transatlantic EarthServer initiative has made spatio-temporal analytics a commodity for scientists,
engineers, and decision makers.

By utilizing novel parallel Array Database technology with frontends strictly based on the open OGC
standards, datacubes have become first-class citizens accessible through direct interaction with simple
point-and-click Web GUIs. The flexibility and scalability of this approach have been shown on 130+ TB of
datasets together covering Earth and Planetary sciences.
