Publications

Standards-Based Services for Big Spatio-Temporal Data

With the unprecedented increase in orbital sensor, in-situ measurement, and simulation data, as well as their derived products, there is an immense potential for gaining new and timely insights; yet, this value is not fully leveraged today. Notably, such spatio-temporal sensor, image, simulation, and statistics data typically constitute prime Big Data contributors in practice.

Big Geo Data Services: From More Bytes to More Barrels

The data deluge affects the oil and gas industry just as much as many other industries. Aside from the sheer volume, however, there is the challenge of data variety: regular and irregular grids, multi-dimensional space/time grids, point clouds, and TINs and other meshes. A uniform conceptualization for modelling and serving all of them could save substantial effort, for example by doing away with the proverbial "department of reformatting".

Array Databases: Agile Analytics (not just) for the Earth Sciences

Gridded data, such as images, image timeseries, and climate datacubes, are today managed separately from their metadata, and with different, restricted retrieval capabilities. While databases are good at handling metadata modelled as tables, XML hierarchies, or RDF graphs, they traditionally do not support multidimensional arrays.

"Science SQL" as a Building Block for Flexible, Standards-based Data Infrastructures

We have learnt to live with the pain of separating data and metadata into non-interoperable silos. For metadata, we enjoy the flexibility of databases, be they relational, graph-based, or some other NoSQL system. In contrast, users still "drown in files", an unstructured, low-level archiving paradigm. It is time to bridge this chasm, which was once induced by technology but today can be overcome.

Big Data in the Earth sciences, the Tera- to Exabyte archives

Big Data in the Earth sciences, the Tera- to Exabyte archives, is made up mostly of coverage data, defined by ISO and OGC as the digital representation of some space-time varying phenomenon. Common examples include 1-D sensor timeseries, 2-D remote sensing imagery, 3-D x/y/t image timeseries and x/y/z geology data, and 4-D x/y/z/t atmosphere and ocean data. Analytics on such data requires on-demand processing of sometimes significant complexity, such as getting the Fourier

Datacubes: Exploiting Big Earth Data Better

With the unprecedented increase in orbital sensor, in-situ measurement, and simulation data, there is a rich yet largely untapped potential for gaining insights by dissecting datasets and rejoining them with other datasets. This effectively establishes a "datacube" paradigm, whose ultimate goal is to let users "ask any question, any time" and thereby "build their own product on the go". One of the most influential initiatives in Big Geo Data is EarthServer, which is demonstrating new directions for flexible, scalable datacube services based on innovative NewSQL technology.
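To make "dissecting datasets and rejoining them" concrete, the following is a minimal plain-Python sketch on a toy in-memory x/y/t cube. In an Array Database such operations would be stated declaratively in a query language; the function names here (`make_cube`, `trim`, `pixel_timeseries`, `combine`) are hypothetical illustrations, not part of EarthServer, rasdaman, or any standard.

```python
from typing import Callable, List

# Toy datacube, indexed as cube[t][y][x] (hypothetical layout for illustration).
Cube = List[List[List[float]]]

def make_cube(nt: int, ny: int, nx: int, scale: float) -> Cube:
    """Build a toy t/y/x cube whose cell value is scale * (t + y + x)."""
    return [[[scale * (t + y + x) for x in range(nx)] for y in range(ny)]
            for t in range(nt)]

def trim(cube: Cube, t0: int, t1: int, y0: int, y1: int, x0: int, x1: int) -> Cube:
    """Dissect: cut out a sub-cube using half-open [lo, hi) bounds per axis."""
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in cube[t0:t1]]

def pixel_timeseries(cube: Cube, y: int, x: int) -> List[float]:
    """Slice: fix y and x, so the 3-D cube collapses to a 1-D timeseries."""
    return [plane[y][x] for plane in cube]

def combine(a: Cube, b: Cube, op: Callable[[float, float], float]) -> Cube:
    """Rejoin: merge two cubes of identical extent cell by cell with `op`."""
    return [[[op(va, vb) for va, vb in zip(ra, rb)] for ra, rb in zip(pa, pb)]
            for pa, pb in zip(a, b)]

if __name__ == "__main__":
    a = make_cube(4, 3, 3, 1.0)
    b = make_cube(4, 3, 3, 2.0)
    print(pixel_timeseries(a, 1, 2))          # values t + 1 + 2 for t = 0..3
    print(trim(a, 1, 3, 0, 2, 0, 2))          # a 2 x 2 x 2 sub-cube
    print(combine(b, a, lambda p, q: p - q) == a)
```

Each result is itself a cube (or a lower-dimensional slice), so it can feed the next operation; this closure property is what allows ad-hoc chaining of requests.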

Datacubes in Action: Array Databases as Enabling Innovation

A paradigm shift is becoming reality. We begin to see the datacubes behind the millions of files. We start combining heterogeneous datacubes in an ad-hoc fashion. And we begin to overcome the age-old, technology-imposed divide between data and metadata, supported by query languages like OGC WCPS for geo datacubes and ISO SQL/MDA for general multi-dimensional arrays.

Big Earth Data at Your Fingertips

We plan to showcase rasdaman to demonstrate the flexibility and scalability of Array Databases. Flexibility arises from the query language, which allows users to formulate both simple and complex ad-hoc requests; these can refer both to local data and to data residing elsewhere in the peer network, thereby allowing fusion across data centers. Scalability arises from the intelligent optimizations that can be applied to datacube queries, from local parallelization (across CPU cores, GPUs, and other New Hardware), and from splitting queries across different nodes in a cloud or across clouds.
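The idea of splitting a query across nodes can be illustrated with a simple aggregate. The sketch below is a generic partial-aggregation example under stated assumptions, not a description of how rasdaman actually distributes queries: the cube is partitioned along its time axis, each hypothetical "node" (here just a thread) returns a partial (sum, count), and the coordinator merges these into the overall mean. The names `partial_sum`, `split`, and `distributed_mean` are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Plane = List[List[float]]  # one y/x slice of a toy t/y/x cube

def partial_sum(planes: List[Plane]) -> Tuple[float, int]:
    """Work of one hypothetical node: sum and count over its share of cells."""
    total, count = 0.0, 0
    for plane in planes:
        for row in plane:
            total += sum(row)
            count += len(row)
    return total, count

def split(cube: List[Plane], parts: int) -> List[List[Plane]]:
    """Partition the cube along its first (time) axis into roughly equal chunks."""
    size = -(-len(cube) // parts)  # ceiling division
    return [cube[i:i + size] for i in range(0, len(cube), size)]

def distributed_mean(cube: List[Plane], parts: int) -> float:
    """Mean over all cells, computed by merging per-chunk partial aggregates."""
    # Threads stand in for remote nodes; they show the decomposition, not a speedup.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(partial_sum, split(cube, parts)))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

if __name__ == "__main__":
    cube = [[[float(t * 100 + y * 10 + x) for x in range(4)] for y in range(3)]
            for t in range(5)]
    print(distributed_mean(cube, 1), distributed_mean(cube, 2))
```

Each node returns (sum, count) rather than a local mean because chunks may differ in size (5 time steps split into 3 and 2 here). Averaging the local means directly would then give a wrong result.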

Datacubes as the New Virtual Research Environment Paradigm

A paradigm shift is becoming reality. We begin to see the datacubes behind the millions of files. We start combining heterogeneous datacubes in an ad-hoc fashion. And we begin to overcome the age-old, technology-imposed divide between data and metadata. Enablers include query languages like SQL/MDA, the forthcoming extension of the ISO SQL standard with massive multi-dimensional arrays, as well as increasing support for large-scale point cloud and mesh handling.

Science SQL: Advancing from Data to Service Stewardship. LSDMA Symposium “The Challenge of Big Data in Science”.

In today's science archives, data are typically managed separately from their metadata, and with different, restricted retrieval capabilities. While databases are good at handling metadata modelled as tables, XML hierarchies, and RDF graphs, they traditionally do not support "the data" themselves, in particular multidimensional arrays. Consequently, file-based solutions let users "drown in data files" rather than presenting just a few datacubes for dissection and rejoining with other cubes.
