How EOSC could open the way for Earth Observation innovation
The massive streams of high-resolution Earth Observation (EO) data from the EU Copernicus Sentinel sensors have established Europe as the predominant spatial data provider for environmental monitoring applications. This data is made available under an open licence with unprecedented frequency and spatial extent.
In principle, these data sources can inspire a wide range of science and monitoring applications, from regional to continental scales. In practice, innovation is mostly happening outside Europe, driven by large US IT companies. This leaves EO science user communities dependent on non-European platform suppliers for the Big Data Analytics they need to scale up their use of these high-volume data streams.
There is world-class expertise in EO analytics in Europe, but what is missing is a solution that provides core cloud services coupled to a long-term online archive of Sentinel data. In particular, Europe lacks:
- Easily accessible European computing environments that allow scaling and sharing of (Sentinel) data among a large community of users.
- A European platform, similar to Google Earth Engine, that makes large-scale storage accessible through sophisticated indexing and caching, exposed through an advanced application programming interface (API).
A key requirement is a core computing and storage architecture designed to handle very large data sets while responding quickly to user queries.
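The article does not prescribe an indexing scheme. One common way to get fast spatial query response is to index scenes by fixed grid tiles, so that a query only touches the tiles overlapping its bounding box instead of scanning every image. The sketch below assumes the standard XYZ (Web Mercator) tile grid used by web maps; `TileIndex`, the scene identifiers and the bounding boxes are hypothetical, and a real archive would need a persistent database and support for boxes crossing the antimeridian, which this sketch does not handle.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, Set, Tuple

# Bounding boxes are (west, south, east, north) in degrees (an assumed convention).
BBox = Tuple[float, float, float, float]
TileKey = Tuple[int, int, int]  # (zoom, x, y) in the XYZ / Web Mercator grid

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) tile indices containing a point at the given zoom."""
    n = 2 ** zoom
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that points on the grid edge (e.g. lon = 180) stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(bbox: BBox, zoom: int) -> Iterator[TileKey]:
    """Yield every tile key overlapping a bounding box (no antimeridian support)."""
    west, south, east, north = bbox
    x0, y0 = lonlat_to_tile(west, north, zoom)  # top-left tile
    x1, y1 = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):
            yield (zoom, x, y)


class TileIndex:
    """Hypothetical in-memory index mapping tile keys to scene identifiers."""

    def __init__(self, zoom: int) -> None:
        self.zoom = zoom
        self._tiles: Dict[TileKey, Set[str]] = defaultdict(set)

    def add_scene(self, scene_id: str, bbox: BBox) -> None:
        for key in tiles_for_bbox(bbox, self.zoom):
            self._tiles[key].add(scene_id)

    def query(self, bbox: BBox) -> Set[str]:
        """Return scenes sharing a tile with the query box (a coarse filter)."""
        hits: Set[str] = set()
        for key in tiles_for_bbox(bbox, self.zoom):
            hits |= self._tiles.get(key, set())
        return hits


index = TileIndex(zoom=8)
index.add_scene("scene-alps", (5.0, 45.0, 10.0, 50.0))
index.add_scene("scene-us-east", (-75.0, 40.0, -70.0, 45.0))
print(index.query((6.0, 46.0, 7.0, 47.0)))  # only the Alpine scene overlaps
```

Tile overlap is only a coarse filter: a scene can share a tile with the query box without its actual footprint intersecting it, so a production system would refine the candidate set with exact geometry checks and cache frequently requested tiles.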
EOSC as a solution
The European Open Science Cloud has the potential to become a viable European alternative to Google Earth Engine for the scientific EO community. The federated resources provided through the EOSC-hub project can become the storage and computing infrastructure needed to scale fully across Copernicus data inputs. As a platform, it could bring together the many European initiatives in EO software by establishing a common interface for massively parallel, server-side workflow execution on an optimally indexed data storage format.
Components for the client API to define workflow graphs can be adopted from existing open source frameworks (e.g. Jupyter notebooks, Python and Node.js). An interface with European open datasets would immediately demonstrate the advantage of the EOSC infrastructure in practical EO science applications. This impact can be enhanced by creating an open science data sharing environment. Facilitating executable data analysis “papers” that use EOSC as the common platform would boost the reproducibility and scaling of EO science results.
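To illustrate what "a client API to define workflow graphs" means in practice, the sketch below separates graph definition (on the client) from execution (on the server, next to the data). The client only records operations as a JSON document; the server resolves node references and runs the registered operations. `GraphBuilder`, `execute`, the operation names and the two-pixel `ARCHIVE` are all hypothetical and do not correspond to the API of any existing platform; the band names follow Sentinel-2 naming (B04 red, B08 near-infrared), but the values are invented.

```python
import json
from typing import Any, Callable, Dict


class GraphBuilder:
    """Client side: records operations as a JSON-serialisable workflow graph."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}

    def add(self, op: str, *args: Any) -> Dict[str, str]:
        """Add a node and return a reference that later nodes can use as input."""
        node_id = f"node{len(self.nodes)}"
        self.nodes[node_id] = {"op": op, "args": list(args)}
        return {"ref": node_id}

    def to_json(self, result: Dict[str, str]) -> str:
        return json.dumps({"nodes": self.nodes, "result": result["ref"]})


def execute(graph_json: str, ops: Dict[str, Callable[..., Any]]) -> Any:
    """Server side: evaluate the graph, computing each node at most once."""
    graph = json.loads(graph_json)
    nodes = graph["nodes"]
    cache: Dict[str, Any] = {}

    def resolve(arg: Any) -> Any:
        # References to other nodes are evaluated; literals pass through unchanged.
        if isinstance(arg, dict) and "ref" in arg:
            return run(arg["ref"])
        return arg

    def run(node_id: str) -> Any:
        if node_id not in cache:
            node = nodes[node_id]
            args = [resolve(a) for a in node["args"]]
            cache[node_id] = ops[node["op"]](*args)
        return cache[node_id]

    return run(graph["result"])


# Toy stand-in for the data archive: two pixels of red (B04) and
# near-infrared (B08) reflectance. Values are invented for illustration.
ARCHIVE = {"B04": [0.10, 0.20], "B08": [0.50, 0.60]}

OPS: Dict[str, Callable[..., Any]] = {
    "load_band": lambda name: ARCHIVE[name],
    "ndvi": lambda red, nir: [(n - r) / (n + r) for r, n in zip(red, nir)],
    "mean": lambda values: sum(values) / len(values),
}

g = GraphBuilder()
red = g.add("load_band", "B04")
nir = g.add("load_band", "B08")
ndvi = g.add("ndvi", red, nir)
result = g.add("mean", ndvi)

graph_json = g.to_json(result)   # what a client would send to the server
print(execute(graph_json, OPS))  # mean NDVI over the two toy pixels
```

Because the graph is plain JSON, the same definition could be produced from a Jupyter notebook or a Node.js client, stored alongside an executable "paper", and re-run on the server to reproduce a result. A real engine would also split the work across many parallel workers rather than evaluate it recursively in one process.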
The capability of commercial providers such as Google and Amazon in the big data analytics domain poses a serious risk to the continuation of the successful Copernicus programme, aggravated by the fact that there are no concrete plans for Europe to maintain a full online archive of Sentinel data.
This is an opportunity to leverage the existing EOSC infrastructure and capture European EO expertise around it. ESA is already deleting the oldest Sentinel data holdings from the online archive, and while the deleted data can still be retrieved from ‘cold’ storage (e.g. tape archives), this limits applications requiring long time series. To the best of our knowledge, there is no European initiative planning to host the long-term Sentinel data archive.
The EOSC could therefore be used to:
- Host the online archive of all Sentinel data.
- Create an optimized IT environment for the EO community.
Any action in this regard should be taken in collaboration with, and build on the experience of, existing initiatives such as the Copernicus DIAS and other projects.
How it could be set up
So how could we accomplish this?
- First, assess the feasibility of establishing EOSC as the host for the permanent archive of all Copernicus Sentinel data, knowing that the data volume was 9.7 petabytes (PB) in 2018 and grows by about 6 PB per year.
- Second, working closely with ESA, adopt a phased approach towards hosting the full archive. For example:
- Start with the most important geographical areas
- Prioritise the Sentinel-1 and Sentinel-2 (S1 and S2) sensors
- Structure data transfer to prioritise European uptake
- Accompany the gradual phase-in with a gradual reduction of the ESA-hosted archive
- In parallel, establish alternatives to S3-style object storage of complete image files, to facilitate fast multi-scale data collation, interactive analytics and visualization. This activity would greatly benefit from experience in other EOSC big data analytics domains.
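The figures in the first step can be turned into a rough sizing aid for the phased approach. The sketch below assumes the archive keeps growing linearly at 6 PB per year from the 9.7 PB 2018 baseline given above (a simplification, since growth will change as satellites are added or retired), and uses a hypothetical `EOSC_SHARE` schedule for the fraction of the archive hosted on EOSC each year. It also assumes that each part of the archive is hosted in only one place, ignoring any overlap during the transfer.

```python
BASE_YEAR = 2018
BASE_VOLUME_PB = 9.7      # Sentinel archive volume in 2018 (from the text)
GROWTH_PB_PER_YEAR = 6.0  # approximate annual growth (from the text)


def projected_volume_pb(year: int) -> float:
    """Linear projection of the full archive: V(year) = 9.7 + 6 * (year - 2018)."""
    if year < BASE_YEAR:
        raise ValueError("projection starts from the 2018 baseline")
    return BASE_VOLUME_PB + GROWTH_PB_PER_YEAR * (year - BASE_YEAR)


# Hypothetical phase-in schedule: share of the full archive hosted on EOSC.
EOSC_SHARE = {2019: 0.25, 2020: 0.50, 2021: 0.75, 2022: 1.00}

for year, share in EOSC_SHARE.items():
    total = projected_volume_pb(year)
    eosc = share * total
    esa = total - eosc  # volume still hosted by ESA during the transition
    print(f"{year}: total {total:5.1f} PB, EOSC {eosc:5.1f} PB, ESA {esa:5.1f} PB")
```

Under these assumptions the ESA-hosted share shrinks every year even though the total archive keeps growing, and at a constant 6 PB per year the full archive would reach about 51.7 PB by 2025.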
The federated nature of EOSC makes it a strong candidate to meet the long-term data storage and analytics challenges of the EU’s Copernicus Sentinel programme. By leveraging its expertise in other Big Data Analytics domains, it can extend its scope to serve the European Earth Observation science community.
This article was prepared with contributions from EuroGEOSS and the EC Joint Research Centre.