Services for the European Open Science Cloud
The posters will be showcased for the full duration of the event in room Prague A+D and will be presented on Thursday, 11th April 2019 at 11 am CET in room Prague A+D.
The metadata service B2FIND is intended to be the central indexing tool for EOSC-hub and plays a central role within the pan-European collaborative research data infrastructure EUDAT-CDI by providing a simple and user-friendly discovery portal to find and access research data collections stored in EUDAT data centers and in community-specific repositories. To this end, metadata collected from heterogeneous sources are stored in a comprehensive joint metadata catalogue and made searchable via an open data portal. B2FIND provides transparent access to the scientific data objects through given references and identifiers. The implemented metadata ingestion workflow consists of three steps. First, metadata records, provided either by various research communities or via other EUDAT services, are harvested. Afterwards, the raw metadata records are converted and mapped to unified key-value dictionaries as specified by the B2FIND schema; the most subtle and challenging task here is to map non-uniform, community-specific metadata to homogeneously structured datasets. To assure and improve metadata quality, this mapping process is accompanied by:
• iterative and intense exchange with community representatives,
• usage of controlled vocabularies and community specific ontologies and
• formal and semantic mapping and validation.
Finally, the mapped and checked records are uploaded as datasets to the catalogue, which is based on CKAN, an open source data portal software that provides a rich RESTful JSON API and uses SOLR for indexing. The homogenization of community-specific data models and vocabularies enables not only a uniform presentation of these datasets as tables of field-value pairs but also an interdisciplinary and cross-community search with geospatial and temporal search functionalities. Results from a free text search may be narrowed by using the facets. B2FIND offers support for new communities interested in publishing their data within EUDAT and EOSC-hub.
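The mapping and search steps described above can be illustrated with a minimal Python sketch. It is not the B2FIND implementation: the field map, the discipline vocabulary, the required keys and the catalogue URL are all hypothetical. Only the `/api/3/action/package_search` endpoint with its `q` and `fq` parameters comes from the CKAN action API.

```python
from urllib.parse import urlencode

# Hypothetical mapping from community-specific field names to unified keys.
# The unified keys are illustrative, not the actual B2FIND schema.
FIELD_MAP = {
    "dc:title": "title",
    "dc:creator": "author",
    "dc:subject": "discipline",
    "datacite:identifier": "source",
}

# Hypothetical controlled vocabulary used to normalise discipline labels.
DISCIPLINES = {"physics": "Physics", "astro": "Astronomy", "ling": "Linguistics"}

# Hypothetical set of keys every mapped record must carry.
REQUIRED = ("title", "source")


def map_record(raw):
    """Convert one harvested record into a unified key-value dictionary."""
    mapped = {}
    for key, value in raw.items():
        target = FIELD_MAP.get(key)
        if target is None:
            continue  # unmapped community fields are dropped in this sketch
        mapped[target] = value.strip()
    if "discipline" in mapped:
        mapped["discipline"] = DISCIPLINES.get(mapped["discipline"].lower(), "Other")
    return mapped


def validate(record):
    """Return the list of required keys that are missing or empty."""
    return [key for key in REQUIRED if not record.get(key)]


def search_url(base, text, **facets):
    """Build a CKAN package_search URL from free text plus facet filters."""
    params = {"q": text}
    if facets:
        params["fq"] = " AND ".join(f'{k}:"{v}"' for k, v in sorted(facets.items()))
    return f"{base}/api/3/action/package_search?{urlencode(params)}"


record = map_record({"dc:title": " Gamma survey ", "dc:subject": "ASTRO", "x:local": "1"})
missing = validate(record)
url = search_url("https://catalogue.example.org", "gamma", discipline="Astronomy")
```

In this sketch a record that fails validation would be sent back to the community representatives rather than uploaded, which mirrors the iterative exchange mentioned in the list above.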
In this poster, we present the Cherenkov Telescope Array (CTA) requirements for archiving data. Those requirements drive some of the development work of the eXtreme-DataCloud (XDC) team.
CTA is the next generation ground-based observatory for gamma-ray astronomy at very-high energies. With more than 100 telescopes located in the northern and southern hemispheres, CTA will be the world’s largest and most sensitive high-energy gamma-ray observatory. CTA expects to have more than 4 PB of data archived per year with an additional fixed amount of 20 PB for Monte Carlo simulation.
The XDC project aims at developing scalable technologies for federating storage resources and managing data in highly distributed computing environments.
CTA’s first requirement is Quality of Service (QoS): the archival solution should be able to manage replicas on tapes and disks. “Cold” data can be stored on cheap storage such as tapes, whereas “hot” data must be stored on low-latency storage. The second requirement is the management of the metadata that describe the data. The archival solution must process those metadata because some of the policies (such as Quality of Service or access control) rely on them. Moreover, CTA’s data follow the FAIR principles (Findability, Accessibility, Interoperability, and Reusability), and there is a proprietary period of 1 year. During this period, only the Principal Investigator (PI) and their associates can retrieve the data. The archival solution must handle such access restrictions. The archive must be able to preserve the integrity of the data for 30 years; the solution must therefore be built on top of open and widely used standards. Moreover, the archival solution must be compatible with the EOSC approach. Ideally, the archival solution will be an archive service in the future EOSC catalog. Lastly, performance is a key point among the CTA requirements: the archive must not be the bottleneck for either the ingest process or the query process. The CTA and XDC teams work closely together to make sure the XDC solution will be able to meet the CTA requirements, and to build a Proof of Concept.
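As a rough illustration of how metadata-driven policies of this kind could look, the following Python sketch derives a storage class and an access decision from a data product's metadata. Everything in it is an assumption made for the example: the `DataProduct` fields, the 90-day "hot" window and the function names are hypothetical and do not describe the CTA or XDC software. Only the 1-year proprietary period is taken from the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta

PROPRIETARY_PERIOD = timedelta(days=365)  # the 1-year period stated above


@dataclass
class DataProduct:
    """Hypothetical metadata record attached to an archived file."""
    name: str
    observed_on: date
    pi: str
    associates: frozenset
    last_access: date


def storage_class(product, today, hot_window_days=90):
    """Pick 'disk' for recently accessed data and 'tape' otherwise.

    The 90-day window is an arbitrary value chosen for illustration.
    """
    idle_days = (today - product.last_access).days
    return "disk" if idle_days <= hot_window_days else "tape"


def can_retrieve(product, user, today):
    """Allow only the PI and associates during the proprietary period."""
    if today - product.observed_on >= PROPRIETARY_PERIOD:
        return True  # proprietary period is over, data are open
    return user == product.pi or user in product.associates


run = DataProduct(
    name="run42",
    observed_on=date(2019, 1, 10),
    pi="alice",
    associates=frozenset({"bob"}),
    last_access=date(2019, 3, 1),
)
```

Both decisions read only the metadata record, which is why the archive has to be able to process metadata and not merely store it.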
CO-OPERAS IN (open access in the European research area through scholarly communication) aims to build a bridge between SSH data and the EOSC, widening the concept of “research data” to include all of the types of digital research output linked to scholarly communication that are, in SSH, part of the research process.
One of the main challenges the SSH need to address to achieve that goal is the fragmented nature of research fields, across many disciplines and subdisciplines, usually grounded in regional, national and language-specific communities. As a result, multilingualism is a clear trait of these disciplines, where English as a Lingua Franca is far from being the sole means to communicate research results. Multilingualism has to be properly addressed in order to ensure access and reuse of SSH data.
The fragmentation of SSH data across different types, formats, languages, disciplines or even institutions is a major impediment to their discovery from outside the specific and often small communities where they were produced. As a consequence, machine-readable tools and materials are rarely available and often incomplete or non-interoperable; hence accessibility and reuse are far from optimal in these fields. These issues are perceived as strategically important priorities by the research community. On the other hand, SSH disciplines have undergone major changes in their communication practices, driven by the development of digital technologies and the open science paradigm. For example, the boundaries of the “scholarly record” are now blurring, and the research monograph, which was until recently the primary form of research dissemination in the humanities, is being associated with or even challenged by technical innovations such as text and data mining, open annotations, data embedding, and collaborative writing. In the SSH, “Big Data” approaches are not sufficient to address researchers’ needs in terms of data management and exploitation. SSH data often need to be very precisely qualified, described, curated and managed: they are smart and small data, which means they have to be integrated in the EOSC landscape according to contextually appropriate, specific protocols. The concept of “continuous communication” underpinning the SSH research lifecycle holds an immense potential as an inspiring model of Open Science with direct societal impact. CO-OPERAS is based upon a solid international framework, built through strong collaboration between more than 36 partners from 13 countries, representing diverse stakeholders and service providers encompassing the entire cycle of scholarly communication in SSH.
CO-OPERAS IN aims to bring the FAIR principles into the SSH research environment, leveraging existing scholarly communication services and platforms to connect them as components of an emerging EOSC, and more broadly to the global SSH communities. The main purpose of the CO-OPERAS IN is the FAIRification of the research process and resources in the SSH, by building services, sharing standards and changing the communication culture in SSH. A second purpose is the contribution of the CO-OPERAS network to the FAIR standards from the perspective of SSH data.
ELIXIR unites Europe’s leading life science organisations in managing and safeguarding the increasing volume of data being generated by publicly funded research. In addition, ELIXIR has designed and currently operates its own state-of-the-art AAI service aligned with the AARC2 blueprint architecture.
On our poster, we would like to show how this community-centric AAI approach works when managing authentication and authorisation for different types of services used by the community. This will be demonstrated using three real use cases in which ELIXIR AAI manages access to internal services, external services which are operated outside the community, and services provided by international e-infrastructures.
The aim of the research in our lab (www.bonvinlab.org) is to integrate diverse experimental information sources with computational structural biology methods to obtain a comprehensive description of the structural and dynamic landscape of complex biomolecular machines. For this purpose, we have been developing HADDOCK over the last 15 years. HADDOCK has pioneered the use of experimental and/or bioinformatics data to guide the modelling process. A large variety of experimental data can be included, such as from Nuclear Magnetic Resonance, Mutagenesis, Mass Spectrometry, Small Angle X-ray Scattering, Cryo-Electron Microscopy or bioinformatics predictions. Data from any other experimental and/or predictive methods generating either distance-based or interface-mapping-based information can easily be incorporated into HADDOCK. HADDOCK is freely available in the form of a user-friendly web server which has been making use of the EGI federated HTC resources for over 10 years now. Its importance for the scientific community, its top performance in the CAPRI community-wide blind protein docking experiment and its continuous development and improvements are reflected in the high number of citations since its release, circa 2000, and its large worldwide user community (>12500).
HADDOCK is currently core software in the H2020 Center of Excellence for Computational Biomolecular Research (bioexcel.eu) and is operating as a WeNMR thematic service (wenmr.eu) under the H2020 European Open Science Cloud hub project (eosc-hub.eu), together with several other services accessible from https://wenmr.science.uu.nl. Those portals build upon EOSC-Hub services, making use of DIRAC4EGI to submit over 8 million jobs per year to HTC resources, even accessing GPGPU resources via containerization (a product of the INDIGO-Datacloud project), and facilitating their use by integrating the EGI CheckIN single-sign-on. Together, these form the HADDOCK/WeNMR ecosystem.
Staff and students affiliated to a university can access and download all research papers the institution subscribes to, provided that they are logged in to the institution’s network. While readers can access this subscribed research literature quite easily, text and data miners cannot access the same literature by machine effectively and at scale.
The current amendments and exceptions in the Copyright Law in some countries, such as the UK, have already given the green light to text and data mine (TDM) content we have acquired access to for non-commercial research. eduTDM aims to find a pragmatic solution to arrange how this content can be delivered to text miners as easily as possible based on the subscription they have.
eduTDM (https://edutdm.core.ac.uk) is a working group which consists of a variety of stakeholders, including the publishing industry, text and data mining scientific community, digital infrastructures representatives and policy makers. The objectives of the group are to:
- Initiate and establish a collaboration and a communication route between a range of stakeholders on this topic.
- Define, discuss and disseminate the principles and views of the stakeholders involved in this collaboration.
- Understand the position of the stakeholders on eduTDM.
- Establish a development plan for putting eduTDM into practice.
The group has so far drafted a white paper collecting and describing the requirements for eduTDM from all relevant stakeholder groups. An initial technical architecture has then been developed and is currently under consultation.
Over more than two years, the EOSCpilot project focussed on the engagement of a variety of stakeholders, spanning from e-Infrastructures to Industry to the General public. This poster will focus on engagement best practices and success stories regarding Research Producing Organisations, Academic Institutions and Research Libraries on the one hand, and learned societies, research communities, scientific and professional associations on the other. Although these are the natural stakeholders and potential users of the EOSC, the engagement process took a significant amount of effort, and thus it is worth sharing the methodology and key aspects.
Various engagement channels, communication tools and venues have been used by the project partners, ranging from private and direct contacts and newsletters to participation in the EOSC Stakeholder Fora and interviews. Because they play an essential role for the EOSCpilot, these stakeholder categories have been approached in various ways, in order to reach a certain level of community building and collaboration. A two-way collaboration has not only provided these intermediaries with insight into the EOSC governance and its services, but also informed the project, thus contributing to its structure and content, and helping it to address challenges that occurred throughout the project’s development. By engaging with Research Producing Organisations, Academic Institutions and Research Libraries, EOSCpilot managed to uncover the role of these communities as intermediaries that can involve the end-users in shaping EOSC. Their contribution was not only essential, but also structural. In addition, it provided the opportunity for the project to discuss skills and training in Open Science and the EOSC ecosystem, by directly linking with these communities. One such successful webinar took place in December 2018, co-organised by EOSCpilot and the LIBER working group on digital skills for library staff and researchers. It helped these communities gain a better understanding of how they fit in the EOSC, and of how to train research support staff and librarians to support researchers with an improved knowledge of the EOSC landscape. Furthermore, engagement activities with this category led to the shared publication of a Vision for Open Science.
In discussions with learned societies, research communities, scientific and professional associations, the EOSCpilot confirmed that EOSC represents a game changer in the way researchers and scientists work and perform their research activities. The launch of the EOSC Portal also opens new possibilities in this respect. However, the uncertain timeline might affect the uptake of the EOSC among these communities; in addition, end-user communities need some clarity on the delivery of fully working services.
“The EOSC helps my Science Demonstrators to provide more powerful tools to the research communities we are serving.” Roberto Scopigno, PI of VisualMedia, at the EOSCpilot All Hands Meeting in Pisa, 8-9 March 2018.
This poster presents the EURAMET project TC-IM 1449 "Research data management and the EOSC" and its context within the European metrology landscape. Metrology - the science of measurement - is the foundation of confidence in measured data. The TC-IM project aims to foster collaboration between metrology institutes' research activities and delivered services in a way that enables a long-term commitment for the EOSC. In a next step, this TC-IM project is intended to be transformed into an activity of a EURAMET metrology network "Metrological infrastructures for digitalization".
Over the past decade or so, infrastructure has become the indispensable backbone of science. This includes e-infrastructures, research infrastructures, data infrastructures and other facilities, big and small; local, national and international; centralized or distributed; disciplinary and cross- or multi-disciplinary; consisting of computer networks, machines, grids, clouds, data centres, knowledge and expertise providers, methodological and standards groups, data scientists and other data experts, e-science engineers, and so on. Now the time has come to fit these components together and bring the people into an overarching commons, which the European Open Science Cloud aims to provide. The FAIRsFAIR project addresses, in a 36-month time plan, the development and concrete realisation of an overall knowledge infrastructure on academic quality data management, procedures, standards, metrics and related matters, based on the FAIR principles.
FAIRsFAIR aims to supply practical solutions for the use of the FAIR data principles throughout the research data life cycle. Emphasis is on fostering FAIR data culture and the uptake of good practices in making data FAIR. FAIRsFAIR will play a key role in the development of global standards for FAIR certification of repositories and the data within them, contributing to those policies and practices that will turn the EOSC programme into a functioning infrastructure. In the end, FAIRsFAIR will provide a platform for using and implementing the FAIR principles in the day-to-day work of European research data providers and repositories. FAIRsFAIR will also deliver essential FAIR dimensions of the Rules of Participation (RoP) and regulatory compliance for participation in the EOSC. The EOSC governance structure will use these FAIR-aligned RoPs to establish whether components of the infrastructure function in a FAIR manner.
This poster will present how FAIRsFAIR will focus on and involve all scientific communities in supporting, creating, further developing and implementing a common scheme to ensure wide uptake of and compliance with FAIR data principles in the practices of data producers as well as national and European research data providers and repositories contributing to the EOSC. Furthermore, it will show how the project will closely collaborate with other relevant global projects and initiatives already under way, e.g. GO-FAIR, the Research Data Alliance (RDA), World Data System (WDS), CODATA and other European projects (e.g. the EOSC governance, the ESFRI clusters SSHOC, PANOSC, ENVRI FAIR, ESCAPE and EOSCLife), and with the EOSC coordination structure.
Social Sciences and Humanities (SSH) research is divided across a wide array of disciplines and languages. While this specialization makes it possible to investigate the extensive variety of SSH topics, it also leads to a fragmentation that prevents SSH research from reaching its full potential. Moreover, SSH publications are numerous: unlike other fields, SSH has no small set of dominant journals, but rather many small ones. This makes it very difficult for researchers to find information and publications or to make them more visible. Use and reuse of SSH research is low, interdisciplinary collaboration possibilities are often missed, and as a result, the societal impact is limited.
ISIDORE is the only tool in Europe able to crawl all the sources with a high level of granularity regarding the precision of the metadata. Developed by TGIR Huma-Num (CNRS) in France, it fosters a virtuous research circle for SSH researchers. This service collects, enriches and provides unified access to digital documents and data from the humanities and social sciences across the whole of Europe. ISIDORE harvests structured and unstructured data: bibliographical records, metadata, full text from digital publications, corpora, databases and scientific news accessible on the web. Once harvested, those data are enriched and standardized in different languages by cross-referencing them with reference vocabularies (vocabulary lists, thesauri) produced by the scientific community and research institutions. This process is done with an algorithm based on a morphological analysis of the terms (Nerd), which exploits the metadata of the resources as well as the full text, analysing these data to link them with vocabularies. Those enrichments make it possible to link the data to each other. Enriching data with international vocabularies makes it possible to search for documents in several languages: a search in English can find documents in Spanish, for example.
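The cross-language effect of vocabulary enrichment can be sketched in a few lines of Python. This is a deliberately simplified illustration, not ISIDORE's algorithm: it uses exact token matching instead of morphological analysis, and the vocabulary, concept identifiers and documents are invented for the example.

```python
import re

# Hypothetical multilingual vocabulary: concept id -> label per language.
VOCABULARY = {
    "concept:city": {"en": "city", "es": "ciudad", "fr": "ville"},
    "concept:history": {"en": "history", "es": "historia", "fr": "histoire"},
}

# Reverse index from lower-cased label to concept id.
LABEL_INDEX = {
    label.lower(): concept
    for concept, labels in VOCABULARY.items()
    for label in labels.values()
}


def enrich(text):
    """Return the set of concept ids whose labels occur in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return {LABEL_INDEX[token] for token in tokens if token in LABEL_INDEX}


def search(query, documents):
    """Return ids of documents sharing at least one concept with the query."""
    wanted = enrich(query)
    return [doc_id for doc_id, text in documents.items() if enrich(text) & wanted]


documents = {
    "doc-es": "Historia de la ciudad",
    "doc-fr": "Musique et danse",
}
hits = search("city history", documents)
```

Because both the query and the documents are reduced to language-independent concept identifiers, the English query matches the Spanish document without any translation step.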
Since its launch in 2009, more than 6400 sources have been harvested, for a total of more than 5 million documents. The re-exposure of the enriched metadata follows, in turn, the principles of the Web of Data (RDF). Thanks to this feature, and because of its commitment to FAIR practices, ISIDORE is different from a simple search engine: it allows the whole community to constantly enrich its own data. ISIDORE is at the core of the full intelligent discovery solution developed by the OPERAS Research Infrastructure and a consortium of 18 partners. It provides all necessary means to build interdisciplinary projects and to develop large-scale scientific missions. The solution will thus increase the economic and societal impacts of SSH resources by building a full multilingual and multicultural solution for the appropriation of SSH resources. The platform provides a 360° discovery experience thanks to linked exploration provided by the ISIDORE search engine and by innovative tools to support research.
The growing volume of data produced by different sources makes it possible to address diverse global challenges, including climate change, species reduction and water quality. New data sources are usually open: satellite data, Internet of Things, meteorological stations, etc. However, they are sometimes very heterogeneous in terms of format, volume or resolution, and they require advanced computing systems to be analyzed or managed. Remote sensing data provided by satellites, historical and forecasting data provided by meteorological agencies, and in-situ sensors give a lot of information for monitoring an ecosystem.
The eXtreme-DataCloud project (XDC), under the umbrella of the H2020 programme, aims at developing a scalable environment for data management and computing, addressing the problems of the growing data volume and focused on providing a complete framework for research communities through the European Open Science Cloud. The target of this project is to integrate different services and tools based on Cloud Computing to manage Big Data sources, and Use Cases from diverse disciplines are represented. One of the goals of the project is to deal with extremely large and heterogeneous datasets, including diverse data and metadata types, formats and standards that enable the automatic integration of Big Data.
The LifeWatch ERIC Use Case at XDC is integrating data from those heterogeneous sources of environmental data, such as satellites (NASA Landsat, ESA Sentinel), meteorological stations (both historical and forecasting data) and in-situ instrumentation. These sources produce data in different formats like NetCDF4, HDF5 or CSV, and they are accessible via different types of APIs. The goal of this Use Case is to automate different stages of the data lifecycle in order to simulate freshwater environments like reservoirs to forecast the hydrodynamics and water quality, facing the problem of eutrophication. The idea is to deploy a framework focused on allowing data life cycle management in the “DataCloud” in a FAIR fashion, integrating different services based on cloud computing to
The Use Case is progressing and it offers a Jupyter Hub interface to manage different stages in the data life cycle, including data ingestion from different sources, analysis and visualization. Jupyter Hub is accessible via Indigo IAM, and it automatically deploys one Docker container per user that includes a set of software components for data management. The Docker container mounts a Onedata space storing the different datasets selected by the user, which includes a metadata layer based on EML (Ecological Metadata Language) that enables findability. Since everything is integrated in terms of AAI thanks to Indigo IAM, the user can launch different analysis jobs directly to the PaaS Orchestrator and, after processing, check the results in the Jupyter notebook.
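One common way to handle the heterogeneous source formats mentioned above during ingestion is a reader registry keyed by file extension. The Python sketch below illustrates that pattern only; it is not the Use Case's code, and the function names, file names and extension table are assumptions. NetCDF4 and HDF5 would need third-party readers (for example the netCDF4 or h5py packages), so they are left as placeholders to keep the sketch limited to the standard library.

```python
import csv
import io
from pathlib import PurePath


def read_csv(stream):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(stream))


def read_binary_placeholder(stream):
    """Placeholder for formats that need a third-party reader."""
    raise NotImplementedError("binary formats are not handled in this sketch")


# Hypothetical extension table; real sources may use other suffixes.
READERS = {
    ".csv": read_csv,
    ".nc": read_binary_placeholder,
    ".h5": read_binary_placeholder,
}


def ingest(filename, stream):
    """Dispatch a source file to the reader registered for its extension."""
    suffix = PurePath(filename).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported format: {suffix}") from None
    return reader(stream)


# Usage with an in-memory file standing in for a meteorological station export.
sample = io.StringIO("date,temp_c\n2019-04-11,12.5\n")
rows = ingest("station.csv", sample)
```

Adding a new source format then means registering one more reader, leaving the analysis and visualization stages untouched.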
The proposed poster will describe the details about the Use Case and how the problems are being addressed. It will provide details about the implementations and how this integration of Jupyter Hub in a “DataCloud” can be adopted by many different Use Cases.
The Bridge of Data Project is an ongoing project at the Gdansk University of Technology (GUT) related to open data curation and sharing. It is a continuation of the Bridge of Knowledge platform, which was mainly focused on the institutional repository and scientific services for the academic community.
In this poster, we would like to present our approach to metadata collection and description and show the challenges and difficulties in designing it. Selecting standards that are appropriate for dataset collections and fulfill the FAIR principles is a weighty and difficult decision. For scientific publications, we already support Dublin Core and Highwire Press tags. Additionally, to ensure the project’s compatibility with 5-star Open Data, each object is described by schema.org with JSON-LD formatting.
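To make the schema.org description concrete, here is a minimal Python sketch that serialises a dataset description as JSON-LD. It uses only properties defined for the schema.org `Dataset` type; the function name and all example values (title, creator, licence, identifier) are hypothetical and are not taken from the Bridge of Data repository.

```python
import json


def dataset_jsonld(name, description, creator, license_url, identifier):
    """Build a schema.org Dataset description as a JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": {"@type": "Person", "name": creator},
        "license": license_url,
        "identifier": identifier,
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)


# Placeholder values for illustration only.
jsonld = dataset_jsonld(
    name="Example measurement series",
    description="Illustrative record, not a real dataset.",
    creator="Jane Doe",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    identifier="https://doi.org/10.0000/example",
)
```

A string like this is typically embedded in the dataset's landing page inside a `<script type="application/ld+json">` element so that search engines can index it.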
As metadata is essential for efficiently storing, sorting, retrieving, sharing and linking scientific data, and to ensure that resources are described at a granular level, we have decided to use the DDI (Data Documentation Initiative) standard for the first level of metadata. Due to the wide range of disciplines covered by our project (GUT in collaboration with Gdansk University and Gdansk Medical University), from humanities, social sciences, technical and engineering to medical science, we were looking for a standard that best reflects our needs and assumptions. The DDI standard is quite general, flexible, and more accessible for all disciplines and broader communities in comparison to others. In addition, DDI is more interoperable than other standards, which will result in better indexing of the provided datasets in various search engines and data hubs, raising awareness of their presence and availability. The second level of metadata will be subject-specific and more constrained, to make scientific objects more findable and reusable. As one example, we will support the INSPIRE standard for GIS data.
It has to be highlighted that our data repository will have a hierarchical structure that allows, for example, research teams to assign a specific collection of datasets to specific projects and then sub-collections to different research objects such as individual scholars, publications, software or images.
The Open Clouds for Research Environments project (OCRE) aims to accelerate cloud adoption in the European research community by bringing together cloud providers, Earth Observation (EO) organisations, companies and the research and education community.
This will be achieved through ready-to-use service agreements and €9.5 million in adoption funding facilitated through cloud vouchers.
From the demand side, through OCRE, research institutions will be able to take advantage of innovative commercial services as the process will be more streamlined. Less time is needed to discover and acquire the services they need from the market. Researchers themselves, by benefiting from vouchers and increasing their uptake of commercial services from diverse providers, will enjoy better tools to carry out their work.
On the supply side, for commercial cloud service providers, the legal, financial, and technical compliance barriers will be minimised. OCRE will make market requirements easier to understand, allowing them to tailor their offering to the research community. Earth observation SMEs, specifically, will be introduced into the "marketplaces" used by research communities to procure services. This opens up new opportunities and provides them with a better view of the market for their niche solutions.
In the end, OCRE will make selected commercial digital services an integral part of the European Open Science Cloud (EOSC), ensuring compliance with EOSC requirements and visibility in the EOSC-hub Service Catalogue and the EOSC Marketplace.
Funding agencies, journals, and academic institutions frequently require research data to be published according to the FAIR (Findable, Accessible, Interoperable and Reusable) principles. To achieve this, every step of the research process needs to be accurately documented, and data needs to be securely stored, backed up, and annotated with sufficient metadata to make it reusable and reproducible. The use of an integrated Electronic Lab Notebook (ELN) and Laboratory Information Management System (LIMS), with data management capabilities, can help researchers towards this goal. ETH Zürich Scientific IT Services (SIS) has been developing such a platform, openBIS, for over 10 years in close collaboration with ETH scientists, to whom it is provided as a service on institutional infrastructure.
openBIS is open source software that can be used by any academic and not-for-profit organization; however, the implementation of a data management platform requires dedicated IT resources and skills that some research groups and institutes do not have. In order to address this, ETH SIS has recently launched the national openRDM.swiss project. openRDM.swiss offers research data management as a service to the Swiss research community, based on the openBIS platform. The service is available either as a cloud-hosted version on the SWITCHengines infrastructure, or as a self-hosted version using local infrastructure. The cloud-hosted version, with optional JupyterHub integration for data analysis, will be available via the recently launched SWITCHhub, a national marketplace for digital solutions tailored to research. In addition, openRDM.swiss includes training activities so that researchers can successfully adopt the new service in their laboratories. Finally, the project plans to improve interoperability with other data management and publication services, in particular research data repositories including the ETH Research Collection and Zenodo.
Serverless computing, in the shape of an event-driven functions-as-a-service (FaaS) computing model, is being widely adopted for the execution of stateless functions that can be rapidly scaled in response to certain events. However, the main services offered by public cloud providers, such as AWS Lambda, Microsoft Azure Functions and Google Cloud Functions, do not fit the requirements of scientific applications, which typically involve resource-intensive data processing and longer executions than those supported by the aforementioned services.
Still, scientific applications could benefit from being triggered in response to file uploads to a certain storage platform, so that multiple parallel invocations of the function/application would speed up the simultaneous processing of data while the computing power required to cope with the increased workload is provisioned on demand.
Enter OSCAR, an open-source platform to support serverless computing for data-processing applications. OSCAR supports the FaaS computing model for file-processing applications. It can be automatically deployed on multi-clouds thanks to the EC3 (Elastic Cloud Compute Cluster) and IM (Infrastructure Manager) open-source developments.
OSCAR is provisioned on top of a Kubernetes cluster which is configured with a plugin created for the CLUES elasticity manager, in order to automatically provision additional nodes of the Kubernetes cluster to achieve two-level elasticity (elasticity for the number of containers and elasticity for the number of nodes). The following services are deployed inside the Kubernetes cluster: i) Minio, a high-performance distributed object storage server with an API compatible with Amazon S3; ii) OpenFaaS, a FaaS platform to create functions triggered via HTTP requests; iii) Event Gateway, an event router that wires functions to HTTP endpoints and iv) OSCAR UI, a web-based GUI aimed at end users to facilitate interaction with the platform.
The development of OSCAR has reached the status of prototype (TRL6) and it has also been integrated with the following EGI Services: i) EGI DataHub, in order to use OneData as the source of events. This way, scientists can upload files to their OneData space and this triggers invocations of the function in order to perform parallel processing on multiple files where the output data is automatically stored back in the same space; ii) EGI Applications on Demand, for users to self-deploy their elastic OSCAR clusters on EGI Federated Cloud through the EC3 portal; iii) EGI Cloud Compute, to provision Virtual Machines, as nodes of the Kubernetes cluster, from the EGI Federated Cloud.
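The event-driven pattern described above, in which each file upload triggers one function invocation and the output is stored back in the same space, can be mimicked in plain Python. The sketch below is a toy model and not the OSCAR, OpenFaaS or OneData API: the `Bucket` class, the `input/` and `output/` prefixes and the word-counting function are all invented for the example, and threads stand in for containers.

```python
from concurrent.futures import ThreadPoolExecutor


class Bucket:
    """In-memory stand-in for a storage space that emits upload events."""

    def __init__(self):
        self.objects = {}
        self.handlers = []

    def on_upload(self, handler):
        self.handlers.append(handler)

    def put(self, key, data):
        self.objects[key] = data
        for handler in self.handlers:
            handler(key)  # notify every subscriber of the new object


def process(data):
    """Stateless user function; here it only counts words."""
    return str(len(data.split()))


def make_trigger(bucket, pool, futures):
    """Create an upload handler that runs `process` once per input file."""

    def trigger(key):
        if not key.startswith("input/"):
            return  # ignore results written back, which avoids a trigger loop

        def job():
            out_key = "output/" + key[len("input/"):]
            bucket.put(out_key, process(bucket.objects[key]))

        futures.append(pool.submit(job))

    return trigger


bucket = Bucket()
futures = []
with ThreadPoolExecutor(max_workers=4) as pool:
    bucket.on_upload(make_trigger(bucket, pool, futures))
    bucket.put("input/a.txt", "one two three")
    bucket.put("input/b.txt", "four five")
for future in futures:
    future.result()  # re-raise any error raised inside a worker
```

In the platform described above, the elasticity manager would additionally grow or shrink the pool of worker nodes; the fixed-size thread pool here makes no attempt to model that second level of elasticity.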
The benefits of this platform have been assessed by integrating a use case on plant classification using deep learning techniques that arose in the context of the DEEP Hybrid-DataCloud European project.
OSCAR will be introduced at the EGI Conference 2019, but we also plan to showcase its main functionalities, in the form of a poster, during the EOSC-hub week. This activity is co-funded by the EGI Strategic and Innovation Fund.
The Social Sciences and Humanities Open Science Cloud (SSHOC) is one of the five cluster projects recently funded under the European Union H2020 Programme “INFRA-EOSC-2018” (together with ENVRI-FAIR, PANOSC, ESCAPE, EOSC-LIFE). It will leverage and interconnect existing and new infrastructures from the SSH ERICs and foster interdisciplinary research and collaboration.
A total of 47 organisations, experienced and skilled in Social Science & Humanities infrastructures, have gathered from all over Europe to collaborate on SSHOC, the Social Sciences and Humanities Open Science Cloud project, coordinated by the Consortium of European Social Science Data Archives (CESSDA). The project started in January 2019 and runs through April 2022. It aims to realise the transition from the current landscape of disciplinary silos and separated e-infrastructure facilities to a fully-fledged cloud-based infrastructure where data are FAIR and tools and training are readily accessible, thus contributing significantly to the vision put forward by the European Cloud Initiative and supporting the implementation of the European Open Science Cloud.
All SSH ESFRI Landmarks and Projects (CESSDA, ESS, DARIAH, CLARIN and SHARE), relevant international SSH data infrastructures and the Association of European Research Libraries (LIBER) participate in the SSHOC project ensuring an inclusive approach. The consortium has the expertise to cover the whole data cycle: from data creation and curation to optimal re-use of data and can address training and advocacy to increase actual re-use of data. The consortium is also very well placed to address SSH specific challenges such as the distributed character of its infrastructures, multi-linguality, huge internal complexity of some of the data it deals with and secured access to sensitive data.
The project will pool, harmonize and make easily usable tools and services for processing, enriching, analysing and comparing the vast heterogeneous collections of SSH data available across the boundaries of individual repositories or institutions in Europe. The project will build the common SSH Cloud, maximise reuse through Open Science and FAIR principles, interconnect existing and new infrastructures and set up a Governance for SSH-EOSC. The expected impacts of the Social Sciences and Humanities Open Science Cloud are:
-The social sciences and humanities are seamlessly integrated in the European Open Science Cloud.
-Availability of an EU-wide, easy-to-use SSH Open Marketplace, where tools and data are openly available.
-EU-wide availability of high-quality SSH data.
-EU-wide availability of trusted and secure access mechanisms for SSH data, conforming to EU legal requirements.
-State of the art advanced through dedicated SSH data pilots cluster projects.
-Data sharing is the new normal among the different SSH communities.
A task force will be set up with the EOSC-hub project to exchange and harmonize views on common themes, and existing contacts with other European and international organizations operating in and around the EOSC space will be invited to engage in the process.
With the data acquired by the European Space Agency (ESA) satellites, such as Sentinel, equipped with the latest technologies in multi-spectral sensors, we face an unprecedented amount of data with spatial and temporal resolutions never reached before. Exploring the potential of this data with state-of-the-art Machine Learning techniques like Deep Learning could change the way we think about and protect our planet's resources.
For this purpose we have integrated a super-resolution application from [Lanaras et al. 2018] into the DEEP Hybrid DataCloud Open Catalog [DEEP Catalog], to upscale low-resolution (60m and 20m) bands from the Sentinel-2 satellite to full 10m resolution. In this way we hope to allow scientists with no machine learning background to use this service in an easy and transparent manner, without dealing with the code or the underlying resources. With this example we also hope to demonstrate to potential users how easy it is to integrate existing external code with our framework.
[Lanaras et al. 2018] Lanaras, C., Bioucas-Dias, J., Galliani, S., Baltsavias, E., & Schindler, K. (2018). Super-resolution of Sentinel-2 images: Learning a globally applicable deep neural network. ISPRS Journal of Photogrammetry and Remote Sensing, 146, 305-319.
[DEEP Catalog] https://marketplace.deep-hybrid-datacloud.eu/
Research often involves the use of personal data as a basis for scientific analysis. A particular challenge in this area is to use these data resources without violating privacy. This requires secure digital infrastructures that comply with both national and European regulations. The EOSC-hub project provides services for sensitive data through two partners: Sigma2 / University of Oslo in Norway, and CSC in Finland. Although access to sensitive data must be restricted, in many sensitive data use cases it is important to support publishing metadata about the data, for example to attract research collaboration. EUDAT B2SHARE is a user-friendly, reliable and trustworthy way for researchers, scientific communities and citizen scientists to store and share small-scale research data from diverse contexts.
Secure B2SHARE is a framework based on B2SHARE that includes components for storing, describing and sharing sensitive datasets as well as controlling access to the datasets without jeopardizing privacy or security. Secure B2SHARE is mainly composed of: B2SHARE, the Secure data Submission Service (SDS2), the Authorization Service (AS), and the Secure Storage Service (S3). Access to sensitive data is restricted, while metadata describing the dataset is made publicly available *. Publicly available metadata is searchable and also harvestable, so metadata catalog services (such as B2FIND) can include public, non-sensitive metadata of sensitive datasets in their catalogs and search results. Secure B2SHARE is in the process of being offered by the two EOSC-hub sensitive data services: TSD and ePouta.
* Some of the metadata can be sensitive, and as such, access restricted.
It has been recognised that FAIR data play an essential role in the objectives of Open Science (1). In clinical studies, discoverability, an essential component of FAIR data (findability), is a major issue despite the implementation of clinical trial registries. In order to have access to all documents belonging to a clinical trial (e.g. publications, study protocol, statistical analysis plan, individual participant dataset), a central web portal federating available data sources (including registries, repositories) is necessary, making that information searchable. Such a portal is under development in the EU H2020-funded project eXtreme DataCloud (XDC) (grant agreement 777367).
Software development is based upon a detailed use case description, formal requirements and standardised metadata schema and data structures and is part of the XDC infrastructure. Metadata from given data sources are imported and mapped to a standardised metadata schema and pumped into OneData. Functionality for discoverability of studies and related data objects is provided by INFN and the GUI (web portal) is developed by OneData.
The ECRIN metadata schema for clinical studies based upon DataCite was updated (2,3). So far metadata from 7 data sources have been imported (CT.gov, PubMed, WWARN, Edinburgh DataShare, BioLINCC, ZENODO, Data Dryad), using different modalities (e.g. DB download, OAI-PMH, scraping of web pages) and covering more than 500000 records from clinical studies and associated documents. The imported data sources have been stored as JSON objects/relational DB form on the test bed server at INFN, Bologna. The metadata acquired have been mapped to the ECRIN metadata schema using standard JSON templates. Upload of the mapped metadata into OneData has started and the provision of the search functionality by INFN is currently being developed.
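The template-based mapping step can be sketched as follows. This is only an illustration under stated assumptions: the field names on both sides are hypothetical and are not taken from the ECRIN metadata schema or from the actual source formats, and `TEMPLATES`, `map_record` and `to_json` are names introduced for this sketch.

```python
import json

# Hypothetical templates: target field -> source field, one per data source.
# These names are placeholders, not the real ECRIN schema.
TEMPLATES = {
    "pubmed": {"title": "ArticleTitle", "identifier": "PMID", "year": "PubYear"},
    "zenodo": {"title": "title", "identifier": "doi", "year": "publication_year"},
}


def map_record(source: str, raw: dict) -> dict:
    """Map one raw record to the common structure using its source template.
    Fields missing in the raw record are kept as None so gaps stay visible."""
    template = TEMPLATES[source]
    mapped = {target: raw.get(src) for target, src in template.items()}
    mapped["source"] = source
    return mapped


def to_json(record: dict) -> str:
    """Serialise a mapped record as a JSON object with stable key order."""
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    raw = {"ArticleTitle": "A trial report", "PMID": "12345"}
    print(to_json(map_record("pubmed", raw)))
```

Keeping one template per source means a new data source can be added by writing a new template rather than new mapping code.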
Preparatory work for the metadata repository has been performed, and integration into the XDC infrastructure has successfully started. It is planned to have an initial demonstrator with full functionality in April 2019. The web portal will be introduced as a service into the European Open Science Cloud (EOSC).
1. Huson J et al.: Fair Data Action Plan. Interim recommendations and actions from the EC Expert group on Fair data, 2018
2. Canham S, Ohmann C: A metadata schema for data objects in clinical research, Trials 2016; 17:557
3. Canham S, Ohmann C: ECRIN Clinical Research Metadata Schema Version 2 (April 2018) (Version 2.0). Zenodo. http://doi.org/10.5281/zenodo.1312539
AGINFRA+ addresses the challenge of supporting user-driven design and prototyping of innovative e-infrastructure services and applications. It particularly tries to meet the needs of the scientific and technological communities that work on the multi-disciplinary and multi-domain problems related to agriculture and food. It uses, adapts and evolves existing open e-infrastructure resources and services, in order to demonstrate how fast prototyping and development of innovative data- and computing-intensive applications can take place.
The AGINFRA+ project is exploiting the Virtual Research Environments (VREs) paradigm for three research communities. VREs are a prominent existing cloud-based solution provided by the D4Science Initiative. VREs are web-based, community-oriented, user-friendly, open-science-compliant working environments for scientists and evaluation practitioners working together on a research task. These research communities are (a) the agro-climatic and economic modelling research community, (b) the food safety risk assessment research community and (c) the food security research community.
This poster presents the three identified action lines to sustain AGINFRA+ project results, namely:
- Consultancy service to deploy and configure custom VREs for agri-food research communities: corresponds to Virtual Research Environments (VREs) that can be developed and deployed as a service to various agri-food communities based on (a) the D4Science infrastructure and (b) the customization of generic or domain specific services offered by various partners, which have been tested and validated with agri-food researchers during the project lifetime.
- Data Journals: corresponds to the two data journals that have been set up on the ARPHA publishing platform (http://arphahub.com/), namely the Food Modelling Journal (https://fmj.pensoft.net/) and the Viticulture Data Journal (https://vdj.pensoft.net/)
- Thematic VREs: corresponds to the four VREs that have been set up during the project lifetime to support the three user communities of the project, namely (a) one VRE for the Agro-Climatic and Economic Modelling community, (b) two VREs for the Food Safety Risk Assessment community and (c) one VRE for the Food Security community.
The ENES Climate Analytics Service (ECAS) offers a server-side data processing and sharing environment, enabling users from multiple disciplines to execute Python data workflows. ECAS supports users in acquiring data from multiple sources (including ESGF, DataHub, B2DROP, B2SHARE), leveraging the computational capabilities of the Ophidia Big Data Analytics framework in a fully virtualized and scalable Python environment, and sharing results via B2DROP and B2SHARE. The ECAS team conducts frequent training events open for users of all disciplines and builds up a comprehensive knowledge base and repository of freely available Python workflows accessible via GitHub. To this end, a set of Jupyter notebooks addressing scientific needs have been developed for the end users.
The use cases ECAS already enables include a variety of climate indexes to be calculated, e.g., number of tropical nights, monthly temperature averages or precipitation trend analysis. Most importantly, through connection with the Earth System Grid Federation (ESGF), ECAS provides easy, server-side access to the high-volume CMIP5 and future CMIP6 datasets. Therefore, ECAS is of particular interest to users from the climate impact and climate services community as it enables users to conduct individual data analysis on these datasets, which are typically not available locally and also without requiring local compute resources. This underpins the long-term service vision of ECAS: To provide an easy-to-use data processing environment for everyone, independent of support by larger institutions.
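As an illustration of the kind of index mentioned above, the following plain-Python sketch counts tropical nights, assuming the common definition of a day whose minimum temperature stays above 20 °C, and computes monthly temperature averages. It does not use the ECAS or Ophidia APIs; the function names and the sample values are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def tropical_nights(daily_tmin_celsius, threshold=20.0):
    """Count days whose daily minimum temperature exceeds the threshold."""
    return sum(1 for t in daily_tmin_celsius if t > threshold)


def monthly_averages(samples):
    """Average temperature per month from (month, temperature) pairs."""
    by_month = defaultdict(list)
    for month, temperature in samples:
        by_month[month].append(temperature)
    return {month: mean(values) for month, values in by_month.items()}


if __name__ == "__main__":
    tmin = [18.0, 20.0, 20.5, 23.1]  # made-up daily minima in degrees Celsius
    print(tropical_nights(tmin))
    print(monthly_averages([(7, 24.0), (7, 26.0), (8, 25.0)]))
```

On the service itself such a computation would run server-side, next to the CMIP data, rather than on values downloaded to the user's machine.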
ECAS offers a flexible approach for data processing and sharing, adapting to the needs of different user communities by providing more than a single pathway from data input to data output. To enable basic tracking of provenance and other contextual information, ECAS supports the automated generation of B2HANDLE PIDs to make sharing of resulting FAIR data objects a default scenario. ECAS is also highly scalable, enabling deployment in Dockerized scenarios, including possible future deployment on EGI resources (in that respect, the integration with IM in the FedCloud environment is ongoing). ECAS has recently been integrated with OneData to address federated data storage scenarios with data sharing, caching and replication over distributed environments.
ECAS is set up as a long-term sustainable service, based on components already in mature form and experiencing regular updates. Notably, the latest version of ECAS supports the most recent Ophidia release (1.5.0). The integration activities of the ECAS service are part of the EOSC-hub project. ECAS is currently offered by DKRZ and CMCC and free to use for the EOSC user community. Further information is available on the EOSC market place:
VECMA (Verified Exascale Computing for Multiscale Applications, https://www.vecma.eu/) is a three-year (2018 – 2021) EU project funded by the H2020 FETHPC - Transition to Exascale Computing programme, with nine partners from the UK and other European countries. Supercomputers run powerful simulations that model some of the greatest challenges scientists are facing today, from achieving nuclear fusion to finding safer drugs and predicting future climate change. Accurate prediction requires multiscale calculations on vast amounts of data. But for researchers and application developers to have confidence in their results, computer simulations have to both correctly represent processes and accurately quantify the uncertainties associated with their calculations. VECMA is driven by these challenges and by the need of the research community and industry, in most major scientific disciplines, to run multiscale and multiphysics simulations on current multi-petascale computers and emerging exascale environments with high fidelity, such that their output is "actionable". In scientific terms, the calculations and simulations are certifiable as validated (V), verified (V), and equipped with uncertainty quantification (UQ) by tight error bars, such that they may be relied upon for making important decisions in all the domains of concern. The central deliverable will be an open source toolkit for multiscale VVUQ based on generic multiscale VV and UQ primitives, to be released in stages over the lifetime of this project, fully tested and evaluated in emerging exascale environments, actively promoted over the lifetime of this project, and made widely available in European HPC centres.
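To make the idea of uncertainty quantification concrete, here is a minimal Monte Carlo sketch in Python. It is not the VECMA toolkit and does not reflect its API: the toy model, the Gaussian input distribution and all names are assumptions chosen only to show how input uncertainty can be propagated to an output with an error bar.

```python
import random
from statistics import mean, stdev


def toy_model(a: float, x: float = 2.0) -> float:
    """Stand-in for an expensive simulation: y = a * x**2."""
    return a * x ** 2


def monte_carlo_uq(mu: float, sigma: float, n_samples: int = 10_000,
                   seed: int = 0) -> tuple[float, float]:
    """Sample the uncertain input a ~ N(mu, sigma), run the model for each
    sample, and return the mean and standard deviation of the outputs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    outputs = [toy_model(rng.gauss(mu, sigma)) for _ in range(n_samples)]
    return mean(outputs), stdev(outputs)


if __name__ == "__main__":
    y_mean, y_std = monte_carlo_uq(mu=2.0, sigma=0.1)
    print(f"y = {y_mean:.3f} +/- {y_std:.3f}")
```

For this linear toy model the output spread is simply the input spread scaled by x squared; for real multiscale simulations each sample is a full model run, which is why such campaigns need large computing resources.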
OpenRiskNet (https://openrisknet.org/) is a 3-year project funded by the EU within Horizon 2020 EINFRA-22-2016 Programme, with the main objective to develop an open e-infrastructure providing resources and services to a variety of industries requiring risk assessment (e.g. chemicals, cosmetic ingredients, drugs or nanotechnologies). OpenRiskNet is working with a network of partners, organised within an Associated Partners Programme, aiming to strengthen the working ties between the OpenRiskNet members and other organisations developing relevant solutions or tools within the scientific community.
The infrastructure is built on virtual research environments (VREs), which can be deployed to workstations as well as public and in-house cloud infrastructures. Services providing data, data analysis, modelling and simulation tools for risk assessment are integrated into the e-infrastructure and can be combined into workflows using harmonised and interoperable application programming interfaces (APIs) (https://openrisknet.org/e-infrastructure/services/). For complete risk assessment and safe-by-design studies, data and tools from different areas have to be available, thus the OpenRiskNet e-infrastructure functionality is defined by a variety of incorporated services demonstrated within a set of case studies. The case studies present real-world settings such as data curation, systems biology approaches for grouping compounds, read-across applications using chemical and biological similarity, and identification of areas of concern based only on alternative methods (non-animal testing) approaches.
OpenRiskNet resources (training materials, publications, reports, webinar recordings, etc.) are publicly available in the project's library (https://openrisknet.org/library/). Also, OpenRiskNet is listed in EOSC and eInfraCentral catalogues, and is sharing its resources in OpenAIRE (via Zenodo), TeSS (ELIXIR's Training Portal), EU NanoSafety Cluster, e-IRG Knowledge Base and other scientific communities.