A case study was set up to identify existing services and solutions from EGI and EUDAT that could address the data pre-processing, post-processing and publishing needs of these two ESFRI projects. The outcome of the pilot is expected to be directly applicable to EISCAT_3D, and indirectly to other ESFRIs of ENVRI. In cooperation with EISCAT_3D representatives in ENVRI, EGI.eu will try to find the most suitable solutions for pre-processing of primary data and for post-processing toward publishing.
The design of the next-generation incoherent scatter radar system, EISCAT_3D, opens up opportunities for physicists to explore many new research fields. On the other hand, it also introduces significant challenges in handling large-scale experimental data, which will be generated at great speed and in great volume. This challenge is typically referred to as a big data problem and requires solutions beyond the capabilities of conventional database technologies. To identify existing and new services that can tackle the EISCAT_3D big data challenge, a collaboration was formed in February 2013 among the EISCAT_3D, EGI and EUDAT infrastructures under the ENVRI project.
Phase 1 Proof of concept architecture draft
A 'Towards a Big Data Strategy for EISCAT-3D' document is emerging from the collaboration and it outlines a project that would take the first steps towards defining the EISCAT_3D big data strategy.
- 'Towards a Big Data Strategy for EISCAT-3D' presentation (pdf) at the 16th EISCAT International Symposium, 2013
Phase 2 Requirements gathering
The following questionnaires were used to collect requirements from EISCAT data managers and scientists:
For scientists: https://www.surveymonkey.com/s/ENVRI-EISCAT_Scientists
For data managers: https://www.surveymonkey.com/s/ENVRI-EISCAT_Data_managers
Phase 3 Prototype system based on technologies and resources of EGI and EUDAT
Development and pilot deployment of OSGC - OpenSource Geospatial Catalogue
Presentation: EGI OpenSearch Catalogue Appliances for EISCAT 3D
OSGC is an open source implementation of an OpenSearch GeoSpatial Catalogue compliant with the OGC 10-32r3 specification, developed by EGI.eu (http://www.egi.eu/) under the ENVRI (http://envri.eu/) project.
OSGC provides a catalogue engine built on top of a PostgreSQL+PostGIS database, which exposes a customizable OpenSearch interface. Most of the application configuration can be set from the Admin web interface. Data Administrators have a separate Dropbox interface, which eases the management of the catalogue and the data storage, and a Data Gateway interface, which controls access to data and produces data access statistics.
- OpenSearch catalogue engine with customizable output formats, products metadata, query schema, input formats (for ingestion).
- Web admin interface (to offer the catalog as a Platform-As-A-Service on the Cloud).
- Dropbox, to automatically extract metadata, register it into the catalogue and optionally push the data file into Cloud or other connected storage.
- Data Gateway interface, to control access to data, produce data access statistics and bridge non-HTTP protocols.
- OpenSearch web client interface, which can be run remotely or as a standalone application (for integration into Cloud Virtual Laboratories PaaS services) and supports cumulative download (with shopping-cart functionality).
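To illustrate how a client might query an OpenSearch catalogue of this kind, the following minimal Python sketch fills in an OpenSearch URL template with search terms, a bounding box and a time range. The endpoint URL, the query parameter names (q, bbox, start, end, count) and the helper fill_template are hypothetical and are not taken from OSGC. In practice the template would be read from the catalogue's OpenSearch description document. Only the placeholder names ({searchTerms}, {count}, {geo:box}, {time:start}, {time:end}) follow the OpenSearch conventions and its Geo and Time extensions.

```python
import re
from urllib.parse import quote

# Hypothetical URL template, in the style of the <Url template="..."> element
# of an OpenSearch description document. A trailing "?" marks a parameter
# as optional.
TEMPLATE = (
    "https://catalogue.example.org/opensearch?"
    "q={searchTerms?}&bbox={geo:box?}&start={time:start?}&end={time:end?}"
    "&count={count?}"
)


def fill_template(template, values):
    """Replace {name} and {name?} placeholders with URL-encoded values.

    Optional placeholders without a value become empty strings;
    a missing required placeholder raises KeyError.
    """
    def substitute(match):
        name = match.group(1)
        optional = match.group(2) == "?"
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([^{}?]+)(\??)\}", substitute, template)


if __name__ == "__main__":
    # geo:box is given as west,south,east,north in decimal degrees.
    # The values below are purely illustrative.
    url = fill_template(TEMPLATE, {
        "searchTerms": "EISCAT",
        "geo:box": "18.0,69.0,21.0,70.0",
        "time:start": "2013-02-01T00:00:00Z",
        "time:end": "2013-02-28T23:59:59Z",
    })
    print(url)
```

The resulting URL could then be fetched with any HTTP client. The response format (for example Atom) depends on how the catalogue's output formats are configured.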