In Cloud computing, both the public and private sectors already offer Cloud resources as IaaS (Infrastructure as a Service). However, there are numerous areas of interest to scientific communities where Cloud computing uptake is still lacking, especially at the PaaS (Platform as a Service) and SaaS (Software as a Service) levels. In this context, INDIGO-DataCloud (INtegrating Distributed data Infrastructures for Global ExplOitation), a project funded under the Horizon 2020 framework programme of the European Union, aims to develop a data and computing platform targeted at scientific communities, deployable on multiple hardware platforms, and provisioned over hybrid e-Infrastructures.
This platform features contributions from leading European distributed resource providers, developers, and users from various Virtual Research Communities (VRCs). It is based on open source solutions addressing scientific challenges in Grid, Cloud and HPC/local infrastructures and, in the case of Cloud platforms, providing PaaS and SaaS solutions. SaaS solutions are exposed to end users through Science Gateways, mobile appliances, and APIs to be integrated in desktop applications. INDIGO adopts the Future Gateway (FG) framework as both the presentation layer and the API server for end-user applications. The FG is a standards-based solution that, by exploiting well-consolidated standards such as OCCI, SAGA, SAML and TOSCA, can target any distributed computing infrastructure while also providing a solution for mobile appliances. If the present contribution is accepted, we will demonstrate live a few use cases selected by the project, from the final users' perspective. They are briefly explained in the following.
1) Climate change: the case study on climate model intercomparison data analysis relates to the climate change domain and community (European Network for Earth System modelling - ENES). It is directly connected to the Coupled Model Intercomparison Project (CMIP), one of the largest and most internationally relevant climate experiments, as well as to the Earth System Grid Federation (ESGF) infrastructure in terms of existing ecosystem and services. In the last three years, ESGF has been serving the Coupled Model Intercomparison Project Phase 5 (CMIP5) experiment, providing access to 2.5 PB of data for the IPCC AR5. The test case focuses on a subset of this global data archive and proposes a common approach to perform three different classes of scientific data analysis: (i) trend analysis, (ii) anomalies analysis, and (iii) climate change signal analysis. The first one will be specifically addressed by the demo. The test case demonstrates the INDIGO capabilities in terms of a software framework deployed on heterogeneous infrastructures (e.g., HPC clusters and cloud environments), as well as workflow support to run distributed, parallel data analyses. While general-purpose WfMSs (in this case the Kepler WfMS) are exploited in this use case to orchestrate multi-site tasks, the Ophidia framework is adopted at the single-site level to run scientific data analytics workflows consisting of tens/hundreds of data processing, analysis, and visualization operators. The demonstration will highlight: (i) the interoperability with the already existing community-based software ecosystem and infrastructure (IS-ENES/ESGF); (ii) the adoption of workflow management system solutions (both coarse and fine grained) for large-scale climate data analysis (e.g. Ophidia, Kepler); (iii) the exploitation of Cloud technologies/solutions from the INDIGO PaaS, offering easy-to-deploy, flexible, isolated and dynamic big data analysis solutions; and (iv) the provisioning of interfaces, toolkits and libraries to develop high-level interfaces/applications integrated in a Science Gateway. With regard to the last point, the demo will show how the results of the experiments are made easily available to the end user for inspection, download, and visualization. To this end, the user interface will provide specific/advanced support for data analytics and visualization.
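As a rough illustration of what the trend analysis class computes, the sketch below fits an ordinary least-squares line to a yearly time series and reports its slope. It is only a minimal, single-point example under stated assumptions: the function name linear_trend and the synthetic temperature values are hypothetical, and the code does not reproduce the Ophidia operators or the Kepler workflow used in the demo.

```python
from statistics import mean


def linear_trend(years, values):
    """Return (slope, intercept) of the least-squares line values ~ years.

    Hypothetical helper for illustration; not part of Ophidia or Kepler.
    """
    if len(years) != len(values) or len(years) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    year_mean = mean(years)
    value_mean = mean(values)
    # Sum of squared deviations of the time axis.
    sxx = sum((y - year_mean) ** 2 for y in years)
    if sxx == 0:
        raise ValueError("all time coordinates are identical")
    # Sum of cross deviations between time and the variable.
    sxy = sum((y - year_mean) * (v - value_mean) for y, v in zip(years, values))
    slope = sxy / sxx
    intercept = value_mean - slope * year_mean
    return slope, intercept


if __name__ == "__main__":
    # Synthetic, made-up series (not CMIP5 data): 0.02 degC of warming per year.
    years = list(range(2000, 2010))
    temps = [14.0 + 0.02 * (y - 2000) for y in years]
    slope, _ = linear_trend(years, temps)
    print(f"trend: {slope * 10:.2f} degC per decade")
```

In a real analysis the same fit would be applied independently to every grid cell of a model output, which is the kind of data-parallel operation that the single-site analytics framework is meant to run.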
2) Molecular Dynamics of proteins: the three-dimensional (3D) structure of biological macromolecules consists of a set of (x,y,z) coordinates for each atom of the molecule under investigation. The INSTRUCT ESFRI provides access to high-specification, specialist equipment for the experimental determination of such coordinates. However, the 3D structure of any molecule is not completely rigid, but fluctuates over time due to the kinetic energy available at room temperature. Such flexibility is often directly relevant to the physiological function performed by proteins and nucleic acids in the cell. Although there are experiments that can provide information on the extent and time scales of macromolecular motions, computer simulation (Molecular Dynamics, MD) is the only technique that provides a full atomistic view of motions throughout all regions of the macromolecule. The present demo will highlight the exploitation of Cloud technologies/solutions from the INDIGO PaaS to perform MD simulations using protocolized methods in VMs, and the use of web interfaces to set up and analyze such simulations.
About the Demo Authors
Giovanni Aloisio is Full Professor of Information Processing Systems at the Dept. of Innovation Engineering of the University of Salento, Lecce, Italy, where he leads the HPC laboratory. Former director of the “Advanced Scientific Computing” (ASC) Division at the Euro-Mediterranean Center on Climate Change (CMCC), he is now a member of the CMCC Strategic Council and Director of the CMCC Supercomputing Center. His expertise concerns high performance computing, grid & cloud computing and distributed data management. He has been involved in several EU grid projects such as GridLab, EGEE and IS-ENES1. He has been responsible for ENES (European Network for Earth System Modelling) in the EU-FP7 EESI (European Exascale Software Initiative) project, chairing the Working Group on Weather, Climate and solid Earth Sciences (WCES). He has also contributed to the IESP (International Exascale Software Project) exascale roadmap. He has been the chair of the European panel of experts on WCES that contributed to the PRACE strategic document "The Scientific Case for HPC in Europe 2015-2020". Presently, he is coordinating CMCC activities in several EU FP7 projects such as EUBrazilCC, IS-ENES2, CLIP-C and the G8 ExArch. For CMCC, he is also the coordinator of the OFIDIA (Operational FIre Danger preventIon plAtform) project, in the context of the European Territorial Cooperation Program Greece-Italy 2007-2013. He is responsible for the University of Salento (as PRACE Third Party) in the EU-FP7 EESI2 project, chairing the WCES Working Group. He is a member of the ENES HPC Task Force. He is the author of more than 100 papers in refereed journals on high performance computing, grid & cloud computing and distributed data management.
Roberto Barbera was born in Catania (Italy) in October 1963. He graduated in Physics "cum laude" at the University of Catania in 1986 and has held a Ph.D. in Physics from the same University since 1990. Since 2005 he has been Associate Professor of Experimental Physics at the Department of Physics and Astronomy of the University of Catania, and at the beginning of 2014 he obtained the National Scientific Qualification to act as Full Professor of Experimental Physics of Fundamental Interactions. Since his graduation his main research activity has been in the domains of Experimental Nuclear and Particle Physics. He has been involved in many experiments in France, Russia, Sweden and the United States to study nuclear matter properties in heavy ion collisions at intermediate energies. He is the author of several book chapters, more than 250 scientific papers published in international journals, and more than 400 proceedings of international conferences (see his Google Scholar profile at: http://scholar.google.com/citations?hl=en&user=W5helEUAAAAJ). He is an editor of the International Journal of Distributed Systems and Technologies and a referee for the Journal of Grid Computing, Future Generation Computer Systems, and BMC Medical Informatics. He is also a consultant to the European Commission and a reviewer for the European Science Foundation as well as for Ministries of Science and Technology of various countries in the world.
Since 1997 he has been involved in CERN experiments and he is one of the physicists involved in the ALICE Experiment at LHC. Within ALICE he has been the coordinator of the off-line software of the Inner Tracking System and a member of the ALICE Off-line Board. Since late 1999 he has been interested in Distributed Scientific Computing. He has been a member of the Technical Committee of TERENA (the Trans-European Research and Education Networking Association, www.terena.org), of the Executive Committee of the Italian Grid Infrastructure (the Italian National Grid Initiative, www.italiangrid.it) and of the Scientific & Technical Committee of Consortium GARR (the Italian National Research and Education Network, www.garr.it). At European level, he has been, and still is, involved with managerial duties in many FP6, FP7 and H2020 EU-funded projects (agINFRA, CHAIN, CHAIN-REDS, DCH-RP, DECIDE, EarthServer, EELA, EELA-2, EGEE, EGEE-II, EGEE-III, EGI-Engage, EGI-InSpire, eI4Africa, EPIKH, EUChinaGRID, EUMEDGRID, EUMEDGRID-Support, GISELA, ICEAGE, INDICATE, INDIGO-DataCloud, etc.) in Europe, Asia, Africa and Latin America, and he is currently the Technical Coordinator of the Sci-GaIA project (www.sci-gaia.eu). Since 2004 he has coordinated the international GILDA t-Infrastructure (http://gilda.ct.infn.it), which he created for training and dissemination and which has been used in more than 500 training events in more than 60 countries worldwide. Since 2010 he has overseen the design and development of the Catania Science Gateway Framework (www.catania-science-gateways.it). He is also the manager of the GrIDP Identity Federation (http://gridp.garr.it) and he is strongly involved in the establishment of Certificate Authorities, Identity Federations and Open Access Digital Repositories for Open Science in various regions of the world. More information is available on his ORCID profile at http://orcid.org/0000-0001-5971-6415.
Riccardo Bruno was born in Catania on 3 December 1969 and graduated in Computer Science in May 1999. He started working as a researcher for the University of Catania in April 1999. In April 2000 he joined a billing system software provider for mobile companies as Software Engineer and System Integrator, operating in the EMEA countries. In January 2006 he joined INFN, collaborating with several European-funded projects that help scientists and researchers exploit distributed computing infrastructures such as Grid, Cloud and HPC. In particular, he promoted the adoption of the European Grid infrastructure in the Mediterranean and Latin American countries. He has also supported the porting of several scientific applications to Grid/Cloud and HPC, as well as the installation and maintenance of their services. He is currently collaborating with the INDIGO-DataCloud European project, which aims to establish a PaaS platform dedicated to science.
Marco Fargetta was born in Catania (Italy) in December 1976. He graduated in Computer Engineering at the University of Catania in 2002. In 2007 he completed a Ph.D. in Computer Engineering from the same University (a part of the Ph.D. study was done at the University of Manchester, UK) with a thesis titled "A Model for Automatically Supporting Advanced Reservation, Allocation and Pricing in a Grid Environment". His research activity started with the Ph.D. in 2003 and initially was focused on advanced scheduling of Grid jobs. After the Ph.D. his research has focused on authentication and authorisation aspects and user interfaces of distributed computing environments, including grid and clouds. Since 2007 he has been involved in different projects at national and international levels. These include:
- TriGrid VL project (http://www.trigrid.it), funded by the Sicilian Regional Government, as member of the University of Catania, Department of Mathematics;
- ICEAGE project with the Department of Physics and Astronomy, funded by the European Union;
- PI2S2 and DECIDE projects, funded by the regional government, with Consorzio COMETA (an organization owned by all universities and research centres in Sicily);
- EUAsiaGrid, RECAS, PRISMA and INDIGO-DataCloud (currently involved) projects, funded by the Italian PONREC activity with INFN.
The activity performed in the context of the Ph.D. and the following projects is documented by more than 20 publications in international conferences, journals and books.
Additionally he is the technical manager of the GrIDP Identity Federation (http://gridp.garr.it), a production ready federation managed by INFN and GARR which aims at federating services.
Currently he works at INFN on a new framework aimed at simplifying access to and use of distributed infrastructures.
Sandro Fiore, Ph.D., is the Director of the Advanced Scientific Computing (ASC) Division of the Euro-Mediterranean Centre on Climate Change. His research activities focus on parallel, distributed, grid and cloud computing, in particular on distributed data management, data analytics/mining and high performance database management. He is a Visiting Scientist at Lawrence Livermore National Laboratory (LLNL), working at PCMDI in the context of the Earth System Grid Federation (ESGF). Since 2004, he has been involved in several national and international projects such as EGEE (all three cycles), EGI-InSPIRE, IS-ENES1 and IS-ENES2, EUBRAZILCC, ExArch, ORIENTGATE, TESSA, OFIDIA, CLIP-C and INDIGO-DATACLOUD, working on data management topics. He is the Principal Investigator of the Ophidia project, a research project on high performance data analytics and mining for eScience (http://ophidia.cmcc.it). He is author and co-author of more than 50 papers in refereed books/journals/proceedings on distributed and grid computing and holds a patent on data management topics. He is the editor of the book “Grid and Cloud Database Management” (Springer, 2011). He is an ACM member.
Andrea Giachetti (born in 1971) is an expert in the development of scientific software tools and web interfaces (https://it.linkedin.com/pub/andrea-giachetti/31/41a/527). He received an International Ph.D. in Structural Biology awarded jointly by the Universities of Florence, Frankfurt and Utrecht. Andrea Giachetti has significant expertise in the management of authentication and secure access over infrastructures for distributed computing. Since 2009, he has been responsible for the development of most of the services provided by the CIRMMP unit in the e-NMR and WeNMR e-infrastructure projects. He currently manages the grid site at CIRMMP. He has coauthored about 15 publications in international journals.
Emidio Giorgio was born in Enna, Italy, in 1978. He graduated in Computer Science in 2003 with a research work on automated recognition of patterns. Soon afterwards Emidio started his work at INFN, being involved in several distributed computing projects, mainly funded by the European Commission in the FP6/7/H2020 contexts. Among them:
- EGEE I/II/III: The EGEE project series was funded in FP6 to establish a common distributed e-Infrastructure, based on Grids and shared among the main European research areas. He was involved in training and education tasks.
- ICEAGE: This project was funded in FP6 to extend and advance Grid education within universities, academic courses and international schools. Emidio was leader for Training Infrastructure provision.
- EMI: The European Middleware Initiative was funded within FP7 to harmonize and evolve the various middleware developed in past grid projects. Emidio was Work Package leader for NA2, Dissemination, Training and Collaboration.
- IGI: IGI is a JRU among Italian research institutes to establish an Italian distributed computing infrastructure. Emidio led training and education activities.
- PRISMA: PRISMA is a project funded by the Italian government to develop a vertical cloud platform (IaaS+PaaS) able to support local public administration offices.
- INDIGO: INDIGO is an EC project aiming to develop a sustainable, PaaS-based cloud platform for eScience.
This activity is documented by more than 20 publications in journals and international conferences. He currently manages three Grid sites and one cloud site, all of which are part of the EGI production infrastructure. His other current interests include the deployment of Cloud infrastructure, infrastructure monitoring and software defined networking.
Michal Konrad Owsiak graduated in Computer Science with research in Artificial Intelligence and Parallel Computations. He is a computer scientist with 15 years of experience in both commercial and scientific areas. He is a skilled Java and C/C++ developer with knowledge of Objective-C. Apart from experience gained in real-life projects, he has also acquired Java certifications in the following areas: Sun Certified Java Programmer, Sun Certified Web Component Developer, Sun Certified Business Component Developer. During numerous projects he has gained knowledge of various scripting platforms: Python, Groovy, bash, tcsh. Currently, he works for PSNC as System Architect, focused on a DevOps-based approach to software development. During his work at PSNC he has taken part in numerous EU-funded projects: int.eu.grid, EUFORIA, PL-GRID, EFDA ITM-TF ISIP, EFDA ITM-TF CPT, EUROfusion WPISA-CPT. While participating in GRID and HPC related projects he gained skills in GRID and HPC related development. He is familiar with MPI and is experienced and fluent in low-level coding (C/JNI). He is a fan of taking debugging tasks to extremes, mostly by applying numerous debugging techniques and code tracing patterns. Participating in a number of projects where physics codes were involved let him gain practical knowledge of Fortran (GFortran/G95/Intel Fortran Compiler) and Python. Whenever XML is involved in the process, he can easily jump in, as he is familiar with XML principles and common XML tools (xsltproc, xmllint, Oxygen, to mention just a few). He is an official contributor to the Kepler workflow system and co-author of the Serpens suite. He has conducted ITM/EUROfusion related trainings since 2009. He speaks fluent English, has a "can do" attitude, and is an enthusiast of agile development.
Marcin Plociennik works at Poznan Supercomputing and Networking Center, where he is head of the IoT Department. He has many years of experience in software engineering and user support, gained through working in a number of projects focused mainly on research concerning distributed computing, scientific workflows and the Internet of Things (EU: CrossGrid, BalticGrid I/II, EGEE, Euforia, int.eu.grid, DORII (deputy project coordinator), EFDA ITM ISIP, EUROFusion WPISA CPT, EGI_Inspire, SymbIoTe, EoCOE; national projects: Future ICT, PLGrid/PLGrid+/PLGrid NG). He leads the teams working on scientific workflows, mobile apps and IoT, and he has also led OGF research and standardization bodies (RISGE-WG, ARI-WG). He is an official Kepler contributor and co-author of the Serpens suite. He is Deputy Core Programming Team Coordinator of the WPISA EUROFusion consortium. He is the INDIGO DataCloud WP6 (Science Gateways, Workflows and Toolkits) work package leader.
Antonio Rosato (born in 1971) graduated in Chemistry in 1995 (110/110 cum laude), and received his PhD in Chemistry in 1998 at the University of Florence. He was Ricercatore at the University of Florence from 1999 to 2002 and has been Associate Professor at the University of Florence, Faculty of Sciences, since 2002. He received the "Premio Nazionale Federchimica" in 1996, the Prize "Sapio NMR Junior" in 2001 and the Prizes "Gastone De Santis" and "Raffaello Nasini", both from the Italian Chemical Society, in 2005 and 2009 respectively. Metalloproteins are the main focus of his research activities. Antonio Rosato has developed innovative methodologies for the study via NMR of the solution structure of paramagnetic metalloproteins, and for the investigation of the determinants of the thermodynamic stability of the fold of these systems through the combination of various biophysical methods. He is actively working on the implementation of bioinformatic research on metalloproteins. Together with his coworkers, he has developed innovative protocols and tools for their identification in genome sequences, for the prediction of their 3D structure and of their interactions of biological relevance, and for their comparative structural analysis. Since 2009, he has chaired the Comparative Assessment of Structure Determination by NMR (CASD-NMR) international initiative to demonstrate the reliability and foster the adoption of automated methods for NMR structure determination of proteins. CASD-NMR involved all prominent labs world-wide developing software tools in this field. He has been involved in the development of grid-based electronic infrastructures for Structural Biology, starting with the e-NMR project in 2009. At present he is involved in the MoBrain competence center of EGI and in the West-Life e-Infrastructure for Structural Biology, besides other activities. Antonio Rosato is the author of about 100 articles in scientific journals of international renown and book chapters. He has contributed to the determination of the structure in solution of around thirty metalloproteins.
Dean N. Williams has been a leading researcher in computational science at Lawrence Livermore National Laboratory (LLNL) since 1987. He has unique experience in distributed computing and networking technologies and their practical application in the areas of climate change, biology, and other large-scale scientific data projects. For three decades, Mr. Williams has been the Chair or Principal Investigator (PI) for several large DOE projects related to “Big Data” initiatives, including the Earth System Grid Federation, the Ultrascale Visualization Climate Data Analysis Tools, and the Climate 100 Advanced Networking Initiative. These combined software investigations are essential to the national and international climate communities and share in the 2007 Nobel Prize-winning Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (AR4). Mr. Williams serves on many national and international advisory boards and is the Program Lead for LLNL’s Analytics and Informatics Management Systems (AIMS).