The complex analysis of large amounts of data
Over the last few years, many fields of science have been confronted with enormous and rapidly growing amounts of data. Large-scale experiments are carried out as scientific workflows on powerful server infrastructures, combining data-transformation and analytic operations. Often, groups of researchers from different organizations collaborate on administering these workflows and constantly revise and change them. As a result, it becomes increasingly difficult to comprehend which changes and variations have taken place and how they might have influenced the results. Reproducibility of results is essential in this context, since sustainable scientific progress is only possible if researchers can trust previously published results, which they can then use as a foundation for advancing their field. To increase the trustworthiness and scientific value of such studies in the future, developing novel approaches that implement traceability and reproducibility is therefore of utmost importance.
How to understand and visualize scientific workflows
The key to traceability and reproducibility lies in collecting information about the processed data, the applied analysis steps, and their parameters over time. We refer to this bundle of information as a provenance graph. With existing provenance approaches, it is extremely difficult in large-scale projects to find out which changes to the input data, the workflow itself, or its parametrization actually caused variations in the output. The primary aim of the project is to develop an innovative visual forensic solution for scientific workflow provenance graphs, combining the following features:
- scalable workflow visualization methods
- change metrics for heterogeneous data
- advanced visual comparison techniques.
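To make the idea of a provenance graph more concrete, the following minimal Python sketch records two runs of a small workflow and reports which steps were added, removed, or changed between them. It is an illustration only and not the project's actual data model or change metric: the `Step` class, the `make_step` and `diff_runs` helpers, and the example step names and parameters are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One analysis step of a workflow run (hypothetical structure)."""
    name: str
    inputs: tuple        # names of the datasets or upstream steps consumed
    params: tuple = ()   # sorted (key, value) pairs describing the configuration


def make_step(name, inputs, **params):
    """Build a Step with its parameters stored in a canonical, sorted order."""
    return Step(name, tuple(inputs), tuple(sorted(params.items())))


def diff_runs(old, new):
    """Compare two runs, each given as a dict mapping step name -> Step."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for name in sorted(old.keys() & new.keys()):
        a, b = old[name], new[name]
        if a == b:
            continue
        pa, pb = dict(a.params), dict(b.params)
        changed[name] = {
            # None means the inputs of this step are unchanged
            "inputs": (a.inputs, b.inputs) if a.inputs != b.inputs else None,
            # parameter name -> (old value, new value)
            "params": {
                k: (pa.get(k), pb.get(k))
                for k in sorted(pa.keys() | pb.keys())
                if pa.get(k) != pb.get(k)
            },
        }
    return added, removed, changed


# Two hypothetical runs of the same workflow
run_a = {s.name: s for s in [
    make_step("load", ["raw.csv"]),
    make_step("normalize", ["load"], method="zscore"),
    make_step("cluster", ["normalize"], k=3),
]}
run_b = {s.name: s for s in [
    make_step("load", ["raw.csv"]),
    make_step("normalize", ["load"], method="minmax"),
    make_step("cluster", ["normalize"], k=3),
    make_step("plot", ["cluster"]),
]}

added, removed, changed = diff_runs(run_a, run_b)
print(added, removed, changed)
```

A diff like this only shows where two runs differ. Judging how strongly a change affected the output, and presenting that across many runs, is what the change metrics and visual comparison techniques listed above are meant to address.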
Traceability and reproducibility
The primary goal of the proposed research project is to implement provenance at all levels, allowing analysts to gain a deeper understanding of the workflow, of the changes applied to it, and of how those changes influence the results. The project is therefore expected to have a positive impact on a variety of fields and disciplines.
Der Standard, 05.12.2016, "Landkarte und Kompass für den Datendschungel"