Runtime Data Analytics in Complex Fluid Flow Simulation


Complex high-fidelity fluid flow simulations on high-performance computers are still costly. They often require selecting many computational parameters and options. Setting up these parameters can be cumbersome, and there is no guarantee that a given choice will lead to a successful simulation; even for experienced users, this is usually a trial-and-error process. The regular procedure is to track some quantities of interest at runtime by inspecting output files and, whenever possible, to halt the computation and either resume it from a checkpoint with a new set of parameters or resubmit the job to the queue. The typical simulation workflow involves the following steps: (i) preprocessing and mesh generation; (ii) time stepping, saving data to disk when required; and (iii) post-processing, typically visualizing the data generated by the simulation and extracting relevant information on the quantities of interest. For large-scale problems, this workflow requires saving a considerable amount of raw data to persistent storage. In-situ visualization techniques circumvent this storage bottleneck by removing the need to transfer data to persistent storage before visualization. In this talk, we will show how to extend in-situ visualization techniques with in-transit data analysis to provide information that helps control simulations at runtime. Often, just by observing a region of interest, an experienced analyst can infer that something is going wrong in the simulation and decide to stop it or change its parameters. To optimize resource use, such actions should preferably be taken at runtime. To do so, however, visual information must be complemented with information on the evolution of quantities of interest, residual norms, and the number of linear and nonlinear iterations, often within a specific time window rather than just their current values.
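Tracking such quantities over a time window, rather than only their latest values, can be illustrated with a small sliding-window monitor. This is a minimal sketch under stated assumptions, not the authors' tool or any libMesh API: the names `StepRecord` and `WindowMonitor`, the per-step fields, and the alert thresholds are all hypothetical. It keeps the diagnostics whose simulation time lies within the last `window` time units and flags the run when the residual norm stops decreasing across the window or the mean number of nonlinear iterations grows too large.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class StepRecord:
    """Per-time-step solver diagnostics (hypothetical field names)."""
    time: float
    residual_norm: float
    linear_iters: int
    nonlinear_iters: int


class WindowMonitor:
    """Keeps the records whose simulation time lies within the last `window` time units."""

    def __init__(self, window: float):
        self.window = window
        self.records = deque()  # StepRecord objects, oldest first

    def push(self, rec: StepRecord) -> None:
        self.records.append(rec)
        # Evict records that have fallen out of the sliding time window.
        # The loop always stops at `rec` itself, so the deque never empties here.
        while rec.time - self.records[0].time > self.window:
            self.records.popleft()

    def residual_reduction(self) -> float:
        """Ratio of newest to oldest residual norm in the window (< 1 means decreasing)."""
        first = self.records[0].residual_norm
        last = self.records[-1].residual_norm
        return last / first if first > 0.0 else 0.0

    def mean_nonlinear_iters(self) -> float:
        return sum(r.nonlinear_iters for r in self.records) / len(self.records)

    def should_alert(self, stall_ratio: float = 1.0, max_nl_iters: float = 10.0) -> bool:
        """Hypothetical steering trigger: residual stalled or nonlinear solver struggling."""
        if len(self.records) < 2:
            return False  # not enough history in the window yet
        return (self.residual_reduction() >= stall_ratio
                or self.mean_nonlinear_iters() > max_nl_iters)


if __name__ == "__main__":
    monitor = WindowMonitor(window=1.0)
    for t, r in [(0.0, 1e-3), (0.5, 5e-4), (1.0, 2e-4), (1.5, 1e-4)]:
        monitor.push(StepRecord(t, r, linear_iters=40, nonlinear_iters=3))
    print(len(monitor.records), monitor.residual_reduction(), monitor.should_alert())
```

Using simulation time rather than a fixed number of steps for the window matters when the time step changes during the run, since a step-count window would then cover varying physical intervals.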
To obtain this complementary information, it is often necessary to write specific code that identifies the files related to the time window of interest, opens and parses them to extract specific values, and tracks their evolution. Of particular interest here are turbidity currents, underflows responsible for sediment deposits that may host hydrocarbon reservoirs. The mathematical model for turbidity currents involves solving coupled high-Reynolds-number incompressible fluid flow and sediment transport equations. For the simulations we use the libMesh library, which provides a platform for parallel, adaptive, multiphysics finite element computations. libMesh supports adaptive mesh refinement and coarsening (AMR/C) on general unstructured meshes with a variety of error estimators. We discuss the integration of libMesh with in-situ visualization and in-transit data analysis tools. We present a parallel performance analysis of turbidity current simulations showing that the overhead of both in-situ visualization and in-transit data analysis is negligible. Our tools enable monitoring the appearance of sediment deposits at runtime and steering the simulation based on solver convergence, other data analyses, and visual information on the sediment deposits, thus enhancing the analytical power of turbidity current simulations. The data analysis tool registers the provenance of the simulation data for reproducibility, including the runtime changes to simulation parameters.
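Registering runtime parameter changes as provenance can be sketched as an append-only event log. This is an illustrative sketch only, not the data analysis tool described here: the class `ProvenanceLog`, its methods, the JSON Lines output, and the parameter names `dt` and `nonlinear_tol` are all hypothetical. The log stores the initial parameter set plus one event per change, so the parameters in effect at any time step can be reconstructed later by replaying the events.

```python
import json
import time
from typing import Any


class ProvenanceLog:
    """Append-only record of the initial parameters and every runtime change (hypothetical)."""

    def __init__(self, initial_params: dict[str, Any]):
        self.params = dict(initial_params)  # parameters currently in effect
        self.events: list[dict[str, Any]] = [
            {"event": "start", "wall_time": time.time(), "params": dict(initial_params)}
        ]

    def change(self, step: int, name: str, new_value: Any, reason: str) -> None:
        """Apply a steering action and record old value, new value, and why it was made."""
        old = self.params.get(name)
        self.params[name] = new_value
        self.events.append({
            "event": "param_change", "wall_time": time.time(), "step": step,
            "name": name, "old": old, "new": new_value, "reason": reason,
        })

    def params_at(self, step: int) -> dict[str, Any]:
        """Replay events to recover the parameters in effect at a given time step.

        Assumes a change recorded at step k applies from step k onward.
        """
        params = dict(self.events[0]["params"])
        for ev in self.events[1:]:
            if ev["step"] <= step:
                params[ev["name"]] = ev["new"]
        return params

    def to_jsonl(self) -> str:
        """Serialize one JSON object per line, e.g. for storage next to the simulation output."""
        return "\n".join(json.dumps(ev) for ev in self.events)


if __name__ == "__main__":
    log = ProvenanceLog({"dt": 0.01, "nonlinear_tol": 1e-6})
    log.change(step=120, name="dt", new_value=0.005, reason="nonlinear solver stagnated")
    print(log.params_at(100)["dt"], log.params_at(120)["dt"])
```

Keeping the log append-only, instead of overwriting a parameter file, is what allows a run that was steered mid-simulation to be reproduced step by step.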