Book


Reactive Load Balancing and Resilience Techniques in Simulation Applications on Supercomputers


Philipp Johannes Samfaß

 

EUR 84.00
Not available






Product information


Overview


Publisher : Dr. Hut
Book series : Informatik
Language : English
Published : 24 November 2022
Pages : 237
Binding : Hardcover
Height : 240 mm
Width : 170 mm
Weight : 600 g
ISBN : 9783843951685







Simulation applications running on supercomputers enable important scientific breakthroughs. Achieving optimal resource utilization and minimal power consumption on such systems requires effective load balancing methods. However, with growing performance variability, load balancing becomes increasingly challenging as execution times for work can no longer be predicted. Further, modern numerical algorithms are highly dynamic with respect to their computational work. Besides, the sheer scale of supercomputers makes them vulnerable to errors, which can result in process failures and silent data corruptions. Simulation applications require new techniques that render them more resilient against the increasing unpredictability and unreliability of modern hardware and software.

In this thesis, I design, implement and evaluate such techniques. They are reactive in the sense that, in contrast to many predictive state-of-the-art load balancing or fault resilience approaches, they do not merely predict the future behavior of hardware and software, but detect unexpected events (e.g., imbalances or errors) at runtime and react to them. All methods employ migration or even replication of tasks and sharing of their outcomes between processes and nodes for reactive resilience.

Their benefits are shown for two task-based parallel simulation applications for solving systems of hyperbolic partial differential equations on dynamically adaptive meshes. I demonstrate that the reactive methods can tackle the increasing variability of execution times on the hardware level and that they can balance unpredictable workload imbalances in modern numerics on the software level, improving time-to-solution by up to a factor of 3.3. Findings in the context of replication-based fault resilience indicate that reactive resilience against process failures and silent data corruptions can be achieved without the full performance price of replicated computations.

Your bookshop


Buchhandlung LeseLust
Owner: Gernod Siering

Georgenstraße 2
99817 Eisenach

03691/733822
kontakt@leselust-eisenach.de

Monday-Friday 9 am-5 pm
Saturday 10 am-2 pm


