SReplay: Deterministic sub-group replay for one-sided communication

Abstract

© 2016 ACM.Replay of parallel execution is required by HPC debuggers and resilience mechanisms. Up-to-date, there is no existing deterministic replay solution for one-sided communication. The essential problem is that the readers of updated data do not have any information on which remote threads produced the updates, the conventional happens-before based ordering tracking techniques are challenging to work at scale. This paper presents SReplay, the first software tool for sub-group deterministic record and replay for one-sided communication. SReplay allows the user to specify and record the execution of a set of threads of interest, and then deterministically replays the execution of the sub-group on a local machine without starting the remaining threads. SReplay ensures sub-group determinism using a hybrid dataand order-replay technique. SReplay maintains scalability by a combination of local logging and approximative event order tracking within sub-group. Our evaluation on deterministic and nondeterministic UPC programs shows that SReplay introduces an overhead ranging from 1:3× to 29×, when running on 1,024 cores and tracking up to 16 threads.

Other Versions

No versions found

Links

PhilArchive



    Upload a copy of this work     Papers currently archived: 101,459

External links

Setup an account with your affiliations in order to access resources via your University's proxy server

Through your library

  • Only published works are available at libraries.

Similar books and articles

Quantum mechanics and the question of determinism in science.C. O. Akpan - 2005 - Sophia: An African Journal of Philosophy 8 (1):72-79.

Analytics

Added to PP
2017-05-17

Downloads
7 (#1,644,695)

6 months
5 (#1,071,419)

Historical graph of downloads
How can I increase my downloads?

Author Profiles

Citations of this work

No citations found.

Add more citations

References found in this work

No references found.

Add more references