In many parallel scientific simulations, work is assigned
to processors by decomposing a spatial domain consisting
of mesh cells, particles, or other elements. When the work per
element changes, simulations can use dynamic load balancing
algorithms to redistribute work evenly among processors. Typical SPMD
simulations wait while a load balancing algorithm runs on all
processors, but this algorithm can itself become a bottleneck.
We propose a novel approach based on two key observations:
(1) application state typically changes slowly in SPMD physics
simulations, so work assignments computed in the past still
produce good load balance in the future; (2) we can decouple
the load balancing algorithm from the application, so that it runs
concurrently with the simulation and more efficiently on a smaller number of
processors. We then apply the work assignment “late”, once it
has been computed. We call this approach lazy load balancing.
In this paper, we show that the work distribution changes slowly
for a Barnes-Hut benchmark and for ParaDiS,
a dislocation dynamics simulation. We implement an MPMD
framework that exploits this property, saving resources by running the
load balancing algorithm at higher parallel efficiency on a smaller
number of processors. Using our framework, we explore the
trade-offs of lazy load balancing and demonstrate performance
improvements of up to 46%.
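
The idea of applying a work assignment "late" can be illustrated with a toy model. The sketch below is not the framework described in this paper: it is a serial Python simulation under assumed conditions, in which per-element work drifts by a small random factor each step, a balancer takes a snapshot of the weights every `delay` steps, and the resulting assignment only takes effect `delay` steps later. The greedy heaviest-first partitioner is a stand-in for a real load balancing algorithm, and all names (`greedy_assign`, `imbalance`, `simulate`, `delay`, `drift`) are hypothetical.

```python
import random


def greedy_assign(weights, nprocs):
    """Stand-in balancer: place the heaviest element on the least-loaded processor."""
    loads = [0.0] * nprocs
    owner = [0] * len(weights)
    for i in sorted(range(len(weights)), key=lambda i: -weights[i]):
        p = min(range(nprocs), key=loads.__getitem__)
        owner[i] = p
        loads[p] += weights[i]
    return owner


def imbalance(weights, owner, nprocs):
    """Maximum processor load divided by mean load (1.0 is perfect balance)."""
    loads = [0.0] * nprocs
    for w, p in zip(weights, owner):
        loads[p] += w
    return max(loads) / (sum(loads) / nprocs)


def simulate(steps=50, n=400, nprocs=8, delay=5, drift=0.01, seed=0):
    """Toy lazy load balancing: assignments are applied `delay` steps after
    the weight snapshot they were computed from (assumed model)."""
    rng = random.Random(seed)
    weights = [rng.uniform(1.0, 10.0) for _ in range(n)]
    owner = [i % nprocs for i in range(n)]  # naive initial assignment
    pending = []  # (step at which the result is ready, assignment)
    history = []
    for step in range(steps):
        # The balancer snapshots the current weights; its result is only
        # available `delay` steps later, modeling concurrent execution.
        if step % delay == 0:
            pending.append((step + delay, greedy_assign(weights, nprocs)))
        # Apply any assignment that has finished computing.
        while pending and pending[0][0] <= step:
            owner = pending.pop(0)[1]
        history.append(imbalance(weights, owner, nprocs))
        # Slow change in application state: each weight drifts slightly.
        weights = [w * (1.0 + rng.uniform(-drift, drift)) for w in weights]
    return history


if __name__ == "__main__":
    for step, value in enumerate(simulate()):
        print(step, round(value, 4))
```

Varying `delay` and `drift` in this model gives a rough feel for the trade-off: the slower the weights change relative to the delay, the less a stale assignment costs in balance quality.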