![]() |
|||
|
|
Parallel Sweeps for Particle Transport Calculations
supported by NSF, DOE (ASCI ASAP Level 2 and 3 Programs)
Mark Mathis,
Nancy Amato,
Marvin Adams
Introduction
There is a growing need to accurately simulate physical systems
whose evolutions depend on the transport of subatomic particles.
It has long been recognized that the huge computational demands of
the transport problem mean that practical solution times will be
obtained only by the efficient utilization of parallel processing.
For example, since estimates place the time devoted to particle
transport in multi-physics simulations at 50-80% of total execution time,
parallelizing deterministic particle transport calculations is an
important problem in many applications targeted by the DOE's
Accelerated Strategic Computing Initiative (ASCI). One common approach
to deterministic particle transport calculations is the discrete-ordinates
method, whose most time consuming step is the transport sweep which involves
multiple sweeps through the spatial grid, one for each direction of particle travel.
The efficient parallel implementation of the transport sweeps is the key
to parallelizing the discrete-ordinates method.
Contribution
Our key contribution to this field is a new general model that can
be used to compare the running times of transport sweeps on three-dimensional
orthogonal grids for various mappings of the grid cells to processors.
Our model, which includes machine-dependent parameters such as
computation cost and communication latency, can be used to analyze
and compare the effects of various spatial decompositions on the
running time of the transport sweep. Insight obtained from the model
yields two significant contributions to the theory of optimal transport
sweeps on orthogonal grids.
First, our model provides a theoretical basis which explains why, and
under what circumstances, the column decomposition of the current
standard KBA algorithm is superior to the `balanced' decomposition
obtained by classic domain decomposition techniques.
Second, our model enables us to identify a new decomposition, which we call
Hybrid, that proves to be almost as good as and sometimes
better than the current standard KBA method.
Transport Sweeps
During each iteration, an ordered traversal through the spatial grid is
performed for each direction of particle travel.
A sweep of the grid begins at one of the eight corners,
depending on the direction desired. On an arbitrary grid, there may be as
many unique sweep orderings as there are directions. However,
for the orthogonal grids considered in this work, there are only eight
unique sweep orderings, one for each octant of directions. That is, if all
directions in a given octant are processed together, then eight distinct
traversals of the spatial grid must be performed in each iteration. The
sweep for a given octant progresses from its starting corner across the
grid to the opposite corner following a diagonal trajectory.

The constraining factor is that every cell behind a diagonal plane must be calculated before any cell ahead of the diagonal plane can be calculated. To simultaneously process all directions, there is one sweep plane for each octant of directions. It would appear that this inherently sequential nature of the sweep would seriously inhibit any parallel implementation of the basic sweep algorithm. However, while the progress of the diagonal sweep plane is sequential, every cell inside the sweep plane is independent of all other cells in the same plane. Therefore, all cells in a sweep plane may be processed in parallel. Furthermore, the sweeps for each direction are independent and can be processed in parallel.
Spatial Decomposition
The key factor determining the parallel efficiency of the sweeps is the mapping of the spatial domain onto the processors, that is, the assignment of the spatial cells to the processors. In this thesis, we develop a model that can be used to analyze transport sweeps on regular orthogonal grids for any `regular' mapping of the grid cells to processors. While the theory is general, our analysis will concentrate on three different methods for performing this decomposition: the column decomposition of the KBA (which is the current standard), a Volumetric method which uses a `balanced' decomposition (obtained by classic domain decomposition techniques), and then a Hybrid approach that combines aspects of the first two methods. We have selected these three decompositions since they represent the two extreme cases (the KBA's columns vs. the Volumetric's balanced sub-domains) and an intermediate case (Hybrid).

Inter-processor communication will be required to move data from an `upstream' cell on one processor to an adjacent `downstream' cell on another processor, where which cells are upstream and downstream depends on the sweep direction. During a sweep, each processor will calculate one or more contiguous planes of cells in its partition before it processes the next set of cells in its partition and before the next processor can process its adjacent set of cells. It is important to note that the time to process one block of cells is highly dependent on the partitioning method being used.
The total time required by the sweep algorithm, or completion time, T,
is the collective sum of all computation and communication required on the critical path.
The computation time is the summation of the computation required for each individual step.
The communication time is the summation of all outstanding communication
required to perform the transport sweep. This assumption limits communication time to
cross-processor communication that cannot be pipelined. Combining these assumptions
gives us an expression for the completion time, T (we find it useful to normalize the equation
by dividing through by the grind time, omega).
where,
The performance of any method can be tuned using the block size parameter,
k. Specifically, we want to choose a k such that the completion time
is minimized. The optimal value of k can easily be found by
minimizing the model equation with respect to k.
Conclusion
The key contribution of this thesis is a new general model which can
be used to predict the running time of parallel sweeps on orthogonal
grids for any regular mapping of the grid cells to processors. In
particular, our model accounts for machine-dependent parameters such as
computation and communication/latency costs (assumed constant for all grid
cells) and is parameterized by the number of processors,
the dimensions of the underlying spatial transport grid, and
the dimensions of the coarse grid processor
overlay (which determines the dimensions of the sub-domain assigned to
each processor). Thus, our model can be used to analyze and compare the
effects of various spatial decompositions on the running time of the transport
sweep.
Insight obtained from the model yields the following contributions to the
theory of optimal transport sweeps on orthogonal grids.
The methods we have analyzed represent a family of algorithms for
three dimensional sweeps. Given an input grid and a parallel computer
system (i.e., number of processors and estimates of computation and
communication costs for each grid cell), one can select the best method
for the given configuration. Furthermore, one can minimize the completion
time by selecting an optimal block size. Future work on this subject will
involve correctly accounting for restarts in the execution of the transport
sweep.
Finding Strongly Connected Components in Parallel in Particle Transport Sweeps, William McLendon, Bruce Hendrickson, Steve Plimpton, Lawrence Rauchwerger, In Proc. ACM Symp. Par. Alg.
Arch. (SPAA), pp. 328-329, Crete, Greece, Jul 2001. A General Performance Model for Parallel Sweeps on Orthogonal Grids for Particle Transport Calculations, Mark M. Mathis, Masters Thesis, Department of Computer Science, Texas A&M University, Dec 2000.






Papers
Proceedings(ps, pdf, abstract)
A General Performance Model for Parallel Sweeps on Orthogonal Grids for Particle Transport Calculations, Mark M. Mathis, Nancy M. Amato, Marvin Adams, In Proc. ACM Int. Conf.
Supercomputing (ICS), pp. 255-263, Santa Fe, NM, May 2000.
Proceedings(ps, pdf, abstract)
Dissertations and Theses
Masters Thesis(ps, pdf, abstract)
Animations
One Sweep Only 
All Distinct Sweeps 
Parasol Home
| Research
| People
| General info
| Seminars
| Resources
Parasol Lab,
301 Harvey R. Bright Bldg, 3112 TAMU, College Station, TX 77843-3112
Contact Webmaster
Phone 979.458.0722 Fax 979.458.0718
Department of Computer Science
| Dwight Look College of Engineering
| Texas A&M University
Privacy statement: Computer Science Engineering TAMU