| Algorithms & Applcations Group | Home | Research | Publications | People | Resources | News
Algorithms & Applications Group

Project Personnel:Lawrence Rauchwerger

Many parallel applications consist of multiple computational components. While the execution of some of these components or tasks depends on the completion of other tasks, others can be executed at the same time, which increases parallelism of the problem. The task scheduling problem is the problem of assigning the tasks in the system in a manner that will optimize the overall performance of the application, while assuring the correctness of the result.

The task scheduling problem can be modeled as a weighted directed acyclic graph (DAG). A vertex represents a task, and its weight the size of the task computation. An arc represents the communication among two tasks, and its weight represents the communication cost. The directed edge shows the dependency between two tasks.

#### Example Computation DAG

The primary goal of task scheduling is to schedule tasks on processors and minimize the makespan of the schedule, i.e., the completion time of the last task relative to the start time of the first task. The output of the problem is an assignment of tasks to processors.

Unfortunately, standard task scheduling is NP-complete, even for the unit task size and unit communication cost. However, various heuristic methods have been proposed that obtain suboptimal solutions (schedules) in polynomial time.

#### Wave Front Method (WFM)

The wave fronts of the graph are determined according to the level of the vertices in a breadth-first-search traversal of the DAG. The vertices in each wave front are independent from each other, and are all assigned to different processors. Following is an example of applying Wave Front Method.

#### Critical Path Merge (CPM)

A critical path in a DAG is a maximum weight root to leaf path (the path weight is the summation of all vertex and edge weights on the path). CPM computes the critical path, clusters all tasks in it, assigns them to the same processor, and removes them from the graph. This process is iterated until all tasks are scheduled.

#### Heavy Edge Merge (HEM)

Heavy Edge Merge works by iteratively clustering vertices (tasks) along edges with non-increasing weights. During an initialization stage, the edges are sorted in non-increasing order by edge weight, one task is assigned to each (virtual) processor, and the makespan of this assignment is computed. Then, all edges are processed in sorted order. For each edge, the makespan resulting from merging the tasks associated with the endpoints (perhaps clusters themselves) is computed. If the makespan increases, then the merge is not performed.

#### Dominant Sequence Clustering (DSC)

Standard Template Adaptive Parallel Library (STAPL) is a parallel version of Standard Template Library (STL) for C++.

#### STAPL components

Task Scheduling is important for the performance of applying STAPL in various applications. STAPL Scheduler is responsible of mapping tasks onto multiple processors to gain the maximum performance out of the system. The STAPL scheduler should also provide an interface for the user to have control of the scheduling process. The scheduling strategy can be internally selected, or the user can select a specific way, like wave front execution, or ultimately provide their own scheduling strategies for their best interests. The scheduling results are passed to the STAPL Executor. The Executor is responsible for the real dispatching and execution of the problem.

One of the applications of STAPL is the ASCI project. Due to the specific requirement of physical computation, we develop a scheduling package, ASCI Scheduler, which uses the STAPL scheduler in its construction and is an example of a user-defined scheduler. The following pages present a short introduction to task scheduling problems, and some algorithms used in STAPL Scheduler.

#### Scheduling Sweeps for Particle Transport Calculations

In this work, we consider the relationship between task scheduling and deterministic mesh sweeps that arise in particle-transport computations. In particular, we argue that the efficient parallelization of such computations is most accurately viewed as a generalization of the traditional task scheduling problem and not as an application for domain decomposition, as has been generally assumed in the past. In fact, as we show, the transport problem represents an interesting composite task-scheduling problem: given one set of tasks, and multiple dependence graphs for these tasks (one for each distinguishable sweep direction), find an assignment of tasks to processors that minimizes the time required to process all such graphs. Within this context, we study and propose scheduling algorithms that are suitable for the particular generalization of the scheduling problem that arises in the context of transport sweeps.

This project is an application of STAPL. Based on STAPL Scheduler, we developed ASCI Scheduler for this particular problem.

#### ASCI Scheduler

ASCI scheduler is developed for the ASCI project to optimize the Particle transport Calculations on parallel systems, specificly for the scheduling of multiple dependence computation graphs emerged from the ASCI project. The problem is classified into two categories:

For general scheduling to apply to both orthogonal and unstructured grids, we define the following functionalities of the scheduler.

#### Scheduling

In the Scheduling phase, the unique schedule to be used for all the dependence graphs is computed. The goal of the scheduling is to find the optimal mappings of tasks onto processors, such that the total computation time of the whole sweep can be minimized.

For orthogonal grids, we have KBA, Hybrid, and Volumetric methods. There are also options for the users to define their own processor mappings. The theoretical analysis of these algorithms can be found here.

For unstructured grids, we use the heuristics implemented in STAPL Scheduler, namely wave front method, critical path method, heavy edge merge, and dominant sequence clustering, to compute the best schedule for the multiple dependence graphs.

Due to the existence of various machine topologies in supercomputing field, we adopt a hierarchical scheduling policy, such that each processor partition can be scheduled iteratively when necessary. In the following example, processor p1 can be viewed as a virtual processor which represents two physical processors p10 and p11.  The hierarchical scheduling provides flexibility in mapping to different machine topologies.

#### Cellset Aggregation

To further optimize the computation, the scheduled computation graph is further clustered into a graph on cellsets. Each node in the graph contains a set of cells, which composes a computation block before each communication. The purpose of aggregation is to properly overlap the computation and communication, and balance the communication overhead with the cost of data transferring. The optimal cellset size is machine dependent, and can be computed based on machine parameters. There are theoretical analysis results for orthogonal cases for the optimal cellset size (description here). To get the optimal values for unstructured grids, we can use simulation results.

#### Graph Transformation

Based on the sweep angles, the aggregated cellset graph is transformed into different directed cellset graph in this phase. The transformation is based on the angle between the face normal of a pair of cellsets and the sweep angle. If the angle is within 90 degree, there is a directed edge between the cellsets, otherwise, there is no dependency between them.

#### Cell Ordering

Cell Ordering is to find out the execution order of the cells in a cellset for different sweep angles. The basic ordering uses a topological sort on the directed graph between cells.

Here is an implementation flow chart to show the major steps in the ASCI Scheduler implementation.

Load Balancing Techniques for Scalable Parallelization of Sampling-Based Motion Planning Algorithms, Adam Fidel, Sam Ade Jacobs, Shishir Sharma, Lawrence Rauchwerger, Nancy M. Amato, Technical Report, TR13-002 , Parasol Laboratory, Department of Computer Science, Texas A&M University, Mar 2013.
Technical Report(pdf, abstract)

Task Scheduling and Parallel Mesh-Sweeps in Transport Computations, Nancy M. Amato, Ping An, Technical Report, TR00-009, Department of Computer Science and Engineering, Texas A&M University, Jan 2000.
Technical Report(ps, pdf)

Supported by NSF, DOE

Project Alumni:Ping An,Sam Ade Jacobs,Olga Pearce,Shishir Sharma

 Parasol Home | Research | People | General info | Seminars | Resources   Parasol Laboratory, 425 Harvey R. Bright Bldg, 3112 TAMU, College Station, TX 77843-3112  parasol-admin@cse.tamu.edu      Phone 979.458.0722     Fax 979.458.0718  Department of Computer Science and Engineering | Dwight Look College of Engineering | Texas A&M University     Privacy statement: Computer Science and Engineering Engineering TAMU Web Accessibility Policy and Law - Web Accessibility and Usability Standards  -   Contact Webmaster