CPSC 626: Parallel Algorithm Design and Analysis
Programming Assignment #2
Fall 2008
Due: Thursday October 16, 2008 in class
General Guidelines for Programming Assignments
- Programs should be written using good programming style
and should be well documented.
- Code will be turned in electronically using the
turnin program on CSNET.
- Everyone must turn in their own code. You may work with others (this is
encouraged), but you must reference your sources.
- Your report must be submitted in hardcopy, with a completed and signed
cover sheet, which is available on the course homepage.
Any assignment turned in without a fully completed and signed cover sheet
will receive ZERO POINTS.
The objective of this programming assignment is for you to analyze and
compare the performance of the MPI and OpenMP prefix sums programs you
wrote in the first programming assignment. You will also compare them
with a sequential program for computing the prefix sums
that you will write.
Coding Portion.
You should implement a sequential algorithm for computing the
prefix sums. This algorithm CANNOT be one of your parallel
algorithms using one processor. It should be an algorithm that
does not have any of the overhead of parallelism.
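As a point of reference, a sequential baseline of this kind can be as small as a single loop. The sketch below is only one possible form: the function name, the `long` element type, and the choice of inclusive (rather than exclusive) prefix sums computed in place are all assumptions made for illustration, not requirements of the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sequential baseline (name and element type are assumptions).
   Computes inclusive prefix sums in place: after the call, a[i] holds the
   sum of the original a[0..i]. One pass, n-1 additions, and no threads,
   messages, or synchronization. */
void prefix_sums_seq(long *a, size_t n)
{
    assert(n == 0 || a != NULL);
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

The running time of this loop is what the parallel programs should be compared against when computing speedup.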
To assist you in collecting your experimental results, we have
prepared some sample scripts for hydra (OpenMP and MPI).
You can find them below, and also a link to the SC help page
for batch jobs on hydra.
Report.
You should prepare and submit a brief report
that includes theoretical analysis,
a description of your experiments, and discussion of your results.
Your report should be prepared on a word processor and should be
of the quality that you could submit to a technical workshop or
conference.
You must use some electronic tool (e.g., matlab, gnuplot, excel)
to create your plots; handwritten plots will NOT be accepted.
At a minimum, your report should include the following sections:
- Introduction.
In this section, you should describe the objective of this assignment.
- Theoretical analysis.
In this section, you should
- Describe your algorithms (sequential, OpenMP, MPI) for this problem
using pseudo code and words.
- Provide an analysis of the complexity (time and work) of your
algorithms using asymptotic notation.
- Experimental Setup.
In this section, you should
provide a description of your experiment setup, which includes but
is not limited to
- Machine specification.
- What experiments did you run and why? Explain what you
believe you will learn from each experiment.
- How did you generate the test inputs? What input sizes did you test? Why?
- What is your timing mechanism?
- How many times did you repeat each experiment?
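One way to answer the timing and repetition questions is a small harness that reads a wall-clock timer around each run and repeats the run several times. The sketch below assumes a POSIX system and uses `clock_gettime` with `CLOCK_MONOTONIC`; the names `now_seconds`, `timing_stats`, `time_trials`, and `example_kernel` are hypothetical. In an OpenMP program `omp_get_wtime()` and in an MPI program `MPI_Wtime()` are the corresponding wall-clock timers and could replace `now_seconds`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime under strict -std=c99 */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds, read from a monotonic POSIX clock. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

struct timing_stats { double min, mean, max; };  /* all in seconds */

/* Hypothetical harness: runs kernel(arg) `trials` times and reports the
   minimum, mean, and maximum elapsed time. If setup is not NULL it is
   called before each trial, outside the timed region, so that an input
   overwritten by the kernel can be restored. */
struct timing_stats time_trials(void (*setup)(void *), void (*kernel)(void *),
                                void *arg, int trials)
{
    assert(kernel != NULL && trials > 0);
    struct timing_stats s = { 0.0, 0.0, 0.0 };
    double total = 0.0;
    for (int t = 0; t < trials; t++) {
        if (setup != NULL)
            setup(arg);
        double start = now_seconds();
        kernel(arg);
        double elapsed = now_seconds() - start;
        if (t == 0 || elapsed < s.min) s.min = elapsed;
        if (t == 0 || elapsed > s.max) s.max = elapsed;
        total += elapsed;
    }
    s.mean = total / trials;
    return s;
}

/* Stand-in workload, used only to show the calling convention:
   increments the int counter that arg points to. */
void example_kernel(void *arg)
{
    (*(int *)arg)++;
}
```

A call such as `time_trials(NULL, example_kernel, &counter, 5)` runs the stand-in workload five times. Reporting the spread (minimum and maximum) alongside the mean makes it easier to justify the number of repetitions you chose.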
- Experimental Results.
In this section, you should
compare the performance (running time) of the experiments
to their theoretical complexity.
- Your report should contain several types of graphs.
Here are some types of plots you may want to use to
address the questions below:
- Speedup (y-axis) vs. #processors (x-axis):
Select a reasonable data size and show the speedups for
both versions of the parallel program for a range
of processors (e.g., p=1,2,4,8,16, ..).
- Time (y-axis) vs. #elements (x-axis):
Show the running times of the sequential program (not a
parallel program using one processor) and the two versions of the
parallel program for different data sizes and a fixed number of
processors.
You may want a few graphs for different numbers of processors.
- Time/expected performance (y-axis) vs. #elements (x-axis):
This will allow you to compare your observed performance with your
theoretically expected performance (why?) and to determine the
constants hidden in the asymptotic analysis (how?).
You may want a few graphs for different numbers of processors (why?).
- Analyze and discuss the performance of your algorithms.
Using the timing and speedup graphs, discuss the efficiency and
scalability of the parallel algorithms.
Include analysis and discussions of both strong and weak scaling.
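The quantities behind a speedup and scaling discussion can be computed directly from measured running times. The sketch below uses the usual definitions; the function names are hypothetical. It assumes that speedup is measured against the sequential program, and that in a weak-scaling study the data size grows in proportion to the number of processors so that the work per processor stays fixed.

```c
#include <assert.h>

/* Speedup of a parallel run relative to the sequential program:
   S = T_seq / T_par, with both runs on the same data size. */
double speedup(double t_seq, double t_par)
{
    assert(t_seq > 0.0 && t_par > 0.0);
    return t_seq / t_par;
}

/* Strong-scaling efficiency: speedup divided by the processor count p,
   for a fixed data size. A value of 1.0 means linear speedup. */
double strong_efficiency(double t_seq, double t_par, int p)
{
    assert(p > 0);
    return speedup(t_seq, t_par) / p;
}

/* Weak-scaling efficiency (assumed setup): t_base is the time for n
   elements on one processor and t_scaled is the time for p*n elements
   on p processors. A value of 1.0 means the time did not grow. */
double weak_efficiency(double t_base, double t_scaled)
{
    assert(t_base > 0.0 && t_scaled > 0.0);
    return t_base / t_scaled;
}
```

For example, a sequential time of 8 seconds and a parallel time of 2 seconds on 8 processors give a speedup of 4 and a strong-scaling efficiency of 0.5.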
- Using the timing graphs, decide if your measured performance matches
your predicted asymptotic performance.
Experimentally determine the constants hidden in your asymptotic
analysis from the plots, and determine the values of n (data size)
for which the asymptotic analysis holds.
These values should be determined from the plots (you may wish
to plot some additional functions to assist with this).
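One possible way to read the hidden constant off the data, sketched below, is to divide each measured time by the value of the expected cost function at the same n and look for the point where the ratio levels off. This is an illustration under stated assumptions, not a prescribed method: the function names are hypothetical, the expected values are supplied by the caller from their own analysis, the last ratio is taken as the estimate of the constant, and the tolerance is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* ratio[i] = measured[i] / expected[i], where expected[i] is the value of
   the predicted cost function f(n) at the i-th data size. If T(n) behaves
   like c * f(n), the ratios approach c as n grows. */
void cost_ratios(const double *measured, const double *expected,
                 double *ratio, size_t k)
{
    assert(measured != NULL && expected != NULL && ratio != NULL);
    for (size_t i = 0; i < k; i++) {
        assert(expected[i] > 0.0);
        ratio[i] = measured[i] / expected[i];
    }
}

/* Takes the last ratio as the estimate of c (assumed positive) and returns
   the smallest index i such that every ratio[j] with j >= i lies within
   rel_tol * c of that estimate. The data size at index i is then a
   candidate for where the asymptotic analysis starts to hold. */
size_t stable_from(const double *ratio, size_t k, double rel_tol)
{
    assert(ratio != NULL && k > 0 && rel_tol >= 0.0);
    double c = ratio[k - 1];
    size_t first = k - 1;
    while (first > 0) {
        double diff = ratio[first - 1] - c;
        if (diff < 0.0)
            diff = -diff;
        if (diff > rel_tol * c)
            break;
        first--;
    }
    return first;
}
```

Plotting the same ratios against n shows the same information visually: a curve that flattens at height c from some data size onward.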
Provide a discussion of your results, which includes but is not limited to:
- To what extent does theoretical analysis agree with the experimental
results? Attempt to understand and explain any discrepancies you note.
- Report on the constants you determined for the asymptotic analysis,
and the values of n for which you found the asymptotic analysis holds.
Explain how you determined these values, and what you can infer from
them about the behavior of the programs.
- Your report should also attempt to interpret the results.
Be sure to try to explain any differences from the theory
and any differences among the algorithms.
Which implementation is most efficient?
Why do you think one is more or less efficient than the others?
Etc. What would you do differently next time?
- Conclusion.
In this section you should provide some summary discussion
comparing the two parallel algorithms and offering advice
as to when each of them should be used. Here it would be
relevant to include factors such as ease of implementation
and portability in your discussion.