CPSC 311-501 and CPSC 311-502: Analysis of Algorithms
Programming Assignment #1
Due: Thursday March 20, 2003 at the beginning of class
(Extended one week until Thursday March 27)
General Guidelines for Programming Assignments
- Programs should be written in C, C++, or Java, using good
programming style, and should be well documented.
Code will be turned in electronically using the turnin command
(see the man page for more details).
Your programs must be able to be compiled and run on the Unix machines
in the CS department.
The 'folder names' for our class are
"311-501" and
"311-502".
- The major thing you will turn in is a (hard copy) typed
write-up detailing any assumptions/optimizations made when coding,
and discussing the results obtained.
The objective of this assignment is to study how the theoretical analysis
of a variety of sorting algorithms compares with their actual performance.
The sorting algorithms you will study are:
- Insertion Sort
- Merge Sort
- Quick Sort
- Heap Sort
- Radix Sort
The major emphasis of this assignment will be on analyzing the
performance of the algorithms (not on coding the algorithms).
You will be given a template program which implements most of
the above algorithms. You will need to make a few modifications
to test out different strategies. Most of your time should
be spent on designing careful test cases and analyzing your
results in order to draw conclusions regarding the performance
of the various algorithms.
FYI, here is a link to some
interesting animations of sorting algorithms at work (java applets).
What to Turn In
There are two things you must turn in for this assignment:
- A HARDCOPY REPORT. It should address the points
mentioned below. It should be turned in at the beginning
of class on the assigned due date.
Please note that I do not want a hardcopy of your
code, your raw output, or a log of your program's execution.
- AN ELECTRONIC COPY OF YOUR CODE.
This will be turned in with the turnin program available on the CS Unix
machines. It should be submitted by the beginning of class on
the assigned due date.
If you wrote a new program to generate input, or anything
like that, you can turn it in as well (so that we can recreate your
runs). If you have several files to turn in, then please make a
directory, tar it up, and submit the tar file.
NOTE:
I do not want a hardcopy of your code, your raw output,
or a log of your program's execution.
Template Program
You will be given a template program that runs all the above
sorting algorithms. You will be required to perform minor
modifications to some of these algorithms.
The code is documented well enough for you to take it and modify
it. To compile the program, copy sort.c and Makefile
into your directory and type make (it is written in C and
uses the gcc compiler).
sort.c
Makefile
The template program has a facility for reading in input from
a file, and for creating three types of input lists:
(i) randomly generated elements,
(ii) elements sorted in increasing order,
(iii) elements sorted in decreasing order.
You may need/want to add additional facilities to this.
The format of the input files is:
[n] /*number of positive integers to sort*/
[element 1]
[element 2]
:
[element n]
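A file in this format could be read with something like the following. This is only a minimal sketch: read_input is a hypothetical helper (it is not taken from sort.c), and it assumes the file holds nothing but whitespace-separated integers, with the count first.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a list in the format above from `path`. Hypothetical helper, not
   part of sort.c. On success, stores the element count in *n_out and
   returns a malloc'd array that the caller must free. Returns NULL on
   any error (missing file, bad count, too few elements). */
int *read_input(const char *path, int *n_out)
{
    FILE *fp = fopen(path, "r");
    int n, i;
    int *a;

    if (fp == NULL)
        return NULL;
    if (fscanf(fp, "%d", &n) != 1 || n <= 0) {   /* first line: n */
        fclose(fp);
        return NULL;
    }
    a = malloc((size_t)n * sizeof *a);
    if (a == NULL) {
        fclose(fp);
        return NULL;
    }
    for (i = 0; i < n; i++) {                    /* next n lines: elements */
        if (fscanf(fp, "%d", &a[i]) != 1) {
            free(a);
            fclose(fp);
            return NULL;
        }
    }
    fclose(fp);
    *n_out = n;
    return a;
}
```

Returning NULL on a short or malformed file makes it easy to reject a bad test input before any timing starts.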
Coding Portion of Assignment
The version of QuickSort in the template program uses the
first element in the array as the pivot element.
You will study different strategies for selecting the
pivot:
- Pivot Choice 1: The first element in the list
(used in the template program)
- Pivot Choice 2: A random element in the array.
- Pivot Choice 3: The median of the first, middle,
and last elements in the array.
You should implement Pivot Choices 2 and 3 listed above.
You will then have three different versions of QuickSort (you
should add them to the template program as additional
menu options).
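As a rough illustration of Pivot Choices 2 and 3, the sketch below returns the index of the chosen pivot in a[lo..hi]. The function names and signatures are hypothetical and are not taken from sort.c; it also assumes that rand() % range is uniform enough for this purpose (it has a slight bias for large ranges).

```c
#include <stdlib.h>

/* Hypothetical helpers; names and signatures are not taken from sort.c.
   Each returns the index of the chosen pivot in a[lo..hi] (inclusive). */

/* Pivot Choice 2: an (approximately) uniformly chosen index in [lo, hi]. */
int pivot_random(int lo, int hi)
{
    return lo + rand() % (hi - lo + 1);
}

/* Pivot Choice 3: index of the median of a[lo], a[mid], and a[hi]. */
int pivot_median3(const int a[], int lo, int hi)
{
    int mid = lo + (hi - lo) / 2;

    if (a[lo] < a[mid]) {
        if (a[mid] < a[hi])
            return mid;                       /* a[lo] < a[mid] < a[hi] */
        return (a[lo] < a[hi]) ? hi : lo;     /* a[mid] is the largest */
    } else {
        if (a[lo] < a[hi])
            return lo;                        /* a[mid] <= a[lo] < a[hi] */
        return (a[mid] < a[hi]) ? hi : mid;   /* a[lo] is the largest */
    }
}
```

One way to reuse an existing first-element partition routine is to swap the chosen element into position lo before partitioning. Whether that fits the template's partition code is something to check against sort.c.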
NOTES:
- First, a disclaimer. The program is not meant to be the definitive
answer, and it may contain bugs. Any bugs are unintentional; just like
the lecture notes, the program is meant to help you. You are not
required to use it.
- Second, you don't have to use the Makefile if you don't want to.
For that matter you don't even have to use the program if you
don't want to. It is just provided to help you.
- The times returned are in microseconds and seconds *on my machine*.
However, the units of the value returned are system dependent. The time
reported as microseconds is 10^6 times the time reported as
seconds. If you are unsure of the units you will have to check
out the man page for your machine. However, please note that
what is primarily of interest for you will be the *relative*
times - for that the units should not matter except to determine
the constants.
- You probably will not find a random number generator that
generates a number in a range you specify. You will need to do a man
on "rand", "drand48", and "random" and their counterparts to see how
they work; each returns a number in a fixed range. For example,
drand48 returns a real number in [0,1), while rand returns an integer
in [0,RAND_MAX]. To get an integer in the range you want, say [F,L],
you need to translate such a number into something in [F,L]. There are
a number of ways to do this. Here is just one, using rand (declared in
stdlib.h):
r1 = rand() / (RAND_MAX + 1.0); /* random real number in [0,1) */
t1 = r1 * (L - F + 1); /* random real number in [0,L-F+1) */
r2 = F + (int) t1; /* random integer in [F,L] */
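For the note on timing above, here is a minimal sketch of one way to measure elapsed wall-clock time in microseconds. It assumes the POSIX gettimeofday() call is available; whether the template uses this call is not stated, so check sort.c and the man pages on your machine. The names now_usec and time_sort are hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in microseconds since the epoch, via POSIX
   gettimeofday(). Assumption: this may differ from the timer that
   sort.c actually uses. */
double now_usec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

/* Times one call of sort_fn on a[0..n-1] and returns the elapsed
   microseconds. sort_fn is a hypothetical stand-in for any sort routine
   that takes an array and its length. */
double time_sort(void (*sort_fn)(int *, int), int *a, int n)
{
    double start = now_usec();

    sort_fn(a, n);
    return now_usec() - start;
}
```

Because this measures wall-clock time, the result includes whatever else the machine was doing during the run, which is one reason to repeat each measurement.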
Analysis Portion of Assignment
Do some test runs and collect the data you need to discuss the
following questions in your write-up.
- Assume the n input elements are integers in the range [0,n-1].
For each algorithm (for Quicksort, you need only analyze the version
with pivot 1), determine what are best, average, and worst-case
inputs. Your write-up should list these for each algorithm.
Include a sentence or two of justification for each one.
Please note that this is a theoretical question.
You should answer what you expect to be true based on a theoretical
analysis (and you should not refer to experimental results).
In the subsequent questions we will compare the experimental
results to these theoretical predictions.
- Describe your experimental setup.
What kind of machine did you use? What timing mechanism?
How many times did you repeat each experiment?
What times are reported? How did you select the inputs you would use?
Do you use the same inputs for all sorting algorithms?
- Which of the three versions of Quicksort seems to perform the best?
Graph the average case running time as a function of input size
for the three versions.
Graph the best case running time as a function of input size
for the three versions (use the best case input you determined
for pivot 1 in the previous step).
Graph the worst case running time as a function of input size
for the three versions (use the worst case input you determined
for pivot 1 in the previous step).
For each case, your graphs should include the running times for
all three versions of Quicksort versus the input size.
- Which of these five sorts seems to perform the best (consider the
best version of Quicksort)?
Graph the average case running time as a function of input size
for the five sorts.
Graph the best case running time as a function of input size
for the five sorts.
Graph the worst case running time as a function of input size
for the five sorts.
For each case, your graphs should include the running times for
all five sorts versus the input size.
You will need to consider the best/worst case inputs for all
algorithms.
- For the comparison sorts, is the number of comparisons done really
a good predictor of the execution time?
In other words, is a comparison a good choice of basic operation
for analyzing these algorithms?
To answer this question you need to analyze your data to see if
the number of comparisons is correlated with the execution time.
Plot time/#comp vs. n and refer to these plots in your answer.
- To what extent do the best, average and worst case analyses (from
class/textbook) of each sort agree with the experimental results?
To answer this question you need to find a way to compare the
experimental results for a sort with its predicted theoretical
times. For this you will need to do something somewhat rigorous.
For example, one way to compare a time to a predicted time of
O(n^2) would be to
divide the times for a number of runs with different input sizes
by n^2 and see if you observe a horizontal line (after some
input size n0).
That n0 would represent the n0 value for
the asymptotic analysis. The value on the y-axis (assuming
you put input size on the x-axis) would tell you the constant
value of the big-Oh.
Finally, remember that you are supposed to be analyzing the time
for the experiments. That is, showing that the number of comparisons
is O(n^2) is not enough;
you need to see that the time is O(n^2)
to determine if the asymptotic analysis is any good.
In particular, you should:
- (a) For each sort, and for each case (best, average, and worst),
determine whether the observed experimental running time
is of the same order as predicted by the asymptotic analysis.
Your determination should be backed up by your experiments and
analysis and you must explain your reasoning.
If you found the sort didn't conform to the asymptotic analysis, you
should try to understand why and provide an explanation.
- (b) For each sort, determine the constants and n0
that are hidden by the asymptotic analysis. These can be computed
experimentally as discussed above. Again, your determination should be
backed up by your experiments and analysis and you must explain
your reasoning.
If you found the sort didn't conform to the asymptotic analysis in
the previous question, then you should try to determine what asymptotic
behavior it does exhibit and answer this question for it.
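The check described above (dividing measured times by n^2 and looking for a horizontal line) can be sketched as follows. This is only an illustration for the n^2 case: the function names are hypothetical, the timings are assumed to be positive, and how small a spread counts as "flat" is a judgment call left to you.

```c
/* Fills ratio[i] = t[i] / n[i]^2 for k measurements, where t[i] is the
   measured time for input size n[i]. Hypothetical helper. */
void ratios_n2(const double t[], const int n[], int k, double ratio[])
{
    int i;

    for (i = 0; i < k; i++)
        ratio[i] = t[i] / ((double)n[i] * (double)n[i]);
}

/* Relative spread (max - min) / max of ratio[from..k-1]. A value near 0
   means the ratios are nearly constant from index `from` onward.
   Assumes from < k and that all ratios are positive. */
double relative_spread(const double ratio[], int from, int k)
{
    double lo = ratio[from], hi = ratio[from];
    int i;

    for (i = from + 1; i < k; i++) {
        if (ratio[i] < lo)
            lo = ratio[i];
        if (ratio[i] > hi)
            hi = ratio[i];
    }
    return (hi - lo) / hi;
}
```

If the spread becomes small from some index onward, the input size at that index is a candidate for n0, and the level at which the ratios settle estimates the constant. For a sort predicted to run in a different order, the same idea applies with a different divisor.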
The choice of test data is up to you (i.e., for each sorting subroutine,
which input sizes should be tested, how many different inputs of the same
size, which particular inputs of a given size.)
Be smart about which experiments to run, i.e., don't run larger
or more tests than you need to answer the above questions reasonably
well.
Also, note that you will need to run your experiments several times
in order to get stable measurements (times will vary depending upon
system load, input, etc.)
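One possible way to summarize repeated runs is to take the median of the measured times, which is less sensitive than the mean to an occasional run slowed by system load. The sketch below is just one option (the mean or the minimum are also defensible), and median_time is a hypothetical helper name.

```c
#include <stdlib.h>

/* qsort comparator for doubles, ascending order. */
static int cmp_double(const void *x, const void *y)
{
    double a = *(const double *)x;
    double b = *(const double *)y;

    return (a > b) - (a < b);
}

/* Median of the k timings in t (k >= 1). Sorts t in place. */
double median_time(double t[], int k)
{
    qsort(t, (size_t)k, sizeof t[0], cmp_double);
    if (k % 2 == 1)
        return t[k / 2];
    return 0.5 * (t[k / 2 - 1] + t[k / 2]);
}
```

Whichever summary you choose, state it in the write-up along with the number of repetitions.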
Your write-up must include a coherent discussion of which
experiments you ran, how many times you ran them, etc.
Grading on this assignment will put the greatest weight on the
choice of test data and the quality and
insightfulness of your discussion of your results.
Don't be put off too much if there are some discrepancies between the
theoretical results and the experiments.
If that happens, try to explain why it occurred.