CPSC 311-501 and CPSC 311-502: Analysis of Algorithms
Programming Assignment #1

Due: Thursday March 20, 2003 at the beginning of class (Extended one week until Thursday March 27)


General Guidelines for Programming Assignments


The objective of this assignment is to study how the theoretical analysis of a variety of sorting algorithms compares with their actual performance. The sorting algorithms you will study are:

The major emphasis of this assignment will be to analyze the performance of the algorithms (not on coding the algorithms). You will be given a template program which implements most of the above algorithms. You will need to make a few modifications to test out different strategies. Most of your time should be spent on designing careful test cases and analyzing your results in order to draw conclusions regarding the performance of the various algorithms.

FYI, here is a link to some interesting animations of sorting algorithms at work (java applets).

What to Turn In

There are two things you must turn in for this assignment:

NOTE: I do not want a hardcopy of your code, your raw output, or a log of your program's execution.

Template Program

You will be given a template program that runs all the above sorting algorithms. You will be required to perform minor modifications to some of these algorithms. The code is documented well enough for you to take it and modify it. To compile the program, copy sort.c and Makefile into your directory and type make (it is written in C and uses the gcc compiler).

sort.c   Makefile

The template program has a facility for reading in input from a file, and for creating three types of input lists: (i) randomly generated elements, (ii) elements sorted in increasing order, (iii) elements sorted in decreasing order. You may need/want to add additional facilities to this.

The format of the input files is:

[n] /*number of positive integers to sort*/ 
[element 1] 
[element 2]
: 
[element n]
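
As a rough illustration of the format above, a reader for such a file might look like the following sketch. The name read_list is hypothetical and is not taken from sort.c; the sketch also assumes the file contains only the integers themselves (no comment text) and that the count is positive.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads "[n]" followed by n integers from an already opened stream.
 * On success returns a malloc'd array and stores the count in *n_out.
 * Returns NULL on malformed input or allocation failure.
 * read_list is a hypothetical helper, not a function from sort.c. */
int *read_list(FILE *in, int *n_out)
{
    int n, i;
    int *a;

    assert(in != NULL && n_out != NULL);
    if (fscanf(in, "%d", &n) != 1 || n <= 0)
        return NULL;                 /* missing or invalid element count */

    a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        if (fscanf(in, "%d", &a[i]) != 1) {
            free(a);                 /* fewer than n elements in the file */
            return NULL;
        }
    }
    *n_out = n;
    return a;
}
```

The caller owns the returned array and should free() it when done.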

Coding Portion of Assignment

The version of QuickSort in the template program uses the first element in the array as the pivot element. You will study different strategies for selecting the pivot:

You should implement Pivot Choices 2 and 3 listed above. You will then have three different versions of QuickSort (you should add them to the template program as additional menu options).
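
One possible way to keep several QuickSort versions side by side is to pass the pivot rule in as a function, as in the sketch below. This is not the template's code: the names pivot_fn, pivot_first, pivot_middle, quicksort_with and the comparisons counter are all assumptions made for illustration. Only the first-element rule comes from the text above; pivot_middle is just a stand-in to show the hook and is not claimed to be one of the assignment's Pivot Choices.

```c
#include <assert.h>
#include <stddef.h>

/* A pivot rule returns an index p with lo <= p <= hi. (Hypothetical type.) */
typedef int (*pivot_fn)(const int *a, int lo, int hi);

static long comparisons;   /* element comparisons since the last reset */

/* Pivot rule described in the text: first element of the subarray. */
int pivot_first(const int *a, int lo, int hi)
{
    (void)a; (void)hi;
    return lo;
}

/* Illustrative stand-in only; NOT necessarily one of the assigned choices. */
int pivot_middle(const int *a, int lo, int hi)
{
    (void)a;
    return lo + (hi - lo) / 2;
}

static void swap_ints(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Sorts a[lo..hi] in increasing order using the supplied pivot rule. */
void quicksort_with(int *a, int lo, int hi, pivot_fn choose)
{
    int p, i, last;

    if (lo >= hi)
        return;
    p = choose(a, lo, hi);
    assert(lo <= p && p <= hi);
    swap_ints(&a[lo], &a[p]);          /* move the pivot to the front */
    last = lo;                         /* a[lo+1..last] holds items < pivot */
    for (i = lo + 1; i <= hi; i++) {
        comparisons++;
        if (a[i] < a[lo])
            swap_ints(&a[++last], &a[i]);
    }
    swap_ints(&a[lo], &a[last]);       /* pivot lands in its final position */
    quicksort_with(a, lo, last - 1, choose);
    quicksort_with(a, last + 1, hi, choose);
}
```

Each menu option could then call quicksort_with with a different rule, resetting comparisons to 0 before each run.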

NOTES:

  1. First, a disclaimer. The program is not meant to be the definitive answer. It may contain bugs, although that is not my intention. Just like the lecture notes, it is meant to help you. You are not required to use it.

  2. Second, you don't have to use the Makefile if you don't want to. For that matter you don't even have to use the program if you don't want to. It is just provided to help you.

  3. The times returned are microseconds and seconds *on my machine*. However, the value returned is system dependent. The time reported as microseconds is 10^6 times the time reported as seconds. If you are unsure of the units you will have to check out the man page for your machine. However, please note that what is primarily of interest for you will be the *relative* times - for that the units should not matter except to determine the constants.

  4. I don't think you will find a random number generator that generates a number in a range you specify. You will need to do a man on "rand," "drand", and "random" and their counterparts to see how they work, but each of them returns a number in a fixed range (for example, rand returns an integer in [0,RAND_MAX], and drand48 returns a real number in [0,1)). To get a number in the range you want, say [F,L], you would need to find a way to translate a random number r in [0,1) to something in [F,L]. There are a number of ways to do this. Here is just one (for integer results, scale by L-F+1 instead and truncate, so that L itself can occur):
    r1 = rand() / (RAND_MAX + 1.0);  /* random real number in [0,1) */
    t1 = r1 * (L - F);               /* random real number in [0,L-F) */
    r2 = F + t1;                     /* random real number in [F,L) */
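
For integer inputs, the translation in note 4 can be wrapped into a small helper, sketched below. The names rand_in_range and fill_random are hypothetical and not part of sort.c. The sketch assumes F <= L and that L - F + 1 fits in an int, and it fills an array with values in [0, n-1], the input range assumed in the analysis questions.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a pseudo-random integer in [F, L]. Assumes F <= L and that
 * L - F + 1 fits in an int. Hypothetical helper, not part of sort.c. */
int rand_in_range(int F, int L)
{
    double r1, t1;

    assert(F <= L);
    r1 = rand() / ((double)RAND_MAX + 1.0);   /* real in [0, 1) */
    t1 = r1 * ((double)L - (double)F + 1.0);  /* real in [0, L-F+1) */
    return F + (int)t1;                       /* integer in [F, L] */
}

/* Fills a[0..n-1] with pseudo-random integers in [0, n-1].
 * Call srand() once beforehand to choose the sequence. */
void fill_random(int *a, int n)
{
    int i;

    assert(a != NULL && n > 0);
    for (i = 0; i < n; i++)
        a[i] = rand_in_range(0, n - 1);
}
```

Seeding with a fixed value, such as srand(311), makes a "random" input reproducible, so the same list can be given to every sorting algorithm.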
    

Analysis Portion of Assignment

Do some test runs and collect the data you need to discuss the following questions in your write-up.

  1. Assume the n input elements are integers in the range [0,n-1]. For each algorithm (for Quicksort, you need only analyze the version with pivot 1), determine what the best, average, and worst-case inputs are. Your write-up should list these for each algorithm. Include a sentence or two of justification for each one. Please note that this is a theoretical question. You should answer what you expect to be true based on a theoretical analysis (and you should not refer to experimental results). In the subsequent questions we will compare the experimental results to these theoretical predictions.

  2. Describe your experimental setup. What kind of machine did you use? What timing mechanism? How many times did you repeat each experiment? What times are reported? How did you select the inputs you would use? Do you use the same inputs for all sorting algorithms?

  3. Which of the three versions of Quicksort seems to perform the best? Graph the average case running time as a function of input size for the three versions. Graph the best case running time as a function of input size for the three versions (use the best case input you determined for pivot 1 in question 1). Graph the worst case running time as a function of input size for the three versions (use the worst case input you determined for pivot 1 in question 1). For each case, your graphs should include the running times for all three versions of Quicksort versus the input size.

  4. Which of these five sorts seems to perform the best (consider the best version of Quicksort)? Graph the average case running time as a function of input size for the five sorts. Graph the best case running time as a function of input size for the five sorts. Graph the worst case running time as a function of input size for the five sorts. For each case, your graphs should include the running times for all five sorts versus the input size. You will need to consider the best/worst case inputs for all algorithms.

  5. For the comparison sorts, is the number of comparisons done really a good predictor of the execution time? In other words, is a comparison a good choice of basic operation for analyzing these algorithms? To answer this question you need to analyze your data to see if the number of comparisons is correlated with the execution time. Plot time/#comp vs. n and refer to these plots in your answer.

  6. To what extent do the best, average and worst case analyses (from class/textbook) of each sort agree with the experimental results? To answer this question you need to find a way to compare the experimental results for a sort with its predicted theoretical times. For this you will need to do something somewhat rigorous. For example, one way to compare a time to a predicted time of O(n^2) would be to divide the times for a number of runs with different input sizes by n^2 and see if you observe a horizontal line (after some input size n0). That n0 would represent the n0 value for the asymptotic analysis. The value on the y-axis (assuming you put input size on the x-axis) would tell you the constant value of the big-Oh. Finally, remember that you are supposed to be analyzing the time for the experiments. That is, showing that the number of comparisons is O(n^2) proves nothing here; you need to see that the time is O(n^2) to determine if the asymptotic analysis is any good.

    In particular, you should:

The choice of test data is up to you (i.e., for each sorting subroutine, which input sizes should be tested, how many different inputs of the same size, and which particular inputs of a given size). Be smart about which experiments to run, i.e., don't run larger or more tests than you need to answer the above questions reasonably well. Also, note that you will need to run your experiments several times in order to get stable measurements (times will vary depending upon system load, input, etc.). Your write-up must include a coherent discussion of which experiments you ran, how many times you ran them, etc.
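
A minimal sketch of a repeated-measurement harness is shown below. It assumes the standard C clock() function as the timer, which may differ from the timing mechanism used in the template. The names sort_fn and time_sort are hypothetical, and insertion_sort is only a stand-in for whichever sort is being measured.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void (*sort_fn)(int *a, int n);   /* hypothetical sort signature */

/* Stand-in sort so the sketch is self-contained. */
void insertion_sort(int *a, int n)
{
    int i, j, key;

    for (i = 1; i < n; i++) {
        key = a[i];
        for (j = i - 1; j >= 0 && a[j] > key; j--)
            a[j + 1] = a[j];
        a[j + 1] = key;
    }
}

/* Runs `sort` on a fresh copy of input[0..n-1] `reps` times and returns the
 * mean CPU time per run in seconds (-1.0 if allocation fails). If min_out
 * is non-NULL, the fastest single run is stored there. */
double time_sort(sort_fn sort, const int *input, int n, int reps,
                 double *min_out)
{
    int *work;
    double total = 0.0, best = -1.0;
    int r;

    assert(sort != NULL && input != NULL && n > 0 && reps > 0);
    work = malloc((size_t)n * sizeof *work);
    if (work == NULL)
        return -1.0;

    for (r = 0; r < reps; r++) {
        clock_t start, stop;
        double secs;

        /* Restore the unsorted input outside the timed region. */
        memcpy(work, input, (size_t)n * sizeof *work);
        start = clock();
        sort(work, n);
        stop = clock();
        secs = (double)(stop - start) / CLOCKS_PER_SEC;
        total += secs;
        if (best < 0.0 || secs < best)
            best = secs;
    }
    free(work);
    if (min_out != NULL)
        *min_out = best;
    return total / reps;
}
```

Copying the input before every repetition matters: otherwise every run after the first would be sorting already sorted data. For small n a single run may be shorter than the clock resolution, so larger inputs or more repetitions may be needed.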

Grading on this assignment will put the greatest weight on the choice of test data and the quality and insightfulness of your discussion of your results. Don't be put off too much if there are some discrepancies between the theoretical results and the experiments. If that happens, try to explain why it occurred.
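
The ratio test described in question 6 can be sketched as follows. The function names here (growth_n2, normalize_times, tail_spread) are hypothetical, and the "spread" measure is just one simple way to judge whether the curve has flattened; it is not a method prescribed by the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Predicted growth function to divide by; others can be passed instead. */
double growth_n2(double n) { return n * n; }

/* Fills ratio[i] = time[i] / f(size[i]) for i in [0, k).
 * A roughly constant tail of ratio[] is what one would expect if the time
 * grows like f(n); the tail's value estimates the hidden constant. */
void normalize_times(const double *time, const int *size, size_t k,
                     double (*f)(double), double *ratio)
{
    size_t i;

    assert(time != NULL && size != NULL && f != NULL && ratio != NULL);
    for (i = 0; i < k; i++) {
        assert(size[i] > 0);          /* avoids dividing by f(0) == 0 */
        ratio[i] = time[i] / f((double)size[i]);
    }
}

/* Relative spread (max - min) / max over ratio[from..k-1]. Values near 0
 * suggest the curve is flat from index `from` onward, so size[from] is a
 * candidate for n0. */
double tail_spread(const double *ratio, size_t from, size_t k)
{
    double lo, hi;
    size_t i;

    assert(ratio != NULL && from < k);
    lo = hi = ratio[from];
    for (i = from + 1; i < k; i++) {
        if (ratio[i] < lo) lo = ratio[i];
        if (ratio[i] > hi) hi = ratio[i];
    }
    return hi > 0.0 ? (hi - lo) / hi : 0.0;
}
```

Plotting ratio[] against the input sizes gives the horizontal-line picture described in question 6; tail_spread only puts a number on how flat that line is.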