Assignment 6: Mystery sort

Due: 5:00pm, Friday, October 10. Value: 30 pts.

The MysterySort program reads a sequence of numbers from a file and sorts them, printing the total amount of time consumed during the sorting process. It allows you to choose between five sorting algorithms, labeled A through E, corresponding to the following, though not necessarily in that order.

selection sort
insertion sort
merge sort
quick sort with random pivot
quick sort with the pivot being the center number in the array

Your should determine which algorithm corresponds to each letter and present your analysis in a paper.

To download MysterySort, right-click this link and select Save Link As… or Save Target As…. To start the program, you would run the following command, which says to run algorithm A on the numbers found in the file named numbers.txt. (This should work under Windows, MacOS, or Linux.)

prompt% java -jar MysterySort.jar E numbers.txt
     2.389ms     E   10000       419 trials

The output of MysterySort, illustrated above, includes four columns. The first column is the important one: It reports the number of milliseconds needed to sort the array. The next two columns simply describe the parameters, including the sort identifier (E) and the size of array sorted. The fourth column mentions the number of trials; because of computers measure short times very poorly, the program repeatedly sorts the array until at least one second has passed, and then it reports the average time per sort. This last number is an artifact of the measurement technique and should be ignored.

To generate a file of numbers, you can download the program Generate.java. The below example compiles this program and then executes it, saying to create a file named numbers.txt containing 10,000 numbers.

prompt% javac Generate.java
prompt% java Generate 10000 numbers.txt

As written, Generate.java creates a file with the numbers 0 through n - 1 in random order. Random order is useful for testing, but modifying the generation procedure is useful to distinguish different algorithms' performance.

One possible pitfall is to make assumptions about the built-in implementation. For example, you should not assume that the fastest one will be quicksort just because I said in class that it tends to be faster with most implementations. Instead, you should specific evidence that distinguishes the sorting algorithms based on their behavior on different inputs.

Another pitfall is to attempt to judge an algorithm's Big-O performance based on a single value of n. Such an analysis is not valid. To estimate a Big-O bound for a program, you need to measure performance for multiple n values and examine the results. For an O(n) program, for example, you would expect that doubling n would roughly double the amount of time the program takes.

In the paper you hand in, you should describe the tests that lead to your final conclusion, the results of those tests (including the raw measurements), your conclusion (i.e., the specific mapping from letters to algorithms), and your rationale behind the conclusions. Do not describe tests that turn out to be unimportant to your conclusions.

You will not submit any code for this assignment.

Your paper will be evaluated based on the correctness of your conclusions, the soundness of your reasoning, and the quality of your writing. The quality of your writing includes issues such as grammar, spelling, clarity of presentation, and ability to engage a reader (i.e., don't be too boring!). Your paper should not exceed two pages, and indeed a full two pages is likely to be too long.

(This assignment concept comes from David Levine of Saint Bonaventure University.)