# Sorting in Fortran

Whilst sort routines are part of the standard library in many languages, this is not so in Fortran. As I recently produced a collection of various sort algorithms in Fortran, they are now recorded here in case they are of use to anyone else.

All the routines simply sort an array of real values into ascending order. They are provided with very little explanation or other documentation, and certainly without any guarantees of correctness or fitness for purpose.

Two files are provided, sorts.f90 and bench_all.f90. After downloading both, one can try:

```shell
$ gfortran -O3 -c sorts.f90
$ gfortran -O3 bench_all.f90 sorts.o
$ ./a.out
```

to obtain output like

```
Bubble sort                       Shell sort
Size     time/element             Size     time/element
10       5.99999979E-08           10       3.19999991E-08
100      2.39999991E-07           100      7.10000023E-08
1000     2.15999989E-06           1000     1.04999998E-07
10000    2.13399999E-05           10000    1.77000004E-07
                                  100000   2.24999994E-07
                                  1000000  3.33000003E-07

Insertion sort                    Dual pivot quicksort
Size     time/element             Size     time/element
10       9.99999994E-09           10       1.69999996E-08
100      3.99999998E-08           100      3.69999995E-08
1000     3.30000006E-07           1000     5.60000011E-08
10000    3.15000011E-06           10000    7.59999992E-08
100000   3.12699995E-05           100000   9.69999974E-08
                                  1000000  1.16000002E-07
```

The very different scaling of the algorithms is clear. The time per element for bubble sort and insertion sort grows linearly with the size of the array being sorted, so the total time for the sort increases quadratically with array size. In contrast, for the dual pivot quicksort, which for large arrays is the fastest routine given, the time per element barely increases as the array to be sorted grows larger. However, for very small arrays the overheads of the order N.log(N) algorithms mean that an insertion sort is faster.

The routines currently in sorts.f90 are, in approximately increasing order of performance:

### Order N^{2}

- Bubble sort
- Inline selection sort
- Selection sort
- Insertion sort
- Binary insertion sort
- Pair insertion sort
- Double insertion sort
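
To give a flavour of what sorts.f90 contains, a straight insertion sort over a real array might look like the following. This is a sketch only: the routine name and argument conventions are illustrative, not necessarily those of the file itself.

```fortran
! A sketch of a straight insertion sort for reals, ascending order.
! The name and interface are illustrative; sorts.f90 may differ.
subroutine insertion_sort(a, n)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: a(n)
  integer :: i, j
  real :: key

  do i = 2, n
     key = a(i)
     j = i - 1
     ! Shift larger elements up one place until key's slot is found.
     ! The bound test is kept separate from the comparison because
     ! Fortran does not guarantee short-circuit evaluation of .and.
     do while (j >= 1)
        if (a(j) <= key) exit
        a(j+1) = a(j)
        j = j - 1
     end do
     a(j+1) = key
  end do
end subroutine insertion_sort
```

Note the split loop condition: `do while (j >= 1 .and. a(j) > key)` would be tempting, but Fortran is permitted to evaluate both operands of `.and.`, so `a(j)` could be referenced out of bounds when `j` reaches zero.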

### Better than order N^{2}, worse than N.log(N)

- Shell Sort
- Batcher odd-even merge sort
- Batcher odd-even merge sort, Knuth's version
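
Shell sort's improvement over plain insertion sort comes from making gapped insertion passes with a shrinking gap, so elements move long distances early on. A minimal sketch, assuming Knuth's 3h+1 gap sequence (an assumption: sorts.f90 may use a different sequence, which changes the exponent in the "better than N^{2}" scaling):

```fortran
! Sketch of Shell sort with Knuth's 3h+1 gap sequence (1, 4, 13, 40, ...).
! The gap sequence is an assumption; sorts.f90 may use another.
subroutine shell_sort(a, n)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: a(n)
  integer :: gap, i, j
  real :: key

  gap = 1
  do while (gap < n/3)
     gap = 3*gap + 1
  end do
  do while (gap >= 1)
     ! One gapped insertion pass: sort the interleaved subsequences
     ! a(1), a(1+gap), ... ; a(2), a(2+gap), ... ; and so on.
     do i = gap + 1, n
        key = a(i)
        j = i - gap
        do while (j >= 1)
           if (a(j) <= key) exit
           a(j+gap) = a(j)
           j = j - gap
        end do
        a(j+gap) = key
     end do
     gap = gap / 3
  end do
end subroutine shell_sort
```

The final pass, with gap 1, is an ordinary insertion sort, but by then the array is nearly ordered and the pass is cheap.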

### Generally order N.log(N)

- Smoothsort
- Heap sort (with recursion)
- Heap sort (without recursion)
- Merge sort
- Alternating merge sort
- Quicksort (without recursion)
- Quicksort
- Dual pivot quicksort
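
For reference, a classic single-pivot quicksort, partitioning around the middle element in the style of Wirth, can be sketched as below. This is illustrative only; the recursive, non-recursive and dual pivot versions in sorts.f90 may choose pivots and partition differently.

```fortran
! Sketch of a recursive single-pivot quicksort (Wirth-style partition
! around the middle element). Illustrative only; sorts.f90 may differ.
recursive subroutine quick_sort(a, lo, hi)
  implicit none
  real, intent(inout) :: a(*)
  integer, intent(in) :: lo, hi
  integer :: i, j
  real :: pivot, tmp

  i = lo
  j = hi
  pivot = a((lo + hi) / 2)
  do
     do while (a(i) < pivot)   ! scan up for an element >= pivot
        i = i + 1
     end do
     do while (a(j) > pivot)   ! scan down for an element <= pivot
        j = j - 1
     end do
     if (i <= j) then
        tmp = a(i); a(i) = a(j); a(j) = tmp
        i = i + 1
        j = j - 1
     end if
     if (i > j) exit
  end do
  ! Recurse into the two partitions, skipping empty ones.
  if (lo < j) call quick_sort(a, lo, j)
  if (i < hi) call quick_sort(a, i, hi)
end subroutine quick_sort
```

A production version would typically fall back to an insertion sort below some small size threshold, for exactly the crossover reason seen in the benchmark above.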

### Order N

- Radix sort
- Radix sort (optimised)

### Performance

Note that the performance difference between the different N.log(N)
algorithms is not great, and their ordering can be changed by minor
optimisation differences. For the N^{2} algorithms, the
ordering is strongly size dependent: in the example given, the inline
selection sort crosses the binary insertion sort at an array size of
about 30, and the (intrinsic) selection sort at about 60.

For an array size of 1000, the N.log(N) methods all gave total times of
between 56,000 and 150,000 ns. One would therefore use these in
preference to the N^{2} methods, whose times start at
180,000 ns. But it may be surprising that at this size the double
insertion sort is the best N^{2} algorithm, at 180,000 ns,
followed by the pair insertion sort at 240,000 ns, then the binary
insertion sort at 310,000 ns, with the plain insertion sort at
320,000 ns. At a small size of around 70, three of the insertion sorts
are almost indistinguishable in performance, while the binary version
is a factor of two slower.

(Why is the inlined selection sort algorithm ultimately slower than
the non-inlined version? The Fortran compiler used expanded the
intrinsic `maxloc` function in a way which contained no
conditional branches. It was unable to eliminate the conditional
branches from the explicitly inlined code.)
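
The intrinsic-based selection sort referred to above can be sketched as follows; the routine name and details are illustrative rather than the exact code of sorts.f90:

```fortran
! Sketch of a selection sort built on the maxloc intrinsic: repeatedly
! find the largest element of the unsorted prefix and swap it to the
! end. Illustrative only; the routine in sorts.f90 may differ.
subroutine selection_sort_intrinsic(a, n)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: a(n)
  integer :: i, k
  real :: tmp

  do i = n, 2, -1
     ! Index of the largest element in a(1:i); dim=1 makes the
     ! result a scalar rather than a rank-1 array.
     k = maxloc(a(1:i), dim=1)
     if (k /= i) then
        tmp = a(k); a(k) = a(i); a(i) = tmp
     end if
  end do
end subroutine selection_sort_intrinsic
```

The inlined variant would replace the `maxloc` call with an explicit loop and an `if` inside it, which is where the conditional branches the compiler could not eliminate come from.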

The radix sorts are interesting for their linear scaling, lack of any if statements (i.e. unpredictable branches), and being entirely based on integer operations. Casting a double to an int is not easy in Fortran, and even in C it risks breaking the standard. The (slightly) more readable version is also a stable sort; equal elements retain their ordering. The more optimised version is not, in that equal negative elements reverse their ordering.

The linear scaling of radix sorts is a bit of a cheat. Rather than being N.log(N), they are N.W, where W is the size in bits of the items being sorted. So single precision can be sorted about twice as fast as the double precision example here.
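
The usual way round the casting problem is to reinterpret the bits of each real as an integer with `transfer`, then patch up the sign so that signed integer ordering coincides with IEEE real ordering; any integer sort, radix included, can then be applied to the keys. A hedged sketch of that transform (`real_key` is an illustrative name, not a routine from sorts.f90, and the file's actual transform may differ):

```fortran
! Sketch of a key transform for radix-sorting IEEE reals: take the raw
! bit pattern via transfer, then flip the low 31 bits of negative
! values, whose bit patterns otherwise compare in reverse order.
! Illustrative only; sorts.f90 may do this differently.
function real_key(x) result(k)
  use iso_fortran_env, only: int32, real32
  implicit none
  real(real32), intent(in) :: x
  integer(int32) :: k

  k = transfer(x, k)                ! raw IEEE-754 bit pattern
  if (k < 0) k = ieor(k, huge(k))   ! fix reversed ordering of negatives
end function real_key
```

The transform is its own inverse on the integer side, so the reals can be recovered after sorting by applying the same flip and `transfer` back. It also makes the N.W scaling concrete: a double needs a 64-bit key, hence the factor of two against single precision.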

Both radix sorts, and in particular the optimised version, are examples of how not to code: uncommented, unreadable, and probably containing subtle bugs as a result. They are very slow at small sizes, and the advantages of the optimised version evaporate once the data falls out of cache.