Average Case Analysis

Some algorithms like quicksort use randomness to avoid deterministic pathalogical behavior. In this case, we can talk about expected runtime for an algorithm.

Quickselect

We’ll begin with a variant of quicksort that finds the \(k\)th smallest element in an unordered list \(a\), or the \(k\)-th order statistic \(a_{(k)}\). After an initial call to partition, we can tell which array the desired entry lies in by looking at their sizes. If the left array has \(k\) or more entries, then it contains the desired value. If it has fewer than \(k-1\) entries, then the right array contains the desired value If it has exactly \(k-1\) entries, then the desired entry is in the pivot location.

from numpy.random import randint

def partition(a, lo, hi):
    """
    choose a pivot in a[lo:hi+1] randomly
    swap all elements of a[lo:hi+1] less than the pivot value to appear before the pivot
    swap all elements of a[lo:hi+1] greater than the pivot value to appear after the pivot
    """
    pi = randint(lo, hi+1) # pivot index
    a[pi], a[hi] = a[hi], a[pi] # put pivot index in last position: swap(a, pi, hi)
    pivot = a[hi]
    i = lo # i is the pivot index for elements we have seen so far
    for j in range(lo, hi+1):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i] # swap(a, i, j)
            i = i+1 # increment pivot index
    a[i], a[hi] = a[hi], a[i] # put pivot in correct place: swap(a, i, hi)
    return i

def quickselect(a, k, lo=0, hi=None):
    """
    perform quickselect algorithm on array a
    find the k-th largest element of a
    modifies a in-place
    """
    if hi is None:
        hi = len(a) - 1
        
    i = partition(a, lo, hi) # returns the position of the pivot
    if k < i:
        return quickselect(a, k, lo, i-1)
    elif k > i:
        return quickselect(a, k, i+1, hi)
    else: # k == i
        return a[k]
import numpy as np
a = np.arange(10)
np.random.shuffle(a)
a
array([7, 5, 2, 6, 3, 4, 0, 9, 1, 8])
quickselect(a, 4)
4
a = [c for c in 'abcdefg']
quickselect(a, 3)
'd'

Analysis of Quickselect

We’ll follow the analysis found in “Algorithms from The Book” by Kenneth Lange in both this an the following section.

Let \(b_n\) denote the expected number of operations to find \(a_{(k)}\) on an array \(a\) of length \(n\). We’ll prove that \(b_n \le 4 n\). Because it takes \(n-1\) operations to create the left and right subarrays, we have \begin{equation} b_n = n-1 + \frac{1}{n} \sum_{j=1}^{k-1} b_{n-j} + \frac{1}{n} \sum_{j=k}^n b_{j-1} \end{equation} The terms in the first summation are the events where we must recurse into the right array, and the terms in the second summation are the events where we must recurse into the left array. The weighting by \(1/n\) reflects the fact that we have equal probability of selecting \(a_{(j)}\) as our pivot for all \(j = 1,\dots, n\).

We now argue that \(b_n \le cn\), which is trivially true for \(n=1\). We proceed by induction. Suppose \(b_k\le ck\) for all \(k \le n-1\). Recall \(\sum_{i=1}^m = \binom{m+1}{2}\), the above expression can be bounded \begin{align} b_n &\le n-1 + \frac{c}{n}\sum_{j=1}^{k-1} (n-j) + \frac{c}{n} \sum_{j=k}^n (j-1)\ &= n-1 + \frac{c}{n}\bigg[n(k-1) - \binom{k}{2} + \binom{n}{2} - \binom{k-1}{2}\bigg]\ &= n-1 + \frac{c}{2n}(n^2 + 2nk - 2k^2 - 3n + 4k - 2) \end{align}

We want to find the value of \(k\) which maximizes this function (a quadratic in \(k\)), which is \(k= n/2 + 1\). Substituting this value, we have \begin{equation} b_n \le n-1 + \frac{c}{2n}{3n^2/2 - n} \end{equation} Which is \(\le cn\) where \(c \ge 4\).

Analysis of Quicksort

Recall that in quicksort, we call the partition function in a divide-and-conquer strategy to sort the array \(a\).

def quicksort(a, lo=0, hi=None):
    """
    perform quicksort algorithm on array a
    performs operations in-place
    """
    if hi is None:
        hi = len(a) - 1
        
    if lo < hi:
        i = partition(a, lo, hi)
        quicksort(a, lo, i-1) # recurse on lower half
        quicksort(a, i+1, hi) # recurse on higher half
        
    return a