Average Case Analysis
Contents
Average Case Analysis¶
Some algorithms like quicksort use randomness to avoid deterministic pathalogical behavior. In this case, we can talk about expected runtime for an algorithm.
Quickselect¶
We’ll begin with a variant of quicksort that finds the \(k\)th smallest element in an unordered list \(a\), or the \(k\)-th order statistic \(a_{(k)}\). After an initial call to partition
, we can tell which array the desired entry lies in by looking at their sizes. If the left array has \(k\) or more entries, then it contains the desired value. If it has fewer than \(k-1\) entries, then the right array contains the desired value If it has exactly \(k-1\) entries, then the desired entry is in the pivot location.
from numpy.random import randint
def partition(a, lo, hi):
"""
choose a pivot in a[lo:hi+1] randomly
swap all elements of a[lo:hi+1] less than the pivot value to appear before the pivot
swap all elements of a[lo:hi+1] greater than the pivot value to appear after the pivot
"""
pi = randint(lo, hi+1) # pivot index
a[pi], a[hi] = a[hi], a[pi] # put pivot index in last position: swap(a, pi, hi)
pivot = a[hi]
i = lo # i is the pivot index for elements we have seen so far
for j in range(lo, hi+1):
if a[j] < pivot:
a[i], a[j] = a[j], a[i] # swap(a, i, j)
i = i+1 # increment pivot index
a[i], a[hi] = a[hi], a[i] # put pivot in correct place: swap(a, i, hi)
return i
def quickselect(a, k, lo=0, hi=None):
"""
perform quickselect algorithm on array a
find the k-th largest element of a
modifies a in-place
"""
if hi is None:
hi = len(a) - 1
i = partition(a, lo, hi) # returns the position of the pivot
if k < i:
return quickselect(a, k, lo, i-1)
elif k > i:
return quickselect(a, k, i+1, hi)
else: # k == i
return a[k]
import numpy as np
a = np.arange(10)
np.random.shuffle(a)
a
array([7, 5, 2, 6, 3, 4, 0, 9, 1, 8])
quickselect(a, 4)
4
a = [c for c in 'abcdefg']
quickselect(a, 3)
'd'
Analysis of Quickselect¶
We’ll follow the analysis found in “Algorithms from The Book” by Kenneth Lange in both this an the following section.
Let \(b_n\) denote the expected number of operations to find \(a_{(k)}\) on an array \(a\) of length \(n\). We’ll prove that \(b_n \le 4 n\). Because it takes \(n-1\) operations to create the left and right subarrays, we have \begin{equation} b_n = n-1 + \frac{1}{n} \sum_{j=1}^{k-1} b_{n-j} + \frac{1}{n} \sum_{j=k}^n b_{j-1} \end{equation} The terms in the first summation are the events where we must recurse into the right array, and the terms in the second summation are the events where we must recurse into the left array. The weighting by \(1/n\) reflects the fact that we have equal probability of selecting \(a_{(j)}\) as our pivot for all \(j = 1,\dots, n\).
We now argue that \(b_n \le cn\), which is trivially true for \(n=1\). We proceed by induction. Suppose \(b_k\le ck\) for all \(k \le n-1\). Recall \(\sum_{i=1}^m = \binom{m+1}{2}\), the above expression can be bounded \begin{align} b_n &\le n-1 + \frac{c}{n}\sum_{j=1}^{k-1} (n-j) + \frac{c}{n} \sum_{j=k}^n (j-1)\ &= n-1 + \frac{c}{n}\bigg[n(k-1) - \binom{k}{2} + \binom{n}{2} - \binom{k-1}{2}\bigg]\ &= n-1 + \frac{c}{2n}(n^2 + 2nk - 2k^2 - 3n + 4k - 2) \end{align}
We want to find the value of \(k\) which maximizes this function (a quadratic in \(k\)), which is \(k= n/2 + 1\). Substituting this value, we have \begin{equation} b_n \le n-1 + \frac{c}{2n}{3n^2/2 - n} \end{equation} Which is \(\le cn\) where \(c \ge 4\).
Analysis of Quicksort¶
Recall that in quicksort, we call the partition function in a divide-and-conquer strategy to sort the array \(a\).
def quicksort(a, lo=0, hi=None):
"""
perform quicksort algorithm on array a
performs operations in-place
"""
if hi is None:
hi = len(a) - 1
if lo < hi:
i = partition(a, lo, hi)
quicksort(a, lo, i-1) # recurse on lower half
quicksort(a, i+1, hi) # recurse on higher half
return a