Convolutions and Fourier Transforms

A convolution is a linear operator of the form \begin{equation} (f \ast g)(t) = \int f(\tau) g(t - \tau ) d\tau \end{equation} In a discrete space, this turns into a sum \begin{equation} \sum_\tau f(\tau) g(t - \tau) \end{equation}

Convolutions are shift invariant, or time invariant. They frequently appear in temporal and spatial image processing, as well as in probability.

If we want to represent the discrete convolution operator as a matrix, we get a toeplitz matrix \(G\) \begin{equation} G = \begin{bmatrix} g(0) & g(1) & g(2) & \dots\ g(-1) & g(0) & g(1) &\dots\ g(-2) & g(-1) & g(0) & \dots\ \vdots & & &\ddots \end{bmatrix} \end{equation}

A circulant matrix acting on a discrete signal of length \(N\) is a toeplitz matrix where \(g(-i) = g(N-i)\).

import numpy as np
import scipy as sp
import scipy.linalg as la
import matplotlib.pyplot as plt
c = np.arange(0,-4, -1) # first column
r = np.arange(4) # first row
la.toeplitz(c, r)
array([[ 0,  1,  2,  3],
       [-1,  0,  1,  2],
       [-2, -1,  0,  1],
       [-3, -2, -1,  0]])
c = np.arange(4)
la.circulant(c)
array([[0, 3, 2, 1],
       [1, 0, 3, 2],
       [2, 1, 0, 3],
       [3, 2, 1, 0]])

both toeplitz and circulant matrices have special solve function in scipy.linalg

N = 100
c = np.random.randn(N)
r = np.random.randn(N)
A = la.toeplitz(c, r)

x = np.random.rand(N)

print("standard solve")
%time y1 = la.solve(A, x)

print("\ntoeplitz solve")
%time y2 = la.solve_toeplitz((c,r), x)

print(la.norm(y1- y2))
standard solve
CPU times: user 784 µs, sys: 3.02 ms, total: 3.8 ms
Wall time: 14.6 ms

toeplitz solve
CPU times: user 209 µs, sys: 0 ns, total: 209 µs
Wall time: 166 µs
7.794609144684837e-12
N = 100
c = np.random.randn(N)
A = la.circulant(c)

x = np.random.randn(N)

print("standard solve")
%time y1 = la.solve(A, x)

print("\ncirculant solve")
%time y2 = la.solve_circulant(c, x)

print(la.norm(y1- y2))
standard solve
CPU times: user 975 µs, sys: 0 ns, total: 975 µs
Wall time: 612 µs

circulant solve
CPU times: user 184 µs, sys: 23 µs, total: 207 µs
Wall time: 168 µs
4.367350761595999e-15

Example

One place where convolutions appear is in combining probability distributions. For instance, if we take a sample \(x \sim y + z\) where \(y \sim f\) and \(z \sim g\) and \(f, g\) are probability density functions, then the probability density function that describes the random variable \(x\) is \(f\ast g\).

Fourier Transforms

A Fourier transform takes functions back and forth between time and frequency domains. \begin{equation} \hat{f}(\omega) = \int_{-\infty}^\infty f(x) e^{-2\pi i x \omega} dx \end{equation}

A discrete Fourier transform (DFT) operates on a signal represented as a finite dimensional vector. On a signal of length \(N\), we have

\begin{equation} \hat{f}k = \frac{1}{N} \sum{n=0}^{N-1} f_n e^{-2\pi i k n / N} \end{equation}

In practice, discrete Fourier transforms are often computed using the Fast Fourier Transform (FFT) algorithm, which runs in \(O(N\log N)\) time for a signal of length \(N\). Scipy provides funcitons for FFT as well as the inverse iFFT.

from scipy.fft import fft, ifft

x = np.linspace(0, 2*np.pi, 100)
f = np.sin(4*x) + np.sin(8*x)
plt.plot(x, f)
plt.show()
../_images/convolutions_9_0.png

the function in frequency space typically has real and imaginary components.

fhat = fft(f)
fig, ax = plt.subplots(1,3, figsize=(12,4))
ax[0].plot(np.abs(fhat))
ax[0].set_title("abs")

ax[1].plot(np.real(fhat))
ax[1].set_title("real")

ax[2].plot(np.imag(fhat))
ax[2].set_title("complex")
plt.show()
../_images/convolutions_11_0.png

real-valued signals should have anti-symmetric complex components and symmetric real components.

f2 = ifft(fhat)
plt.plot(np.real(f2))
plt.show()
../_images/convolutions_13_0.png

Fourier transforms can reveal a variety of useful information and have many useful properties. One of which is that convolutions in the time/spatial domain become pointwise multiplication in the frequency domain.

\begin{equation} h = f \ast g \Leftrightarrow \hat{h} = \hat{f} \hat{g} \end{equation}

This means that circulant matrices and their inverses can be applied in \(O(N \log N)\) time using the FFT.

def apply_circulant(c, x):
    """
    apply circulant matrix with first row c to a vector x
    
    y = C x
    """
    c_hat = fft(c)
    x_hat = fft(x)
    y_hat = c_hat * x_hat
    return ifft(y_hat)
N = 100
c = np.random.randn(N)
A = la.circulant(c)

x = np.random.randn(N)

%time y1 = A @ x
%time y2 = apply_circulant(c, x)

la.norm(y1 - y2)
CPU times: user 24 µs, sys: 2 µs, total: 26 µs
Wall time: 29.3 µs
CPU times: user 716 µs, sys: 12 µs, total: 728 µs
Wall time: 508 µs
3.8977870569728296e-14
def apply_circulant_inv(c, x):
    """
    apply inverse of circulant matrix with first row c to a vector x
    
    y = C^{-1} x
    """
    c_hat = fft(c)
    x_hat = fft(x)
    y_hat = x_hat / c_hat
    return ifft(y_hat)
N = 100
c = np.random.randn(N)
A = la.circulant(c)

x = np.random.randn(N)
b = A @ x

%time y1 = la.solve_circulant(c, x)
%time y2 = apply_circulant_inv(c, x)

la.norm(y1 - y2)
CPU times: user 534 µs, sys: 44 µs, total: 578 µs
Wall time: 301 µs
CPU times: user 131 µs, sys: 0 ns, total: 131 µs
Wall time: 91.3 µs
3.471506004545703e-15

Exercise 1

Create a plot that demonstrates the asymptotic scaling of the FFT is \(O(N \log N)\)

## Your code here

Exercise 2

Let \(C\) be a circulant matrix.

Given that the inverse of a circulant matrix can also be applied using a Fourier transform, is \(C^{-1}\) also a circulant matrix?

Write a function to compute the inverse of \(C\).

## Your code here

Higher Dimensions

In 2 dimensions, you can use scipy’s fft2 and ifft2 for 2-dimensional signals such as images. For an \(m \times n\) image, the time complexity is \(O(mn \log(mn))\)

For general image processing, you may find it useful to install scikit-image

(pycourse) conda install scikit-image

Sparse Convolutions

In image processing, you may often encounter sparse convolutions, particularly convolutions that are very localized (e.g. only the first few entries of the firs row c are nonzero). One example is that many camera lenses cause a slight blur which mixes light from nearby sources. In deep learning, these sparse convolutions have been very effective in image processing tasks. Perhaps the simplest explanation of why learning parameters for a convolution is a good idea in this situation is that convolutions are translation invariant allowing the neural net to learn regardless of where a target appears on an image.

In this case, we typically think of convolutions as \(k\times k\) patches which slide over a 2-dimensional image.

The time complexity of applying this convolution using standard for-loops to a \(m\times n\) image is \(O(k^2 mn)\), which is typically faster than using a Fourier transform. GPUs are also very good at this sort of operation.

import skimage

img = skimage.data.camera()
plt.imshow(img, cmap=plt.cm.gray)
plt.show()
../_images/convolutions_25_0.png
from numba import njit

@njit
def apply_convolution_2d(img, c):
    m, n = img.shape
    k1, k2 = c.shape
    img2 = np.zeros((m-k1, n-k2))
    for i in range(m-k1):
        for j in range(n-k2):
            for i1 in range(k1):
                for j1 in range(k2):
                    img2[i,j] = img2[i,j] + img[i+i1, j+j1] * c[i1, j1]
    return img2
c = np.ones((10,10)) / 100 # local averageing operator
%time img2 = apply_convolution_2d(img, c)

plt.imshow(img2, cmap=plt.cm.gray)
plt.show()
CPU times: user 220 ms, sys: 10.7 ms, total: 231 ms
Wall time: 232 ms
../_images/convolutions_27_1.png

If you’re using this sort of convolution in PyTorch, you can use torch.nn.Conv2d or torch.conv2d

import torch

conv = torch.nn.Conv2d(1, 1, (3,3)) # 3 x 3 convolution
c = torch.Tensor(np.ones((1,1,10,10)) / 100)
imgt = torch.Tensor(img).reshape(1,1,*img.shape) # add batch and channel dimension
%time imgt2 = torch.conv2d(imgt, c)
CPU times: user 26.6 ms, sys: 4.14 ms, total: 30.7 ms
Wall time: 7.57 ms
img2 = imgt2.numpy()
img2 = img2[0,0,:,:]

plt.imshow(img2, cmap=plt.cm.gray)
plt.show()
../_images/convolutions_32_0.png

Discrete Cosine Transform (DCT)

One of the potential annoyances with Fourier transforms is that even with real-valued signals they produce complex output. If you want to stick to real output with a similar interpretation, you can use the Discrete Cosine Transform (DCT) or Discrete Sine Transform (DST). Both are implemented in Scipy.

Scipy DCT

from scipy.fft import dct, idct

x = np.linspace(0, 2*np.pi, 100)
f = np.sin(4*x) + np.sin(8*x)

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].plot(x, f)
ax[0].set_title("signal")

fhat = dct(f)
ax[1].plot(fhat)
ax[1].set_title("dct")
plt.show()
../_images/convolutions_34_0.png
x = np.linspace(0, 2*np.pi, 100, endpoint=False)
f = np.cos(4*x) + np.cos(8*x)

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].plot(x, f)
ax[0].set_title("signal")

fhat = dct(f)
ax[1].plot(fhat)
ax[1].set_title("dct")
plt.show()
../_images/convolutions_35_0.png

The DCT can also be used for a simple method of image compression - you can simply remove the high frequency componenents of an image and retain most of the large structure while losing some detail.

from scipy import ndimage
from scipy.fft import dctn, idctn
import skimage
img = skimage.data.camera()
plt.imshow(img, cmap=plt.cm.gray)
plt.show()
../_images/convolutions_38_0.png
ihat = dctn(img)
plt.imshow(np.log(np.abs(ihat)))
plt.show()
../_images/convolutions_39_0.png

Now, we’ll cut out the highest-frequency DCT coefficients.

m, n = ihat.shape
ihat2 = ihat[:m//2, :n//2]
img2 = idctn(ihat2)
plt.imshow(img2, cmap=plt.cm.gray)
plt.show()
../_images/convolutions_41_0.png

alternatively, we can make the image the original size by just setting the high-frequency DCT coefficients to 0.

ihat2 = np.zeros_like(ihat)
ihat2[:m//2, :n//2] = ihat[:m//2, :n//2]
img2 = idctn(ihat2)
plt.imshow(img2, cmap=plt.cm.gray)
plt.show()
../_images/convolutions_43_0.png

Now, let’s really compress the image…

ihat2 = np.zeros_like(ihat)
ihat2[:m//10, :n//10] = ihat[:m//10, :n//10]
img2 = idctn(ihat2)
plt.imshow(img2, cmap=plt.cm.gray)
plt.show()
../_images/convolutions_45_0.png