Integration & Quadrature

First, let’s recall a few definitions. An indefinite integral is an integral without bounds, and is defined up to a constant \begin{equation} \int x, dx = \frac{x^2}{2} + C \end{equation} A definite integral has bounds, which are sometimes symbolic \begin{equation} \int_0^y 1, dx = y \end{equation}

Symbolic Integration

sympy includes a variety of functionality for integration.

import sympy as sym
x, y = sym.symbols('x y')

You can compute indefinite integrals

sym.integrate(1, x) 
\[\displaystyle x\]
sym.integrate(x**2 + x - 1)
\[\displaystyle \frac{x^{3}}{3} + \frac{x^{2}}{2} - x\]
sym.integrate(x*y, y)
\[\displaystyle \frac{x y^{2}}{2}\]

If you want to compute definite integrals, you can pass in the bounds

sym.integrate(1, (x, 0, 2))
\[\displaystyle 2\]
sym.integrate(sym.sin(x), (x, 0, sym.pi))
\[\displaystyle 2\]
sym.integrate(x*y, (y, 0, 1))
\[\displaystyle \frac{x}{2}\]
sym.integrate(x, (x, 0, y))
\[\displaystyle \frac{y^{2}}{2}\]

SymPy also provides integral transforms such as the Fourier transform

k = sym.Symbol('k')
sym.fourier_transform(sym.exp(-x**2), x, k)
\[\displaystyle \sqrt{\pi} e^{- \pi^{2} k^{2}}\]

You can find more complete information in the sympy documentation

Numerical Integration (Quadrature)

Numerical integration is used to obtain definite integrals.

First, we should recall the definition of the Riemannian integral: \begin{equation} \int_a^b f(x), dx = \lim_{n\to \infty} \frac{b-a}{n}\sum_{i=1}^n f(x_i) (x_{i+1} - x_i) \end{equation} where the sequence \(\{x_i\}\) is built for each \(n\) so that \(x_{i+1} - x_{i} = (b-a)/n\). It is useful to think of how integrals compute area under a curve. In this case, we are (approximately) computing this area as the sum of rectangles with height \(f(x_i)\) and width \(x_{i+1}-x_i\). More generally, we can think of this as a weighted sum of function values \begin{equation} \sum_{i=1}^n f(x_i) w_i \end{equation} Which looks more like Lesbegue integration, where \(w_i\) is the measure assigned to the point \(x_i\).

In order to numerically integrate functions, we will take a finite sum of weighted function values, where the weights and function evaluation points are designed to give a good approximation to the integral. This is known as quadrature (instead of inegration).

Trapezoidal Rule

One of the simplest quadrature rules is trapezoid rule, which approximates an area under a curve by trapezoids defined by the four points \((x_i,0), (x_{i+1},0), (x_i, f(x_i)), (x_{i+1}, f(x_{i+1}))\). The area of this trapezoid is \begin{equation} \frac{1}{2}\bigg(f(x_{i+1}) + f(x_i)\bigg)(x_{i+1} - x_i) \end{equation} If we have \(a = x_0 \le x_1 \le ... \le x_{n} = b\) equally spaced, then the quadrature weights for \(x_i\) are \(w_i = (b-a)/n\) if \(i\notin \{0,n\}\), and \(w_i= (b-a)/(2n)\) if \(i \in \{0,n\}\)

You can use trapezoid rule in scipy using scipy.integrate.trapz

import numpy as np
from scipy.integrate import trapz

x = np.linspace(0,1,100)
y = x**2
trapz(y, x) # actual answer is 1/3
0.33335033840084355

Other quadrature rules that you can use easily are Simpson’s rule (simps) and Romberg’s rule (romb)

from scipy.integrate import simps

simps(y,x) # again, answer is 1/3
0.333333505101692

Gaussian Quadrature

Gaussian quadrature computes points x_i and weights w_i to optimally integrate polynomials up to some fixed order. This is all wrapped up in fixed_quad or quadrature functions

from scipy.integrate import fixed_quad, quadrature

f = lambda x: x**10

fixed_quad integrates up to fixed polynomial order

1/11
0.09090909090909091
fixed_quad(f, 0, 1, n=11) # n is order of polynomial
(0.09090909090909102, None)

The return tuple is the computed result and None.

quadrature integrates up to a fixed tolerance

quadrature(f, 0, 1)
(0.09090909090909097, 1.1102230246251565e-16)
for ifunction in (fixed_quad, quadrature):
    print(ifunction(f, 0, 1)[0])
0.09090765936004021
0.09090909090909097

The return tuple is the computed result and the error, measured as the difference between the last two estimates.

General Quadrature

The function quad wraps a varitety of functionality. Basic use is just like above

from scipy.integrate import quad

f = lambda x: x**2

quad(f, 0, 1)
(0.33333333333333337, 3.700743415417189e-15)

Again, the function returns the computed result, and an estimate of the absolute error.

You can also compute integrals on unbounded domains

f = lambda x: np.exp(-x**2)

quad(f, 0, np.inf)
(0.8862269254527579, 7.101318390472462e-09)

Functions with Discontinuities

Discontinuities can cause problems with any sort of quadrature scheme that assumes functions are continuous. You can specify problematics points with points

import matplotlib.pyplot as plt

f = lambda x : np.floor(x**2)

x = np.linspace(0,2,100)
plt.plot(x, f(x))
plt.show()
../_images/integration_34_0.png
quad(f, 0, 2)
(1.8537356322944702, 9.277312917888025e-09)
quad(f, 0, 2, points=np.sqrt(np.arange(5)))
(1.8537356300580277, 2.0580599780584437e-14)

Functions with Singularities

Depending on the type of singularity, you can pass in different arguments into the weight and wvar parameters of quad. See the documentation for details.

For example, let’s say we want to compute

\begin{equation} \int_{-1}^2 \frac{1}{x},dx \end{equation}

f = lambda x : 1 / x

x = np.linspace(-1,2,100)
plt.plot(x, f(x))
plt.show()
<ipython-input-22-74bd56e45931>:1: RuntimeWarning: divide by zero encountered in true_divide
  f = lambda x : 1 / x
../_images/integration_38_1.png

There is an odd symmetry in the interval \([-1,1]\), so we can calculate

\begin{equation} \int_{-1}^2 \frac{1}{x},dx = \int_1^2 \frac{1}{x},dx = \log(2) - \log(1) = \log(2) \end{equation}

The 'cauchy' weight will calculate the integral of \(\frac{f(x)}{x - {\tt wvar}}\) (i.e. we will weight the fuction \(f\) by the Cauchy distribution \(\frac{1}{x - {\tt wvar}}\)).

So we can use weight='cauchy' with wvar=0 on the function f(x) = 1 to compute our integral.

quad(lambda x : 1, -1, 2, weight='cauchy', wvar=0) # mulitply function by 1/x and do integration
(0.6931471805599452, 0.0)
np.log(2)
0.6931471805599453

Multivariate Integration

You can integrate over functions of multiple variables using

nquad integrates over a hyper cube defined by ranges in the second argument. dblquad and tplquad can allow you to integrate over more complicated domains.

from scipy.integrate import nquad

f = lambda x, y, z : np.exp( -x**2 - y**2 - z**2)

nquad(f, ((-3,3), (-3,3), (-3,3)))
(5.56795898358481, 1.4605401208842099e-08)

Monte Carlo Integration

Monte Carlo integration is a method which computes integrals by taking a sum over random samples.

\begin{equation} \int_{a}^b f(x) = \mathbb{E}_{U(a,b)} [f] \end{equation}

Where \(U(a,b)\) is the uniform distribution over the interval \([a,b]\). We can estimate this expected value by drawing samples from the distribution, and computing

\begin{equation} \mathbb{E}{U(a,b)} [f] \approx \frac{1}{N} \sum{i=1}^N f(x_i), x_i \sim U(a,b) \end{equation}

Or, more generally, we might define an integral over a probability measure \(\mu\)

\begin{equation} \int f(x), d\mu(x) = \mathbb{E}{\mu} [f] = \frac{1}{N} \sum{i=1}^N f(x_i), x_i \sim \mu(x) \end{equation}

The expected error can be derived from the law of large numbers, and is typically \(\frac{1}{\sqrt{N}}\). In high dimensions, this is typically the most effictive way to integrate functions, as quadrature requires a number of samples that is exponential in terms of the dimension.

Example

The classic example of Monte Carlo integration is computing \(\pi\) using random samples. Recall the area of the unit circle is \(\pi\), and the area of the square \([-1,1]^2\) is \(4\). We sample points uniformly from this square, and compute how many fall in the unit circle - the expected ratio of points inside the circle to the total number of points is \(\pi/4\).

def compute_pi_mc(n=1000):
    x = np.random.rand(n,2)*2 - 1 # sampled in [-1,1] x [-1,1]
    r = np.linalg.norm(x, axis=1) # computes radius of each point
    return 4 * np.sum(r <= 1.) / n
%time compute_pi_mc(10_000_000)
CPU times: user 390 ms, sys: 41.9 ms, total: 432 ms
Wall time: 433 ms
3.1413256

We can use numba to parallelize this

from numba import njit

@njit
def compute_pi_mc_numba(n=1000):
    x = np.random.rand(n,2)*2 - 1
    ct = 0
    for i in range(n):
        ct += x[i,0]**2 + x[i,1]**2 < 1
    return 4 * ct / n
%time compute_pi_mc_numba(10_000_000)
CPU times: user 749 ms, sys: 112 ms, total: 861 ms
Wall time: 880 ms
3.1416048
from numba import njit, prange

@njit(parallel=True)
def compute_pi_mc_numba_parallel(n=1000):
    x = np.random.rand(n,2)*2 - 1
    ct = 0
    for i in prange(n):
        ct += x[i,0]**2 + x[i,1]**2 < 1
    return 4 * ct / n
%time compute_pi_mc_numba_parallel(10_000_000)
CPU times: user 1.21 s, sys: 190 ms, total: 1.4 s
Wall time: 966 ms
3.1408412

if we look at the wall time, we see this runs faster than any of the other options