Linear Algebra in PyTorch

PyTorch is a popular package for developing deep learning models. In this section, we’ll look at its linear algebra capabilities.

Even if you are not doing deep learning, you can use PyTorch for linear algebra. One of the nice things about PyTorch is that it makes it easy to take advantage of GPU hardware, which is very efficient at certain operations.

You can find PyTorch tutorials and documentation on the official website. Note that these tutorials are focused on deep learning; this section will focus on the linear algebra capabilities.

When installing PyTorch with conda, use the pytorch channel:

conda install pytorch -c pytorch
import numpy as np
import matplotlib.pyplot as plt
import torch
torch.__version__
'1.7.0'

PyTorch Tensors

PyTorch Tensors are just multi-dimensional arrays. You can go back and forth between these and numpy ndarrays.

A = np.random.rand(2,2)
A
array([[0.11111084, 0.55958025],
       [0.27637641, 0.05668989]])
B = torch.Tensor(A)
B
tensor([[0.1111, 0.5596],
        [0.2764, 0.0567]])
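
Going the other direction is just as easy; for example (a short sketch reusing A and B from above):

B.numpy()            # tensor -> numpy ndarray
torch.from_numpy(A)  # numpy ndarray -> tensor, without copying the data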

To put a tensor on the GPU, use cuda(). Note that you must have an NVIDIA GPU in your computer to be able to do this successfully.

torch.cuda.is_available()
True
Bcuda = B.cuda()
print(Bcuda)
Bcpu = B.cpu()
print(Bcpu)
tensor([[0.1111, 0.5596],
        [0.2764, 0.0567]], device='cuda:0')
tensor([[0.1111, 0.5596],
        [0.2764, 0.0567]])
B.to(device='cuda')
tensor([[0.1111, 0.5596],
        [0.2764, 0.0567]], device='cuda:0')

To move a tensor back to the CPU, you can use .to(device='cpu').
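
For example (a minimal sketch, reusing Bcuda from above):

Bback = Bcuda.to(device='cpu')  # equivalent to Bcuda.cpu()
print(Bback.device)             # cpu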

Linear Algebra Functions

PyTorch provides access to a variety of BLAS and LAPACK-type routines - see the PyTorch documentation on BLAS and LAPACK operations. These do not follow the BLAS/LAPACK naming conventions.
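
For example, a couple of LAPACK-style factorizations (a minimal sketch; in more recent releases these are also exposed under torch.linalg):

M = torch.randn(5, 5)
Q, R = torch.qr(M)      # QR factorization
U, S, V = torch.svd(M)  # singular value decomposition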

torch.addmv is roughly equivalent to the BLAS routine gemv, and computes \(Ax + y\)

m = 100
n = 100
device = torch.device('cuda') # 'cuda' or 'cpu'

Anp = np.random.randn(m,n)
xnp = np.random.randn(n)
ynp = np.random.randn(m)

A = torch.Tensor(Anp).to(device=device)
x = torch.Tensor(xnp).to(device=device)
y = torch.Tensor(ynp).to(device=device)

z = torch.addmv(y, A, x)
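
More generally, addmv accepts scaling factors for both terms (a sketch; beta and alpha default to 1):

z2 = torch.addmv(y, A, x, beta=2.0, alpha=0.5)  # computes 2*y + 0.5*(A @ x)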

Let’s look at the timing difference between CPU and GPU

m = 1000
n = 1000

Anp = np.random.randn(m,n)
xnp = np.random.randn(n)
ynp = np.random.randn(m)
print("numpy")
%time z = ynp + Anp @ xnp


for device in ('cpu', 'cuda'):
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    x = torch.Tensor(xnp).to(device=device)
    y = torch.Tensor(ynp).to(device=device)
    z = torch.addmv(y, A, x)

    %time z = torch.addmv(y, A, x)
    
    
numpy
CPU times: user 3.56 ms, sys: 0 ns, total: 3.56 ms
Wall time: 1.01 ms

device = cpu
CPU times: user 655 µs, sys: 263 µs, total: 918 µs
Wall time: 299 µs

device = cuda
CPU times: user 168 µs, sys: 68 µs, total: 236 µs
Wall time: 65.8 µs
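
One caveat about the GPU numbers above: CUDA kernels launch asynchronously, so wall-clock timings can understate the actual compute time unless you synchronize first. A sketch of a more careful measurement:

def timed_addmv(y, A, x):
    z = torch.addmv(y, A, x)
    torch.cuda.synchronize()  # wait for the kernel to finish before returning
    return z

torch.cuda.synchronize()      # make sure no earlier GPU work is still pending
%time z = timed_addmv(y, A, x)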

torch.mv performs matrix-vector products

Anp = np.random.randn(m,n)
xnp = np.random.randn(n)

print("numpy")
%time z = Anp @ xnp


for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    x = torch.Tensor(xnp).to(device=device)
    y = torch.mv(A, x)

    %time y = torch.mv(A, x)
numpy
CPU times: user 2.81 ms, sys: 0 ns, total: 2.81 ms
Wall time: 577 µs

device = cpu
CPU times: user 379 µs, sys: 0 ns, total: 379 µs
Wall time: 311 µs

device = cuda
CPU times: user 124 µs, sys: 47 µs, total: 171 µs
Wall time: 42 µs

torch.mm performs matrix-matrix multiplications

m = 1000
n = 1000

Anp = np.random.randn(m,n)
Bnp = np.random.randn(n, n)

print("numpy")
%time C = Anp @ Bnp

for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    B = torch.Tensor(Bnp).to(device=device)
    C = torch.mm(A, B) # run once to warm up

    %time C = torch.mm(A, B)
numpy
CPU times: user 54.8 ms, sys: 11.6 ms, total: 66.3 ms
Wall time: 16.8 ms

device = cpu
CPU times: user 26.7 ms, sys: 0 ns, total: 26.7 ms
Wall time: 6.72 ms

device = cuda
CPU times: user 98 µs, sys: 34 µs, total: 132 µs
Wall time: 38.9 µs

Batch operations

Where PyTorch (and GPUs in general) really shines is in batch operations. We get extra efficiency if we do a batch of multiplications with matrices of the same size.

For batched matrix-matrix multiplication, the function is torch.bmm.

Because tensors are stored in row-major order, we want the batch index to be the first index. In the code below, the batch multiplication is equivalent to the following loop:

for i in range(k):
    C[i] = A[i] @ B[i]
n = 512 # matrix size
k = 32 # batch size

Anp = np.random.randn(k, n, n)
Bnp = np.random.randn(k, n, n)
# see numpy matmul documentation for how this performs batch multiplication
print("numpy")
%time C = np.matmul(Anp, Bnp)

for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    B = torch.Tensor(Bnp).to(device=device)
    C = torch.bmm(A, B) # run once to warm up

    %time C = torch.bmm(A, B)
numpy
CPU times: user 440 ms, sys: 44.4 ms, total: 484 ms
Wall time: 122 ms

device = cpu
CPU times: user 109 ms, sys: 27.1 ms, total: 136 ms
Wall time: 34.4 ms

device = cuda
CPU times: user 246 µs, sys: 51 µs, total: 297 µs
Wall time: 301 µs

Sparse Linear Algebra

PyTorch also supports sparse tensors in torch.sparse. Tensors are stored in COOrdinate format.

i = torch.LongTensor([[0, 1, 1],
                      [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5])
torch.sparse.FloatTensor(i, v, torch.Size([2,3])).to_dense()
tensor([[0., 0., 3.],
        [4., 0., 5.]])

Indices are stored in a 2 x nnz tensor of type Long (a datatype that stores integers); values are stored as floats.
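
The same sparse tensor can also be constructed with torch.sparse_coo_tensor (a sketch of an equivalent construction, reusing i and v from above):

S = torch.sparse_coo_tensor(i, v, (2, 3))
S.to_dense()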

Exercise

Write a function that returns a sparse identity matrix of size n in PyTorch.
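
One possible approach (a sketch, not necessarily the intended solution): place values of 1 at the diagonal indices (i, i).

def sparse_eye(n):
    i = torch.arange(n)            # diagonal positions 0, ..., n-1
    indices = torch.stack([i, i])  # shape (2, n): row and column indices
    values = torch.ones(n)
    return torch.sparse_coo_tensor(indices, values, (n, n))

sparse_eye(3).to_dense()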