Linear Algebra in PyTorch
PyTorch is a popular package for developing models for deep learning. In this section, we’ll look at its linear algebra capabilities.
Even if you are not doing deep learning, you can use PyTorch for linear algebra. One of the nice things about PyTorch is that it makes it easy to take advantage of GPU hardware, which is very efficient at certain operations.
You can find PyTorch tutorials and documentation on the official website. Note that those tutorials focus on deep learning; this section will focus on the linear algebra capabilities.
When you install PyTorch, use the pytorch channel in conda:
conda install pytorch -c pytorch
import numpy as np
import matplotlib.pyplot as plt
import torch
torch.__version__
'1.7.0'
PyTorch Tensors
PyTorch Tensors are just multi-dimensional arrays. You can go back and forth between these and numpy ndarrays.
A = np.random.rand(2,2)
A
array([[0.11111084, 0.55958025],
[0.27637641, 0.05668989]])
B = torch.Tensor(A)
B
tensor([[0.1111, 0.5596],
[0.2764, 0.0567]])
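You can also go back from a torch tensor to a numpy array. A minimal sketch, reusing A and B from above: .numpy() on a CPU tensor returns an array that shares its memory, and torch.from_numpy wraps an existing array without copying.
Bnp = B.numpy()          # view of B's data as a numpy array
print(Bnp)
C = torch.from_numpy(A)  # tensor that shares memory with the numpy array A
print(C)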
To put a tensor on the GPU, use .cuda(). Note that you must have an NVIDIA GPU in your computer to do this successfully.
torch.cuda.is_available()
True
Bcuda = B.cuda()
print(Bcuda)
Bcpu = B.cpu()
print(Bcpu)
tensor([[0.1111, 0.5596],
[0.2764, 0.0567]], device='cuda:0')
tensor([[0.1111, 0.5596],
[0.2764, 0.0567]])
B.to(device='cuda')
tensor([[0.1111, 0.5596],
[0.2764, 0.0567]], device='cuda:0')
To move a tensor back to the CPU, you can use .to(device='cpu').
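For example, a minimal round trip between devices, reusing the B tensor from above (this assumes a CUDA-capable GPU is available):
Bgpu = B.to(device='cuda')    # copy of B on the GPU
Bback = Bgpu.to(device='cpu') # copy back on the CPU
print(Bgpu.device, Bback.device)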
Linear Algebra Functions
PyTorch provides access to a variety of BLAS- and LAPACK-type routines - see the documentation here. These do not follow the BLAS/LAPACK naming conventions.
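For example, a Cholesky factorization is called torch.cholesky rather than the LAPACK name potrf. A minimal sketch, where the 5 x 5 symmetric positive definite matrix is just an illustrative example:
Mnp = np.random.randn(5, 5)
M = torch.Tensor(Mnp @ Mnp.T + 5 * np.eye(5))  # symmetric positive definite
L = torch.cholesky(M)                          # lower-triangular factor
print(torch.norm(M - L @ L.t()))               # should be near zero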
torch.addmv is roughly equivalent to the BLAS gemv routine, and performs \(Ax + y\).
m = 100
n = 100
device = torch.device('cuda') # 'cuda' or 'cpu'
Anp = np.random.randn(m,n)
xnp = np.random.randn(n)
ynp = np.random.randn(m)
A = torch.Tensor(Anp).to(device=device)
x = torch.Tensor(xnp).to(device=device)
y = torch.Tensor(ynp).to(device=device)
z = torch.addmv(y, A, x)
Let’s look at the timing difference between CPU and GPU.
m = 1000
n = 1000
Anp = np.random.randn(m,n)
xnp = np.random.randn(n)
ynp = np.random.randn(m)
print("numpy")
%time z = ynp + Anp @ xnp
for device in ('cpu', 'cuda'):
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    x = torch.Tensor(xnp).to(device=device)
    y = torch.Tensor(ynp).to(device=device)
    z = torch.addmv(y, A, x) # run once to warm up
    %time z = torch.addmv(y, A, x)
numpy
CPU times: user 3.56 ms, sys: 0 ns, total: 3.56 ms
Wall time: 1.01 ms
device = cpu
CPU times: user 655 µs, sys: 263 µs, total: 918 µs
Wall time: 299 µs
device = cuda
CPU times: user 168 µs, sys: 68 µs, total: 236 µs
Wall time: 65.8 µs
torch.mv performs matrix-vector products.
Anp = np.random.randn(m,n)
xnp = np.random.randn(n)
print("numpy")
%time z = Anp @ xnp
for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    x = torch.Tensor(xnp).to(device=device)
    y = torch.mv(A, x) # run once to warm up
    %time y = torch.mv(A, x)
numpy
CPU times: user 2.81 ms, sys: 0 ns, total: 2.81 ms
Wall time: 577 µs
device = cpu
CPU times: user 379 µs, sys: 0 ns, total: 379 µs
Wall time: 311 µs
device = cuda
CPU times: user 124 µs, sys: 47 µs, total: 171 µs
Wall time: 42 µs
torch.mm performs matrix-matrix multiplications.
m = 1000
n = 1000
Anp = np.random.randn(m,n)
Bnp = np.random.randn(n, n)
print("numpy")
%time C = Anp @ Bnp
for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    B = torch.Tensor(Bnp).to(device=device)
    C = torch.mm(A, B) # run once to warm up
    %time C = torch.mm(A, B)
numpy
CPU times: user 54.8 ms, sys: 11.6 ms, total: 66.3 ms
Wall time: 16.8 ms
device = cpu
CPU times: user 26.7 ms, sys: 0 ns, total: 26.7 ms
Wall time: 6.72 ms
device = cuda
CPU times: user 98 µs, sys: 34 µs, total: 132 µs
Wall time: 38.9 µs
Batch operations
Where PyTorch (and GPUs in general) really shines is in batch operations: we get extra efficiency if we do a bunch of multiplications with matrices of the same size.
For batch matrix-matrix multiplication, the function is torch.bmm.
Because tensors are row-major, we want the batch index to be the first index. In the code below, the batch multiplication is equivalent to
for i in range(k):
    C[i] = A[i] @ B[i]
n = 512 # matrix size
k = 32 # batch size
Anp = np.random.randn(k, n, n)
Bnp = np.random.randn(k, n, n)
# see numpy matmul documentation for how this performs batch multiplication
print("numpy")
%time C = np.matmul(Anp, Bnp)
for device in ['cpu', 'cuda']:
    print(f"\ndevice = {device}")
    A = torch.Tensor(Anp).to(device=device)
    B = torch.Tensor(Bnp).to(device=device)
    C = torch.bmm(A, B) # run once to warm up
    %time C = torch.bmm(A, B)
numpy
CPU times: user 440 ms, sys: 44.4 ms, total: 484 ms
Wall time: 122 ms
device = cpu
CPU times: user 109 ms, sys: 27.1 ms, total: 136 ms
Wall time: 34.4 ms
device = cuda
CPU times: user 246 µs, sys: 51 µs, total: 297 µs
Wall time: 301 µs
Sparse Linear Algebra
PyTorch also supports sparse tensors in torch.sparse. Tensors are stored in COOrdinate format.
i = torch.LongTensor([[0, 1, 1],
                      [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5])
torch.sparse.FloatTensor(i, v, torch.Size([2,3])).to_dense()
tensor([[0., 0., 3.],
[4., 0., 5.]])
Indices are stored in a 2 x nnz tensor of Long (a datatype that stores integers); values are stored as floats.
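You can inspect this storage directly after coalescing (which sums duplicate entries and sorts the indices), and sparse tensors work with routines such as torch.sparse.mm for sparse-dense products. A minimal sketch, reusing i and v from above; the random 3 x 4 dense matrix X is just for illustration:
S = torch.sparse.FloatTensor(i, v, torch.Size([2,3]))
Sc = S.coalesce()
print(Sc.indices())  # 2 x nnz Long tensor
print(Sc.values())   # nnz float values

X = torch.randn(3, 4)
print(torch.sparse.mm(S, X))  # sparse-dense matrix product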
Exercise
Write a function that returns a sparse identity matrix of size n in PyTorch.
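One possible solution sketch, using the same COO constructor as above (the function name sparse_eye is just a placeholder):
def sparse_eye(n):
    # nonzeros on the diagonal: indices (0,0), (1,1), ..., (n-1,n-1)
    i = torch.stack([torch.arange(n), torch.arange(n)])
    v = torch.ones(n)
    return torch.sparse.FloatTensor(i, v, torch.Size([n, n]))

print(sparse_eye(3).to_dense())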