Basic Containers and Packages

Python Lists

Lists are a data structure provided as part of the Python language. For details, see the "Lists" and "More on Lists" sections of the Python tutorial.

A list is a compound data type: a mutable, indexed, and ordered collection of data.

Lists are often constructed using square brackets: [...]

x0 = [1, 2, 3] # list of numbers
print(x0)
print(type(x0))

[1, 2, 3]
<class 'list'>

x1 = ['hello', 'world'] # list of strings
print(x1)
print(type(x1))

['hello', 'world']
<class 'list'>

x = [x0, x1] # list of lists
print(x)
print(type(x))

[[1, 2, 3], ['hello', 'world']]
<class 'list'>

Usually lists will be generated programmatically. One way to do this is with the append method:

x = [] # empty list
for i in range(5):
    x.append(i)
x

[0, 1, 2, 3, 4]

or with the extend method, which appends all elements of another list (or any other iterable):

x = [1,2,3]
y = [4,5,6]
x.extend(y) # extends x by y
x

[1, 2, 3, 4, 5, 6]
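A common point of confusion is how extend differs from append when the argument is itself a list. The short sketch below (variable names are illustrative) shows that append adds its argument as one element, while extend adds each element of any iterable.

```python
# append adds its argument as a single element, even if it is a list
a = [1, 2, 3]
a.append([4, 5])
print(a)  # [1, 2, 3, [4, 5]]

# extend adds each element of its argument; any iterable works, not just lists
b = [1, 2, 3]
b.extend([4, 5])
b.extend(range(6, 8))
print(b)  # [1, 2, 3, 4, 5, 6, 7]
```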

You can also concatenate lists using the + operator, which creates a new list and leaves both operands unchanged:

[1,2,3] + [4,5,6]

[1, 2, 3, 4, 5, 6]

You can also generate lists using list comprehensions. Comprehensions are considered “Pythonic”, a vague term roughly meaning

Pythonic: “something a Python programmer would write”

x = [i for i in range(5)]
x

[0, 1, 2, 3, 4]

x = [i * i for i in range(5)]
x

[0, 1, 4, 9, 16]

Generally, comprehensions have the form [expression loop conditional], where the conditional is optional.

This looks a lot like set notation in mathematics. E.g. for the set $$y = \{i \mid i \in x, i \ne 4\}$$ we compute

y = [i for i in x if i != 4]
y

[0, 1, 9, 16]
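To make the [expression loop conditional] form concrete, here is a minimal sketch comparing the comprehension above with the explicit for loop it abbreviates; y_loop is just an illustrative name.

```python
x = [i * i for i in range(5)]

# comprehension form: [expression loop conditional]
y = [i for i in x if i != 4]

# the equivalent explicit loop
y_loop = []
for i in x:          # loop
    if i != 4:       # conditional
        y_loop.append(i)  # expression

print(y == y_loop)  # True
```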

Indexing

Python is 0-indexed (like C, unlike Fortran and MATLAB). This means a list of length n has indices that start at 0 and end at n-1.

This is why range(n) iterates through 0, ..., n-1: these are exactly the valid indices of a list of length n.

words = ["dog", "cat", "house"]
print(words[0])
print(words[1])

dog
cat

You can access elements starting from the back of the list using negative integers. A good way to think of this is that index -1 translates to n-1 (and, in general, -k translates to n-k).

print(words[-1])
print(words[-2])

house
cat
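The rule that index -k translates to n-k can be checked directly. This sketch reuses the words list from above and also shows that indices outside the range -n, ..., n-1 raise an IndexError.

```python
words = ["dog", "cat", "house"]
n = len(words)

# words[-k] refers to the same element as words[n - k]
for k in range(1, n + 1):
    assert words[-k] == words[n - k]

# valid indices run from -n to n-1; anything else raises IndexError
try:
    words[n]
except IndexError as err:
    print("IndexError:", err)
```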

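Slicing: you can use the colon character : to slice a list. The syntax is start:stop:stride, where start is inclusive and stop is exclusive. Omitted start and stop values default to the ends of the list (reversed for a negative stride), and the stride defaults to 1.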

x = [i for i in range(10)]
print(x)
print(x[:])
print(x[2:4])
print(x[2:9:3])
print(x[-3:-1])
print(x[::2])
print(x[::-1])

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[2, 3]
[2, 5, 8]
[7, 8]
[0, 2, 4, 6, 8]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
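Two consequences of the inclusive-start, exclusive-stop convention are worth seeing in code: x[a:b] has b - a elements, and splitting a list at any index k loses nothing. The sketch below also shows that, unlike single-element indexing, out-of-range slice bounds are clipped rather than raising an error.

```python
x = list(range(10))

# start is inclusive and stop is exclusive, so x[a:b] has b - a elements
assert len(x[2:7]) == 7 - 2

# splitting at any index k recovers the whole list
k = 4
assert x[:k] + x[k:] == x

# out-of-range slice bounds are clipped instead of raising IndexError
print(x[8:100])  # [8, 9]
print(x[20:30])  # []
```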

Lists are mutable, which means you can change their elements in place:

words = ["dog", "cat", "house"]
print(words)

words[0] = "mouse"
print(words)

['dog', 'cat', 'house']
['mouse', 'cat', 'house']
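Because lists are mutable, assignment does not copy a list; it binds a second name to the same object. The sketch below (names are illustrative) contrasts an alias with a shallow copy made by slicing.

```python
words = ["dog", "cat", "house"]

alias = words        # a second name for the same list object
snapshot = words[:]  # a new list with the same elements (shallow copy)

words[0] = "mouse"
print(alias)     # ['mouse', 'cat', 'house']  (sees the change)
print(snapshot)  # ['dog', 'cat', 'house']    (unaffected)
```

list(words) or words.copy() would make the same shallow copy as words[:].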

Other Python Collections

There are other collections you might use in Python:

Tuples (...) are ordered, indexed, and immutable
Sets {...} are unordered, unindexed, and mutable
Dictionaries {...} map keys to values; they are indexed by key, mutable, and (since Python 3.7) preserve insertion order

Sets and dictionaries also support comprehensions. There is no tuple comprehension, but you can pass a generator expression to tuple().

You can find additional types of collections in the standard library's collections module.

x = (1,2,3) # tuple
print(x)
x = (i for i in range(1,4)) # generator expression (not a tuple)
print(tuple(x))

(1, 2, 3)
(1, 2, 3)
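Immutability means item assignment on a tuple fails, and a generator expression like the one above can only be consumed once. This sketch (names are illustrative) shows both, plus tuple unpacking.

```python
t = (1, 2, 3)

# tuples are immutable: item assignment raises TypeError
try:
    t[0] = 10
except TypeError as err:
    print("TypeError:", err)

# tuples can be unpacked into separate names
a, b, c = t

# a generator expression is consumed as it is iterated
g = (i for i in range(1, 4))
first = tuple(g)   # (1, 2, 3)
second = tuple(g)  # () because the generator is already exhausted
```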

s = {1,2,3} # set
1 in s

True

s = {i for i in range(1,4)}
s

{1, 2, 3}

d = {'hello' : 0, 'goodbye': 1} # dictionary
print(type(d))
d['hello']

<class 'dict'>
0

d = {key: val for val, key in enumerate(['hello', 'goodbye'])}
d

{'hello': 0, 'goodbye': 1}
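A few everyday dictionary operations, sketched below: inserting a key, looking up a key that may be missing with get() (which returns a default instead of raising KeyError), and iterating over key/value pairs in insertion order. The key 'thanks' is just an example.

```python
d = {'hello': 0, 'goodbye': 1}

d['thanks'] = 2              # insert a new key (dicts are mutable)
print(d.get('missing', -1))  # -1: get() returns the default for a missing key

# iterate over key/value pairs; insertion order is preserved (Python 3.7+)
for key, val in d.items():
    print(key, val)
```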

Numpy

If you haven’t already:

conda install numpy

Numpy is arguably the fundamental scientific computing package for Python: just about every other package for scientific computing builds on it.

Numpy provides an ndarray type (n-dimensional array), along with fast operations on arrays that are implemented in compiled code (e.g. C or Fortran).

We’ll take deeper dives into numpy in future lectures. For now, we’ll cover some basics. For those who want to dive in now, here are some tutorials:

Absolute Beginner Tutorial
Quickstart Tutorial - for those with more experience in other languages

You can find lots of information in the numpy documentation

import numpy as np # import numpy into the np namespace

You can easily generate numpy arrays from list data

x = np.array([1,2,3])
print(x)
print(type(x))

[1 2 3]
<class 'numpy.ndarray'>

A 2-dimensional array can be generated by lists of lists

x = np.array([[1,2,3], [4,5,6]])
print(x)

[[1 2 3]
 [4 5 6]]

A few attributes of an array:

print(x.ndim) # number of dimensions
print(x.shape) # shape of array
print(x.size) # total number of elements in array
print(x.dtype) # data type
print(x.itemsize) # number of bytes per element
print(x.data) # memoryview of the underlying data buffer
print(x.flags) # some flags

2
(2, 3)
6
int64
8
<memory at 0x7f52bcf60e10>
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
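These attributes are related: size is the product of the entries of shape, and nbytes (the total buffer size) is size times itemsize. The sketch below checks this and shows that choosing a dtype explicitly changes itemsize. Note that the default integer dtype can vary by platform, so int64 above is not guaranteed everywhere.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# size is the product of the entries of shape
assert x.size == np.prod(x.shape)
# nbytes is the total size of the data buffer in bytes
assert x.nbytes == x.size * x.itemsize

# choosing the dtype explicitly changes the bytes per element
y = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
print(y.dtype, y.itemsize)  # float32 4
```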

Other ways of obtaining numpy arrays:

# array from a range
a = np.arange(3)
print(a)

# an array of ones (np.float was removed in NumPy 1.24; use float or np.float64)
a = np.ones((3,2), dtype=float)
print(a)

# just an array with no initialization - WARNING: data can be anything
a = np.empty((3,2), dtype=float)
print(a)

# random normal data
a = np.random.normal(size=(3,2))
print(a)

[0 1 2]
[[1. 1.]
 [1. 1.]
 [1. 1.]]
[[1. 1.]
 [1. 1.]
 [1. 1.]]
[[-0.02886968 -0.28269196]
 [-0.48312791 -0.3424373 ]
 [ 0.25438968  0.29092363]]
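Two more constructors that come up constantly are np.zeros and np.linspace. For random data, newer code often uses a Generator from np.random.default_rng instead of the np.random.normal function above; seeding it makes results reproducible. A minimal sketch (the seed value is arbitrary):

```python
import numpy as np

# an array of zeros and 5 evenly spaced points including both endpoints
z = np.zeros((2, 3))
lin = np.linspace(0.0, 1.0, 5)

# two Generators with the same seed produce the same random draws
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)
a = rng1.normal(size=(3, 2))
b = rng2.normal(size=(3, 2))
print(np.array_equal(a, b))  # True
```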

Indexing

1-dimensional arrays are indexed in the same way as lists (0-indexed, can use slices, etc.)

a = np.arange(4)
print(a[:2]) 

[0 1]

You can also index using lists of indices:

inds = [0,2]
a[inds] 

array([0, 2])
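One difference from lists worth knowing early: a basic slice of a numpy array is a view that shares memory with the original, while indexing with a list of indices returns a copy. The sketch below uses np.shares_memory to check this; the names are illustrative.

```python
import numpy as np

a = np.arange(4)

view = a[:2]        # basic slice: a view sharing memory with a
picked = a[[0, 2]]  # list of indices: a new array (copy)

a[0] = 100
print(view)    # reflects the change: first element is now 100
print(picked)  # unaffected: still [0 2]
```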

2-dimensional arrays are a bit different from lists of lists:

a = [[0,1],[2,3]] # list of lists
print(a)
a[1][1] # like indexing in C

[[0, 1], [2, 3]]
3

anp = np.array(a) # 2-dimensional array
anp[1,1] # like indexing in Matlab, Julia

You can also use slices, lists of indices, etc. in multi-dimensional arrays.

If you only provide 1 index, you’ll get the corresponding row (or set of rows if slicing)

anp[0]

array([0, 1])
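Slices can be given per dimension, so a column is selected with a colon in the row position. A comparison also produces a boolean array, which can itself be used as an index. A short sketch:

```python
import numpy as np

anp = np.array([[0, 1], [2, 3]])

col = anp[:, 1]  # all rows, column 1
row = anp[1, :]  # row 1 (same as anp[1])
mask = anp > 1   # element-wise comparison gives a boolean array
big = anp[mask]  # selects the matching elements as a 1-D array
```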

Arithmetic

Numpy arrays support basic element-wise arithmetic, assuming arrays are the same shape.

Note: there are more complicated broadcasting rules for different-shaped arrays, which we’ll cover some other time.

x = np.arange(4)
print(x)
print(x + x)
print(x * x)
print(x**3)

def f(x):
    return x**2 - 2*x + 1

print(f(x))

[0 1 2 3]
[0 2 4 6]
[0 1 4 9]
[ 0  1  8 27]
[1 0 1 4]

Warning: The * operator applied to 2-dimensional arrays is not the same as matrix-matrix multiplication. It will perform element-wise multiplication instead.

Numpy provides the @ operator for matrix multiplication. You can also use the np.matmul() function or the dot() method (for 2-dimensional arrays, dot() also performs matrix multiplication).

x = np.arange(4).reshape(2,2)
print(x)
print(x*x)
print(x @ x)
print(x.dot(x))
print(np.matmul(x,x))

[[0 1]
 [2 3]]
[[0 1]
 [4 9]]
[[ 2  3]
 [ 6 11]]
[[ 2  3]
 [ 6 11]]
[[ 2  3]
 [ 6 11]]
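To see what @ computes, the sketch below builds the matrix product from its definition, C[i, j] = sum over k of x[i, k] * x[k, j], with explicit loops, and compares it against both @ and the element-wise *. The triple loop is for illustration only; in practice you would use @.

```python
import numpy as np

x = np.arange(4).reshape(2, 2)

# C[i, j] = sum over k of x[i, k] * x[k, j]
n = x.shape[0]
C = np.zeros((n, n), dtype=x.dtype)
for i in range(n):
    for j in range(n):
        for k in range(n):
            C[i, j] += x[i, k] * x[k, j]

print(np.array_equal(C, x @ x))  # True
print(np.array_equal(C, x * x))  # False: * is element-wise
```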

Numpy provides a variety of mathematical functions that operate element-wise on numpy arrays. Numpy operations are vectorized: a single call processes the whole array in compiled code, which is typically much faster than an explicit Python for loop. This should be a familiar concept to MATLAB users.

np.sin(x)

array([[0.        , 0.84147098],
       [0.90929743, 0.14112001]])

import time
n = 1_000_000
# list data
x = [i/n for i in range(n)]
# numpy array
xnp = np.array(x)

# square elements in-place
t0 = time.monotonic()
for i in range(n):
    x[i] = x[i] * x[i]
t1 = time.monotonic()
print("time for loop over list: {:.3} sec.".format(t1 - t0))

t0 = time.monotonic()
xnp = xnp * xnp
t1 = time.monotonic()
print("time for numpy vectorization: {:.3} sec.".format(t1 - t0))

time for loop over list: 0.147 sec.
time for numpy vectorization: 0.00184 sec.

PyPlot

PyPlot (the matplotlib.pyplot module of Matplotlib) is a go-to plotting tool for Python. It works directly with numpy arrays.

conda install matplotlib

import matplotlib.pyplot as plt

# plot a single function
x = np.linspace(-1,1,100)
y = x * x
plt.plot(x,y)
plt.show()

# plot multiple functions
x = np.linspace(-1,1,100)
for n in range(5):
    plt.plot(x,x**n, label=f"x^{n}")
plt.legend()
plt.xlabel("x")
plt.title("Simple polynomials")
plt.show()
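When running a script rather than a notebook, you often want to save a figure to a file instead of showing it. The sketch below uses the object-oriented interface (plt.subplots), the non-interactive "Agg" backend, and fig.savefig; the temporary directory and file name are illustrative choices. Wrapping the label in $...$ makes Matplotlib render it as math text.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files only
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-1, 1, 100)
fig, ax = plt.subplots()
for n in range(5):
    ax.plot(x, x**n, label=f"$x^{n}$")  # rendered as x to the power n
ax.set_xlabel("x")
ax.set_title("Simple polynomials")
ax.legend()

# write the figure to a PNG file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "polynomials.png")
fig.savefig(path, dpi=100)
plt.close(fig)  # free the figure's memory
```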

CSV files, Pandas

The *.csv extension is typically used to denote a “comma-separated values” file. These files are often used to store tabular data, such as arrays, in human-readable plain text.

Here’s an example:

0, 1, 2, 3
4, 5, 6, 7
...

You can save numpy arrays to files using np.savetxt()

# generates example.csv
n = 1000
x = np.arange(4*n).reshape(-1,4)
np.savetxt("example.csv", x, fmt="%d", delimiter=',')

Files can be loaded using np.loadtxt()

y = np.loadtxt('example.csv', dtype=int, delimiter=',') # np.int was removed in NumPy 1.24
y

array([[   0,    1,    2,    3],
       [   4,    5,    6,    7],
       [   8,    9,   10,   11],
       ...,
       [3988, 3989, 3990, 3991],
       [3992, 3993, 3994, 3995],
       [3996, 3997, 3998, 3999]])

Often, the numbers in scientific data have a meaning attached to them. In this case, the csv file might have a header row naming each column, and every subsequent row is a different data point.

temperature, density, width, length
0, 1, 2, 3
4, 5, 6, 7
...

You can still load this using numpy, but it is easy to lose track of what the different columns of the array mean.
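For reference, here is a sketch of the numpy route. An in-memory string stands in for a small headered file (the data is illustrative). np.loadtxt with skiprows=1 skips the header and discards the names, while np.genfromtxt with names=True keeps them as fields of a structured array.

```python
import io

import numpy as np

# a small in-memory stand-in for a csv file with a header row
text = "temperature,density,width,length\n0,1,2,3\n4,5,6,7\n"

# skip the header row; the column names are lost
arr = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)

# read the names from the first line into a structured array
rec = np.genfromtxt(io.StringIO(text), delimiter=",", names=True)
print(rec["temperature"])  # the first column, accessed by name
```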

A good solution for this sort of data is to use a Pandas dataframe:

conda install pandas

import pandas as pd

data = pd.read_csv('example.csv', header=None, sep=',')
data

	0	1	2	3
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11
3	12	13	14	15
4	16	17	18	19
...	...	...	...	...
995	3980	3981	3982	3983
996	3984	3985	3986	3987
997	3988	3989	3990	3991
998	3992	3993	3994	3995
999	3996	3997	3998	3999

1000 rows × 4 columns

# set the column names while reading, then save the file with a header row
data = pd.read_csv('example.csv', header=None, names=["temperature", "density", "width", "length"])
data.to_csv("example2.csv", index=False) # writes to csv with headers
data

	temperature	density	width	length
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11
3	12	13	14	15
4	16	17	18	19
...	...	...	...	...
995	3980	3981	3982	3983
996	3984	3985	3986	3987
997	3988	3989	3990	3991
998	3992	3993	3994	3995
999	3996	3997	3998	3999

1000 rows × 4 columns

data2 = pd.read_csv("example2.csv") # read csv with headers
data2

	temperature	density	width	length
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11
3	12	13	14	15
4	16	17	18	19
...	...	...	...	...
995	3980	3981	3982	3983
996	3984	3985	3986	3987
997	3988	3989	3990	3991
998	3992	3993	3994	3995
999	3996	3997	3998	3999

1000 rows × 4 columns

You can get columns of a dataframe by using the column label

data2['temperature']

0         0
1         4
2         8
3        12
4        16
       ... 
995    3980
996    3984
997    3988
998    3992
999    3996
Name: temperature, Length: 1000, dtype: int64
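Columns behave much like numpy arrays, so you can compute with them and add derived columns by assignment. The sketch below uses a small in-memory csv instead of example2.csv, and the "area" column is a hypothetical derived quantity.

```python
import io

import pandas as pd

text = "temperature,density,width,length\n0,1,2,3\n4,5,6,7\n8,9,10,11\n"
df = pd.read_csv(io.StringIO(text))

# element-wise arithmetic on columns creates a new column
df["area"] = df["width"] * df["length"]

print(df["area"].tolist())       # [6, 42, 110]
print(df["temperature"].mean())  # 4.0
```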

To select rows by integer position, use the iloc indexer:

data2.iloc[1:3]

	temperature	density	width	length
1	4	5	6	7
2	8	9	10	11
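Pandas also has the label-based loc indexer. With the default integer index the two look similar, but their slice endpoints differ: iloc excludes the stop position, while loc includes the stop label. A small sketch on in-memory data:

```python
import io

import pandas as pd

text = "temperature,density,width,length\n0,1,2,3\n4,5,6,7\n8,9,10,11\n"
df = pd.read_csv(io.StringIO(text))

by_position = df.iloc[1:3]      # positions 1 and 2 (stop is exclusive)
by_label = df.loc[1:2]          # labels 1 and 2 (stop is inclusive)
single = df.loc[1, "density"]   # one value by row label and column name
```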

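You can easily plot labeled columns. Here the temperature column is used as the x-axis, and the remaining columns are plotted against it: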

data.plot('temperature')
plt.show()

Scientific Computing with Python
