Working with Numpy Arrays
Overview
Teaching: 40 min
Exercises: 20 minQuestions
What are the differences between numpy arrays and lists?
How can I use NumPy to do calculations?
Objectives
Be able to name the differences between Python lists and numpy arrays.
Understand the idea of broadcasting
Numpy is a widely used Python library for scientific computing. It has a number of useful features, including the a data structure called an array which we will be using in our domain projects througout the summer school.
NumPy Arrays vs. Python Lists
Previously, you have worked with the built-in types of lists
. NumPy arrays seem similar, but offer some distinct advantages. Numpy arrays take up less space, are faster, and have more associated mathematical operations associated with them. However, unlike lists, they elements all have to be the same type.
There are also differences in how lists and numpy arrays behave. Let’s look at some of these.
To use the numpy library, we have to import it. When numpy
is imported, it is often shortened to np
.
import numpy as np
# Create some lists
r1 = [0.0, 0.1, 0.0]
r2 = [0.0, 0.0, 0.0]
# Create some numpy arrays
r1_array = np.array(r1)
r2_array = np.array(r2)
In this code block, we’ve created two lists (r1
and r2
). We then created numpy array versions of these lists.
When we print r1
and r1_array
, the results look very similar.
print(r1)
print(r1_array)
[0.0, 0.1, 0.0]
[0. 0.1 0. ]
However, one way numpy arrays and lists vary significantly is that you can easily perform element-wise operations on numpy arrays without loops. You can add two arrays together, multiply arrays by scalars, or do element-wise multiplcation of arrays.
For example, you can multiply every element in a numpy array by 10
print(10*r1_array)
[0. 1. 0.]
or, you can multiply two numpy arrays to get their element-wise product.
But if you multiply r1 (which is a list), by 10, the list will just be repeated 10 times.
print(r1*10)
[0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0]
Check your understanding
What does the following code result in?
a1 = np.array([2, 1, 0]) a2 = np.array([1, 3, 5]) print(a1 * a2)
What happens if a1 and a2 are lists?
Answer
The code block results in
[2 3 0]
The first element is
a1[0]*a2[0]
, the second element isa1[1]*a2[1]
, and the third element isa1[2]*a2[2]
.If a1 and a2 were lists instead, you would get a
TypeError
.TypeError: can't multiply sequence by non-int of type 'list'
.
Broadcasting
The multiplication of r1_array*10
is an example of what is called broadcasting in NumPy. This occurs when arrays have different shapes. If possible, the smaller array is “broadcast” across the larger array. For example, in
r1_array * 10
You can think of the scalar 10
being stretched during the operation to be the same size and shape as r1_array
.
Returning to the geometry analysis project
In the Python Data and Scripting Workshop, you had a homework assignment to analyze an xyz file, find the bonds, and print bond lengths.
Today, we will extend that project. We will look at using numpy for the bond length function, and writing a new function to analyze bond angles.
Recall that a solution given for a function for calculating distances between two points was the following
def calculate_distance(rA, rB):
"""Calculate distance between points A and B"""
x_dist = (rA[0] - rB[0]) ** 2
y_dist = (rA[1] - rB[1]) ** 2
z_dist = (rA[2] - rB[2]) ** 2
distance = np.sqrt(x_dist + y_dist + z_dist)
return distance
This function would work for both lists and numpy arrays, because it does not assume that rA and rB can do something like element-wise subtraction.
Exercise
Using what you’ve learned about numpy arrays, rewrite the
calculate_distance
function to use the features of numpy arrays.Answer
If rA and rB are both numpy arrays, we can use element-wise subtraction
def calculate_distance(rA, rB): """Calculate the distance between points A and B""" AB = (rB - rA)**2 distance = np.sqrt(np.sum(AB)) return distance
You might also have used the
numpy
functionnp.linalg.norm
which calculates the magnitude of a given vector.def calculate_distance(rA, rB): """Calculate the distance between points A and B""" dist_vec = (rA-rB) distance = np.linalg.norm(dist_vec) return distance
Redefine your original distance function as calculate_distance_list
. Using both, we see that both functions give the same answer.
print(calculate_distance_list(r1, r2))
print(calculate_distance(r1_array, r2_array))
Some numpy functions - np.linspace, np.zeros, np.reshape
Let’s try out our new function with more test data. We will need to generate an array of 3D points. We will need the dimension to be `(n, 3) where n is the number of atoms.
There are many ways you could generate this test array. For the solution here, we’ll generate 10 evenly-spaced 3 dimensional points.
First, we’ll create a numpy array of evenly spaced points using the function np.linspace
.
help(np.linspace)
First, create 30 evenly spaced points between 0 and 10.
test_array = np.linspace(0, 10, num=30)
print(F'The shape of our test array is {test_array.shape}')
print(F'Our array has {test_array.size} elements\n')
print(test_array)
The shape of our test array is (30,)
Our array has 30 elements
[ 0. 0.34482759 0.68965517 1.03448276 1.37931034 1.72413793
2.06896552 2.4137931 2.75862069 3.10344828 3.44827586 3.79310345
4.13793103 4.48275862 4.82758621 5.17241379 5.51724138 5.86206897
6.20689655 6.55172414 6.89655172 7.24137931 7.5862069 7.93103448
8.27586207 8.62068966 8.96551724 9.31034483 9.65517241 10. ]
To visualize the data, type
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
plt.plot(test_array)
Next, let’s reshape this array so that it something that we’re more used to looking at. Points in our function are assumed to have three parts (x, y, z). We have 30 points in test array, so we will want to reshape it to have 10 rows and 3 columns. We can do this by using the reshape
function in numpy.
reshaped_array = test_array.reshape((10,3))
print(reshaped_array)
print(F'Our reshaped array has {reshaped_array.size} elements\n')
Our reshaped array has 30 elements.
[[ 0. 0.34482759 0.68965517]
[ 1.03448276 1.37931034 1.72413793]
[ 2.06896552 2.4137931 2.75862069]
[ 3.10344828 3.44827586 3.79310345]
[ 4.13793103 4.48275862 4.82758621]
[ 5.17241379 5.51724138 5.86206897]
[ 6.20689655 6.55172414 6.89655172]
[ 7.24137931 7.5862069 7.93103448]
[ 8.27586207 8.62068966 8.96551724]
[ 9.31034483 9.65517241 10. ]]
We can also graph this on a 3D axis.
ax = plt.axes(projection='3d')
ax.scatter3D(reshaped_array[:,0], reshaped_array[:,1], reshaped_array[:,2])
Now, instead of having a 1 dimensional array of 30 points which go from 0 to 10, we have a 3 dimensional array of 10 points which go from (0, 0.34482759, 0.68965517) to (9.31034483, 9.65517241, 10.). Let’s use these points in our distance function to make sure that neighboring points are evenly spaced.
for i in range(len(reshaped_array)-1):
p1 = reshaped_array[i]
p2 = reshaped_array[i+1]
distance = calculate_distance(p1,p2)
print(F'{i} to {i+1} : {distance}')
0 to 1 : 1.7917766974850455
1 to 2 : 1.7917766974850455
2 to 3 : 1.7917766974850458
3 to 4 : 1.7917766974850455
4 to 5 : 1.7917766974850458
5 to 6 : 1.7917766974850458
6 to 7 : 1.7917766974850453
7 to 8 : 1.7917766974850462
8 to 9 : 1.7917766974850446
We see that we have 10 evenly spaced points.
Check your understanding
Edit your code so that the answers are stored in a Python dictionary where the keys are the labels ‘0 to 1’… and the values are the distances.
Answer
# Create empty dictionary distances = {} for i in range(len(reshaped_array)-1): p1 = reshaped_array[i] p2 = reshaped_array[i+1] distance = calculate_distance(p1,p2) distances[F'{i} to {i+1}'] = distance
More often, what you might want to do store the answer in a numpy array. If you know the size of your array, you can preallocte it using np.zeros
.
distances = np.zeros(len(reshaped_array)-1)
for i in range(len(reshaped_array)-1):
p1 = reshaped_array[i]
p2 = reshaped_array[i+1]
distance = calculate_distance(p1,p2)
distances[i] = distance
print(distances)
[1.7917767 1.7917767 1.7917767 1.7917767 1.7917767 1.7917767 1.7917767
1.7917767 1.7917767]
Exercise
Can you think of a way to create 10 evenly spaced points which go from (0, 0, 0) to (10, 10 , 10) using NumPy?
Answer
These are two solutions, though there may be others.
# Create an array of zeros of the desired size. fill_array = np.zeros([10, 3]) dim = np.linspace(0, 10, num=10) fill_array[:,0] = dim fill_array[:,1] = dim fill_array[:,2] = dim # or, you could have called linspace in the following way second_method = np.linspace((0,0,0), (10,10,10), num=10)
Make a point about slicing and copying
This has been a general overview of some of the things that are possible with NumPy. NumPy is very powerful has many more functionalities that you can use. For a better idea of these, check out the NumPy Documentation.
Key Points
NumPy arrays which are the same size use element-wise operations when added or subtracted
NumPy uses something called broadcasting for arrays which are not the same size to allow arrays to be added or multiplied.
NumPy has several functions to create arrays such as linspace (linearly spaced array) and zeros (create an array of 0’s)