Preparing to Plot

In this workshop, we will use matplotlib to visualize hydrogen atom orbitals. If you've taken an introductory quantum chemistry class, you will have seen visualizations of these before, most likely in your textbook.

You can see some visualizations here. Even if you haven't yet taken quantum chemistry, the shapes of the s and p orbitals will probably be familiar to you from your introductory chemistry classes.

We will be working with pre-calculated data that is in text files. As part of the set-up you should have downloaded these files, and we will explain what is in them as we visualize them. For the purpose of this workshop, it’s not important that you have a deep understanding of the data, or that you understand how it was calculated.

Reading Data using Pandas

Before we can plot our data, we have to get it into a form that python can recognize and work with. To read in our text files, we will use the python library pandas, which is very widely used in data science.

Using pandas, you can read data into python and work with it. The data we want to work with are in .csv (comma-separated values) files. The first file we will work with is called s_orbitals_1D.csv. It contains the values of the 1s, 2s, and 3s orbitals in the xy plane at different values of r, the distance from the nucleus. If you open this file in a text editor, you will see a header line followed by rows of data, with the values in each row separated by commas.

Pandas can easily read this type of file using a function called read_csv. You can also open files like this in Excel, or even save csv files from Excel.
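To make that layout concrete without opening the file, here is a minimal sketch that builds a tiny dataframe from the first two rows of the head() preview in this section and writes it out as CSV text with DataFrame.to_csv. The two rows are a hand-typed stand-in, not the full file, and the names df and csv_text are only illustrative.

```python
import pandas as pd

# Stand-in data: the first two rows of the s_orbitals_1D.csv preview
df = pd.DataFrame({
    "r": [0.000000, 0.517241],
    "1s": [0.564190, 0.336349],
    "2s": [0.199471, 0.088163],
    "3s": [0.108578, 0.061683],
})

# index=False leaves out the row numbers, so the text holds only
# a header line followed by comma-separated data rows
csv_text = df.to_csv(index=False)
print(csv_text)
```

The first printed line is the header (r,1s,2s,3s), and each following line is one row of values separated by commas.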

First, we need to import pandas. By convention, pandas is imported under the shorter name pd. We will use the function pd.read_csv to read the csv file, giving it the path to the file we want to read.

import pandas as pd

s_orbitals = pd.read_csv("s_orbitals_1D.csv")
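If you would like to try read_csv before the data file is in place, the sketch below gives it a short string instead of a path. It assumes the first three rows of s_orbitals_1D.csv look like the head() preview in this section; the string is a hand-typed stand-in, and io.StringIO works here because read_csv accepts any file-like object as well as a path.

```python
import io

import pandas as pd

# Hand-typed stand-in for the start of s_orbitals_1D.csv (not the full file)
csv_text = """r,1s,2s,3s
0.000000,0.564190,0.199471,0.108578
0.517241,0.336349,0.088163,0.061683
1.034483,0.200519,0.034225,0.029966
"""

# read_csv treats the StringIO object like an open file
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```

The first line of the string becomes the column names, and each remaining line becomes one row.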

Our data is now in the variable called s_orbitals. This variable is a pandas dataframe, which resembles a spreadsheet: it has rows and columns. We can preview its contents with s_orbitals.head(), which by default shows the first five rows.

s_orbitals.head()
          r        1s         2s         3s
0  0.000000  0.564190   0.199471   0.108578
1  0.517241  0.336349   0.088163   0.061683
2  1.034483  0.200519   0.034225   0.029966
3  1.551724  0.119542   0.009473   0.009313
4  2.068966  0.071266  -0.000869  -0.003390
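Besides head(), a dataframe's shape and column names are quick ways to check what was read in, and head() also accepts a row count. The sketch below rebuilds the five previewed rows as a stand-in for s_orbitals (the name df is only illustrative), so it runs without the data file.

```python
import pandas as pd

# Stand-in for s_orbitals: the five rows from the head() preview
df = pd.DataFrame({
    "r": [0.000000, 0.517241, 1.034483, 1.551724, 2.068966],
    "1s": [0.564190, 0.336349, 0.200519, 0.119542, 0.071266],
    "2s": [0.199471, 0.088163, 0.034225, 0.009473, -0.000869],
    "3s": [0.108578, 0.061683, 0.029966, 0.009313, -0.003390],
})

print(df.shape)          # (5, 4): (number of rows, number of columns)
print(list(df.columns))  # ['r', '1s', '2s', '3s']
print(df.head(2))        # head(n) shows the first n rows; the default is 5
```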

You can see from the above preview that we have something that resembles a spreadsheet. Our first column contains r values, while the following columns contain values for the 1s, 2s, and 3s orbitals. We are going to plot these eventually, but first we have to understand a little more about how we can use the data we have read in.

A Brief Introduction to Dataframes

As stated previously, pandas stores data in rows and columns. You will see above that the rows are numbered and the columns have names. There are a few ways we can access information in a dataframe.

To get a column, we use the syntax

dataframe["column_name"]

For example, to get the r column of the dataframe we have read in, we would use the column name ("r"). The column name must be in quotes, and its capitalization must match what is in the dataframe.

s_orbitals["r"]
0      0.000000
1      0.517241
2      1.034483
3      1.551724
4      2.068966
5      2.586207
6      3.103448
7      3.620690
8      4.137931
9      4.655172
10     5.172414
11     5.689655
12     6.206897
13     6.724138
14     7.241379
15     7.758621
16     8.275862
17     8.793103
18     9.310345
19     9.827586
20    10.344828
21    10.862069
22    11.379310
23    11.896552
24    12.413793
25    12.931034
26    13.448276
27    13.965517
28    14.482759
29    15.000000
Name: r, dtype: float64
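The sketch below uses a small hand-built stand-in rather than the real file. It shows that a single column comes back as a pandas Series, how to turn it into a plain python list, and what happens when the capitalization of the column name does not match: pandas raises a KeyError.

```python
import pandas as pd

# Small stand-in dataframe (not the full s_orbitals data)
df = pd.DataFrame({
    "r": [0.000000, 0.517241, 1.034483],
    "1s": [0.564190, 0.336349, 0.200519],
})

r_values = df["r"]               # one column comes back as a Series
print(type(r_values).__name__)   # Series
print(r_values.tolist())         # the column values as a plain list

# Column names are case-sensitive: "R" does not match "r"
try:
    df["R"]
except KeyError:
    print('No column named "R"')
```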

We can also "slice" the dataframe to get only a portion of it, for example if we want only some of the rows. To select rows and columns by their position (row and column numbers), use iloc:

dataframe.iloc[row_slice, column_slice]

We use the same slicing syntax that we use for python lists and numpy arrays. For example, to get the first 10 rows of the first two columns:

s_orbitals.iloc[:10, :2]
          r        1s
0  0.000000  0.564190
1  0.517241  0.336349
2  1.034483  0.200519
3  1.551724  0.119542
4  2.068966  0.071266
5  2.586207  0.042486
6  3.103448  0.025329
7  3.620690  0.015100
8  4.137931  0.009002
9  4.655172  0.005367
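Because iloc follows the same rules as list slicing, the stop position is excluded and negative positions count from the end. Here is a minimal sketch with a hand-built stand-in dataframe (df is an illustrative name, and the values are copied from the previews above):

```python
import pandas as pd

# Stand-in dataframe built from the previewed values
df = pd.DataFrame({
    "r": [0.000000, 0.517241, 1.034483, 1.551724, 2.068966],
    "1s": [0.564190, 0.336349, 0.200519, 0.119542, 0.071266],
    "2s": [0.199471, 0.088163, 0.034225, 0.009473, -0.000869],
})

# Stop positions are excluded: rows 0-2 and columns 0-1 ("r" and "1s")
first_rows = df.iloc[:3, :2]
print(first_rows.shape)  # (3, 2)

# Two integers select a single value: row 4, column 2 (the "2s" column)
value = df.iloc[4, 2]
print(value)

# Negative positions count from the end: the last two rows, all columns
tail_rows = df.iloc[-2:, :]
print(tail_rows)
```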

Pandas is a very useful library for manipulating data in python. We won’t be using it extensively for this workshop, but we will need to use it a little to select the data which we would like to plot. If you would like to learn more about the capabilities of pandas, see this lesson from The Molecular Sciences Software Institute.