Preparing to Plot#
In this workshop, we will use matplotlib to visualize hydrogen atom orbitals. If you’ve taken an introductory quantum chemistry class, you will have seen visualizations of these before, most likely in your textbook.
You can see some visualizations here. Even if you haven’t yet taken quantum chemistry, the shapes of the s and p orbitals will probably be familiar to you from your introductory chemistry classes.
We will be working with pre-calculated data that is in text files. As part of the set-up you should have downloaded these files, and we will explain what is in them as we visualize them. For the purpose of this workshop, it’s not important that you have a deep understanding of the data, or that you understand how it was calculated.
Reading Data using Pandas#
In order to plot and create visualizations with our data, we first have to get it into a form that python can recognize and work with. We will use the python library pandas to read in our text files. Pandas is very widely used in data science.
Using pandas, you can read data into python and work with it. The data we want to work with are in .csv files, or comma separated value files. The first file we will work with is called s_orbitals_1D.csv. This text file contains the value of the 1s, 2s and 3s orbitals in the xy plane for different values of x. If you examine it in a text editor, you will see a header line followed by rows of data, with the values on each line separated by commas.
Pandas can easily read this type of file using a function called read_csv. You can also open files like this in Excel, or even save csv files from Excel.
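To make the file layout concrete, here is a minimal sketch that reads a tiny CSV from an in-memory string instead of from disk. The three data rows are taken from the start of s_orbitals_1D.csv as shown in this lesson; the exact header line (r,1s,2s,3s) is an assumption based on the column names pandas reports, and the names csv_text and sample are only for illustration.

```python
import io

import pandas as pd

# A small stand-in for the real file: one header line, then comma-separated rows.
# The header text is assumed; the real file also has many more rows.
csv_text = """r,1s,2s,3s
0.000000,0.564190,0.199471,0.108578
0.517241,0.336349,0.088163,0.061683
1.034483,0.200519,0.034225,0.029966
"""

# read_csv accepts a file path or a file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.columns.tolist())  # column names come from the header line
print(sample.shape)             # (number of rows, number of columns)
```

The header line becomes the column names, and every following line becomes one row of the dataframe.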
First, we will need to import pandas. The pandas library is usually shortened to pd. We will use the function pd.read_csv to read the csv file. We give the read_csv function the path to the file we want to read.
import pandas as pd
s_orbitals = pd.read_csv("s_orbitals_1D.csv")
Our data is now in the variable called s_orbitals. This variable is something called a pandas dataframe. It resembles a spreadsheet: it has rows and columns. We can see a preview of what is in the variable by using s_orbitals.head(). It will show us the first five rows.
s_orbitals.head()
|   | r | 1s | 2s | 3s |
|---|---|---|---|---|
| 0 | 0.000000 | 0.564190 | 0.199471 | 0.108578 |
| 1 | 0.517241 | 0.336349 | 0.088163 | 0.061683 |
| 2 | 1.034483 | 0.200519 | 0.034225 | 0.029966 |
| 3 | 1.551724 | 0.119542 | 0.009473 | 0.009313 |
| 4 | 2.068966 | 0.071266 | -0.000869 | -0.003390 |
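Besides head(), a few other attributes are handy for a quick look at a dataframe. The sketch below rebuilds the five preview rows by hand so that it runs without the data file; the variable names preview and first_two are only for illustration.

```python
import pandas as pd

# Values copied from the preview above; the real dataframe has more rows.
preview = pd.DataFrame({
    "r": [0.000000, 0.517241, 1.034483, 1.551724, 2.068966],
    "1s": [0.564190, 0.336349, 0.200519, 0.119542, 0.071266],
    "2s": [0.199471, 0.088163, 0.034225, 0.009473, -0.000869],
    "3s": [0.108578, 0.061683, 0.029966, 0.009313, -0.003390],
})

print(preview.shape)             # (number of rows, number of columns)
print(preview.columns.tolist())  # the column names as a list

# head() takes an optional number of rows; without it, five rows are shown.
first_two = preview.head(2)
print(first_two)
```

The same calls work on s_orbitals once the file has been read in.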
You can see from the above preview that we have something that resembles a spreadsheet. Our first column, r, contains the x values, while the following columns contain values for the 1s, 2s, and 3s orbitals. We are going to plot these eventually, but first we have to understand a little more about how we can use the data we have read in.
A Brief Introduction to Dataframes#
As stated previously, pandas stores data in rows and columns. You will see above that the rows are numbered and the columns have names. There are a few ways we can access information in a dataframe.
To get a column, we use the syntax
dataframe["column_name"]
For example, to get the r column of the dataframe we have read in, we would put the column name (“r”) inside the square brackets. It is very important that this column name be in quotes and that capitalization match what is in the dataframe.
s_orbitals["r"]
0 0.000000
1 0.517241
2 1.034483
3 1.551724
4 2.068966
5 2.586207
6 3.103448
7 3.620690
8 4.137931
9 4.655172
10 5.172414
11 5.689655
12 6.206897
13 6.724138
14 7.241379
15 7.758621
16 8.275862
17 8.793103
18 9.310345
19 9.827586
20 10.344828
21 10.862069
22 11.379310
23 11.896552
24 12.413793
25 12.931034
26 13.448276
27 13.965517
28 14.482759
29 15.000000
Name: r, dtype: float64
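As a small illustration of column selection, the sketch below uses a hand-built three-row dataframe (values copied from the first rows of the data) so it runs on its own. Passing a list of names to select several columns at once is standard pandas behavior that this lesson does not otherwise cover; the variable names are only for illustration.

```python
import pandas as pd

# A three-row stand-in for s_orbitals, with values copied from the data above.
orbitals = pd.DataFrame({
    "r": [0.000000, 0.517241, 1.034483],
    "1s": [0.564190, 0.336349, 0.200519],
    "2s": [0.199471, 0.088163, 0.034225],
})

# A single column name in quotes returns one column (a pandas Series).
r_values = orbitals["r"]

# A list of names returns a smaller dataframe with just those columns.
subset = orbitals[["r", "1s"]]

# Column names are case-sensitive: "R" is not the same as "r".
has_upper_r = "R" in orbitals.columns
print(has_upper_r)

try:
    orbitals["R"]
except KeyError:
    print("No column named 'R'; check the capitalization.")
```

Asking for a name that is not in the dataframe raises a KeyError, which is usually a sign of a typo or a capitalization mismatch.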
We can also “slice” the dataframe to get only a portion of it. We would do this if we wanted only some of the rows, for example. The following syntax shows how to get information if we want to use row and column numbers.
dataframe.iloc[row_slice, column_slice]
We use the same slicing syntax that we use for python lists and numpy arrays. For example, to get the first 10 rows of the first two columns:
s_orbitals.iloc[:10, :2]
|   | r | 1s |
|---|---|---|
| 0 | 0.000000 | 0.564190 |
| 1 | 0.517241 | 0.336349 |
| 2 | 1.034483 | 0.200519 |
| 3 | 1.551724 | 0.119542 |
| 4 | 2.068966 | 0.071266 |
| 5 | 2.586207 | 0.042486 |
| 6 | 3.103448 | 0.025329 |
| 7 | 3.620690 | 0.015100 |
| 8 | 4.137931 | 0.009002 |
| 9 | 4.655172 | 0.005367 |
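A few more slicing patterns may help show how iloc mirrors list slicing. This is a minimal sketch on a hand-built five-row dataframe (values copied from the preview of the data), not on the full file; the variable names are only for illustration.

```python
import pandas as pd

# A five-row stand-in for s_orbitals, with values copied from the data above.
orbitals = pd.DataFrame({
    "r": [0.000000, 0.517241, 1.034483, 1.551724, 2.068966],
    "1s": [0.564190, 0.336349, 0.200519, 0.119542, 0.071266],
    "2s": [0.199471, 0.088163, 0.034225, 0.009473, -0.000869],
    "3s": [0.108578, 0.061683, 0.029966, 0.009313, -0.003390],
})

# Rows 0, 1, 2 and the first two columns; the end of a slice is excluded.
first_three = orbitals.iloc[:3, :2]

# A bare colon means "everything"; -1 counts from the end, as with lists.
last_column = orbitals.iloc[:, -1]

# A step of 2 keeps every other row; leaving out the column slice keeps all columns.
every_other = orbitals.iloc[::2]

print(first_three)
print(last_column.name)
print(every_other.index.tolist())
```

Because iloc works purely by position, these slices do not depend on what the columns are called.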
Pandas is a very useful library for manipulating data in python. We won’t be using it extensively for this workshop, but we will need to use it a little to select the data which we would like to plot. If you would like to learn more about the capabilities of pandas, see this lesson from The Molecular Sciences Software Institute.