Writing Functions
Overview
Teaching: 25 min
Exercises: 15 minQuestions
How do I write include functions in my code?
Objectives
Define functions to perform one computational task.
Use functions within code and inside
for
loops.Write functions that accept user inputs.
Assign default values for function inputs.
Why functions?
Most code is organized into blocks of code which perform a particular task. These code blocks are called functions. A commercial software package likely has hundreds of thousands or millions of functions. Functions break up our code into smaller, more easily understandable statements, and also allow our code to be more modular, meaning we can take pieces and reuse them. Functions also make your code easier to test, which we will see in a later lesson.
In general, each function should perform only one computational task.
Defining and running a function
In Python, the following syntax is used to declare a function:
def function_name(parameters):
** function body code **
return value_to_return
Functions are defined using the def
keyword, followed by the name of the function. The function may have parameters which are passed to it, which are in parenthesis following the function name. A function can have no parameters as well. Most (though not all) functions return some type of information. It is important to note here that defining a function does not execute it.
Writing functions into our geometry analysis project
Let’s go back and consider a possible solution for the geometry analysis project.
import numpy
import os
file_location = os.path.join('data', 'water.xyz')
xyz_file = numpy.genfromtxt(fname=file_location, skip_header=2, dtype='unicode')
symbols = xyz_file[:, 0]
coordinates = xyz_file[:, 1:]
coordinates = coordinates.astype(numpy.float)
num_atoms = len(symbols)
for num1 in range(0, num_atoms):
for num2 in range(0, num_atoms):
if num1 < num2:
x_distance = coordinates[num1, 0] - coordinates[num2, 0]
y_distance = coordinates[num1, 1] - coordinates[num2, 1]
z_distance = coordinates[num1, 2] - coordinates[num2, 2]
bond_length_12 = numpy.sqrt(x_distance ** 2 + y_distance ** 2 + z_distance ** 2)
if bond_length_12 > 0 and bond_length_12 <= 1.5:
print(F'{symbols[num1]} to {symbols[num2]} : {bond_length_12:.3f}')
To think about where we should write functions in this code, let’s think about parts we may want to use again or in other places. One of the first places we might think of is in the bond distance calculation. Perhaps we’d want to calculate a bond distance in some other script. We can reduce the likelihood of errors in our code by defining this in a function (so that if we wanted to change our bond calculation, we would only have to do it in one place.)
Let’s change this code so that we write a function to calculate the bond distance. As explained above, to define a function, you start with the word def
and then give the name of the function. In parenthesis are in inputs of the function followed by a colon. The the statements the function is going to execute are indented on the next lines. For this function, we will return
a value. The last line of a function shows the return value for the function, which we can use to store a variable with the output value. Let’s write a function to calculate the distance between atoms.
def calculate_distance(atom1_coord, atom2_coord):
x_distance = atom1_coord[0] - atom2_coord[0]
y_distance = atom1_coord[1] - atom2_coord[1]
z_distance = atom1_coord[2] - atom2_coord[2]
bond_length_12 = numpy.sqrt(x_distance ** 2 + y_distance ** 2 + z_distance ** 2)
return bond_length_12
Now we can change our for
loop to just call the distance function we wrote above.
for num1 in range(0, num_atoms):
for num2 in range(0, num_atoms):
if num1 < num2:
bond_length_12 = calculate_distance(coordinates[num1], coordinates[num2])
if bond_length_12 > 0 and bond_length_12 <= 1.5:
print(F'{symbols[num1]} to {symbols[num2]} : {bond_length_12:.3f}')
Next, let’s write another function that checks to see if a particular bond distance represents a bond. This function will be called bond_check
, and will return True
if the bond distance is within certain bounds (At first we’ll set this to be between 0 and 1.5 angstroms).
def bond_check(atom_distance):
if atom_distance > 0 and atom_distance <= 1.5:
return True
else:
return False
This is great! Our function will currently return True
if our bond distance is less than 1.5 angstrom.
Exercise
Modify the
bond_check
function to take a minimum length and a maximum length as user input.Answer
def bond_check(atom_distance, minimum_length, maximum_length): if atom_distance > minimum_length and atom_distance <= maximum_length: return True else: return False
Function Documentation
Recall from our work with tabular data that we were able to use help
to see a help message on a function. As a reminder, we used it like this.
help(numpy.genfromtxt)
Help on function genfromtxt in module numpy.lib.npyio:
genfromtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=None, replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes')
Load data from a text file, with missing values handled as specified.
Each line past the first `skip_header` lines is split at the `delimiter`
character, and characters following the `comments` character are discarded.
Let’s try the same thing on our function.
help(calculate_distance)
Help on function calculate_distance in module __main__:
calculate_distance(atom1_coord, atom2_coord)
There is no help for our function! That is because you haven’t written it yet. In Python, we can document our functions using something called docstrings
. When you call help on something, it will display the docstring you have written. In fact, most Python libraries use docstrings and other automated tools to pull the docstrings out of the code to make online documentation. For example, see the documentation for the genfromtxt
function online.
To add a docstring to our function, we simply add a block quote directly underneath the function definition. We do this in in the same way we type a string, except that we use three quotation marks to open and close the string instead of one.
def calculate_distance(atom1_coord, atom2_coord):
"""Calculate the distance between two three-dimensional points."""
x_distance = atom1_coord[0] - atom2_coord[0]
y_distance = atom1_coord[1] - atom2_coord[1]
z_distance = atom1_coord[2] - atom2_coord[2]
bond_length_12 = numpy.sqrt(x_distance**2+y_distance**2+z_distance**2)
return bond_length_12
We are using a very simple docstring in this example. However, there are many formats for docstrings. Now, you should see a message when you call help on this function.
help(calculate_distance)
Help on function calculate_distance in module __main__:
calculate_distance(atom1_coord, atom2_coord)
Calculate the distance between two three-dimensional points
If you use a well-known format, you can use software to extract the docstring and make a webpage with your documentation. MolSSI recommends using numpy style docstrings. You can learn more about this in our Python Package Development Best Practices Workshop.
Help vs Online Documentation
Many python libraries we have used such as numpy and matplotlib have extensive online documentation. It is a good idea to use online documentation if it is available. Typically, documentation for functions will be pulled from docstrings in the code, but additional information the code developers have provided will also be available through online documentation.
However, if you are offline or using a library without online documentation, you can check for documentation using the
help
function.
Remember, help for your code only exists if you write it! Every time you write a function, you should take some time to also write a docstring describing what the function does.
Function Default arguments
When there are parameters in a function definition, we can set these parameters to default values. This way, if the user does not input values, the default values can be used instead of the code just not working. For example, if we want the default values in bond check to be 0 and 1.5, we can change the function definition to the following:
def bond_check(atom_distance, minimum_length=0, maximum_length=1.5):
"""
Check if a distance is a bond based on a minimum and maximum bond length.
"""
if atom_distance > minimum_length and atom_distance <= maximum_length:
return True
else:
return False
Let’s try out the function now.
print(bond_check(1.5))
print(bond_check(1.6))
True
False
However, we can overwrite minimum_length
or maximum_length
.
print(bond_check(1.6, maximum_length=1.6))
Now that we have our bond_check
function, we can use it in our for
loop to only print the bond lengths that are really bonds.
num_atoms = len(symbols)
for num1 in range(0, num_atoms):
for num2 in range(0, num_atoms):
if num1 < num2:
bond_length_12 = calculate_distance(coordinates[num1], coordinates[num2])
if bond_check(bond_length_12) is True:
print(F'{symbols[num1]} to {symbols[num2]} : {bond_length_12:.3f}')
O to H1 : 0.969
O to H2 : 0.969
Exercise
We might also want to write a function that opens and processes our
xyz
file for us. Write a function called open_xyz which takes an xyz file as a parameter and returns the symbols and coordinates.Hint: You can return two values from a function by typing
return variable1, variable2
.Answer
def open_xyz(filename): xyz_file = numpy.genfromtxt(fname=filename, skip_header=2, dtype='unicode') symbols = xyz_file[:, 0] coord = (xyz_file[:, 1:]) coord = coord.astype(numpy.float) return symbols, coord
We’ve now written three functions. Using these functions, our script to print bonded atoms now looks like this:
import numpy
import os
file_location = os.path.join('data', 'water.xyz')
symbols, coord = open_xyz(file_location)
num_atoms = len(symbols)
for num1 in range(0, num_atoms):
for num2 in range(0, num_atoms):
if num1 < num2:
bond_length_12 = calculate_distance(coord[num1], coord[num2])
if bond_check(bond_length_12) is True:
print(F'{symbols[num1]} to {symbols[num2]} : {bond_length_12:.3f}')
Time Check
If you are running out of time, this is a good place to end the lesson. You can still complete the testing lesson even if you only got this far.
You can probably think of a further extension to use functions here. What if we wanted to print the bonds for another xyz
file besides water? One option would be to copy and paste the two two for
loops we’ve written. However, the smarter move is to put them in a function.
We could encapsulate the for
loops into a function as well. Let’s do this, then perform the same analysis using both the water.xyz
file and the benzene.xyz
file.
First, we will define a new function print_bonds
, which takes bond symbols and coordinates from an xyz
file as an input.
def print_bonds(atom_symbols, atom_coordinates):
num_atoms = len(atom_symbols)
for num1 in range(0, num_atoms):
for num2 in range(0, num_atoms):
if num1 < num2:
bond_length_12 = calculate_distance(atom_coordinates[num1], atom_coordinates[num2])
if bond_check(bond_length_12) is True:
print(F'{atom_symbols[num1]} to {atom_symbols[num2]} : {bond_length_12:.3f}')
If you were to put all the functions we wrote into a single cell, it looks like this:
import numpy
import os
def calculate_distance(atom1_coord, atom2_coord):
"""
Calculate the distance between two three-dimensional points.
"""
x_distance = atom1_coord[0] - atom2_coord[0]
y_distance = atom1_coord[1] - atom2_coord[1]
z_distance = atom1_coord[2] - atom2_coord[2]
bond_length_12 = numpy.sqrt(x_distance ** 2 + y_distance ** 2 + z_distance ** 2)
return bond_length_12
def bond_check(atom_distance, minimum_length=0, maximum_length=1.5):
"""Check if a distance is a bond based on a minimum and maximum bond length"""
if atom_distance > minimum_length and atom_distance <= maximum_length:
return True
else:
return False
def open_xyz(filename):
"""
Open and read an xyz file. Returns tuple of symbols and coordinates.
"""
xyz_file = numpy.genfromtxt(fname=filename, skip_header=2, dtype='unicode')
symbols = xyz_file[:,0]
coord = (xyz_file[:,1:])
coord = coord.astype(numpy.float)
return symbols, coord
def print_bonds(atom_symbols, atom_coordinates):
"""
Prints atom symbols and bond length for a set of atoms.
"""
num_atoms = len(atom_symbols)
for num1 in range(0, num_atoms):
for num2 in range(0, num_atoms):
if num1 < num2:
bond_length_12 = calculate_distance(atom_coordinates[num1], atom_coordinates[num2])
if bond_check(bond_length_12) is True:
print(F'{atom_symbols[num1]} to {atom_symbols[num2]} : {bond_length_12:.3f}')
We can now open an arbitrary xyz
file and print the bonded atoms. For example, to do this for water and benzene, we could execute a cell like this:
water_file_location = os.path.join('data', 'water.xyz')
water_symbols, water_coords = open_xyz(water_file_location)
benzene_file_location = os.path.join('data', 'benzene.xyz')
benzene_symbols, benzene_coords = open_xyz(benzene_file_location)
print(F'Printing bonds for water.')
print_bonds(water_symbols, water_coords)
print(F'Printing bonds for benzene.')
print_bonds(benzene_symbols, benzene_coords)
Printing bonds for water.
O to H1 : 0.969
O to H2 : 0.969
Printing bonds for benzene.
C to H : 1.088
C to C : 1.403
C to C : 1.403
C to H : 1.088
C to C : 1.403
C to H : 1.088
C to C : 1.403
C to H : 1.088
C to C : 1.403
C to H : 1.088
C to C : 1.403
C to H : 1.088
Extension
In earlier lessons, we used
glob
to process multiple files. How could you useglob
to print bonds for all the xyz files?Solution
import glob xyz_files = glob.glob("data/*.xyz") for xyz_file in xyz_files: atom_symbols, atom_coords = open_xyz(xyz_file) print("Printing bonds for ", xyz_file) print_bonds(atom_symbols, atom_coords)
Key Points
Functions make your code easier to read, more reuseable, more portable, and easier to test.
If a function returns True or False, you can use it in an
if
statement as a condition.