Running code from the Linux Command Line
Overview
Teaching: 15 min
Exercises: 10 minQuestions
How do I move my code from the interactive Jupyter notebook to run from the Linux command line?
How do I make Python scripts with help messages and user inputs using argparse?
Objectives
Make code executable from the Linux command line.
Use argparse to accept user inputs.
Creating and running a python input file
We are now going to move our geometry analysis code out of the Jupyter notebook and into a format that can be run from the Linux command line. Open your favorite text editor and create a new file called “geom_analysis.py” (or choose another filename, just make sure the extension is .py). Paste in your geometry analysis code (the version with your functions) from your jupyter notebook and save your file.
The best practice is to put all your functions at the top of the file, right after your import statements. Your file will look something like this.
import numpy
import os
def calculate_distance(atom1_coord, atom2_coord):
x_distance = atom1_coord[0] - atom2_coord[0]
y_distance = atom1_coord[1] - atom2_coord[1]
z_distance = atom1_coord[2] - atom2_coord[2]
bond_length_12 = numpy.sqrt(x_distance**2+y_distance**2+z_distance**2)
return bond_length_12
def bond_check(atom_distance, minimum_length=0, maximum_length=1.5):
if atom_distance > minimum_length and atom_distance <= maximum_length:
return True
else:
return False
def open_xyz(filename):
xyz_file = numpy.genfromtxt(fname=filename, skip_header=2, dtype='unicode')
symbols = xyz_file[:,0]
coord = (xyz_file[:,1:])
coord = coord.astype(numpy.float)
return symbols, coord
file_location = os.path.join('data', 'water.xyz')
symbols, coord = open_xyz(file_location)
num_atoms = len(symbols)
for num1 in range(0,num_atoms):
for num2 in range(0,num_atoms):
if num1<num2:
bond_length_12 = calculate_distance(coord[num1], coord[num2])
if bond_check(bond_length_12) is True:
print(F'{symbols[num1]} to {symbols[num2]} : {bond_length_12:.3f}')
Exit your text editor and go back to the command line. Now all you have to do to run your code is type
$ python geom_analysis.py
in your Terminal window. Your code should either print the output to the screen or write it to a file, depending on what you have it set up to do. (The code example given prints to the screen.)
Changing your code to accept user inputs
In your current code, the name of the xyzfile to analyze, “water.xyz”, is hardcoded; in order to change it, you have to open your code and change the name of the file that is read in. If you were going to use this code to analyze geometries in your research, you would probably want to be able to specify the name of the input file when you run the code, so that you don’t have to change it every single time. You might want to use the script like this:
$ python geom_analysis.py water.xyz
These types of user inputs are called arguments and to make our code accept arguments, we have to import a new python library in our code.
Open your geometry analysis code in your text editor and add this line at the top.
import argparse
We are importing a library called https://docs.python.org/3/library/argparse.html which can be used to easily make scripts with command line arguments. Argparse
has the ability to allow us to easily write documentation for our scripts as well.
We tell argparse that we want to add a command line interface. The syntax for this is
parser = argparse.ArgumentParser(description="This script analyzes a user given xyz file and outputs the length of the bonds.")
We’ve included a description of the script for our users using description=
. This description does not need to explain what the arguments are, that will be done automatically for us in the next steps.
Next, we have to tell argparse
what arguments it should expect. In general, the syntax for this is
parser.add_argument("argument_name", help="Your help message for this argument.")
Let’s add one for the xyz file the user should specify.
parser.add_argument("xyz_file", help="The filepath for the xyz file to analyze.")
Next, we have to get the arguments.
args = parser.parse_args()
Our arguments are in the args
variable. We can get the value of an argument by using args.argument_name
, so to get the xyz file the user puts in, we use args.xyz_file
. Notice that what follows after the dot is the same thing we but in quotation marks when using add_argument.
xyzfilename = args.xyz_file
symbols, coord = open_xyz(xyzfilename)
Save your code and go back to the Terminal window. Make sure you are in the directory where your code is saved and type
$ python geom_analysis.py --help
This will print a help message. The argparse
library has written this help message for us based on the descriptions and arguments we added.
usage: analyze.py [-h] xyz_file
This script analyzes a user given xyz file and outputs the length of the
bonds.
positional arguments:
xyz_file The filepath for the xyz file to analyze.
optional arguments:
-h, --help show this help message and exit
Now try running your script with an xyz file.
$ python analyze.py data/water.xyz
Check that the output of your code is what you expected.
What would happen if the user forgot to specify the name of the xyz file?
usage: analyze.py [-h] xyz_file
analyze.py: error: the following arguments are required: xyz_file
Argparse handles this for us and prints an error message. It tells us that we must specifiy an xyz file.
Try out your program with other XYZ files in your data
folder.
The “main” part of our script
We need to add one more thing to our code. When you write a code that includes function definitions and a main script, you need to tell python which part is the main script. (This becomes very important later when we are talking about testing.) After your import statements and function definitions and before use argparse
if __name__ == "__main__":
Since this is an if
statement, you now need to indent each line of your main script below this if statement. Be very careful with your indentation! Don’t use a mixture of tabs and spaces! A good way to indent multiple lines in many text editors is to highlight the lines you would like to indent, then press tab
.
Save your code and run it again. It should work exactly as before. If you now get an error message, it is probably due to inconsistent indentation.
import os
import numpy
import argparse
def calculate_distance(atom1_coord, atom2_coord):
x_distance = atom1_coord[0] - atom2_coord[0]
y_distance = atom1_coord[1] - atom2_coord[1]
z_distance = atom1_coord[2] - atom2_coord[2]
bond_length_12 = numpy.sqrt(x_distance**2+y_distance**2+z_distance**2)
return bond_length_12
def bond_check(atom_distance, minimum_length=0, maximum_length=1.5):
if atom_distance > minimum_length and atom_distance <= maximum_length:
return True
else:
return False
def open_xyz(filename):
xyz_file = numpy.genfromtxt(fname=filename, skip_header=2, dtype='unicode')
symbols = xyz_file[:,0]
coord = (xyz_file[:,1:])
coord = coord.astype(numpy.float)
return symbols, coord
if __name__ == "__main__":
## Get the arguments.
parser = argparse.ArgumentParser(description="This script analyzes a user given xyz file and outputs the length of the bonds.")
parser.add_argument("xyz_file", help="The filepath for the xyz file to analyze.")
args = parser.parse_args()
symbols, coord = open_xyz(args.xyz_file)
num_atoms = len(symbols)
for num1 in range(0,num_atoms):
for num2 in range(0,num_atoms):
if num1<num2:
bond_length_12 = calculate_distance(coord[num1], coord[num2])
if bond_check(bond_length_12, minimum_length=args.minimum_length, maximum_length=args.maximum_length) is True:
print(F'{symbols[num1]} to {symbols[num2]} : {bond_length_12:.3f}')
Extension - Optional Arguments
What’s another argument we might want to include? We also might want to let the user specify a minimum and maximum bond length on the command line. We would want these to be optional, just like they are in our function.
We can add optional arguments by putting a dash (-
) or two dashes (--
) in front of the argument name when we add an argument. Add this line below where you added the fist argument. Note that all add_argument
lines should be above the line with parse_args
.
parser.add_argument('-minimum_length', help='The minimum distance to consider atoms bonded.', type=float, default=0)
We’ve added two new things to our argument as well. We have told argparse
that the argument people will pass for this will be a decimal number (float) and that the default value will be zero. If a user does not use -minimum_length
, the value will be zero.
Now we will change our code to use this value. Find the line where you call bond_check
and change it to use the value from argparse
.
if bond_check(bond_length_12, minimum_length=args.minimum_length) is True:
$ python analyze.py data/water.xyz -minimum_length 1
Now we can override the minimum length when running our script. If we don’t specify it, it will default to using zero.
We can do the same thing for our maximum bond length.
parser.add_argument('-maximum_length', help='The maximium distance to consider atoms bonded.', type=float, default=1.5)
And, don’t forget to update your bond_check
function.
if bond_check(bond_length_12, minimum_length=args.minimum_length, maximum_length=args.maximum_length) is True:
Our final program looks like this:
import os
import numpy
import argparse
def calculate_distance(atom1_coord, atom2_coord):
x_distance = atom1_coord[0] - atom2_coord[0]
y_distance = atom1_coord[1] - atom2_coord[1]
z_distance = atom1_coord[2] - atom2_coord[2]
bond_length_12 = numpy.sqrt(x_distance**2+y_distance**2+z_distance**2)
return bond_length_12
def bond_check(atom_distance, minimum_length=0, maximum_length=1.5):
if atom_distance > minimum_length and atom_distance <= maximum_length:
return True
else:
return False
def open_xyz(filename):
xyz_file = numpy.genfromtxt(fname=filename, skip_header=2, dtype='unicode')
symbols = xyz_file[:,0]
coord = (xyz_file[:,1:])
coord = coord.astype(numpy.float)
return symbols, coord
if __name__ == "__main__":
## Get the arguments.
parser = argparse.ArgumentParser(description="This script analyzes a user given xyz file and outputs the length of the bonds.")
parser.add_argument("xyz_file", help="The filepath for the xyz file to analyze.")
parser.add_argument('-minimum_length', help='The minimum distance to consider atoms bonded.', type=float, default=0)
parser.add_argument('-maximum_length', help='The maximium distance to consider atoms bonded.', type=float, default=1.5)
args = parser.parse_args()
symbols, coord = open_xyz(args.xyz_file)
num_atoms = len(symbols)
for num1 in range(0,num_atoms):
for num2 in range(0,num_atoms):
if num1<num2:
bond_length_12 = calculate_distance(coord[num1], coord[num2])
if bond_check(bond_length_12, minimum_length=args.minimum_length, maximum_length=args.maximum_length) is True:
print(F'{symbols[num1]} to {symbols[num2]} : {bond_length_12:.3f}')
Project Assignment
For this homework assignment, you will return to your first project where you processed the file
03_Prod.mdout
.Create a command line script using
argparse
which can take in anmdout
file from Amber, pull out total energy for each time step, and write a new file containing these values. The script should take a file name (03_Prod.mdout
) and output a file with the namesfilename_Etot.txt
. Modify your week 1 homework to do this.In the first project, the file we wrote had two values at the end which we did not want for the total energy. The last two values were some statistics associated with the md simulation and were not total energies. They are in a section that looks like this:
AV E R A G E S O V E R 30000 S T E P S NSTEP = 30000 TIME(PS) = 80.000 TEMP(K) = 300.15 PRESS = -181.0 Etot = -4829.8013 EKtot = 1143.4036 EPtot = -5973.2049 BOND = 3.3064 ANGLE = 9.2114 DIHED = 11.7034 1-4 NB = 3.2267 1-4 EEL = 46.8587 VDWAALS = 881.6358> EELEC = -6929.1472 EHBOND = 0.0000 RESTRAINT = 0.0000 EKCMT = 565.5639 VIRIAL = 662.5977 VOLUME = 21294.1310 Density = 0.9143 Ewald error estimate: 0.2084E-03 ------------------------------------------------------------------------------> R M S F L U C T U A T I O N S NSTEP = 30000 TIME(PS) = 80.000 TEMP(K) = 6.93 PRESS = 433.7 Etot = 120.4662 EKtot = 26.3938 EPtot = 116.6099 BOND = 1.5276 ANGLE = 2.4317 DIHED = 1.3004 1-4 NB = 0.7537 1-4 EEL = 2.1899 VDWAALS = 35.3865 EELEC = 130.4192 EHBOND = 0.0000 RESTRAINT = 0.0000 EKCMT = 18.9219 VIRIAL = 210.0413 VOLUME = 3338.3220 Density = 0.1153 Ewald error estimate: 0.1568E-03 ------------------------------------------------------------------------------
Exclude these two values from the file you write this time.
You can download a directory containing more mdout files here
Call your script
analyze_mdout.py
. You should be able to call the script in the following way:$ python analyze_mdout.py 03_Prod.mdout
When you call help, you should get the following output:
$ python analyze_mdout.py --help
usage: This script parses amber mdout files to extract the total energy. [-h] [--make_plots] path positional arguments: path The filepath to the file to be analyzed. optional arguments: -h, --help show this help message and exit
Solution
import os import glob import argparse if __name__ == "__main__": # Create the argument parser parser = argparse.ArgumentParser("This script parses amber mdout files to extract the total energy.") parser.add_argument("path", help="The filepath to the file(s) to be analyzed.") args = parser.parse_args() filename = args.path # Read the data from the specified file. f = open(filename) data = f.readlines() f.close() # Figure out the file name for writing the output. fname = os.path.basename(args.path).split('.')[0] etot = [] # Loop through the lines for line in data: split_line = line.split() if 'Etot' in line: etot.append(float(split_line[2])) # Get rid of values we don't need. values = etot[:-2] # Open a file for writing outfile_location = F'{fname}_Etot.txt' outfile = open(outfile_location, 'w+') for value in values: outfile.write(f'{value}\n') outfile.close()
Extension 1
Modify your script so that people can specify a wildcard character to read and write multiple files at once.
You should be able to call the script using:
$ python analyze_mdout.py "data/*.mdout"
Note - In the command above, we have used quotations around the filename. If you are on Linux or Mac, you can leave out the quotations. Your computer will expand the
data/*.mdout
into a list of all the files. This will not work on Windows. If you use the apprroach which works for Mac/Linux, you will need find a way for an argument in argparse to be a list for this. See the documentation for theadd_argument
function - you will find the answer there!Solution - All operating systems
If you are working on Windows, this is the solution you will need. For this solution, you use quotation marks around the filename, and include the wildcard character (
*
). Recall that we used this character along with theglob
library in the multiple file parsing lesson.import os import argparse import glob # Get filename from argparse parser = argparse.ArgumentParser("This script parses amber mdout file to extract the total energy.") parser.add_argument("path", help="The filepath of the file to be analyzed.") args = parser.parse_args() # Use glob to get a list of the files. filenames = glob.glob(args.path) for filename in filenames: # Figure out the file name for writing output fname = os.path.basename(filename).split('.')[0] # Open the file. f = open(filename, 'r') # Read the data. data = f.readlines() # Close the file. f.close() etot = [] # Loop through lines in the file. for line in data: # Get information from lines. split_line = line.split() if 'Etot' in line: #print(split_line[2]) etot.append(f'{split_line[2]}') values = etot[:-2] # Open a file for writing outfile_location = F'{fname}_Etot.txt' outfile = open(outfile_location, 'w+') for value in values: outfile.write(f'{value}\n') outfile.close()
Solution - Mac and Linux
This solution will work for Mac and Linux operating systems. For this solution, you would use the script without the quotes (below). Note that the solution given above will work for any operating system.
$ python analyze_mdout.py data/*.mdout
For this solution, you would have to find the option
nargs
which would be added toadd_argument
which tells argparse that it may receive more than one value for the argument. This works on Mac and Linux because these operating systems will automatically create a list of files when you use the wildcard character (*
).import os import argparse # Get filename from argparse parser = argparse.ArgumentParser("This script parses amber mdout file to extract the total energy.") parser.add_argument("path", help="The filepath of the file to be analyzed.", nargs='*') args = parser.parse_args() # This will already be a list, so we don't need to use glob. filenames = args.path for filename in filenames: # Figure out the file name for writing output fname = os.path.basename(filename).split('.')[0] # Open the file. f = open(filename, 'r') # Read the data. data = f.readlines() # Close the file. f.close() etot = [] # Loop through lines in the file. for line in data: # Get information from lines. split_line = line.split() if 'Etot' in line: #print(split_line[2]) etot.append(f'{split_line[2]}') values = etot[:-2] # Open a file for writing outfile_location = F'{fname}_Etot.txt' outfile = open(outfile_location, 'w+') for value in values: outfile.write(f'{value}\n') outfile.close()
Extension 2
Add an additional optional argument to your script which allows the user to output a plot of the total energy. The names of the plot files should be
file_name.png
where the name of the file wasfile_name.mdout
.Solution
We have to add an optional argument for this. You will also need to use
store_true
. This will mean the variable is True if the optional argument is given, and False otherwise.import os import argparse import glob # Get filename from argparse parser = argparse.ArgumentParser("This script parses amber mdout file to extract the total energy.") parser.add_argument("path", help="The filepath of the file to be analyzed.", nargs='*') # store_true makes this True by default parser.add_argument("-make_plots", help="Flag to create plots", action='store_true') args = parser.parse_args() filenames = glob.glob(args.path) for filename in filenames: # Figure out the file name for writing output fname = os.path.basename(filename).split('.')[0] # Open the file. f = open(filename, 'r') # Read the data. data = f.readlines() # Close the file. f.close() etot = [] # Loop through lines in the file. for line in data: # Get information from lines. split_line = line.split() if 'Etot' in line: #print(split_line[2]) etot.append(f'{split_line[2]}') values = etot[:-2] # Open a file for writing outfile_location = F'{fname}_Etot.txt' outfile = open(outfile_location, 'w+') for value in values: outfile.write(f'{value}\n') outfile.close() if make_plots == True: plt.figure() plt.plot(values) plt.savefig(F'{fname}.png')
Key Points
You must
import argparse
in your code to accept user arguments.You add must first create an argument parser using
parser = argparse.ArgumentParser
You add arguments using
parser.add_argument