This lesson is being piloted (Beta version)

# MD Simulation of Alkanes

## Overview

Teaching: 60 min
Exercises: 60 min
Questions
• How do I run a simulation?

• How do I analyze a simulation?

Objectives
• Understand the steps for running a simulation.

• Be able to perform analysis on properties such as bond length, angles, and torsions.

• Understand a PMF calculation.

## What are molecular dynamics simulations?

Now that we have our force field defined, we are ready to do a simulation! But, what does that mean and what kind of simulation will we be doing? We will use a simulation method called molecular dynamics (MD). In molecular dynamics simulations, we simulate molecules in time by calculating the forces on atoms and updating their positions based on those forces. The output of a molecular dynamics simulation is a trajectory. This is commonly written to a file on the computer where the calculation is running. The trajectory file contains atomic positions for points in time. The trajectory can be visualized to show the movement of a system through time (like watching a movie).

Force is equal to the negative gradient of potential energy. MD simulations use the potential energy function we discussed previously, and Newton’s second law (the force on an object is equal to the object’s mass times its acceleration) to calculate positions of atoms.

$\vec{F} = - \nabla U$ $\vec{F} = m \cdot \vec{a} = m \cdot \frac{d\vec{v}}{dt} = m \cdot \frac{d^{2}\vec{r}}{dt^{2}}$

Looking at these two equations, we se that we we already know the mass for each atom. With our force field defined, we can calculate the force on each atom. The net force on each atom is the sum of forces from all other atoms in the system:

$\vec{F}_i = \sum_{j=1}^{n_{max}}{f_{j}}, (j \neq i)$

This means that we should be able to calculate positions for each atom based on the force field function, the positions of other atoms in the system, and mass of the molecule.

$\vec{a}_{i}(t_{0}) = \frac{1}{m}\vec{F}_{i}(t_{0}) = -\frac{1}{m}_{i}(\nabla U)_{i}$

We know that we can calculate positions ($$\vec{x}$$) based on previous positions and acceleration and an amount of time ($$\Delta t$$)

$\vec{x}_i (t_{0} + \Delta t) = \vec{x}_i (t_{0}) + \vec{v}_{i} (t_{0}) \Delta t + \frac{1}{2} \vec{a}_{i}(t_{0}) \Delta t^{2}$

This is the general approach to molecular dynamics. You have some atoms which have initial positions and velocities when you start your simulation. You can use these positions to calculate the potential energy and the forces on each atom (because force is the negative gradient of potential energy). With the velocities of the atoms, you can calculate new positions. This process is repeated over and over again, and each iteration is typically called a timestep.

The typical time step size ($$\Delta t$$) represents 1 femtosecond of time and is based on X-H bond vibration frequencies. A timestep that is too small will lead to an inefficient simulation, while a time step that is too large will be inaccurate or cause your simulation to fail. Often, we will constrain hydrogen in a simulation and can use a longer timestep of 2 femtoseconds. MD simulations assumes that the Erogodic hypothesis is true, meaning that the time average calculated from the simulation is equal to the ensemble average. For this to be true, the simulation has to sample many conformations. Most simulations have millions to trillions of time steps.

Usually a simulation protocol follow this general procedure:

1. Initialization - Build the system including the topology (connectivity), forcefield parameters, and setting simulation details such as temperature and integrator.
2. Minimization - A brief energy minimization to eliminate “bad” interatomic contacts (i.e., ones that would cause high forces) that would result in a numerically unstable simulation.
3. Equilibration - A brief MD simulation for the purpose of bringing the system temperature or temperature and volume to the desired equilibrium values.
4. Production - A long MD simulation for the purpose of collecting data.
5. Analysis - After you have collected your data from the production run, you must analyze the trajectory to draw conclusions.

## Running your first simulations

We will now use OpenMM to do a molecular dynamics simulation of the ethane and butane molecules we prepared in the previous lesson. It’s important to note at this point that molecular dynamics simulations can be performed using a number of softwares. However, we will be running a simulation with a program called OpenMM. OpenMM has the advantage of being scriptable with Python.

We will first have to make sure we have OpenMM installed. If you are using anaconda, install OpenMM by typing the following command into your terminal or the Anaconda prompt:

$conda install -c conda-forge mdtraj  Open a new Jupyter notebook to do the analysis of your ethane trajectory. First, we will load in our trajectory using MDTraj import mdtraj as md traj = md.load('ethane_sim.dcd', top='ethane.pdb')  The command above reads all of the atomic positions from ethane_sim.dcd and keeps track of atom connectivity (topology) which was given in the PDB file. Next, visualize the trajectory using nglview. Nglview has a special function show_mdtraj that we can use with our trajectory because it was in a specific format from the MDTraj library. import nglview as ngl visualize = ngl.show_mdtraj(traj) visualize  This should show you something that looks sort of like a movie of your ethane molecule. These are the atomic positions calculated by OpenMM during the molecular dynamics run. We can now analyze the positions to find out some things about our molecule. We will use another OpenMM command to pull out our bonds and atoms for analysis atoms, bonds = traj.topology.to_dataframe() atoms  ### Analyzing the C-C bond length Let’s look at what C-C bond lengths our ethane molecule had during the simulation. Before we can measure the bond lengths, we have to decide which atoms from our molecule define the bond angle. Below is the output you should have seen above with the atoms command (though yours will be styled differently in the Jupyter notebook): serial name element resSeq resName chainID segmentID 0 1 C1 C 1 ETH 0 1 2 H11 H 1 ETH 0 2 3 H12 H 1 ETH 0 3 4 H13 H 1 ETH 0 4 5 C2 C 1 ETH 0 5 6 H21 H 1 ETH 0 6 7 H22 H 1 ETH 0 7 8 H23 H 1 ETH 0 We have to pick the atom indices for the C-C bond. An atom’s index is the left-most value in the table above. For our torsion, we’ll measure C1-C2 the indices for these are 0 and 4. We use the function compute_distances in the MDTraj library to measure the distance between these atoms. bond_indices = [0, 4] # atoms to define the bond length bond_length = md.compute_distances(traj, [bond_indices])  We now have the measurement for this torsion angle in radians for each recorded timestep of the trajectory saved in the array bond_length. One way we can examine this data is by plotting it as a histogram using the Python library matplotlib. import matplotlib.pyplot as plt bondcounts, binedges, otherstuff = plt.hist(bond_length, bins=120) plt.title('C-C bond length histogram') plt.xlabel('Bond length (nm)') plt.ylabel('Counts') plt.show()  ## Exercise - Analyzing the H-C-C-H torsion A torsion is made up of four atoms which are bonded to each other. Analyze the torsion angle associated with the atoms H11-C1-C2-H21 for your trajectory. Instead of using the function compute_distance, use compute_dihedrals. Create a histogram plot of the torsion angles. ## Solution First, we need to pick the atom indices of our torsion angle and use the compute_dihedrals function to calculate the dihedrals. phi_indices = [1, 0, 4, 5] # atoms to define the torsion angle phi = md.compute_dihedrals(traj, [phi_indices]) print(phi)  We now have the measurement for this torsion angle in radians for each recorded timestep of the trajectory. Next, we can examine this data by plotting it as a histogram using the Python library matplotlib. import matplotlib.pyplot as plt phicounts, binedges, otherstuff = plt.hist(phi, bins=90) # create a histogram with 90 bins plt.title('H-C-C-H torsion angle') plt.xlabel(r'$\phi$(rad)') plt.ylabel('Counts') plt.show()  ### Potential of Mean Force Calculation So far in our analysis, we have looked at the distribution of bond lengths and torsion angles for ethane. However, we can also use our simulations to calculate thermodynamics properties of our system. For example, we can use our calculated distributions along with Boltzmann’s constant to calculate the potential of mean force (pmf), or energy change associated with changes in the bond length or torsion angle. The potential of mean force is defined by the expression $W(x) = -k_{B}T*ln[p(x)] + C$ where $$p(x)$$ is the probability, or the histogram we calculated previously. For our torsion angle, we can calculate and plot the PMF: kB = 8.31446/1000 # Boltzmann constant in kJ/mol Temp = 298.15 # simulation temperature phicounts[phicounts==0] = 0.1 # get rid of any bins with 0 counts/infinite energy pmf = -kB*Temp*np.log(phicounts) # W(x) = -kT*ln[p(x)] = -kT*ln[n(x)] + C pmf = pmf - np.min(pmf) # subtract off minimum value so that energies start from 0 bincenters = (binedges[1:] + binedges[:-1])/2 # compute centers of histogram bins plt.plot(bincenters, pmf) plt.title('H-C-C-H torsion pmf') plt.xlabel(r'$\phi$(rad)') plt.ylabel('Relative free energy (kJ/mol)') plt.show()  In the code above, we used the line pmf = pmf - np.min(pmf) to subtract the minimum system energy (or $$C$$ from the equation above), giving us the relative free energy. When we examine the plot, we can see that the PMF is not smooth near the free energy maxima. This is due to finite sampling in our relatively short simulation. This makes sense because the configurations of the molecule with the higher energy would not occur as many times during the simulation. To make this smoother, we could run a longer simulation or use a smoothing function on our data. #### C-C PMF Similarly, we can make a plot for the PMF of the C-C bond: bondcounts[bondcounts==0] = 0.1 pmf = -kB*Temp*np.log(bondcounts) pmf = pmf - np.min(pmf) bincenters = (binedges[1:] + binedges[:-1])/2 pmf_smoothed = sm.nonparametric.lowess(pmf, bincenters, frac=0.05) pmf_s = pmf_smoothed[:,1] - np.min(pmf_smoothed[:,1]) plt.plot(bincenters, pmf_s) plt.xlabel('Bond length (nm)') plt.ylabel('Relative free energy (kJ/mol)') plt.title('C-C bond length pmf') plt.show()  Again, we see that our higher energy bond lengths are less smooth. This is because these bond lengths did not occur very much in our simulation (because of the high energy), so our statistics are poor. If we wanted to exclude these areas, we could subset the part of our data which we plot: plt.plot(bincenters[pmf_s < 15], pmf_s[pmf_s < 15]) plt.xlabel('Bond length (nm)') plt.ylabel('Relative free energy (kJ/mol)') plt.title('C-C bond length pmf') plt.show()  ## Your Turn - Analysis of Butane Trajectory Create a copy of your ethane analysis code and modify your code to analyze your butane trajectory. 1. Read in the files butane.pdb and butane_sim.dcd and visualize using NGLview. 2. Analyze the C-C-C-C torsion angle and compute the PMF. 3. Analyze the C-C-C bond angle (use either C1-C2-C3 or C2-C3-C4) and compute the PMF. 4. Make only a histogram of one of the C-H bond lengths. Pick any C-H pair. What do you notice about the distribution of this bond length? ## Solution ### Read in the MD Trajectory traj = md.load('butane_sim.dcd', top='butane.pdb') atoms, bonds = traj.topology.to_dataframe() atoms  ### Visualize visualize = ngl.show_mdtraj(traj) visualize  ### Analyzing the C-C-C-C torsion phi_indices = [0, 4, 7, 10] # atoms to define the torsion angle phi = md.compute_dihedrals(traj, [phi_indices]) phicounts, binedges, otherstuff = plt.hist(phi, bins=90) # create a histogram with 90 bins plt.title('C-C-C-C torsion angle') plt.xlabel(r'$\phi$(rad)') plt.ylabel('Counts') plt.show() print(np.sum(phicounts))  #### C-C-C-C torsion PMF B = 8.31446/1000 # Boltzmann constant in kJ/mol Temp = 298.15 # simulation temperature in K phicounts[phicounts==0] = 0.1 # get rid of any bins with 0 counts/infinite energy pmf = -kB*Temp*np.log(phicounts) # W(x) = -kT*ln[p(x)] = -kT*ln[n(x)] + C pmf = pmf - np.min(pmf) # subtract off minimum value so that energies start from 0 bincenters = (binedges[1:] + binedges[:-1])/2 # compute centers of histogram bins plt.plot(bincenters, pmf) plt.title('C-C-C-C torsion pmf') plt.xlabel(r'$\phi\$ (rad)')
plt.ylabel('Relative free energy (kJ/mol)')
plt.show()


### Analyzing the C-C-C angle

angle_indices = [0, 4, 7] # or could do [4, 7, 10]
bondangle = md.compute_angles(traj, [angle_indices])

anglecounts, binedges, otherstuff = plt.hist(bondangle, bins=100)
plt.title('C-C-C bond angle')
plt.ylabel('Counts')
plt.show()


#### C-C-C angle PMF

anglecounts[anglecounts==0] = 0.1
pmf = -kB*Temp*np.log(anglecounts)
pmf = pmf - np.min(pmf)

bincenters = (binedges[1:] + binedges[:-1])/2

plt.plot(bincenters, pmf)
plt.ylabel('Relative free energy (kJ/mol)')
plt.show()


### Analyzing a C-H bond

bond_indices = [0, 1] # many possibilities!
bondlength = md.compute_distances(traj, [bond_indices])

lengthcounts, binedges, otherstuff = plt.hist(bondlength, bins=100)
plt.title('C-H bond length')
plt.xlabel('Bond length (nm)')
plt.ylabel('Counts')
plt.show()


The C-H bond length does not behave at all like something subject to a harmonic potential. So what’s going on? Remember that in this simulation we have “frozen” all of the covalent bonds involving H atoms so that we can use a 2 fs time step. Therefore only the non-H atoms undergo true dynamics; the positions of the H atoms are calculated after each time step using an interative algorithm (SHAKE). For more information, check out: https://en.wikipedia.org/wiki/Constraint_(computational_chemistry)#The_SHAKE_algorithm

## Key Points

• Typical MD simulations consist of simulation initialization, minimization, equilibration, and production.

• The production simulation gives the data which is analyzed.

• Results from simulation can be used to calculate the potential of mean force (PMF).