Visualizing the Binding Site of a Protein-Ligand Complex#

Learning Objectives

  • Use NGLView to view the 3D structure of our protein and ligand.

  • Prepare molecule structures using PDQ2PQR and RDKit.

  • Analyze the interactions of the protein in the binding site using a 2D map and ProLIF.

Before we begin our docking calculations, we will likely want to investigate the binding site of our ligand of interest. We will want to look at the binding pocket and the interactions of the ligand with the protein residues. For the rest of our studies, we will choose the ligand 13U. A trypsin structure where this ligand is bound is 2ZQ2.

Downloading the Structure#

First we will need to download our protein structure. We will download 2ZQ2, which is a trypsin structure with our ligand of interest bound.

We will use a similar strategy to our last notebook for getting the file. We will use Python’s request module and a URL from the Protein Data Bank.

import os # for making directories
import requests

# make a directory for pdb files
os.makedirs("pdb", exist_ok=True)

pdb_id = "2zq2" # trypsin PDB file with ligand bound

pdb_request = requests.get(f"https://files.rcsb.org/download/{pdb_id}.pdb")
pdb_request.status_code
200

After downloading, we will write the text to a file.

with open(f"pdb/{pdb_id}.pdb", "w+") as f:
    f.write(pdb_request.text)

View the structure#

Before we start to really work with our molecule, let’s investigate the structure. We will use a library called MDAnalysis to first process our PDB, then visualize it with a library called NGLView. In the cell below, we define some convenience functions for NGLView. These are functions that the tutorial writers wrote for our protein ligand system.

import math

def rotate_view(view, x=0, y=0, z=0, degrees=True):
    radians = 1
    if degrees: radians = math.pi / 180
    view.control.spin([1, 0, 0], x*radians)
    view.control.spin([0, 1, 0], y*radians)
    view.control.spin([0, 0, 1], z*radians)

def view_binding_site(protein, ligand):
    """View binding site of 13U to trypsin.

    Parameters
    -----------
    protein: mda.Universe
        The protein as an MDAnalysis universe.

    ligand: mda.Universe
        The ligand as an MDAnalysis universe.
    """
    view = nv.show_mdanalysis(protein)
    view.clear_representations()
    view.add_representation("surface", colorScheme="hydrophobicity")
    lig_view = view.add_component(ligand)
    lig_view.center()
    rotate_view(view, y=180, x=20)
    return view
    

MDAnalysis is a popular tool for processing molecular dynamics trajectories and other molecular structures. The central object in MDAnalysis is called a “Universe”. In MDAnalysis terms, a Universe represents a molecular system. We can load an MDAnalysis universe from a PDB file.

import MDAnalysis as mda
import nglview as nv

u = mda.Universe(f"pdb/{pdb_id}.pdb")

Text here about NGLView

view = nv.show_mdanalysis(u)
view

This view looks a bit messy. MDAnalysis has a human readable selection syntax that allows us to isolate parts of our structure. We will take our MDAnalysis Universe (the variable u) and use the select_atoms function. Inside this function, we will fill in what we want to select.

We will create separate variables for the protein and ligand. We can select all protein residues in MDAnalysis using the word “protein” in the select_atoms function. Then, we will select our ligand using resname 13U. This corresponds to the residue name in the PDB we downloaded.

protein = u.select_atoms("protein")
ligand = u.select_atoms("resname 13U")

We will use our helper function, defined above, to look at how the ligand is bound to the protein. This helper function will use NGLView, like we did previously, but adds coloring the surface by hydrophobicity. It also zooms in on the ligand.

view_binding_site(protein, ligand)

Upon viewing this structure, you will notice that our ligand seems to appear twice. If you open the PDB file to investigate, you will see the following in the ligand section:

HETATM 1673  C14A13U A 501      18.144  -9.216  12.088  0.61 24.22           C  
ANISOU 1673  C14A13U A 501     1755   4793   2654   1752    148   1233       C  
HETATM 1674  C14B13U A 501      18.147  -8.840  11.672  0.39 24.46           C  
ANISOU 1674  C14B13U A 501     2583   4283   2430   1765    353   1279       C  
HETATM 1675  O32A13U A 501      18.209  -8.355  11.186  0.61 24.38           O  
ANISOU 1675  O32A13U A 501     2354   5394   1514   2217    238    919       O

This PDB structure provides alternate locations for each ligand atom. In excerpt above, you will see C14A13U and C14B13U. These are alternate locations of the same atom. By checking the documentation page for MDAnalysis selections, we can see that MDAnalysis is prepared for this scenario. We will want to use the altloc keyword. This keyword is described as:

altLoc alternative-location

a selection for atoms where alternative locations are available, which is often the case with high-resolution crystal structures e.g. resid 4 and resname ALA and altLoc B selects only the atoms of ALA-4 that have an altLoc B record.

We can alter our MDAnalysis selection syntax to isolate our ligands of interest.

protein = u.select_atoms("protein")
ligand_A = u.select_atoms("resname 13U and altLoc A")
ligand_B = u.select_atoms("resname 13U and altLoc B")

Now, we can use our viewing function to see the location of each ligand.

view_binding_site(protein, ligand_A)
view_binding_site(protein, ligand_B)

When we inspect the ligand in the binding site, we notice a few things. First, the binding site has a large hydrophobic area on the surface. If you zoom in on the binding pocket, you’ll also see that the benzene ring and amine groups are inside.

Making a Map of Ligand Contacts#

To get an even better idea of how our ligand is binding to the protein, we might choose to make a 2D map of ligand contacts with protein residues. In this analysis, we’ll want to know how the ligand is interacting with the protein residues including if it is making hydrogen bonds, Van Der Waals interactions, etc.

We will use a library called ProLIF for this analysis. ProLIF is short for “Protein-Ligand Interaction Fingerprints” and it “ is a tool designed to generate interaction fingerprints for complexes made of ligands, protein, DNA or RNA molecules extracted from molecular dynamics trajectories, docking simulations and experimental structures.” (quote taken from ProLIF docs).

Before we use ProLIF, we first have to make sure our ligand and protein file are prepared properly. Hydrogens are absent in most PDB files because they are not well resolved by methods like X-Ray crystallography. We’ll need to add them back in in order to complete our analysis of the binding site.

This process can actually be quite involved, as we’ll see below.

We will start by saving new PDBs of our selections from MDAnalysis. Then, we will add hydrogens.

protein.write(f"pdb/protein_{pdb_id}.pdb")
ligand_A.write(f"pdb/ligand_A.pdb")
/home/janash/miniconda3/envs/biochemist-python/lib/python3.11/site-packages/MDAnalysis/coordinates/PDB.py:1153: UserWarning: Found no information for attr: 'formalcharges' Using default value of '0'
  warnings.warn("Found no information for attr: '{}'"

Structure Preparation#

Before we run the analysis, we need to make sure our protein and ligand have hydrogen atoms. We will do this first for the protein.

We will use a specialized program called PDB2PQR that is made for working with biomolecules like proteins. The advantage of using PDB2PQR is that it will check our protein for missing atoms and multiple occupancy in the protein.

We will use the command-line interface of this program. This means that you would usually type the command below into your terminal You can run command line commands in the Jupyter notebook by putting a ! in front of the command.

! pdb2pqr --pdb-output=pdb/protein_h.pdb --pH=7.4 pdb/protein_2zq2.pdb protein.pqr
INFO:PDB2PQR v3.6.2: biomolecular structure conversion software.
INFO:Please cite:  Jurrus E, et al.  Improvements to the APBS biomolecular solvation software suite.  Protein Sci 27 112-128 (2018).
INFO:Please cite:  Dolinsky TJ, et al.  PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Res 35 W522-W525 (2007).
INFO:Checking and transforming input arguments.
INFO:Loading topology files.
INFO:Loading molecule: pdb/protein_2zq2.pdb
ERROR:Error parsing line: invalid literal for int() with base 10: ''
ERROR:<REMARK     2>
ERROR:Truncating remaining errors for record type:REMARK
WARNING:Warning: pdb/protein_2zq2.pdb is a non-standard PDB file.

ERROR:['REMARK']
INFO:Setting up molecule.
INFO:Created biomolecule object with 223 residues and 1625 atoms.
WARNING:Multiple occupancies found: N in SER A 61.
WARNING:Multiple occupancies found: CA in SER A 61.
WARNING:Multiple occupancies found: C in SER A 61.
WARNING:Multiple occupancies found: O in SER A 61.
WARNING:Multiple occupancies found: CB in SER A 61.
WARNING:Multiple occupancies found: OG in SER A 61.
WARNING:Multiple occupancies found in SER A 61. At least one of the instances is being ignored.
WARNING:Multiple occupancies found: N in SER A 113.
WARNING:Multiple occupancies found: CA in SER A 113.
WARNING:Multiple occupancies found: C in SER A 113.
WARNING:Multiple occupancies found: O in SER A 113.
WARNING:Multiple occupancies found: CB in SER A 113.
WARNING:Multiple occupancies found: OG in SER A 113.
WARNING:Multiple occupancies found in SER A 113. At least one of the instances is being ignored.
WARNING:Multiple occupancies found: N in SER A 122.
WARNING:Multiple occupancies found: CA in SER A 122.
WARNING:Multiple occupancies found: C in SER A 122.
WARNING:Multiple occupancies found: O in SER A 122.
WARNING:Multiple occupancies found: CB in SER A 122.
WARNING:Multiple occupancies found: OG in SER A 122.
WARNING:Multiple occupancies found in SER A 122. At least one of the instances is being ignored.
WARNING:Multiple occupancies found: N in SER A 167.
WARNING:Multiple occupancies found: CA in SER A 167.
WARNING:Multiple occupancies found: C in SER A 167.
WARNING:Multiple occupancies found: O in SER A 167.
WARNING:Multiple occupancies found: CB in SER A 167.
WARNING:Multiple occupancies found: OG in SER A 167.
WARNING:Multiple occupancies found in SER A 167. At least one of the instances is being ignored.
WARNING:Multiple occupancies found: N in SER A 170.
WARNING:Multiple occupancies found: CA in SER A 170.
WARNING:Multiple occupancies found: C in SER A 170.
WARNING:Multiple occupancies found: O in SER A 170.
WARNING:Multiple occupancies found: CB in SER A 170.
WARNING:Multiple occupancies found: OG in SER A 170.
WARNING:Multiple occupancies found in SER A 170. At least one of the instances is being ignored.
WARNING:Multiple occupancies found: N in SER A 236.
WARNING:Multiple occupancies found: CA in SER A 236.
WARNING:Multiple occupancies found: C in SER A 236.
WARNING:Multiple occupancies found: O in SER A 236.
WARNING:Multiple occupancies found: CB in SER A 236.
WARNING:Multiple occupancies found: OG in SER A 236.
WARNING:Multiple occupancies found in SER A 236. At least one of the instances is being ignored.
WARNING:Multiple occupancies found: N in GLN A 240.
WARNING:Multiple occupancies found: CA in GLN A 240.
WARNING:Multiple occupancies found: C in GLN A 240.
WARNING:Multiple occupancies found: O in GLN A 240.
WARNING:Multiple occupancies found: CB in GLN A 240.
WARNING:Multiple occupancies found: CG in GLN A 240.
WARNING:Multiple occupancies found: CD in GLN A 240.
WARNING:Multiple occupancies found: OE1 in GLN A 240.
WARNING:Multiple occupancies found: NE2 in GLN A 240.
WARNING:Multiple occupancies found in GLN A 240. At least one of the instances is being ignored.
INFO:Setting termini states for biomolecule chains.
INFO:Loading forcefield.
INFO:Loading hydrogen topology definitions.
WARNING:Missing atom CG in residue LYS A 222
WARNING:Missing atom CD in residue LYS A 222
WARNING:Missing atom CE in residue LYS A 222
WARNING:Missing atom NZ in residue LYS A 222
WARNING:Missing atom CG in residue LYS A 222
WARNING:Missing atom CD in residue LYS A 222
WARNING:Missing atom CE in residue LYS A 222
WARNING:Missing atom NZ in residue LYS A 222
INFO:Attempting to repair 4 missing atoms in biomolecule.
WARNING:Missing atom CG in residue LYS A 222
WARNING:Missing atom CD in residue LYS A 222
WARNING:Missing atom CE in residue LYS A 222
WARNING:Missing atom NZ in residue LYS A 222
INFO:Added atom CG to residue LYS A 222 at coordinates 30.628, -3.449, -0.010
INFO:Added atom CD to residue LYS A 222 at coordinates 32.074, -3.541, -0.453
INFO:Added atom CE to residue LYS A 222 at coordinates 32.755, -2.198, -0.512
INFO:Added atom NZ to residue LYS A 222 at coordinates 34.167, -2.339, -0.950
INFO:Updating disulfide bridges.
INFO:Debumping biomolecule.
INFO:Adding hydrogens to biomolecule.
INFO:Debumping biomolecule (again).
INFO:Optimizing hydrogen bonds
INFO:Applying force field to biomolecule states.
INFO:Regenerating headers.
INFO:Regenerating PDB lines.
WARNING:Ignoring 390 header lines in output.
WARNING:Ignoring 390 header lines in output.
protein = mda.Universe("pdb/protein_h.pdb")

Adding hydrogens to our ligand is a little bit more difficult. We can’t just use PDB2PQR in this case. Our ligand might not have bond lengths such that the proper bond orders are always recognized, so we will want to make sure that we have a proper reference.

We will use the ideal ligand we downloaded as a reference and use a small molecule manipulation software called RDKit to match bond orders and add hydrogens.

from rdkit import Chem

from rdkit.Chem.AllChem import AssignBondOrdersFromTemplate

template = Chem.MolFromMol2File("ligands/13U_ideal.mol2")
pdb_ligand = Chem.MolFromPDBFile(f"pdb/ligand_A.pdb")

template = Chem.RemoveAllHs(template)
ligand = AssignBondOrdersFromTemplate(template, pdb_ligand)

# Write the ligand to an SDF file
Chem.MolToMolFile(ligand, "pdb/ligand.sdf")
[23:04:16] WARNING: More than one matching pattern found - picking one

Now, we need to make sure this structure has hydrogens for our analysis.

from openbabel import pybel

# Use pybel to read the SDF, add hydrogens, and save as PDB
mol = next(pybel.readfile("sdf", "pdb/ligand.sdf"))
mol.addh()  # Add hydrogens
mol.write("pdb", "pdb/ligand_h.pdb", overwrite=True)

Visualizing the Binding Site#

Now that we have our files with hydrogens prepared, we can make a map of the binding site.

import prolif as plf

protein_h = mda.Universe(f"pdb/protein_h.pdb")
ligand_h = mda.Universe(f"pdb/ligand_h.pdb")
protein_mol = plf.Molecule.from_mda(protein_h)
/home/janash/miniconda3/envs/biochemist-python/lib/python3.11/site-packages/MDAnalysis/converters/RDKit.py:473: UserWarning: No `bonds` attribute in this AtomGroup. Guessing bonds based on atoms coordinates
  warnings.warn(
ligand_mol = plf.Molecule.from_mda(ligand_h)
fp = plf.Fingerprint()
lig_list = [ ligand_mol ] 

interactions = fp.run_from_iterable(lig_list, protein_mol)
view = fp.plot_lignetwork(lig_list[0])
view