Understanding and Visualizing Target Flexibility

Tutorial Created with Software Release: 2024-3
Topics: Hit Discovery, Hit-to-Lead & Lead Optimization, Small Molecule Drug Discovery, Structure Prediction & Target Enablement
Products Used: Desmond, Maestro

Tutorial files

500 MB

This tutorial is written for use with a 3-button mouse with a scroll wheel.
Words found in the Glossary of Terms are shown like this: Workspacethe 3D display area in the center of the main window, where molecular structures are displayed

 

Tip: You can hover over a glossary term to display its definition. You can click on an image to expand it in the page.
Abstract:

 

In this tutorial, we will explore a variety of approaches for probing and assessing the flexibility of proteins or other biomacromolecules with the goal of gleaning insights informing structure-based drug design.

We show three different methods for visualizing and understanding the flexibility of these systems starting from one or multiple experimental structures: Visualizing crystallographic B-factors, comparing multiple experimental structures, and sampling conformations using molecular dynamics (MD) simulations.

This tutorial does not cover techniques for modeling structural changes of a receptor upon ligand binding (induced-fit effects). Introductions to Induced Fit Docking can be found in the IFD and IFD-MD tutorials.

 

Tutorial Content
  1. Introduction

  1. Creating Projects and Importing Structures

  1. Identifying Flexible Regions from Static Structures

  1. Assessing Flexibility Using Molecular Dynamics Simulations

  1. Conclusion and References

  1. Glossary of Terms

1. Introduction

Biomacromolecules are much more dynamic than the static images captured in experimental structures imply. Understanding and exploiting these dynamics can be a good approach to tackling difficult drug design problems.

There are a few general scenarios that suggest flexibility in a target is present and warrant further investigation:

  • When a target is known to bind ligands of very different sizes.
  • When a ligand is experimentally known to bind to a target but rigid docking with Glide is unable to capture it.
  • When a large ligand is known to fit in a relatively small active site then it is implied that the target must be flexible to make room for binding.
  • When a target is known to adopt multiple distinct conformational states in vivo.  

In this tutorial, we will show a few common strategies for identifying, visualizing, and analyzing the flexible regions of a target structure applied to a ligand-bound structure of human Beta-secretase 1 (BACE1), which has high conformational flexibility allowing it to accommodate a wide spectrum of inhibitors ranging from a peptide to a small molecule. While the example we use is the protein with a small molecule ligand, the methods can be applied to any system, e.g. to protein-protein interfaces or nucleic acid targets.

You will first learn how to identify flexible regions from (one or multiple) static structures by visualizing crystallographic B-factors and comparing multiple experimental structures. Then, you will learn how to sample conformations using molecular dynamics (MD) simulations and analyze the results.

2. Creating Projects and Importing Structures

At the start of the session, change the file path to your chosen Working Directorythe location where files are saved in Maestro to make file navigation easier. Each session in Maestro begins with a default Scratch Projecta temporary project in which work is not saved, closing a scratch project removes all current work and begins a new scratch project, which is not saved. A Maestro project stores all your data and has a .prj extension. A project may contain numerous entries corresponding to imported structures, as well as the output of modeling-related tasks. Once a project is saved, the project is automatically saved each time a change is made.

Structures can be built in Maestro or can be imported using File > Import Structures (or drag-and-dropped), and are added to the Entry Lista simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion and Project Tabledisplays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data. The Entry Lista simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion is located to the left of the Workspacethe 3D display area in the center of the main window, where molecular structures are displayed. The Project Tabledisplays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data can be accessed by Ctrl+T (Cmd+T) or Window > Project Table if you would like to see an expanded view of your project data.

  1. Double-click the Maestro icon.

Figure 2-1. Change Working Directory option.

  1. Go to File > Change Working Directory.
  2. Find your directory, and click Choose.
  3. Pre-generated input and results files are included for running jobs or examining output. Download the zip file here: https://www.schrodinger.com/sites/default/files/s3/release/current/Tutorials/zip/target_flexibility.zip
  4. After downloading the zip file, unzip the contents in your Working Directorythe location where files are saved for ease of access throughout the tutorial.

Figure 2-2. The Import panel, with desired file selected from the tutorial archive.

  1. Go to File > Save Project As.
  2. Change the File name to flexibility_tutorial, click Save.
    • The project is now named flexibility_tutorial.prj.
  3. Go to File > Import Structures.
  4. Click to select BACE1_4DJY.maegz.
  5. Click Open.

3. Identifying Flexible Regions from Static Structures

3.1 Visualizing B-Factors

Every experimental method for structure determination has some measure of uncertainty. This uncertainty can result from limitations of the method itself (or a bad fit of the structural model to experimental data), but the flexibility of the target can “blur” the captured image as well. How this uncertainty is measured depends on the structure determination method, but this information can be used as a proxy for the flexibility of the system.

If the structural model was obtained from X-ray crystallography, the atomic B-factors give an indication of the positional uncertainty of each atom which can result from differences between the copies of the target within the crystal. Visualizing B-factors can therefore give an indication to the flexibility of the system at negligible computational effort.

In this section, you will learn how to color a structure by B-factors. There are a few common representations of the B-factors, e.g. by varying the thickness of a tube representing the protein backbone or by showing the displacement of individual atoms as spheres or ellipses of varying sizes. Here, we will use the common framing of the B-factors as being connected to thermal motion and apply a red-blue color gradient to show regions as “hot” (high B-factor) or “cold” (low B-factor).

Figure 3-1. Coloring all atoms by PDB B-factor.

  1. In the selection toolbar, click All to select all atoms in the structure.
  2. Open the Style Toolbox.
  3. Click the Color Atoms arrow and choose Atom PDB B factor (Temperature Factor).
    • All atoms are now colored based on their B-factor.  

Note: This structure has been prepared, so it contains Hydrogen atoms. These atoms were not contained in the original structure and have no associated B-factor.

Red color coding indicates areas of high B-factor, high mobility, and high uncertainty. Blue color coding indicates areas of low B-factor, less mobility, and high certainty.

We can now see that the positions of the ligand as well as most of the binding pocket are quite certain, with low associated B-factors, whereas a few of the residues in the loop that covers the binding site (e.g. TYR 132 – GLN 134) have quite high B-factors indicating higher positional uncertainty (which implies higher mobility).

In the case of TYR 132, flexibility might imply that the residue is able to move out of the way to accommodate ligand substituents larger than the cyclopropyl group (which is the case, for example, see the structure with PDB ID 4DJX).

In the case of GLN 134, it is unclear whether the range of motion of the loop and the residue is sufficient to allow it to form meaningful interactions with a ligand.

Figure 3-2. Transferring the color from alpha carbons to ribbons.

Coloring the ribbons based on the B-factors of the α-carbon allows you to easily see the B-factors for the whole system.

  1. Right-click a ribbon in the Workspace.
    • The Edit Ribbon context menu appears.
  2. For Color Scheme, choose CA Atom Color.    

Figure 3-3. Fitting the view to the full structure to identify high mobility regions.

  1. Click the Fit all button in the selection toolbar to see the entire structure at a glance.

Notice that beyond the binding site, only the C-terminal tail of the protein and the loops between GLU 371-GLN 377 and PRO 131-GLY 135 have high B-factors indicating potential mobility.

Keep in mind that tails and loops are frequently quite mobile in solution and the perceived rigidity of loops with low B-factors may be an artifact of the crystallization context.

You can view or adjust the B-factor to color mapping in the Maestro Preferences. Under Molecular representation, choose Color Schemes, choose Atom PDB B factor (Temperature Factor), and click Edit. If you do not wish to change the defaults, click Cancel when you are done. You can also view the residue-level B-factors as labels in the workspace by creating a custom label.

If you find yourself frequently using this coloring, you can create a custom preset which allows you to apply it efficiently. Click the Presets button and choose Manage Custom Presets. For further instructions, see the Create Custom Preset Dialog Box documentation page.

3.2 Aligning Protein Structures

If multiple experimental structures of the target are available, aligning them to each other can also provide an indication of its flexibility. Particularly useful are comparisons between apo- and ligand-bound structures, between structures that contain ligands of very different sizes, as well as structures which have been obtained from different crystallization procedures (and in different space groups).

BACE1 is an established target, so we have the luxury of comparing our 4DJY structure with a structure bound to a ligand from the same series (4DJV), a structure containing a completely different ligand (7F1D), and an apo-structure (3TPJ).

Figure 3-4. Fetching additional structures from the Protein Data Bank.

  1. Go to File > Get PDB.
  2. In the PDB IDs text box, type “4DJV, 7F1D, 3TPJ” (without the quotes).
  3. In the Chain name (optional) text box, type “A” (without the quotes) to fetch chain A of the proteins.
  4. Click Download.
    • Three new entries are added to the Entry Lista simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion.

Figure 3-5. Protein Structure Alignment of the four BACE1 structures.

There are multiple different approaches to aligning protein structures (e.g. based on sequence, secondary structures or binding sites). In this case, the differences between the methods are negligible and we use a secondary structure based alignment for convenience. See Aligning Structures for more information.

 

  1. Ctrl+click (Cmd+click) to includethe entry is represented in the Workspace, the circle in the In column is blue the 4DJY_24_prepared, 4DJV, 7F1D, and 3TPJ entries.
  2. Go to Style Toolbox and click on Ribbons.
    • The Edit Ribbon menu appears.
  3. For color scheme, choose Secondary Structure.
  4. Go to Tasks > Browse > Structure Alignment > Protein Structure Alignment.
  5. Click Align.
    • The structures are aligned and the Protein Structure Alignment Results window opens.
  6. Close the Protein Structure Alignment panel and the results window.

 

Figure 3-6. Aligned protein structures.

 

  1. Open the Presets menu and choose Binding Mode Comparison.
  2. Examine the aligned structures in the workspace.

The comparison of different structures confirms some of the observations derived from examining the B-factors: The loop containing TYR 132 appears to be quite flexible, as do a few other solvent-exposed loops. Additionally, you can see that in the deepest part of the pocket, the loop between GLY 69 and TYR 76 can move aside to accommodate the large bicyclic group.

4. Assessing Flexibility Using Molecular Dynamics Simulations

The key challenge with using MD simulations to explore a system’s conformational space is sampling. Improving sampling can be achieved by a variety of methods. The simplest method is to increase the simulation time or the temperature for the whole system or for specific regions (see the Replica Exchange Panel documentation). If specific degrees of freedom are of interest, other enhanced sampling methods and metadynamics can be used.

For a detailed introduction to setting up and analyzing MD simulations with the Schrödinger suite, see the Introduction to All-Atom Molecular Dynamics Simulations with Desmond and Introduction to MD Trajectory Analysis with Desmond tutorials.

In this section, we will run a 200-ns-long unrestrained MD simulation of the BACE1 system at a temperature of 300 K and use the resulting trajectory to gain insights into the flexibility of the system. This should be sufficient to explore the range of motion of quite a few side chains, but loop motions   may not be thoroughly sampled. As our goals are more exploratory than quantitative, this is sufficient.

Note that the MD simulation is quite long and can only be run on a Linux host with a GPU. We recommend using the trajectory provided in the tutorial zip archive instead of running the job yourself. You can find the instructions to reproduce the simulation in section 4.1, but if using the pre-generated trajectory, feel free to skip ahead to section 4.2.

4.1 Setting up the MD simulation

MD simulation setup consists of two steps: First, the model system must be constructed by placing the structure in a simulation box, adding the appropriate amount of solvent molecules to fill the simulation box, as well as the appropriate number of counterions to neutralize charges present in the structure. Adding physiological concentrations of salt is also good practice. After the system is set up, the MD simulation settings such as simulation time and temperature are specified and the simulation job can be run.

Figure 4-1. The System Builder panel.

  1. Includethe entry is represented in the Workspace, the circle in the In column is blue the entry 4DJY_24_prepared.
  2. Double-click Presets.
  3. Go to Tasks > Browse > Classical Simulation > System Setup.
    • The System Builder panel opens in the Solvation tab.
  4. In the Boundary conditions section, click Minimize Volume.
    • An orthorhombic box is placed around the system. Check the Show boundary box option to see it in the workspace.

Figure 4-2. The Ions tab of the System Builder panel.

  1. Switch to the Ions tab of the System Builder.
  2. Next to the Neutralize by adding 9 Na+ ions, click Recalculate.
  3. Check the Add Salt option.
  4. Change the Job name to desmond_setup_4DJY and click Run.
    • This job takes approximately two minutes.
    • The results are automatically incorporated.
    • You can find pre-generated results in the desmond_setup_4DJY folder of the tutorial zip archive.
  5. Close the System Builder panel.

Figure 4-3. Setting up the simulation in the Molecular Dynamics panel.

  1. Includethe entry is represented in the Workspace, the circle in the In column is blue the entry 4DJY_24_prepared from the group named MD: desmond_setup_4DJY.
  2. Go to Tasks > Browse > Classical Simulation > Molecular Dynamics.
    • The Molecular Dynamics Panel opens.
  3. In the Model system section, click Load.
  4. In the Simulation section, set Simulation time (ns) to 200.
  5. In the Analysis section, check the box next to Run interactions analysis when simulation job completes.
  6. If you want to run the job yourself, change job name MD_flexibility_BACE1 and click Run.
    • Warning! This is a long simulation requiring a Linux host with a GPU. This simulation takes approximately 6 hours on a current-generation GPU.
    • You can find the job results in the MD_flexibility_BACE1 folder of the tutorial archive.

4.2 Analyzing the Results of the Simulation

At a temperature of 300 K and a simulation time of 200 ns, we would expect individual side chains to explore their range of motion significantly, and see some movement in flexible loops. Large-scale conformational changes (e.g. a kinase changing between DFG-in and DFG-out) would be beyond the reach of this type of sampling.

Figure 4-4. Importing MD results.

  1. Go to File > Import Structures.
  2. Find and open MD_flexibility_BACE1-out.cms in the MD_flexibility_BACE1 folder of the tutorial zip archive.

Figure 4-5. Loading the Trajectory into the Workspace.

  1. Click the Trajectory workflow action menu () next to the 4DJY_24_prepared entry in the Entry Lista simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion.
  2. Click Load Trajectory.
    • The trajectory player opens.
  3. Click Play to view the trajectory.

In the following, you will use the Simulation Interactions Diagram to examine the results in detail. Feel free to switch back and forth between the analysis and the trajectory in the workspace.

Figure 4-6. Loading the Simulation Interactions Diagram to view the graphical representation of simulation data.

  1. Go to Tasks > Browse > Classical Simulation > Simulation Interactions Diagram.
  2. Click Load.
  3. Find and open the MD_flexibility_BACE1.eaf in the MD_flexibility_BACE1 folder of the tutorial zip archive.  

Figure 4-7. The Protein-RMSF tab of the Simulation Interactions Diagram showing details of TRP 331.

  1. Go to the P-RMSF tab.
  2. Check the Side Chains and B factor (PDB) options.
  3. Open the Workspace Configuration Drawer (plus sign at bottom right) and click Sequence Viewer.

You can now see the RMSF and average B-factor for each residue in the BACE1 protein. The x axis is numbered by residue index (counting from 1), and you can hover over the plot to see the type and number of each residue (purple box). The Workspace Sequence Viewer shows both numbering schemes. To explore the results, you can click-and-drag to select residues in the sequence viewer and use the Z keyboard shortcut to zoom to the selection.

 

The B-factor frequently correlates quite well with the RMSF of C alpha atoms. Notable exceptions to this are a few solvent-exposed loops whose conformations were stabilized by the crystal contacts and are now relaxing (e.g. CYS 330 – ILE 340).

 

Next, we’ll have a more detailed look at the protein-ligand contacts, first from the perspective of the protein, then from the perspective of the ligand.

Figure 4-8. The Protein-Ligand Contacts tab.

  1. Switch to the PL-Contacts tab
  2. Hover over the bar plot to see details about each of the listed interactions (purple box).

 

Some interactions (e.g. the H-bonds to ASP 93 and ASP 289, the catalytic site) are present throughout the simulation. Others, such as the (water-mediated) H-bond to GLN 134 are only present for a fraction of the simulation frames.

Figure 4-9. The Ligand-Protein Contacts tab with water-mediated interaction between the ligand and GLN 134.

Let’s see these interactions from the ligand perspective:

  1. Switch to the LP-Contacts tab.
  2. Adjust the Minimum contact strength slider to 20%.
    • The water-mediated interaction to GLN 134 becomes visible.

GLN 134 forms a water-mediated H-bond interaction to the carbonyl of the imidazolidinone that was not visible in the initial crystal structure. Note that this residue has quite a large associated B-factor. So while analyzing the B-factors showed us that this residue might be flexible, we could only observe its range of motion and confirm that it is able to interact with the ligand by using molecular dynamics.

 

These are only a few examples of target flexibility that can be observed from a molecular dynamics simulation. For a more detailed tutorial on the MD analysis capabilities in Maestro, see the Introduction to MD Trajectory Analysis with Desmond tutorial.

5. Conclusion and References

In this tutorial, we learned how to use crystallographic B-factors and molecular dynamics simulations to understand and visualize the flexibility of a protein target. While B-factors can give a first indication of potentially mobile loops and residues, molecular dynamics allows for an exploration of the range of motion.

The flexibility of a target can cause methods that rely on static structures (e.g. Glide, SiteMap, MM-GBSA) to struggle to be predictive. Accurately capturing the effects of protein flexibility can be achieved with different methods, depending on the specific system, available data, and research goals. See the following additional resources:

For further reading:
  • Viewing and Analyzing Desmond Simulations
  • Metadynamics Simulations
  • To answer “Does my protein have alternative pockets?”:
    • Use mixed solvent MD, or
    • Use SiteMap over the whole MD trajectory:
      $SCHRODINGER/run trajectory_binding_site_volumes.py.
      Find ensembles from the trajectory by clustering the shapes of the sites mapped out by SiteMap  (Osgathorpe et. al.) available as a Knime workflow
  • To design specific inhibitors that open out pockets via loop refinement:
    • Use IFD or IFD-MD to predict induced-fit poses.
    • Use the Prime Refine Loops panel to model relevant loop portions
    • Use the IFD module to automate loop refinement and re-docking. For more information see the Refine Loops Panel documentation
  • To create an ensemble from your MD run:
    • Search for Cluster trajectory in the Task Tool, click on the result, and enter the information in the newly opened window.

6. Glossary of Terms

Entry List - a simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion

Included - the entry is represented in the Workspace, the circle in the In column is blue

Project Table - displays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data

Recent actions - This is a list of your recent actions, which you can use to reopen a panel, displayed below the Browse row. (Right-click to delete.)

Scratch Project - a temporary project in which work is not saved, closing a scratch project removes all current work and begins a new scratch project

Selected - (1) the atoms are chosen in the Workspace. These atoms are referred to as "the selection" or "the atom selection". Workspace operations are performed on the selected atoms. (2) The entry is chosen in the Entry List (and Project Table) and the row for the entry is highlighted. Project operations are performed on all selected entries

Working Directory - the location where files are saved

Workspace - the 3D display area in the center of the main window, where molecular structures are displayed