Identifying impactful mutations using FEP+ residue scanning

Tutorial Created with Software Release: 2024-4
Topics: Biologics Drug Discovery, Enzyme Engineering, Free Energy Perturbation (FEP)
Products Used: BioLuminate


This tutorial is written for use with a 3-button mouse with a scroll wheel.

 

Abstract:

 

In this tutorial, we will learn to set up FEP+ residue scanning calculations to quantitatively probe the effects of a set of mutations on the binding strength of human HLA-A2 presenting a viral peptide to a T-cell receptor.

 

Tutorial Content
  1. Introduction
  2. Creating Projects and Importing Structures
  3. Structure Preparation and Preliminary Steps
  4. Setting up the FEP+ Residue Scan
  5. Analyzing the Results of the Scan
  6. Conclusion and References
  7. Glossary of Terms

1. Introduction

Protein engineering requires iterative cycles of variation and testing to find promising candidates. Experimentally, this involves either generating many random mutations (e.g., via phage display) or using a vector to express a specific, manually designed mutant for testing. Both approaches require significant investments of time and money.

Computationally, it is possible to identify residues which strongly affect the protein-protein interface (PPI) and binding affinity by using methods such as MM-GBSA and systematically testing many different mutations of residues at the PPI. However, MM-GBSA is not accurate enough to quantitatively predict changes in binding affinity resulting from mutations of specific sites.

For obtaining highly accurate predictions of binding affinities, free energy perturbation (FEP) methods are the state of the art. However, this approach has a significantly higher computational cost. It is therefore better suited to validating promising candidates before wet-lab testing than to scanning the hundreds of interesting mutations that result, through combinatorial effects, from even a small number of hot-spot sites.

Figure 1: Progressing from identifying promising mutation sites to experimental testing

FEP+ residue scanning bridges the gap between the fast but approximate MM-GBSA and the costly but accurate Protein FEP+. FEP+ residue scanning is based on the lambda dynamics simulation approach, in which a continuous coupling variable λ is introduced for each mutation state at each mutation site. The simulation is performed using multiple replicas, some biased toward the wild type and others toward the mutants. This ensures that different regions of the configurational space are adequately sampled. A replica exchange methodology is employed to facilitate transitions between replicas. Finally, the relative populations of each residue at each site are converted to relative ∆G values. The resulting relative free energies of the mutants are far more accurate than the MM-GBSA predictions. For more details on the lambda dynamics approach used in the FEP+ residue scan, see the Further Reading section at the end of this tutorial.
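The last step, converting populations to relative free energies, can be illustrated with the standard Boltzmann relation ∆G_i − ∆G_ref = −kT ln(p_i / p_ref). The sketch below is a simplified illustration, not the FEP+ implementation: it assumes the populations have already been corrected for any biasing potentials and describes a single site in a single environment. The function name, the temperature, and the example populations are all hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def relative_free_energies(populations, reference, temperature=300.0):
    """Convert unbiased state populations at one site to free energies
    (kcal/mol) relative to the `reference` state.

    Uses dG_i - dG_ref = -kT * ln(p_i / p_ref).
    """
    kt = K_B * temperature
    p_ref = populations[reference]
    if p_ref <= 0:
        raise ValueError("reference population must be positive")
    result = {}
    for state, p in populations.items():
        if p <= 0:
            raise ValueError(f"population of {state} must be positive")
        result[state] = -kt * math.log(p / p_ref)
    return result


# Hypothetical populations: the site spends 80% of the time as ARG, 20% as ALA.
rel = relative_free_energies({"ARG": 0.80, "ALA": 0.20}, reference="ARG")
print(rel)
```

With these made-up populations, the less populated ALA state comes out at roughly 0.8 kcal/mol above ARG. A relative binding free energy would additionally require comparing such values between the bound and unbound states.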

In this tutorial, you will use FEP+ residue scanning to explore the effect of mutations in the HLA-A2 histocompatibility complex on the affinity of the HLA-peptide complex binding to a T-cell receptor.

2. Creating Projects and Importing Structures

At the start of the session, change the file path to your chosen Working Directory in Maestro to make file navigation easier. Each session in Maestro begins with a default Scratch Project, which is not saved. A Maestro project stores all your data and has a .prj extension. A project may contain numerous entries corresponding to imported structures, as well as the output of modeling-related tasks. Once a project is saved, it is automatically saved each time a change is made.

Structures can be built in Maestro or imported using File > Import Structures (or by drag-and-drop), and are added to the Entry List and Project Table. The Entry List is located to the left of the Workspace. The Project Table can be accessed by Ctrl+T (Cmd+T) or Window > Project Table if you would like to see an expanded view of your project data.

  1. Double-click the BioLuminate icon

All of the workflows shown here can also be done in Maestro.

Figure 2-1. Change Working Directory option.

  2. Go to File > Change Working Directory
  3. Find your directory, and click Choose
  4. Pre-generated input and results files are included for running jobs or examining output. Download the zip file here: https://www.schrodinger.com/sites/default/files/s3/release/current/Tutorials/zip/lambda_dynamics.zip
  5. After downloading the zip file, unzip the contents in your Working Directory for ease of access throughout the tutorial
  6. Go to File > Save Project As
  7. Change the File name to fep_residue_scan_tutorial and click Save
    • The project is now named fep_residue_scan_tutorial.prj

Figure 2-3. The Import panel, with desired file selected from the tutorial archive.

  8. Go to File > Import Structures
  9. Click to select 1OGA_prepared.maegz
  10. Click Open

3. Structure Preparation and Preliminary Steps

Before running the protein FEP+ residue scanning calculation, your structure needs to be fully prepared for modeling. Structure files obtained from the PDB, vendors, and other sources often lack necessary information for performing modeling-related tasks. Typically, these files are missing hydrogens, partial charges, side chains, and/or whole loop regions. Proteins in their raw state may also have incorrect bond order assignments and group orientations. To make these structures suitable for modeling tasks, they must be prepared and common structural issues resolved. The structure used in this tutorial has already been prepared with the Protein Preparation Workflow. See the Introduction to Structure Preparation and Visualization tutorial for more details on this process.

In addition, you need to identify hot-spot sites to investigate. While knowledge of the system can provide ideas for residues to investigate, this information can also be efficiently obtained from an MM-GBSA residue scan. For detailed instructions on setting up and running MM-GBSA residue scans, see the “Identifying Binding Hot Spots using Residue Scanning” section in the Peptide Modeling with BioLuminate tutorial.

The system used in this tutorial is a ternary complex of human HLA-A2 (chain A in the 1OGA structure), a peptide from the influenza matrix protein (chain C), and the complementarity-determining region of a T-cell receptor that recognizes this peptide (chains D and E). The HLA-A2 mutations you will investigate in this tutorial have been studied in Zhang, H. et al. and are included in the SKEMPI 2.0 database (see the Further Reading section at the end of this tutorial for more details). Note that while this complex is not an active target for protein design, the wealth of experimental and computational information makes it an instructive example.

4. Setting up the FEP+ Residue Scan

In this section, you will set up the residue scanning simulation using the FEP+ Protein Mutation panel in order to investigate the effect of a subset of the mutations from the Zhang, H. et al. publication on the binding affinity of the complex.

Figure 4-1. Loading the structure into the FEP+ Protein Mutation panel.

  1. Select the 1OGA_prepared entry from the Entry List.
  2. Go to Tasks > Browse All > FEP+ > Protein FEP+
  3. For Use structures from, choose Project Table.
  4. Click Load
    • The mutation table populates.

The yellow warning triangle highlights potential issues with the protein structure. Clicking it opens the Protein Reliability Report, where you can investigate the cause of the warning. In this structure, there are minor deviations from planarity for three peptide bonds, which should not impact the simulation.

Figure 4-2. Enabling lambda dynamics to perform an FEP+ residue scan.

  5. For Calculation type, choose Selectivity
  6. Enable Quick scan with λD (Beta).

Note: Choosing this option deactivates the Multi-site Mutation tab, as the residue scan implicitly calculates all selected mutants.

  7. For Binding Partners, choose Chains A and C.

Figure 4-3. Setting up mutations for a FEP+ residue scan.

Next, you need to specify the mutations for which to run the scan. FEP+ residue scanning currently supports mutations to all standard residues except proline.

  8. In the table, find the row for A:65 (ARG) and click the corresponding field in the Mutation column.
  9. Select ALA to mutate this residue to alanine.
  10. Repeat this process for the other residues identified in section 3, introducing the following mutations (by residue ID):
    • Mutate to ALA: 65, 66, 68, 72, 73, 75, 76, 146, 152, 155, 163
    • Mutate to GLY: 69, 149, 150, 158

Note: You should have 15 active mutations in total.

Figure 4-4. Changing the λD Advanced Options.

To increase sampling during the simulation, increase the simulation time to 10 ns.

  11. Click the Cog icon and choose Advanced Options.
  12. In the Advanced Options window, set the Simulation time to 10 ns.
  13. Click OK.

The default simulation time of 5 ns is the minimum to obtain reliable results. Our best practice recommendation is to run 10 ns simulations.

 

In addition to lengthening the simulation, you could improve the accuracy of the prediction by increasing the number of replicas in the Advanced Options window.

Additionally, you can increase the -adapt-interval and -adapt-iterations parameters by writing out the job files and editing the <jobname>.sh file. See the fep_residue_scanning command help page.

 

Writing out the job files is also useful if you want to efficiently start multiple FEP+ residue scans on closely related structures. The mutations are specified in the mutations.txt file, which you can copy, edit, or programmatically generate.
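As a rough illustration of programmatic generation, the sketch below builds one text line per mutation from the list used in this tutorial. The line template `{chain}:{resnum}->{target}` is a placeholder, not the documented mutations.txt syntax: write out the job files once from the panel, inspect the generated mutations.txt, and adapt the template to match it. The function and variable names are hypothetical.

```python
# Mutations used in this tutorial, keyed by target residue (HLA-A2 is chain A).
MUTATIONS = {
    "ALA": [65, 66, 68, 72, 73, 75, 76, 146, 152, 155, 163],
    "GLY": [69, 149, 150, 158],
}


def mutation_lines(mutations, chain="A", template="{chain}:{resnum}->{target}"):
    """Return one formatted line per mutation, sorted by residue number.

    `template` is a PLACEHOLDER format; replace it with the syntax found in
    a mutations.txt file written out by the panel.
    """
    pairs = sorted(
        (resnum, target)
        for target, resnums in mutations.items()
        for resnum in resnums
    )
    return [
        template.format(chain=chain, resnum=resnum, target=target)
        for resnum, target in pairs
    ]


lines = mutation_lines(MUTATIONS)
print(len(lines))        # number of mutations in the scan
print("\n".join(lines))  # text that could be saved to a file
```

The count printed first should agree with the 15 active mutations set up in the panel, which gives a quick consistency check before reusing the list on a related structure.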

Figure 4-5. Running the job.

This calculation takes a long time, so precomputed results are provided in the tutorial files.

  14. Change Job name to fep_lamD_prot_mutation_1OGA.
  15. Click Run.
    • This job requires a Linux host with a supported GPU and takes approximately 8 hours on 2 GPUs.
    • You can find the results of the calculation in the fep_lamD_prot_mutation_1OGA-out.fmp file in the tutorial zip archive.

The full output for this job resembles the output of other FEP+ calculations and is compatible with the generic result analysis functionality in the FEP+ panel. See the FEP+ Input and Output Files and Directories documentation page for more information.

5. Analyzing the Results of the Scan

In this section, we will look into the results of the FEP+ residue scan and compare them to the experimental references from Zhang, H. et al.

Figure 5-1. Importing the results of the scan into the FEP+ panel.

The FEP+ panel provides various tools to analyze the results of FEP+ calculations of any type.

  1. Go to Tasks > Browse All > FEP+ > FEP+
  2. In the FEP+ panel, click Browse.
  3. Find and choose the fep_lamD_prot_mutation_1OGA-out.fmp file in the tutorial zip archive.
  4. Click Next.
    • The FEP+ panel populates with the results of the residue scan.
    • Additionally, the structures of the receptor and wild-type HLA-A2 as well as all mutants are imported as entries.

Figure 5-2. Sorting the results by predicted ∆G.

Sorting the table by predicted ∆G allows us to identify potentially impactful mutations.

  5. Click the title of the Pred. Selectivity (∆G) column twice to sort by descending predicted ∆G.

You can now explore the results of the residue scan. The Pred. Selectivity (∆G) column lists the effect of each mutation on the free energy of binding between the HLA-peptide complex and the T-cell receptor’s complementarity-determining region.

The residues whose mutations are most impactful in disrupting the complex are LYS146, LYS68 and ARG75. As an optional step, feel free to find these residues in the workspace and investigate their role in the protein-protein-peptide interface.
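To get a feel for the magnitude of these values, a change in binding free energy can be translated into an approximate fold change in the dissociation constant through ∆∆G = RT ln(Kd,mutant / Kd,wild type). The sketch below assumes values in kcal/mol, a temperature of 298.15 K, and the convention that a positive value means weaker binding; the function name is hypothetical.

```python
import math

R = 0.0019872041  # gas constant in kcal/(mol*K)


def kd_fold_change(ddg, temperature=298.15):
    """Fold change in Kd (mutant / wild type) for a binding free energy
    change `ddg` in kcal/mol, from ddG = RT * ln(Kd_mut / Kd_wt)."""
    return math.exp(ddg / (R * temperature))


for ddg in (0.5, 1.0, 2.0):
    print(f"{ddg:.1f} kcal/mol -> {kd_fold_change(ddg):.1f}-fold change in Kd")
```

Under these assumptions, a change of 1 kcal/mol corresponds to roughly a five-fold change in Kd, which is why errors of about 1 kcal/mol are a meaningful threshold when judging predictions.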

Figure 5-3. Importing experimental affinity data for comparison to the FEP+ residue scan results.

For this set of mutations, you can now compare the results to experimental measurements.

  6. Click Affinity.
  7. Click Experimental Data.

Note: You can also switch between displaying the selectivity or stability of the complex as a property of interest here.

Figure 5-4. Configuring the affinity reference data import.

  8. Click Browse.
  9. Find and choose the experimental_reference.csv file from the tutorial zip archive.
  10. In the Select ligand title section, choose mutation.
  11. In the Select affinity property section, choose dG LD.
  12. Click OK.
    • The FEP+ panel will update to include the reference data in the Exp. Selectivity (dG) column.

You can now compare the predicted and experimental dG values for each of the mutations. The Predicted Selectivity Plot column shows the experimental value as a blue line and the prediction as a black range based on the predicted error. For more details on the functionality available for reviewing FEP+ calculations, see the FEP+ panel documentation. Note that the available features depend on the specific type of FEP+ calculation you want to review.
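Outside the panel, the reference file can also be inspected with a few lines of Python. The sketch below assumes that experimental_reference.csv is comma-delimited with a header row and contains columns named exactly mutation and dG LD, as selected in the steps above; the function name is hypothetical, and the in-memory example uses made-up labels and numbers rather than the real data.

```python
import csv
import io


def load_reference(handle, title_col="mutation", value_col="dG LD"):
    """Map each mutation label to its reference value (as a float),
    reading from an open CSV file or any file-like object."""
    reader = csv.DictReader(handle)
    return {row[title_col]: float(row[value_col]) for row in reader}


# Synthetic stand-in for the real file; labels and values are made up.
example = io.StringIO(
    "mutation,dG LD,dG FEP\n"
    "example-1,0.50,0.40\n"
    "example-2,-0.25,0.10\n"
)
reference = load_reference(example)
print(reference)
```

For the real file, open it with `open("experimental_reference.csv", newline="")` and pass the handle to the same function. Passing `value_col="dG FEP"` would read the dG FEP column instead.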

Figure 5-5. Plotting the prediction versus the experimental reference for all mutants.

To see the correlation between the prediction and experimental data at a glance as well as some additional statistics, you can generate a scatter plot.

  13. Click Plot.
    • The Correlation Plot (FEP+) panel opens.

Note: You can find a detailed explanation of the values shown in the statistics table next to the plot and the available options in the Correlation Plot (FEP+) documentation page.

The overall correlation between the predictions and the experimental reference is fair, with an RMSE of 0.75 kcal/mol for all mutant pairs. The largest outliers are just outside the 1 kcal/mol error band indicated in dark gray.
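An RMSE over mutant pairs compares differences between mutants rather than absolute values, so a constant offset between prediction and experiment does not count as an error. The sketch below shows one common definition alongside the ordinary RMSE. The exact formula used by the Correlation Plot (FEP+) panel is described in its documentation and may differ, and the input values here are synthetic.

```python
import math
from itertools import combinations


def rmse(pred, expt):
    """Root-mean-square error between matched predicted and experimental values."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def pairwise_rmse(pred, expt):
    """RMSE of the error in the difference between every pair of entries."""
    squared = [
        ((pi - pj) - (ei - ej)) ** 2
        for (pi, ei), (pj, ej) in combinations(zip(pred, expt), 2)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic values in kcal/mol, for illustration only.
predicted = [0.0, 1.0, 2.0]
experimental = [0.0, 0.5, 2.5]
print(round(rmse(predicted, experimental), 3))
print(round(pairwise_rmse(predicted, experimental), 3))
```

Because only differences enter the pairwise form, shifting every prediction by the same amount leaves it unchanged, while the ordinary RMSE would grow.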

 

In order to further refine the results, a next step would be to run full Protein FEP+ simulations for residues of interest (usually those predicted to be most impactful).

 

Optional: The predicted ∆G values from full Protein FEP+ calculations for each mutant are provided in the dG FEP column of the csv file. You can repeat steps 6-12, choosing dG FEP as the affinity property, to load the full Protein FEP+ results as a reference and compare them to the results from the FEP+ residue scan. Performing regular Protein FEP+ calculations on this same set of mutations would result in a pairwise RMSE of 0.58 kcal/mol compared to the experimental values.

6. Conclusion and References

In this tutorial, you learned how to perform an FEP+ residue scan to calculate the effect of a set of mutations on the binding affinity of the complex between human HLA-A2, a viral peptide, and a T-cell receptor, and how to analyze the results of the calculation.

For further learning:

For some related practice, proceed to explore other relevant tutorials:

For further reading:
  • For more details on the λ dynamics approach, see
  • For more details on the biological system used in this tutorial, see
    • Zhang, H. et al. The contribution of major histocompatibility complex contacts to the affinity and kinetics of T cell receptor binding. Sci. Rep. 6, 35326; https://doi.org/10.1038/srep35326 (2016).
      The 1OGA structure used for this tutorial corresponds to the JM22 receptor in this publication.
    • Jankauskaitė, J. et al. SKEMPI 2.0: an updated benchmark of changes in protein–protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics 35, 462–469; https://doi.org/10.1093/bioinformatics/bty635 (2019).
      The SKEMPI database contains, among others, the experimental references for the ∆Gs of the mutations used in this tutorial (entries for the 1OGA structure).

7. Glossary of Terms

Entry List - a simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion

Included - the entry is represented in the Workspace, the circle in the In column is blue

Project Table - displays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data

Recent actions - This is a list of your recent actions, which you can use to reopen a panel, displayed below the Browse row. (Right-click to delete.)

Scratch Project - a temporary project in which work is not saved; closing a scratch project removes all current work and begins a new scratch project

Selected - (1) the atoms are chosen in the Workspace. These atoms are referred to as "the selection" or "the atom selection". Workspace operations are performed on the selected atoms. (2) The entry is chosen in the Entry List (and Project Table) and the row for the entry is highlighted. Project operations are performed on all selected entries

Working Directory - the location where files are saved

Workspace - the 3D display area in the center of the main window, where molecular structures are displayed