Validating a Protein Free Energy Perturbation Model for Thermostability Predictions for Single Point Mutations
Tutorial Created with Software Release: 2025-2
Topics: Antibody Design, Biologics Drug Discovery, Enzyme Engineering, Free Energy Perturbation (FEP), Structure Prediction & Target Enablement
Products Used: FEP+
This tutorial is written for use with a 3-button mouse with a scroll wheel.
Words found in the Glossary of Terms are shown like this: Workspace
Abstract:
In this tutorial, you will learn how to use Free Energy Perturbation (FEP) calculations to make thermostability predictions for several single point mutations in the T4 Lysozyme protein.
This tutorial guides you along the process of screening mutations across a number of sites in order to validate the system preparation and FEP+ protocol, and focuses on the technical and troubleshooting aspects of the output analysis. The Introduction to Protein Thermostability Prediction using Protein FEP+ tutorial focuses on targeted mutation applied to an individual residue of the T4 Lysozyme to improve its thermostability.
Tutorial Content
1. Introduction to Protein Thermostability Prediction
When designing protein-based therapeutics or enzymes for industrial or consumer applications, it is crucial to understand whether an introduced mutation would adversely affect their thermostability, i.e. the ability of a protein to resist structural changes caused by heat. Experimental approaches to determine a change in thermostability due to a mutation can be expensive, slow and may sometimes not be feasible at all.
Computational or in silico mutagenesis, on the other hand, can be performed on any system, and there are different tools that can be used to predict thermostability. Most lack the ability to capture the dynamics of a protein system, as well as the effect of solvent molecules on the stability of a protein's folded state. One such method is residue scanning with Prime MM-GBSA (molecular mechanics with generalized Born and surface area), which provides qualitative results. Its high-throughput nature makes it an attractive tool for classifying a mutation as beneficial, deleterious, or unlikely to have a significant effect on thermostability, so it can be used to filter through large numbers of mutations.
Protein Free Energy Perturbation (FEP+) is a physics-based approach to the thermostability prediction task that uses explicit solvent, molecular dynamics, and a state-of-the-art force field (OPLS4) to compute both the enthalpic and entropic contributions to the free energy of a system. This gives accurate predictions, but at a higher computational cost, so it is typically applied to a selected number of mutations.
FEP+ residue scanning bridges the gap between the fast but approximate MM-GBSA and costly but accurate Protein FEP+, as it’s suited for screening larger numbers of mutations while still capturing the dynamics of the system. See the Identifying impactful mutations using FEP+ residue scanning tutorial for a guided introduction to applying this method for the prediction of binding affinities within a ternary complex.
In addition to prospectively identifying promising mutants, accurate in silico methods can help provide a rationale for the success or failure of specific mutations using computational structural analysis. In other words, they can be used to understand and explain the underlying structural mechanism, which can be difficult to achieve in experiments.
In this tutorial, you will reproduce the results from this publication, performing Protein FEP+ for a set of 15 mutants across 9 sites of T4 lysozyme, an enzyme commonly used in crystallography experiments where thermostability is desirable. You will go through the job setup and analysis of the results, with a focus on validating the predictivity of the model and investigating outliers. For an example of using a validated Protein FEP+ model for targeted mutation of an individual residue of the T4 lysozyme to improve its thermostability, see the Introduction to Protein Thermostability Prediction using Protein FEP+ tutorial.
2. Creating Projects and Importing Structures
At the start of the session, change the file path to your chosen Working Directory in Maestro to make file navigation easier. Each session in Maestro begins with a default Scratch Project, which is not saved. A Maestro project stores all your data and has a .prj extension. A project may contain numerous entries corresponding to imported structures, as well as the output of modeling-related tasks. Once a project is created, the project is automatically saved each time a change is made.
Structures can be imported from the PDB directly, or from your Working Directory using File > Import Structures, and are added to the Entry List and Project Table. The Entry List is located to the left of the Workspace. The Project Table can be accessed by Ctrl+T (Cmd+T) or Window > Project Table if you would like to see an expanded view of your project data.
- Double-click the Maestro or BioLuminate icon
- (No icon? See Starting Maestro)
- This tutorial uses Maestro, but this workflow can be performed in Maestro or BioLuminate. Use whichever interface you are comfortable with or typically use for your projects.
- Go to File > Change Working Directory
- Find your directory, and click Choose
- Pre-generated input and results files are included for running jobs or examining output. Download the zip file here: https://www.schrodinger.com/sites/default/files/s3/release/current/Tutorials/zip/protein_fep_thermostability.zip
- After downloading the zip file, unzip the contents in your Working Directory for ease of access throughout the tutorial.
- Go to File > Save Project As
- Change the File name to Protein_FEP_1L63_Thermostability, click Save
- The project is now named Protein_FEP_1L63_Thermostability.prj
3. Prerequisites for Protein FEP+ and System Preparation
Structures obtained from the PDB, vendors, and other sources often lack the necessary information for performing modeling-related tasks. Typically, these files are missing hydrogens, partial charges, side chains, and/or whole loop regions. In order to make these structures suitable for modeling tasks, we use the Protein Preparation Workflow to resolve issues.
In this tutorial, the 1L63 protein structure was retrieved from the PDB and prepared following the method described in the reference paper in order to save time. However, these preparation steps are a necessary part of the process and must be done before any FEP+ calculations. Please see the Introduction to Structure Preparation and Visualization tutorial for instructions on using the Protein Preparation Workflow and Preparing Protein and Ligand Structures for FEP+ and the Protein FEP+ Best Practices for tips on structure preparation for FEP+.
The processed, H-bond optimized, minimized “1L63_prepared.maegz” structure was downloaded with the tutorial files. In this section, we will load the pre-prepared 1L63 structure into the Workspace.
In addition to the protein structure, you should have a small set of mutants whose thermostability relative to the wild type is known experimentally. Optimally, these mutants should span a large range of ∆∆G compared to the wild type. In this tutorial, you will be using the dataset from this publication, containing 15 mutants with experimental ∆∆Gs ranging from -0.6 to 1.9 kcal/mol.
- Go to File > Import Structures
- Select the file 1L63_prepared.maegz
- Click Open
- The prepared 1L63 protein is loaded into the Workspace
4. Setting Up Protein FEP+ Stability Calculations
In this section, we will set up the FEP+ Job files by choosing which residues to mutate. During the FEP+ calculation the selected residues will be perturbed and transformed from the starting residue to the desired mutant. Figure 3-7 shows the list of residues that we will be mutating and analyzing, along with their experimental ΔΔG. This list can be found in the supplementary information section of the reference paper. For more information on setting up protein FEP+ calculations see the FEP Protein Mutation Panel documentation page and the Protein FEP+ Best Practices.
- Go to Tasks > Browse > FEP+ > Protein FEP or in the Tasks search bar type “protein FEP”
- The FEP Protein Mutation panel opens.
The FEP Protein Mutation panel supports several workflows for investigating the effect of protein mutations. In this tutorial, you will use the Stability mode to gain insights into the effect of mutations on the protein’s thermostability.
Selectivity mode is available when the structure contains multiple protein chains in order to understand the effect of mutations on the affinity between them.
The Residue scan with λD option enables the residue scanning mode, which uses additional approximations to make the calculations faster. This mode allows you to screen more potential mutants while still capturing the dynamics of the system, resulting in improved accuracy compared to MM-GBSA residue scanning.
See the Identifying impactful mutations using FEP+ residue scanning tutorial for an example of using FEP+ residue scanning to predict the effect of mutations on the selectivity of a MHC-peptide-TCR system.
First, you need to load the prepared structure.
- For Use structures from, choose Project Table (1 selected entry) and click Load
- Protein to mutate shows 1L63_prepared OK
- For Calculation type, choose Stability
The panel confirms that the structure has been properly prepared using the Protein Preparation Workflow with the green OK. If yellow exclamations appear, you can hover over them for more information.
The table in the panel lists all residues in the structure. You can now specify the mutations you want to introduce for each residue in the protein separately. By default, the panel will generate the single mutants. If you wish to investigate the combined effect of introducing several mutations, you can switch to the Multi-site Mutation tab to generate multi-mutants.
You can now introduce your first mutation: S38N.
- Scroll down through the Residue list until you reach A:38(SER)
- Click in the Mutation column to the right of A:38(SER) to edit
- A pane with mutation options is displayed
- Select ASN to indicate that we want to mutate SER38 to ASN
- ASN is now listed in the Mutation column of the table
- Click Close
Note: The check boxes for some residues are disabled because they are not valid mutations. Hover over a disabled residue to see why it’s invalid.
You can now set up the mutations investigated in the experimental reference.
- Using the list in Figure 3-7, set up all of the listed mutations using the FEP Protein Mutation panel
- Residues 59(THR) and 109(THR) require more than one mutation. For these, click to select all of the required amino acids from the list.
- Each of the selected mutations is displayed in the Mutation column of the table.
- Once all mutations are added there will be a total of 9 residues to mutate, resulting in 15 mutants.
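The bookkeeping behind these steps (sites, target residues, and the resulting number of single mutants) can be sketched in a few lines of Python. This is only an illustration: the `plan` dictionary and the `expand_single_mutants` helper are hypothetical names, the plan is deliberately partial, and the targets shown for THR 59 are placeholders rather than the actual entries from the figure. It does not read or write any FEP+ job file.

```python
from typing import Dict, List, Tuple

# Site key: (chain, residue number, wild-type residue name).
Site = Tuple[str, int, str]


def expand_single_mutants(plan: Dict[Site, List[str]]) -> List[str]:
    """Return one label per single mutant, e.g. 'SER38ASN'."""
    labels = []
    for (_chain, resnum, wild_type), targets in sorted(plan.items()):
        for target in targets:
            if target == wild_type:
                raise ValueError(f"{wild_type}{resnum}: target equals wild type")
            labels.append(f"{wild_type}{resnum}{target}")
    return labels


# Partial, illustrative plan (not the full set used in the tutorial).
plan = {
    ("A", 38, "SER"): ["ASN"],
    ("A", 46, "LEU"): ["ALA"],
    ("A", 47, "ASP"): ["ALA"],
    ("A", 144, "ASN"): ["GLU"],
    # Placeholder targets: take the real ones from the figure.
    ("A", 59, "THR"): ["ALA", "GLY"],
}

labels = expand_single_mutants(plan)
print(len(plan), "sites ->", len(labels), "single mutants")
```

Once the plan is filled in with every mutation from the figure, the same count should report 9 sites and 15 single mutants, matching the table in the panel.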
Tips and tricks for specifying mutations:
- If preferred, you can select the Pick residue box and use this option to pick residues to mutate from within the Workspace.
- You can click Select in the mutation pane to choose multiple residues by property (e.g. small or charged residues).
- You can also specify mutations by editing the mutations.txt file after making all other settings in the panel and writing out the job files. This can be useful if you’re planning to use the same set of mutations for multiple FEP+ jobs, e.g. using different input structures.
- To mutate to non-standard residues, you can enable those you are interested in from the Nonstandard Residues Panel. You can choose residues from the built-in library or create your own. They will then appear in all protein mutation menus.
Figure 4-6. Enabling non-standard residues to appear in the Protein FEP panel.
- Change the Job name to Protein_FEP_1L63
- Click on the cog next to Run or click the drop-down arrow and choose Job Settings
- The FEP Protein Mutation - Job Settings panel opens
- Choose your CPU Host and GPU Host
Note: Ensure Maximum simultaneous subjobs is set to 0. This removes the limit on the number of subjobs, so they are all submitted to the subjob host queue. If you do not have license checking enabled, set the number of subjobs to ensure that you do not exhaust your licenses.
- Click OK
- Optional: Click Run if you want to run the job yourself.
- Warning! This is a resource-intensive job that uses several dozen GPU hours.
- Optional: Click on the cog next to Run and select Write to write out the input files
- The input files are saved to your Working Directory and the .sh file can be used to run the job using the command line.
- This allows you to inspect and edit the job files (e.g. mutations.txt) before submitting the job.
Note: For more information on how to run the files you have written out see Using the Command Line with the Schrodinger Platform.
5. Analyzing Protein FEP+ Stability Results
In this section, we will analyze pre-generated protein FEP+ results to compare the estimated free energies for the perturbations with the experimental free energies. We will use the FEP+ panel to inspect the results and the errors associated with the calculated ΔΔG values to determine whether the model is predictive and suitable for prospective use. See the FEP+ Panel - FEP+ Protein Mutation Analysis Tab documentation for more information.
5.1 Comparing prediction and reference across the entire set of mutations
First, you will load the FEP+ results into the FEP+ panel and add the experimental reference values for comparison. Then, you can visualize the overall results as a scatter plot and check key quality metrics.
- To import the perturbation map, choose File and click Browse
- The Select Input File panel opens
- Select the file Protein_FEP_1L63_out.fmp
- Click Open
- The pre-generated protein FEP results are loaded into the FEP+ panel
- 16 new entries, one for the wild type and one for each mutation, are added to the Entry List
Optional: Include and step through the new structures in the Entry List to visualize each mutant in the 3D Workspace.
- In the FEP+ panel, click Next
- The FEP+ panel expands to display the results
Three tabs are displayed in the FEP+ panel. The Overview tab shows the protein mutations and the results of the FEP+ calculations. The Predicted Binding Plot column provides a graphical representation of the estimated ΔG shown as a range based on the predicted error.
Note: For more information about the results displayed in the Overview tab, see the FEP+ Panel - Overview Tab documentation.
- Click on the Pred. Stability (∆G) column header to sort the mutants by the impact they have on the thermostability of the protein.
We know the experimental ΔΔG values for our mutants and can add these to the Exp. Stability (ΔG) column.
- In the bottom toolbar, click Affinity and choose Experimental Data.
- In the Choose Affinity Property panel, click Browse and locate experimental_values.csv in the tutorial zip archive.
- For select ligand title, choose mutation
- For select affinity property, choose ddg
- Click OK
- Blue vertical lines representing the experimental ΔG values are added to the Predicted Binding Plot column for all of the mutants giving a visual indication of the reliability of the prediction.
Note: You can also add the experimental values one by one by clicking in the Exp. Stability column for each mutant.
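Before importing a reference file of your own, it can help to check that it parses cleanly and spans a useful range of values. The sketch below assumes a CSV with a `mutation` column and a `ddg` column, matching the two properties chosen in the panel. The rows in `SAMPLE_CSV` are made-up placeholders, not the contents of experimental_values.csv, and `load_experimental` is a hypothetical helper.

```python
import csv
import io

# Made-up placeholder rows; the labels and values are NOT taken from
# the tutorial's experimental_values.csv.
SAMPLE_CSV = """mutation,ddg
MUT_A,-0.6
MUT_B,0.4
MUT_C,1.9
"""


def load_experimental(handle):
    """Map each mutation label to its experimental value (kcal/mol)."""
    values = {}
    for row in csv.DictReader(handle):
        name = row["mutation"].strip()
        if name in values:
            raise ValueError(f"duplicate mutation label: {name}")
        values[name] = float(row["ddg"])  # raises on non-numeric entries
    return values


experimental = load_experimental(io.StringIO(SAMPLE_CSV))
span = max(experimental.values()) - min(experimental.values())
print(f"{len(experimental)} mutants, ddG span {span:.1f} kcal/mol")
```

To check a file on disk, pass an open file handle instead, for example `with open("experimental_values.csv", newline="") as fh: load_experimental(fh)`.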
Now, you can compare the predicted change in stability to the experimental reference for each mutant. Only a few mutations stabilize the protein, the most notable of which is ASN144GLU, where the longer and anionic glutamate seems to result in a slightly more efficient hydrogen bond network.
The two mutations predicted to be most destabilizing are ASP47ALA and LEU46ALA, and while LEU46ALA is correctly ranked as the most destabilizing mutation, the effect of the ASP47ALA mutation is overpredicted by 1.2 kcal/mol.
- Optional: Click on the Export option and choose Overview Table
- The Overview Table is saved in CSV or Excel format
Note: From this menu, you can also export an updated version of the perturbation map (.fmp) that includes the experimental ΔG values we added. In future Maestro sessions, this updated map can be loaded in place of the original perturbation map, following the import steps at the start of Section 5.1.
You can now visualize the correlation between prediction and reference over the entire data set.
- Click on Plot near the bottom of the FEP+ panel
- A new panel titled Correlation Plot (FEP+) opens displaying the predicted versus the experimental ΔG
In the plot, you can hover over the individual points to see which mutations they represent. The X-axis shows the experimental reference for the stability change, while the Y-axis shows the FEP+ predictions. The dark gray band in the plot shows the area where prediction and reference are within 1 kcal/mol of each other, while the light gray band represents an error of 1-2 kcal/mol.
On the right-hand side, you can see various statistical metrics for the data set. The root-mean-square error (RMSE) measures how closely the predictions match the experimental values. Unlike R², the RMSE is independent of the dynamic range of the dataset and hence is a more reliable metric for the goodness of the predictions. Our FEP+ best practice is to proceed with prospective FEP+ calculations for models showing errors less than ~1.3 kcal/mol in retrospective validations. For a more detailed discussion of these metrics, see the Correlation Plot (FEP+) panel documentation.
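The point about dynamic range can be illustrated with a small, self-contained calculation. The sketch below applies the same prediction errors to a narrow and a wide set of reference values. All numbers are made up for illustration, and R² is taken here as the squared Pearson correlation, which is an assumption about the definition rather than something stated in this tutorial.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between predictions and references."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson_r2(pred, ref):
    """Squared Pearson correlation coefficient."""
    n = len(ref)
    mean_p, mean_r = sum(pred) / n, sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov * cov / (var_p * var_r)


# Made-up values in kcal/mol: identical errors, different dynamic range.
errors = [0.5, -0.4, 0.3, -0.6, 0.2]
narrow_ref = [0.0, 0.2, 0.4, 0.6, 0.8]
wide_ref = [0.0, 1.0, 2.0, 3.0, 4.0]
narrow_pred = [r + e for r, e in zip(narrow_ref, errors)]
wide_pred = [r + e for r, e in zip(wide_ref, errors)]

for name, pred, ref in [("narrow", narrow_pred, narrow_ref),
                        ("wide", wide_pred, wide_ref)]:
    print(f"{name}: RMSE={rmse(pred, ref):.2f}  R2={pearson_r2(pred, ref):.2f}")

# Threshold quoted in the text for proceeding to prospective use.
model_ok = rmse(wide_pred, wide_ref) < 1.3
```

Both data sets give the same RMSE because the errors are identical, while R² is much lower for the narrow set. This is why a validation set with a small spread of experimental values can look poorly correlated even when its absolute errors are acceptable.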
This plot is also a good way to identify outliers for further investigation.
- Click on one of the outliers in the Correlation Plot
- The selected data point will be highlighted in green on the plot
- The corresponding mutation will be highlighted in the FEP+ panel Overview table
- Optional: Click Save As to save this plot as a PDF or PNG image
The worst outlier in this example is the ASP → ALA mutation at residue position 47, with a prediction error of 1.2 kcal/mol. This is nonetheless a reasonably good prediction, but you can still check the available metrics to see what might be the cause of this comparatively larger error.
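If you export the Overview Table, the same outlier search can be reproduced outside the panel. The following sketch ranks mutants by absolute error and assigns each one to the error bands used in the correlation plot. The rows in `results` are made-up placeholders (one is chosen to have an error of 1.2 kcal/mol), and the `classify` helper is a hypothetical name.

```python
def classify(error):
    """Assign a signed error (kcal/mol) to a correlation-plot band."""
    size = abs(error)
    if size <= 1.0:
        return "within 1 kcal/mol"
    if size <= 2.0:
        return "1-2 kcal/mol"
    return "beyond 2 kcal/mol"


# (label, predicted, experimental) in kcal/mol; made-up placeholder values.
results = [
    ("MUT_A", 0.3, 0.5),
    ("MUT_B", 2.6, 1.4),
    ("MUT_C", -0.2, 0.1),
]

# Largest absolute error first.
ranked = sorted(results, key=lambda row: abs(row[1] - row[2]), reverse=True)

for label, predicted, experimental in ranked:
    error = predicted - experimental
    print(f"{label}: error {error:+.1f} kcal/mol ({classify(error)})")
```

A positive error means the predicted value is larger than the experimental one, which is the sense in which a destabilizing effect is described as overpredicted.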
5.2 Analyzing an individual edge more deeply
First, have a look at the perturbation map which was generated by the Protein FEP panel based on the similarities between the mutants.
- Back in the FEP+ panel, switch to the Map tab.
In the graphic on the right-hand side of the tab, you can see how each of the mutations is connected to the wild type. When there are several mutants at the same site, additional edges are added to connect them to each other. This allows for the application of the cycle closure correction, improving the accuracy of the prediction for these mutants.
Sadly, the ASP47ALA mutant is not part of a cycle, so no correction could be applied.
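To see why a closed cycle helps, consider a wild type and two mutants at the same site, connected by three edges. The free energy changes around a closed cycle should sum to zero, so any deviation (the hysteresis) indicates sampling error that can be redistributed. The sketch below spreads the hysteresis equally over the three edges. This equal-weight scheme and the `close_cycle` helper are simplifying assumptions for illustration, not a description of the algorithm FEP+ uses, and the input values are made up.

```python
def close_cycle(wt_to_a, a_to_b, wt_to_b):
    """Equal-weight correction for the cycle WT -> A -> B versus WT -> B.

    Returns the three corrected edge values and the raw hysteresis,
    all in kcal/mol.
    """
    # For a perfectly consistent cycle this is zero.
    hysteresis = wt_to_a + a_to_b - wt_to_b
    shift = hysteresis / 3.0
    return wt_to_a - shift, a_to_b - shift, wt_to_b + shift, hysteresis


# Made-up edge values in kcal/mol.
wt_a, a_b, wt_b, hysteresis = close_cycle(1.0, 0.5, 1.2)
print(f"hysteresis {hysteresis:.2f} kcal/mol")
print(f"corrected edges: {wt_a:.2f}, {a_b:.2f}, {wt_b:.2f}")
```

After the correction the two paths from the wild type to mutant B agree exactly. A mutant connected to the wild type by a single edge has no second path to compare against, which is why no such correction is possible for it.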
- Click on the FEP+ Protein Mutation Analysis tab
- An analysis of the results of the FEP+ mutation job and information about the trajectories is displayed.
The Analysis tab is the entry point for digging deeper into the individual edges. See the FEP+ Panel - FEP+ Protein Mutation Analysis Tab documentation for details on everything you can do here.
The Energy convergence, Ligand RMSD, REST exchange and cycle closure convergence columns group these technical metrics for simulation quality into the categories Good, Fair, or Bad. You should look out for bad edges, which may be due to mutations between residues with very different properties, issues with the receptor structure near the mutated residue, or receptor reorganization in response to the mutation.
None of the edges for our mutant set seem to have any obvious problems, including the one for our ASP47ALA mutant.
Optional: To view the trajectories in the Workspace you can click on the trajectory length (default 5.0 ns) in either the Solvent or Fragment Trajectory column of the Analysis tab for the relevant mutation.
This allows you to use the trajectory analysis tools to analyze the system’s behavior throughout the simulation, e.g. for troubleshooting or understanding why a particular mutation is impactful.
See the Introduction to MD Trajectory Analysis with Desmond tutorial for an overview of the available tools.
You can find more information on a specific edge by opening the edge analysis.
- In the Edge Analysis column, click View… for the ASP47ALA mutation
- An Analysis panel for the selected mutation will open
The panel opens in the Summary tab, showing an overview of the mutation.
Figure 5-14. Comparing the interactions formed by the mutated residue and its counterpart in the wild type.
- Switch to the Residue interactions tab.
The Residue Interactions tab contrasts the interactions present in the wild type and mutant between this residue and its environment, providing insights into what is driving the changes in thermostability.
As expected, replacing the aspartate with an alanine greatly disrupts the hydrogen bond network in this part of the protein. In particular, the interactions with ASN 53, THR 54, ASN 55 and GLY 56 are completely lost.
The final three tabs (Protein Details, Convergence, and REST Sampling) provide visualizations useful for troubleshooting outliers and bad edges. See the FEP+ – Analysis Panel documentation for a detailed explanation of what these graphs can tell you.
- Switch to the Convergence tab.
In the Convergence tab, you can see how the ∆G of the system changes over the course of the fragment (top) and solvent (bottom) legs of the simulation.
The plot on the left shows how ∆G converges from the beginning to the end of the simulation. The middle plot shows the same value, but calculated as if the trajectory had happened in reverse. The right-hand plot shows the ∆G calculated in a “sliding window” approach. Ideally all three plots should be similar, indicating good sampling.
In the case of the ASP47ALA mutant, it seems like the fragment leg is not quite converged, with ∆G still significantly decreasing at the 5 ns mark where the simulation ends. Extending the simulation time beyond the default of 5 ns would likely reduce the prediction error for this (and other) mutations. This improvement would come at the cost of increased simulation time, so the decision of how to balance these factors becomes dependent on the environment and pace of your specific project.
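The logic of the three convergence plots can be mimicked with a toy time series. The sketch below uses the running mean of a synthetic per-frame signal as a stand-in for the free energy estimate. That substitution is an assumption made purely for illustration, since the estimator used by FEP+ is not described here, and all function names and numbers are hypothetical.

```python
def forward_means(series):
    """Cumulative mean using the first 1, 2, ..., N points."""
    means, total = [], 0.0
    for count, value in enumerate(series, start=1):
        total += value
        means.append(total / count)
    return means


def reverse_means(series):
    """Cumulative mean using the last 1, 2, ..., N points."""
    return forward_means(series[::-1])


def sliding_means(series, window):
    """Mean over each consecutive block of `window` points."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def half_gap(series):
    """Difference between first-half and last-half estimates."""
    half = len(series) // 2
    return abs(forward_means(series)[half - 1] - reverse_means(series)[half - 1])


# Synthetic signals: one steady, one still decreasing at the end.
flat = [1.0] * 10
drifting = [2.0 - 0.1 * i for i in range(10)]

print("flat gap:", half_gap(flat))
print("drifting gap:", round(half_gap(drifting), 2))
print("drifting sliding means:", [round(m, 2) for m in sliding_means(drifting, 5)])
```

For the steady signal the forward and reverse estimates agree, while the drifting signal shows a clear gap between them and a sliding-window estimate that keeps falling. This is the same pattern that suggests extending the simulation for the fragment leg.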
6. Conclusion and References
In this tutorial, you set up a Free Energy Perturbation (FEP) model to make thermostability predictions for the T4 Lysozyme protein. You validated the model on a dataset of 15 single point mutations across 9 residues and found that it was generally accurate enough to enable prospective use to identify promising mutations to improve the T4 Lysozyme's thermostability. You investigated the largest outlier in the data set and came to the conclusion that the model’s performance could potentially be improved by increasing the simulation time to allow for proper convergence of the free energy.
With the validated model in hand, you can now proceed with the Introduction to Protein Thermostability Prediction using Protein FEP+ tutorial, where you will mutate a single site to increase the thermostability of the protein in a targeted fashion.
For further reading:
- See the original reference paper: Improving the Accuracy of Protein Thermostability Predictions for Single Point Mutations
7. Glossary of Terms
Entry List - a simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion
included - the entry is represented in the Workspace, the circle in the In column is blue
incorporated - once a job is finished, output files from the Working Directory are added to the project and shown in the Entry List and Project Table
Project Table - displays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data
Scratch Project - a temporary project in which work is not saved, closing a scratch project removes all current work and begins a new scratch project
selected - (1) the atoms are chosen in the Workspace. These atoms are referred to as "the selection" or "the atom selection". Workspace operations are performed on the selected atoms. (2) The entry is chosen in the Entry List (and Project Table) and the row for the entry is highlighted. Project operations are performed on all selected entries
Working Directory - the location that files are saved
Workspace - the 3D display area in the center of the main window, where molecular structures are displayed