Protein pKa Prediction with Constant pH Molecular Dynamics

Tutorial Created with Software Release: 2024-4

Topics: Biologics Drug Discovery, Enzyme Engineering, Peptide Design, Structure Prediction & Target Enablement

Products Used: Desmond, FEP+, Maestro

Tutorial files

593 MB

This tutorial is written for use with a 3-button mouse with a scroll wheel.

Words found in the Glossary of Terms are shown like this: Workspacethe 3D display area in the center of the main window, where molecular structures are displayed

Tip: You can hover over a glossary term to display its definition. You can click on an image to expand it in the page.

Abstract:

In this tutorial, you will learn how to set up and run a series of MD simulations on a protein over a range of pH values using the continuous constant pH MD method in a single job to predict pKa values for titratable protein residues. You will analyze the validity of your simulations by examining convergence plots, titration curves, and trajectories, and by comparing the values obtained with experimental pKa values.

Tutorial Content

Introduction to Constant pH MD Simulations

Creating Projects and Importing Structures

Setting Up and Running the CpHMD Simulation

Analyzing the Results

Conclusion and References

Glossary of Terms

1. Introduction to Constant pH MD Simulations

Proteins contain many titratable residues whose protonation state depends on the pH of the environment and the pKa of the individual residue, which can be influenced by ligand binding or protein-protein interactions. A change in the protonation state can induce structural changes in the protein, which in turn can lead to a change in biological function or activity. The continuous constant pH molecular dynamics (CpHMD) method allows the pKa values of multiple titratable protein residues to be calculated simultaneously in a single job. This is especially important for buried residues within the protein or coupled titratable residues, which are often found as the catalytic residues of enzymes.

The method is based on a molecular dynamics (MD) and lambda dynamics (λD) approach. In addition to the standard force field description of the molecular system, the protonation state of a titratable residue is described by a continuous and dynamic variable (λ). Schrodinger's implementation is based on a multi-site lambda dynamics approach, meaning every protonation state of a titratable residue is assigned a lambda value, and the sum of them equals one. Learn more about this in J. Chem. Theory Comput. 2011, 7, 9, 2728–2739.

This results in an extended Hamiltonian that allows the simultaneous sampling of spatial (real) and titration (virtual) coordinates in an MD simulation. Several such simulations are run at different pH values, and replica exchange is periodically performed between simulations of adjacent pH values to increase the sampling. From every individual simulation, the fraction of frames in the deprotonated state relative to all frames (S) for a titratable residue is calculated and plotted against the corresponding pH values. Non-physical states that are neither fully deprotonated, nor fully protonated (0.1 < λ < 0.9) are filtered out. This results in a titration curve from which the pKa value can be estimated by fitting with the Hill equation (). A CpHMD run can simultaneously incorporate multiple titratable residues. Learn more about this in J. Chem. Theory Comput. 2016, 12, 5411–5412.

In this tutorial, you will use the CpHMD method implemented in the Schrödinger suite to calculate the pKa values of the titratable residues in the peripheral subunit-binding domain of 2-oxoglutarate dehydrogenase from Escherichia coli, commonly referred to as BBL. You will learn how to set up and run the simulation from the Maestro graphical user interface (GUI), which utilizes functionality from Schrödingers MD engine Desmond with the underlying force field OPLS4 as well as FEP+ for lambda dynamics. This tutorial will guide you through the process of setting up and running a CpHMD job. You will then learn how to analyze the results of the simulation by comparing the obtained pKa values to experimental values and checking the reliability from the corresponding titration curves, convergence plots and trajectories.

2. Creating Projects and Importing Structures

At the start of the session, change the file path to your chosen Working Directorythe location where files are saved in Maestro to make file navigation easier. Each session in Maestro begins with a default Scratch Projecta temporary project in which work is not saved, closing a scratch project removes all current work and begins a new scratch project, which is not saved. A Maestro project stores all your data and has a .prj extension. A project may contain numerous entries corresponding to imported structures, as well as the output of modeling-related tasks. Once a project is created, the project is automatically saved each time a change is made.

Structures can be imported from the PDB, or from your Working Directorythe location where files are saved using File > Import Structures, and are automatically added to the Entry Lista simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion and Project Tabledisplays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data. The Entry Lista simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion is located to the left of the Workspacethe 3D display area in the center of the main window, where molecular structures are displayed. The Project Tabledisplays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data can be accessed by Ctrl+T (Cmd+T) or Window > Project Table if you would like to see an expanded view of your project data.

Figure 2-1. Importing the input data for this tutorial.

Open Maestro and create a new project named cphmd.prj for this tutorial.
- Don’t know how? See First steps in Maestro.
Download the tutorial zip file including input files and reference outputs here: https://www.schrodinger.com/sites/default/files/s3/release/current/Tutorials/zip/membrane-bound_fep_a2a.zip
After downloading the zip file, unzip the contents in your Working Directorythe location where files are saved for ease of access throughout the tutorial.
Go to File > Import Structures.
Find and choose 1W4H_prepared.maegz from the tutorial files.

Structure files obtained from the PDB, vendors, and other sources often lack necessary information for performing modeling-related tasks. Typically, these files are missing hydrogens, have incorrect or missing bond order assignments, charge states, side chain orientation, or are missing loop regions. In order to make these structures suitable for modeling tasks, the Protein Preparation Workflow is used to resolve common structural issues.

In this tutorial, the protein has already been prepared. If you are following along with your own structure, make sure it is fully prepared before proceeding to the setup of the CpHMD simulation. The Introduction to Structure Preparation and Visualization tutorial as well as the Best Practices for Protein Preparation can guide you through the basics of the process. For more information tailored to MD applications, see the Introduction to All-Atom Molecular Dynamics Simulations with Desmond tutorial.

3. Setting Up and Running the CpHMD Simulation

In this section, you will learn how to set up and run a CpHMD job with the Schrödinger suite via the Maestro graphical user interface with the Protein Constant pH Simulation panel, starting with a prepared protein system. Although CpHMD jobs are based on MD simulations, you do not need to set up a Desmond model system yourself or worry about minimizing and equilibrating the system. These steps are performed automatically in the CpHMD workflow. The MD simulations are run in the μVT ensemble with explicit solvent and grand canonical Monte Carlo (GCMC) sampling of water molecules, which improves water sampling especially in buried protein pockets. To learn more about setting up and running simple MD simulations with Desmond, see the Introduction to All-Atom Molecular Dynamics Simulations with Desmond tutorial.

As an alternative to the GUI, CpHMD jobs can be run from the command line, which provides more options for customizing your jobs. For more information on usage and options, see the CpHMD command help.

Figure 3-1. The Protein Constant pH Simulations panel showing the detected titratable residues.

Make sure the entry 1W4H_prepared is includedthe entry is represented in the Workspace, the circle in the In column is blue and selected(1) the atoms are chosen in the Workspace. These atoms are referred to as "the selection" or "the atom selection". Workspace operations are performed on the selected atoms. (2) The entry is chosen in the Entry List (and Project Table) and the row for the entry is highlighted. Project operations are performed on all selected entries.
Go to Tasks > Browse > FEP+ > Constant pH Simulations….
- The Protein Constant pH Simulations panel opens.
For Use structures from, select Workspace (1 entry), and click Load.
- The structure is loaded and analyzed. 11 titratable residues are detected.

Figure 3-2. Selecting the titratable residues for the CpHMD run.

In the Select titratable residues for pKa calculations section, click Select.
- A panel opens.
Select all residue types and click Apply.
- All 11 titratable residues are selected.

Currently the titratable protein residues are limited to Asp, Cys, Glu, His and Lys. You can select individual residues, residue types, or all residues.

Figure 3-3. Setting the pH range in the Constant pH Advanced Options dialog box.

Click on Change, to select the pH range.
- The Constant pH Advanced Options dialog box opens.
Set the pH lower limit to 0.0 and the pH upper limit to 13.0.
- The number of replicas is updated to 27.
Set the Simulation time to 20.0 ns
Click Save.
- The used pH range is updated.

The selected pH range for the titration curve is recommended to span the expected pKa values of the selected titratable residues with a margin of 2-3 units on either side. This recommendation is displayed in the panel. For Asp/Glu/His the recommended range is 0-8, for Lys it is 7-13, and for a mixture it is 0-13.

The simulation time affects the accuracy of the results. Longer simulation times improve the accuracy but increase computation time.

Figure 3-4. The Protein Constant pH Simulations - Job Settings dialog box.

Change the Job name to constant_ph_1W4H.
Optional: Click the cog.
- The Protein Constant pH Simulation - Job Settings dialog box opens.
Optional: Select a Linux-based host with GPUs and select a total number of GPUs to use for the job.
Click OK.

The constant pH MD job is now ready to run. Because the workflow utilizes Desmond and FEP+, the job can only run on Linux-based hosts with GPUs. For this reason, and to save time, you do not need to run the actual job. Pre-generated result files for a 20ns simulation of the system can be found in the zip file you downloaded at the beginning. Feel free to run the simulation if you have access to suitable hardware.

4. Analyzing the Results

In this section, you will analyze the results by comparing the estimated pKa values from your simulation with experimental values, where possible. In addition, you will check the validity of the pKa values obtained with CpHMD by looking at the Hill coefficients and convergence plots. The former may reveal coupled residues. Finally, you will check a trajectory from your run to see the protonation states in different frames.

Please note that calculations may have been performed with an earlier version of the software and the results may not be exactly the same as those you produced in this tutorial. Especially for methods with a statistical component, such as MD simulations, an exact reproduction of a trajectory and its properties is only possible if the same random seed, software and hardware configuration are used. With sufficient sampling, all properties of the ensemble should be reproducible, while the properties of individual frames and the exact course of a trajectory may differ.

Figure 4-1. The results of the CpHMD simulation in the csv file: Inequality, pKa values and Hill coefficients. Experimental pKa values are added for comparison.

With a file explorer, navigate to your Working Directorythe location where files are saved and from there into the job directory named constant_ph_1W4H.
Open the file constant_ph_1W4H.csv.

Note: The experimental pKa values have been manually added to the file for this tutorial. They were taken from J. Mol. Biol. (2009) 387, 986–992 and J. Mol. Biol. (2010) 403, 313–327. No experimental values are available for the Lys residues.

The first and second columns contain the site numbers of the selected titratable residues and the residue names as chain_residuetype_residuenumber. The third column shows an inequality sign (< or >) if the estimated pKa value is outside the range of the simulated pH range and therefore can not be obtained reliably. The fourth column displays the estimated pKa values. If there is a sign in the inequality column, this value should be interpreted as either a lower or upper bound to the true pKa value. The last column contains the Hill coefficients (n in the Hill equation), which indicate the quality of the fit in terms of single residue titration. Values close to one imply a close to ideal fit, while larger deviations suggest that residues may be coupled and fitting of a single pKa model is insufficient.

Here, all estimated pKa values are within the simulated pH range and are predicted within 1 pH unit of the experiment, most are reproduced within half a pH unit. The Hill coefficients do not deviate significantly from one, indicating that there are no strong coupling effects between residues in this protein.

Figure 4-2. Convergence plots for site 2 (Glu 141).

More detailed results can be found in the folder constant_ph_1W4H_8-out.tgz. Extract the contents of this folder and have a look at the files within.
Have a look at the convergence plots, which can be found for every residue, e.g. constant_ph_1W4H_A_GLU141_splot.png.

The convergence plots show the fraction of frames in the deprotonated state relative to all frames (deprotonated plus protonated), denoted S, along the simulation time. A curve is plotted for each simulation at a different pH. The plots will help you estimate if your simulation time is long enough and the sampling is sufficient.

In this case, most curves approach a stable value at about 10 ns of simulation time. The only residue for which the convergence does not look excellent is ASP 162, but it is good enough to reproduce the experimental pKa value.

Figure 4-3. Titration curve for site 2 (Glu 141).

Have a look at the titration curves, which can be found for every residue, e.g. constant_ph_1W4H_site2_A_GLU41_titr.png.

The titration curves show the fraction of frames in the deprotonated state (S) for the simulations at different pH values. The pKa value is extracted from these curves by fitting the Hill equation to the data points. It is important that the simulated pH range completely covers the transition from the deprotonated to the protonated state, i.e. the titration curve should reach S values near to zero and one. Here all titration curves look reasonable in terms of fitting and pH range.

Figure 4-4. Importing the final structure of the simulation with pH 4.0.

You can load individual trajectories of simulations to troubleshoot or extract snapshots for use in other calculations. Here, you’ll check the trajectory of pH 7.5 (the cyctosolic pH of E. Coli).

Go to File > Import Structures….
- The Import dialog box opens.
Navigate to your extracted results directory and choose constant_ph_1W4H_6_pH__7.5-out.cms.
- A banner appears and a group is added to your Entry Lista simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion.

Figure 4-5. Loading the trajectory via the workflow action menu.

Load the trajectory by either double-left-clicking on the small T button in the Title column of the Entry Lista simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion, or by right-clicking on the T button to open the workflow action menu and selecting Load Trajectory.
- The Trajectory Player opens with the loaded trajectory in the Workspacethe 3D display area in the center of the main window, where molecular structures are displayed and the toolbar at the bottom.

Figure 4-6. Visualizing the protonation states along the trajectory.

Choose a visualization that is most helpful to you, e.g. by only displaying the protein as ribbons and the titratable residues with a ball-and-stick representation.
Click on λ Sites in the Trajectory Player toolbar.
- All titratable residues (lambda sites) are shown.
Select A: 142 (HIE 64%)
- The Workspacethe 3D display area in the center of the main window, where molecular structures are displayed view is fitted to the selected residue.
Go through some trajectory frames by using the arrow keys on your keyboard, by clicking the step forward button or the play button in the toolbar and observe how the protonation state of the residue changes over the course of the simulation.

The protonation states of the lambda sites/titratable residues are automatically updated and displayed in the Workspace according to the highest weight of the state (in %), which is shown in parenthesis after the residue label. You can use the trajectory to identify stable hydrogen bonding patterns or explore pH dependent structural changes.

You can use the Trajectory Player tools to measure and plot distances, fluctuations, or other properties along the course of the simulation. See the Introduction to MD Trajectory Analysis with Desmond tutorial and the Trajectory Player help for examples.

5. Conclusion and References

In this tutorial, you learned how to set up and run a CpHMD simulation from the Maestro GUI to obtain pKa values for multiple titratable protein residues in a single job. You compared the resulting pKa values from your simulation with experimental values and found a close match. In addition, you analyzed the simulation in terms of validity by reviewing the Hill coefficients, convergence plots, and by inspecting the trajectories.

Click to Expand

For further learning:

BACE1 Inhibitor Design Using Free Energy Perturbation Tutorial

Click to Expand

For further reading:

6. Glossary of Terms

Entry List - a simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion

Included - the entry is represented in the Workspace, the circle in the In column is blue

Project Table - displays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data

Recent actions - This is a list of your recent actions, which you can use to reopen a panel, displayed below the Browse row. (Right-click to delete.)

Scratch Project - a temporary project in which work is not saved, closing a scratch project removes all current work and begins a new scratch project

Selected - (1) the atoms are chosen in the Workspace. These atoms are referred to as "the selection" or "the atom selection". Workspace operations are performed on the selected atoms. (2) The entry is chosen in the Entry List (and Project Table) and the row for the entry is highlighted. Project operations are performed on all selected entries

Working Directory - the location where files are saved

Workspace - the 3D display area in the center of the main window, where molecular structures are displayed