FEP+ Best Practices

This document describes best practices derived from the deployment of FEP+ in drug discovery projects conducted by Schrödinger, and how issues with FEP+ are identified and addressed.

Before You Begin

Before engaging in a drug discovery project, in particular, one that will rely heavily on FEP+, a target analysis is performed to confirm that the target is suitable for deployment of structure-based technologies (e.g., Glide, WScore, WaterMap, FEP+). To perform this target analysis:

  1. Identify all available crystal structures of the target, and analyze crystallographic data to attempt to gauge the propensity of the binding site to reorganize upon ligand binding.

  2. Refine ligand-protein co-crystal structures using PrimeX, consult with in-house crystallographers, and make sure that the structures are complete (no missing residues or atoms) and consistent with the electron density. Model in any missing atoms or residues. All crystallographic water molecules should be retained and included in the x-ray refinement, and crystal waters not overlapping with the ligand(s) should be included in the subsequent FEP+ calculation setup.

  3. Identify at least 10 congeneric ligands with known affinity data (≥20 preferred); make sure there is a co-crystal structure with one of these ligands or a similar ligand. Note: In the case of Absolute Binding FEP+ calculations, several known active compounds should be sufficient.

  4. Confirm that the experimental methods used to measure the affinity data can be unambiguously related to the binding free energy, and that the crystal construct has one-to-one agreement with the protein construct in the assay. It is critical to thoroughly understand the details of the assay, e.g. if the protein is a monomer or dimer in the assay, etc. It is also critical to understand the pH, ionic strength, denaturing/stabilizing agents, added ions, temperature, and solvent conditions of the assay. If these deviate significantly from the conditions of the simulations or the crystallography, systematic errors may emerge.

  5. Determine the best possible binding modes for the ligand(s) that have known binding affinity using all available information, including all relevant protomers and tautomers, possible flipped states of asymmetric rings, etc.

  6. Run the Force Field Builder to parametrize any missing torsions for the ligands. See the Force Field Builder - OPLS4/OPLS5 Panel for more information.

  7. Run FEP+ on the ligand(s) and compare the predicted ∆∆G values with experimental ∆∆G values (for relative binding FEP+), or compare the predicted ∆G values with experimental ∆G values (for absolute binding FEP+).

  8. If the root-mean-square error (RMSE) is less than about 1.3 kcal/mol, and there is no evidence of convergence problems requiring further investigation, proceed with using FEP+ prospectively to prioritize ligand designs for synthesis. Evidence of convergence error(s) that require investigation may include the following:

    • Bennett errors greater than 0.3 kcal/mol.
    • Bad perturbations labeled in the FEP+ map indicating high hysteresis. Typically, high hysteresis is an indication of binding mode changes between edges.
  9. If the RMSE is greater than about 1.3 kcal/mol, or if there is other evidence of convergence problems, consider the sources of error described in the section below. If the source of error can be identified and addressed, proceed with using FEP+ prospectively to prioritize ligand designs for synthesis. Warning: If the source of the error cannot be identified and addressed, do not proceed with the prospective use of FEP+.

  10. When transitioning to prospective use of FEP+, make sure that you include appropriate selection bias corrections (link). A script to apply these corrections is available in the distribution, which you can run with the command:

    $SCHRODINGER/run -FROM scisol fep_selection_bias_correction.py

For more information before beginning, see Preparing for FEP+ Jobs.

Sources of Error

The following are potential sources of errors in the FEP+ predictions:

  1. Inadequate sampling:

    • Large Bennett error for compounds

    • Motions relevant to the binding affinity not being sampled on the time scale of the dynamics simulations. These may be as extreme as DFG-in to DFG-out, or possibly more mundane such as changes in preferred binding modes, changes in residue orientations, etc.

  2. Inadequate force-field parameters:

    • Ligand nonbonded parameters, typically partial charges

    • Ligand torsion-torsion coupling

    • Polarizability / π-cation contributions to binding

    • Inadequate protein force field quality (increasingly uncommon)

  3. Inadequate system preparation or calculation setup:

    • Uncertainty in the structure of the protein or the ligand binding mode

    • Missing residues in the binding site that cannot be reliably modeled

    • Significant and unpredictable rearrangement of the protein upon ligand binding

    • Incorrect protein preparation or protonation state assignment

    • Change in the protonation state of the protein residues or ligand upon binding

    • Uncertainty in the protonation state or tautomeric state of the ligand

    • Ligand stereochemistry uncertainties or pseudo-stereochemistry uncertainties, e.g., tertiary amines cannot invert their stereochemistries during MD, conformation of saturated rings potentially causing sampling issues, etc.

    • Sterically-confined water molecules at the ligand-protein interface, or buried within the protein, neglected during protein preparation

  4. Incorrect experimental data:

    • The protein crystal/model structure construct differs substantially from the protein used in the assay

    • Partner protein that is in the assay, but not the crystal structure (e.g., cyclin in a CDK)

    • No binding assay is available or no assay that has a readout that is expected to correlate with free energy of binding, e.g., only functional activity is measured rather than binding

    • Incorrect ligand synthesized

    • Insoluble ligand matter or other confounding variables prevented accurate measurement

    • Ligand decomposes on the time scale of the measurement

Troubleshooting

To identify which of the above problems may be the source of errors in FEP+ predictions, a hypothesis for the possible source of the error must be formulated and tested over all the available compounds. If the outliers improve without other data points appreciably worsening, then you may have some confidence that the right answers are being obtained for the right reasons. Examples of such hypothesis-driven outlier resolutions are as follows:

  1. Inadequate sampling – to resolve such cases, a hypothesis must be generated for the slow degrees of freedom in the system, and a divide-and-conquer or enhanced sampling scheme must be devised to resolve the problem.

    1. Investigate restraints being set through the absolute binding FEP+.

      • Macrocycles often do not sample adequately between the bioactive and in-solution confirmation. Extending the solvent leg may be required.

    2. An example of a divide-and-conquer solution might be used to resolve a case where the ligand R-group orientations are ambiguous. In such cases, all possible R-group orientations can be included in the FEP+ calculations, and the appropriate statistical mechanical relationships can be used to compute the binding free energies. An example of this approach being used to resolve ambiguous JNK1 R-group orientations can be found here.

    3. Examples of enhanced sampling strategies to resolve slow degrees of freedom may be found in the following:

      • Inclusion of a portion of the ligand in the REST enhanced sampling region to facilitate ring flipping here.

      • Inclusion of an intermediate compound to assign a larger than typical alchemical region to encourage ring flipping and R-group motion here.

      • Inclusion of a portion of the protein in the REST enhanced sampling region to encourage reversible sampling of the relevant protein motion here.

    4. Regions of the protein that are highly flexible or poorly resolved in the x-ray or cryEM structure can be positionally restrained.

  2. Unusually high hysteresis (in relative binding FEP+) or Bennett error may require an introduction of intermediate compounds to improve phase space overlap between the ligands in the FEP+ simulation maps, or longer than standard production sampling times. High Bennett error may also require the introduction of additional lambda windows to improve the phase space overlap of adjacent replicas.

  3. Inadequate force-field parameters – any suspected force-field problem should be addressed to Schrödinger Support or an Application Scientist for investigation.

  4. Inadequate system preparation or calculation setup – a hypothesis must be developed about the nature of the error, a fix proposed, and systematically tested across all the project compounds. Various common hypotheses are discussed below:

    • The protonation states of titratable binding site residues must be handled with care, and it may be necessary to divide subseries into those interacting with, for example, a protonated histidine versus a neutral histidine, etc.

    • The protonation and tautomeric states of ligands must be handled with care as well. Ligands with pKa values close to the pH of the experimental binding affinity assay may exist in both neutral and charged forms. Similarly, the dominant tautomeric state of the ligands might also be different in the protein binding pocket than in the bulk solution.

      • In cases using relative binding FEP+: for those ligands, we recommend modeling all the possible protonation and tautomeric states in FEP+ calculations, and then using the Groups feature in the FEP+ panel or the fep_groups.py script to rigorously take into account all the possible states.

    • Crystallographic waters that do not have severe clashes with ligands should be initially retained during FEP+ simulation. The GCMC-enhanced water sampling is used by default in FEP+ simulations. If water sampling is still suspected to be problematic, run FEP+ with and without the crystal waters to compare the results.

    • When using relative-binding FEP+: incorrect interaction mapping or incorrect atom mapping, although rare, is occasionally observed. This can sometimes be confirmed by examining the atom mapping for the edge in the graph. The Schrödinger Support or an Application Scientist should be contacted to investigate such a problem.

Warning: If none of the above reveals the source of error, do not proceed with the prospective use of FEP+ on active drug discovery projects.

Scoring Errors and Examples

To indicate how to approach FEP+ scoring errors and how such errors may need to be addressed, several cases in which scoring errors have been identified and addressed by Schrödinger are listed below. Some issues can be analyzed and addressed by users; others may require improvements in the software. We strongly encourage you to contact Schrödinger Support or an Application Scientist if you encounter scoring errors.

The first two examples involve the OPLS2 predecessor to the OPLS3 force field and how it was improved in OPLS3:

  1. A set of FEP+ scoring failures using the OPLS2 force field were strongly suspected to be due to incorrect assignment of partial charges, and inadequate description of the electrostatic potential around aromatic nitrogen atoms and aryl halides. This led to the introduction of off-center partial charges in the OPLS3 force field, which resolved a number of these issues. Solvation free energy, heat of vaporization, transfer free energy, and other condensed phase reference data were used to parametrize the off-center charges. The FEP+ scoring failures were then used as true test set data to confirm these potential energy function improvements were leading to transferable potentials likely to avoid similar problems in the future.

    • An illustrative example of this class of failure was encountered during a nitrogen walk around a bicyclic core. This nitrogen walk led to the scoring of novel heterocycles, which had not been considered in the development of the OPLS force field. Charges assigned to these unparameterized heterocycles were found to deviate strongly from those obtained from QM ESP fit charges, and the resulting FEP+ scoring was in significant disagreement with the experimental results. Introduction of a more sophisticated atom typing and partial charge assignment scheme was found to resolve this problem.

  2. A set of FEP+ scoring failures using the OPLS2 force field were strongly suspected to be due to inadequate protein force field parameters. This led to a major R&D project to significantly improve the quality of the OPLS protein force field through more extensive parametrization of the potential to reproduce quantum mechanical, peptide folding thermodynamics, and protein structure reference data. The FEP+ scoring failures suspected to be due to protein force field problems were then used as true test set data to confirm that these potential energy function improvements were leading to transferable potentials likely to avoid similar problems in the future. Examples of such cases are extensively discussed in the OPLS3 reporting manuscript.

Some other examples:

  1. A set of FEP+ scoring failures were found to contain sequential torsions in the bad-actor small molecule R-groups where quantum mechanical coupling between the sequential torsions makes the intra-ligand potential energy surface difficult to describe with the OPLS3 Fourier series approach to describing intra-ligand torsional energies.

    • An illustrative example of this mode of failure was encountered during a perturbation of a hydrogen atom to a methyl where, experimentally, the introduction of the methyl group reduced the binding affinity of the molecule by 2 log orders, but FEP+ scoring indicated the methyl to be tolerated. The error was ultimately diagnosed to be due to torsion-torsion coupling within the molecule introduced by the methyl, such that the two torsions of the molecule did not satisfactorily fit with a simple Fourier series. A more sophisticated treatment of the coupled torsions that reproduced the QM potential energy surface for the coupled torsions was required, which led to an FEP+ prediction in good agreement with the experimental data.

  2. Errors due to inadequate sampling can often be identified from large calculation hysteresis and high Bennett error of drift in the free energy time series data. Occasionally, the system can be trapped in a spurious local minimum in the 5 ns production simulation across all perturbations within a calculation graph, which may lead to convergence problems not being identified by these various convergence measures. If convergence is suspected to be problematic, non-standard REST region assignments, expanded alchemical region assignments, increased number of lambda windows, and extended simulation times can be used to determine if this is the case.

    • An illustrative example of this mode of failure was identified when an erroneous FEP+ prediction for a molecule led to the acquisition of a crystal structure showing a surprising rearrangement of a residue immediately adjacent to the perturbed ligand moiety. Retrospective testing indicated that the residue, when included in the REST region, could reorganize on the time scale of the simulation to stabilize a new interaction pattern between the ligand and receptor. This residue was included in the REST region for all subsequent FEP+ calculations run for that discovery project.

  3. A nontrivial number of FEP+ scoring errors are ultimately traced to incorrectly synthesized compounds, assay variability, compounds reacting or decomposing on the time scale of the affinity measurement, the tautomer or ionization state of the small molecule being incorrectly assigned, the crystal construct not agreeing with the assay construct, or incorrect protein preparation. Such sources of calculation error are outside of the scope of the FEP+ prediction method and require input from the discovery team to sort out.

    • An illustrative example of this mode of failure involved the introduction of a nitrile moiety in the ligand. The introduction of the moiety was predicted to maintain affinity, however the experimental testing indicated that the molecule had no measurable binding affinity. The 1D NMR spectra and mass spec analysis indicated that the correct compound was synthesized. After some discussion with the project team, a 2D NMR spectra was obtained, and the nitrile was found to have been attached in an unexpected location that could not be determined from the earlier experimental data.

For more information on troubleshooting, see Troubleshooting Common Issues and Troubleshooting FEP+ Technical Failures.

For questions and support, please contact the following team leads, or an Application Scientist:

  • Schrödinger FEP+ Development Team Lead: Lingle.Wang@schrodinger.com

  • Schrödinger Force Field Development Group Lead: Ed.Harder@schrodinger.com

Related Topics