Investigating Large Outliers in FEP Simulations

FEP simulations, if performed properly, usually give reasonably accurate free energy predictions. If the FEP predicted binding free energies deviate from the experimental results by a large amount, please check whether the following situations have occurred.

  • Crystal structure and binding affinity assay correspond to two different proteins.

    It frequently happens that the binding affinity assay was performed in one isoform and the crystal structure was in another isoform.

  • Experimental binding assay was performed in different conditions from the FEP simulation.

    Sometimes the binding affinity assay was performed for a dimer but the FEP simulation was run for a monomer.

  • Other protein partners important for the binding were not modeled in the FEP simulation.

    Sometimes the binding affinity assay was performed when the receptor is in an active/inactive state in close contact with another protein, and the other protein partners alter the receptor conformation or interact with the ligands, but the FEP simulations did not model the protein partner.

  • Uncertainty in protonation and tautomeric states for the ligands.

    Sometimes the ligands have titratable groups with pKa values close to 7, in which case both the deprotonated and protonated forms of the ligands can contribute to the binding free energy. In other cases, the ligands may have multiple tautomeric states, or the protonation state of the titratable side chains in the protein binding pocket may change with ligand perturbations. For these cases, run FEP+ with all the possible protonation and tautomeric states, and perform the pKa and tautomeric state correction using the fep_groups.py script as described in Ref. 9. Please note that Jaguar pKa calculations are required to calculate the pKa values for these compounds in solvent.

  • Water molecules in a deeply buried binding pocket are not properly sampled.

    • Water molecules enter or exit the deeply buried binding pocket slowly and this can become the rate limiting step for converging FEP simulations that do not use GCMC. GCMC automatically adjusts the number of water molecules around the ligand throughout the simulation and should produce results that are independent of the initial water structure. The GCMC sampling of water is on by default in FEP, which alleviates many prior sampling issues related to water.
    • If you suspect that water sampling is still an issue even with GCMC, try starting FEP jobs with and without crystal waters to make sure.
    • Inconsistent hydration of the same ligand across different edges is a source of hysteresis. If you find this is the case with GCMC, try extending those edges with longer FEP simulations.
  • Protein residue conformation change is not properly sampled.

    • If you know from available information that some particular protein residue should stay in different conformations for different ligands, but FEP did not sample the different residue conformations, try including that particular protein residue in the REST region.
    • The REST region can be adjusted through the ASL in the solute_tempering section of the .msj file. For example, the following ASL includes the residue with ptype LIG in the REST region.
    • solute_tempering = {
           atom = "asl: res.ptype LIG "
           temperature = {
              exchange_probability = 0.3
              generator = PvdS
           }