FEP Cycle Closure Method and Error Estimates
Robust error estimates from FEP calculations are critical for the successful application of free energy calculations in prospective drug discovery projects. In FEP+, we use the cycle closure method to estimate the errors. For details about cycle closure error estimate methods, see reference 7.
Let us start with the simplest case, a set of three ligands, L1 , L2 , and L3. The experimental binding free energy of ligand i is Fiexp. The relative binding free energy of ligand i and ligand j is Fijexp = Fiexp - Fjexp. The free energy is a thermodynamic property, therefore for our three ligands, F13exp+ F32exp+F21exp = 0.
We now perform the FEP simulations L1→ L2, L2 → L3, and L3 → L1 , and calculate the BAR free energy differences, FijBAR. If the simulations are perfectly converged and the force field is perfect, we would expect FijBAR = Fijexpand F13BAR+ F32BAR+F21BAR = 0. In practice, however, there are errors associated with the calculated free energies, so that in general, the BAR free energy differences are not equal to the experimental free energy differences, and F13BAR+ F32BAR+F21BAR = Δ ≠ 0. We call Δ the hysteresis of the free energy calculation associated with this closed thermodynamic cycle.
The errors in FEP calculations compared to experimental values can be separated into two categories: the systematic error coming from the difference between the force field used in the simulations and the real interactions, and the unbiased error coming from the unconverged calculation, either due to incomplete sampling of the phase space (the protein or the ligand or both are trapped in a local minimal in the conformation space) , or from the free energy estimator itself. We define Fij as the theoretical free energy difference between two thermodynamic states for the underlying force field from an infinitely long unbiased simulation and unbiased free energy estimator. If there is no systematic error in the force field, then Fij =Fijexp . In practical FEP calculations, the simulation is run with a finite amount of time and the sampling may have some bias, thus the calculated free energy FijBAR might deviate from its theoretical value Fij, and the deviation depends on the initial configuration of the simulation. Suppose we repeat the same FEP calculation an infinite number of times starting from different initial configurations and different random seeds for the velocities, then the calculated free energies have a distribution. We assume that the calculated free energies are Gaussian distributed with average Fij (no systematic bias) and standard deviation σij . Then the probability density that one single FEP calculation gives a value of FijBAR for this mutation is:
For a given set of theoretical free energy differences F13, F32, and F21, the overall likelihood, L, that the three FEP simulations give values of F13BAR, F32BAR, and F21BAR is:
According to the maximum likelihood method, the most likely values of F13, F32, and F21 are the set of values that maximize the above likelihood. Taking the log of the above expression, it is clear that the set of values that maximizes the likelihood is the set of values that minimize the following function:
Using Lagrange multipliers, it can be shown that the set of values that maximizes the likelihood is:
and similarly for the other two mutations. It is also straightforward to prove that the above estimators have no systematic bias, and there will be no discrepancy between the free energy predictions from different paths. We can interpret the above estimators as the weighted average from the two paths. For example, the free energy difference between ligand L1 and L2 , F21 , can be estimated from F21BAR , or from −(F13BAR+ F32BAR), and the best estimator is a weighted average of the two predictions. The smaller the standard deviation for the mutation is, the larger the weight it has in the best estimator, and vice versa.
In addition, according to the above model, the hysteresis of the cycle closure, Δ , is also Gaussian distributed with an average of 0 and a standard deviation
If the sum of the three calculated free energies deviates by more than 2s from 0, it is almost certain (P = 0.95) that the calculations are not converged and the results are not reliable; if the sum of the three calculated free energies deviates by more than s from 0, it is highly probable (P = 0.68) that the calculation is not converged and the results may not be reliable. Thus the above model provides a simple way to determine whether the predictions are reliable.
In practical FEP calculations, due to the high computational expense, a given calculation is run only once, and thus one cannot estimate the standard deviation for each prediction. In this case, the rule of thumb assumption is that the standard deviation for each calculation is the same, for example 0.8 kcal/mol for each mutation. Then if the discrepancy of the free energy results from two paths is larger than 1.4 kcal/mol, it indicates the calculation is not converged and probably not reliable. Under this assumption, the best free energy estimator has the following simple expression:
and similarly for the other two mutations. This free-energy estimate is called ddG_cycle_closure.
In addition, under the assumption that the standard deviation for each calculation is the same, the hysteresis of the cycle closure itself provides an estimate of the error associated with each free energy prediction. As discussed above, the hysteresis Δ of the cycle closure is Gaussian distributed with average of 0 and standard deviation of s = √3σ . So the probability that a given calculation generates a hysteresis with value Δ for the cycle closure is:
The value of sigma that maximizes the above probability gives the maximum likelihood estimate of the standard deviation associated with each free energy calculation:
This value is called Error_cycle_closure. Note that in the above derivation, the estimated free energies and the associated errors are the optimal estimates based on all the information gained from FEP calculations. If the underlying force field in the calculations has systematic bias, the perfectly converged FEP calculation results may still deviate from the experimentally measured values, which is something that cannot be corrected based on the above analysis. Note also that the experimental values have errors as well, and will not perfectly fulfill the cycle closure condition.
The above model can be easily generalized to more complicated cases with more members in the cycle closure and several cycles sharing the same edge.