Epik Evaluation
A ligand's ionizing sites are identified using a rule-based approach and are used to enumerate its protonation states across a range of charge levels. For each ionizing site in each protonation state, the pKa is predicted with a machine learning (ML) model: the site is evaluated against a three-layer graph convolutional neural network (GCNN) whose chemical neighborhood extends out to six bonds from the ionizing site. The ML model is a five-fold cross-validation ensemble of the top three models, which have been trained on almost 43,000 structure/pKa value pairs. Using the predicted pKa values across all enumerated states, we perform a self-consistent protonation state population calculation to obtain both the fractional population and the state penalty of each protonation state with respect to the ensemble. Finally, we output the most populated states whose population is above a user-defined threshold.
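The last two steps (populations from pKa values, then thresholding) can be illustrated with a minimal Python sketch. It is not Epik's implementation: it assumes a single chain of stepwise deprotonations rather than the full self-consistent treatment of all enumerated states, and it assumes the state penalty is -RT ln(population) at 298.15 K. The function names state_populations, state_penalty, and filter_states are hypothetical and chosen only for this illustration.

```python
import math

# Assumption: penalties are reported in kcal/mol at 298.15 K.
RT_KCAL = 0.001987204 * 298.15


def state_populations(pkas, ph):
    """Return fractional populations for a single deprotonation chain.

    pkas lists the stepwise pKa values, starting with the first
    deprotonation of the most protonated state. The result is ordered
    from the most protonated state to the least protonated state.
    """
    # log10 of each state's weight relative to the most protonated state.
    log_weights = [0.0]
    for pka in pkas:
        # Each deprotonation step scales the weight by 10**(pH - pKa).
        log_weights.append(log_weights[-1] + (ph - pka))
    # Shift by the largest value so the exponentials cannot overflow.
    top = max(log_weights)
    weights = [10.0 ** (lw - top) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def state_penalty(population):
    """Free-energy cost of a state relative to the ensemble (assumed form)."""
    return -RT_KCAL * math.log(population)


def filter_states(populations, threshold):
    """Keep (index, population, penalty) for states at or above threshold."""
    return [
        (i, p, state_penalty(p))
        for i, p in enumerate(populations)
        if p >= threshold
    ]


if __name__ == "__main__":
    # Illustrative pKa values only; they are not Epik predictions.
    pops = state_populations([4.0, 10.0], ph=7.0)
    for index, pop, penalty in filter_states(pops, threshold=0.01):
        print(f"state {index}: population={pop:.4f}, penalty={penalty:.3f} kcal/mol")
```

In this sketch a state with a population near 1 has a penalty near zero, and the penalty grows as the state becomes rarer at the chosen pH, which is why low-population states fall below the threshold and are dropped.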
Preparing Inputs for Epik