Field-Based QSAR Background

The field-based QSAR models are based on CoMFA [1] and CoMSIA [2, 3].

CoMFA field-based models are constructed by calculating the values of fields, such as the electrostatic field, on a rectangular grid that encompasses the molecules in the training set. Grid points that are closer than 2 Å to any atom in the training set are excluded. The field values at the remaining grid points are the independent variables in a partial-least-squares (PLS) fitting procedure, which produces a relationship between the field values and the activity of the training set molecules.
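The grid construction and the 2 Å exclusion described above can be sketched as follows. This is a minimal illustration, not the Phase implementation: the function names, the 1 Å grid spacing, and the 3 Å padding around the molecules are all assumptions made for the example; only the 2 Å cutoff comes from the text.

```python
import numpy as np

def build_grid(coords, spacing=1.0, padding=3.0):
    """Rectangular grid enclosing all atoms (spacing and padding are assumed values)."""
    lo = coords.min(axis=0) - padding
    hi = coords.max(axis=0) + padding
    axes = [np.arange(a, b + spacing / 2, spacing) for a, b in zip(lo, hi)]
    mesh = np.meshgrid(*axes, indexing="ij")
    # Flatten to an (n_points, 3) array of grid coordinates.
    return np.stack([m.ravel() for m in mesh], axis=1)

def exclude_close_points(grid, coords, cutoff=2.0):
    """Keep only grid points at least `cutoff` angstroms from every atom."""
    # Pairwise distances between grid points and atoms: (n_points, n_atoms).
    dist = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    return grid[dist.min(axis=1) >= cutoff]

# Toy "training set": two atoms, coordinates in angstroms.
atoms = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
grid = build_grid(atoms)
kept = exclude_close_points(grid, atoms)
```

In a real training set, `coords` would hold the atoms of all aligned molecules together, so that a single grid and a single exclusion mask are shared by every molecule.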

CoMSIA fields are also evaluated at points on a rectangular grid. Each field is calculated by summing the values of an atomic property over the atoms, with each atom's contribution weighted by a Gaussian function of the distance between the grid point and the atom. The steric field is derived from the third power of the atomic radius, the electrostatic field from the partial atomic charges, and the hydrophobic field from estimated ALOGP values. Hydrogen-bond acceptor and donor fields have a value of 1 at the projected point locations.
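The Gaussian-weighted sum can be illustrated with a short sketch. The function name and the toy coordinates, radii, and charges are hypothetical, and the attenuation factor `alpha=0.3` is an assumed value that is not given in the text; the sketch also omits any probe-atom property that a full CoMSIA implementation may include.

```python
import numpy as np

def gaussian_field(grid, coords, props, alpha=0.3):
    """Sum per-atom property values weighted by exp(-alpha * r^2).

    grid:   (n_points, 3) grid coordinates
    coords: (n_atoms, 3) atom coordinates
    props:  (n_atoms,) property value per atom
    alpha:  Gaussian attenuation factor (assumed value)
    """
    # Squared distances between every grid point and every atom.
    r2 = ((grid[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    return (np.exp(-alpha * r2) * props[None, :]).sum(axis=1)

# Toy molecule: two atoms with made-up radii and partial charges.
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
radii = np.array([1.7, 1.2])
charges = np.array([-0.4, 0.4])
grid = np.array([[0.0, 0.0, 3.0], [3.0, 0.0, 0.0]])

steric = gaussian_field(grid, coords, radii ** 3)   # third power of the radius
electrostatic = gaussian_field(grid, coords, charges)
```

Because the Gaussian weight is finite everywhere, this form has no singularity at the atom positions, unlike a Lennard-Jones or Coulomb potential.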

The field-based QSAR models are an implementation of the CoMFA and CoMSIA methods with a specific set of parameters. The Lennard-Jones steric potentials are taken from the OPLS_2005 force field, as are the atomic charges for the electrostatic fields (by default). Hydrophobic fields are based on the atom types and hydrophobic parameters from Ghose et al. [4]. Hydrogen-bond acceptor and donor fields are based on Phase pharmacophore feature definitions, with projected points. Aromatic ring fields are based on the same feature definitions, with projected points 1.8 Å above and below the ring plane. Because the models are not exactly the same as the standard CoMFA and CoMSIA models, different names are used in Phase: Force Field for CoMFA-like models, and Gaussian for CoMSIA-like models.

Before the PLS regression is performed, each field is scaled. The standard deviation (SD) of the field values is evaluated at each grid point, and the SD is then averaged over all points. Each field is scaled by the ratio of the maximum of these averages over all fields to the average for that field, so that every field ends up with the same average SD.
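The scaling step can be sketched as below. Several details are assumptions, since the text does not specify them: the SD at each point is taken across the training set molecules, the population SD (NumPy's default) is used, and "scaled by" is read as multiplication. The function name and the random toy data are hypothetical.

```python
import numpy as np

def field_scale_factors(fields):
    """Return one scale factor per field.

    fields: dict mapping field name -> array of shape (n_molecules, n_points).
    """
    # SD at each grid point across molecules, then averaged over all points.
    avg_sd = {name: values.std(axis=0).mean() for name, values in fields.items()}
    largest = max(avg_sd.values())
    # Ratio of the largest average SD to each field's own average SD.
    return {name: largest / sd for name, sd in avg_sd.items()}

# Toy data: 5 molecules, 4 grid points; the steric field varies twice as much.
rng = np.random.default_rng(0)
electrostatic = rng.normal(size=(5, 4))
fields = {"electrostatic": electrostatic, "steric": 2.0 * electrostatic}

factors = field_scale_factors(fields)
scaled = {name: factors[name] * values for name, values in fields.items()}
```

In this toy case the steric field keeps a factor of 1 and the electrostatic field is scaled up by a factor of 2, so that neither field dominates the regression purely because of its numerical range. A field whose values never vary would have an average SD of zero and would need separate handling, which this sketch does not attempt.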