Build QSAR Model - Options Dialog Box

The Build QSAR Model - Options dialog box provides controls for the randomization of the training set and for the definition of the QSAR model.

Build QSAR Model - Options Dialog Box Features

The dialog box is divided into two sections: Training set and QSAR model.

Random seed text box

Non-negative integer that provides a seed for the randomization of the training set. If you specify zero, no fixed seed is used, so a different random assignment is made each time. If you specify a positive integer, that seed is used each time a randomization occurs, so the same assignment is reproduced.
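
The seed behavior described above can be illustrated with a minimal Python sketch. It is not the program's actual implementation; the function names make_rng and pick_training_set are hypothetical, and the standard library random module stands in for whatever generator the software uses.

```python
import random

def make_rng(seed):
    """Return a random generator for training-set selection (illustrative only).

    seed == 0 -> unseeded generator, so each run gives a different selection
    seed  > 0 -> seeded generator, so each run gives the same selection
    """
    if seed < 0:
        raise ValueError("seed must be a non-negative integer")
    if seed == 0:
        return random.Random()      # seeded from system entropy
    return random.Random(seed)      # reproducible sequence

def pick_training_set(ligands, n_train, seed):
    """Choose n_train ligands at random, using the seed convention above."""
    return make_rng(seed).sample(ligands, n_train)
```

With a positive seed, two calls such as pick_training_set(ligands, 5, 42) return the same five ligands; with a seed of zero they will usually differ.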

Keep actives and inactives in training set option

When selecting a random set of ligands for the training set, keep the ligands designated as actives and inactives for pharmacophore model development (the pharm set) in the training set, and select randomly from the rest of the ligands.
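
As a rough sketch of this option, the pharm set can be placed in the training set first and the remaining slots filled at random. The function name and the handling of a pharm set that is larger than the requested training set are assumptions made for illustration, not documented behavior.

```python
import random

def select_keeping_pharm_set(ligands, pharm_set, n_train, seed=0):
    """Keep every pharm-set ligand, then fill the training set at random.

    Assumption: if the pharm set alone has n_train or more ligands,
    no further ligands are added.
    """
    rng = random.Random(seed) if seed > 0 else random.Random()
    kept = [lig for lig in ligands if lig in pharm_set]
    rest = [lig for lig in ligands if lig not in pharm_set]
    n_extra = max(0, n_train - len(kept))
    return kept + rng.sample(rest, min(n_extra, len(rest)))
```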

Sample uniformly over activity coordinate option

When selecting ligands for the training set, ensure that the distribution of activities in the training set is close to uniform. This is done by sorting the ligands into bins by activity, with the number of bins equal to the number of ligands needed for the training set. One ligand is then chosen at random from each bin.
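
The binning procedure can be sketched as follows. The text does not say how the bin boundaries are chosen; this sketch assumes bins of equal width across the activity range and skips any bin that turns out to be empty. The function name is hypothetical.

```python
import random

def sample_uniform_over_activity(activities, n_train, seed=0):
    """Pick at most one ligand per activity bin.

    activities: dict mapping ligand name -> activity value
    n_train:    number of ligands wanted, which is also the number of bins
    Assumptions: equal-width bins; empty bins contribute no ligand.
    """
    rng = random.Random(seed) if seed > 0 else random.Random()
    lo, hi = min(activities.values()), max(activities.values())
    width = (hi - lo) / n_train or 1.0   # avoid zero width if all activities are equal
    bins = [[] for _ in range(n_train)]
    for name, act in sorted(activities.items(), key=lambda kv: kv[1]):
        idx = min(int((act - lo) / width), n_train - 1)  # top edge goes in the last bin
        bins[idx].append(name)
    return [rng.choice(b) for b in bins if b]
```

Because each bin contributes one ligand regardless of how many ligands it holds, densely populated activity ranges are not over-represented in the training set.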

Grid spacing text box

Length of the sides of the cubic volume elements, which are arranged in a 3D grid covering the space occupied by the ligands. Valid range is 0.5 Å to 2.0 Å.
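
To give a feel for how the grid spacing affects the number of volume elements, the sketch below counts the cubes needed along each axis. It assumes that the grid covers the axis-aligned bounding box of the ligand atom coordinates, which the text does not state; the function name is hypothetical.

```python
import math

def grid_dimensions(coords, spacing):
    """Number of cubes along x, y, z needed to cover the given coordinates.

    coords:  iterable of (x, y, z) positions in angstroms
    spacing: cube side length in angstroms (valid range 0.5 to 2.0)
    Assumption: the grid spans the bounding box of the coordinates.
    """
    if not 0.5 <= spacing <= 2.0:
        raise ValueError("grid spacing must be between 0.5 and 2.0 angstroms")
    coords = list(coords)
    dims = []
    for axis in range(3):
        lo = min(c[axis] for c in coords)
        hi = max(c[axis] for c in coords)
        dims.append(max(1, math.ceil((hi - lo) / spacing)))
    return tuple(dims)
```

Halving the spacing doubles the count along each axis, so the total number of cubes, and with it the number of potential variables in the model, grows by roughly a factor of eight.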

Maximum PLS Factors text box

Maximum number of partial least squares factors in the regression model. You can choose any value, but because of the risk of overfitting, you should generally not set this value to more than N/5, where N is the number of ligands in the training set. To avoid overfitting, use a model whose standard deviation of regression is larger than the experimental uncertainty in the activity values.
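
The two guidelines above can be expressed as a small sketch. Both function names are hypothetical, and the standard deviations would come from the regression statistics reported for each number of factors.

```python
def suggested_factor_cap(n_training_ligands):
    """Upper limit suggested by the N/5 guideline (never less than 1)."""
    return max(1, n_training_ligands // 5)

def factors_not_overfit(sd_by_factor_count, experimental_uncertainty):
    """Factor counts whose regression SD is still larger than the
    experimental uncertainty in the activity values.

    sd_by_factor_count: dict mapping number of PLS factors -> SD of regression
    """
    return [k for k, sd in sorted(sd_by_factor_count.items())
            if sd > experimental_uncertainty]
```

For example, a training set of 30 ligands gives a suggested cap of 6 factors. With made-up standard deviations of 0.82, 0.55, 0.34, and 0.21 for one to four factors and an experimental uncertainty of 0.3, only the models with one to three factors pass the second check.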

Eliminate variables with |t-value| < option and text box

Select this option to use a t-value filter to eliminate independent variables (i.e. bits) whose regression coefficients are overly sensitive to small changes in the training set composition, and enter the threshold for eliminating variables in the text box. The resulting models have fewer uninformative variables and tend to give better predictions on test set compounds.
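
The idea behind the filter can be sketched as follows. The text does not say how the t-values are computed; this sketch assumes that each variable's coefficient is estimated several times from slightly different training sets and that the t-value is the mean coefficient divided by its standard error. The function names and the input layout are hypothetical.

```python
import math
import statistics

def t_values(coef_samples):
    """t-value per variable from repeated coefficient estimates.

    coef_samples: dict mapping variable name -> list of coefficient estimates
                  obtained from perturbed training sets (at least two each)
    Assumption: t = mean / (stdev / sqrt(n)).
    """
    result = {}
    for var, samples in coef_samples.items():
        mean = statistics.mean(samples)
        se = statistics.stdev(samples) / math.sqrt(len(samples))
        if se == 0:
            result[var] = 0.0 if mean == 0 else math.copysign(math.inf, mean)
        else:
            result[var] = mean / se
    return result

def filter_variables(coef_samples, threshold):
    """Keep only variables whose |t-value| is at least the threshold."""
    return [var for var, t in t_values(coef_samples).items()
            if abs(t) >= threshold]
```

A variable whose coefficient changes sign or magnitude from one training set to the next has a small |t-value| and is eliminated, while a variable with a stable coefficient is kept.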

Model type controls

Options for the type of model to use: atom-based or pharmacophore-based.

In the atom-based model, atoms are represented by spheres whose radius is the van der Waals radius of the atom. The atoms are classified into six types:

  • D -- Hydrogen-bond donor
  • H -- Hydrophobic or nonpolar
  • N -- Negative ionic
  • P -- Positive ionic
  • W -- Electron-withdrawing (includes hydrogen-bond acceptors)
  • X -- Miscellaneous (all other types)

In the pharmacophore-based model, the pharmacophore features are represented by spheres whose radii are given in the Tolerance column of the Feature radii table. The pharmacophore-based model ignores parts of the molecule that do not match the features of the hypothesis.
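
The following sketch shows one way that spheres on a cubic grid could be turned into the bits used as independent variables. It is an illustration under stated assumptions, not the program's algorithm: a cube is counted as occupied when its center lies inside the sphere, the grid origin is at (0, 0, 0), and each bit is labeled by type code and cube index. For the atom-based model the radius would be the van der Waals radius and the type one of the six codes listed above; for the pharmacophore-based model the radius would come from the Tolerance column. All names are hypothetical.

```python
import math
from itertools import product

ATOM_TYPES = {"D", "H", "N", "P", "W", "X"}  # the six atom classes listed above

def occupied_cubes(center, radius, spacing):
    """Indices (i, j, k) of grid cubes whose centers lie inside the sphere.

    Assumption: cube (i, j, k) spans [i*spacing, (i+1)*spacing) on each axis.
    """
    lo = [math.floor((c - radius) / spacing) for c in center]
    hi = [math.floor((c + radius) / spacing) for c in center]
    cubes = set()
    for idx in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
        cube_center = [(i + 0.5) * spacing for i in idx]
        if math.dist(cube_center, center) <= radius:
            cubes.add(idx)
    return cubes

def occupancy_bits(spheres, spacing):
    """Set of (type_code, i, j, k) bits for a list of typed spheres.

    spheres: iterable of (type_code, (x, y, z), radius)
    """
    bits = set()
    for type_code, center, radius in spheres:
        if type_code not in ATOM_TYPES:
            raise ValueError("unknown atom type: %r" % (type_code,))
        for idx in occupied_cubes(center, radius, spacing):
            bits.add((type_code,) + idx)
    return bits
```

Because the type code is part of each bit, two spheres of different types that occupy the same cube produce two distinct variables.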