Field-Based QSAR Panel

In this panel you can set up a field-based QSAR model (CoMFA/CoMSIA) from a set of aligned ligands, and use the model to predict activities for other molecules. You can also visualize the QSAR model in the Workspace and create a scatter plot of the activities.

For background information, see Field-Based QSAR Background.

To open this panel: click the Tasks button and browse to Lead Optimization → 3D Field-Based.


Using the Field-Based QSAR Panel

To generate a QSAR model, you must first ensure that the ligands are prepared and aligned. See Preparing the Ligands for Field-Based QSAR for more information.

The basic steps in building and using a model are:

Add ligands for training and testing the model.
Partition the ligands into training and test sets, either on the basis of a property, or at random, or by manual assignment.
Build and test the model.
Examine the model.
- Examine the QSAR statistics, which are described in Table 1.
- Visualize the QSAR model in the Workspace. You can view a representation of the fields as contours (surfaces), or as color intensities of the fields on the grid.
Apply the model to other ligands.

Toolbar

The toolbar has two buttons, for controlling what is displayed in the Workspace for the model. The choice of fields and parameters can be made in the Field-Based QSAR Visualization Settings Panel, which you open by clicking QSAR Visualization. The buttons are not available until a model has been built.

	View Contours Display surfaces in the Workspace for each field that are contoured at a particular value ("isovalue") of the field. By default, the first field in the list in the Field-Based QSAR Visualization Settings Panel is displayed.
	View Intensities View the field intensities at each grid point as colored spheres. By default, the first field in the list in the Field-Based QSAR Visualization Settings Panel is displayed.

Add ligands buttons

These buttons allow you to add ligands to the ligands table. You can use these buttons more than once to add multiple sets of ligands. The ligands you add are always appended to the ligands table: existing ligands are never replaced, and no check for duplicates is made.

From Project—Opens the Add From Project Dialog Box, in which you can choose a set of entries, select an activity property, and assign training and test set memberships based on a property value.
From File—Opens a file selector, in which you can navigate to and select the file. When you click OK, the Choose Activity Property Dialog Box opens, in which you can select an activity property, converting it into the appropriate units if need be, and select a QSAR set property, to define the members of the training set and the test set by a value of this property.

The ligands you add must be fully prepared 3D structures that are properly aligned. No facility is provided in this panel for preparing the structures or aligning the ligands.

If you want to assign ligands to test and training sets based on a property value, you will have to create an appropriate property beforehand.

Delete and Delete All buttons

These buttons allow you to delete ligands from the ligands table.

Delete—delete the selected ligands from the table. This allows you to change a model by removing ligands or replacing ligands, for example, or to delete duplicate ligands.
Delete All—remove all ligands from the table. This is useful if you want to create a model with a different set of ligands.

Ligands table

This table contains the list of ligands. When the ligands are first read, all ligands are included in the training set, and the # Factors, Predicted Activity and Prediction Error columns are empty. These columns are filled in after the QSAR model is built. The table columns are described below.

Most of the columns of this table are noneditable. You can change the activity values, select the training and test sets, and display the ligands in the Workspace. You can sort the table by the values in a column, by clicking the column heading. Use shift-click and control-click to select multiple rows.

In	Inclusion status of the ligand. The diamond has a cross in it if the ligand is included in the Workspace, and is empty if the ligand is excluded. You can include and exclude ligands with click, shift-click and control-click.
Ligand Name	The name of the ligand.
QSAR Set	Indicates whether a ligand is in the training set, the test set, or neither (the ligand is ignored). The column is blank if the ligand is ignored. Click the column repeatedly to cycle the ligand through the three possible states. Control-click to cycle the selected ligands through the three states. The state for the selected ligands is set to the state for the row that is clicked.
Activity	The ligand's activity. You can alter the activity values by directly editing the table cells.
# Factors	Number of factors in the partial least squares regression model.
Predicted Activity	Activity predicted by the QSAR model. The number of rows in each cell is equal to the maximum number of PLS factors specified in the Build QSAR Model - Options Dialog Box. Each row contains the prediction from a model containing the number of PLS factors indicated in the # Factors column.
Prediction Error	Error in the activity predicted by the QSAR model.
% Extrapolated	Percentage of field values for the ligand that lie outside the range found in the training set.
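
As a rough illustration of the % Extrapolated column, the sketch below counts the grid points at which a ligand's field value falls outside the minimum-to-maximum range seen across the training set. The function name, the flat-list data layout, and the per-grid-point definition of the range are assumptions made for illustration; the panel's exact calculation is not described here.

```python
def percent_extrapolated(ligand_fields, training_fields):
    """Percentage of a ligand's field values outside the training-set range.

    ligand_fields:   list of field values, one per grid point (hypothetical layout)
    training_fields: list of such lists, one per training-set ligand
    """
    n_points = len(ligand_fields)
    outside = 0
    for j in range(n_points):
        # Range of this grid point's field value over the training set
        column = [fields[j] for fields in training_fields]
        lowest, highest = min(column), max(column)
        if ligand_fields[j] < lowest or ligand_fields[j] > highest:
            outside += 1
    return 100.0 * outside / n_points
```

A high percentage suggests that the prediction for that ligand relies on field values the model never saw during training, so the predicted activity should be treated with more caution.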

Random Training Set Controls

These controls allow you to randomly select the training set from the current training and test sets. The remaining ligands are assigned to the test set. The assignment overwrites the previous assignment of training and test sets. Ligands that are not assigned to either of these sets are not included in the pool for selection. Thus, the result is a reassignment of the training and test sets without adding or removing any ligands.

Random training set text box: Specify the percentage of ligands to include in the training set by random selection from the current training and test set ligands.
Apply button: Click to apply a random selection of the training set from the current training and test sets. The ligands that are not selected from this pool are assigned to the test set.
Random seed text box: Enter the seed for the random selection of the training set in this text box. A zero value means that a different seed will be selected each time, and hence a different training set. A nonzero value means that the same seed is used each time, which produces the same training set.
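
The behavior of these controls can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the panel's implementation: the function name, the use of None for ignored ligands, and the rounding of the training set size are assumptions.

```python
import random

def random_partition(qsar_sets, percent_training, seed=0):
    """Randomly reassign the 'training' and 'test' labels.

    qsar_sets: list of 'training', 'test', or None (ignored ligand).
    Ignored ligands stay ignored; only the current training and
    test ligands form the pool for selection.
    """
    pool = [i for i, s in enumerate(qsar_sets) if s in ("training", "test")]
    # Seed 0: a fresh seed each time, so a different training set.
    # Nonzero seed: the same selection every time.
    rng = random.Random() if seed == 0 else random.Random(seed)
    # Rounding rule is an assumption; the panel may round differently.
    n_train = round(len(pool) * percent_training / 100.0)
    chosen = set(rng.sample(pool, n_train))
    result = list(qsar_sets)
    for i in pool:
        result[i] = "training" if i in chosen else "test"
    return result
```

Calling the function twice with the same nonzero seed returns the same assignment, which mirrors the reproducibility described for the Random seed text box.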

Model buttons

These buttons allow you to perform different actions on the QSAR model.

Build: Build the model. Opens the Build Field-Based Model Dialog Box, in which you can specify parameters for building the model, and then build the model.
Import: Import an existing model. The model includes the ligands, the QSAR training and test set membership, and the regression information. Opens a file selector, in which you can navigate to and select the desired .qsar file.
Test: Generate predicted activities for the test set, after building the model. If you have ligands that you did not include in the test set, you can include them and click Test to recalculate the predicted activity and update the QSAR statistics for the test set.

QSAR statistics table

The QSAR Results table shows the statistics of the fit for the training set and the test set. Each row presents the results for a hypothesis. Within each row are lines for regression models with a particular number of partial least squares factors included. The columns are described in the table below.

The most important statistics are the test set statistics: RMSE, Q^2, and Pearson-r, which indicate how good the predictions are. If the predictions are not improving (much) as the number of PLS factors increases, the extra factors are not adding to the model and the model is probably over-fit. Of the training set statistics, the Stability is an indicator of the sensitivity of the model to omissions from the training set. When the R² value is larger than the stability value, this is an indication that the data set is over-fit.
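
The two warning signs above can be checked mechanically once the statistics are copied out of the table. The sketch below is a hypothetical helper, not part of the panel: the dictionary keys and the 0.02 threshold for "not improving (much)" are assumptions chosen for illustration.

```python
def flag_overfit(rows, min_q2_gain=0.02):
    """Flag rows of the statistics table that show signs of over-fitting.

    rows: list of dicts with keys 'Factors', 'R2', 'Stability', 'Q2',
          ordered by increasing number of PLS factors.
    Returns a list of (number of factors, list of reasons) tuples.
    """
    flags = []
    previous_q2 = None
    for row in rows:
        reasons = []
        # R2 larger than Stability indicates an over-fit data set
        if row["R2"] > row["Stability"]:
            reasons.append("R2 exceeds Stability")
        # Extra factor did not improve test set predictions (assumed threshold)
        if previous_q2 is not None and row["Q2"] - previous_q2 < min_q2_gain:
            reasons.append("Q2 gain below threshold")
        flags.append((row["Factors"], reasons))
        previous_q2 = row["Q2"]
    return flags
```

A reasonable choice is often the largest number of factors with an empty list of reasons, but this is a heuristic and should be weighed against the other statistics.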

Table 1. Description of the QSAR statistics table columns
Column	Description
# Factors	Number of factors in the partial least squares regression model.
SD	Standard deviation of the regression. This is the RMS error in the fitted activity values, distributed over n−m−1 degrees of freedom (n ligands, m PLS factors).
R^2	Value of R² for the regression (the coefficient of determination). A value of 0.80, for example, means that the model accounts for 80% of the variance in the observed activity data. R² is always between 0 and 1.
R^2 CV	Cross-validated R² value, computed from predictions obtained by a leave-N-out approach. The value of N is specified in the Build Field-Based Model Dialog Box.
R^2 Scramble	Average value of R² from a series of models built using scrambled activities. Measures the degree to which the molecular fields can fit random data. A low value means that the model cannot fit random data, but a high value merely means that the variable set is fairly complete and can fit anything.
Stability	Stability of the model predictions to changes in the training set composition. Maximum value is 1. A high value indicates a model that is not sensitive to omissions from the training set. A stability value that is lower than the R² value is an indication of over-fitting.
F	The ratio of the model variance to the observed activity variance. The model variance is distributed over m degrees of freedom and the activity variance is distributed over n−m−1 degrees of freedom (n ligands, m PLS factors). Large values of F indicate a more statistically significant regression.
P	The significance level of F when treated as a ratio of Chi-squared distributions. Smaller values indicate a greater degree of confidence. A P value of 0.05 means F is significant at the 95% level.
RMSE	Root-mean-square error in the test set predictions.
Q^2	Value of Q² for the predicted activities. Directly analogous to R², but based on the test set predictions. Q² can take on negative values if the variance in the errors is larger than the variance in the observed activity values.
Pearson-r	Pearson r value for the correlation between the predicted and observed activity for the test set.

Field fractions table

Displays the fraction of each field in the QSAR model for each number of PLS factors used in the model. This information can give you a general idea of the overall relative impact of each field type on activity. For example, if steric and hydrophobic Gaussian field fractions are much larger than the other types (as is often the case), that suggests that most of the binding energy is coming from hydrophobic interactions.

Action buttons

The following buttons can be used to perform actions once a QSAR model is available.

Export: Export the QSAR model to files. The model data is written to the named file, with a .qsar extension. The ligands are written to a Maestro file, with the same base name and a _qsar_pred.mae extension. The QSAR Set property is included in the ligand file, so you have a record of which ligands were used for training and test sets.
QSAR Visualization: Opens the Field-Based QSAR Visualization Settings Panel, in which you can make settings for the visualization of the QSAR model in the Workspace.
Predict: Predict the activity for one or more molecules. These molecules must exist as entries in the Project Table. Opens an entry chooser, in which you can choose the entries. The predicted activities are added as properties to the project entries.
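
If you script around exported models, the file naming used by Export can be reproduced as in the sketch below. The helper name is hypothetical, and the handling of a name that already ends in .qsar is an assumption; only the .qsar and _qsar_pred.mae naming comes from the description above.

```python
from pathlib import Path

def export_paths(model_name):
    """Return the (model file, ligand file) pair for an exported model."""
    path = Path(model_name)
    # Strip an explicit .qsar extension so both files share one base name
    base = path.with_suffix("") if path.suffix == ".qsar" else path
    model_file = base.with_name(base.name + ".qsar")
    ligand_file = base.with_name(base.name + "_qsar_pred.mae")
    return model_file, ligand_file
```

Keeping the two files together matters because the ligand file carries the QSAR Set property that records the training and test set membership.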