DeepAutoQSAR Panel

Use deep learning methods on large structure sets to generate and apply a QSAR model.

To open this panel: click the Tasks button and browse to Discovery Informatics and QSAR → DeepAutoQSAR.

To write out the input file and a script for running the job from the command line, click the arrow next to the Settings button and choose Write. For information on command usage and options, see deepautoqsar Command Help.

Overview of DeepAutoQSAR

DeepAutoQSAR methods are designed to efficiently handle data sets of tens or hundreds of thousands of structures, whereas the traditional methods in AutoQSAR are useful up to a few thousand structures in the training set. DeepAutoQSAR gives similar results to AutoQSAR for data sets of the same size, but is superior for larger data sets. For more information on DeepAutoQSAR, see DeepAutoQSAR.

Warning on Molecular Standardization:

When running DeepAutoQSAR, molecular standardization is not applied by default. Molecular standardization may modify the representations of compounds in a number of ways, including:

  • Removal of explicit hydrogens

  • Removal of disconnected molecular components

  • Modification of charges

  • Modification of stereochemistry

Because DeepAutoQSAR skips these steps by default, you must consider the implications of file conversions (e.g., MAE/MAEGZ/SDF ←→ CSV) when training models and making predictions. For example, SMILES in CSV data may not contain explicit hydrogens, whereas MAE(GZ) output from Maestro typically does. In these cases, DeepAutoQSAR can return different model predictions for the same starting ligands because the molecules are represented differently in these file types.
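The effect of a representation mismatch can be illustrated with a toy sketch. It does not use any DeepAutoQSAR or Maestro API: the atom lists, the featurizer, and the standardization function below are all hypothetical stand-ins, chosen only to show why the same ligand can look different to a model depending on whether hydrogens are explicit.

```python
from collections import Counter

# Hypothetical toy records for the same molecule (ethanol) from two sources.
# Atoms are plain element symbols; this is not a real file parser.
ethanol_from_csv_smiles = ["C", "C", "O"]        # hydrogens implicit, as in "CCO"
ethanol_from_mae = ["C", "C", "O"] + ["H"] * 6   # hydrogens explicit


def atom_count_features(atoms):
    """Toy featurizer: count atoms per element exactly as given in the record."""
    return dict(Counter(atoms))


def strip_explicit_hydrogens(atoms):
    """One possible standardization step: drop explicit hydrogen atoms."""
    return [atom for atom in atoms if atom != "H"]


# Without standardization, the two records produce different features.
raw_match = (
    atom_count_features(ethanol_from_csv_smiles)
    == atom_count_features(ethanol_from_mae)
)

# After the same standardization step is applied to both, the features agree.
std_match = (
    atom_count_features(strip_explicit_hydrogens(ethanol_from_csv_smiles))
    == atom_count_features(strip_explicit_hydrogens(ethanol_from_mae))
)

print(raw_match, std_match)
```

The practical point is to use the same representation (same file type and same preprocessing) for the structures used in training and for those used in prediction.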

DeepAutoQSAR Panel Features

Choose task options

Choose the task to perform. The remaining controls depend on the choice of task.

Options section

In this section you set options for model building, when the task is set to Build model.

Model type options

Choose the type of model.

  • Classification—use for categorical models where the data takes discrete values.
  • Regression—use for regression models based on numeric values, whether discrete or continuous.

Use structures from option menu

Choose the structure source for building the model.

  • Project Table (n selected entries)—Use the entries that are currently selected in the Project Table or Entry List. The number of entries selected is shown on the menu item. An icon is displayed to the right which you can click to open the Project Table and select entries.
  • File—Use the specified file. When this option is selected, the File name text box and Browse button are displayed. The allowed file types are: Maestro and SD.

Open Project Table button

Open the Project Table panel, so you can select the entries for the structure source.

File name text box and Browse button

Enter the file name in this text box, or click Browse and navigate to the file. The name of the file you selected is displayed in the text box.

Prediction property option menu

Choose the property to be predicted by the QSAR model. The menu is populated with properties from the Project Table or the file, depending on the structure source.

Add Descriptors button

Add descriptors to the autogenerated set. Opens a dialog box in which you can choose Maestro properties to be used as descriptors. The structures used to build the model must contain these properties.

Training set option menu and text box

Choose the method for splitting the structure set into training and test sets.

  • Random split—split the set randomly according to the percentage specified in the text box.

  • Scaffold split—split the data by scaffold. This split method divides molecules into training and test splits by chemical similarity. A structure-based distance matrix is calculated between molecules in the data set, and used to create clusters of similar molecules, from which the training and test sets are assembled. This is a more rigorous test of model generalization across chemical space than the other methods.

    This option can take large amounts of memory.

  • Custom split—split the set by the values of a selected property. The Split on property and Split threshold option menus are displayed, so that you can choose the property, the relation to use for the split, and the threshold value.
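The random and similarity-based splits can be sketched as follows. This is an illustration only, not the DeepAutoQSAR implementation: the function names, the seed parameter, the cutoff parameter, and the greedy "leader" clustering are assumptions made for the example, since the clustering algorithm used by the panel is not described here.

```python
import random


def random_split(items, train_percent, seed=0):
    """Shuffle a copy of the items and cut at the requested training percentage."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]


def cluster_split(distance, train_percent, cutoff):
    """Split item indices so that similar items stay in the same set.

    distance is a square matrix of pairwise distances. An item joins the
    first cluster whose first member lies within cutoff; otherwise it starts
    a new cluster. Whole clusters are then assigned to the training set,
    largest first, as long as they fit within the target size.
    """
    n = len(distance)
    clusters = []
    for i in range(n):
        for cluster in clusters:
            if distance[cluster[0]][i] <= cutoff:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    clusters.sort(key=len, reverse=True)
    target = round(n * train_percent / 100)
    train, test = [], []
    for cluster in clusters:
        if len(train) + len(cluster) <= target:
            train.extend(cluster)
        else:
            test.extend(cluster)
    return sorted(train), sorted(test)


# Hypothetical distances: items 0-2 are similar to each other, as are 3-4.
toy_distance = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.9, 0.9],
    [0.2, 0.1, 0.0, 0.8, 0.9],
    [0.9, 0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.1, 0.0],
]
train_ids, test_ids = cluster_split(toy_distance, train_percent=60, cutoff=0.3)
print(train_ids, test_ids)
```

Because the similarity-based split needs all pairwise distances, its memory use grows with the square of the number of structures, which is consistent with the memory note above.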

Split on property option menu

Split the structure set by the value of the selected property. The properties on the menu are loaded from the structure source.

Split threshold option menu and text box

Choose the relation for the property that defines the training set, and specify the threshold value for the split.
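A custom split of this kind amounts to testing each structure's property value against the threshold with the chosen relation. The sketch below assumes a hypothetical set of relations and represents structures as plain dictionaries; the relations actually offered on the menu, and the property name used here, are not taken from the panel.

```python
import operator

# Assumed set of relations; the menu in the panel may offer a different set.
RELATIONS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def threshold_split(records, prop, relation, threshold):
    """Put records whose property satisfies the relation into the training set."""
    in_training = RELATIONS[relation]
    train = [r for r in records if in_training(r[prop], threshold)]
    test = [r for r in records if not in_training(r[prop], threshold)]
    return train, test


# Hypothetical data: train on compounds registered before 2020, test on the rest.
records = [
    {"name": "cpd1", "year": 2017},
    {"name": "cpd2", "year": 2019},
    {"name": "cpd3", "year": 2020},
    {"name": "cpd4", "year": 2022},
]
train, test = threshold_split(records, "year", "<", 2020)
print([r["name"] for r in train], [r["name"] for r in test])
```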

Log transform option

Use the logarithm of the property values in the regression. Only available when the model type is Regression.
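As a minimal sketch of what a log transform involves, the example below converts property values to logarithms before fitting and converts predictions back afterwards. The base of the logarithm used by DeepAutoQSAR is not stated here, so base 10 is an assumption, and the function names are hypothetical.

```python
import math


def log_transform(values):
    """Return log10 of each value; the logarithm requires positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log10(v) for v in values]


def inverse_log_transform(values):
    """Map log10-scale predictions back to the original scale."""
    return [10 ** v for v in values]


# Hypothetical activity values spanning several orders of magnitude.
activities = [0.5, 20.0, 1500.0]
log_activities = log_transform(activities)
restored = inverse_log_transform(log_activities)
print(log_activities, restored)
```

A transform like this is typically useful when the property spans several orders of magnitude, so that the regression is not dominated by the largest values.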

Training time text box

Specify the maximum time for DeepAutoQSAR training, in hours. When this time has elapsed, training of the current model is completed, but no new models are trained. The actual elapsed time can be significantly longer than the limit specified here if a model takes a long time to train. At a minimum, two replicates are trained for each model type.
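The time-limit behavior described above can be sketched as a simple loop: the limit is checked only before a new model is started, and a minimum number of replicates is always trained. The function and parameter names are hypothetical, the scheduling order is an assumption, and the limit is given in seconds (a value in hours from the panel would be multiplied by 3600).

```python
import time


def train_with_budget(model_types, train_one, budget_seconds,
                      min_replicates=2, clock=time.monotonic):
    """Train models until the time budget is used up.

    train_one(model_type, replicate) is a caller-supplied function that
    trains one model and returns it. A model that has started is always
    finished, so the total elapsed time can exceed the budget.
    """
    start = clock()
    trained = []
    # Minimum work: train min_replicates of every model type, whatever the budget.
    for model_type in model_types:
        for replicate in range(min_replicates):
            trained.append(train_one(model_type, replicate))
    # Extra replicates: start a new model only while time remains.
    replicate = min_replicates
    while model_types and clock() - start < budget_seconds:
        for model_type in model_types:
            if clock() - start >= budget_seconds:
                break
            trained.append(train_one(model_type, replicate))
        replicate += 1
    return trained


# Simulated run: each model "takes" 10 seconds on a fake clock.
fake_now = [0.0]


def fake_train(model_type, replicate):
    fake_now[0] += 10.0
    return (model_type, replicate)


models = train_with_budget(["type_a", "type_b"], fake_train,
                           budget_seconds=45.0, clock=lambda: fake_now[0])
print(len(models), fake_now[0])
```

In the simulated run, the last model starts before the limit is reached and finishes after it, which is why the elapsed time ends up longer than the limit.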

Set sizes text

The sizes of the training and test sets are displayed here.

Model file text box and Browse button

Enter the model file name in this text box, or click Browse and navigate to the model file. The name of the file you selected is displayed in the text box. When the file is opened, a summary is displayed in the Model Summary section.

Model Summary section

In this section, a summary of the statistics of the model is presented.

View Full Report button

View the file containing the full details of the QSAR model, with a ROC plot for categorical models or a scatter plot and regression line for regression models. Opens the DeepAutoQSAR Report Viewer panel.

Make Predictions section

Make predictions of the property for a set of structures.

Use structures from option menu

Choose the structure source for the predictions.

  • Project Table (n selected entries)—Use the entries that are currently selected in the Project Table or Entry List. The number of entries selected is shown on the menu item. An icon is displayed to the right which you can click to open the Project Table and select entries.
  • File—Use the specified file. When this option is selected, the File name text box and Browse button are displayed. The allowed file types are: Maestro.

Open Project Table button

Open the Project Table panel, so you can select the entries for the structure source.

File name text box and Browse button

Enter the file name in this text box, or click Browse and navigate to the file. The name of the file you selected is displayed in the text box. The allowed file types are: Maestro and SD.

Output property name text box

Enter a label for the predicted property. This label is included in the property name reported in Maestro, which is Predlabel for numeric values (regression), and labelClass and labelProb for categorical values (classification). The label must not contain white space; use an underscore instead, as underscores are replaced with spaces when the property name is used in Maestro.
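The label rule can be checked before a job is set up. The helper names below are hypothetical and are not part of any Schrödinger API; the sketch only encodes the two statements above, that white space is not allowed and that underscores are shown as spaces.

```python
def validate_label(label):
    """Reject empty labels and labels that contain any white space."""
    if not label or any(ch.isspace() for ch in label):
        raise ValueError("label must be non-empty and must not contain white space")
    return label


def display_label(label):
    """Show how the label reads once underscores are replaced with spaces."""
    return validate_label(label).replace("_", " ")


# Hypothetical label for a predicted property.
print(display_label("pIC50_model"))
```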

Additional required X-values option menus

Select the properties that contain the additional X-values added to the descriptors for the model (via Add Descriptors). These additional properties must exist in the structures you are making predictions for. Each property is set using one of the option menus, which are labeled with the property names of the added descriptors in the model.
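Before making predictions, it can be useful to confirm that every structure carries the additional properties the model expects. The sketch below assumes structures are available as plain dictionaries of property name to value; the function name and the property names are hypothetical examples.

```python
def missing_properties(structures, required):
    """Map the index of each structure to the required properties it lacks."""
    report = {}
    for index, props in enumerate(structures):
        missing = [name for name in required if name not in props]
        if missing:
            report[index] = missing
    return report


# Hypothetical property names for descriptors added when the model was built.
required = ["r_user_logP", "r_user_assay_pH"]
structures = [
    {"r_user_logP": 2.1, "r_user_assay_pH": 7.4},
    {"r_user_logP": 3.3},
    {},
]
print(missing_properties(structures, required))
```

An empty result means that all structures have the required properties.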

Job toolbar

Manage job submission and settings. See Job Toolbar for a description of this toolbar.

Status bar

The status bar displays information about the current job settings and status for the panel. The settings include the job name, task name and task settings (if any), number of subjobs (if any), the host name, and the job incorporation setting. The job status can include messages about job start, job completion and incorporation. It also displays a progress bar for the job.

Use the Reset button to reset the panel to its default settings and clear any data from the panel. You can also reset the panel from the Job toolbar.

The status bar also contains the Help button, which opens the help topic for the panel in your browser. If the panel is used by one or more tutorials, hovering over the Help button displays a button, which you can click to display a list of tutorials (or you can right-click the Help button instead). Choosing a tutorial opens the tutorial topic.