Preparing Inputs for Active Learning Glide
A single Active Learning Glide calculation amounts to a screen of a single ligand library (LIGANDFILE) into a single receptor (GRIDFILE). These two files must be prepared and available prior to running an Active Learning Glide calculation.
GRIDFILE
The Active Learning Glide workflow requires a previously generated Glide grid. The receptor should be prepared using the Protein Preparation Workflow Panel prior to generating a Glide grid.
Note:
Grid-based constraints (Metal Constraints, Metal Coordination Constraints) are not supported by Active Learning Glide at this time.
LIGANDFILE
Ligands are prepared on the fly using LigPrep. 3D preparation is performed only for ligands that are docked to train the underlying machine learning model and for top-scoring ligands that are rescored in the final step of the workflow. The ligand file should provide a 2D representation of each ligand. Both SMILES (.smi) and SMILESCSV (.csv) formats are supported.
SMILES Format (.smi)
Consistent with the SMILES format used throughout the Schrödinger Core Suite, a ligand file provided to Active Learning Glide in SMILES format should adhere to the following convention: the SMILES representation of the ligand structure should be in the first column and the title should be in the second column.
-------- beginning of example ligand file (SMILES format) ---------------------
C1CCCCC1COc2nc(nc(c23)[nH]cn3)Nc4ccc(cc4)S(=O)(=O)N example-ligand-1
C1CCCCC1COc2nc(nc(c23)[nH]cn3)Nc4cc(S(=O)(=O)N)ccc4 example-ligand-2
Note:
SMILES files compressed as .bzip2 files can also be used as inputs.
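The column convention for .smi files can be checked before a run with a short script. The following is a minimal Python sketch, not part of Active Learning Glide: the helper name parse_smi_line is hypothetical, and it assumes the two columns are separated by whitespace, as in the example file above.

```python
def parse_smi_line(line):
    """Split one .smi line into (smiles, title): SMILES first, title second."""
    # Assumption: columns are separated by whitespace; the title is the remainder.
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"expected 'SMILES title', got: {line!r}")
    smiles, title = parts
    return smiles, title.strip()


example = """\
C1CCCCC1COc2nc(nc(c23)[nH]cn3)Nc4ccc(cc4)S(=O)(=O)N example-ligand-1
C1CCCCC1COc2nc(nc(c23)[nH]cn3)Nc4cc(S(=O)(=O)N)ccc4 example-ligand-2
"""

# Parse every non-empty line of the example file.
records = [parse_smi_line(line) for line in example.splitlines() if line.strip()]
for smiles, title in records:
    print(title, smiles)
```

A line with a missing title raises ValueError, which makes malformed entries easy to spot before the library is submitted.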
SMILESCSV Format (.csv)
A comma-delimited CSV file. By default, Active Learning Glide expects a SMILESCSV-formatted ligand file to have the title of each compound in the first column and the 2D SMILES representation of the ligand in the second column. This can be overridden with the command-line arguments -smi_index and -name_index. For example, -smi_index 1 informs Active Learning Glide that the SMILES structure of each ligand is in the first column.
-------- beginning of example ligand file (SMILESCSV format) ---------------------
SMILES,Title
C1CCCCC1COc2nc(nc(c23)[nH]cn3)Nc4ccc(cc4)S(=O)(=O)N, example-ligand-1
C1CCCCC1COc2nc(nc(c23)[nH]cn3)Nc4cc(S(=O)(=O)N)ccc4, example-ligand-2
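The effect of the column indices can be illustrated with a small reader. This is a hedged Python sketch rather than the tool's own parser: read_smiles_csv and its has_header parameter are hypothetical, the indices are assumed to be 1-based (as the -smi_index 1 example suggests), and treating the first row as a header is an assumption based on the example file. Because the example file places SMILES in the first column, the sketch reads it with smi_index=1 and name_index=2 instead of the defaults.

```python
import csv
import io


def read_smiles_csv(handle, smi_index=2, name_index=1, has_header=True):
    """Yield (title, smiles) pairs using 1-based column indices."""
    # skipinitialspace drops the blank that follows each comma in the example.
    reader = csv.reader(handle, skipinitialspace=True)
    if has_header:
        next(reader, None)  # assumption: the first row is a header
    for row in reader:
        if not row:
            continue  # ignore empty lines
        yield row[name_index - 1].strip(), row[smi_index - 1].strip()


example = io.StringIO(
    "SMILES,Title\n"
    "C1CCCCC1COc2nc(nc(c23)[nH]cn3)Nc4ccc(cc4)S(=O)(=O)N, example-ligand-1\n"
    "C1CCCCC1COc2nc(nc(c23)[nH]cn3)Nc4cc(S(=O)(=O)N)ccc4, example-ligand-2\n"
)

# SMILES is in column 1 and the title in column 2 for this file.
ligands = list(read_smiles_csv(example, smi_index=1, name_index=2))
for title, smiles in ligands:
    print(title, smiles)
```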
Optional Inputs:
Configuration input file
Active Learning Glide settings can be specified on the command line or in an input file, similar to Glide.
Here is an example of a valid Active Learning Glide input file:
TASK pilot
GRID gridgen.zip
INFILE library.csv
TRAIN_TIME 12
PILOT_SIZE 100
JOBNAME JOBNAME
SMI_INDEX 2
NAME_INDEX 1
TRAIN_SIZE 50000
NUM_TRAIN_CORE 1
Note:
PILOT_SIZE is required only for pilot runs, not for production runs.
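An input file of this shape can be sanity-checked before submission. The Python sketch below is illustrative only: parse_input_file and check_pilot_size are hypothetical helpers, and the sketch assumes each line holds one keyword followed by whitespace and its value, as in the example above. It does not validate keywords against the full set that Active Learning Glide accepts.

```python
def parse_input_file(text):
    """Parse 'KEYWORD value' lines into a dict (values are kept as strings)."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        keyword = parts[0].upper()
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[keyword] = value
    return settings


def check_pilot_size(settings):
    """Raise ValueError if a pilot run does not set PILOT_SIZE."""
    is_pilot = settings.get("TASK", "").lower() == "pilot"
    if is_pilot and "PILOT_SIZE" not in settings:
        raise ValueError("PILOT_SIZE is required when TASK is pilot")


example = """\
TASK pilot
GRID gridgen.zip
INFILE library.csv
TRAIN_TIME 12
PILOT_SIZE 100
JOBNAME JOBNAME
SMI_INDEX 2
NAME_INDEX 1
TRAIN_SIZE 50000
NUM_TRAIN_CORE 1
"""

settings = parse_input_file(example)
check_pilot_size(settings)  # passes: the example sets PILOT_SIZE
print(settings["GRID"], settings["PILOT_SIZE"])
```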