glide_active_learning.py screen Command Help
Command: $SCHRODINGER/run -FROM glide glide_active_learning.py screen
usage: $SCHRODINGER/run glide_active_learning.py screen [-h]
[-infile_list_file <infile_path_list>]
[-block_size <num_lig_per_block>]
[-smi_index <smiles_column_index>]
[-name_index <title_column_index>]
[-no_header]
[-result_prefix <output_file_prefix>]
[-remote_input_ligands]
[-restart_file <restart_pkl_file>]
[-avoid_splitting_csv_files]
[-infile INFILE]
[-grid <gridfile>]
[-extra_docking_inputs <glide_input_file_of_extra_inputs>]
[-known_docking_score_file <known_docking_score_file>]
[-jobname <jobname>]
[-stop_after <stop_workflow_after_stage>]
[-max_ml_eval_cpu <maximum_ml_evaluation_cpu>]
[-eval_with_mq]
[-max_glide_cpu <maximum_glide_cpu>]
[-glide_mq]
[-rescore_host <rescore_host>]
[-ncpu_rescore <ncpu_rescore>]
[-train_size <num_ligands_per_iteration>]
[-selection_rule <training_ligands_selection_protocol>]
[-dise_similarity_threshold <frac>]
[-num_iter <num_of_iteration>]
[-train_time <hours>]
[-train_host <ligand_ml_training_host>]
[-num_train_core <ligand_ml_training_core>]
[-chosen_models <ligand_ml_chosen_models>]
[-force_single_model]
[-pilot_score_file PILOT_SCORE_FILE]
[-undockable_ratio <undockable_ligand_ratio>]
[-allow_unlimited_block_size]
[-random_seed <random_seed_number>]
[-overwrite_args]
[-force_restart]
[-ligprep_args <ligprep_arguments>]
[-num_report_poses_rescore <num_best_poses_rescore>]
[-no_rescore_poses]
[-glide_subjob_size <ligs_per_glide_subjob> | -num_glide_subjobs <number_of_glide_subjobs>]
[-new_glide | -classic_glide]
[-glide_rescore_subjob_size <ligs_per_glide_rescore_subjob> | -num_glide_rescore_subjobs <number_of_glide_rescore_subjobs>]
[-keep <num_returned_ligand> | -keep_fraction <fraction_of_returned_ligand>]
[-num_rescore_ligand <num_rescore_ligand> | -rescore_ligand_fraction <fraction_of_rescore_ligand>]
options:
-h, --help show this help message and exit
-random_seed <random_seed_number>
Random seed number for shuffling all the ligands
and seeding ligand_ml training.
-overwrite_args Overwrite previous arguments. Default is False.
-force_restart Force the workflow to restart when some restart files are missing.
-ligprep_args <ligprep_arguments>
Arguments for using ligprep to prepare the ligands. Default is -pht 1.0 -epik -s16.
-num_report_poses_rescore <num_best_poses_rescore>
Number of best poses returned in the
.mae library pose file and the virtual screening database
(.vsdb) file for rescored ligands.
Equivalent to the 'NREPORT' keyword in Glide.
Default is to return all poses.
-no_rescore_poses Do not generate the .mae pose file or
.vsdb file for the rescored ligands.
By default, both the pose file and the .vsdb file are generated.
-glide_subjob_size <ligs_per_glide_subjob>
Number of ligands in each Glide subjob. By default,
Glide distributes the ligands automatically.
-num_glide_subjobs <number_of_glide_subjobs>
Number of subjobs for the Glide job.
-new_glide Run new Glide backend in AL-Glide. The default follows feature flag NEW_GLIDE.
-classic_glide Run classic Glide backend in AL-Glide. The default follows feature flag NEW_GLIDE.
-glide_rescore_subjob_size <ligs_per_glide_rescore_subjob>
Number of ligands in each Glide subjob of the rescore stage.
Default is the same as the value of -glide_subjob_size.
-num_glide_rescore_subjobs <number_of_glide_rescore_subjobs>
Number of subjobs for the Glide job of the rescore stage.
Default is the same as the value of -num_glide_subjobs.
-keep <num_returned_ligand>
Number of best ligands to be returned.
Default is 10000000.
-keep_fraction <fraction_of_returned_ligand>
Fraction of the ligands to be returned.
-num_rescore_ligand <num_rescore_ligand>
Number of the best ligands to rescore with Glide.
Default is 1000000.
-rescore_ligand_fraction <fraction_of_rescore_ligand>
Fraction of the best ligands to rescore with Glide.
file options:
-infile_list_file <infile_path_list>
A file that contains a list of infile paths.
-block_size <num_lig_per_block>
Number of ligands in each sub input ligands file. Default is
15,000 for AL-FEP+/AL-ABFEP and 100,000 for AL-Glide.
-smi_index <smiles_column_index>
1-based column index of the ligand SMILES.
Default is 1.
-name_index <title_column_index>
1-based column index of the ligand title.
Default is 2.
-no_header Specify if the input file(s) have no header in the first line.
-result_prefix <output_file_prefix>
Prefix of the .csv result files. Default is the value of -jobname.
-remote_input_ligands
Specify if the input ligand files are located on a remote host. Absolute
paths to the input ligand files are required if this flag is
provided.
-restart_file <restart_pkl_file>
.pkl file for restarting or continuing the active learning
workflow.
-avoid_splitting_csv_files
Enable this option to use pre-split ligand CSV files, avoiding redundancy.
Requirements:
- Reorder columns: 'SMILES' first, 'Title' second.
- Each split CSV must have the same header: 'SMILES, Title'.
- Remove any additional columns.
- Optionally gzip compress each split CSV file.
- Archive all pre-split CSV files into one directory and zip compress it.
Pass the zipped directory to '-infile'.
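The requirements above can be met with a short script. What follows is a minimal sketch, not part of the tool: the function name write_presplit_archive, the ligands_NNNNN.csv file naming, and the chunk size are assumptions chosen for illustration. It writes (SMILES, Title) pairs into pre-split CSV files with the shared header, optionally gzips each file, and zips the directory.

```python
import csv
import gzip
import shutil
from pathlib import Path


def write_presplit_archive(ligands, out_dir, chunk_size, compress=True):
    """Write (smiles, title) pairs into pre-split CSV files and zip the directory.

    Hypothetical helper: file names and chunking are illustrative choices.
    Returns the path of the resulting .zip archive.
    """
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)

    for start in range(0, len(ligands), chunk_size):
        chunk = ligands[start:start + chunk_size]
        name = f"ligands_{start // chunk_size:05d}.csv"
        if compress:
            # Optional gzip compression of each split file.
            handle = gzip.open(out_dir / (name + ".gz"), "wt", newline="")
        else:
            handle = open(out_dir / name, "w", newline="")
        with handle:
            writer = csv.writer(handle)
            # Same header in every file: SMILES first, Title second, nothing else.
            writer.writerow(["SMILES", "Title"])
            writer.writerows(chunk)

    # Zip the whole directory; the archive is created next to it as <out_dir>.zip.
    archive = shutil.make_archive(
        str(out_dir), "zip", root_dir=out_dir.parent, base_dir=out_dir.name
    )
    return Path(archive)
```

The returned archive path is what would be passed to '-infile' together with -avoid_splitting_csv_files.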
-infile INFILE .csv or .smi file(s) that contains all the ligand SMILES.
Multiple .csv or .smi files can be included by specifying
multiple -infile options.
-grid <gridfile> Glide grid file for docking.
-extra_docking_inputs <glide_input_file_of_extra_inputs>
Glide input-like file that contains extra inputs for
docking stages.
-known_docking_score_file <known_docking_score_file>
CSV score file with known docking scores.
The data from this file is used to train the first round of ML
models and to reduce the number of Glide calculations needed in the first
round of active learning. The CSV file should have the following columns:
SMILES, Title, docking_score.
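As an illustration of the expected layout, the sketch below writes a score file with the three columns listed above. The function name write_known_scores is hypothetical, and any SMILES, titles, and scores passed to it in an example are placeholders, not real docking results.

```python
import csv

# Column names taken from the option description above.
SCORE_COLUMNS = ["SMILES", "Title", "docking_score"]


def write_known_scores(rows, path):
    """Write (smiles, title, score) tuples to a CSV with the expected header.

    Hypothetical helper for illustration; scores are coerced to float so that
    a non-numeric value fails early instead of ending up in the file.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=SCORE_COLUMNS)
        writer.writeheader()
        for smiles, title, score in rows:
            writer.writerow(
                {"SMILES": smiles, "Title": title, "docking_score": float(score)}
            )
```

The resulting file path would then be supplied through -known_docking_score_file.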
job options:
-jobname <jobname> Job name of the active learning workflow run.
-stop_after <stop_workflow_after_stage>
Terminate the workflow after the specified stage finishes.
Specify FinishAll (case-insensitive) to run all the remaining stages.
For convenience, you can also specify 'iter_X' to stop after
iteration X (e.g., 'iter_1', 'iter_2', etc.).
-max_ml_eval_cpu <maximum_ml_evaluation_cpu>
Maximum number of CPUs allowed for
machine learning evaluation subjobs.
-eval_with_mq Enable ligand_ml evaluation using ZeroMQ.
Specify the number of evaluation jobs with -max_ml_eval_cpu.
If running a screen or pilot, the -chosen_models flag is
automatically set to TorchGraphConv models.
-max_glide_cpu <maximum_glide_cpu>
Maximum number of CPUs allowed for Glide subjobs.
-glide_mq Turn on Glide execution with ZeroMQ (ZMQ).
Use -num_glide_subjobs to control the number of subjobs.
-rescore_host <rescore_host>
Rescoring host name.
When using this option, the -ncpu_rescore option must also be set.
Default is the same as -HOST.
-ncpu_rescore <ncpu_rescore>
Equivalent of -NJOBS for Glide.
Specifies the number of CPUs provided to the rescore host.
Must be used with -rescore_host.
screen options:
-train_size <num_ligands_per_iteration>
Number of training ligands for each active learning round.
A minimum of 40 is required.
You can also specify the training size for each iteration by providing
a space-separated list of sizes, such as '500 200 200'.
Default value is 50000.
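The two documented rules for -train_size (a space-separated list of per-iteration values, each at least 40) can be checked before submitting a job. This is a minimal sketch of such a check, not the tool's own parser; the function name parse_train_size is an assumption.

```python
MIN_TRAIN_SIZE = 40  # minimum stated in the option description


def parse_train_size(spec):
    """Parse a value such as '500 200 200' into a list of integers.

    Hypothetical pre-flight check: raises ValueError if the string is empty,
    contains a non-integer token, or contains a size below the minimum.
    """
    sizes = [int(token) for token in spec.split()]
    if not sizes:
        raise ValueError("at least one training size is required")
    too_small = [size for size in sizes if size < MIN_TRAIN_SIZE]
    if too_small:
        raise ValueError(
            f"training sizes below {MIN_TRAIN_SIZE} are not allowed: {too_small}"
        )
    return sizes
```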
-selection_rule <training_ligands_selection_protocol>
Protocol for selecting training set ligands. Supported selection protocols:
random most_uncertain greedy diversity dise distinct_scaffolds
You can also specify the selection protocol for each iteration by providing
a space-separated list of selection protocols, for example 'diversity greedy greedy'.
Note that dise has only been tested for AL-FEP and AL-ABFEP.
Default value is diversity.
-dise_similarity_threshold <frac>
Threshold similarity score (between 0 and 1) for DISE selection
of training data. Default is 0.5. Lower values increase
diversity within the training data.
-num_iter <num_of_iteration>
Number of active learning iterations. Default is 3.
-train_time <hours> Floating point time limit in hours for training deep learning models.
Default value is 8.0.
-train_host <ligand_ml_training_host>
ligand_ml training host name. Default is the same as -HOST.
-num_train_core <ligand_ml_training_core>
Number of CPUs or GPUs for ligand_ml training.
Default is 1.
-chosen_models <ligand_ml_chosen_models>
Type of model architecture(s) to consider in ligand_ml.
Model names should be passed as a single string, separated by spaces.
Default is all available models in ligand_ml.
-force_single_model Force the screen mode to train with only
one model (graph convolutional network). Cannot be combined
with -chosen_models. Ideal for training on 1 million+ compounds.
-pilot_score_file PILOT_SCORE_FILE
This argument has been deprecated.
Use -known_docking_score_file option to use previously calculated scores for initial training.
-undockable_ratio <undockable_ligand_ratio>
Ratio of undockable to dockable ligands in the training set when
constraints are applied in docking.
Default is 1.0.
-allow_unlimited_block_size
Allow block size to be unlimited. Use this option along with the
-block_size option to set block size values larger than 300,000.
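For scripted submissions, the options above can be assembled into an argument list. The sketch below is an illustration only: the function name build_screen_command and its parameters are assumptions, it covers just a handful of the flags listed in this help, and it builds the command without running it. It encodes two rules stated above: -keep and -keep_fraction are mutually exclusive, and block sizes larger than 300,000 need -allow_unlimited_block_size.

```python
import shlex


def build_screen_command(schrodinger, infiles, grid, jobname,
                         num_iter=None, block_size=None,
                         keep=None, keep_fraction=None):
    """Return the screen command as a list of arguments (hypothetical helper)."""
    if keep is not None and keep_fraction is not None:
        # The usage line shows these two options as alternatives.
        raise ValueError("-keep and -keep_fraction are mutually exclusive")

    cmd = [f"{schrodinger}/run", "-FROM", "glide",
           "glide_active_learning.py", "screen"]
    for path in infiles:
        # Multiple input files are given by repeating -infile.
        cmd += ["-infile", str(path)]
    cmd += ["-grid", str(grid), "-jobname", jobname]

    if num_iter is not None:
        cmd += ["-num_iter", str(num_iter)]
    if block_size is not None:
        cmd += ["-block_size", str(block_size)]
        if block_size > 300_000:
            # Block sizes above 300,000 require this flag.
            cmd.append("-allow_unlimited_block_size")
    if keep is not None:
        cmd += ["-keep", str(keep)]
    if keep_fraction is not None:
        cmd += ["-keep_fraction", str(keep_fraction)]
    return cmd


def format_command(cmd):
    """Render the argument list as a shell-quoted string for logging."""
    return shlex.join(cmd)
```

Keeping the command as a list avoids shell-quoting problems if it is later handed to a process launcher; format_command is only meant for display.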