glide_active_learning.py screen Command Help

Command: $SCHRODINGER/run -FROM glide glide_active_learning.py screen

usage: $SCHRODINGER/run glide_active_learning.py screen [-h]
                                                        [-infile_list_file <infile_path_list>]
                                                        [-block_size <num_lig_per_block>]
                                                        [-smi_index <smiles_column_index>]
                                                        [-name_index <title_column_index>]
                                                        [-no_header]
                                                        [-result_prefix <output_file_prefix>]
                                                        [-remote_input_ligands]
                                                        [-restart_file <restart_pkl_file>]
                                                        [-avoid_splitting_csv_files]
                                                        [-infile INFILE]
                                                        [-grid <gridfile>]
                                                        [-extra_docking_inputs <glide_input_file_of_extra_inputs>]
                                                        [-known_docking_score_file <known_docking_score_file>]
                                                        [-jobname <jobname>]
                                                        [-stop_after <stop_workflow_after_stage>]
                                                        [-max_ml_eval_cpu <maximum_ml_evaluation_cpu>]
                                                        [-eval_with_mq]
                                                        [-max_glide_cpu <maximum_glide_cpu>]
                                                        [-glide_mq]
                                                        [-rescore_host <rescore_host>]
                                                        [-ncpu_rescore <ncpu_rescore>]
                                                        [-train_size <num_ligands_per_iteration>]
                                                        [-selection_rule <training_ligands_selection_protocol>]
                                                        [-dise_similarity_threshold <frac>]
                                                        [-num_iter <num_of_iteration>]
                                                        [-train_time <hours>]
                                                        [-train_host <ligand_ml_training_host>]
                                                        [-num_train_core <ligand_ml_training_core>]
                                                        [-chosen_models <ligand_ml_chosen_models>]
                                                        [-force_single_model]
                                                        [-pilot_score_file PILOT_SCORE_FILE]
                                                        [-undockable_ratio <undockable_ligand_ratio>]
                                                        [-allow_unlimited_block_size]
                                                        [-random_seed <random_seed_number>]
                                                        [-overwrite_args]
                                                        [-force_restart]
                                                        [-ligprep_args <ligprep_arguments>]
                                                        [-num_report_poses_rescore <num_best_poses_rescore>]
                                                        [-no_rescore_poses]
                                                        [-glide_subjob_size <ligs_per_glide_subjob> | -num_glide_subjobs <number_of_glide_subjobs>]
                                                        [-new_glide | -classic_glide]
                                                        [-glide_rescore_subjob_size <ligs_per_glide_rescore_subjob> | -num_glide_rescore_subjobs <number_of_glide_rescore_subjobs>]
                                                        [-keep <num_returned_ligand> | -keep_fraction <fraction_of_returned_ligand>]
                                                        [-num_rescore_ligand <num_rescore_ligand> | -rescore_ligand_fraction <fraction_of_rescore_ligand>]

options:
  -h, --help            show this help message and exit
  -random_seed <random_seed_number>
                        Random seed number for shuffling all the ligands
                        and seeding ligand_ml training.
  -overwrite_args       Overwrite previous arguments. Default is False.
  -force_restart        Force the workflow to restart when some restart files are missing.
  -ligprep_args <ligprep_arguments>
                        Arguments for using ligprep to prepare the ligands. Default is -pht 1.0 -epik -s16. 
  -num_report_poses_rescore <num_best_poses_rescore>
                        Number of best poses returned in the
                        .mae library pose file and the virtual screening database
                        (.vsdb) file for rescored ligands.
                        Equivalent to the 'NREPORT' keyword in Glide.
                        Default is to return all poses.
  -no_rescore_poses     Do not generate the .mae pose file or
                        .vsdb file for the rescored ligands.
                        By default, both files are generated.
  -glide_subjob_size <ligs_per_glide_subjob>
                        Number of ligands in each Glide subjob. By default,
                        Glide distributes the ligands automatically.
  -num_glide_subjobs <number_of_glide_subjobs>
                        Number of subjobs for the Glide job.
  -new_glide            Run new Glide backend in AL-Glide. The default follows feature flag NEW_GLIDE.
  -classic_glide        Run classic Glide backend in AL-Glide. The default follows feature flag NEW_GLIDE.
  -glide_rescore_subjob_size <ligs_per_glide_rescore_subjob>
                        Number of ligands in each Glide subjob of the rescore stage.
                        Default is the same as the value of -glide_subjob_size.
  -num_glide_rescore_subjobs <number_of_glide_rescore_subjobs>
                        Number of subjobs for the Glide job of the rescore stage.
                        Default is the same as the value of -num_glide_subjobs.
  -keep <num_returned_ligand>
                        Number of best ligands to be returned. 
                        Default is 10000000.
  -keep_fraction <fraction_of_returned_ligand>
                        Fraction of the ligands to be returned.
  -num_rescore_ligand <num_rescore_ligand>
                        Number of the best ligands to rescore with Glide.
                        Default is 1000000.
  -rescore_ligand_fraction <fraction_of_rescore_ligand>
                        Fraction of the best ligands to rescore with Glide.

file options:
  -infile_list_file <infile_path_list>
                        A file that contains a list of infile paths.
  -block_size <num_lig_per_block>
                        Number of ligands in each split input ligand file. Default is
                        15,000 for AL-FEP+/AL-ABFEP and 100,000 for AL-Glide.
  -smi_index <smiles_column_index>
                        1-based column index of ligand's SMILES. 
                        Default is 1.
  -name_index <title_column_index>
                        1-based column index of ligand's title. 
                        Default is 2.
  -no_header            Indicates that the input file(s) have no header in the first line.
  -result_prefix <output_file_prefix>
                        Prefix of the .csv result files. Default is the value of -jobname.
  -remote_input_ligands
                        Indicates that the input ligand files are located on a remote host.
                        Absolute paths of the input ligand files are required if this flag is
                        provided.
  -restart_file <restart_pkl_file>
                        .pkl file for restarting or continuing the active learning 
                        workflow.
  -avoid_splitting_csv_files
                        Enable this option to use pre-split ligand CSV files, so that the
                        workflow does not split the input again.
                        
                        Requirements:
                        - Reorder columns: 'SMILES' first, 'Title' second.
                        - Each split CSV must have the same header: 'SMILES, Title'.
                        - Remove any additional columns.
                        - Optionally gzip compress each split CSV file.
                        - Place all pre-split CSV files in one directory and compress it as a zip archive.
                        Pass the zipped directory to '-infile'.
                        
  -infile INFILE        .csv or .smi file(s) that contain all the ligand SMILES.
                        Multiple .csv or .smi files can be included by specifying
                        multiple -infile options.
  -grid <gridfile>      Glide grid file for docking.
  -extra_docking_inputs <glide_input_file_of_extra_inputs>
                        File in Glide input format that contains extra inputs for the
                        docking stages.
  -known_docking_score_file <known_docking_score_file>
                        CSV score file with known docking scores.
                        The data from this file is used to train the first round of ML
                        models and to reduce the number of Glide calculations needed in the first
                        round of active learning. The CSV file should have the following columns:
                        SMILES, Title, docking_score.
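
The requirements listed under -avoid_splitting_csv_files can be met with a short preprocessing script. The following is a minimal sketch, not part of the Schrödinger distribution: the function names `presplit_csv` and `_write_chunk`, the chunk file names, and the choice to store the chunks under a single directory inside the zip archive are all assumptions. The header is written as `SMILES,Title` without a space; check this against what the workflow accepts.

```python
import csv
import zipfile
from pathlib import Path


def _write_chunk(out_dir, index, rows):
    """Write one pre-split CSV with the required two-column header."""
    path = out_dir / f"chunk_{index:04d}.csv"  # hypothetical naming scheme
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["SMILES", "Title"])  # SMILES first, Title second
        writer.writerows(rows)
    return path


def presplit_csv(src, out_dir, smiles_col, title_col, block_size):
    """Split src into SMILES/Title-only chunks and zip the directory.

    smiles_col and title_col are the header names in the source file;
    all other columns are dropped. Returns the path of the zip archive.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    rows = []
    with open(src, newline="") as fh:
        for row in csv.DictReader(fh):
            rows.append((row[smiles_col], row[title_col]))
            if len(rows) == block_size:
                chunk_paths.append(_write_chunk(out_dir, len(chunk_paths), rows))
                rows = []
    if rows:  # final, possibly shorter, chunk
        chunk_paths.append(_write_chunk(out_dir, len(chunk_paths), rows))

    # Assumption: the archive holds one directory containing all chunks.
    zip_path = out_dir.parent / (out_dir.name + ".zip")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in chunk_paths:
            zf.write(path, arcname=f"{out_dir.name}/{path.name}")
    return zip_path
```

The returned zip path is what would be passed to -infile together with -avoid_splitting_csv_files. Gzip compression of the individual chunks is optional according to the help text and is omitted here.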

job options:
  -jobname <jobname>    Job name of the active learning workflow run.
  -stop_after <stop_workflow_after_stage>
                        Terminate the workflow after the specified stage finishes.
                        Specify FinishAll (case-insensitive) to run all the remaining stages. 
                        For convenience, you can also specify 'iter_X' to stop after 
                        iteration X (e.g., 'iter_1', 'iter_2', etc.).
  -max_ml_eval_cpu <maximum_ml_evaluation_cpu>
                        Maximum number of CPUs allowed for
                        machine learning evaluation subjobs.
  -eval_with_mq         Enable ligand_ml evaluation using ZeroMQ.
                        Specify the number of evaluation jobs with -max_ml_eval_cpu.
                        If running a screen or pilot, the -chosen_models flag is
                        automatically set to TorchGraphConv models.
  -max_glide_cpu <maximum_glide_cpu>
                        Maximum number of CPUs allowed for Glide subjobs.
  -glide_mq             Enable Glide execution using ZeroMQ.
                        Use -num_glide_subjobs to control the number of subjobs.
  -rescore_host <rescore_host>
                        Rescoring host name.
                        When using this option, -ncpu_rescore must also be set.
                        Default is the same as -HOST.
  -ncpu_rescore <ncpu_rescore>
                        Equivalent to -NJOBS for Glide.
                        Specifies the number of CPUs provided to the rescore host.
                        Must be used with -rescore_host.
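
Several options constrain each other: the usage line groups some as alternatives (for example -keep | -keep_fraction), and the descriptions state that -rescore_host and -ncpu_rescore go together and that -force_single_model cannot be combined with -chosen_models. A minimal sketch of a pre-flight check over an argument list is shown below. The function name `check_option_combination` is hypothetical, and the script itself may report these conflicts differently.

```python
# Pairs that cannot be given together (from the usage line and descriptions).
MUTUALLY_EXCLUSIVE = [
    ("-glide_subjob_size", "-num_glide_subjobs"),
    ("-new_glide", "-classic_glide"),
    ("-glide_rescore_subjob_size", "-num_glide_rescore_subjobs"),
    ("-keep", "-keep_fraction"),
    ("-num_rescore_ligand", "-rescore_ligand_fraction"),
    ("-force_single_model", "-chosen_models"),
]

# Pairs where one option requires the other.
REQUIRED_TOGETHER = [("-rescore_host", "-ncpu_rescore")]


def check_option_combination(args):
    """Return a list of problems found in an argument list (empty if none)."""
    present = set(args)
    problems = []
    for first, second in MUTUALLY_EXCLUSIVE:
        if first in present and second in present:
            problems.append(f"{first} and {second} cannot be combined")
    for first, second in REQUIRED_TOGETHER:
        if (first in present) != (second in present):
            problems.append(f"{first} and {second} must be set together")
    return problems
```

The check only looks at exact flag names, so option values that begin with a dash (such as a quoted -ligprep_args string) are not mistaken for flags.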

screen options:
  -train_size <num_ligands_per_iteration>
                        Number of training ligands for each active learning round.
                        A minimum of 40 is required.
                        You can also specify the training size for each iteration by providing
                        a space-separated list of sizes, such as '500 200 200'.
                        Default value is 50000.
  -selection_rule <training_ligands_selection_protocol>
                        Protocol for selecting training set ligands. Supported selection protocols:
                        random  most_uncertain  greedy  diversity  dise  distinct_scaffolds
                        You can also specify the selection protocol for each iteration by providing
                        a space-separated list of selection protocols, for example 'diversity greedy greedy'.
                        Note that dise has only been tested for AL-FEP+ and AL-ABFEP.
                        Default value is diversity.
  -dise_similarity_threshold <frac>
                        Threshold similarity score (between 0 and 1) for DISE selection
                        of training data. Default is 0.5. Lower values increase
                        diversity within the training data.
  -num_iter <num_of_iteration>
                        Number of active learning iterations. Default is 3.
  -train_time <hours>   Floating point time limit in hours for training deep learning models. 
                        Default value is 8.0.
  -train_host <ligand_ml_training_host>
                        ligand_ml training host name. Default is the same as -HOST.
  -num_train_core <ligand_ml_training_core>
                        Number of CPUs or GPUs for ligand_ml training.
                        Default is 1.
  -chosen_models <ligand_ml_chosen_models>
                        Type of model architecture(s) to consider in ligand_ml.
                        Model names should be given in a single string, separated by spaces.
                        Default is all available models in ligand_ml.
  -force_single_model   Force the screen mode to train with only
                        one model (graph convolutional network). Cannot be combined
                        with -chosen_models. Ideal for training on more than 1 million compounds.
  -pilot_score_file PILOT_SCORE_FILE
                        This argument has been deprecated. 
                        Use -known_docking_score_file option to use previously calculated scores for initial training.
  -undockable_ratio <undockable_ligand_ratio>
                        Ratio of undockable to dockable ligands in the training set when
                        constraints are applied in docking.
                        Default is 1.0.
  -allow_unlimited_block_size
                        Allow block size to be unlimited. Use this option along with the
                        -block_size option to set block size values larger than 300,000.
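
The per-iteration forms of -train_size and -selection_rule can be easier to get right when the command is assembled programmatically. The sketch below builds the argument list for a screen run without executing anything. The function name `build_screen_args` and the file names `ligands.csv` and `receptor_grid.zip` are hypothetical, and passing each list as a single space-separated string is an assumption based on the quoted examples in the option descriptions.

```python
import shlex

# Protocol names and the minimum training size come from the help text above.
SELECTION_RULES = {
    "random", "most_uncertain", "greedy",
    "diversity", "dise", "distinct_scaffolds",
}
MIN_TRAIN_SIZE = 40


def build_screen_args(infile, grid, jobname, train_sizes, selection_rules):
    """Return the arguments that follow $SCHRODINGER/run for a screen job."""
    too_small = [n for n in train_sizes if n < MIN_TRAIN_SIZE]
    if too_small:
        raise ValueError(f"training sizes below {MIN_TRAIN_SIZE}: {too_small}")
    unknown = [r for r in selection_rules if r not in SELECTION_RULES]
    if unknown:
        raise ValueError(f"unsupported selection rules: {unknown}")
    return [
        "-FROM", "glide", "glide_active_learning.py", "screen",
        "-infile", infile,
        "-grid", grid,
        "-jobname", jobname,
        # Assumption: each per-iteration list is one space-separated string.
        "-train_size", " ".join(str(n) for n in train_sizes),
        "-selection_rule", " ".join(selection_rules),
    ]


args = build_screen_args(
    "ligands.csv", "receptor_grid.zip", "al_screen",
    train_sizes=[500, 200, 200],
    selection_rules=["diversity", "greedy", "greedy"],
)
# shlex.join quotes the multi-word values so the shell passes each as one argument.
print("$SCHRODINGER/run " + shlex.join(args))
```

Whether the number of entries in each list has to match -num_iter is not stated in the help text, so the sketch does not enforce it.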