deepautoqsar Command Help

Command: $SCHRODINGER/run deepautoqsar
usage: deepautoqsar [-h] [-build] [-y <propname>] [-classification]
                    [-regression] [-split <fract>]
                    [-split_type <random OR scaffold>]
                    [-holdout <holdoutFile>] [-time <hours>] [-seed SEED]
                    [-descriptastorus DESCRIPTASTORUS]
                    [-include_geometric_models INCLUDE_GEOMETRIC_MODELS]
                    [-pretrained_fingerprints PRETRAINED_FINGERPRINTS]
                    [-optimize_hps OPTIMIZE_HPS]
                    [-exclude_training_data EXCLUDE_TRAINING_DATA] [-predict]
                    [-pred <predfile>] [-reinvent] [-prior_dir PRIOR_DIR]
                    [-property_space_axes PROPERTY_SPACE_AXES]
                    [-target_file TARGET_FILE] [-config_file CONFIG_FILE]
                    [-starting_ligands STARTING_LIGANDS]
                    [-rejected_structures REJECTED_STRUCTURES]
                    [-n_steps N_STEPS] [-n_compounds N_COMPOUNDS]
                    [-n_out_compounds N_OUT_COMPOUNDS]
                    [-min_max_reference_csv MIN_MAX_REFERENCE_CSV]
                    [-min_max_reference_target MIN_MAX_REFERENCE_TARGET]
                    [-reinvent_output_file REINVENT_OUTPUT_FILE]
                    [-reinvent_sample] [-denovo_dir DENOVO_DIR]
                    [-agent_name AGENT_NAME]
                    [-n_sample_compounds N_SAMPLE_COMPOUNDS]
                    [-reinvent_sample_output_file REINVENT_SAMPLE_OUTPUT_FILE]
                    [-log] [-prop <string>] [-smiles_col <string>]
                    [-skip_standardization SKIP_STANDARDIZATION] [-i <infile>]
                    [-o <outfile>] [-x <xfile>] [-report] [-JOB <jobname>]
                    [-HOST <host>[:<n>]] [-NUM_SUBJOBS <number>]
                    [-TMPDIR <dir>] [-WAIT] [-SUBJOB]
                    <model>.qzip

Runs DeepAutoQSAR under JobControl. DeepAutoQSAR is a wrapper which calls the 
ligand_ml backend.

This driver script is a simple wrapper for running DeepAutoQSAR under JobControl,
and performs three tasks:

    - build and train a DeepAutoQSAR model
    - predict on new data using an existing pre-trained model
    - generate a report of an existing pre-trained model

The build task will generate a <model>.qzip file, which is a required input for
the predict and report tasks. Accepted input file formats are MAE, SDF, and CSV.
Prediction output, however, can currently be written only in MAE or CSV
format.

Additional descriptors / features may be specified in a text file using the -x
flag, where each descriptor is specified on a new line.
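
As a minimal sketch of the file layout described above, the descriptor list
could be generated with a few lines of Python. The property names and the file
name descriptors.txt are hypothetical placeholders, not names defined by
DeepAutoQSAR; substitute numeric properties that exist in your input file.

```python
import tempfile
from pathlib import Path

# Hypothetical numeric property names; replace with properties that
# actually exist in your own input file.
descriptors = ["r_user_logP", "r_user_TPSA", "i_user_num_rings"]


def write_descriptor_file(path, names):
    """Write one descriptor name per line, the layout read via -x."""
    path = Path(path)
    path.write_text("\n".join(names) + "\n")
    return path


# A temporary directory is used here only to keep the sketch side-effect free.
xfile = write_descriptor_file(
    Path(tempfile.mkdtemp()) / "descriptors.txt", descriptors
)
```

The resulting path would then be supplied to the build task with the -x flag.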

The build task may be run in serial or parallel mode, depending on the
-NUM_SUBJOBS argument. Running in parallel increases the number of ML models
trained and evaluated from the random hyper-parameter search space.

Usage Examples:

BUILD:

    MAE input:
    $SCHRODINGER/run deepautoqsar model.qzip -build -regression -y
    r_m_log-solubility -i example.mae.gz -smiles_col smiles -JOB BuildTask_1
    -HOST localhost:1 -NUM_SUBJOBS 1

    The target / y value must be a Maestro property, and must exist for each
    structure in the given input file.

    SDF input:
    $SCHRODINGER/run deepautoqsar model.qzip -build -regression -y
    r_sd_log-solubility -i example.sdf -smiles_col smiles

    Note that all properties referred to in the SDF input must be in Maestro
    structure property format, as given here by 'r_sd_log-solubility'.

    CSV input:
    $SCHRODINGER/run deepautoqsar model.qzip -build -regression -y
    log-solubility -i example.csv -smiles_col smiles

    For CSV files, a header line must be included and the SMILES column must be
    denoted as 'SMILES' or must be set using the '-smiles_col' argument as shown
    above.
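
The CSV requirements above (a header line plus a SMILES column) can be
illustrated with a short Python sketch. The rows and the target column name
are made-up placeholders for illustration only, not real measurements.

```python
import csv
import io

# Placeholder rows (SMILES, target value); the numbers are illustrative only.
rows = [("CCO", 0.0), ("c1ccccc1", 0.0)]


def make_csv(rows, smiles_col="SMILES", target_col="log-solubility"):
    """Return CSV text with a header line followed by one row per compound."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([smiles_col, target_col])  # the required header line
    writer.writerows(rows)
    return buf.getvalue()


csv_text = make_csv(rows)
```

With the default header 'SMILES' no extra flag should be needed; with any
other column name, pass it via -smiles_col as in the CSV example above.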

PREDICT:

    $SCHRODINGER/run deepautoqsar model.qzip -predict -i example.mae
    -pred predict_output.mae

REPORT:

    $SCHRODINGER/run deepautoqsar model.qzip -report
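
The BUILD and PREDICT invocations above could be scripted from Python. The
sketch below only assembles argument lists from flags documented in this help
text; the helper names (build_args, predict_args, run) are hypothetical, and
it assumes the SCHRODINGER environment variable points at the installation.

```python
import os
import subprocess


def schrodinger_run():
    """Path to the launcher; falls back to a literal placeholder if unset."""
    root = os.environ.get("SCHRODINGER", "$SCHRODINGER")
    return os.path.join(root, "run")


def build_args(model, infile, target, smiles_col=None, wait=True):
    """Argument list for a regression build, mirroring the BUILD examples."""
    args = [schrodinger_run(), "deepautoqsar", model,
            "-build", "-regression", "-y", target, "-i", infile]
    if smiles_col:
        args += ["-smiles_col", smiles_col]
    if wait:
        args.append("-WAIT")  # do not return until the job completes
    return args


def predict_args(model, infile, predfile, wait=True):
    """Argument list for prediction, mirroring the PREDICT example."""
    args = [schrodinger_run(), "deepautoqsar", model,
            "-predict", "-i", infile, "-pred", predfile]
    if wait:
        args.append("-WAIT")
    return args


def run(args):
    """Launch the job; requires SCHRODINGER to be set in the environment."""
    if "SCHRODINGER" not in os.environ:
        raise RuntimeError("SCHRODINGER environment variable is not set")
    return subprocess.run(args, check=True)


build_cmd = build_args("model.qzip", "example.csv", "log-solubility",
                       smiles_col="smiles")
predict_cmd = predict_args("model.qzip", "example.mae", "predict_output.mae")
```

Passing a list to subprocess.run avoids shell quoting problems with property
names; calling run(build_cmd) and then run(predict_cmd) would execute the two
tasks in sequence.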

Copyright Schrodinger LLC, All Rights Reserved.

positional arguments:
  <model>.qzip          QSAR model archive. Will be created if building
                        models; must already exist if predicting or generating
                        a report.

options:
  -h, --help            show this help message and exit
  -i <infile>           Input file with structures and activities. Must be of
                        type MAE, CSV, or SDF. Note, if a CSV file is given,
                        it must include a header line, and the SMILES column
                        must be denoted as 'SMILES' or must be set using the
                        `-smiles_col` optional arg.
  -o <outfile>          Write output to indicated file. Allowed for use with
                        -report.
  -x <xfile>            File with the names of numeric properties from
                        <infile> that should be included in the pool of
                        independent variables.

build new model arguments:
  Options to build new ligand_ml models.

  -build                Build new model from the provided structures and
                        activity values.
  -y <propname>         Activity property name. May be an integer, real or
                        string. If an MAE or SDF file is given, the structure
                        must contain a response target labeled target_value_%,
                        where % is the index starting from 0, or the response
                        can be set using the `-response_col` optional arg.
  -classification       Build classification model from an integer, real or
                        string property.
  -regression           Build continuously-valued numeric model from an
                        integer or real activity property.
  -split <fract>        Fraction of compounds to assign to training set vs
                        holdout set when building models. Must be less than
                        1.0 (default: 0.75).
  -split_type <random OR scaffold>
                        Type of split to apply when generating the training
                        and holdout set. Must be either "random" or "scaffold"
                        (default: "random").
  -holdout <holdoutFile>
                        After building models, validate them by making
                        predictions on the structures in <holdoutFile>. This
                        file must contain an observed activity value for every
                        structure. Must be of type MAE, CSV, or SDF.
  -time <hours>         Floating point time limit for training deep learning
                        models.
  -seed SEED            Random seed to set to ensure identical training and
                        holdout data sets. This value will only be used if a
                        holdout file has not been given by the flag -holdout.
  -descriptastorus DESCRIPTASTORUS
                        Enables the DescriptaStorus to be included as an
                        additional featurizer during training
  -include_geometric_models INCLUDE_GEOMETRIC_MODELS
                        Enables geometric (PyTorch) models to be included
                        during training
  -pretrained_fingerprints PRETRAINED_FINGERPRINTS
                        Use pretrained fingerprint featurizer
  -optimize_hps OPTIMIZE_HPS
                        Enable hyperparameter optimization using Bayesian
                        optimization
  -exclude_training_data EXCLUDE_TRAINING_DATA
                        Exclude training SDF file from packaged qzip

predict using existing models arguments:
  Options to predict using existing DeepAutoQSAR models.

  -predict              Evaluate one or more models in the provided QSAR
                        archive on the input structures.
  -pred <predfile>      Output file for predictions. Must be a Maestro file or
                        CSV for outputting predictions.

reinvent arguments:
  Options to generate compounds using existing DeepAutoQSAR models.

  -reinvent             Use one or more models in the provided QSAR archive to
                        generate structures.
  -prior_dir PRIOR_DIR  Directory containing the trained prior model
  -property_space_axes PROPERTY_SPACE_AXES
                        Comma separated list of relevant property spaces
  -target_file TARGET_FILE
                        JSON file containing the targets to be optimized
  -config_file CONFIG_FILE
                        Configuration file for the reinvent run
  -starting_ligands STARTING_LIGANDS
                        Starting ligand for the reinvent run
  -rejected_structures REJECTED_STRUCTURES
                        Rejected structures for the reinvent run
  -n_steps N_STEPS      Number of steps for the reinvent run
  -n_compounds N_COMPOUNDS
                        Number of compounds for scorer calibration during the
                        reinvent run
  -n_out_compounds N_OUT_COMPOUNDS
                        Number of compounds to sample from the reinvent run
  -min_max_reference_csv MIN_MAX_REFERENCE_CSV
                        compounds with corresponding values for scorer, meant
                        for static buffer curation
  -min_max_reference_target MIN_MAX_REFERENCE_TARGET
                        min or max
  -reinvent_output_file REINVENT_OUTPUT_FILE
                        Reinvent output file

reinvent sample arguments:
  Options to generate compounds using existing DeepAutoQSAR models.

  -reinvent_sample      Use one or more models in the provided QSAR archive to
                        generate structures.
  -denovo_dir DENOVO_DIR
                        Directory containing the trained agents
  -agent_name AGENT_NAME
                        Agent Name
  -n_sample_compounds N_SAMPLE_COMPOUNDS
                        Number of compounds to sample from the reinvent run
  -reinvent_sample_output_file REINVENT_SAMPLE_OUTPUT_FILE
                        Reinvent output file

common arguments:
  Common input data manipulation arguments.

  -log                  Add log transformer to models.
  -prop <string>        Incorporate <string> into prediction property names.
  -smiles_col <string>  Denotes the SMILES column name for CSV file inputs.
  -skip_standardization SKIP_STANDARDIZATION
                        Skip standardization of input structures

report generation arguments:
  Options to specify report generation.

  -report               Write a summary of all models or generate a detailed
                        report for a single model.

jobcontrol arguments:
  Options to manipulate job control related parameters.

  -JOB <jobname>        Override the default job name.
  -HOST <host>[:<n>]    Run job on <host>. Include :<n> to specify the maximum
                        number of jobs to run at a time on host.
  -NUM_SUBJOBS <number>
                        Number of subjobs to create. If this option is
                        omitted, -NUM_SUBJOBS will be set to the number of
                        cpus requested with the -HOST option.
  -TMPDIR <dir>         Store temporary job files in <dir>.
  -WAIT                 Do not return prompt until the job completes.
  -SUBJOB               Denotes whether this task is a subjob or not