deepautoqsar Command Help
Command: $SCHRODINGER/run deepautoqsar
usage: deepautoqsar [-h] [-build] [-y <propname>] [-classification]
[-regression] [-split <fract>]
[-split_type <random OR scaffold>]
[-holdout <holdoutFile>] [-time <hours>] [-seed SEED]
[-descriptastorus DESCRIPTASTORUS]
[-include_geometric_models INCLUDE_GEOMETRIC_MODELS]
[-pretrained_fingerprints PRETRAINED_FINGERPRINTS]
[-optimize_hps OPTIMIZE_HPS]
[-exclude_training_data EXCLUDE_TRAINING_DATA] [-predict]
[-pred <predfile>] [-reinvent] [-prior_dir PRIOR_DIR]
[-property_space_axes PROPERTY_SPACE_AXES]
[-target_file TARGET_FILE] [-config_file CONFIG_FILE]
[-starting_ligands STARTING_LIGANDS]
[-rejected_structures REJECTED_STRUCTURES]
[-n_steps N_STEPS] [-n_compounds N_COMPOUNDS]
[-n_out_compounds N_OUT_COMPOUNDS]
[-min_max_reference_csv MIN_MAX_REFERENCE_CSV]
[-min_max_reference_target MIN_MAX_REFERENCE_TARGET]
[-reinvent_output_file REINVENT_OUTPUT_FILE]
[-reinvent_sample] [-denovo_dir DENOVO_DIR]
[-agent_name AGENT_NAME]
[-n_sample_compounds N_SAMPLE_COMPOUNDS]
[-reinvent_sample_output_file REINVENT_SAMPLE_OUTPUT_FILE]
[-log] [-prop <string>] [-smiles_col <string>]
[-skip_standardization SKIP_STANDARDIZATION] [-i <infile>]
[-o <outfile>] [-x <xfile>] [-report] [-JOB <jobname>]
[-HOST <host>[:<n>]] [-NUM_SUBJOBS <number>]
[-TMPDIR <dir>] [-WAIT] [-SUBJOB]
<model>.qzip
Runs DeepAutoQSAR under JobControl. DeepAutoQSAR is a wrapper that calls the
ligand_ml backend.
This driver script runs DeepAutoQSAR under JobControl and performs three
tasks:
- build and train a DeepAutoQSAR model
- predict on new data using an existing pre-trained model
- generate a report of an existing pre-trained model
The build task will generate a <model>.qzip file, which is a required input for
the predict and report tasks. Accepted input file formats are MAE, SDF, and CSV.
Prediction output, however, can currently be written only in MAE or CSV
format.
Additional descriptors / features may be specified in a text file using the -x
flag, where each descriptor is specified on a new line.
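As an illustration of the one-name-per-line layout described above, the following Python sketch writes such a file. The helper name write_descriptor_file and the property names r_user_logP and r_user_TPSA are hypothetical and are not part of deepautoqsar; substitute numeric properties that actually exist in your input file.

```python
import tempfile
from pathlib import Path


def write_descriptor_file(path, property_names):
    """Write property names one per line and return the names written.

    Blank or whitespace-only entries are skipped so the file contains
    exactly one descriptor name per line.
    """
    names = [name.strip() for name in property_names if name.strip()]
    Path(path).write_text("".join(f"{name}\n" for name in names), encoding="utf-8")
    return names


# Hypothetical property names, used only for illustration.
with tempfile.TemporaryDirectory() as workdir:
    xfile = Path(workdir) / "descriptors.txt"
    write_descriptor_file(xfile, ["r_user_logP", "r_user_TPSA"])
    print(xfile.read_text(encoding="utf-8"), end="")
```

The resulting file would then be passed with the -x flag alongside -i.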
The build task runs in serial or parallel mode depending on the -NUM_SUBJOBS
argument. Running more subjobs increases the number of ML models trained and
evaluated from the random hyperparameter search space.
Usage Examples:
BUILD:
MAE input:
$SCHRODINGER/run deepautoqsar model.qzip -build -regression -y
r_m_log-solubility -i example.mae.gz -smiles_col smiles -JOB BuildTask_1
-HOST localhost:1 -NUM_SUBJOBS 1
The target / y value must be a Maestro property, and must exist for each
structure in the given input file.
SDF input:
$SCHRODINGER/run deepautoqsar model.qzip -build -regression -y
r_sd_log-solubility -i example.sdf -smiles_col smiles
Note that all properties referred to in the SDF input must be in Maestro
structure property format, as shown here by 'r_sd_log-solubility'.
CSV input:
$SCHRODINGER/run deepautoqsar model.qzip -build -regression -y
log-solubility -i example.csv -smiles_col smiles
For CSV files, a header line must be included, and the SMILES column must be
named 'SMILES' or be specified with the '-smiles_col' argument as shown
above.
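The CSV requirement can be checked before submitting a job. The Python sketch below is a minimal pre-flight check, not part of deepautoqsar: the function name find_smiles_column is hypothetical, the sample rows are toy values, and exact, case-sensitive matching of the column name is an assumption (the tool itself may be more lenient).

```python
import csv
import io


def find_smiles_column(csv_text, smiles_col=None):
    """Return the index of the SMILES column in the CSV header.

    Looks for 'SMILES' by default, or for the name given as smiles_col,
    mirroring the -smiles_col argument. Matching is exact (an assumption).
    """
    header = next(csv.reader(io.StringIO(csv_text)), None)
    if not header:
        raise ValueError("CSV is empty; a header line is required")
    wanted = "SMILES" if smiles_col is None else smiles_col
    if wanted not in header:
        raise ValueError(f"no column named {wanted!r} in header {header}")
    return header.index(wanted)


# Toy content for illustration only.
sample = "smiles,log-solubility\nCCO,0.0\n"
print(find_smiles_column(sample, smiles_col="smiles"))
```

With this sample, calling the function without smiles_col raises ValueError, which corresponds to the case where '-smiles_col smiles' would be needed on the command line.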
PREDICT:
$SCHRODINGER/run deepautoqsar model.qzip -predict -i example.mae
-pred predict_output.mae
REPORT:
$SCHRODINGER/run deepautoqsar model.qzip -report
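When these commands are launched from a script, it can help to assemble the argument list programmatically. The Python sketch below only builds and prints the PREDICT and REPORT commands shown above; it does not run them. The helper name deepautoqsar_argv and the path /opt/schrodinger are assumptions for illustration.

```python
import os
import shlex


def deepautoqsar_argv(model, *flags, schrodinger=None):
    """Build the argument list for '$SCHRODINGER/run deepautoqsar'.

    If no installation path is given, fall back to the SCHRODINGER
    environment variable, or to the literal placeholder '$SCHRODINGER'.
    """
    root = schrodinger or os.environ.get("SCHRODINGER", "$SCHRODINGER")
    return [os.path.join(root, "run"), "deepautoqsar", model, *flags]


# /opt/schrodinger is a hypothetical installation path.
predict = deepautoqsar_argv(
    "model.qzip", "-predict", "-i", "example.mae", "-pred", "predict_output.mae",
    schrodinger="/opt/schrodinger",
)
report = deepautoqsar_argv("model.qzip", "-report", schrodinger="/opt/schrodinger")

print(shlex.join(predict))
print(shlex.join(report))
```

The resulting lists could be handed to subprocess.run once a real installation path is in place.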
Copyright Schrodinger LLC, All Rights Reserved.
positional arguments:
<model>.qzip QSAR model archive. Will be created if building
models; must already exist if predicting or generating
a report.
options:
-h, --help show this help message and exit
-i <infile> Input file with structures and activities. Must be of
type MAE, CSV, or SDF. Note: if a CSV file is given,
it must include a header line, and the SMILES column
must be named 'SMILES' or be set with the
`-smiles_col` option.
-o <outfile> Write output to the indicated file. May be used with
-report.
-x <xfile> File with the names of numeric properties from
<infile> that should be included in the pool of
independent variables.
build new model arguments:
Options to build new ligand_ml models.
-build Build new model from the provided structures and
activity values.
-y <propname> Activity property name. May be an integer, real or
string. If an MAE or SDF file is given, the structure
must contain a response target labeled target_value_%,
where % is the index starting from 0, or the response
can be set using the `-response_col` optional arg.
-classification Build classification model from an integer, real or
string property.
-regression Build continuously-valued numeric model from an
integer or real activity property.
-split <fract> Fraction of compounds to assign to the training set
when building models; the remainder goes to the
holdout set. Must be less than 1.0 (default: 0.75).
-split_type <random OR scaffold>
Type of split to apply when generating the training
and holdout set. Must be either "random" or "scaffold"
(default: "random").
-holdout <holdoutFile>
After building models, validate them by making
predictions on the structures in <holdoutFile>. This
file must contain an observed activity value for every
structure. Must be of type MAE, CSV, or SDF.
-time <hours> Time limit in hours (floating point) for training deep
learning models.
-seed SEED Random seed to set to ensure identical training and
holdout data sets. This value is used only if no
holdout file has been given with the -holdout flag.
-descriptastorus DESCRIPTASTORUS
Enables the DescriptaStorus to be included as an
additional featurizer during training
-include_geometric_models INCLUDE_GEOMETRIC_MODELS
Enables geometric (PyTorch) models to be included
during training
-pretrained_fingerprints PRETRAINED_FINGERPRINTS
Use pretrained fingerprint featurizer
-optimize_hps OPTIMIZE_HPS
Enable hyperparameter optimization using Bayesian
optimization
-exclude_training_data EXCLUDE_TRAINING_DATA
Exclude training SDF file from packaged qzip
predict using existing models arguments:
Options to predict using existing DeepAutoQSAR models.
-predict Evaluate one or more models in the provided QSAR
archive on the input structures.
-pred <predfile> Output file for predictions. Must be a Maestro or CSV
file.
reinvent arguments:
Options to generate compounds using existing DeepAutoQSAR models.
-reinvent Use one or more models in the provided QSAR archive to
generate structures.
-prior_dir PRIOR_DIR Directory containing the trained prior model
-property_space_axes PROPERTY_SPACE_AXES
Comma-separated list of relevant property spaces
-target_file TARGET_FILE
JSON file containing the targets to be optimized
-config_file CONFIG_FILE
Configuration file for the reinvent run
-starting_ligands STARTING_LIGANDS
Starting ligand for the reinvent run
-rejected_structures REJECTED_STRUCTURES
Rejected structures for the reinvent run
-n_steps N_STEPS Number of steps for the reinvent run
-n_compounds N_COMPOUNDS
Number of compounds for scorer calibration during the
reinvent run
-n_out_compounds N_OUT_COMPOUNDS
Number of compounds to sample from the reinvent run
-min_max_reference_csv MIN_MAX_REFERENCE_CSV
CSV file of compounds with corresponding values for
the scorer, meant for static buffer curation
-min_max_reference_target MIN_MAX_REFERENCE_TARGET
Either "min" or "max"
-reinvent_output_file REINVENT_OUTPUT_FILE
Reinvent output file
reinvent sample arguments:
Options to generate compounds using existing DeepAutoQSAR models.
-reinvent_sample Use one or more models in the provided QSAR archive to
generate structures.
-denovo_dir DENOVO_DIR
Directory containing the trained agents
-agent_name AGENT_NAME
Agent Name
-n_sample_compounds N_SAMPLE_COMPOUNDS
Number of compounds to sample from the reinvent run
-reinvent_sample_output_file REINVENT_SAMPLE_OUTPUT_FILE
Reinvent output file
common arguments:
Common input data manipulation arguments.
-log Add log transformer to models.
-prop <string> Incorporate <string> into prediction property names.
-smiles_col <string> Denotes the SMILES column name for CSV file inputs.
-skip_standardization SKIP_STANDARDIZATION
Skip standardization of input structures
report generation arguments:
Options to specify report generation.
-report Write a summary of all models or generate a detailed
report for a single model.
jobcontrol arguments:
Options to manipulate job control related parameters.
-JOB <jobname> Override the default job name.
-HOST <host>[:<n>] Run job on <host>. Include :<n> to specify the maximum
number of jobs to run at a time on <host>.
-NUM_SUBJOBS <number>
Number of subjobs to create. If this option is
omitted, -NUM_SUBJOBS will be set to the number of
CPUs requested with the -HOST option.
-TMPDIR <dir> Store temporary job files in <dir>.
-WAIT Do not return prompt until the job completes.
-SUBJOB Denotes whether this task is a subjob or not
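The interaction between -HOST and -NUM_SUBJOBS described above can be sketched in Python as follows. This is only an illustration of the documented default, not the JobControl implementation: the function names are hypothetical, and treating a missing :<n> as a request for one CPU is an assumption.

```python
def parse_host(host_arg):
    """Split a '<host>[:<n>]' string into (host, n).

    When ':<n>' is absent, n is taken to be 1 (an assumption).
    """
    host, sep, count = host_arg.partition(":")
    return host, int(count) if sep else 1


def default_num_subjobs(host_arg, num_subjobs=None):
    """Return the subjob count, defaulting to the CPUs requested with -HOST."""
    if num_subjobs is not None:
        return num_subjobs
    _, cpus = parse_host(host_arg)
    return cpus


print(default_num_subjobs("localhost:4"))
print(default_num_subjobs("localhost:4", num_subjobs=2))
```

Under these assumptions, '-HOST localhost:4' without -NUM_SUBJOBS corresponds to four subjobs, while an explicit -NUM_SUBJOBS value takes precedence.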