ml_formulations_driver.py Command Help
Command: $SCHRODINGER/run ml_formulations_gui_dir/ml_formulations_driver.py
usage: $SCHRODINGER/run ml_formulations_gui_dir/ml_formulations_driver.py
[-h] [-csv CSV]
[-models [{Elastic Net,Random Forest,XGBoost,Support Vector Machine,Dense Neural Network,Set2Set,Graph-based Models} ...]]
[-featurizers [{MACCS Keys,Fingerprint,Learned Fingerprint,Matminer,RDKit Descriptors,All Descriptors,Graph Representation,User Input Features Only,Composition only} ...]]
-mode {train,predict,xai} [-target [TARGET ...]]
[-descriptors [DESCRIPTORS ...]] [-hyperparameter HYPERPARAMETER]
[-time TIME] [-test_size TEST_SIZE] [-model [MODEL ...]]
[-model_type {Regression,Classification}]
[-custom_model_json CUSTOM_MODEL_JSON]
[-custom_model_tar_file CUSTOM_MODEL_TAR_FILE] [-downsample DOWNSAMPLE]
[-out_split] [-cross_validation CROSS_VALIDATION]
[-split_seed SPLIT_SEED] [-remove_correlated REMOVE_CORRELATED]
[-group_info GROUP_INFO] [-ingredients_desc INGREDIENTS_DESC] [-xai]
[-impute] [-allow_missing_smiles] [-HOST <hostname>] [-D]
[-VIEWNAME <viewname>] [-JOBNAME JOBNAME]
Driver to Train and Predict Formulations using Machine Learning.
Copyright Schrodinger, LLC. All rights reserved.
options:
-h, -help Show this help message and exit.
-csv CSV CSV file containing the formulation data. The file
must contain columns with the SMILES and their
corresponding composition in comp column. The file
must also contain columns with descriptor
properties if they are used and the target property to
train on if training. (default: None)
-models [{Elastic Net,Random Forest,XGBoost,Support Vector Machine,Dense Neural Network,Set2Set,Graph-based Models} ...]
Select the models that will be used for training
(default: None)
-featurizers [{MACCS Keys,Fingerprint,Learned Fingerprint,Matminer,RDKit Descriptors,All Descriptors,Graph Representation,User Input Features Only,Composition only} ...]
The featurizers that should be used for training the
selected models (default: None)
-mode {train,predict,xai}
Use train to train models, predict to predict using a
trained model, and xai to calculate feature importance
(default: None)
-target [TARGET ...] The target property on which the model will be trained or
used for prediction. This property must be present in
the input CSV file (default: None)
-descriptors [DESCRIPTORS ...]
The additional descriptor properties used for
training. These properties must be present in the
input CSV file (default: None)
-hyperparameter HYPERPARAMETER
The hyperparameter to use for training the models.
Either -hyperparameter or -time must be provided. Both
cannot be provided. (default: None)
-time TIME The time limit in hours for training the models.
Either -hyperparameter or -time must be provided. Both
cannot be provided. (default: None)
-test_size TEST_SIZE The proportion of the dataset to include in the test
split. Should be between 0.0 and 1.0 (default: 0.1)
-model [MODEL ...] The trained model (.mlform) to use for prediction.
(default: None)
-model_type {Regression,Classification}
Select the algorithm type to use for training
(default: None)
-custom_model_json CUSTOM_MODEL_JSON
The json file containing the custom model used to
generate features (default: None)
-custom_model_tar_file CUSTOM_MODEL_TAR_FILE
The tar file containing the custom model used to
generate features (default: None)
-downsample DOWNSAMPLE
The number of samples to downsample the dataset to during
hyperparameter tuning (default: 10000)
-out_split Split the training data to include unique formulations
instead of random train-test split (default: False)
-cross_validation CROSS_VALIDATION
Number of splits for cross validation (default: 5)
-split_seed SPLIT_SEED
Seed for splitting the dataset into train and test
sets (default: 1234)
-remove_correlated REMOVE_CORRELATED
The threshold for removing correlated features.
Features with a correlation coefficient greater than
this threshold will be removed (default: 0.9)
-group_info GROUP_INFO
CSV file containing group information. The file must
contain a column with SMILES of component,
composition, and group information. This is required
for training models for formulations of mixtures.
(default: None)
-ingredients_desc INGREDIENTS_DESC
CSV file containing the ingredients description. The
file must contain a column with the SMILES of the
ingredient and columns with descriptors of the
ingredient. (default: None)
-xai Calculate feature importance after training the model
(default: False)
-impute Enable imputation for missing descriptor/target
values. (default: False)
-allow_missing_smiles
Allow missing SMILES in the input formulations.
(default: False)
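
The options above can be combined into a training invocation. The following is a minimal Python sketch, not part of the driver, that assembles an argument list from the documented flags and checks two constraints stated in the help text: exactly one of -hyperparameter or -time is given, and -test_size lies between 0.0 and 1.0. The helper name build_train_command, the file name formulations.csv, and the target name viscosity are hypothetical. The value format accepted by -hyperparameter is not described in the help text, so the sketch passes it through as a string.

```python
# Sketch only: builds an argv list for a training run; nothing is executed.
DRIVER = "ml_formulations_gui_dir/ml_formulations_driver.py"

# Choices copied from the usage text above.
MODELS = {
    "Elastic Net", "Random Forest", "XGBoost", "Support Vector Machine",
    "Dense Neural Network", "Set2Set", "Graph-based Models",
}
FEATURIZERS = {
    "MACCS Keys", "Fingerprint", "Learned Fingerprint", "Matminer",
    "RDKit Descriptors", "All Descriptors", "Graph Representation",
    "User Input Features Only", "Composition only",
}
MODEL_TYPES = {"Regression", "Classification"}


def build_train_command(csv, targets, models, featurizers, model_type,
                        time_hours=None, hyperparameter=None,
                        test_size=0.1, extra_flags=()):
    """Return an argv list for -mode train (hypothetical helper)."""
    # The help text requires exactly one of -time / -hyperparameter.
    if (time_hours is None) == (hyperparameter is None):
        raise ValueError("provide exactly one of -time or -hyperparameter")
    # -test_size is documented as a proportion between 0.0 and 1.0.
    if not 0.0 <= test_size <= 1.0:
        raise ValueError("-test_size should be between 0.0 and 1.0")
    unknown = (set(models) - MODELS) | (set(featurizers) - FEATURIZERS)
    if unknown:
        raise ValueError(f"not a documented choice: {sorted(unknown)}")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"not a documented model type: {model_type}")

    argv = ["$SCHRODINGER/run", DRIVER, "-mode", "train", "-csv", csv]
    argv += ["-target", *targets]
    argv += ["-models", *models]
    argv += ["-featurizers", *featurizers]
    argv += ["-model_type", model_type]
    argv += ["-test_size", str(test_size)]
    if time_hours is not None:
        argv += ["-time", str(time_hours)]
    else:
        # Assumption: the value is passed through unchanged as a string.
        argv += ["-hyperparameter", str(hyperparameter)]
    argv += list(extra_flags)  # e.g. ["-xai", "-impute"]
    return argv


train_cmd = build_train_command(
    "formulations.csv",            # hypothetical input file
    targets=["viscosity"],         # hypothetical target column
    models=["Random Forest"],
    featurizers=["RDKit Descriptors"],
    model_type="Regression",
    time_hours=1,
    extra_flags=["-xai"],
)
print(train_cmd)
```

To run the result with subprocess, the first element would need its environment variable expanded first, for example with os.path.expandvars.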
Job Control Options:
-HOST <hostname> Run job remotely on the indicated host entry.
(default: localhost)
-D, -DEBUG Show details of Job Control operation. (default:
False)
-VIEWNAME <viewname> Specifies viewname used in job filtering in Maestro.
(default: False)
-JOBNAME JOBNAME Provide an explicit name for the job. (default: None)
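
The job control options are appended to the same command line as the driver options. The sketch below, again a hypothetical helper rather than anything shipped with the driver, assembles a -mode predict invocation with an optional host and job name. The file names new_formulations.csv and viscosity_model.mlform, the host entry gpu_host, and the job name are placeholders. The help text does not state whether predict mode requires further options such as -target, so none are added here.

```python
# Sketch only: builds an argv list for a prediction run; nothing is executed.
DRIVER = "ml_formulations_gui_dir/ml_formulations_driver.py"


def build_predict_command(csv, model_files, host=None, jobname=None,
                          debug=False):
    """Return an argv list for -mode predict (hypothetical helper)."""
    # The help text describes -model as taking trained .mlform files.
    for path in model_files:
        if not path.endswith(".mlform"):
            raise ValueError(f"expected a .mlform file, got: {path}")

    argv = ["$SCHRODINGER/run", DRIVER, "-mode", "predict", "-csv", csv]
    argv += ["-model", *model_files]
    # Job control options; omitted ones fall back to the documented defaults.
    if host is not None:
        argv += ["-HOST", host]
    if jobname is not None:
        argv += ["-JOBNAME", jobname]
    if debug:
        argv.append("-D")
    return argv


predict_cmd = build_predict_command(
    "new_formulations.csv",            # hypothetical input file
    ["viscosity_model.mlform"],        # hypothetical trained model
    host="gpu_host",                   # hypothetical host entry
    jobname="predict_viscosity",       # hypothetical job name
)
print(predict_cmd)
```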