ml_formulations_driver.py Command Help

Command: $SCHRODINGER/run ml_formulations_gui_dir/ml_formulations_driver.py
usage: $SCHRODINGER/run ml_formulations_gui_dir/ml_formulations_driver.py
       [-h] [-csv CSV]
       [-models [{Elastic Net,Random Forest,XGBoost,Support Vector Machine,Dense Neural Network,Set2Set,Graph-based Models} ...]]
       [-featurizers [{MACCS Keys,Fingerprint,Learned Fingerprint,Matminer,RDKit Descriptors,All Descriptors,Graph Representation,User Input Features Only,Composition only} ...]]
       -mode {train,predict,xai} [-target [TARGET ...]]
       [-descriptors [DESCRIPTORS ...]] [-hyperparameter HYPERPARAMETER]
       [-time TIME] [-test_size TEST_SIZE] [-model [MODEL ...]]
       [-model_type {Regression,Classification}]
       [-custom_model_json CUSTOM_MODEL_JSON]
       [-custom_model_tar_file CUSTOM_MODEL_TAR_FILE] [-downsample DOWNSAMPLE]
       [-out_split] [-cross_validation CROSS_VALIDATION]
       [-split_seed SPLIT_SEED] [-remove_correlated REMOVE_CORRELATED]
       [-group_info GROUP_INFO] [-ingredients_desc INGREDIENTS_DESC] [-xai]
       [-impute] [-allow_missing_smiles] [-HOST <hostname>] [-D]
       [-VIEWNAME <viewname>] [-JOBNAME JOBNAME]

Driver to train and predict formulation properties using machine learning.

Copyright Schrodinger, LLC. All rights reserved.

options:
  -h, -help             Show this help message and exit.
  -csv CSV              CSV file containing the formulation data. The file
                        must contain the SMILES of the components and their
                        corresponding compositions in the comp column. It
                        must also contain any descriptor properties that are
                        used and, for training, the target property.
                        (default: None)
  -models [{Elastic Net,Random Forest,XGBoost,Support Vector Machine,Dense Neural Network,Set2Set,Graph-based Models} ...]
                        Select the models that will be used for training
                        (default: None)
  -featurizers [{MACCS Keys,Fingerprint,Learned Fingerprint,Matminer,RDKit Descriptors,All Descriptors,Graph Representation,User Input Features Only,Composition only} ...]
                        The featurizers that should be used for training the
                        selected models (default: None)
  -mode {train,predict,xai}
                        Use train to train models, predict to predict using a
                        trained model, and xai to calculate feature importance
                        (default: None)
  -target [TARGET ...]  The target property to train the model on or to
                        predict. This property must be present in the input
                        CSV file (default: None)
  -descriptors [DESCRIPTORS ...]
                        The additional descriptor properties used for
                        training. These properties must be present in the
                        input CSV file (default: None)
  -hyperparameter HYPERPARAMETER
                        The hyperparameter to use for training the models.
                        Exactly one of -hyperparameter or -time must be
                        provided. (default: None)
  -time TIME            The time limit in hours for training the models.
                        Exactly one of -hyperparameter or -time must be
                        provided. (default: None)
  -test_size TEST_SIZE  The proportion of the dataset to include in the test
                        split. Should be between 0.0 and 1.0 (default: 0.1)
  -model [MODEL ...]    The trained model (.mlform) to use for prediction.
                        (default: None)
  -model_type {Regression,Classification}
                        The type of model to train: Regression or
                        Classification (default: None)
  -custom_model_json CUSTOM_MODEL_JSON
                        The JSON file containing the custom model used to
                        generate features (default: None)
  -custom_model_tar_file CUSTOM_MODEL_TAR_FILE
                        The tar file containing the custom model used to
                        generate features (default: None)
  -downsample DOWNSAMPLE
                        The number of samples to which the dataset is
                        downsampled during hyperparameter tuning (default:
                        10000)
  -out_split            Split the data by unique formulations instead of
                        using a random train-test split (default: False)
  -cross_validation CROSS_VALIDATION
                        Number of folds for cross-validation (default: 5)
  -split_seed SPLIT_SEED
                        Seed for splitting the dataset into train and test
                        sets (default: 1234)
  -remove_correlated REMOVE_CORRELATED
                        The threshold for removing correlated features.
                        Features with a correlation coefficient greater than
                        this threshold will be removed (default: 0.9)
  -group_info GROUP_INFO
                        CSV file containing group information. The file must
                        contain columns with the SMILES of each component,
                        its composition, and its group. This is required for
                        training models on formulations of mixtures.
                        (default: None)
  -ingredients_desc INGREDIENTS_DESC
                        CSV file containing the ingredient descriptions. The
                        file must contain a column with the SMILES of each
                        ingredient and columns with that ingredient's
                        descriptors. (default: None)
  -xai                  Calculate feature importance after training the model
                        (default: False)
  -impute               Enable imputation for missing descriptor/target
                        values. (default: False)
  -allow_missing_smiles
                        Allow missing SMILES in the input formulations.
                        (default: False)
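
The rule that exactly one of -hyperparameter or -time must be given can be
expressed with Python's standard argparse module. The sketch below only
illustrates that rule; it is not the driver's actual parser, and treating the
-hyperparameter value as a plain string is an assumption.

```python
import argparse


def make_parser():
    """Illustrative parser: exactly one of -hyperparameter or -time.

    Not the driver's real parser; the string type for -hyperparameter
    is an assumption.
    """
    parser = argparse.ArgumentParser(prog="ml_formulations_driver.py")
    # required=True rejects a command line with neither option;
    # the mutually exclusive group rejects one with both.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-hyperparameter")
    group.add_argument("-time", type=float, help="time limit in hours")
    return parser


args = make_parser().parse_args(["-time", "2.5"])
print(args.time, args.hyperparameter)  # 2.5 None
```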
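
-test_size and -split_seed together describe a seeded random hold-out split.
A minimal sketch of such a split in plain Python follows. The driver's actual
procedure is not documented here, and rounding the test-set size up (as
scikit-learn's train_test_split does) is an assumption.

```python
import math
import random


def train_test_indices(n_rows, test_size=0.1, seed=1234):
    """Shuffle row indices and hold out a fraction as the test set.

    Defaults mirror -test_size and -split_seed; rounding the test size
    up with ceil is an assumption.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0.0 and 1.0")
    indices = list(range(n_rows))
    # A dedicated Random instance makes the split reproducible per seed.
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_size)
    return indices[n_test:], indices[:n_test]
```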
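
-downsample caps the number of rows used during hyperparameter tuning. One
straightforward way to do this is sampling without replacement, sketched
below. The sampling method and the reuse of the default -split_seed value as
the random seed are assumptions, not documented behavior.

```python
import random


def downsample(rows, max_samples=10000, seed=1234):
    """Return at most max_samples rows drawn without replacement.

    Sampling method and seed are assumptions for illustration.
    """
    rows = list(rows)
    if len(rows) <= max_samples:
        return rows  # small datasets are used as-is
    return random.Random(seed).sample(rows, max_samples)
```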
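
One reading of -out_split is a grouped split: rows that describe the same
formulation are kept together, so no formulation appears in both the training
and the test set. The sketch below assumes a hypothetical representation in
which each row is a dict mapping component SMILES to composition, and it groups
rows by their set of SMILES regardless of composition. The driver's actual
grouping criterion is not documented here.

```python
import math
import random
from collections import defaultdict


def group_split(formulations, test_size=0.1, seed=1234):
    """Split row indices so that no ingredient set spans train and test.

    formulations: list of dicts {SMILES: composition} (hypothetical format).
    """
    groups = defaultdict(list)
    for i, formulation in enumerate(formulations):
        # Rows with the same components (any composition) share a group.
        groups[frozenset(formulation)].append(i)
    keys = list(groups)
    random.Random(seed).shuffle(keys)
    n_test_target = math.ceil(len(formulations) * test_size)
    train, test = [], []
    for key in keys:
        # Fill the test set group by group, then send the rest to training.
        bucket = test if len(test) < n_test_target else train
        bucket.extend(groups[key])
    return sorted(train), sorted(test)
```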
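
-cross_validation sets the number of folds k. The sketch below shows plain,
unshuffled k-fold splitting of row indices in the style of scikit-learn's
KFold. Whether the driver shuffles or groups rows before forming folds is not
stated, so treat this as illustrative only.

```python
def kfold_indices(n_rows, n_splits=5):
    """Yield (train, validation) index lists for unshuffled k-fold CV."""
    if not 2 <= n_splits <= n_rows:
        raise ValueError("need 2 <= n_splits <= n_rows")
    # The first n_rows % n_splits folds get one extra row.
    base, extra = divmod(n_rows, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        validation = list(range(start, stop))
        train = list(range(start)) + list(range(stop, n_rows))
        yield train, validation
        start = stop
```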
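
-remove_correlated drops features whose correlation with another feature
exceeds the threshold. A greedy version is sketched below: features are
visited in order and kept only if their absolute Pearson correlation with every
already-kept feature is at most the threshold. Using the absolute value and
keeping the first feature of each correlated pair are assumptions; the help
text specifies neither.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(features, threshold=0.9):
    """Greedily keep features whose |r| with all kept ones is <= threshold.

    features: dict mapping feature name -> list of numeric values.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```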
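
-impute fills in missing descriptor and target values instead of rejecting the
affected rows. The help text does not name the imputation strategy; the sketch
below uses column-mean imputation purely as an assumed example.

```python
import math


def mean_impute(column):
    """Replace missing values (None or NaN) with the mean of observed ones.

    Mean imputation is an assumed strategy, not the driver's documented one.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in column if not missing(v)]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if missing(v) else v for v in column]
```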

Job Control Options:
  -HOST <hostname>      Run job remotely on the indicated host entry.
                        (default: localhost)
  -D, -DEBUG            Show details of Job Control operation. (default:
                        False)
  -VIEWNAME <viewname>  Specifies the view name used for job filtering in Maestro.
                        (default: False)
  -JOBNAME JOBNAME      Provide an explicit name for the job. (default: None)
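
Several -models and -featurizers choices contain spaces (for example Random
Forest), so each such value must be quoted on the command line. The sketch
below assembles the arguments for a training run as a Python list and uses
shlex.join to produce correctly quoted shell text. The helper name, file name,
target name, and job name are hypothetical. Prefix the printed text with
$SCHRODINGER/run ml_formulations_gui_dir/ml_formulations_driver.py to run it.

```python
import shlex


def build_train_args(csv_path, models, featurizers, targets, hours, jobname,
                     model_type="Regression"):
    """Assemble driver arguments for a training run (hypothetical helper).

    Only flags listed in the help text are used.
    """
    args = ["-mode", "train", "-csv", csv_path, "-model_type", model_type]
    # Each choice is its own list element, so "Random Forest" stays one value.
    args += ["-models", *models]
    args += ["-featurizers", *featurizers]
    args += ["-target", *targets]
    # -time is given here; -hyperparameter is the mutually exclusive alternative.
    args += ["-time", str(hours)]
    args += ["-JOBNAME", jobname]
    return args


args = build_train_args(
    "formulations.csv",
    ["Random Forest", "XGBoost"],
    ["RDKit Descriptors"],
    ["viscosity"],
    1,
    "viscosity_rf",
)
print(shlex.join(args))
# -mode train -csv formulations.csv -model_type Regression -models 'Random Forest' XGBoost -featurizers 'RDKit Descriptors' -target viscosity -time 1 -JOBNAME viscosity_rf
```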