oled_ml_formulations_driver.py Command Help

Command: $SCHRODINGER/run oled_ml_formulations_gui_dir/oled_ml_formulations_driver.py
usage: $SCHRODINGER/run oled_ml_formulations_gui_dir/oled_ml_formulations_driver.py
       [-h] [-csv CSV] [-groups [GROUPS ...]] -mode {train,predict}
       [-target [TARGET ...]] [-descriptors [DESCRIPTORS ...]]
       [-hyperparameter HYPERPARAMETER] [-time TIME] [-test_size TEST_SIZE]
       [-model [MODEL ...]] [-model_type {Regression,Classification}]
       [-custom_model_json CUSTOM_MODEL_JSON]
       [-custom_model_tar_file CUSTOM_MODEL_TAR_FILE] [-downsample DOWNSAMPLE]
       [-out_split] [-cross_validation CROSS_VALIDATION]
       [-split_seed SPLIT_SEED] [-remove_correlated REMOVE_CORRELATED]
       [-HOST <hostname>] [-D] [-VIEWNAME <viewname>] [-JOBNAME JOBNAME]

Driver to train and predict OLED device machine learning models.
Copyright Schrodinger, LLC. All rights reserved.

options:
  -h, -help             Show this help message and exit.
  -csv CSV              CSV file containing the formulation data. The file
                        must contain the layer type, layer thickness, layer
                        smiles, and layer composition columns. The file must
                        also contain the target property column and any
                        additional descriptor columns (default: None)
  -groups [GROUPS ...]  JSON file(s) containing the group names and the SMILES
                        that belong to each group. For multiprediction,
                        provide one file per model. Each file must use the
                        group names as keys and the list of SMILES in each
                        group as values (default: None)
  -mode {train,predict}
                        Use train to train models, predict to predict using a
                        trained model. (default: None)
  -target [TARGET ...]  The target property on which the model will be trained
                        or used for prediction. This property must be present
                        in the input CSV file (default: None)
  -descriptors [DESCRIPTORS ...]
                        The additional descriptor properties used for
                        training. These properties must be present in the
                        input CSV file (default: None)
  -hyperparameter HYPERPARAMETER
                        The hyperparameter to use for training the models.
                        Provide either -hyperparameter or -time, but not
                        both. (default: None)
  -time TIME            The time limit in hours for training the models.
                        Provide either -hyperparameter or -time, but not
                        both. (default: None)
  -test_size TEST_SIZE  The proportion of the dataset to include in the test
                        split. Should be between 0.0 and 1.0 (default: 0.1)
  -model [MODEL ...]    The trained model (.mlform) to use for prediction.
                        (default: None)
  -model_type {Regression,Classification}
                        Select the algorithm type to use for training
                        (default: None)
  -custom_model_json CUSTOM_MODEL_JSON
                        The json file containing the custom model used to
                        generate features (default: None)
  -custom_model_tar_file CUSTOM_MODEL_TAR_FILE
                        The tar file containing the custom model used to
                        generate features (default: None)
  -downsample DOWNSAMPLE
                        The number of samples to which the dataset is
                        downsampled during hyperparameter tuning (default:
                        10000)
  -out_split            Split the training data to include unique formulations
                        instead of using a random train-test split (default:
                        False)
  -cross_validation CROSS_VALIDATION
                        Number of splits for cross validation (default: 5)
  -split_seed SPLIT_SEED
                        Seed for splitting the dataset into train and test
                        sets (default: 1234)
  -remove_correlated REMOVE_CORRELATED
                        The threshold for removing correlated features.
                        Features with a correlation coefficient greater than
                        this threshold will be removed (default: 0.9)
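
The -groups file layout described above (group names as keys, lists of
SMILES as values) can be sketched in Python as follows. This is only an
illustration: the helper write_groups_file, the group names "host" and
"emitter", the SMILES strings, and the file name groups.json are assumptions
made for the example, not names required by the driver.

```python
import json
import os
import tempfile


def write_groups_file(groups, path):
    """Write a {group name: [SMILES, ...]} mapping to a JSON file.

    Hypothetical helper: it only checks the shape stated in the help text
    (string keys, list-of-string values) before writing.
    """
    for name, smiles_list in groups.items():
        if not isinstance(name, str) or not name:
            raise ValueError("group names must be non-empty strings")
        if not isinstance(smiles_list, list) or not all(
            isinstance(smiles, str) for smiles in smiles_list
        ):
            raise ValueError(f"group {name!r} must map to a list of SMILES")
    with open(path, "w") as handle:
        json.dump(groups, handle, indent=2)
    return path


# Illustrative content only; real group names and SMILES come from your data.
example_groups = {
    "host": ["c1ccc2c(c1)[nH]c1ccccc12"],   # carbazole
    "emitter": ["c1ccc(cc1)-c1ccccn1"],     # 2-phenylpyridine
}

groups_path = write_groups_file(
    example_groups, os.path.join(tempfile.mkdtemp(), "groups.json")
)
```

The resulting file could then be passed as "-groups groups.json". For
multiprediction, the help text asks for one such file per model.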

Job Control Options:
  -HOST <hostname>      Run job remotely on the indicated host entry.
                        (default: localhost)
  -D, -DEBUG            Show details of Job Control operation. (default:
                        False)
  -VIEWNAME <viewname>  Specifies the viewname used for job filtering in
                        Maestro. (default: False)
  -JOBNAME JOBNAME      Provide an explicit name for the job. (default: None)
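
As a rough sketch of how the options above combine for a training run, the
Python snippet below assembles an argument list and applies the constraints
stated in the help text: -hyperparameter and -time are mutually exclusive,
and -test_size lies between 0.0 and 1.0. The function build_train_command,
its parameter names, and the sample values ("data.csv", "EQE", "oled_train")
are hypothetical. Requiring exactly one of -time or -hyperparameter, treating
the -test_size bounds as strict, and always passing -model_type in train mode
are assumptions of this sketch; the driver's own validation may differ.

```python
import os

DRIVER = "oled_ml_formulations_gui_dir/oled_ml_formulations_driver.py"


def build_train_command(csv, target, model_type, time_hours=None,
                        hyperparameter=None, test_size=0.1,
                        jobname=None, host=None):
    """Return the argument list for a '-mode train' invocation."""
    # Assumption: exactly one of the two must be given for training.
    if (time_hours is None) == (hyperparameter is None):
        raise ValueError("provide either time_hours or hyperparameter, not both")
    # Assumption: the bounds are treated as strict.
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size should be between 0.0 and 1.0")
    if model_type not in ("Regression", "Classification"):
        raise ValueError("model_type must be Regression or Classification")

    # Fall back to the literal "$SCHRODINGER" if the variable is not set.
    run = os.path.join(os.environ.get("SCHRODINGER", "$SCHRODINGER"), "run")
    cmd = [run, DRIVER, "-csv", csv, "-mode", "train", "-target", target,
           "-model_type", model_type, "-test_size", str(test_size)]
    if time_hours is not None:
        cmd += ["-time", str(time_hours)]
    else:
        cmd += ["-hyperparameter", str(hyperparameter)]
    if jobname is not None:
        cmd += ["-JOBNAME", jobname]
    if host is not None:
        cmd += ["-HOST", host]
    return cmd


# Hypothetical inputs: a CSV named data.csv with a target column named EQE.
cmd = build_train_command("data.csv", "EQE", "Regression",
                          time_hours=2, jobname="oled_train")
print(" ".join(cmd))
```

The list form can be handed to subprocess.run(cmd) once SCHRODINGER is set in
the environment. Building the list separately keeps the validation easy to
check before anything is launched.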