Formulation Machine Learning Panel

Build and apply machine learning models for predicting formulation properties.

To display this panel: click the Tasks button and browse to Materials → Informatics → Formulation Machine Learning

The following licenses are required to use this panel: MS Maestro, MS Formulation ML, MS Informatics (optional), (Deep)AutoQSAR

Using the Formulation Machine Learning Panel

The Formulation Machine Learning Panel can be used to build and apply machine learning (ML) models to predict formulation properties. A formulation is defined as a mixture of multiple chemical species with a composition ratio for each ingredient/component. Simple formulations consist of individual chemical species, each with its own composition ratio. In contrast, complex formulations are mixtures of simple formulations in which the composition of each simple formulation can be varied, as is often the case in real-world formulation design. As an alternative to traditional experimental methods and physics-based simulations, machine learning models from this panel provide a data-driven approach to determining formulation properties.

The four tabs in the panel, Training Data, Build, Performance, and Predict, enable a complete ML workflow in one tool:

  • Training Data—Load, view, edit, and plot formulations data for training ML models.

  • Build—Specify parameters for training ML models and the properties to be predicted.

  • Performance—Analyze the quality of trained models.

  • Predict—Make predictions for the property of interest on a formulation data set using a trained ML model from this panel.

The inputs for this panel are CSV file(s) containing structural and composition information about the formulations and properties of interest. This panel does not interact with the workspace or Project Table. Formulation data sets can be generated by extracting literature data, performing experiments, or performing physics-based simulations.

The CSV file(s) must contain a specific set of headers to be compatible with the panel.

For simple formulations, the file must specify the structures of the formulation as either SMILES strings (SMILES_n) or descriptive labels (label_n), their corresponding relative compositions in percentages (comp_n), and at least one other data type (e.g., an identifier, label, property, or descriptor). The compositions must sum to 100% for each formulation. The file must also contain columns with the target property if building or applying a model, and any additional descriptors if they are used as inputs to the model. For example, a system of up to 3 components must have 3 columns for SMILES strings and 3 columns for compositions. An ID column can also be included to organize the formulation data set. The required headers for such a CSV file are:

SMILES_0,comp_0,SMILES_1,comp_1,SMILES_2,comp_2,ID

An example data set could be formatted as follows:

ID,SMILES_0,comp_0,SMILES_1,comp_1,SMILES_2,comp_2,SMILES_3,comp_3,SMILES_4,comp_4,title,density,num_components
0,O=C/C=C/c1ccccc1,100,,,,,,,,,trans-Cinnamaldehyde,1.055,1
1,OCc1ccco1,33.33,CN(C)c1ccccc1,33.33,CCCN(CCC)c1ccccc1,33.33,,,,,"Furfuryl alcohol|N,N-Dimethylaniline|N,N-Dipropylaniline",0.998,3
2,CC(C)CCO,20,O=Cc1ccccc1,20,CCCCOC(C)=O,20,CCO,20,c1ccncc1,20,3-Methyl-1-butanol|Benzaldehyde|Butyl acetate|Ethanol|Pyridine,0.898,5

where there are columns for the identifier (ID), components of the mixture (SMILES_n), relative composition (comp_n), a descriptive name (title), target property (density), and any additional information about the formulation (e.g., num_components, temperature, and so on).
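Before loading a file, the rule that the comp_n values of each formulation sum to 100% can be checked outside the panel. The following is a minimal Python sketch and is not part of the panel; the function name check_simple_formulations and the default tolerance of 0.1 are assumptions made for illustration (the tolerance allows for rounded entries such as three components at 33.33).

```python
import csv
import io
import re

COMP_HEADER = re.compile(r"^comp_(\d+)$")


def check_simple_formulations(csv_text, tolerance=0.1):
    """Return (data_row_number, total) for rows whose comp_n values do not
    sum to 100 within the tolerance. Hypothetical helper, not a panel API."""
    reader = csv.DictReader(io.StringIO(csv_text))
    comp_columns = [name for name in reader.fieldnames if COMP_HEADER.match(name)]
    problems = []
    for row_number, row in enumerate(reader, start=1):
        # Empty cells mean the formulation has fewer components than columns.
        values = [(row[name] or "").strip() for name in comp_columns]
        total = sum(float(value) for value in values if value)
        if abs(total - 100.0) > tolerance:
            problems.append((row_number, total))
    return problems


example = """ID,SMILES_0,comp_0,SMILES_1,comp_1,density
0,O=C/C=C/c1ccccc1,100,,,1.055
1,OCc1ccco1,60,CN(C)c1ccccc1,30,0.998
"""
# The second data row sums to 90, so it is reported.
print(check_simple_formulations(example))
```

Rows are numbered from 1 in file order, not by the ID column, so a report can be traced back to the file even when IDs are missing or duplicated.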

For complex formulations, two CSV files are required.

The first CSV file requires the same headers as the input for a simple formulation; however, the SMILES_n columns are replaced with the COMPONENT_n columns, which are now identifiers for the simple formulations instead of SMILES structures. An example is shown below:

COMPONENT_0,comp_0,COMPONENT_1,comp_1,COMPONENT_2,comp_2,COMPONENT_3,comp_3,COMPONENT_4,comp_4,ID,log(avg_viscosity),log(shear_rate)
water,78.65,plantapon_acg_50,8.7,texapon_sb_3_kc,5.44,arlypon_f,4.81,dehyquart_cc7_benz,2.4,304,1.05,0.04
water,79.02,dehyton_ml,12.09,texapon_sb_3_kc,5.15,dehyquart_cc7_benz,2.43,arlypon_f,1.31,317,2.23,0.04
water,81.97,dehyton_mc,9.24,texapon_sb_3_kc,4.94,dehyquart_cc7_benz,2.51,arlypon_f,1.34,321,2.33,0.15
water,80.55,plantapon_amino_scg-l,10.46,texapon_sb_3_kc,5.72,dehyquart_cc7_benz,1.68,arlypon_f,1.59,328,0.88,0.04

The second CSV file must specify the ingredient structures for each unique simple formulation that was labeled in the COMPONENT_n columns of the previous file. An example is shown below:

SMILES,COMPOSITION,COMPONENT
O,100,water
CCCCCCCCCCCCCCOCCOCCO,90.5,arlypon_f
O,9.5,arlypon_f
CCCCCCCCC=CCCCCCCCC(=O)NCCO,12.5,comperlan_100
CCCCCCCCCCCC(=O)OC,2,comperlan_100
NCCO,2,comperlan_100
O,83.5,comperlan_100
CCCCCCCCCCCCOCCOCCOCCOC(=O)C(CC(=O)[O-])S(=O)(=O)[O-].[Na+].[Na+],36,texapon_sb_3_kc
O,63.1,texapon_sb_3_kc
C/C=C/C=C/C(=O)[O-].[K+],0.5,texapon_sb_3_kc
OC(=O)CC(O)(CC(=O)O)C(=O)O,0.4,texapon_sb_3_kc

The columns of this file are:

  • SMILES — the structure.

  • COMPOSITION — the percentage of the structure in the mixture. The compositions must sum up to 100% for each unique mixture.

  • COMPONENT — the identifier of the mixture that the structure belongs to, which should match the COMPONENT name in the first CSV file.
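The two files can be cross-checked before they are loaded. The sketch below is illustrative Python, not panel functionality; the helper name check_complex_files and the tolerance are assumptions. It reports component names that are used in the first file but never defined in the second, and mixtures whose COMPOSITION values do not sum to 100%.

```python
import csv
import io
from collections import defaultdict


def check_complex_files(blend_csv, group_csv, tolerance=0.1):
    """Cross-check the two complex-formulation files (hypothetical helper).

    Returns (missing, bad_sums): names used in COMPONENT_n columns of the
    first file but absent from the second, and mixtures in the second file
    whose COMPOSITION values do not sum to 100 within the tolerance.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(group_csv)):
        totals[row["COMPONENT"]] += float(row["COMPOSITION"])

    used = set()
    for row in csv.DictReader(io.StringIO(blend_csv)):
        for header, value in row.items():
            if header.startswith("COMPONENT_") and value:
                used.add(value)

    missing = sorted(used - set(totals))
    bad_sums = {name: total for name, total in totals.items()
                if abs(total - 100.0) > tolerance}
    return missing, bad_sums


blends = """COMPONENT_0,comp_0,COMPONENT_1,comp_1,ID
water,90,arlypon_f,10,1
water,80,comperlan_100,20,2
"""
groups = """SMILES,COMPOSITION,COMPONENT
O,100,water
CCCCCCCCCCCCCCOCCOCCO,90.5,arlypon_f
O,9.5,arlypon_f
"""
missing, bad_sums = check_complex_files(blends, groups)
print(missing)   # comperlan_100 is used but not defined in the second file
print(bad_sums)  # empty: every defined mixture sums to 100
```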

Columns titled label_n can be used in addition to SMILES_n columns. The panel supports specifying ingredients using either designation. Note that it is necessary to put a value of "MISSING" in place of any missing SMILES string in the CSV to use the corresponding label from the label_n columns.
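As an illustration of the "MISSING" convention, the sketch below builds a small CSV in which the second ingredient has no SMILES string and is identified by its label instead. The column order, the example label, and the helper name ingredient_name are assumptions made for this example; only the SMILES_n, label_n, and comp_n header names and the "MISSING" value come from the description above.

```python
import csv
import io

# Hypothetical two-component file: ingredient 1 has no known structure,
# so its SMILES cell holds "MISSING" and label_1 identifies it.
example = """ID,SMILES_0,label_0,comp_0,SMILES_1,label_1,comp_1
0,O,water,95,MISSING,proprietary_thickener,5
"""


def ingredient_name(row, index):
    """Return the SMILES string, or the label when the SMILES is "MISSING"."""
    smiles = row[f"SMILES_{index}"]
    return row[f"label_{index}"] if smiles == "MISSING" else smiles


row = next(csv.DictReader(io.StringIO(example)))
print([ingredient_name(row, i) for i in range(2)])
```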

For a complete example, see the Machine Learning for Formulations tutorial.

Due to the inherent randomness of selected steps in the workflow (e.g. train/test split and hyperparameter selection), running a job with the exact same formulations dataset and parameters as another job may not generate the same machine learning models with the same performance.

To efficiently re-train machine learning models previously generated with this panel, use the ML Model Manager Panel.

To write out the input file and a script for running the job from the command line, click the arrow next to the Settings button and choose Write. For information on command usage and options, see ml_formulations_driver.py Command Help.

Formulation Machine Learning Panel Features

Formulation Type options

Select a formulation type to use in this panel. The options include:

  • Simple—A mixture of multiple chemical species with a composition ratio for each component. A CSV file is loaded with information relevant to the formulation. See the Using the Formulation Machine Learning Panel section for more information.

  • Complex—A blend of multiple mixtures, where each mixture consists of ingredient structures and their compositions. Composition ratios are specified at two levels: between the different mixtures in the blend, and among the individual chemical species within each mixture. Two CSV files are loaded with information relevant to the formulation. See the Using the Formulation Machine Learning Panel section for more information.

Training Data tab

Load, view, edit, and plot formulations data for training ML models. Data sets are randomly split into training and test sets to build and evaluate the model, as specified in the Training set size text box in the Build tab.

Load training data button

Load a CSV file with formulations data to train the ML model on. Click to open the Select the formulations CSV file for training dialog box, where you can navigate to the file. The name of the file you selected is displayed in the text box. This CSV file is copied into the job directory as jobname_input.csv. For complex formulations, the Select the group information CSV file dialog box is used to specify the second CSV file.

The CSV file(s) must contain a specific set of headers to be compatible with the panel.

For simple formulations, the file must specify the structures of the formulation as either SMILES strings (SMILES_n) or descriptive labels (label_n), their corresponding relative compositions in percentages (comp_n), and at least one other data type (e.g., an identifier, label, property, or descriptor). The compositions must sum to 100% for each formulation. The file must also contain columns with the target property if building a model, and any additional descriptors if they are used as inputs to the model. For example, a system of up to 3 components must have 3 columns for SMILES strings and 3 columns for compositions. An ID column can also be included to organize the formulation data set. The required headers for such a CSV file are:

SMILES_0,comp_0,SMILES_1,comp_1,SMILES_2,comp_2,ID

For complex formulations, two CSV files are required.

The first CSV file requires the same headers as the input for a simple formulation; however, the SMILES_n columns are replaced with the COMPONENT_n columns, which are now identifiers for the simple formulations instead of SMILES structures.

The second CSV file must specify the ingredient structures for each unique simple formulation that was labeled in the COMPONENT_n columns of the previous file:

  • SMILES — the structure.

  • COMPOSITION — the percentage of the structure in the mixture. The compositions must sum up to 100% for each unique mixture.

  • COMPONENT — the identifier of the mixture that the structure belongs to, which should match the COMPONENT name in the first CSV file.

Columns titled label_n can be used in addition to SMILES_n columns. The panel supports specifying ingredients using either designation. Note that it is necessary to put a value of "MISSING" in place of any missing SMILES string in the CSV to use the corresponding label from the label_n columns.

See the Using the Formulation Machine Learning Panel section for more information.

View ingredients as options

Select how to view ingredients in the formulations input table. The options include:

  • SMILES—List each ingredient as a SMILES string. Structures are editable only for ingredients whose SMILES string is neither missing nor invalid.

  • Labels—List each ingredient with its label. The structures are not editable in this representation of the ingredients.

Formulations input table

Displays the formulations data from the CSV file. Click on a component to view or edit its chemical structure or, in the case of complex formulations, to view its chemical composition. You can also edit data values shown in the panel by clicking on them and making the desired change. For simple formulations, this makes it possible to visualize and validate 2D structures instead of manually editing SMILES strings. Editing the data in the panel does not edit the imported CSV file; however, the updated data can be saved in a new file using the Export Data button.

Data from the formulations input table is plotted to the right and can be configured using the Properties option menus. The relative size of the table and plot can be adjusted by dragging the divider between them from side to side.

Formulation information—The components of the formulation and their relative compositions. For simple formulations, clicking on the SMILES string opens the Component editor, similar to the 2D Sketcher, to edit the component structure. Click OK to save your changes or Cancel to discard any changes. For complex formulations, clicking on the component opens a dialog with information on the formulation's chemical composition. Click on the percentage to edit its value. The relative compositions for a particular mixture must sum to 100%. If the SMILES_n field contains an invalid SMILES string in the CSV file, then that field value is shown here. If "MISSING" is written as the field value, then the corresponding label_n value is shown instead. For invalid SMILES strings, 2D structures are not available to edit.

Additional columns—Additional properties specified in the CSV file such as target property or descriptors. To edit any values, click on the row to enable editing. Click on a header to plot a histogram for the data type.

Properties option menus

Choose the x and y axes to display in the plot. Options for the x and y axes are the additional columns in the formulations input table. To view a histogram of the property specified on the x axis, set the y axis to (none).

Plot options

Select an option to show the value in the displayed scatter plot or modify the plot appearance. Statistics are not available for histograms. The options include:

  • R^2—R-squared value (coefficient of determination)

  • RMSE—Root-mean-square error

  • Pearson’s r—Pearson’s correlation coefficient

  • Same x and y axis—Select this option to enforce the x and y axis to have the same scale and range
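For reference, the three statistics listed above can be computed for any pair of numerical columns as in the following Python sketch. It uses the standard textbook definitions; the function name scatter_statistics is hypothetical, and the choice to treat x as the reference values (so that R^2 and RMSE measure deviation from the y=x line) is an assumption about how the plot is used, not a statement of how the panel computes them.

```python
import math


def scatter_statistics(x, y):
    """Return (r_squared, rmse, pearson_r) for paired values x and y.

    Assumption: x holds the reference (e.g. observed) values and y the values
    compared against them, so R^2 and RMSE measure deviation from y = x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_res = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((xi - mean_x) ** 2 for xi in x)
    r_squared = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    pearson_r = cov / math.sqrt(ss_tot * var_y)
    return r_squared, rmse, pearson_r


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2, rmse, r = scatter_statistics(observed, predicted)
print(f"R^2={r2:.3f} RMSE={rmse:.3f} r={r:.3f}")
```

Note that R^2 defined this way can be negative when the two columns disagree strongly, whereas the square of Pearson's r cannot.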

Plot toolbar

The toolbar has tools for manipulating the plot and for saving images. The buttons that are common to all plot toolbars are described in the Plot Toolbar topic.

Plot area

This area displays the plot of the x vs y axes chosen in the Properties option menus. This can be a histogram or a scatter plot. For scatter plots, a gray dashed line marks the y=x line. It can be useful to visualize the distribution of the data set prior to model construction.

Export Data button

Export data in the formulations input table to a CSV file. Opens the Export Formulations Data dialog box so you can navigate to a location and name the file.

Build tab

Specify parameters for training ML models and the property to be predicted.

Model type options

Specify the model type to use for training:

  • Regression—A regression model type is used for the training data set. Numerical values are required for the Target property when using this option.

  • Classification—A classification model type is used for the training data set. Binary values are required for the Target property when using this option.

Featurizers option menu

Select the featurizers to use in training the ML model(s). You can choose one, all, or any combination of the listed featurizers. The All Descriptors option concatenates Fingerprint, Matminer, MACCS Keys, and RDKit descriptors. The number of featurizers selected is shown in the option menu. All featurizers are tested independently. Featurizers and machine learning models are interdependent; accordingly, some machine learning models are not available for some featurizers.

For more information on featurizers, see Formulation Machine Learning Featurizer and Model Information.

Machine learning models option menu

Select the machine learning algorithms to use in training the ML model(s). The number of models selected is shown in the option menu. Some machine learning models may not be available depending on the selected featurizers.

For more information on models, see Formulation Machine Learning Featurizer and Model Information.

Target property option menu

Specify the properties on which models will be trained and used for prediction. Each target property must be present in the input CSV file. If multiple target properties are selected, individual models are trained sequentially for each.

Descriptors section

Formulation descriptors option menu

Select any additional descriptor properties to add to the model training. These properties must be numerical and present in the input CSV file. The property selected in the Target property option menu is not available as a descriptor. By default, none are selected.

Load Ingredient Descriptors button

Load a CSV file with descriptors for ingredients. The CSV must have a column with the SMILES of the ingredients and any additional columns with descriptors of interest. Click to open the Select the CSV file containing the ingredient descriptors dialog box, where you can navigate to the file. The name of the file you selected is displayed in the text box. After loading, the text to the right of the button describes the number of ingredients and descriptors loaded.

Unload button

Click to remove the CSV file with ingredient descriptors loaded using the Load Ingredient Descriptors button.

Hyperparameter tuning steps option and text box

Set the number of hyperparameter optimization cycles, n. Here, hyperparameters are defined as the selection of featurizers or models. In the initial steps, hyperparameters are selected randomly to train the first model. In subsequent steps, the hyperparameters and performance of the previous models are used to select the next hyperparameters by Bayesian optimization, with the aim of maximizing the performance of the next model. A total of n model architectures are explored. The final model uses an ensemble of the 3 top-performing models to generate predictions and uncertainties, so a minimum value of 3 is required for this parameter. Increasing this value increases the computation time and can improve model accuracy.
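To make the ensemble step concrete, the following Python sketch combines the predictions of the 3 best-scoring models. The source does not state how the panel combines them, so the use of the mean as the prediction and the sample standard deviation as the uncertainty is an assumption, as are the function name ensemble_predict and the input data structures.

```python
import statistics


def ensemble_predict(model_scores, model_predictions, top_k=3):
    """Combine the predictions of the top_k best-scoring models.

    model_scores: dict of model name -> validation score (higher is better).
    model_predictions: dict of model name -> list of predictions, one per
    formulation. Returns (means, spreads) with one entry per formulation.
    Assumption: mean and sample standard deviation are used to combine.
    """
    best = sorted(model_scores, key=model_scores.get, reverse=True)[:top_k]
    means, spreads = [], []
    for values in zip(*(model_predictions[name] for name in best)):
        means.append(statistics.mean(values))
        spreads.append(statistics.stdev(values))
    return means, spreads


scores = {"model_a": 0.90, "model_b": 0.85, "model_c": 0.80, "model_d": 0.10}
predictions = {
    "model_a": [1.0, 2.0],
    "model_b": [2.0, 2.0],
    "model_c": [3.0, 2.0],
    "model_d": [100.0, 100.0],  # poorly scoring model, excluded from the ensemble
}
means, spreads = ensemble_predict(scores, predictions)
print(means, spreads)
```

The spread is zero where the three models agree and grows where they disagree, which is why at least 3 explored architectures are needed for such an estimate.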

Time limit option and text box

Specify the maximum time for model training, in hours. When this amount of time has elapsed, training of the current model is completed, but no new models are trained after that. The elapsed time can be significantly longer than the limit specified here if a model takes a long time to train.

Training set size text box

Specify the percentage of the formulations data which should be used for training the model. The remaining data will be used to test the model.
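The effect of this setting can be pictured with a minimal Python sketch of a random percentage split. This is illustrative only; the function name split_train_test and its seed parameter are assumptions, and the panel's own splitting code may differ.

```python
import random


def split_train_test(rows, train_percent=80, seed=None):
    """Randomly split rows into (train, test) by percentage.

    Passing the same integer seed reproduces the same split; with seed=None
    the split differs from run to run.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]


formulation_ids = list(range(10))
train, test = split_train_test(formulation_ids, train_percent=80, seed=42)
print(len(train), len(test))  # 8 2
```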

Pretrained Models option and menu

Select this option to use the predictions from pretrained models as inputs when training an ML model. Choose the pretrained models of interest using the menu. For more information about the models available in the menu, see the documentation for the Machine Learning Property Prediction Panel.

Custom DeepAutoQSAR Model option and menu

Select this option to use a model trained with the DeepAutoQSAR Panel and use its predictions as inputs when training an ML model. When this option is selected, the Browse and Delete Selected Models buttons are displayed. Use the Browse button to load DeepAutoQSAR models of interest.

Browse button

Click Browse to open the Select DeepAutoQSAR model dialog box, where you can navigate to the file and click Open. This opens the Enter Model Name dialog box so you can name the DeepAutoQSAR model for use in the panel.

Only available when the Custom DeepAutoQSAR Model option is selected.

Delete Selected Models button

Remove the models selected in the Custom DeepAutoQSAR Model option and menu.

Only available when the Custom DeepAutoQSAR Model option is selected.

Advanced Options button

Set further options for training the ML model. Opens the Training Options dialog box.

Downsample option and text box

Select this option to downsample the data by the specified factor for hyperparameter tuning. This can help speed up training of ML models, particularly for large (> 10,000 structures) training sets.

Out-of-sample splitting option

Select this option to test the model on unique formulations not seen in the training set, instead of randomly splitting the data. This option is useful for assessing how well a model might generalize to new formulations.
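One way to picture an out-of-sample split is as a grouped split, in which all rows that share the same set of ingredients are kept together on one side. The Python sketch below makes that idea concrete. It assumes that a "unique formulation" is identified by its set of ingredient SMILES regardless of composition; the source does not define this, so the grouping rule, the function names, and the parameters are all hypothetical.

```python
import random
from collections import defaultdict


def ingredient_set(row):
    """Set of non-empty SMILES_n values in a row (assumed grouping key)."""
    return frozenset(value for key, value in row.items()
                     if key.startswith("SMILES_") and value)


def out_of_sample_split(rows, train_percent=80, seed=0):
    """Split rows so that no ingredient set appears in both train and test.

    The percentage is applied to the number of groups, not rows, so the row
    counts only approximate the requested ratio.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[ingredient_set(row)].append(row)
    keys = list(groups)
    random.Random(seed).shuffle(keys)
    n_train = round(len(keys) * train_percent / 100)
    train = [row for key in keys[:n_train] for row in groups[key]]
    test = [row for key in keys[n_train:] for row in groups[key]]
    return train, test


rows = [
    {"SMILES_0": "O", "comp_0": "50", "SMILES_1": "CCO", "comp_1": "50"},
    {"SMILES_0": "O", "comp_0": "70", "SMILES_1": "CCO", "comp_1": "30"},
    {"SMILES_0": "O", "comp_0": "50", "SMILES_1": "CO", "comp_1": "50"},
    {"SMILES_0": "CCO", "comp_0": "50", "SMILES_1": "CO", "comp_1": "50"},
    {"SMILES_0": "O", "comp_0": "100", "SMILES_1": "", "comp_1": ""},
]
train, test = out_of_sample_split(rows, train_percent=75)
print(len(train), len(test))
```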

Cross validation splits text box

Specify the number of splits for cross validation.
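As a reminder of what the number of splits controls, the sketch below generates k-fold cross-validation index sets in plain Python: each sample is used for validation exactly once and for training in the remaining folds. The function name k_fold_indices is hypothetical, and the sketch uses contiguous, unshuffled folds for simplicity, which may differ from the panel's implementation.

```python
def k_fold_indices(n_samples, n_splits=5):
    """Yield (train_indices, validation_indices) for each of n_splits folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size


for train_idx, validation_idx in k_fold_indices(7, n_splits=3):
    print(train_idx, validation_idx)
```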

Random seed for training/test set splitting text box

Specify a random seed to be used for splitting the data into training and test sets.

Correlation threshold text box

Specify the threshold for removing highly correlated features.
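A common way to apply such a threshold is to drop any feature whose absolute Pearson correlation with an already kept feature exceeds it. The Python sketch below illustrates that approach; it is not necessarily the panel's procedure, and the function names, the default threshold of 0.95, and the rule of keeping the earlier feature of a correlated pair are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(features, threshold=0.95):
    """Return names of features to keep.

    features: dict of feature name -> list of values. A feature is kept only
    if |r| with every previously kept feature is at or below the threshold.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[other])) <= threshold for other in kept):
            kept.append(name)
    return kept


features = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],  # nearly proportional to "a"
    "c": [4.0, 1.0, 3.0, 2.0],
}
print(drop_correlated(features))  # "b" is dropped
```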

Enable descriptor imputation option

Select this option to impute missing values for any formulation descriptors selected for training.

Calculate feature importance option

Select this option to calculate feature importance at the end of model training. The results can be visualized in the Feature Importance sub-tab of the Performance tab.

Performance tab

Analyze the quality of trained models. A summary of the statistics of the model is presented alongside various plots.

Load Model button

Load ML models generated using the Formulation Machine Learning panel. Opens the Select Formulations Models dialog box, where you can load and manage formulations ML models. After loading, select the models whose performance you want to analyze from the option menu.

Training parameters summary

This section displays the parameters used to train the selected ML model.

Target text

Displays the target property used when generating the ML model. Noneditable.

Featurizers text

Lists the featurizers selected when generating the ML model. Noneditable.

Machine learning models text

Lists the machine learning algorithms selected when generating the ML model. Noneditable.

Plot options

Select an option to modify the plot appearance in the tabs below. The options include:

  • Same x and y axis—Select this option to enforce the x and y axis to have the same scale and range.

  • Plot XY—Select this option to plot a dashed line for x=y.

  • Show marginals—Select this option to display individual distributions of the x-axis and y-axis.

  • Show stats—Select this option to display the relevant statistics for the model on the plot.

Parity tab

This area displays a scatter plot of the predicted versus observed target property values. Only present for Regression models.

ROC tab

This area displays a receiver operating characteristic (ROC) curve of the true positive rate versus false positive rate. An ideal model would have a ROC plot shifted upper left with a True Positive Rate of 1, False Positive Rate of 0, and Area Under the Curve of 1. Only present for Classification models.
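The construction of an ROC curve and its area can be sketched in a few lines of Python. This is a generic illustration using the standard definitions, not the panel's code; the function names are hypothetical, labels are assumed to be 0 or 1, and both classes must be present in the data.

```python
def roc_curve_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate) obtained
    by lowering the decision threshold through the sorted scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (ties).
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under points ordered by increasing x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
points = roc_curve_points(labels, scores)
print(points)
print(area_under_curve(points))  # 0.75
```

In this example one of the four positive-negative pairs is ranked in the wrong order, which gives an area of 0.75; a perfect ranking gives 1.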

Confusion Matrix tab

This area displays a confusion matrix for the train and test sets. Only present for Classification models.

Feature Importance tab

Use the Calculate Feature Importance button after evaluating the ML model performance. After the calculation completes, use the Model/Featurizer option menu to select the data of interest. This area displays a bar chart with the descriptors on the y-axis and the corresponding mean SHapley Additive exPlanations (SHAP) values on the x-axis. Click a descriptor name on the y-axis to see a description and any associated image, if available.

Use the Previous and Next buttons to view more features. Not available for the Set2Set and Graph-based machine learning models.

Use the Export button to save the feature importance data for the model as a CSV. Opens a file selector, to name and save the file.

Model Comparison tab

This area displays a histogram of each featurizer-model pair's performance score for the loaded ML model. The CV score is a 5-fold cross-validation R2 score, which measures how well a model generalizes across the training data. The error bars show the minimum and maximum CV scores across all trials for each featurizer-model pair.

Plot toolbar

The toolbar has tools for manipulating the plot and for saving images. The buttons that are common to all plot toolbars are described in the Plot Toolbar topic.

Model summary table

For Regression models, this table lists the R-squared value (R2) and the root-mean-square error (RMSE) for the Train and Test sets.

For Classification models, this table lists the area under the ROC curve (Roc Auc), F1 Sensitivity, F1 Specificity, F1 Precision, Accuracy, and F1 Score for the Train and Test sets.
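The classification statistics other than the ROC area all follow from the four confusion-matrix counts. The Python sketch below uses the standard binary-classification definitions, which are assumed to correspond to the table entries; the function name classification_metrics and the dictionary keys are hypothetical, and labels are assumed to be 0 or 1.

```python
def classification_metrics(true_labels, predicted_labels):
    """Compute standard binary classification metrics from 0/1 labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    precision = ratio(tp, tp + fp)
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(observed, predicted)
print(metrics)
```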

Data sets are randomly split into training and test sets to evaluate model performance, as specified in the Training set size text box in the Build tab.

Predict tab

Choose the formulations to make predictions for and the models to apply. When the job finishes, the results are saved to a CSV file in the job directory with a name ending in _predict.csv.

Load Model button

Load ML models generated using the Formulation Machine Learning panel. Opens the Select Formulations Models dialog box, where you can load and manage formulations ML models. After loading, select the models to use for the prediction task from the option menu.

Predict Data option menu and Load button

Choose to load a test set for prediction or the results of a prediction calculation for review.

Prediction Input—Load a CSV file with formulations data for which to predict the target property. Click Load to open the Select the formulations CSV file for predictions dialog box, where you can navigate to the file. The name of the file you selected is displayed in the text box. This CSV file is copied into the job directory as jobname_input.csv. See the Using the Formulation Machine Learning Panel section for a description of the required format for the CSV file.

Prediction Output—Load a CSV file generated by a job run from the Predict tab of the Formulation Machine Learning panel. Click Load to open the Select the formulations CSV file with predictions output dialog box, where you can navigate to the file. The name of the file you selected is displayed in the text box. The file name must end with _predict.csv.

Prediction parameters summary

This section displays the parameters pertinent to using the selected ML model for property prediction.

Target text

Displays the target property for the ML model. Noneditable.

Descriptors text

Lists the additional descriptors used in generating the ML model. These descriptors must be available in the prediction data set if they were used to train the ML models. Noneditable.

View ingredients as options

Select how to view ingredients in the formulations input table. The options include:

  • SMILES—List each ingredient as a SMILES string. Structures are editable only for ingredients whose SMILES string is neither missing nor invalid.

  • Labels—List each ingredient with its label. The structures are not editable in this representation of the ingredients.

Export Training Ingredient Descriptors button

Export the ingredient descriptors used in training the loaded ML model to a CSV file. Opens a file selector, to name and save the files.

Update Ingredient Descriptors button

Load a CSV file with new descriptors for ingredients not in the original training set. If a descriptor that existed in the training set is present in this file, the value from this CSV file is used in the prediction. The CSV must have a column with the SMILES of the ingredients and any additional columns with descriptors of interest. Click to open the dialog box, where you can navigate to the file. The name of the file you selected is displayed in the text box. After loading, the text to the right of the button describes the number of ingredients and descriptors loaded.

Formulations results table

Displays the formulations data from the _input.csv or _predict.csv file. Click on a component to view or edit its chemical structure or, in the case of complex formulations, to view its chemical composition. You can also edit data values shown in the panel by clicking on them and making the desired change. For simple formulations, this makes it possible to visualize and validate 2D structures instead of manually editing SMILES strings. Editing the data in the panel does not edit the imported CSV file; however, the updated data can be saved in a new file using the Export Data button.

Data from the formulations results table is plotted to the right and can be configured using the Properties option menus. The relative size of the table and plot can be adjusted by dragging the divider between them from side to side.

Formulation information—The structures of the formulation and their relative compositions. For simple formulations, clicking on the SMILES string opens the Component editor, similar to the 2D Sketcher, to edit the component structure. Click OK to save your changes or Cancel to discard any changes. For complex formulations, clicking on the component opens a dialog with information on the formulation's chemical composition. Click on the percentage to edit its value. The relative compositions for a formulation must sum to 100%. If the SMILES_n field contains an invalid SMILES string in the CSV file, then that field value is shown here. If "MISSING" is written as the field value, then the corresponding label_n value is shown instead. For invalid SMILES strings, 2D structures are not available to edit.

Additional columns—Properties specified in the CSV file such as observed values of the target property or descriptors. For results data, the prediction values and uncertainties are appended to the list. To edit any values, click on the row to enable editing. Click on a header to plot a histogram for the data type.

Properties option menus

Choose the x and y axes to display in the plot. For Prediction Input data, the options for the x and y axes are the target property and any additional descriptors specified. For Prediction Output data, the options for the x and y axes are the target property, predicted property, prediction uncertainties, and any additional descriptors specified. To view a histogram of the property specified on the x axis, set the y axis to (none).

Plot options

Select an option to show the value in the displayed scatter plot or modify the plot appearance. Statistics are not available for histograms. The options include:

  • R^2—R-squared value (coefficient of determination)

  • RMSE—Root-mean-square error

  • Pearson’s r—Pearson’s correlation coefficient

  • Same x and y axis—Select this option to enforce the x and y axis to have the same scale and range

Plot toolbar

The toolbar has tools for manipulating the plot and for saving images. The buttons that are common to all plot toolbars are described in the Plot Toolbar topic.

Plot area

This area displays the plot of the x vs y axes chosen in the Properties option menus. This can be a histogram or a scatter plot. For scatter plots, a gray dashed line marks the y=x line. It can be useful to visualize a scatter plot of the predicted versus observed target property values to assess prediction quality.

Export Data button

Export data in the formulations results table to a CSV file. Opens the Export Formulations Data dialog box so you can navigate to a location and name the file.

Job toolbar

Manage job submission and settings. See Job Toolbar for a description of this toolbar.

The Job Settings button opens the Formulation Machine Learning - Job Settings Dialog Box, where you can make settings for running the job.

Status bar

Use the Reset button to reset the panel to its default settings and clear any data from the panel. If the panel has a Job toolbar, you can also reset the panel from the Settings button menu.

If you can submit a job from the panel, the status bar displays information about the current job settings and status for the panel. The settings include the job name, task name and task settings (if any), number of subjobs (if any) and the host name and job incorporation setting. The job status can include messages about job start, job completion and incorporation.

The status bar also contains the Help button, which opens an option menu with choices to open the help topic for the panel (Documentation), launch Maestro Assistant, or if available, choose from an option menu of Tutorials. If the panel is used by one or more tutorials, hover over the Tutorials option to display a list of tutorials. Choosing a tutorial opens the tutorial topic.