Applied Machine Learning for Formulations
Tutorial Created with Software Release: 2025-3
Topics: Catalysis & Reactivity, Consumer Packaged Goods, Organic Electronics, Pharmaceutical Formulations, Polymeric Materials, Thin Film Processing
Methodology: Machine Learning
Products Used: MS Formulation ML, MS Maestro
This tutorial is written for use with a 3-button mouse with a scroll wheel.
Words found in the Glossary of Terms are shown like this: Workspace (the 3D display area in the center of the main window, where molecular structures are displayed)
Abstract:
In this tutorial, we will learn to apply the Formulation Machine Learning Panel across a range of materials applications. This tutorial assumes that you have already completed the Machine Learning for Formulations tutorial.
Tutorial Content
1. Introduction to Applied Machine Learning for Formulations
Chemical mixtures with specific compositions of ingredients, or formulations, are ubiquitous across materials science applications. In a previous tutorial (Machine Learning for Formulations), we described a workflow that leverages machine learning (ML) algorithms to rapidly and accurately predict properties of mixtures using ingredient structure and composition as inputs. That tutorial focuses on predicting mixture properties obtained from molecular dynamics simulations, which is a specific application to solvent mixtures using computed properties.
In this tutorial, we demonstrate that the Formulation Machine Learning panel can be applied to distinct experimental datasets from the literature, showcasing the flexibility of this tool for broad materials design. We focus on training machine learning models to predict four relevant materials properties:
- Temperature-dependent drug solubility in pure or binary solvents. Drug solubility, the amount of drug dissolved in solution, is useful for understanding how to better design or process drugs in various solvent environments. It is important in pharmaceutical formulation applications for designing new medicines.
- Temperature-dependent viscosity of pure or binary solvents. Viscosity measures how strongly a solution resists flow; for example, honey flows slowly and has a high viscosity, whereas water flows easily and has a low viscosity. Viscosity is a crucial parameter in battery, consumer goods, and pharmaceutical applications.
- Glass transition temperature (Tg) of copolymer systems. Tg is the temperature at which a polymer transitions from a soft, rubbery state to a hard, glassy state, which is critical for designing plastics that are stable in a desired temperature range.
- Compression strength of geopolymer concrete. Compression strength dictates how much pressure a material can withstand, which is useful for designing reliable concrete for roads and homes.
The overall workflow for one of the demonstrations is summarized in the figure below:
Figure 1. General workflow: inputting a CSV file containing ingredients and compositions, training new machine learning models with the Formulation Machine Learning panel, and predicting properties for new ingredients and compositions using a trained machine learning model.
2. Creating Projects and Importing Structures
At the start of the session, change the file path to your chosen Working Directory in MS Maestro to make file navigation easier. Each session in MS Maestro begins with a default Scratch Project, which is not saved. An MS Maestro project stores all your data and has a .prj extension. A project may contain numerous entries corresponding to imported structures, as well as the output of modeling-related tasks. Once a project is saved, the project is automatically saved each time a change is made.
Structures can be built in MS Maestro or imported using File > Import Structures (or dragged and dropped), and are added to the Entry List and Project Table. The Entry List is located to the left of the Workspace. The Project Table can be accessed by Ctrl+T (Cmd+T) or Window > Project Table if you would like to see an expanded view of your project data.
- Double-click the Materials Science icon
- (No icon? See Starting Maestro)
- Go to File > Change Working Directory
- Find your directory, and click Choose
- Pre-generated files are included for running jobs or examining output. Download the zip file here: schrodinger.com/sites/default/files/s3/release/current/Tutorials/zip/ml_formulations_applied.zip
- After downloading the zip file, unzip the contents in your Working Directory for ease of access throughout the tutorial
- Go to File > Save Project As
- Change the File name to ml_formulations_applied_tutorial, click Save
- The project is now named ml_formulations_applied_tutorial.prj
3. Building and Applying a Machine Learning Model for Solubility
In this section, we will use the Formulation Machine Learning panel to build a model for predicting solubility of formulations. Then, we will learn to apply the model generated to predict solubility on a set of formulations unseen by the model.
The provided .csv file is loaded directly into the Formulation Machine Learning panel (not into MS Maestro):
- Go to Tasks > Materials > Informatics > Formulations Machine Learning
- The Formulation Machine Learning panel opens
- Click Load CSV …
- Navigate to the provided files, presumably in your working directory, select Section_03 > Pharmaceutical_Formulations-2025BAOTowards-train.csv and click Open
- The training CSV data contains the drug as SMILES_0, first solvent as SMILES_1, and second solvent (if present) as SMILES_2. The composition of the drug (comp_0) is fixed at 50%, and relative compositions of the solvents (comp_1 and comp_2) are set such that they sum up to 50%.
- The panel is populated with the training data
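The column layout described above can be sanity-checked outside the panel before a file is loaded. The following is a minimal Python sketch and not part of the tutorial files. It assumes that compositions are stored as percentages, that an absent second solvent has an empty or zero comp_2, and that Temperature (K) and LogS are the CSV column headers; the sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical rows in the layout described above. The values are invented
# for illustration and are not taken from the tutorial's training file.
SAMPLE_CSV = """SMILES_0,comp_0,SMILES_1,comp_1,SMILES_2,comp_2,Temperature (K),LogS
CC(=O)Oc1ccccc1C(=O)O,50,CCO,30,O,20,298.15,-1.2
CC(=O)Oc1ccccc1C(=O)O,50,CCO,50,,0,298.15,-0.9
CC(=O)Oc1ccccc1C(=O)O,50,CCO,35,O,20,298.15,-1.1
"""


def check_compositions(rows, drug_pct=50.0, tol=1e-6):
    """Return (row_index, reason) for each row that breaks the stated layout."""
    problems = []
    for i, row in enumerate(rows):
        comp_0 = float(row["comp_0"])
        # Assumption: an empty comp_2 cell means no second solvent (0%).
        solvents = float(row["comp_1"]) + float(row["comp_2"] or 0.0)
        if abs(comp_0 - drug_pct) > tol:
            problems.append((i, "comp_0 is not 50%"))
        if abs(solvents - (100.0 - drug_pct)) > tol:
            problems.append((i, "comp_1 + comp_2 is not 50%"))
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
problems = check_compositions(rows)
print(problems)
```

In this invented sample only the third row is flagged, because its solvent compositions add up to 55 rather than 50.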
Take a moment to view the data in the panel. Note that the data is editable within the panel. To alter any molecules, click on the corresponding SMILES string to open the 2D Sketcher. Click on the row to edit the values. Importantly, editing the data in the panel will not edit the imported .csv file. If you edit the structure, you can create a new CSV file through the Export Data button to save your changes. For this example, we will not edit the data.
The panel offers a useful way to visualize the distribution of the data set prior to model construction.
- On the right side of the panel, for Properties, choose LogS
- Statistical parameters such as R2, root-mean-squared error (RMSE) and Pearson’s r correlation coefficient are printed, and can be toggled on and off with the checkboxes above the plot
- Go to the Build tab
- Select Fingerprint and Graph Representation as the Featurizers
- Deselect all other options
- Select Dense Neural Networks, Set2Set, and Graph-based Models for the Machine learning models
- Change the Target property to LogS
- The target property will be predicted by the machine learning model
- Select Temperature (K) as the Descriptors
- If you have additional descriptors relevant to the target property, you can always add them as additional inputs to the machine learning model. In this example, solubility was measured as a function of temperature, which is why temperature is included in the model.
- Change the Hyperparameter tuning steps to 20
- Open the Advanced Options
- If additional input features like temperature are missing for some data points, you can check the Enable descriptor imputation option to infer the missing features and still train a model.
- Check Out-of-sample splitting
- Instead of randomly splitting, we chose out-of-sample splitting because the temperature-dependent data can contain the same drug-solvent combination at several temperatures. Out-of-sample splitting divides the training and testing data such that the test set contains only combinations of structures that do not appear in the training set. This is a more rigorous way to measure whether a model can predict new drug-solvent combinations.
- Click OK to close the panel
- Change the Job name to formulation_ml_train_logS
- Adjust the job settings as needed
- This job requires a CPU host. The job will be completed in about 1 day on a CPU host
- If you would like to perform the calculation, click Run. Otherwise, we will import pre-generated results in the next step.
By default, the panel will automatically perform a 90:10 train:test split on your dataset, which means we use 90% of the data for training the model and leave out 10% of the data as the test set to evaluate model generalizability to unseen formulations. Performance on the train and test sets is reported after building a model.
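The idea behind out-of-sample splitting can be illustrated with a group-wise holdout, in which every row sharing the same structure combination goes to the same side of the split. The sketch below is an assumption about the general technique, not the panel's actual algorithm, which is not described in this tutorial. The function name, the sample data, and the choice to hold out 10% of the groups (rather than 10% of the rows) are all hypothetical.

```python
import random
from collections import defaultdict


def out_of_sample_split(rows, key_fields, test_fraction=0.1, seed=0):
    """Split rows so that no structure combination appears in both sets."""
    groups = defaultdict(list)
    for row in rows:
        # Rows with identical structures (e.g. same drug and solvent at
        # different temperatures) fall into the same group.
        groups[tuple(row[f] for f in key_fields)].append(row)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    # Hold out a fraction of the groups, not of the individual rows.
    n_test = max(1, round(test_fraction * len(keys)))
    test = [r for k in keys[:n_test] for r in groups[k]]
    train = [r for k in keys[n_test:] for r in groups[k]]
    return train, test


# Hypothetical data: 10 drug-solvent pairs, each measured at 3 temperatures.
rows = [
    {"SMILES_0": f"drug{i}", "SMILES_1": f"solv{i % 3}", "Temperature (K)": t}
    for i in range(10)
    for t in (288.15, 298.15, 308.15)
]
train, test = out_of_sample_split(rows, key_fields=("SMILES_0", "SMILES_1"))
print(len(train), len(test))
```

A purely random row-level split of the same data could place one drug-solvent pair at 298.15 K in the training set and the same pair at 308.15 K in the test set, which would make the test score look better than the model's ability to handle new combinations.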
If you performed the calculation, the results will automatically be incorporated into the panel when the job is complete. Here, we will assume that you are proceeding with the provided files:
- Go to the Performance tab
- Click Load Model …
- In the provided files, go to the Section_03 > formulation_ml_train_logS directory, choose the formulation_ml_train_logS.mlform file
- Click Open
- If you get a warning saying that the uploaded ML model was trained on an older version, feel free to click OK and proceed. If you wish to retrain the model, you can use the ML Model Manager panel
Note: If you performed the calculations yourself, you should expect slight variance in the results arising from the randomness of the train/test split and machine learning model hyperparameters.
The performance tab is populated with the corresponding scatterplot.
Above the plot, the Featurizers and Machine learning models used in building the model are listed.
The plot contains predicted versus actual values from the train and test set, with corresponding R2 and RMSE values in a table below.
In this case, the model generalized well on the test set, with an R2 of 0.85 (an ideal model would have an R2 of 1.00).
- In the Formulation Machine Learning panel, go to the Predict tab
- Ensure that the .mlform file is shown next to Load Model. If not, load the formulation_ml_train_logS.mlform file again
We can load new formulation data into the panel and use our model to predict solubility values.
- Make sure the dropdown shows Prediction Input and click Load …
- Navigate to the provided tutorial files and choose the Pharmaceutical_Formulations-2025BAOTowards-test.csv file. Click Open
- Note that this test CSV file has unique drug-solvent combinations outside of model training to truly evaluate new formulations.
The panel is updated with the data in the Pharmaceutical_Formulations-2025BAOTowards-test.csv file. Note that the experimental solubilities are included only to compare the quality of the predictions; the practical application of this model would be to predict the solubility of these formulations using only SMILES, composition, and temperature information (i.e. without experimental solubility).
Similar to the “Training Data” tab, the data is editable within the “Predict” tab. To alter any molecules, click on the corresponding SMILES string to open the 2D Sketcher. To edit any values, click on the row to enable editing. For this example, we will not edit the data.
With the trained model and input data loaded, we can run the job to predict solubility for each formulation.
If you performed the calculation, the results will automatically be incorporated into the panel when the job is complete. We will assume that you are proceeding with the provided files:
- Change the dropdown from Prediction Input to Prediction Output
- Click Load …
- Navigate to Section_03 > formulation_ml_test_logS and choose the formulation_ml_test_logS_predict.csv file
- Click Open
The blue text data columns contain the experimental temperature and solubility. The black text data columns contain the outputs from the machine learning model, such as the predicted solubility and the uncertainty of the predictions.
The panel enables quick generation of a scatter plot to assess the performance of the model.
- Click on the LogS column header to pull up the plotter tool
- Set the Properties from the dropdowns to LogS and LogS_predict
A scatterplot of predicted solubility versus provided solubility is shown. Statistical parameters such as R2, RMSE and Pearson’s r are printed, and can be toggled on and off with the checkboxes above the plot.
The best fit line between predicted and experimental values shows a good R2 of 0.90 (an ideal model would have an R2 of 1.00). The results suggest that the ML model derived from the Formulation Machine Learning panel could accurately predict new drug solubilities with varying solvent structure, composition, and temperature.
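For readers who want to reproduce the reported statistics from an exported prediction file, the three quantities can be computed from paired actual and predicted values as in the sketch below. It uses the standard textbook definitions (R2 as the coefficient of determination); the panel may compute its values differently, for example from the best fit line, so small differences are possible. The function name and the sample numbers are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return R2 (coefficient of determination), RMSE and Pearson's r."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    # Residual and total sums of squares around the mean of the actual values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    # Covariance and predicted-value variance terms for Pearson's r.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "pearson_r": cov / math.sqrt(ss_tot * var_p),
    }


# Invented LogS values for illustration only.
actual = [-3.0, -2.0, -1.0, 0.0]
predicted = [-2.8, -2.1, -1.2, 0.1]
metrics = regression_metrics(actual, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
```

R2 and RMSE penalize any systematic offset between predicted and actual values, whereas Pearson's r only measures how linearly the two are related, which is why it is useful to look at all three.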
4. Building and Applying a Machine Learning Model for Viscosity
In this section, we will use the Formulation Machine Learning panel to build a model for predicting viscosity of formulations. Then, we will learn to apply the model generated to predict viscosity on a set of formulations unseen by the model.
- Reset the Formulation Machine Learning panel
- Click Load Training Data …
- Navigate to the provided files, presumably in your working directory, select Section_04 > Viscosity-Bilodeau2023-train.csv and click Open
- The training CSV file contains the first solvent as SMILES_0 and second solvent (if present) as SMILES_1. Their corresponding compositions are stored as comp_0 and comp_1 such that the compositions sum up to 100%.
- The panel is populated with the training data
Once again, take a moment to view the data in the panel.
- On the right side of the panel, for Properties, choose logV
- Statistical parameters such as R2, root-mean-squared error (RMSE) and Pearson’s r correlation coefficient are printed, and can be toggled on and off with the checkboxes above the plot
- Go to the Build tab
- Select Fingerprint and Graph Representation as the Featurizers
- Deselect all other options
- Select Dense Neural Networks, Set2Set, and Graph-based Models for the Machine learning models
- Change the Target property to logV
- The target property is the quantity we wish to predict with the machine learning model
- Select Temperature (K) as the Descriptors
- If you had additional descriptors available with your dataset, you could select them here
- Change the Hyperparameter tuning steps to 10
- Open the Advanced Options
- Check Out-of-sample splitting
- Similar to the previous example, this dataset has temperature-dependent viscosity data. Out-of-sample splitting divides the training and testing data such that the test set contains only combinations of structures that do not appear in the training set. This is a more rigorous way to measure whether a model can predict new pure or binary solvent combinations.
- Click OK to close the panel
- Change the Job name to formulation_ml_train_viscosity
- Adjust the job settings as needed
- This job requires a CPU host. The job will be completed in about 10 hours on a CPU host
- If you would like to perform the calculation, click Run. Otherwise, we will import pre-generated results in the next step.
By default, the panel will automatically perform a 90:10 train:test split on your dataset; here the split is made out-of-sample because that option was checked above. The panel will use the training set for model training and the testing set to evaluate whether the model can generalize to unseen formulations. Performance on the train and test sets is reported after building a model.
If you performed the calculation, the results will automatically be incorporated into the panel when the job is complete. Here, we will assume that you are proceeding with the provided files:
- Go to the Performance tab
- Click Load Model …
- In the provided files, go to the Section_04 > formulation_ml_train_viscosity directory, choose the formulation_ml_train_viscosity.mlform file and click Open
- If you get a warning saying that the uploaded ML model was trained on an older version, feel free to click OK and proceed. If you wish to retrain the model, you can use the ML Model Manager panel
Note: If you performed the calculations yourself, you should expect slight variance in the results.
The performance tab is populated with the corresponding scatterplot.
Above the plot, the Featurizers and Machine learning models used in building the model are listed.
The plot contains predicted versus actual values from the train and test set, with corresponding R2 and RMSE values in a table below.
In this case, we can clearly see that the model generalized well on the test set.
- In the Formulation Machine Learning panel, go to the Predict tab
- Ensure that the .mlform file is shown next to Load Model. If not, load the formulation_ml_train_viscosity.mlform file
We can load new formulation data into the panel and use our model to predict viscosity values.
- Make sure the dropdown shows Prediction Input and click Load …
- Navigate to the provided tutorial files and choose the Viscosity-Bilodeau2023-test.csv file. Click Open
The panel is updated with the data in the Viscosity-Bilodeau2023-test.csv file. Note that the experimental viscosities are provided only to compare the quality of the predictions; the machine learning model will be able to predict the viscosity of these formulations with only SMILES, composition, and temperature information.
Similar to the “Training Data” tab, the data is editable within the “Predict” tab. To alter any molecules, click on the corresponding SMILES string to open the 2D Sketcher. To edit any values, click on the row to enable editing. For this example, we will not edit the data.
With the trained model and input data loaded, we can run the job to predict viscosity for each formulation.
- Change the Job name to formulation_ml_test_viscosity
- Adjust the job settings as needed
- This job requires a CPU host. The job will be completed in about 1 minute on a CPU host
- If you would like to perform the calculation, click Run. Otherwise, we will import pre-generated results in the next step.
If you performed the calculation, the results will automatically be incorporated into the panel when the job is complete. We will assume that you are proceeding with the provided files:
- Change the dropdown from Prediction Input to Prediction Output
- Click Load …
- Navigate to Section_04 > formulation_ml_test_viscosity and choose the formulation_ml_test_viscosity_predict.csv file
- Click Open
The first data column contains the provided temperature and the second data column contains the provided logV. The third and fourth columns contain the outputs from the machine learning model, specifically the predicted viscosity and associated uncertainty.
The panel enables quick generation of a scatter plot to assess the performance of the model.
- Set the Properties from the dropdowns to logV and logV_predict
A scatterplot of predicted logV versus provided logV is shown. Statistical parameters such as R2, RMSE and Pearson’s r are printed, and can be toggled on and off with the checkboxes above the plot.
The best fit line between predicted and experimental values shows a high R2 of 0.96 (an ideal model would have an R2 of 1.00). The results suggest that the ML model derived from the Formulation Machine Learning panel could accurately predict viscosity of pure or binary solvents with varying structure, composition, and temperature.
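Beyond the summary statistics, it can be helpful to look at which individual formulations are predicted worst and whether the reported uncertainty flags them. The sketch below ranks rows of a prediction output by absolute error. The logV and logV_predict column names follow the panel labels used above, while the uncertainty column name (logV_uncertainty), the function name, and all sample values are hypothetical.

```python
import csv
import io

# Invented rows in the shape of a prediction output file; the uncertainty
# column name is an assumption and may differ in the real file.
SAMPLE_OUTPUT = """Temperature (K),logV,logV_predict,logV_uncertainty
298.15,-0.05,-0.02,0.04
308.15,-0.12,-0.30,0.21
318.15,-0.20,-0.19,0.03
"""


def rank_by_error(rows, actual="logV", predicted="logV_predict"):
    """Return copies of the rows sorted from largest to smallest |error|."""
    scored = []
    for row in rows:
        error = abs(float(row[predicted]) - float(row[actual]))
        scored.append({**row, "abs_error": error})
    return sorted(scored, key=lambda r: r["abs_error"], reverse=True)


rows = list(csv.DictReader(io.StringIO(SAMPLE_OUTPUT)))
ranked = rank_by_error(rows)
for row in ranked:
    print(row["Temperature (K)"], f'{row["abs_error"]:.2f}', row["logV_uncertainty"])
```

In this invented sample the row with the largest error also carries the largest uncertainty, which is the behavior one would hope to see from a well-calibrated model.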
5. Building a Machine Learning Model for Glass Transition Temperature
In this section, we will use the Formulation Machine Learning panel to build a model for predicting the glass transition temperature for copolymers.
- Reset the Formulation Machine Learning panel
- Click Load Training Data …
- Navigate to the provided files, presumably in your working directory, select Section_05 > Copolymer_Tg_penzel1997Glass.csv and click Open
- The training CSV file contains binary copolymer systems, where the repeat units of monomers 1 and 2 are stored as SMILES_0 and SMILES_1, respectively. The compositions (comp_0 and comp_1) denote the relative amounts of these monomers present in the copolymer system.
- The panel is populated with the training data
Take a moment to view the values in the panel. Each input structure contains a monomer repeat unit of a copolymer. The compositions represent the relative ratios of each monomer unit for a random copolymer system.
Note: Th and Lr are dummy atoms that mark the head and tail of each monomer, indicating where the monomers connect to form a polymer system.
- On the right side of the panel, for Properties, choose Tg(K)
- Statistical parameters such as R2, root-mean-squared error (RMSE) and Pearson’s r correlation coefficient are printed, and can be toggled on and off with the checkboxes above the plot
- Go to the Build tab
- Select Fingerprint and MACCS Keys as the Featurizers
- Select Dense Neural Networks and Set2Set for the Machine learning models
- Change the Target property to Tg(K)
- Change the Hyperparameter tuning steps to 20
- Open the Advanced Options
- Check Out-of-sample splitting
- Instead of randomly splitting, we chose out-of-sample splitting because the copolymer dataset contains several compositions for each copolymer combination. Out-of-sample splitting divides the training and testing data such that the test set contains only combinations of copolymer structures that do not appear in the training set. This is a more rigorous way to measure whether a model can predict new copolymer combinations.
- Click OK to close the panel
- Change the Job name to formulation_ml_train_Tg
- Adjust the job settings as needed
- This job requires a CPU host. The job will be completed in about 20 minutes on a CPU host
- If you would like to perform the calculation, click Run. Otherwise, we will import pre-generated results in the next step.
By default, the panel will automatically perform a 90:10 train:test split on your dataset. The panel will use the training set for model training and the testing set to evaluate whether the model can generalize to unseen formulations. Performance on the train and test sets is reported after building a model.
If you performed the calculation, the results will automatically be incorporated into the panel when the job is complete. Here, we will assume that you are proceeding with the provided files:
- Go to the Performance tab
- Click Load Model …
- In the provided files, go to the Section_05 > formulation_ml_train_Tg directory, choose the formulation_ml_train_Tg.mlform file and click Open
- If you get a warning saying that the uploaded ML model was trained on an older version, feel free to click OK and proceed. If you wish to retrain the model, you can use the ML Model Manager panel
Note: If you performed the calculations yourself, you should expect slight variance in the results.
The performance tab is populated with the corresponding scatterplot.
Above the plot, the Featurizers and Machine learning models used in building the model are listed.
The plot contains predicted versus actual values from the train and test set, with corresponding R2 and RMSE values in a table below.
In this case, we can clearly see that the model generalized well on the test set.
The best fit line between predicted and actual values shows a high R2 of 0.98 (an ideal model would have an R2 of 1.00). The results suggest that the ML model derived from the Formulation Machine Learning panel could accurately predict Tg as a function of monomer structure and composition for binary copolymer systems.
6. Building a Machine Learning Model for Compression Strength
In this section, we will use the Formulation Machine Learning panel to build a model for predicting the compression strength of geopolymer concrete.
- Reset the Formulation Machine Learning panel
- Click Load Training Data …
- Navigate to the provided files, presumably in your working directory, select Section_06 > Geopolymer_Concrete-Rao2018Quantitative.csv and click Open
- The training CSV file contains 240 mixtures of fly ash (FA) and ground granulated blast-furnace slag (GGBFS) used to create the geopolymer systems, where their compositions are labeled as comp_0 and comp_1, respectively. Chemical structures are ignored since these systems cannot be easily represented as SMILES; hence, they are denoted as “MISSING” in SMILES_0 and SMILES_1. We use the composition of each ingredient as inputs to the machine learning models.
- The panel is populated with the training data
Take a moment to view the data in the panel. Since no SMILES structure is available, each ingredient is labeled as “comp_0” and “comp_1” to represent their individual compositions.
- On the right side of the panel, for Properties, choose target_compression_strength_MPa
- Statistical parameters such as R2, root-mean-squared error (RMSE) and Pearson’s r correlation coefficient are printed, and can be toggled on and off with the checkboxes above the plot
- Go to the Build tab
- Select Composition only as the Featurizers
- Deselect all other options
- The Composition only option ignores chemical structure and uses only the composition as input to the machine learning models
- Select All for the Machine learning models
- Change the Target property to target_compression_strength_MPa
- Select Powderkg, Liquidkg, WC, Admixturekg, Aggregateskg, temperature as the Descriptors. The additional descriptors are experimental features that were varied; they are summarized below:
- Powderkg: Powder of fly ash + GGBFS content in kg/m3
- Liquidkg: Alkaline solution in kg/m3
- WC: Water-to-cement ratio / Alkali-binder ratio
- Admixturekg: Total superplasticizer in kg/m3 (4% of binder)
- Aggregateskg: Total aggregate in kg/m3
- temperature: Curing temperature in Celsius
- Change the Hyperparameter tuning steps to 20
- Change the Job name to formulation_ml_train_geopolymer
- Adjust the job settings as needed
- This job requires a CPU host. The job will be completed in about 10 minutes on a CPU host
- If you would like to perform the calculation, click Run. Otherwise, we will import pre-generated results in the next step.
By default, the panel will automatically perform a random 90:10 train:test split on your dataset. The panel will use the training set for model training and the testing set to evaluate whether the model can generalize to unseen formulations. Performance on the train and test sets is reported after building a model. Note that for this example, we select random splitting since the compositions of the same ingredients are being varied across the dataset (e.g. extent of FA versus GGBFS). If there were distinct, unique formulations with varying compositions, out-of-sample splitting would be useful to evaluate whether the model can generalize to new formulations, as shown in the previous examples.
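To make the composition-only setup concrete, the sketch below shows one way a single row of this dataset could be turned into a numeric input vector: the two compositions followed by the six selected descriptors, with the SMILES columns ignored. This is an illustration of the idea only and not the panel's internal featurization. The function name and the ordering of the vector are assumptions, and the sample mixture is invented.

```python
# Descriptor and target names as selected in the Build tab above.
DESCRIPTORS = ["Powderkg", "Liquidkg", "WC", "Admixturekg", "Aggregateskg", "temperature"]
TARGET = "target_compression_strength_MPa"


def composition_only_features(row, n_ingredients=2):
    """Build [comp_0, comp_1, descriptors...] and ignore the SMILES columns."""
    comps = [float(row[f"comp_{i}"]) for i in range(n_ingredients)]
    extras = [float(row[name]) for name in DESCRIPTORS]
    return comps + extras


# Hypothetical mixture; the numbers are invented for illustration.
row = {
    "SMILES_0": "MISSING", "comp_0": "70",
    "SMILES_1": "MISSING", "comp_1": "30",
    "Powderkg": "400", "Liquidkg": "160", "WC": "0.4",
    "Admixturekg": "16", "Aggregateskg": "1800", "temperature": "60",
    TARGET: "45.0",
}
features = composition_only_features(row)
target = float(row[TARGET])
print(features, target)
```

Because every row uses the same two ingredients, the rows differ only in these numbers, which is why a random split is adequate here.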
If you performed the calculation, the results will automatically be incorporated into the panel when the job is complete. Here, we will assume that you are proceeding with the provided files:
- Go to the Performance tab
- Click Load Model …
- In the provided files, go to the Section_06 > formulation_ml_train_geopolymer directory, choose the formulation_ml_train_geopolymer.mlform file and click Open
- If you get a warning saying that the uploaded ML model was trained on an older version, feel free to click OK and proceed. If you wish to retrain the model, you can use the ML Model Manager panel
Note: If you performed the calculations yourself, you should expect slight variance in the results.
The performance tab is populated with the corresponding scatterplot.
Above the plot, the Featurizers and Machine learning models used in building the model are listed.
The plot contains predicted versus actual values from the train and test set, with corresponding R2 and RMSE values in a table below.
In this case, we can clearly see that the model generalized well on the test set.
- Go to the Feature Importance tab
- Change the Job name to formulation_ml_feature_importance_geopolymer
- Click Calculate Feature Importance
- This calculation will run in the panel in just a couple of minutes
If you performed the calculation, the results should automatically be incorporated into the panel when the job is complete. Here, we will assume that you are proceeding with the provided files:
- Go to the Performance tab
- Click Load Model …
- In the provided files, go to the Section_06 > formulation_ml_feature_importance_geopolymer directory, choose the formulation_ml_train_geopolymer.mlform file and click Open
- This is an updated formulation_ml_train_geopolymer.mlform file that includes the calculated feature importance
- If you get a warning saying that the uploaded ML model was trained on an older version, feel free to click OK and proceed. If you wish to retrain the model, you can use the ML Model Manager panel
The high test set R2 of 0.99 (an ideal model would have an R2 of 1.00) demonstrates that the ML model derived from the Formulation Machine Learning panel can be used to screen the compression strength of geopolymer concrete. In addition, the feature importance tool shows which inputs contribute most to the predicted strength; for example, COMPOSITION_1 shows a high Mean |SHAP| value, indicating that the extent of GGBFS strongly influences compression strength, with increasing GGBFS content leading to increased strength. Similarly, increasing the curing temperature or decreasing WC (water-to-cement ratio) can yield higher compression strengths.
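When reading a Mean |SHAP| chart, it helps to remember that the mean of absolute values measures how strongly a feature moves the prediction, not in which direction; the direction comes from the signed SHAP values. The sketch below illustrates this with the simplest case, a linear model with independent features, for which the SHAP value of feature j in sample i is w_j * (x_ij - mean_j). It is a simplified stand-in, not the method the panel uses, and the weights and data are invented.

```python
def linear_shap(weights, X):
    """SHAP values for a linear model with independent features:
    phi[i][j] = weights[j] * (X[i][j] - mean of column j)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[w * (x - m) for w, x, m in zip(weights, row, means)] for row in X]


def mean_abs_shap(phi):
    """Column-wise mean of |SHAP|: importance magnitude, always >= 0."""
    n = len(phi)
    return [sum(abs(v) for v in col) / n for col in zip(*phi)]


# Hypothetical model: strength rises with GGBFS content and curing
# temperature and falls with WC. Weights and samples are invented.
names = ["COMPOSITION_1", "temperature", "WC"]
weights = [0.5, 0.2, -40.0]
X = [
    [0.0, 30.0, 0.5],
    [50.0, 60.0, 0.4],
    [100.0, 90.0, 0.3],
]
phi = linear_shap(weights, X)
importance = mean_abs_shap(phi)
for name, value in zip(names, importance):
    print(f"{name}: mean |SHAP| = {value:.2f}")
```

In this toy example WC has a positive Mean |SHAP| even though its weight is negative; the signed value for the sample with the highest WC is negative, which is what indicates that raising WC lowers the predicted strength.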
7. Conclusion and References
In this tutorial, we learned how to use the Formulation Machine Learning panel to train practical models with experimental datasets derived from the literature. These machine learning models can be applied to a broad range of materials applications, such as pharmaceutical formulations, consumer packaged goods, batteries, plastics, and solid materials. While this tutorial demonstrates the utility of this tool for a subset of literature examples, one can envision training broad formulation machine learning models for diverse properties, which can then be used for downstream screening of large formulation libraries to suggest the best candidates with tailored material properties.
For further learning:
For introductory content, focused on navigating the Schrödinger Materials Science interface, an Introduction to Materials Science Maestro tutorial is available. Please visit the materials science training website for access to 70+ tutorials. For scientific inquiries or technical troubleshooting, submit a ticket to our Technical Support Scientists at help@schrodinger.com.
For self-paced, asynchronous, online courses in Materials Science modeling, including access to Schrödinger software, please visit the Schrödinger Online Learning portal on our website.
For some related practice, proceed to explore other relevant tutorials:
For more machine learning:
- Machine Learning for Formulations
- Optimization of Formulation Using Machine Learning
- Machine Learning for Materials Science
- Periodic Descriptors for Inorganic Solids
- Molecular Dynamics Descriptors for Machine Learning
- Optoelectronics Active Learning
- Machine Learning for Sweetness
- Machine Learning for Ionic Conductivity
- Cheminformatics Machine Learning for Homogeneous Catalysis
- Machine Learning Property Prediction
For further reading:
- AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds. DOI: 10.1038/s41597-019-0151-1
- Boosting the predictive performance with aqueous solubility dataset curation. DOI: 10.1038/s41597-022-01154-3
- Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development. DOI: 10.48550/arXiv.2102.09548
- Leveraging High-throughput Molecular Simulations and Machine Learning for the Design of Chemical Mixtures. DOI: 10.1038/s41524-025-01552-2
- Machine learning for predicting the viscosity of binary liquid mixtures. DOI: 10.1016/j.cej.2023.142454
- The glass transition temperature of random copolymers: 1. Experimental data and the Gordon-Taylor equation. DOI: 10.1016/S0032-3861(96)00521-6
- Towards the prediction of drug solubility in binary solvent mixtures at various temperatures using machine learning. DOI: 10.1186/s13321-024-00911-3
- LLMs can Design Sustainable Concrete – a Systematic Benchmark. DOI: 10.13140/RG.2.2.33795.27686
8. Glossary of Terms
Entry List - a simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion
Included - the entry is represented in the Workspace, the circle in the In column is blue
Project Table - displays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data
Recent actions - This is a list of your recent actions, which you can use to reopen a panel, displayed below the Browse row. (Right-click to delete.)
Scratch Project - a temporary project in which work is not saved, closing a scratch project removes all current work and begins a new scratch project
Selected - (1) the atoms are chosen in the Workspace. These atoms are referred to as "the selection" or "the atom selection". Workspace operations are performed on the selected atoms. (2) The entry is chosen in the Entry List (and Project Table) and the row for the entry is highlighted. Project operations are performed on all selected entries
Working Directory - the location where files are saved
Workspace - the 3D display area in the center of the main window, where molecular structures are displayed