Optoelectronics Active Learning
Tutorial Created with Software Release: 2025-3
Topics: Informatics and Team Collaboration, Organic Electronics
Methodology: Machine Learning, Molecular Quantum Mechanics
Products Used: GA Optoelectronics, MS Maestro
This tutorial is written for use with a 3-button mouse with a scroll wheel.
Words found in the Glossary of Terms (Section 6) are capitalized in the text, like this: Workspace
Abstract:
In this tutorial, we will learn to use the Optoelectronics Active Learning panels in the Materials Science (MS) Maestro interface. We will study an example set of octahedral iridium complexes for which we wish to efficiently identify the complexes with a triplet energy closest to a specified target value.
Tutorial Content
1. Introduction to Optoelectronics Active Learning
The discovery and development of optoelectronic materials are of great interest, particularly with respect to organic light-emitting diodes (OLEDs) and organic photovoltaics (OPVs). When identifying optimal materials for optoelectronic applications, various properties are often considered: internal and condensed-phase properties, efficient charge injection and transport, and chemical and thermophysical stability, to name a few. The optoelectronics capabilities in the Materials Science Suite are designed to leverage rapid screening to complement experimental development by elucidating molecular properties and informing future synthetic targets.
One method to assess internal properties of optoelectronics materials is the Optoelectronics Calculations panel, which implements efficient screening modes using the MS Jaguar quantum mechanics engine, minimizing computational expense while delivering valuable, accurate data. For an overview of using the Optoelectronics Calculations panel, the Optoelectronics tutorial is recommended.
In some instances, despite the efficiency of the Optoelectronics Calculations approach, it may be preferable not to perform expensive quantum mechanical calculations for every molecule in a set. For example, with large data sets (>1000 molecules) and limited computational resources, a machine learning (ML) approach can intelligently select a subset of candidates for quantum mechanical (QM) calculations and identify the best candidate molecules.
The Optoelectronics Active Learning and Prediction panels are designed to identify promising molecules while limiting computational expense. In the active learning protocol, a set of properties is chosen with target values or ranges (e.g. maximize oxidation potential, minimize triplet energy, find a reduction potential close to 2.4 eV). Then, an initial set size is chosen, and QM calculations are run on that many molecules. An ML model is built from these QM results and applied to the rest of the molecules in the library to predict their properties. In each subsequent iteration, several additional molecules (identified as promising by the previous ML model) are passed through QM calculations, and a new ML model is built from the enlarged set of results. This iterative approach proceeds until a stopping criterion is met. Ideally, this allows us to run QM calculations on relatively few molecules, but enough to produce an ML model good enough to identify the molecule(s) that best match the desired properties.
Figure 1. Active learning framework to identify best candidate molecules with minimal number of density functional theory (DFT) calculations.
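The loop described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used by the panels: the one-dimensional descriptor, the `expensive_calculation` stand-in for a QM job, and the nearest-neighbor surrogate model are all assumptions made for the example.

```python
import random

def expensive_calculation(x):
    """Stand-in for a QM job (hypothetical): a property as a function of one descriptor."""
    return 2.0 + 1.5 * x - 0.5 * x * x

def knn_predict(x, labeled, k=3):
    """Toy surrogate model: average the k labeled points nearest to x."""
    nearest = sorted(labeled, key=lambda item: abs(item[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def active_learning(library, target, initial_size, per_iteration, max_iterations, seed=0):
    rng = random.Random(seed)
    # Initial set: picked at random, then sent to the expensive calculation
    initial = rng.sample(range(len(library)), initial_size)
    labeled = {i: expensive_calculation(library[i]) for i in initial}
    for _ in range(max_iterations):
        unlabeled = [i for i in range(len(library)) if i not in labeled]
        if not unlabeled:
            break
        # Rebuild the surrogate from everything computed so far
        train = [(library[i], y) for i, y in labeled.items()]
        # Rank the remaining molecules by predicted closeness to the target
        ranked = sorted(unlabeled,
                        key=lambda i: abs(knn_predict(library[i], train) - target))
        # Only the top candidates get the expensive calculation
        for i in ranked[:per_iteration]:
            labeled[i] = expensive_calculation(library[i])
    best = min(labeled, key=lambda i: abs(labeled[i] - target))
    return best, labeled

library = [i / 100 for i in range(249)]  # 249 hypothetical descriptor values
best, labeled = active_learning(library, target=2.80, initial_size=10,
                                per_iteration=5, max_iterations=15)
print(f"computed {len(labeled)} of {len(library)} molecules; "
      f"best value {labeled[best]:.3f}")
```

In this sketch the stopping criterion is simply a fixed number of iterations; other criteria could be checked at the top of the loop in the same way.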
For a complete description of the Optoelectronics Active Learning workflow, visit the help documentation. A publication is also available from Schrödinger scientists detailing the active learning approach for OLED discovery.
In this tutorial, we will learn to use the Optoelectronics Active Learning Training and Prediction panels to identify iridium complexes most likely to match a target triplet energy. This data set is relatively small for example purposes, but the same workflow can be applied to significantly larger systems (e.g. 1000s of molecules). Moreover, we will target only a triplet energy of interest herein, but we will discuss how the panels can be used to find optoelectronics molecules that meet several conditions (as described above) using multi-property optimization (MPO) capabilities.
2. Creating Projects and Importing Structures
At the start of the session, change the file path to your chosen Working Directory in MS Maestro to make file navigation easier. Each session in MS Maestro begins with a default Scratch Project, which is not saved. An MS Maestro project stores all your data and has a .prj extension. A project may contain numerous entries corresponding to imported structures, as well as the output of modeling-related tasks. Once a project is saved, it is automatically saved again each time a change is made.
Structures can be built in MS Maestro or imported using File > Import Structures (or by drag-and-drop), and are added to the Entry List and Project Table. The Entry List is located to the left of the Workspace. The Project Table can be accessed by Ctrl+T (Cmd+T) or Window > Project Table if you would like to see an expanded view of your project data.
- Double-click the Materials Science icon
- (No icon? See Starting Maestro)
- Go to File > Change Working Directory
- Find your directory, and click Choose
- Pre-generated input and results files are included for running jobs or examining output. Download the zip file here: schrodinger.com/sites/default/files/s3/release/current/Tutorials/zip/opto_al.zip
- After downloading the zip file, unzip the contents in your Working Directory for ease of access throughout the tutorial
- Go to File > Save Project As
- Change the File name to optoelectronics_AL_tutorial, click Save
- The project is now named optoelectronics_AL_tutorial.prj
For this tutorial, we will study a relatively small data set of octahedral iridium complexes for example purposes. The active learning and prediction workflow is typically performed with larger data sets. Let’s proceed to import the structures:
- Go to File > Import Structures
- Navigate to where you downloaded the tutorial files (if not already in your Working Directory) and select iridium_complexes.mae
- Click Open
- A new entry group is added to the entry list containing 249 entries
Note: These structures have not been optimized at the QM level. The workflow utilized herein will include QM optimizations on a subset of the data.
3. Running the Active Learning Training
Let’s suppose that we wish to efficiently identify the iridium complex(es) in the molecular library with a triplet energy closest to 2.80 eV for the design of an electronic device. Rather than performing QM calculations on each molecule, this section demonstrates how to use the Optoelectronics Active Learning Training panel to achieve this goal.
- Select the entire optoelectronic_al_Ir_complexes (249) entry group from the Entry List
- Recall that selection means to highlight all of the entries in the group, which can be accomplished by clicking on the entry group name
- Go to Tasks > Materials > Informatics > Optoelectronics Active Learning Training
- The Optoelectronics Active Learning Training panel opens
- In the Property section, choose First triplet Energy and Targeted Value
- This indicates that we wish to target complexes with triplet energies near a specific value
- Click Add
- Triplet Energy - Targeted Value appears in the Multi-property Optimization (MPO) section of the panel
Note: There are many properties available in the Optoelectronics Active Learning panel. For more on these properties, please visit the help documentation.
- Set the Targeted value to 2.80 eV
- Set the Inner tolerance to 0.20 eV, the Outer tolerance to 0.40 eV and retain the Weight of 1.00
- These quantities indicate which triplet energies will be considered as ‘good’ and ‘bad’ by the model, which we will describe in more detail shortly
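To make the role of the tolerances concrete, here is one plausible way a targeted value could be turned into a score between 0 and 1. The tutorial does not state the functional form used by the panel, so the trapezoidal shape below (full score inside the inner tolerance, linear fall-off to zero at the outer tolerance) and the weighted average used for the MPO score are assumptions for illustration; see the help documentation for the actual definition.

```python
def target_desirability(value, target, inner_tol, outer_tol):
    """Hypothetical trapezoidal score for a 'Targeted Value' property.

    1.0 within inner_tol of the target ('good'), 0.0 beyond outer_tol ('bad'),
    and a linear ramp in between. This shape is an assumption.
    """
    deviation = abs(value - target)
    if deviation <= inner_tol:
        return 1.0
    if deviation >= outer_tol:
        return 0.0
    return (outer_tol - deviation) / (outer_tol - inner_tol)

def mpo_score(desirabilities, weights):
    """Combine per-property scores with a weighted average (also an assumption)."""
    return sum(w * d for w, d in zip(weights, desirabilities)) / sum(weights)

# Settings from this tutorial: target 2.80 eV, inner 0.20 eV, outer 0.40 eV, weight 1.00
for energy in (2.80, 2.95, 3.10, 3.30):
    d = target_desirability(energy, target=2.80, inner_tol=0.20, outer_tol=0.40)
    print(f"{energy:.2f} eV -> score {mpo_score([d], [1.00]):.2f}")
```

With these settings, a triplet energy of 2.95 eV lies inside the inner tolerance and scores 1, 3.10 eV lies halfway between the two tolerances and scores about 0.5, and 3.30 eV lies outside the outer tolerance and scores 0.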
In the Training parameters section:
- Set the Initial set size to 10
- Set Additional compounds per iteration to 5
- For Stop training if, choose The number of iterations reaches and input 15
These settings are used to parameterize the active learning training protocol. QM calculations will be performed on ten compounds initially. For each subsequent loop, the top five compounds identified by the ML algorithm will be used as input for QM calculations. The looping procedure will stop after 15 iterations.
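It can be useful to estimate the QM cost implied by these settings before launching a job. The short calculation below is a rough upper bound, not a report from the software; because the tutorial does not say whether the initial set counts as the first of the 15 iterations, both conventions are shown (the `initial_set_is_first_iteration` flag is a hypothetical name introduced for this example).

```python
def qm_calculation_budget(initial_set_size, per_iteration, iterations,
                          initial_set_is_first_iteration):
    """Upper bound on the number of QM calculations for a fixed iteration count."""
    extra_rounds = iterations - 1 if initial_set_is_first_iteration else iterations
    return initial_set_size + per_iteration * extra_rounds

library_size = 249  # number of iridium complexes imported in Section 2

for flag in (True, False):
    n = qm_calculation_budget(10, 5, 15, initial_set_is_first_iteration=flag)
    print(f"initial set counted as iteration 1: {flag} -> "
          f"{n} QM calculations ({n / library_size:.0%} of the library)")
```

Under either convention, roughly one third of the 249 complexes would be sent to QM in this small example. For a library of thousands of molecules the same settings would touch a much smaller fraction.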
Let’s familiarize ourselves a bit more with some of the options in the Optoelectronics Active Learning Training panel. For a complete description, visit the detailed help documentation.
- Mode and Advanced Options refer to the specifications for the QM optimizations and property calculations. In general, Screening mode is recommended for efficiently and accurately running optoelectronics calculations. See the Optoelectronics tutorial and help topic for more information
- Using the Property section of the panel, you can add properties to your Multi-property optimization (MPO):
- Materials design is often a multi-objective optimization problem that requires a balance among many properties. Multi-parameter profiles condense values for a collection of properties into a single numeric value, i.e. the MPO score, allowing for rapid compound prioritization
- Choose a property to add to your MPO, as well as if you would like to target a specific value, or maximize or minimize this value
- Based on your selection, input values and/or tolerances will be used to define ranges for determining if a data point is considered ‘good’ or ‘bad’
- In this example, we target a specific value for a single property, but many applications may require targeting several properties with various criteria
- Adjust the Weight if you would like some properties to be more heavily prioritized in the MPO score determination
- For a complete description of MPO, visit the help documentation
- In the Training parameters section of the panel, define how you would like the active learning protocol to proceed
- Initial set size is the number of compounds on which QM calculations are performed first (selected randomly from the data set)
- Additional compounds per iteration is the number of top candidates that will also be subject to QM calculations after each cycle
- The Stop training if section allows you to specify stopping criteria. Note that OR logic is supported
- Finally, note that this job proceeds in a highly parallelized manner, so take care when selecting your hosts and how to distribute subjobs
- Change the Job name to optoelectronics_al_iridium
This job takes several days on a 10+ CPU host. It is not necessary to run the job for this tutorial. However, if you would like to, adjust the job settings as needed and click Run
- Otherwise, close the Optoelectronics Active Learning Training panel and we will proceed to import pre-generated results
4. Analyzing the Output of the Active Learning
We will proceed to import and analyze the results from pre-generated files downloaded in Section 2. If you chose to perform the job yourself, feel free to proceed with your own data instead. Note that if you ran the job yourself, your data may have some variations.
- From the main menu, go to File > Import Structures
- Navigate to the provided files and choose the Section_04 > optoelectronics_al_iridium > optoelectronics_al_iridium-out.maegz file
- Click Open
- A new entry group is added to the entry list entitled optoelectronics_al_iridium-out1 (249) containing the same 249 entries as the original data set
Various properties from the job are now associated with the output entries and can be visualized in the Project Table.
Two new properties are displayed by default: optoelal MPO Score and optelec Triplet Energy (eV)
The former refers to the MPO score for each entry predicted from the final machine learning model from the active learning protocol. The latter is the triplet energy calculated by DFT for any entries that underwent QM calculations.
- Go to the Property Tree, expand All > Materials Science > Secondary and check optoelal iteration and optoelal MPO Score DFT
- The two additional properties are added to the Project Table
- optoelal iteration refers to the iteration in the active learning protocol in which the entry was selected for DFT calculations (if any)
- MPO Score DFT refers to the MPO score calculated for that entry using the triplet energy from the QM job rather than the machine learning algorithm
Note: You can export directly from the Project Table to spreadsheet form if needed
- Close the Project Table before proceeding
This active learning protocol can be used both to a) find the ‘best’ molecules in a data set and b) efficiently develop ML models. To visualize the models and make predictions, the Optoelectronics Active Learning Prediction panel is used.
- Go to Tasks > Materials > Informatics > Optoelectronics Active Learning Prediction
- The Review and Apply Optoelectronics Active Learning Model panel opens
- Click Load models
- Navigate to the provided files and choose all of the Section_04 > optoelectronics_al_iridium > optoelectronics_al_iridium_mpo_*.alomgz files, where the * represents each iteration of the active learning loop
- Click Open
All fifteen ML models (one per iteration) are loaded into the panel. For each model, you can see the various parameters of interest, as well as view the corresponding scatter plot.
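If you handle these model files in your own scripts, keep in mind that plain alphabetical sorting places iteration 10 before iteration 2. The sketch below orders the files by the iteration number embedded in the file name. The file name pattern is taken from this tutorial; the assumption that the iterations are numbered 1 through 15 is ours.

```python
import re
from pathlib import Path

# File name pattern used by the pre-generated MPO models in this tutorial
MODEL_NAME = re.compile(r"^optoelectronics_al_iridium_mpo_(\d+)\.alomgz$")

def iteration_of(path):
    """Return the iteration number encoded in a model file name, or None."""
    match = MODEL_NAME.match(Path(path).name)
    return int(match.group(1)) if match else None

def models_by_iteration(paths):
    """Keep only MPO model files and order them by iteration number."""
    numbered = [(iteration_of(p), str(p)) for p in paths]
    return [name for number, name in sorted(n for n in numbered if n[0] is not None)]

# Example with generated names (assumed numbering 1..15); for real files, pass
# Path("Section_04/optoelectronics_al_iridium").glob("*.alomgz") instead
names = [f"optoelectronics_al_iridium_mpo_{i}.alomgz" for i in range(1, 16)]
ordered = models_by_iteration(sorted(names))
print([iteration_of(name) for name in ordered])
```

Files that do not match the pattern, such as the triplet energy models, are skipped rather than raising an error.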
It is important to note that the ‘best’ model is sometimes found before the final iteration. In this example, we will choose a model with a high R² for both the test and training sets as our ‘best’ model. In practice, please choose the model based on your research needs. In this case, it appears that the sixth iteration (optoelectronics_al_iridium_mpo_6.alomgz) gives the best ML model for predicting MPO score (R² (training) = 0.904, R² (test) = 0.895; these values are discussed in the help documentation).
It is also important to note that these models are for predicting MPO score as opposed to triplet energy directly.
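For readers less familiar with the statistic, the sketch below computes the standard coefficient of determination and applies one simple selection rule: prefer the iteration whose weaker score (training or test) is highest. We assume the panel reports the standard definition; the selection rule is only one reasonable choice. Apart from the iteration 6 values quoted above, the numbers in `scores` are placeholders invented for the example, not results from this data set.

```python
def r_squared(observed, predicted):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def pick_model(scores):
    """Choose the iteration whose weaker score (training or test) is highest."""
    return max(scores, key=lambda iteration: min(scores[iteration]))

# iteration -> (R2 training, R2 test)
scores = {
    5: (0.95, 0.70),    # placeholder: fits training data well, generalizes poorly
    6: (0.904, 0.895),  # values quoted in this tutorial
    7: (0.88, 0.86),    # placeholder
}
print("selected iteration:", pick_model(scores))

# A perfect prediction gives 1.0; always predicting the mean gives 0.0
print(r_squared([1, 2, 3, 4], [1, 2, 3, 4]), r_squared([1, 2, 3, 4], [2.5] * 4))
```

Taking the minimum of the two scores penalizes models that look good on the training set but degrade on the test set.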
- Click Show for optoelectronics_al_iridium_mpo_6.alomgz
- The scatter plot is shown. The predicted data are the MPO scores based on the ML algorithm and the trained data are the MPO scores as determined by the DFT calculated properties for all of the compounds preceding this iteration
- Click OK to close the scatter plot
We can also view the ML models for predicting triplet energy directly. For example:
- Click Load models
- Navigate to the provided files and choose the optoelectronics_al_iridium_triplet_6.alomgz file
- Click Open
- An additional model is added to the table
- Click Show for optoelectronics_al_iridium_triplet_6.alomgz
This ML model is for directly predicting triplet energy, and the corresponding scatter plot can again be visualized, this time comparing predicted to calculated triplet energy.
- Click OK to close the scatter plot
- Select all 15 MPO models (Shift + Click)
- Click Plot Learning Curves
The Learning Curves plot displays the R² (Training) and R² (Test) for the ML model at each iteration of the active learning loop. This learning curve is useful for checking the convergence of a model and choosing the ‘best’ one. Here we plot the MPO learning curves, but we could also plot the curves directly for our property of interest. After iteration 2, the best MPO score does not change: the top-scoring compound remains the same for the rest of the iterations.
- Click OK
Proceed to explore any of the models as you wish. At this point, there are a few possible next steps. You could proceed to:
- Scan the Project Table for the compounds that best match your target triplet energy.
Optional: To facilitate searching, you can sort the project table by clicking the arrow under the “optelec Triplet Energy (eV)” column and clicking “Sort All (Ascending)”
- Optional: Use one of the top models to make predictions on a new set of compounds. To do so, select the structures from the Entry List, choose a model from the Optoelectronics Active Learning Prediction panel and run the job
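If you export the Project Table to a spreadsheet, the search for the closest match can also be scripted. Sorting by triplet energy alone still leaves you to find 2.80 eV in the list, whereas ranking by absolute deviation from the target puts the best matches first. The sketch below assumes a CSV export with a Title column and empty cells for entries that never underwent DFT; the column name for the energy is the property name shown in this tutorial, and the rows are invented for illustration.

```python
import csv
import io

TARGET_EV = 2.80
ENERGY_COLUMN = "optelec Triplet Energy (eV)"

def rank_by_target(csv_text, target=TARGET_EV, column=ENERGY_COLUMN,
                   title_column="Title"):
    """Return (deviation, title, energy) tuples, closest to the target first."""
    ranked = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # no DFT value for this entry
        energy = float(raw)
        ranked.append((abs(energy - target), row[title_column], energy))
    return sorted(ranked)

# Hypothetical export: titles and energies are made up for this example
example = """Title,optelec Triplet Energy (eV)
complex_A,2.61
complex_B,
complex_C,2.83
complex_D,3.05
"""

for deviation, title, energy in rank_by_target(example):
    print(f"{title}: {energy:.2f} eV (off by {deviation:.2f} eV)")
```

To use this on a real export, read the file contents into a string and pass it to `rank_by_target`, adjusting `title_column` if your export names the column differently.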
5. Conclusion and References
In this tutorial, we learned how to use the Optoelectronics Active Learning panels to efficiently identify the iridium complexes with a triplet energy closest to a specified target value. These panels allow us to fine-tune properties for optoelectronic applications without having to perform expensive DFT calculations for a large number of compounds. Furthermore, the machine learning models produced by the active learning workflow can be used to predict properties of new compounds, which is useful for screening large libraries.
For further learning:
For introductory content, focused on navigating the Schrödinger Materials Science interface, an Introduction to Materials Science Maestro tutorial is available. Please visit the materials science training website for access to 70+ tutorials. For scientific inquiries or technical troubleshooting, submit a ticket to our Technical Support Scientists at help@schrodinger.com.
For self-paced, asynchronous, online courses in Materials Science modeling, including access to Schrödinger software, please visit the Schrödinger Online Learning portal on our website.
For some related practice regarding organic electronics, proceed to explore other relevant tutorials:
- Optoelectronics
- Kinetic Monte Carlo (KMC) Charge Mobility
- Band Shape
- Excited State Analysis
- Calculating Transition Dipole Moments (TDM), TDM Distributions, and Order Parameter
- Singlet Excitation Energy Transfer
- Singlet-Triplet Intersystem Crossing Rate
- Optoelectronics Active Learning
- Dielectric Properties
- Modeling Surfaces
- Molecular Deposition
- Building a Polymer-Polymer Interface Model
For some related practice regarding machine learning, proceed to explore other relevant tutorials:
- Machine Learning for Materials Science
- Polymer Descriptors for Machine Learning
- Periodic Descriptors for Inorganic Solids
- Machine Learning for Ionic Conductivity
- Machine Learning for Sweetness
- Cheminformatics Machine Learning for Homogeneous Catalysis
- Machine Learning Property Prediction
- Molecular Dynamics Descriptors for Machine Learning
- Machine Learning for Formulations
- Optimizing Viscosity and Cost in Formulations with Missing Structural Data
For further reading:
- Design of Organic Electronic Materials With a Goal-Directed Generative Model Powered by Deep Neural Networks and High-Throughput Molecular Simulations. DOI:10.3389/fchem.2021.800370
- Active Learning Accelerates Design and Optimization of Hole-Transporting Materials for Organic Electronics. DOI:10.3389/fchem.2021.800371
- Accelerated design and optimization of OLED materials via active learning. DOI:10.1117/12.2598140
- Atomistic-scale Simulation for the Analysis, Optimization and Accelerated Development of Organic Optoelectronic Materials. DOI:10.11370/isj.54.561
- Estimation of charge carrier mobility in amorphous organic materials using percolation corrected random-walk model. DOI:10.1016/j.orgel.2015.11.021
- Virtual screening for OLED materials. DOI:10.1117/12.2066565
- High-throughput quantum chemistry and virtual screening for OLED material components. DOI:10.1117/12.2025092
- Virtual screening of electron acceptor materials for organic photovoltaic applications. DOI:10.1088/1367-2630/15/10/105029
- Organic Electronics overview from the Schrödinger website
- Help documentation on Optoelectronic Properties
- Help documentation on Optoelectronics Active Learning
6. Glossary of Terms
Entry List - a simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion
Included - the entry is represented in the Workspace, the circle in the In column is blue
Project Table - displays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data
Recent actions - This is a list of your recent actions, which you can use to reopen a panel, displayed below the Browse row. (Right-click to delete.)
Scratch Project - a temporary project in which work is not saved, closing a scratch project removes all current work and begins a new scratch project
Selected - (1) the atoms are chosen in the Workspace. These atoms are referred to as "the selection" or "the atom selection". Workspace operations are performed on the selected atoms. (2) The entry is chosen in the Entry List (and Project Table) and the row for the entry is highlighted. Project operations are performed on all selected entries
Working Directory - the location where files are saved
Workspace - the 3D display area in the center of the main window, where molecular structures are displayed