Optoelectronics Active Learning

Tutorial Created with Software Release: 2025-3
Topics: Informatics and Team Collaboration, Organic Electronics
Methodology: Machine Learning, Molecular Quantum Mechanics
Products Used: GA Optoelectronics, MS Maestro

Tutorial files (10 MB)

This tutorial is written for use with a 3-button mouse with a scroll wheel.
Words found in the Glossary of Terms (Section 6), such as Workspace, are defined at the end of this tutorial.
Abstract:

In this tutorial, we will learn to use the Optoelectronics Active Learning panels in the Materials Science (MS) Maestro interface. We will study an example set of octahedral iridium complexes, with the goal of efficiently identifying the complexes whose triplet energy is closest to a specified target value.

Tutorial Content
  1. Introduction to Optoelectronics Active Learning

  2. Creating Projects and Importing Structures

  3. Running the Active Learning Training

  4. Analyzing the Output of the Active Learning

  5. Conclusion and References

  6. Glossary of Terms

1. Introduction to Optoelectronics Active Learning

The discovery and development of optoelectronic materials are of great interest, particularly with respect to organic light-emitting diodes (OLEDs) and organic photovoltaics (OPVs). When identifying optimal materials for optoelectronic applications, various properties are often considered: internal and condensed-phase properties, efficient charge injection and transport, and chemical and thermophysical stability, to name a few. The optoelectronics capabilities in the Materials Science Suite are designed to leverage rapid screening to complement experimental development by elucidating molecular properties and informing future synthetic targets.

One tool for assessing the internal properties of optoelectronic materials is the Optoelectronics Calculations panel, which implements efficient screening modes using the MS Jaguar quantum mechanics engine, minimizing computational expense while delivering accurate data. For an overview of the Optoelectronics Calculations panel, see the Optoelectronics tutorial.

In some instances, despite the efficiency of the Optoelectronics Calculations approach, it may be preferable not to perform expensive quantum mechanical (QM) calculations on every molecule in a set. For example, with large data sets (>1000 molecules) and limited computational resources, a machine learning (ML) approach can intelligently select a subset of candidates for QM calculations and identify the best candidate molecules.

The Optoelectronics Active Learning Training and Prediction panels are designed to identify promising molecules while limiting computational expense. In the active learning protocol, a set of properties is chosen, each with a target value or range (e.g. maximize oxidation potential, minimize triplet energy, find a reduction potential close to 2.4 eV). Next, QM calculations are run on an initial set of molecules of a chosen size. An ML model is built from these QM results and applied to the rest of the molecules in the library to predict their properties. In each subsequent iteration, several additional molecules (identified as promising by the previous ML model) undergo QM calculations, and a new ML model is trained on the enlarged data set. This iterative approach proceeds until a stopping criterion is met. Ideally, QM calculations are needed for relatively few molecules, yet enough to produce an ML model good enough to identify the molecule(s) that best match the desired properties.
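The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the panel's implementation: each molecule is reduced to a single hypothetical numeric descriptor, `oracle` stands in for the expensive DFT-plus-MPO scoring step, the surrogate model is a toy k-nearest-neighbour average, and the acquisition rule simply sends the top-predicted candidates to the oracle. The real workflow uses its own descriptors, ML models, and selection strategy.

```python
import random
from typing import Callable, Dict, Sequence


def active_learning(
    pool: Sequence[float],
    oracle: Callable[[float], float],
    initial_size: int = 10,
    per_iteration: int = 5,
    max_iterations: int = 15,
    k: int = 3,
    seed: int = 0,
) -> Dict[int, float]:
    """Greedy active-learning loop over a pool of candidate molecules (illustrative sketch).

    pool: one numeric descriptor per molecule (hypothetical stand-in for real features)
    oracle: expensive scoring call (stand-in for DFT + MPO scoring); higher is better
    Returns {pool index: oracle score} for every molecule sent to the oracle.
    """
    rng = random.Random(seed)
    labeled: Dict[int, float] = {}

    # Step 1: score a randomly chosen initial set with the expensive oracle.
    for i in rng.sample(range(len(pool)), initial_size):
        labeled[i] = oracle(pool[i])

    def predict(x: float) -> float:
        # Toy surrogate: mean score of the k nearest labeled molecules.
        # It "retrains" implicitly because it always reads the current labeled set.
        nearest = sorted(labeled, key=lambda j: abs(pool[j] - x))[:k]
        return sum(labeled[j] for j in nearest) / len(nearest)

    # Step 2: iterate until the iteration limit (the stopping criterion) is reached.
    for _ in range(max_iterations):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break  # every molecule has already been scored
        # Rank unscored molecules by predicted score; send the top few to the oracle.
        ranked = sorted(unlabeled, key=lambda i: predict(pool[i]), reverse=True)
        for i in ranked[:per_iteration]:
            labeled[i] = oracle(pool[i])

    return labeled


if __name__ == "__main__":
    # Hypothetical library: 249 molecules with descriptors spread evenly over 0..4.
    library = [4.0 * n / 248 for n in range(249)]

    def closeness_to_target(x: float) -> float:
        return -abs(x - 2.80)  # higher is better; 0 means exactly on target

    scored = active_learning(library, closeness_to_target)
    best = max(scored, key=scored.get)
    print(f"{len(scored)} oracle calls; best descriptor found: {library[best]:.3f}")
```

With 10 initial compounds, 5 per iteration, and 15 iterations, this sketch calls the oracle 85 times for a 249-molecule pool. The greedy top-k choice is the simplest acquisition rule; practical active learning schemes often add an exploration term so that uncertain regions of the library are also sampled.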

Figure 1. Active learning framework to identify best candidate molecules with minimal number of density functional theory (DFT) calculations.

For a complete description of the Optoelectronics Active Learning workflow, visit the help documentation. A publication is also available from Schrödinger scientists detailing the active learning approach for OLED discovery.

In this tutorial, we will learn to use the Optoelectronics Active Learning Training and Prediction panels to identify the iridium complexes most likely to match a target triplet energy. The data set is relatively small for demonstration purposes, but the same workflow can be applied to significantly larger libraries (e.g. thousands of molecules). Moreover, although we target only a single property (triplet energy) here, we will discuss how the panels can find optoelectronic molecules that meet several conditions (as described above) using multi-property optimization (MPO).

2. Creating Projects and Importing Structures

At the start of the session, change the file path to your chosen Working Directory to make file navigation easier. Each session in MS Maestro begins with a default Scratch Project, a temporary project in which work is not saved. An MS Maestro project stores all your data and has a .prj extension. A project may contain numerous entries corresponding to imported structures, as well as the output of modeling-related tasks. Once a project is saved, it is automatically saved each time a change is made.

Structures can be built in MS Maestro or imported using File > Import Structures (or by drag-and-drop), and are added to the Entry List and Project Table. The Entry List is located to the left of the Workspace. If you would like an expanded view of your project data, open the Project Table with Ctrl+T (Cmd+T) or Window > Project Table.

  1. Double-click the Materials Science icon to launch MS Maestro

Figure 2-1. Change Working Directory option.

  1. Go to File > Change Working Directory
  2. Find your directory, and click Choose
  3. Pre-generated input and results files are included for running jobs or examining output. Download the zip file here: schrodinger.com/sites/default/files/s3/release/current/Tutorials/zip/opto_al.zip
  4. After downloading the zip file, unzip the contents in your Working Directory for ease of access throughout the tutorial

Figure 2-2. Save Project panel.

  1. Go to File > Save Project As
  2. Change the File name to optoelectronics_AL_tutorial, click Save
    • The project is now named optoelectronics_AL_tutorial.prj

Figure 2-3. Importing the iridium complexes.

For this tutorial, we will study a relatively small data set of octahedral iridium complexes for demonstration purposes; in practice, the active learning and prediction workflow is typically applied to larger data sets. Let’s proceed to import the structures:

  1. Go to File > Import Structures
  2. Navigate to where you downloaded the tutorial files (if not already in your Working Directory) and select iridium_complexes.mae
  3. Click Open
    • A new entry group is added to the entry list containing 249 entries

Note: These structures have not been optimized at the QM level. The workflow utilized herein will include QM optimizations on a subset of the data.

3. Running the Active Learning Training

Suppose we wish to efficiently identify the iridium complex(es) in the molecular library with a triplet energy closest to 2.80 eV for the design of an electronic device. Rather than performing QM calculations on every molecule, this section demonstrates how to use the Optoelectronics Active Learning Training panel to achieve this goal.

Figure 3-1. Selecting the entry group and opening the panel.

  1. Select the entire optoelectronic_al_Ir_complexes (249) entry group from the entry list
    • Selecting an entry group highlights all of its entries; you can do this by clicking the entry group name
  2. Go to Tasks > Materials > Informatics > Optoelectronics Active Learning Training

Figure 3-2. Selecting a property.

  1. In the Property section, choose First triplet Energy and Targeted Value
    • This indicates that we wish to target complexes with triplet energies near a specific value
  2. Click Add
    • Triplet Energy - Targeted Value appears in the Multi-property Optimization (MPO) section of the panel

Note: There are many properties available in the Optoelectronics Active Learning panel. For more on these properties, please visit the help documentation.

Figure 3-3. Defining the property to be optimized.

  1. Set the Targeted value to 2.80 eV
  2. Set the Inner tolerance to 0.20 eV, the Outer tolerance to 0.40 eV and retain the Weight of 1.00
    • These quantities determine which triplet energies the model considers ‘good’ and ‘bad’, as described in more detail below
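As an illustration of how an inner and outer tolerance can separate ‘good’ from ‘bad’ values, the sketch below assumes a piecewise-linear desirability: 1 within the inner tolerance of the target, 0 beyond the outer tolerance, and a linear ramp in between. The exact function used by the panel is described in the help documentation and may differ; `targeted_desirability` is a hypothetical helper.

```python
def targeted_desirability(value: float, target: float,
                          inner_tol: float, outer_tol: float) -> float:
    """Score in [0, 1] for a 'Targeted value' property.

    Hypothetical shape: 1 inside the inner tolerance, 0 outside the outer
    tolerance, and a linear ramp in between.
    """
    if not 0 <= inner_tol < outer_tol:
        raise ValueError("require 0 <= inner_tol < outer_tol")
    deviation = abs(value - target)
    if deviation <= inner_tol:
        return 1.0  # 'good': within the inner tolerance
    if deviation >= outer_tol:
        return 0.0  # 'bad': beyond the outer tolerance
    # Linear interpolation between the two tolerance boundaries.
    return (outer_tol - deviation) / (outer_tol - inner_tol)


# Tutorial settings: target 2.80 eV, inner tolerance 0.20 eV, outer tolerance 0.40 eV.
for energy in (2.70, 3.10, 3.30):
    print(energy, round(targeted_desirability(energy, 2.80, 0.20, 0.40), 2))
```

Under this assumed shape, a triplet energy of 2.70 eV scores 1.0, 3.10 eV scores about 0.5, and 3.30 eV scores 0.0.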

Figure 3-4. Defining the Training parameters.

In the Training parameters section:

  1. Set the Initial set size to 10
  2. Set Additional compounds per iteration to 5
  3. For Stop training if, choose The number of iterations reaches and input 15

These settings are used to parameterize the active learning training protocol. QM calculations will be performed on ten compounds initially. For each subsequent loop, the top five compounds identified by the ML algorithm will be used as input for QM calculations. The looping procedure will stop after 15 iterations.
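A quick way to gauge the cost of these settings is to bound the number of QM calculations. Assuming every iteration adds exactly five new compounds after the initial ten (the actual count may be lower, depending on how the panel counts iterations), the budget is at most 10 + 5 × 15 = 85 of the 249 complexes. `max_qm_calculations` is a hypothetical helper for this arithmetic.

```python
def max_qm_calculations(initial_size: int, per_iteration: int, iterations: int) -> int:
    """Upper bound on QM jobs, assuming each iteration adds `per_iteration` new compounds."""
    return initial_size + per_iteration * iterations


budget = max_qm_calculations(initial_size=10, per_iteration=5, iterations=15)
library_size = 249
# Prints the bound and its share of the library (about a third here).
print(f"at most {budget} of {library_size} compounds ({budget / library_size:.0%})")
```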

Let’s familiarize ourselves a bit more with some of the options in the Optoelectronics Active Learning Training panel. For a complete description, visit the detailed help documentation.

  • Mode and Advanced Options refer to the specifications for the QM optimizations and property calculations. In general, Screening mode is recommended for efficiently and accurately running optoelectronics calculations. See the Optoelectronics tutorial and help topic for more information
  • Using the Property section of the panel, you can add properties to your Multi-property optimization (MPO):
    • Materials design is often a multi-objective optimization problem that requires a balance among many properties. Multi-parameter profiles condense the values of a collection of properties into a single numeric value, i.e. the MPO score, allowing rapid compound prioritization
    • Choose a property to add to your MPO, and whether you would like to target a specific value or to maximize or minimize it
    • Based on your selection, input values and/or tolerances will be used to define ranges for determining if a data point is considered ‘good’ or ‘bad’
    • In this example, we target a specific value for a single property, but many applications may require targeting several properties with various criteria
    • Adjust the Weight if you would like some properties to be more heavily prioritized in the MPO score determination
    • For a complete description of MPO, visit the help documentation
  • In the Training parameters section of the panel, define how you would like the active learning protocol to proceed
    • Initial set size is the initial number of compounds on which QM calculations are performed (selected randomly from the data set)
    • Additional compounds per iteration is the number of top candidates that will also be subject to QM calculations after each cycle
    • The Stop training if section allows you to specify stopping criteria; multiple criteria can be combined with OR logic
  • Finally, note that this job is highly parallelized, so take care when selecting your hosts and deciding how to distribute subjobs
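The MPO bullets above describe condensing several per-property scores into one number, with weights controlling priority. One common way to do this, shown here purely as an assumption, is a weighted arithmetic mean of per-property desirabilities in [0, 1]; the panel's actual aggregation rule is documented in the help topic and may differ. The property names in the example are hypothetical.

```python
from typing import Mapping


def mpo_score(desirabilities: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted arithmetic mean of per-property desirabilities (each in [0, 1]).

    The aggregation rule is an assumption for illustration only.
    """
    if set(desirabilities) != set(weights):
        raise ValueError("every property needs exactly one weight")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[p] * desirabilities[p] for p in weights) / total


# Single property, as in this tutorial: the MPO score equals its desirability.
print(mpo_score({"triplet_energy": 0.5}, {"triplet_energy": 1.0}))
# Two hypothetical properties, with triplet energy weighted twice as heavily.
print(mpo_score({"triplet_energy": 1.0, "oxidation_potential": 0.4},
                {"triplet_energy": 2.0, "oxidation_potential": 1.0}))
```

Under this assumption, a single-property MPO reduces to that property's desirability, so the Weight only matters once a second property is added.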

Figure 3-5. Preparing and running the job.

  1. Change the Job name to optoelectronics_al_iridium

This job takes several days on a 10+ CPU host, so it is not necessary to run it for this tutorial. If you would like to run it anyway, adjust the job settings as needed and click Run.

  1. Otherwise, close the Optoelectronics Active Learning Training panel; we will proceed to import pre-generated results

4. Analyzing the Output of the Active Learning

We will proceed to import and analyze the results from the pre-generated files downloaded in Section 2. If you chose to run the job yourself, feel free to use your own data instead; note that your results may vary somewhat from those shown here.

Figure 4-1. Importing the pregenerated results.

  1. From the main menu, go to File > Import Structures
  2. Navigate to the provided files and choose the Section_04 > optoelectronics_al_iridium > optoelectronics_al_iridium-out.maegz file
  3. Click Open
    • A new entry group is added to the entry list entitled optoelectronics_al_iridium-out1 (249) containing the same 249 entries as the original data set

Figure 4-2. Analyzing the Project Table.

Various properties from the job are now associated with the output entries and can be viewed in the Project Table.

  1. Open the Project Table (Ctrl+T or Cmd+T)

Two new properties are displayed by default: optoelal MPO Score and optelec Triplet Energy (eV).

The former is the MPO score for each entry predicted by the final ML model of the active learning protocol. The latter is the DFT-calculated triplet energy, available only for entries that underwent QM calculations.

Figure 4-3. Adding and viewing additional properties from the Property Tree.

  1. Go to the Property Tree, expand All > Materials Science > Secondary, and check optoelal iteration and optoelal MPO Score DFT
    • The two additional properties are added to the Project Table
    • optoelal iteration refers to the iteration in the active learning protocol in which the entry was selected for DFT calculations (if any)
    • optoelal MPO Score DFT refers to the MPO score calculated for that entry from the DFT triplet energy rather than from the ML prediction

Note: You can export data directly from the Project Table to a spreadsheet if needed.

  1. Close the Project Table before proceeding

Figure 4-4. Importing the ML models generated by the active learning loop.

This active learning protocol can be used both to a) find the ‘best’ molecules in a data set and b) efficiently develop ML models. To visualize the models and make predictions, use the Optoelectronics Active Learning Prediction panel.

  1. Go to Tasks > Materials > Informatics > Optoelectronics Active Learning Prediction
  2. Click Load models

Figure 4-5. Selecting model files.

  1. Navigate to the provided files and choose all of the Section_04 > optoelectronics_al_iridium > optoelectronics_al_iridium_mpo_*.alomgz files, where the * represents each iteration of the active learning loop
  2. Click Open

Figure 4-6. Viewing the MPO ML models.

All fifteen ML models (one per iteration) are loaded into the panel. For each model, you can see the various parameters of interest, as well as view the corresponding scatter plot.

It is important to note that the ‘best’ model is not always the one from the final iteration. In this example, we choose as our ‘best’ model one with a high R² for both the training and test sets; in practice, choose the model based on your research needs. Here, the sixth iteration (optoelectronics_al_iridium_mpo_6.alomgz) appears to give the best ML model for predicting MPO score (R² (training) = 0.904, R² (test) = 0.895; these metrics are discussed in the help documentation).

It is also important to note that these models predict the MPO score rather than the triplet energy directly.
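The selection criterion above (a high R² for both the training and test sets) can be made explicit. The sketch below uses the standard definition R² = 1 − SS_res/SS_tot and a hypothetical rule, `pick_model`, that chooses the model whose weaker R² is highest. The panel may compute R² differently, and this rule is one reasonable choice rather than the tutorial's prescribed method.

```python
from typing import Mapping, Sequence, Tuple


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def pick_model(r2_by_model: Mapping[str, Tuple[float, float]]) -> str:
    """Hypothetical rule: choose the model whose weaker R^2 (training or test) is highest."""
    return max(r2_by_model, key=lambda name: min(r2_by_model[name]))


# Perfect predictions give R^2 = 1; predicting the mean everywhere gives R^2 = 0.
print(r_squared([0.2, 0.5, 0.9], [0.2, 0.5, 0.9]))
```

For the iteration-6 model above, the weaker score is R² (test) = 0.895; other iterations would be compared on the same basis.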

Figure 4-7. Scatter plot for an MPO ML model.

  1. Click Show for optoelectronics_al_iridium_mpo_6.alomgz
    • The scatter plot is shown. The predicted data are the MPO scores from the ML model, and the trained data are the MPO scores derived from the DFT-calculated properties of all compounds calculated before this iteration
  2. Click OK to close the scatter plot

Figure 4-8. Loading a triplet energy ML model.

We can also view the ML models for predicting triplet energy directly. For example:

  1. Click Load models
  2. Navigate to the provided files and choose the optoelectronics_al_iridium_triplet_6.alomgz file
  3. Click Open
    • An additional model is added to the table

Figure 4-9. Scatter plot for a triplet energy ML model.

  1. Click Show for optoelectronics_al_iridium_triplet_6.alomgz

This ML model is for directly predicting triplet energy, and the corresponding scatter plot can again be visualized, this time comparing predicted to calculated triplet energy.

  1. Click OK to close the scatter plot

Figure 4-10. Plotting the Learning Curves.

  1. Select all 15 MPO models (Shift + Click)
  2. Click Plot Learning Curves

Figure 4-11. Viewing the Learning Curves.

The Learning Curves plot displays R² (Training) and R² (Test) for the ML model at each iteration of the active learning loop. Learning curves are useful for checking model convergence and choosing the ‘best’ model. Here we plot the MPO learning curves, but curves for the property of interest could also be plotted directly. In this example, the best MPO score does not change after iteration 2: the top-scoring compound remains the same for all subsequent iterations.
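The observation that the best MPO score stops changing after iteration 2 can also be checked from exported Project Table data. This sketch assumes you have (iteration, MPO score DFT) pairs taken from the optoelal iteration and optoelal MPO Score DFT columns; the helper names and the example values are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def best_so_far(rows: Iterable[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Running best DFT MPO score after each iteration.

    rows: (iteration, MPO score DFT) pairs, one per DFT-calculated entry.
    """
    by_iteration: Dict[int, List[float]] = defaultdict(list)
    for iteration, score in rows:
        by_iteration[iteration].append(score)
    best, curve = float("-inf"), []
    for iteration in sorted(by_iteration):
        best = max(best, max(by_iteration[iteration]))
        curve.append((iteration, best))
    return curve


def last_improvement(curve: List[Tuple[int, float]]) -> int:
    """Iteration at which the running best score last increased."""
    if not curve:
        raise ValueError("empty curve")
    last, previous = curve[0]
    for iteration, best in curve[1:]:
        if best > previous:
            last = iteration
        previous = best
    return last


# Invented values for illustration only.
rows = [(1, 0.20), (1, 0.55), (2, 0.93), (3, 0.71), (4, 0.88)]
curve = best_so_far(rows)
print(curve, last_improvement(curve))
```

In this toy input, the running best last improves at iteration 2, the same pattern described above for the tutorial data.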

  1. Click OK

Figure 4-12. A top candidate structure based on the active learning protocol.

Proceed to explore any of the models as you wish. At this point, there are a few possible next steps. You could proceed to:

  • Scan the Project Table for the compounds that best match your target triplet energy.

Optional: To facilitate searching, you can sort the Project Table by clicking the arrow under the “optelec Triplet Energy (eV)” column and choosing “Sort All (Ascending)”; entries with values near 2.80 eV are then listed next to each other

Figure 4-13. Using the panel for prediction.

  • Optional: Use one of the top models to make predictions on a new set of compounds. To do so, select the structures in the entry list, choose a model in the Optoelectronics Active Learning Prediction panel, and run the job
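For the first option above (scanning for the compounds closest to the target), sorting by triplet energy groups similar values but does not rank entries by their distance from 2.80 eV. If you export the Project Table to a CSV file, a short script can do that ranking. The column headers `Title` and `optelec Triplet Energy (eV)` are assumptions about the export format, and `closest_to_target` is a hypothetical helper; adjust the names to match your file.

```python
import csv
import io
from typing import List, Tuple


def closest_to_target(csv_text: str, target: float = 2.80,
                      energy_column: str = "optelec Triplet Energy (eV)",
                      name_column: str = "Title", top: int = 5) -> List[Tuple[str, float]]:
    """Rank DFT-calculated entries by |triplet energy - target|, closest first."""
    ranked = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(energy_column) or "").strip()
        if not raw:
            continue  # entry never went through DFT, so it has no calculated value
        energy = float(raw)
        ranked.append((abs(energy - target), row.get(name_column) or "", energy))
    ranked.sort()  # smallest deviation from the target first
    return [(name, energy) for _, name, energy in ranked[:top]]


# Toy export with invented entries; Ir_b has no DFT value and is skipped.
example = "Title,optelec Triplet Energy (eV)\nIr_a,2.95\nIr_b,\nIr_c,2.78\nIr_d,3.40\n"
print(closest_to_target(example, top=2))
```

To use it on a real export, read the file into a string first (e.g. with `open(...).read()`) and pass that string in.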

5. Conclusion and References

In this tutorial, we learned how to use the Optoelectronics Active Learning panels to efficiently identify the iridium complexes with a triplet energy closest to a specified target value. These panels allow us to optimize properties for optoelectronic applications without performing expensive DFT calculations on a large number of compounds. Furthermore, the ML models generated by the active learning workflow can be used to predict the properties of new compounds, which is useful for screening large compound libraries.

For further learning:

For introductory content, focused on navigating the Schrödinger Materials Science interface, an Introduction to Materials Science Maestro tutorial is available. Please visit the materials science training website for access to 70+ tutorials. For scientific inquiries or technical troubleshooting, submit a ticket to our Technical Support Scientists at help@schrodinger.com.

For self-paced, asynchronous, online courses in Materials Science modeling, including access to Schrödinger software, please visit the Schrödinger Online Learning portal on our website.

For some related practice regarding organic electronics, proceed to explore other relevant tutorials:

For some related practice regarding machine learning, proceed to explore other relevant tutorials:

For further reading:

6. Glossary of Terms

Entry List - a simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion

Included - the entry is represented in the Workspace, the circle in the In column is blue

Project Table - displays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data

Recent actions - This is a list of your recent actions, which you can use to reopen a panel, displayed below the Browse row. (Right-click to delete.)

Scratch Project - a temporary project in which work is not saved; closing a scratch project removes all current work and begins a new scratch project

Selected - (1) the atoms are chosen in the Workspace. These atoms are referred to as "the selection" or "the atom selection". Workspace operations are performed on the selected atoms. (2) The entry is chosen in the Entry List (and Project Table) and the row for the entry is highlighted. Project operations are performed on all selected entries

Working Directory - the location where files are saved

Workspace - the 3D display area in the center of the main window, where molecular structures are displayed