MD Descriptors Panel

Calculate descriptors for molecular structures using molecular dynamics simulations. The descriptors can then be used in machine learning.

To display this panel: click the Tasks button and browse to Materials → Informatics → Molecular Dynamics

The following licenses are required to use this panel: MS Maestro, MS Informatics, OPLS (optional), MS Force Field Applications (optional), Desmond

Using the MD Descriptors Panel

This panel utilizes molecular dynamics (MD) simulations to generate descriptors that capture intermolecular interactions. These MD descriptors can be integrated into machine learning (ML) models to predict properties like viscosity, melting point, and glass transition temperatures, which rely heavily on these interactions. By supplementing traditional cheminformatics descriptors with MD descriptors from this panel, the accuracy of ML models can be significantly enhanced, particularly in cases involving small datasets.

The MD Descriptors panel accepts pure materials and formulations as inputs. Pure materials are defined as molecular compounds. See the Use pure materials from option and option menu description for more information on inputting pure materials. Formulations are defined as a mixture of multiple chemical species with a composition ratio for each component. Formulations must be input as a CSV file containing structural and composition information about the formulations. For each formulation, a single entry is added to the Project Table containing one structure of each component of the formulation for visualization. These entries have secondary properties containing the names of the components (Component n) and the composition ratio (Composition Weight %).

The CSV file for formulations must adhere to a specific format with headers that are compatible with the panel. These headers must include identification for each formulation (ID), the structures of the components in SMILES format (SMILES_n) or paths to the files containing the component structure (STRUCTURE_FILE_n), and their relative compositions in percentages (comp_n). Optionally, you can also specify the names of all components (label), separated by the pipe ("|") character and listed in the same order as the SMILES_n columns, and the total number of components in the formulation (num_components). If label and num_components are not specified, they are inferred from the provided SMILES strings. A component whose SMILES_n field is left empty should not be counted in num_components. The compositions (comp_n) must sum to 100% for each formulation.

For example, a system of up to 3 components should have 3 columns for SMILES strings or structure files, and compositions. The headers for such a CSV file are:

ID,SMILES_0,SMILES_1,SMILES_2,STRUCTURE_FILE_0,STRUCTURE_FILE_1,STRUCTURE_FILE_2,comp_0,comp_1,comp_2,label,num_components
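
The header row follows a regular pattern, so it can be generated for any maximum number of components. The following is a minimal sketch of that pattern. The function name formulation_headers is hypothetical and is not part of the panel or its driver script.

```python
def formulation_headers(max_components):
    """Build the CSV header row for formulations with up to max_components components.

    Hypothetical helper for illustration only; it reproduces the column
    order shown in the example header above.
    """
    indices = range(max_components)
    columns = ["ID"]
    columns += [f"SMILES_{i}" for i in indices]
    columns += [f"STRUCTURE_FILE_{i}" for i in indices]
    columns += [f"comp_{i}" for i in indices]
    columns += ["label", "num_components"]  # optional columns
    return columns


if __name__ == "__main__":
    # Print the header row for a system of up to 3 components.
    print(",".join(formulation_headers(3)))
```

Indexing starts at 0, so a system of up to 3 components uses the suffixes _0, _1, and _2.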

An example data set could be formatted as follows:

ID,SMILES_0,SMILES_1,SMILES_2,STRUCTURE_FILE_0,STRUCTURE_FILE_1,STRUCTURE_FILE_2,comp_0,comp_1,comp_2,label,num_components
0,,,,/path-to-file/trans-Cinnamaldehyde.mae,,,100.0,0.0,0.0,trans-Cinnamaldehyde,1
1,CC(=O)C(C)C,CCCCCCCOC(C)=O,,,,,50.0,50.0,0.0,3-Methyl-2-butanone|Heptyl acetate,2
2,CCOCCO,CC(C)NC(C)C,CCO,,,,33.3333,33.3333,33.3333,2-Ethoxyethanol|Diisopropylamine|Ethanol,3
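
Before loading a formulation file, it can be useful to check it against the rules above. The following is a rough sketch of such a check, not the validation that the panel itself performs. The function name check_formulations and the tolerance of 0.01 on the composition sum are assumptions; the tolerance is there because values such as 33.3333 do not add up to exactly 100.

```python
import csv
import io
import math

SAMPLE = """\
ID,SMILES_0,SMILES_1,SMILES_2,STRUCTURE_FILE_0,STRUCTURE_FILE_1,STRUCTURE_FILE_2,comp_0,comp_1,comp_2,label,num_components
0,,,,/path-to-file/trans-Cinnamaldehyde.mae,,,100.0,0.0,0.0,trans-Cinnamaldehyde,1
1,CC(=O)C(C)C,CCCCCCCOC(C)=O,,,,,50.0,50.0,0.0,3-Methyl-2-butanone|Heptyl acetate,2
2,CCOCCO,CC(C)NC(C)C,CCO,,,,33.3333,33.3333,33.3333,2-Ethoxyethanol|Diisopropylamine|Ethanol,3
"""


def check_formulations(csv_text, tolerance=0.01):
    """Return a list of (ID, message) problems found in formulation CSV text.

    Hypothetical helper; the tolerance on the composition sum is an assumption.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Count the comp_n columns to find the maximum number of components.
    n_slots = sum(1 for name in reader.fieldnames if name.startswith("comp_"))
    for row in reader:
        # A component is present if it has a SMILES string or a structure file.
        present = [
            i for i in range(n_slots)
            if row.get(f"SMILES_{i}") or row.get(f"STRUCTURE_FILE_{i}")
        ]
        total = sum(float(row.get(f"comp_{i}") or 0.0) for i in range(n_slots))
        if not math.isclose(total, 100.0, rel_tol=0.0, abs_tol=tolerance):
            problems.append((row["ID"], f"compositions sum to {total:g}, not 100"))
        # The optional columns, when given, must agree with the components found.
        if row.get("num_components") and int(row["num_components"]) != len(present):
            problems.append((row["ID"], "num_components does not match the structures given"))
        if row.get("label") and len(row["label"].split("|")) != len(present):
            problems.append((row["ID"], "label does not name every component"))
    return problems


if __name__ == "__main__":
    for formulation_id, message in check_formulations(SAMPLE):
        print(f"Formulation {formulation_id}: {message}")
```

The check reads the optional label and num_components columns only when they are filled in, since the panel infers them when they are absent.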

A high-throughput, routine molecular dynamics simulation is run on each pure material or each formulation. The protocol includes automated construction of amorphous simulation cells of approximately the specified number of atoms, equilibration, and tabulation of descriptors. The descriptors are generated by analyzing the final stage of the molecular dynamics simulation and running free volume, radial distribution function, and structure factor calculations. For more details about the construction of the amorphous cell or the subsequent molecular dynamics simulations, please see Table 1 and Ref. 57.

The output includes a Maestro file, which can be incorporated into the project, and a CSV file with all calculated descriptors. Subjobs in the automated protocol, such as MD simulations, can fail. If a subjob fails, the corresponding structure(s) are not incorporated when the calculation is complete.

There are two types of calculated descriptors: bulk and component. Bulk descriptors are computed over the entire formulation and are labeled according to Table 2. Component descriptors are computed separately for each component in the formulation, and then the average (ave), minimum (min), maximum (max), and standard deviation (std) are taken over all the components in the formulation and reported. For single-component formulations the average, minimum, and maximum are all equal. Pure materials are treated the same as single-component formulations. For more details about the calculated descriptors, please see Table 2 and Ref. 57.
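
The aggregation of component descriptors can be illustrated with a short sketch. It assumes an unweighted average over components and a population standard deviation. Neither choice is confirmed here, so see Ref. 57 for the definitions the panel actually uses. The function name is hypothetical.

```python
import statistics


def aggregate_component_descriptor(values):
    """Summarize one descriptor computed separately for each component.

    values holds one number per component in the formulation.
    Assumptions: components are weighted equally, and std is the
    population standard deviation.
    """
    return {
        "ave": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "std": statistics.pstdev(values),
    }


if __name__ == "__main__":
    # A two-component formulation and a pure material (one component).
    print(aggregate_component_descriptor([1.0, 3.0]))
    print(aggregate_component_descriptor([2.0]))
```

With a single value, the average, minimum, and maximum coincide, which matches the behavior described for single-component formulations.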

The descriptors generated from this panel can be used in building ML models with tools such as the Formulation Machine Learning Panel, AutoQSAR Panel, and DeepAutoQSAR Panel. When applying an ML model trained with MD-based descriptors, ensure the descriptors for the test set are generated using the same simulation parameters as the training set.
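
One simple way to guard against mismatched parameters is to record the simulation settings used for each descriptor set and compare them before applying a model. The sketch below assumes you keep these settings yourself in plain dictionaries. The key names (temperature_K, pressure_bar, max_atoms) and the values are hypothetical and are not names or defaults used by the panel.

```python
def mismatched_settings(train_settings, test_settings):
    """Return {name: (train_value, test_value)} for settings that differ.

    A setting missing from one side is reported with None on that side.
    """
    names = sorted(set(train_settings) | set(test_settings))
    return {
        name: (train_settings.get(name), test_settings.get(name))
        for name in names
        if train_settings.get(name) != test_settings.get(name)
    }


if __name__ == "__main__":
    # Illustrative values only.
    train = {"temperature_K": 300.0, "pressure_bar": 1.0, "max_atoms": 2000}
    test = {"temperature_K": 350.0, "pressure_bar": 1.0, "max_atoms": 2000}
    for name, (train_value, test_value) in mismatched_settings(train, test).items():
        print(f"{name}: training set used {train_value}, test set used {test_value}")
```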

To write out the input file and a script for running the job from the command line, click the arrow next to the Settings button and choose Write. For information on command usage and options, see md_descriptors_driver.py Command Help.

MD Descriptors Panel Features

Use pure materials from option and option menu

Choose the structure source for generation of descriptors.

  • Project Table (n selected entries)—Use the entries that are currently selected in the Project Table or Entry List. The number of entries selected is shown on the menu item. An icon is displayed to the right which you can click to open the Project Table and select entries.
  • Workspace (n included entries)—Use the entries that are currently included in the Workspace, treated as separate structures. The number of entries in the Workspace is shown on the menu item. An icon is displayed to the right which you can click to open the Project Table and include or exclude entries.
  • File—Use the specified file. When this option is selected, the File name text box and Browse button are displayed.

Open Project Table button

Open the Project Table panel, so you can include the entries for the structure source.

File name text box and Browse button

Enter the file name in this text box, or click Browse and navigate to the file. The name of the file you selected is displayed in the text box.

Use formulations from option and Load button

Select this option to calculate descriptors for a CSV file with formulations data. Click Load to open the Formulation Data File dialog box, where you can navigate to the file. The name of the file you selected is displayed to the right of the Load button. This CSV file is copied into the job directory as jobname.csv.

The CSV file for formulations must adhere to a specific format with headers that are compatible with the panel. These headers must include identification for each formulation (ID), the structures of the components in SMILES format (SMILES_n) or paths to the files containing the component structure (STRUCTURE_FILE_n), and their relative compositions in percentages (comp_n). Optionally, you can also specify the names of all components (label), separated by the pipe ("|") character and listed in the same order as the SMILES_n columns, and the total number of components in the formulation (num_components). The compositions (comp_n) must sum to 100% for each formulation.

For example, a system of up to 3 components should have 3 columns for SMILES strings or structure files, and compositions. The headers for such a CSV file are:

ID,SMILES_0,SMILES_1,SMILES_2,STRUCTURE_FILE_0,STRUCTURE_FILE_1,STRUCTURE_FILE_2,comp_0,comp_1,comp_2,label,num_components

See the Using the MD Descriptors Panel section for more information.

Simulation protocols section

Specify options for the MD simulation.

Temperature text box

Specify the temperature to be used, in kelvin.

Pressure text box

Specify the pressure to be used, in bar.

Maximum system size text box

Specify the maximum number of atoms to pack into the amorphous simulation cell. This is treated as a target value: the actual number of atoms may be less than the specified value in order to maintain the specified number of molecules or ratio of molecules.
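
To see why the actual atom count can fall below the specified value, consider converting weight percentages into whole numbers of molecules. The sketch below is only an illustration under stated assumptions and is not the panel's packing algorithm. It assumes that compositions are weight percentages and that molecule counts are rounded down. The function name and arguments are hypothetical.

```python
import math


def molecule_counts(weight_percents, molar_masses, atoms_per_molecule, max_atoms):
    """Estimate whole-molecule counts that keep the composition ratio.

    Returns (counts, total_atoms). The total never exceeds max_atoms
    because each count is rounded down to a whole molecule.
    """
    # Relative number of moles of each component (weight / molar mass).
    moles = [w / m for w, m in zip(weight_percents, molar_masses)]
    # Atoms contributed by one "unit" of the mixture at this mole ratio.
    atoms_per_unit = sum(n * a for n, a in zip(moles, atoms_per_molecule))
    scale = max_atoms / atoms_per_unit
    counts = [math.floor(scale * n) for n in moles]
    total_atoms = sum(c * a for c, a in zip(counts, atoms_per_molecule))
    return counts, total_atoms


if __name__ == "__main__":
    # 50/50 by weight ethanol (9 atoms, 46.07 g/mol) and water (3 atoms, 18.015 g/mol).
    counts, total = molecule_counts([50.0, 50.0], [46.07, 18.015], [9, 3], 1000)
    print(counts, total)
```

Because molecules cannot be split, the rounded counts generally leave the cell slightly under the requested size.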

Save intermediate data option and menu

Select this option to save data from the Desmond MD simulations. By default it is not selected, as the simulation files can be large and are not needed for the generation of descriptors. The menu has two choices:

  • CMS files—save the CMS files from each of the Desmond simulations. These are the files that contain the structure and force field information.
  • CMS and trajectory—save the CMS files and the trajectories from each of the Desmond simulations. Note that trajectory files can be large and may take up a lot of disk space.

Job toolbar

Manage job submission and settings. See Job Toolbar for a description of this toolbar.

The Job Settings button opens the MD Descriptors - Job Settings Dialog Box, where you can make settings for running the job.

Status bar

Use the Reset button to reset the panel to its default settings and clear any data from the panel. If the panel has a Job toolbar, you can also reset the panel from the Settings button menu.

If you can submit a job from the panel, the status bar displays information about the current job settings and status for the panel. The settings include the job name, task name and task settings (if any), number of subjobs (if any) and the host name and job incorporation setting. The job status can include messages about job start, job completion and incorporation.

The status bar also contains the Help button, which opens an option menu with choices to open the help topic for the panel (Documentation), launch Maestro Assistant, or, if available, choose from an option menu of Tutorials. If the panel is used by one or more tutorials, hover over the Tutorials option to display a list of tutorials. Choosing a tutorial opens the tutorial topic.