Maestro File Format

All Schrödinger products use the Maestro file format as their primary method of storing molecular structure information. The Maestro file format is extensible: users and third-party programs can add fields, which are accepted and retained when the files are read by Schrödinger products. Maestro files are also self-describing: data items are named rather than simply identified by position, and the data type (string, real number, integer, or Boolean) is encoded as a single-letter code at the beginning of each data name.

Since Maestro files can accept user-defined fields, it is not possible to give an all-inclusive example of a Maestro file. Instead, this topic gives a general overview of the Maestro file format and describes the components used for most Schrödinger applications.

Basic File Description

Maestro format files are free-format ASCII text. White space (spaces, tab characters, line endings, and so on) serves only to separate data items; the amount and kind of white space is otherwise ignored. Schrödinger applications write files in a consistent layout to enhance human readability, but files are not required to follow this layout. Apart from the requirement that string data items containing blanks be surrounded by double quotes, there are no restrictions on the data items, such as the width or precision of numeric fields.
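
The free-format rule can be illustrated with a small tokenizer. The following Python code is a minimal sketch, not the reader used by Schrödinger products. It assumes that tokens are separated by any run of white space and that quoted strings contain no embedded or escaped double quotes. The name tokenize is hypothetical.

```python
import re

# A token is either a double-quoted string (which may contain blanks or be
# empty) or a run of non-white-space characters.
# Assumption: quoted strings contain no embedded or escaped double quotes.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def tokenize(text):
    """Split Maestro-style text into tokens, regardless of layout."""
    tokens = []
    for match in _TOKEN.finditer(text):
        quoted, bare = match.groups()
        # groups() yields None for the alternative that did not match,
        # so an empty quoted string ("") is kept as an empty token.
        tokens.append(quoted if quoted is not None else bare)
    return tokens


# The same block written on one line and spread over several lines.
compact = '{ s_m_m2io_version ::: 1.0.0 }'
spread = '{\n  s_m_m2io_version\n  :::\n  1.0.0\n}\n'
same_tokens = tokenize(compact) == tokenize(spread)

# Quoted strings keep their blanks; the surrounding quotes are dropped.
fields = tokenize('1 3 0.547623 " " X ""')
```

Under these assumptions, both layouts produce the same token list, which is why the layout written by Schrödinger applications is a convenience rather than a requirement.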

Data Blocks

The basic unit of a Maestro file is a block, a series of data items contained within a pair of curly braces { }. For instance, the following is the first block of the example file shown in this topic:

{
s_m_m2io_version
:::
1.0.0
}

Most blocks in the Maestro file are preceded by a name, for example, f_m_ct, where ‘f’ stands for ‘full’ (see below), ‘m’ generally stands for ‘Maestro’ because the data originated from Maestro, and ‘ct’ stands for ‘CT’, short for ‘Connection Table’, which can generally be thought of as a collection of atoms.

Blocks can be nested within other blocks, and if a block contains a list of data values, the block may be indexed. That is, the number of data values contained in the block can be appended to the name. Indexed blocks are identified by having names of the form: 'name[<number of items>]'. For example, the example file shown in this topic contains an indexed block with the name m_atom[18]. In this block name, 'm' means that the data originated in Maestro, 'atom' means that the data values included in the block belong to atoms, and '18' indicates that there are 18 sets of data in the indexed block (representing 18 atoms).
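
As an illustration of the indexed-name convention, the following Python sketch separates a block name from its optional item count. The function name split_block_name and the regular expression are assumptions made for this example, not part of any Schrödinger API.

```python
import re

# name, optionally followed by [<number of items>]
_BLOCK_NAME = re.compile(r'^(\w+)(?:\[(\d+)\])?$')


def split_block_name(token):
    """Return (name, count); count is None when the block is not indexed."""
    match = _BLOCK_NAME.match(token)
    if match is None:
        raise ValueError(f"not a block name: {token!r}")
    name, count = match.groups()
    return name, (int(count) if count is not None else None)


atom_block = split_block_name("m_atom[18]")   # indexed block
ct_block = split_block_name("f_m_ct")         # block without an index
```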

Within each block there are two sections: a list of the names corresponding to the data fields included in each data item, and the actual data items. In the example file, the first three data field names in the m_atom[18] block are:

i_m_mmod_type
r_m_x_coord
r_m_y_coord

The end of the first section is marked by the separator :::, which is followed by the actual data items. In the example file, the first three data items are:

 1  3  0.547623  1.262401 -0.990300 1 " " X " "  2 0.00000 0.00000 CHEX "    " "    " 6 0 0 1 ""
 2  3 -0.930177  1.296701 -1.411700 1 " " X " "  2 0.00000 0.00000 CHEX "    " "    " 6 0 0 1 ""
 3  3 -1.821477  1.772701 -0.253300 1 " " X " "  2 0.00000 0.00000 CHEX "    " "  c1" 6 0 0 1 ""

There must be the same number of data values as there are data names, and the fields within the data items must appear in the same order as the data names are listed. In the above example, for instance, the first field in the first line of data ('1') is the index number. The second field ('3') is the MacroModel atom type, indicated by the data name i_m_mmod_type. Index numbers are required in an indexed block and are not counted among the data names, so each data item in an indexed block has one more field than there are data names.
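
The pairing of data names with the fields of a data item can be sketched in Python as follows. This is an illustrative sketch only: the function name row_to_record is hypothetical, and the example is shortened to the first three data names of the m_atom[18] block so that the row has four fields (the index plus three values).

```python
def row_to_record(names, row):
    """Pair one data item of an indexed block with its data names.

    `row` holds the index number followed by one value per data name.
    Values are left as strings; no type conversion is attempted here.
    """
    if len(row) != len(names) + 1:
        raise ValueError(
            f"expected {len(names) + 1} fields (index + values), got {len(row)}"
        )
    index = int(row[0])
    return index, dict(zip(names, row[1:]))


names = ["i_m_mmod_type", "r_m_x_coord", "r_m_y_coord"]
row = ["1", "3", "0.547623", "1.262401"]   # index, then one value per name
index, record = row_to_record(names, row)
```

Checking the field count before pairing catches the case where a data item has more or fewer fields than the name list requires.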

The end of the data item section is indicated by a second ::: separator. A file may contain any number of structures (CT blocks), and each structure block may contain any number of atom and bond data items.

Compressed Format

Files may use a compressed format, most commonly when storing information for a set of conformers. In a compressed file, the first CT block must be a “full” block containing all the information described above. The name of this block is f_m_ct. This full block can be followed by any number of “partial” CT blocks with the name p_m_ct. The partial blocks contain only information that is different from the last preceding full CT block. For example, a file of conformers will have only one copy of the bond table (the m_bond[36] block in the example file), and it will be stored in the full CT. Subsequent partial CT blocks will read bond information from the preceding full CT block.
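
One way to picture how a reader might expand partial blocks is sketched below in Python. The representation is hypothetical: each CT is a dictionary of its top-level entries, placeholder strings stand in for whole atom and bond blocks, and a partial block is assumed to replace entries wholesale. The actual granularity at which Schrödinger applications merge partial blocks is not specified in this topic.

```python
def expand_cts(cts):
    """Expand (kind, data) pairs so that every CT is complete.

    kind is "f_m_ct" (full) or "p_m_ct" (partial). A partial CT inherits
    every entry it does not define from the last preceding full CT.
    Assumption: merging happens per top-level entry of the CT.
    """
    expanded = []
    last_full = None
    for kind, data in cts:
        if kind == "f_m_ct":
            last_full = data
            expanded.append(dict(data))
        elif kind == "p_m_ct":
            if last_full is None:
                raise ValueError("partial CT block before any full CT block")
            merged = dict(last_full)   # start from the full CT
            merged.update(data)        # override with what the partial stores
            expanded.append(merged)
        else:
            raise ValueError(f"unknown CT block name: {kind!r}")
    return expanded


# Placeholder strings stand in for whole blocks in this illustration.
cts = [
    ("f_m_ct", {"s_m_title": "conformer 1", "m_atom": "atoms-1", "m_bond": "bonds"}),
    ("p_m_ct", {"m_atom": "atoms-2"}),
]
expanded = expand_cts(cts)
```

In this sketch the second structure takes its bond table and title from the full block and only its atom data from the partial block.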

Data Item Names

The names of data items in a Maestro format file follow a convention that allows the type of the data item and the “owner” of the data to be identified. Names have the form t_o_d, where t is the type descriptor, o is the “owner” and d is the actual data name. For instance, the first data name in the m_atom[18] block in the example file is i_m_mmod_type. The first character of the name represents the data type. The acceptable types are:

i integer
s string value
r real number
b Boolean value

Owner values indicate the application from which the data item most likely originated. Including this field in the data name allows multiple applications to store identically titled quantities. For instance, two applications could each store a data field named energy. Currently, basic geometrical and connectivity information, which is shared by most Schrödinger applications, has the owner field m (for Maestro). Data introduced by MacroModel has the owner field mmod, and data introduced by QikProp has the owner field qp.

In the i_m_mmod_type example, i indicates that the value in the corresponding data field is an integer. The letter m in the owner field means that the data originated from Maestro, and mmod_type is an abbreviation for “MacroModel atom type”, which is what the data field represents. See Property Names for more information.
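
The t_o_d convention can be sketched in Python as follows. This is an illustrative sketch rather than a Schrödinger API: the function name split_data_name is hypothetical, and it assumes that the owner field contains no underscore, so that everything after the second underscore belongs to the data name.

```python
# Type descriptors listed above, mapped to descriptive labels.
TYPE_CODES = {"i": "integer", "s": "string", "r": "real", "b": "boolean"}


def split_data_name(name):
    """Split a data item name of the form t_o_d into (type, owner, data)."""
    parts = name.split("_", 2)   # split on the first two underscores only
    if len(parts) != 3 or parts[0] not in TYPE_CODES:
        raise ValueError(f"not a t_o_d data item name: {name!r}")
    type_code, owner, data = parts
    return TYPE_CODES[type_code], owner, data


atom_type = split_data_name("i_m_mmod_type")
x_coord = split_data_name("r_m_x_coord")
```

Limiting the split to two underscores keeps data names such as mmod_type intact.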

Example Maestro File

Below is an extract from a Maestro format file. The complete file is not shown. Where data has been omitted, this is marked with an ellipsis (...). The explanatory text that follows file content on a line, or that appears as a separate paragraph between data sections, is annotation and is not part of the file.

The first block in the file is unnamed. This contains required information that is relevant to the whole file.

{
s_m_m2io_version
:::
1.0.0
}
 
f_m_ct { The “CT” block. Each structure in the file is contained in such a block. This is a full block.
s_m_title The only CT-level data name. A string value representing the title of this structure.
::: The separator between the data names and data values.
"Cyclohexane" The value of the “title” data item.
m_atom[18] { The start of the atom block for this CT block. There are 18 atoms in this block.
i_m_mmod_type The MacroModel atom type.
r_m_x_coord The X-coordinate.
r_m_y_coord The Y-coordinate.
r_m_z_coord The Z-coordinate.
i_m_residue_number The residue number.
s_m_insertion_code The PDB insertion code.
s_m_mmod_res The one-letter MacroModel residue code.
s_m_chain_name The PDB chain name.
i_m_color The color for this atom.
r_m_charge1 The partial atomic charge.
r_m_charge2 The partial atomic charge.
s_m_pdb_residue_name The PDB residue name.
s_m_pdb_atom_name The PDB atom name.
s_m_grow_name The name used by the Maestro structure builder.
i_m_atomic_number The atomic number.
i_m_formal_charge The formal charge.
i_m_representation The representation used to draw this atom.
i_m_visibility A flag to indicate whether this atom is displayed in Maestro or not.
s_m_atom_name The user-specified atom name.
::: The separator for the end of the data names.

Next follow the data values. The first column contains an index number that is assigned automatically. Remaining columns represent the data values in the same order as data names given above.

 1  3  0.547623  1.262401 -0.990300 1 " " X " "  2 0.00000 0.00000 CHEX "    " "    " 6 0 0 1 ""
 2  3 -0.930177  1.296701 -1.411700 1 " " X " "  2 0.00000 0.00000 CHEX "    " "    " 6 0 0 1 ""
 3  3 -1.821477  1.772701 -0.253300 1 " " X " "  2 0.00000 0.00000 CHEX "    " "  c1" 6 0 0 1 "" 
 4  3 -1.624777  0.890901  0.990300 1 " " X " "  2 0.00000 0.00000 CHEX "    " "    " 6 0 0 1 ""
...
17 41  0.493223 -0.677999  0.001900 1 " " X " " 21 0.00000 0.00000 CHEX "    " "  n3" 1 0 0 1 ""
18 41  1.817023  0.395501  0.566000 1 " " X " " 21 0.00000 0.00000 CHEX "    " "  n2" 1 0 0 1 ""
::: The separator for the end of the data values.
} The end of the atoms block.
m_bond[36] { The bond block. There are 36 bonds.
i_m_from The atom the bond is from.
i_m_to The atom the bond is to.
i_m_order The bond order.
i_m_from_rep The graphical representation for the “from” half-bond.
i_m_to_rep The graphical representation for the “to” half-bond.
::: The end of the data names for bond data.

What follows are the data items for the bond block. The first column contains an automatically assigned index number for each item.

 1  1 2 1 1 1
 2  1 6 1 1 1
...
35 17 6 1 1 1
36 18 6 1 1 1
::: The end of the bond data items.
} The end of the bond block.
} The end of the CT block.
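
To tie the pieces of the example together, the following Python code is a minimal sketch of a reader for files laid out like the extract above (with the annotations removed). It is not the parser used by Schrödinger products. All function names are hypothetical, values are left as strings, quoted strings are assumed to contain no embedded double quotes, and a quoted value equal to {, } or ::: would be misread. The sample text is a cut-down illustration with two atoms and two data names, not a real file.

```python
import re

_TOKEN = re.compile(r'"([^"]*)"|(\S+)')
_INDEXED = re.compile(r'^(\w+)\[(\d+)\]$')


def tokenize(text):
    return [m.group(1) if m.group(1) is not None else m.group(2)
            for m in _TOKEN.finditer(text)]


def parse_block(tokens, pos, count=None):
    """Parse a block body starting just after '{'; return (data, next_pos)."""
    names = []
    while tokens[pos] != ":::":
        names.append(tokens[pos])
        pos += 1
    pos += 1  # skip the ::: that ends the data names

    if count is not None:
        # Indexed block: `count` rows, each an index plus one value per name.
        width = len(names) + 1
        rows = []
        for _ in range(count):
            rows.append(dict(zip(names, tokens[pos + 1:pos + width])))
            pos += width
        if tokens[pos:pos + 2] != [":::", "}"]:
            raise ValueError("indexed block not closed by ::: }")
        return rows, pos + 2

    # Plain block: one value per name, then any nested blocks, then '}'.
    data = dict(zip(names, tokens[pos:pos + len(names)]))
    pos += len(names)
    while tokens[pos] != "}":
        match = _INDEXED.match(tokens[pos])
        key = match.group(1) if match else tokens[pos]
        child_count = int(match.group(2)) if match else None
        if tokens[pos + 1] != "{":
            raise ValueError(f"expected '{{' after {tokens[pos]!r}")
        data[key], pos = parse_block(tokens, pos + 2, child_count)
    return data, pos + 1


def parse_file(text):
    """Return a list of (block name or None, block data) pairs."""
    tokens = tokenize(text)
    pos, blocks = 0, []
    while pos < len(tokens):
        if tokens[pos] == "{":
            name, pos = None, pos + 1          # unnamed block
        else:
            name, pos = tokens[pos], pos + 2   # skip the name and '{'
        body, pos = parse_block(tokens, pos)
        blocks.append((name, body))
    return blocks


# Cut-down illustration: 2 atoms, 2 data names (not a complete file).
sample = '''
{ s_m_m2io_version ::: 1.0.0 }
f_m_ct {
  s_m_title
  :::
  "Cyclohexane"
  m_atom[2] {
    i_m_mmod_type r_m_x_coord
    :::
    1 3 0.547623
    2 3 -0.930177
    :::
  }
}
'''
blocks = parse_file(sample)
```

The sketch relies on the item count in an indexed block name to know how many data items to read, which is why the index numbers at the start of each data item can be skipped without being interpreted.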