Ligand Alignment Methodology

The methods used for aligning ligands in the Ligand Alignment Panel and the underlying align_ligands program are as follows:

1. A Bemis-Murcko scaffold analysis is performed, and structures are clustered according to their largest scaffold, where scaffolds are represented using bond orders but not elemental types (i.e., fuzzy scaffolds).
2. Clusters are sorted by decreasing scaffold size, and the structures within each cluster are sorted by decreasing size to establish a canonical order for all subsequent work. The very first structure in the canonical list is the primary reference, and subsequent structures in the list tend to be aligned to reference structures that appear earlier in the list.
3. A scaffold-based distance matrix is computed, where the i, j entry is the scaffold number (0, 1, etc.) of the largest scaffold that's shared by structures i and j. Larger scaffolds have lower scaffold numbers, so the scaffold number can be used as a distance. When constructing the matrix, embedded matching is employed (e.g., imidazole matches benzimidazole) in order to increase the size of the largest shared scaffold.
4. A minimum spanning tree is constructed from the distance matrix, with structures being added to the tree using the canonical order described in step 2. Each edge of the tree holds a structure, the reference to which it is to be aligned, and the largest scaffold they share. The ordered edges of this tree provide a prescription for sequentially aligning each structure to a previously aligned reference with which it shares the largest possible scaffold.
5. The ordered edges of the minimum spanning tree are traversed, with the applicable alignment method (core snapping, flexible least-squares core alignment, full flexible shape-based superposition) being used to find the best superposition of a given structure to its reference. Once a structure is aligned to its reference, the aligned structure replaces the original structure, and it is made available as a reference for subsequent structures in the tree.
6. The structures for which core snapping was successful are post-processed in an attempt to superimpose side chains that are common to two or more structures.
7. The aligned structures are written out in the order they were provided.
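
The way the distance matrix and minimum spanning tree yield an alignment order can be illustrated with a minimal Python sketch. This is not the align_ligands implementation. It assumes the scaffold-based distance matrix has already been computed, uses a Prim-style construction that starts from the primary reference, and breaks ties by canonical index. The function name alignment_order, the NO_SHARED sentinel for pairs with no common scaffold, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical sentinel for "no shared scaffold"; the real program may
# represent this case differently.
NO_SHARED = 10**9


def alignment_order(dist: List[List[int]]) -> List[Tuple[int, int, int]]:
    """Return ordered edges (structure, reference, scaffold_number).

    dist[i][j] is the scaffold number of the largest scaffold shared by
    structures i and j, with structures indexed in canonical order.
    Lower scaffold numbers mean larger scaffolds, so the value acts as a
    distance. Structure 0 is the primary reference.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [0]                    # structures already available as references
    remaining = list(range(1, n))    # structures still to be aligned
    edges = []
    while remaining:
        # Pick the closest (structure, reference) pair. Ties fall back to the
        # lowest structure index, then the lowest reference index (assumed rule).
        d, s, r = min((dist[s][r], s, r) for s in remaining for r in in_tree)
        edges.append((s, r, d))
        in_tree.append(s)
        remaining.remove(s)
    return edges


# Four structures; scaffold 0 is the largest scaffold, scaffold 2 the smallest.
example = [
    [0, 0, 2, NO_SHARED],
    [0, 0, 1, NO_SHARED],
    [2, 1, 1, 2],
    [NO_SHARED, NO_SHARED, 2, 2],
]
order = alignment_order(example)
```

In this example, structure 1 is aligned to the primary reference on scaffold 0, structure 2 is aligned to structure 1 on scaffold 1, and structure 3 is aligned to structure 2 on scaffold 2, because structure 2 is the only earlier structure with which it shares a scaffold.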

Note that the preferred alignment method is to snap the core onto the reference and conformationally sample the region outside the core. If the core cannot be snapped because of a stereochemistry failure, an open ring failure, etc., the entire structure is conformationally sampled and each conformer is aligned using least-squares superposition of the core atoms. If the structure shares no scaffolds with its reference, fully flexible shape-based alignment is done. The best alignment, regardless of method, is the superposition that yields the highest all-atom shape similarity with atomic overlaps differentiated by elemental type.
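
The fallback logic described above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the program's code: the function and class names (choose_method, best_alignment, Candidate) are hypothetical, and the similarity values are made up. Only the method names and failure reasons are taken from the property descriptions in this topic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One candidate superposition and its all-atom shape similarity."""
    method: str
    similarity: float


def choose_method(shares_scaffold: bool, snap_failure: Optional[str]) -> str:
    """Select the alignment method for a structure and its reference.

    snap_failure is None when the core can be snapped; otherwise it holds
    a reason such as "stereochemistry change".
    """
    if not shares_scaffold:
        return "flexible shape-based"
    if snap_failure is None:
        return "snapped core"
    return "flexible least-squares core"


def best_alignment(candidates: List[Candidate]) -> Candidate:
    """Keep the superposition with the highest shape similarity."""
    return max(candidates, key=lambda c: c.similarity)


# Illustrative values only.
conformer_results = [
    Candidate("flexible least-squares core", 0.71),
    Candidate("flexible least-squares core", 0.82),
    Candidate("flexible least-squares core", 0.64),
]
best = best_alignment(conformer_results)
```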

Each aligned structure contains the following properties (given as internal names):

i_phase_Input_Structure_Number—The original structure number, which may be used to determine whether structures were skipped due to multiple fragments or no rings. For example, if 10 structures were provided and the 4th and 7th structures were skipped, the values of this property for the remaining 8 structures would be 1, 2, 3, 5, 6, 8, 9, 10.
i_phase_Structure_Number—Contiguous structure numbers for structures that are actually processed. In the above example, the values of this property would be 1, 2, 3, 4, 5, 6, 7, 8.
i_phase_Reference_Structure—The number of the structure to which a given structure was aligned. These are taken from the contiguous structure numbers.
r_phase_Similarity_to_Reference—The shape similarity of a given structure to its reference.
s_phase_Alignment_Method—The method used to align a given structure to its reference. Possible values are "snapped core", "flexible least-squares core" and "flexible shape-based".
s_phase_Alignment_Reason—If the core cannot be snapped, this property provides a reason, e.g., "creation of a close contact", "open ring in core mapping", "stereochemistry change".
s_phase_Core_SMARTS—Fuzzy SMARTS that represents the core on which a given structure was aligned. This is an empty string if the structure shared no scaffolds with any other structure.
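
The relationship between the two numbering properties can be illustrated with a small Python sketch. It assumes the property values have already been read into plain dictionaries keyed by the internal names. The helper names skipped_inputs and to_input_number are hypothetical, and no particular structure-file API is implied.

```python
from typing import Dict, List


def skipped_inputs(records: List[Dict[str, int]], total_inputs: int) -> List[int]:
    """List the original structure numbers that were skipped."""
    seen = {r["i_phase_Input_Structure_Number"] for r in records}
    return [n for n in range(1, total_inputs + 1) if n not in seen]


def to_input_number(records: List[Dict[str, int]], structure_number: int) -> int:
    """Translate a contiguous structure number (for example, a value of
    i_phase_Reference_Structure) back to the original structure number."""
    by_structure = {r["i_phase_Structure_Number"]: r for r in records}
    return by_structure[structure_number]["i_phase_Input_Structure_Number"]


# The example from the property descriptions: 10 structures were provided
# and the 4th and 7th were skipped.
input_numbers = [1, 2, 3, 5, 6, 8, 9, 10]
records = [
    {"i_phase_Input_Structure_Number": original, "i_phase_Structure_Number": i}
    for i, original in enumerate(input_numbers, start=1)
]
skipped = skipped_inputs(records, total_inputs=10)
original_of_fourth = to_input_number(records, 4)
```

Here skipped_inputs recovers the 4th and 7th structures as skipped, and a reference value of 4 corresponds to the 5th structure in the original input.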