Clustering Based on Volume Overlap Panel

In this panel, you can cluster structures based on their volume overlap, as calculated by the Phase utility phase_volCalc.

To open this panel: click the Tasks button and browse to Discovery Informatics and QSAR → Clustering of Ligands.
To open this panel from the entry group for the results of a Glide docking job and load the results, use the Workflow Action Menu.

Using
Features
Additional Resources

Using the Clustering Based on Volume Overlap Panel

This panel generates a matrix of volume scores for all input molecules. The volume score is the overlap volume of two molecules divided by the total volume occupied by the molecules. The use of the volume score rather than the volume overlap normalizes the volume similarity, so that the diagonal elements of the volume score matrix are 1, and the off-diagonal elements represent the fractional overlap.

Once the matrix is generated, it is used to cluster the molecules using a hierarchical agglomerative clustering method. The output is a set of structure files, one per cluster, written for either a chosen number of clusters or a specified merging distance cutoff. The files are named jobname_n.mae, where n is the cluster index.
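The clustering step can be sketched as follows. This is a minimal illustration, not the panel's implementation: it assumes that the distance between two molecules is 1 minus their volume score and that average linkage is used, and the function name cluster_scores and the example matrix are hypothetical.

```python
def cluster_scores(scores, n_clusters):
    """Average-linkage agglomerative clustering on a volume score matrix.

    Assumption: distance = 1 - volume score (illustrative choice).
    Returns (clusters, merge_distances), where clusters is a list of
    lists of molecule indices and merge_distances records the distance
    at which each merge happened, in order.
    """
    clusters = [[i] for i in range(len(scores))]
    merge_distances = []

    def avg_dist(c1, c2):
        # Mean distance over all pairs with one member in each cluster.
        total = sum(1.0 - scores[i][j] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = avg_dist(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        merge_distances.append(d)
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return clusters, merge_distances


# Hypothetical 4x4 volume score matrix: molecules 0 and 1 overlap well,
# as do molecules 2 and 3.
example_scores = [
    [1.00, 0.80, 0.10, 0.20],
    [0.80, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.70],
    [0.20, 0.10, 0.70, 1.00],
]
example_clusters, example_merges = cluster_scores(example_scores, 2)
print(example_clusters)
```

With the example matrix, the two well-overlapping pairs end up in separate clusters.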

Volume overlaps are calculated by summing up the occupied cubes on a cubic grid. A cube is occupied if the center of the cube lies inside an atomic sphere. To be included in the volume overlap, each cube must be occupied by atoms from both molecules. The calculated volume is an integer multiple of the cube volume; as such it is not highly accurate (about 0.5% error) but is sufficient for the evaluation of volume overlaps.

The volume score is the fraction of the total volume that is common to both molecules. The total volume is calculated by adding up the number of cubes that are occupied by either of the molecules. The volume score can be expressed as (A AND B)/(A OR B), where A and B represent the volumes occupied by the two molecules. It is in fact the Tanimoto similarity for the volume.
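The grid-based calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the phase_volCalc implementation: the input format (a list of (x, y, z, radius) tuples), the function names, and the default spacing are hypothetical.

```python
import math
from itertools import product


def occupied_cubes(atoms, spacing):
    """Return the set of grid indices of cubes occupied by the atoms.

    atoms: list of (x, y, z, radius) tuples (hypothetical input format).
    Cube (i, j, k) has its center at ((i + 0.5) * spacing, ...), and is
    occupied if that center lies inside at least one atomic sphere.
    """
    cubes = set()
    for x, y, z, r in atoms:
        # Index ranges that are guaranteed to cover the sphere.
        ranges = []
        for c in (x, y, z):
            lo = math.floor((c - r) / spacing)
            hi = math.floor((c + r) / spacing)
            ranges.append(range(lo, hi + 1))
        for i, j, k in product(*ranges):
            cx = (i + 0.5) * spacing
            cy = (j + 0.5) * spacing
            cz = (k + 0.5) * spacing
            if (cx - x) ** 2 + (cy - y) ** 2 + (cz - z) ** 2 <= r * r:
                cubes.add((i, j, k))
    return cubes


def volume_score(atoms_a, atoms_b, spacing=0.5):
    """Tanimoto volume score: (A AND B) / (A OR B), counted in cubes."""
    a = occupied_cubes(atoms_a, spacing)
    b = occupied_cubes(atoms_b, spacing)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Two hypothetical single-atom "molecules" whose spheres partly overlap.
mol_a = [(0.0, 0.0, 0.0, 1.5)]
mol_b = [(1.0, 0.0, 0.0, 1.5)]
print(volume_score(mol_a, mol_a))  # identical molecules give 1.0
print(volume_score(mol_a, mol_b))  # partial overlap, between 0 and 1
```

Because the cube volume cancels in the ratio, the score depends only on the cube counts; the spacing affects only how closely those counts approximate the true volumes.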

Clustering Based on Volume Overlap Panel Features

Structure input controls
Overlapping volumes section
Clustering section

Structure input controls

These controls allow you to select the source of the structures.

Use structures from option menu
Input file text box and Browse button

Use structures from option menu: Choose the source of the structures to cluster from this option menu. The available choices are the Workspace (included entries), the selected entries in the Project Table, and a file. If you choose File, the Input file text box and Browse button are activated, so you can specify the file.
Input file text box and Browse button: Specify the input structure file in the text box, by typing in its path, or click Browse to navigate to and select the file.

Overlapping volumes section

In this section, you specify the atoms for which you want to calculate the volume overlap (which need not be all atoms), and make settings for how the overlap is calculated.

Use ASL controls
Include hydrogens option
Only consider MacroModel atom types option
Compute volume score option
Fixed radius option and text box
Grid spacing text box

Use ASL controls: Specify the atoms to use when calculating the volume, by providing an ASL expression. You can type it in the text box, or click Select to construct an ASL expression for the Workspace structure in the Atom Selection dialog box. If you want to start over, click Clear to clear the text box.
Include hydrogens option: Select this option to include hydrogens in the volume overlap. By default, hydrogens are not included. This option overrides any hydrogen specifications in the ASL expression.
Only consider MacroModel atom types option: Select this option to require that atoms have the same MacroModel atom types when calculating the overlap. For a grid cube to be included in the overlap, at least one atom from each molecule with the same MacroModel atom type must occupy the cube.
Compute volume score option: Select this option to calculate the volume score for the overlap, which is the overlap volume divided by the total volume.
Fixed radius option and text box: Select this option if you want to set the radii of all atoms to a fixed value, and specify the value in the text box. If this option is not selected, the van der Waals radii are used to define the radii of the atomic spheres.
Grid spacing text box: Specify the grid spacing (the dimension of the cubes) for the calculation of the volume. The volumes are calculated by summing up the occupied cubes. A cube is occupied if its center is inside an atomic sphere. For the overlap volume, a cube must be occupied by an atom from both molecules; for the total volume, a cube must be occupied by an atom from either molecule.
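The effect of the hydrogen, fixed radius, and atom type options can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names, the input formats, the atom type labels, and the radii table (Bondi van der Waals radii in angstroms, which may differ from the radii the panel uses).

```python
# Illustrative van der Waals radii in angstroms (Bondi values); the
# radii actually used by the panel may differ.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def prepare_spheres(atoms, include_hydrogens=False, fixed_radius=None):
    """Turn (element, x, y, z) atoms into (x, y, z, radius) spheres.

    Hydrogens are skipped unless include_hydrogens is True. If
    fixed_radius is given, every atom gets that radius; otherwise the
    radius is looked up in VDW_RADII.
    """
    spheres = []
    for element, x, y, z in atoms:
        if element == "H" and not include_hydrogens:
            continue
        radius = fixed_radius if fixed_radius is not None else VDW_RADII[element]
        spheres.append((x, y, z, radius))
    return spheres


def typed_overlap(cubes_a, cubes_b):
    """Overlap cubes when atom types must match.

    cubes_a, cubes_b: dict mapping an atom type label to the set of grid
    cubes occupied by atoms of that type in each molecule. A cube counts
    only if atoms of the same type from both molecules occupy it.
    """
    overlap = set()
    for atom_type in cubes_a.keys() & cubes_b.keys():
        overlap |= cubes_a[atom_type] & cubes_b[atom_type]
    return overlap


# Hypothetical fragment: one carbon, one hydrogen, one oxygen.
fragment = [("C", 0.0, 0.0, 0.0), ("H", 1.09, 0.0, 0.0), ("O", 0.0, 1.4, 0.0)]
print(prepare_spheres(fragment))
print(prepare_spheres(fragment, include_hydrogens=True, fixed_radius=1.0))

# Hypothetical per-type occupancies with made-up type labels.
occ_a = {"C3": {(0, 0, 0), (1, 0, 0)}, "O3": {(2, 0, 0)}}
occ_b = {"C3": {(1, 0, 0), (2, 0, 0)}, "N3": {(0, 0, 0)}}
print(typed_overlap(occ_a, occ_b))
```

In the last example, all three cubes are occupied by both molecules, but only one is occupied by atoms of the same type, so the type-restricted overlap is smaller than the unrestricted one.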

Clustering section

In this section you can make settings to control the clustering.

Linkage method option menu
Use N as text box and options
Incorporate results option

Linkage method option menu

Choose a linkage method for clustering from the following:

Single	Shortest distance between inter-cluster pairs. Produces diffuse, elongated clusters
Complete	Longest distance between inter-cluster pairs. Produces compact, spherical clusters
Average	Average distance between all inter-cluster pairs
Centroid	Euclidean distance between cluster centroids
McQuitty	Average distance to the two clusters merged in forming a given cluster
Ward	Sum of squared distances to merged cluster centroid (minimum variance)
Weighted Centroid	Weighted center of mass distance, also known as median
Flexible beta	Weighted average intra-cluster and inter-cluster distances (Lance-Williams) with beta=0.25
Schrödinger	Closest distance between terminal (right-to-left) points in 1D cluster orderings
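The first three linkage methods in the table can be written as short functions on a distance matrix. This is a sketch of the textbook definitions, not the panel's code; the function names and the example points are hypothetical.

```python
def single_linkage(dist, c1, c2):
    """Shortest distance between any pair with one member in each cluster."""
    return min(dist[i][j] for i in c1 for j in c2)


def complete_linkage(dist, c1, c2):
    """Longest distance between any pair with one member in each cluster."""
    return max(dist[i][j] for i in c1 for j in c2)


def average_linkage(dist, c1, c2):
    """Mean distance over all pairs with one member in each cluster."""
    total = sum(dist[i][j] for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


# Hypothetical example: four points on a line, with the distance taken
# as the absolute difference of their positions.
points = [0, 1, 3, 7]
dist = [[abs(a - b) for b in points] for a in points]

left, right = [0, 1], [2, 3]  # clusters given as lists of point indices
print(single_linkage(dist, left, right))    # 2
print(complete_linkage(dist, left, right))  # 7
print(average_linkage(dist, left, right))   # 4.5
```

The same two clusters are separated by 2, 7, or 4.5 depending on the linkage, which is why the choice of method changes the order in which clusters are merged.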

Use N as text box and options

Choose the method for determining how many clusters are reported, and enter the appropriate value in the text box.

Number of clusters: the number of clusters for which the output is written is given by the value in the text box.
Merging distance cutoff: the output is written for the clusters that are formed at or below the merging distance specified in the text box. The merging distance is the distance at which two clusters are merged into a single cluster, so the cutoff gives the minimum distance between the reported clusters.
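The relationship between a merging distance cutoff and the resulting number of clusters can be sketched as follows. The function name and the example merge distances are hypothetical, and the sketch assumes that merge distances never decrease as clustering proceeds, which holds for single, complete, and average linkage but not necessarily for centroid-based methods.

```python
def clusters_at_cutoff(merge_distances, n_items, cutoff):
    """Number of clusters left after applying all merges at or below cutoff.

    merge_distances: the distance of each merge in a hierarchical
    clustering of n_items structures (n_items - 1 values in total).
    Each applied merge reduces the cluster count by one.
    """
    merges_applied = sum(1 for d in merge_distances if d <= cutoff)
    return n_items - merges_applied


# Hypothetical merge distances for four structures.
example_merge_distances = [0.2, 0.3, 0.86]
for example_cutoff in (0.1, 0.25, 0.5, 1.0):
    print(example_cutoff,
          clusters_at_cutoff(example_merge_distances, 4, example_cutoff))
```

A small cutoff leaves every structure in its own cluster, and a cutoff at or above the largest merge distance gives a single cluster.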

Incorporate results option

Select this option to incorporate the results as a set of entry groups in the Project Table. The entry groups are named jobname_n, where n is the cluster index.

Tutorials

Evaluating Large Ligand Libraries with Active Learning Glide