Clustering Based on Volume Overlap Panel
In this panel, you can cluster structures based on their volume overlap, as calculated by the Phase utility phase_volCalc.
To open this panel: click the Tasks button and browse to Discovery Informatics and QSAR → Clustering of Ligands.
To open this panel from the entry group for the results of a Glide docking job
.
Using the Clustering Based on Volume Overlap Panel
This panel generates a matrix of volume scores for all input molecules. The volume score is the overlap volume of two molecules divided by the total volume occupied by the molecules. The use of the volume score rather than the volume overlap normalizes the volume similarity, so that the diagonal elements of the volume score matrix are 1, and the off-diagonal elements represent the fractional overlap.
Once the matrix is generated, it is used to cluster the molecules using a hierarchical agglomerative clustering method. The output is a set of files for a chosen number of clusters or a specified merging cutoff distance, named jobname_n.mae, where n is the cluster index.
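As an illustration of the clustering step, here is a minimal pure-Python sketch, not the panel's actual implementation. It assumes that the distance between two molecules is 1 minus their volume score (the text does not state how scores are converted to distances), uses average linkage only, and stops either at a requested number of clusters or at a merging distance cutoff. The function name `cluster` and the example matrix are hypothetical.

```python
def cluster(scores, n_clusters=None, cutoff=None):
    """Average-linkage agglomerative clustering on a volume score matrix.

    Stops when n_clusters clusters remain, or when the closest pair of
    clusters is farther apart than cutoff. Returns sorted lists of indices.
    """
    n = len(scores)
    # Assumption: distance = 1 - volume score, so identical molecules are at 0.
    dist = [[1.0 - scores[i][j] for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per molecule

    def linkage(a, b):
        # Average distance over all inter-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        if n_clusters is not None and len(clusters) <= n_clusters:
            break
        # Find the closest pair of clusters and its merging distance.
        d, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if cutoff is not None and d > cutoff:
            break
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)


# Made-up volume score matrix for four molecules (diagonal elements are 1).
scores = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.15, 0.1],
    [0.1, 0.15, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
print(cluster(scores, n_clusters=2))  # [[0, 1], [2, 3]]
print(cluster(scores, cutoff=0.25))   # [[0, 1], [2], [3]]
```

With the cutoff of 0.25, molecules 2 and 3 stay separate because their distance (about 0.3) exceeds the cutoff.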
Volume overlaps are calculated by counting the occupied cubes on a cubic grid. A cube is occupied if its center lies inside an atomic sphere. To be included in the volume overlap, a cube must be occupied by atoms from both molecules. The calculated volume is an integer multiple of the cube volume, so it is not highly accurate (about 0.5% error), but it is sufficient for evaluating volume overlaps.
The volume score is the fraction of the total volume that is common to both molecules. The total volume is calculated by counting the cubes that are occupied by either of the molecules. The volume score can be expressed as (A AND B)/(A OR B), where A and B represent the volumes occupied by the two molecules. It is in effect the Tanimoto similarity for the volumes.
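The grid-based calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the phase_volCalc code: the function names, the default grid spacing of 0.5, and the atom radii in the example are assumptions chosen for the demonstration.

```python
import math
from itertools import product


def occupied_cubes(atoms, spacing=0.5):
    """Return the set of cube indices whose centers lie inside an atomic sphere.

    atoms is an iterable of (x, y, z, radius) tuples.
    """
    cubes = set()
    for x, y, z, r in atoms:
        # Index range of cubes whose centers could fall inside this sphere.
        lo = [math.floor((c - r) / spacing) for c in (x, y, z)]
        hi = [math.ceil((c + r) / spacing) for c in (x, y, z)]
        for i, j, k in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            # Cube (i, j, k) has its center at ((i + 0.5) * spacing, ...).
            cx, cy, cz = ((m + 0.5) * spacing for m in (i, j, k))
            if (cx - x) ** 2 + (cy - y) ** 2 + (cz - z) ** 2 <= r * r:
                cubes.add((i, j, k))
    return cubes


def volume_score(mol_a, mol_b, spacing=0.5):
    """Tanimoto-style score: cubes in both molecules / cubes in either one."""
    a = occupied_cubes(mol_a, spacing)
    b = occupied_cubes(mol_b, spacing)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two made-up molecules; 1.7 and 1.55 are illustrative van der Waals radii.
mol_a = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]
mol_b = [(0.5, 0.0, 0.0, 1.7), (2.0, 0.0, 0.0, 1.55)]
print(f"volume score: {volume_score(mol_a, mol_b):.3f}")
```

Because both sets are built on the same grid, the overlap and the total volume are set intersection and union, and the common factor of the cube volume cancels in the ratio.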
Clustering Based on Volume Overlap Panel Features
Structure input controls
These controls allow you to select the source of the structures.
- Use structures from option menu
-
Choose the source of the structures to cluster from this option menu. The available choices are the Workspace (included entries), the selected entries in the Project Table, and a file. If you choose File, the Input file text box and Browse button are activated, so you can specify the file.
- Input file text box and Browse button
-
Specify the input structure file in the text box, by typing in its path, or click Browse to navigate to and select the file.
Overlapping volumes section
In this section, you specify the atoms for which you want to calculate the volume overlap (which need not be all atoms), and make settings for how the overlap is calculated.
- Use ASL controls
-
Specify the atoms to use when calculating the volume, by providing an ASL expression. You can type it in the text box, or click Select to construct an ASL expression for the Workspace structure in the Atom Selection dialog box. If you want to start over, click Clear to clear the text box.
- Include hydrogens option
-
Select this option to include hydrogens in the volume overlap. By default, hydrogens are not included. This option overrides any hydrogen specifications in the ASL expression.
- Only consider MacroModel atom types option
-
Select this option to require that atoms have the same MacroModel atom types when calculating the overlap. For a grid cube to be included in the overlap, at least one atom from each molecule with the same MacroModel atom type must occupy the cube.
- Compute volume score option
-
Select this option to calculate the volume score, which is the overlap volume divided by the total volume.
- Fixed radius option and text box
-
Select this option if you want to set the radii of all atoms to a fixed value, and specify the value in the text box. If this option is not selected, the van der Waals radii are used to define the radii of the atomic spheres.
- Grid spacing text box
-
Specify the grid spacing (the dimension of the cubes) for the calculation of the volume. The volumes are calculated by summing up the occupied cubes. A cube is occupied if its center is inside an atomic sphere. For the overlap volume, a cube must be occupied by an atom from both molecules; for the total volume, a cube must be occupied by an atom from either molecule.
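To see how the grid spacing affects the calculated volume, the following sketch counts occupied cubes for a single sphere and compares the result with the analytic volume. It is a hypothetical stand-alone example: the function name `grid_sphere_volume`, the radius of 1.7, and the spacings tried are assumptions, not values taken from the panel.

```python
import math


def grid_sphere_volume(radius, spacing):
    """Estimate a sphere's volume by counting cubes whose centers lie inside it."""
    # Enough cubes on each side of the origin to cover the whole sphere.
    n = math.ceil(radius / spacing) + 1
    count = 0
    for i in range(-n, n):
        for j in range(-n, n):
            for k in range(-n, n):
                # Center of cube (i, j, k); the sphere sits at the origin.
                cx, cy, cz = ((m + 0.5) * spacing for m in (i, j, k))
                if cx * cx + cy * cy + cz * cz <= radius * radius:
                    count += 1
    # The result is always an integer multiple of the cube volume.
    return count * spacing ** 3


exact = 4.0 / 3.0 * math.pi * 1.7 ** 3
for spacing in (1.0, 0.5, 0.25):
    estimate = grid_sphere_volume(1.7, spacing)
    print(f"spacing {spacing}: grid {estimate:.2f}, analytic {exact:.2f}")
```

A smaller spacing generally brings the grid estimate closer to the analytic value, at the cost of more cubes to test.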
Clustering section
In this section you can make settings to control the clustering.
- Linkage method option menu
-
Choose a linkage method for clustering from the following:
Single: Shortest distance between inter-cluster pairs. Produces diffuse, elongated clusters.
Complete: Longest distance between inter-cluster pairs. Produces compact, spherical clusters.
Average: Average distance between all inter-cluster pairs.
Centroid: Euclidean distance between cluster centroids.
McQuitty: Average distance to the two clusters merged in forming a given cluster.
Ward: Sum of squared distances to merged cluster centroid (minimum variance).
Weighted Centroid: Weighted center of mass distance, also known as median.
Flexible beta: Weighted average intra-cluster and inter-cluster distances (Lance-Williams) with beta=0.25.
Schrödinger: Closest distance between terminal (right-to-left) points in 1D cluster orderings.
- Use N as text box and options
-
Choose the method for determining how many clusters are reported, and enter the appropriate value in the text box.
-
Number of clusters: the output is written for the number of clusters given in the text box.
-
Merging distance cutoff: the output is written for the clusters that are formed at or below the merging distance given in the text box. The merging distance is the distance at which two clusters are merged into a single cluster, so the cutoff gives the minimum distance between the reported clusters.
-
- Incorporate results option
-
Select this option to incorporate the results as a set of entry groups in the Project Table. The entry groups are named jobname_n, where n is the cluster index.
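To make the first three linkage definitions in the Clustering section concrete, the sketch below computes the distance between two clusters under single, complete, and average linkage. It is a hypothetical illustration: the function name `linkage_distance` and the distance matrix are made up, and the other linkage methods are not covered.

```python
def linkage_distance(dist, a, b, method="average"):
    """Distance between clusters a and b (lists of indices into dist)."""
    pairs = [dist[i][j] for i in a for j in b]  # all inter-cluster pairs
    if method == "single":
        return min(pairs)    # shortest inter-cluster pair distance
    if method == "complete":
        return max(pairs)    # longest inter-cluster pair distance
    if method == "average":
        return sum(pairs) / len(pairs)
    raise ValueError(f"unsupported linkage method: {method}")


# Made-up symmetric distance matrix for four molecules.
dist = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
]
for method in ("single", "complete", "average"):
    d = linkage_distance(dist, [0, 1], [2, 3], method)
    print(f"{method}: {d:.4f}")
```

For the same pair of clusters, single linkage never gives a larger distance than average linkage, and complete linkage never gives a smaller one.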