Spectral Clustering Panel
Cluster the selected entries in the Project Table panel by the spectral method, in which a similarity matrix is diagonalized and cluster memberships are determined from the weights of each molecule in each eigenvector. Results can be reported as cluster memberships or as cluster contributions.
To open this panel: click the Tasks button and browse to Discovery Informatics and QSAR → Spectral Clustering.
To open this panel from the entry group for the results of a Glide docking job
.
- Using
- Features
- Additional Resources
Using the Spectral Clustering Panel
In the spectral clustering method, a similarity matrix based on a set of fingerprints is set up and diagonalized. The similarity matrix can be filtered beforehand with a Gaussian weight function, which reduces the similarity values. The eigenvalues and eigenvectors are then used to determine the cluster membership.
The eigenvectors of this matrix can be used to define clusters: if the original similarity matrix is close to block diagonal, each eigenvector has significant contributions only from molecules that are in the same block. Cluster membership is determined by applying a threshold to the weights, which are the squared eigenvector elements (one per molecule). If a molecule's weight is above the threshold, the molecule is regarded as belonging to the cluster.
The eigenvalues represent the cohesiveness of the cluster. If the similarity matrix is block diagonal and all the molecules in a block are identical, there is one nonzero eigenvalue for each block, whose value is the size of the block (cluster), and the rest of the eigenvalues are zero. The population of the block is therefore equal to the eigenvalue. If the molecules in a block are not identical, the largest eigenvalue is smaller than the maximum possible, and the smallest eigenvalue is larger than zero. The closer the largest and smallest eigenvalues are to each other, the greater the spread of similarities within the block. Each eigenvector then contributes a fraction of the cluster members, proportional to the eigenvalue. The difference between the largest eigenvalue and the number of cluster members is a measure of how dissimilar the molecules in the cluster are.
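The relationship between eigenvalues and block size can be checked numerically. The following is a minimal sketch using NumPy, not the panel's implementation; the helper name `block_eigenvalues` and the uniform within-block similarity of 0.8 are assumptions chosen for illustration.

```python
import numpy as np

def block_eigenvalues(size, similarity):
    """Eigenvalues (descending) of one block with uniform off-diagonal similarity.

    Hypothetical helper: every pair of distinct molecules in the block has
    the same similarity, and each molecule has similarity 1 with itself.
    """
    block = np.full((size, size), float(similarity))
    np.fill_diagonal(block, 1.0)
    # eigvalsh is intended for symmetric matrices and returns ascending order.
    return np.linalg.eigvalsh(block)[::-1]

# Three identical molecules: one eigenvalue equal to the block size, the rest zero.
identical = block_eigenvalues(3, 1.0)

# Three similar but not identical molecules: the largest eigenvalue drops
# below 3 and the smallest rises above zero.
spread = block_eigenvalues(3, 0.8)
```

For the second block the eigenvalues are 2.6, 0.2, and 0.2, so the difference between the block size (3) and the largest eigenvalue is 0.4.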
To assign the cluster memberships, the eigenvalues are taken in turn from the highest to the lowest, and for each corresponding eigenvector, the unassigned molecules whose weight is greater than the threshold (the minimum cluster contribution) are assigned to a cluster. Eigenvalues that are less than the eigenvalue cutoff (the lambda cutoff) are ignored, because their contributions should already be included in a cluster. Any molecules that are unassigned at the end of this procedure are assigned to a "leftover" cluster.
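The assignment procedure described above can be sketched as follows. This is an illustrative NumPy version written from the description, not the code that the panel runs; the function name `assign_clusters`, the use of -1 to mark the leftover cluster, and the cutoff values in the example are assumptions.

```python
import numpy as np

def assign_clusters(similarity, lambda_cutoff, min_contribution):
    """Return one cluster index per molecule; -1 marks the "leftover" cluster."""
    s = np.asarray(similarity, dtype=float)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(s)
    labels = np.full(s.shape[0], -1, dtype=int)
    next_index = 0
    for k in np.argsort(eigenvalues)[::-1]:    # highest eigenvalue first
        if eigenvalues[k] < lambda_cutoff:
            break                              # all remaining eigenvalues are smaller
        weights = eigenvectors[:, k] ** 2      # squared elements, one per molecule
        members = (weights > min_contribution) & (labels == -1)
        if members.any():
            labels[members] = next_index
            next_index += 1
    return labels.tolist()

# Toy matrix: a block of three (similarity 0.9), a block of two (0.8), one singleton.
S = np.eye(6)
S[:3, :3] = 0.9
S[3:5, 3:5] = 0.8
np.fill_diagonal(S, 1.0)

labels = assign_clusters(S, lambda_cutoff=1.5, min_contribution=0.1)
```

With these values the first three molecules form cluster 0, the next two form cluster 1, and the singleton is left over because its eigenvalue (1.0) is below the cutoff.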
Spectral Clustering Panel Features
Fingerprints tab
In this tab, you set up the fingerprint calculation.
- Precision options
-
Choose the fingerprint precision. A higher number of bits reduces the chance of collisions; with 64-bit fingerprints, collisions should be extremely rare. A higher number of bits also means that the calculation takes longer to run.
- Fingerprint type option menu
-
Choose the type of fingerprint to calculate. Radial, dendritic, or MolPrint2D often give the best results.
- Atom typing scheme list
-
Choose an atom typing scheme for the fingerprint calculation. If you choose a more specific atom typing scheme, the fingerprints for each molecule will be more distinct, and the similarities between molecules will be smaller.
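To illustrate why higher precision makes collisions rarer, the sketch below estimates the expected number of colliding feature pairs. It assumes that features are hashed uniformly and independently into a space of 2^bits values, which is an idealization and not a statement about how the fingerprints are actually generated. The function name and the 32-bit comparison value are illustrative.

```python
def expected_collisions(n_features, bits):
    """Expected number of colliding pairs among n uniformly hashed features.

    Each of the n*(n-1)/2 pairs collides with probability 1 / 2**bits
    under the uniform-hashing assumption.
    """
    pairs = n_features * (n_features - 1) / 2
    return pairs / 2 ** bits

# One million distinct features at two precisions.
at_32_bits = expected_collisions(10**6, 32)
at_64_bits = expected_collisions(10**6, 64)
```

Under this assumption, a million distinct features give on the order of a hundred colliding pairs at 32 bits, and far less than one at 64 bits.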
Similarity tab
In this tab you choose the metric to be used when calculating the similarity matrix. You must generate fingerprints for all of the ligands before you calculate similarities: any ligands that don't have fingerprints are silently ignored.
- Similarity metric option menu
-
Choose the metric that is used for calculating similarities. See canvasFPMatrix for a list of metrics and their definitions.
- Tversky alpha and Tversky beta text boxes
-
Specify the alpha and beta parameters if the Tversky metric is chosen from the Similarity metric option menu.
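For orientation, the standard Tversky index on two sets of "on" bits can be sketched as below. This uses the textbook definition; the definition used by the software may differ in detail, so check canvasFPMatrix. The function name and the convention of returning 0.0 when both fingerprints are empty are assumptions.

```python
def tversky(a, b, alpha, beta):
    """Textbook Tversky index between two collections of on-bit identifiers."""
    a, b = set(a), set(b)
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    # Assumed convention: two empty fingerprints have similarity 0.0.
    return common / denominator if denominator else 0.0

fp1 = {1, 2, 3, 4}
fp2 = {3, 4, 5}

tanimoto_like = tversky(fp1, fp2, alpha=1.0, beta=1.0)  # reduces to Tanimoto
dice_like = tversky(fp1, fp2, alpha=0.5, beta=0.5)      # reduces to Dice
```

Setting alpha and beta to different values makes the measure asymmetric, so that bits unique to one molecule are penalized more heavily than bits unique to the other.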
Cluster tab
In this tab you calculate and apply the clustering.
- Similarity Matrix Filter Factor text box
-
Set the factor α in the exponent of the Gaussian function that is used to filter the similarity matrix. The filtered matrix elements are given by
$S_{ij} \exp\left(-\alpha (S_{ij} - 1)^2\right)$
- Calculate Clustering button
-
Run the clustering calculation, and add properties to the Project Table according to the options and cutoffs selected below.
- Assign entries to clusters option
-
Assign entries to clusters by considering the weight of the entry in the eigenvectors, as described above. Three properties are added: Cluster Index, the index of the cluster that each entry is assigned to; Cluster Contribution, the weight of the entry in its assigned cluster; and Cluster Cohesiveness, the cohesiveness (largest eigenvalue) of the cluster.
- All cluster data above thresholds option
-
For each eigenvector whose eigenvalue is above the threshold specified in the Lambda cutoff text box, report the weight of each entry in that eigenvector as a Cluster Contribution[l=eigenvalue] property.
- Lambda cutoff text box
-
Specify the eigenvalue cutoff for discarding unimportant eigenvectors.
- Minimum cluster contribution text box
-
Specify the minimum weight that an entry must have to be considered part of a cluster.
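The effect of the Gaussian filter described under Similarity Matrix Filter Factor can be sketched with a few lines of Python. The function name `filter_similarity` and the value 4 used for the filter factor are illustrative assumptions, not the panel's defaults.

```python
import math

def filter_similarity(s, alpha):
    """Apply the Gaussian filter S * exp(-alpha * (S - 1)**2) to one matrix element."""
    return s * math.exp(-alpha * (s - 1.0) ** 2)

# A similarity of 1 is unchanged; lower similarities are reduced, and the
# reduction is stronger for larger values of the filter factor.
unchanged = filter_similarity(1.0, alpha=4.0)
reduced = filter_similarity(0.5, alpha=4.0)     # 0.5 * exp(-1)
unfiltered = filter_similarity(0.5, alpha=0.0)  # a factor of 0 leaves values as they are
```

A larger filter factor suppresses weak similarities more strongly, which pushes the similarity matrix closer to a block-diagonal form before it is diagonalized.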