Canvas Similarity and Clustering Panel

In this panel, you can generate Canvas fingerprints for the selected entries in the Project Table, calculate similarities based on these fingerprints, and cluster the entries using the similarity metric.

To open this panel, do one of the following:

  • Click the Tasks button and browse to Ligand-Based Virtual Screening → Fingerprint Similarity
  • Click the Tasks button and browse to Discovery Informatics and QSAR → Fingerprint Similarity

Canvas Similarity and Clustering Panel Features

Fingerprints tab

In this tab, you set up the fingerprint calculation.

Precision options

Choose the fingerprint precision. A higher number of bits reduces the chance of collisions; with 64-bit fingerprints, collisions should be extremely rare. A higher number of bits also means that the calculation takes longer to run.
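
To give a rough sense of why precision matters, the sketch below estimates the chance that at least two distinct structural features land on the same address, using the standard birthday approximation. It assumes that features are hashed uniformly into a space of 2^bits addresses, which is an idealization and not a description of how Canvas hashes fragments. The function name and the feature count are illustrative only.

```python
import math


def collision_probability(n_features: int, bits: int) -> float:
    """Birthday approximation: chance that at least two of n_features
    distinct features share an address in a space of 2**bits addresses."""
    space = 2.0 ** bits
    # 1 - exp(-n(n-1) / (2 * space)), written with expm1 for small values
    return -math.expm1(-n_features * (n_features - 1) / (2.0 * space))


# Hypothetical workload: one million distinct features across a data set
for bits in (32, 64):
    p = collision_probability(1_000_000, bits)
    print(f"{bits}-bit: P(at least one collision) ~ {p:.3g}")
```

Under these assumptions, a collision is almost certain at 32 bits for this feature count and remains very unlikely at 64 bits.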

Fingerprint type option menu

Choose the type of fingerprint to calculate. Radial, dendritic, or MolPrint2D often give the best results.

Atom typing scheme list

Choose an atom typing scheme for the fingerprint calculation. If you choose a more specific atom typing scheme, the fingerprints for each molecule will be more distinct, and the similarities between molecules will be smaller.

Similarity tab

In this tab, you can calculate the similarity of one or more reference ligands to a set of ligands. The ligands that you want to use as references must be included in the Workspace, and the ligands for which you want to calculate similarities must be selected in the Project Table. You must generate fingerprints for all of these ligands before you calculate similarities, because any ligands that do not have fingerprints are silently ignored.

Similarity metric option menu

Choose the metric that is used for calculating similarities. See canvasFPMatrix for a list of metrics and their definitions.

Tversky alpha and Tversky beta text boxes

Specify the alpha and beta parameters if the Tversky metric is chosen from the Similarity metric option menu.
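
As an illustration of what the alpha and beta parameters control, the sketch below computes Tanimoto and Tversky similarities from the sets of "on" bits of two fingerprints. It uses the common textbook definitions, which are assumed here; the definitions that Canvas actually applies are the ones listed for canvasFPMatrix. The function names and bit sets are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Shared bits divided by the bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Shared bits divided by shared bits plus weighted unshared bits.
    alpha weights bits found only in a, beta weights bits found only in b."""
    common = len(a & b)
    denom = alpha * len(a - b) + beta * len(b - a) + common
    return common / denom if denom else 0.0


reference = {1, 4, 7, 9}   # on-bits of a hypothetical reference ligand
query = {1, 4, 8}          # on-bits of a hypothetical selected entry

print(tanimoto(reference, query))
print(tversky(reference, query, 1.0, 1.0))   # same value as Tanimoto
print(tversky(reference, query, 0.5, 0.5))   # same value as the Dice coefficient
```

With these definitions, setting alpha and beta to different values makes the measure asymmetric, so that bits missing from one of the two ligands are penalized more heavily than bits missing from the other.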

Calculate Similarity button

Calculate the similarities. The similarities are reported as properties in the Project Table for the selected entries. In the property names, metric stands for the name of the chosen similarity metric. If only one entry is included in the Workspace, the property name is Canvas metric Similarity. If more than one entry is included in the Workspace, five properties are added:

  • Canvas Mean metric Similarity—Average of similarities to all included entries
  • Canvas Max metric Similarity—Similarity of the most similar included entry
  • Canvas Min metric Similarity—Similarity of the least similar included entry
  • Canvas Max metric Similarity ID—Entry ID of the most similar included entry
  • Canvas Min metric Similarity ID—Entry ID of the least similar included entry

Sort selected entries by similarity option

Sort the selected entries in the Project Table by the similarity values, from highest to lowest. If you have more than one ligand in the Workspace, the maximum similarity is used for sorting.
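
The way the five summary properties and the sort order relate to the individual similarities can be sketched as follows. This is only an illustration of the description above, not the panel's implementation; the entry IDs, similarity values, and dictionary keys are hypothetical.

```python
from statistics import mean


def summarize(similarities: dict) -> dict:
    """similarities maps a reference (Workspace) entry ID to the similarity
    between that reference and one selected entry."""
    max_id = max(similarities, key=similarities.get)
    min_id = min(similarities, key=similarities.get)
    return {
        "mean": mean(similarities.values()),
        "max": similarities[max_id],
        "min": similarities[min_id],
        "max_id": max_id,
        "min_id": min_id,
    }


# Two selected entries, each compared with three included reference entries
selected = {
    "12": {"3": 0.25, "5": 0.75, "8": 0.5},
    "14": {"3": 0.9, "5": 0.1, "8": 0.2},
}
summaries = {entry_id: summarize(sims) for entry_id, sims in selected.items()}

# With several references, sorting uses the maximum similarity, highest first
ranked = sorted(summaries, key=lambda e: summaries[e]["max"], reverse=True)
print(ranked)
```

In this example, entry 14 is listed first because its best match (0.9) is higher than the best match of entry 12 (0.75), even though its mean similarity is lower.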

Cluster tab

In this tab, you can cluster the selected entries by their similarity values, using the fingerprints and the settings in the Similarity tab. Hierarchical agglomerative clustering is used. Once clustering is done, you can examine the clustering statistics, a dendrogram of the clustering, and the distance matrix. You can also apply the clustering to the selected entries, either to create groups or to add cluster index and size properties.

Linkage method option menu

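Specify the linkage method, which determines how the distance between two clusters is measured when clusters are merged. The menu lists the methods that Canvas provides for hierarchical clustering.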

Calculate Clustering button

Perform the hierarchical clustering calculation.
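
For readers unfamiliar with hierarchical agglomerative clustering, the following minimal sketch shows the general technique: every entry starts as its own cluster, and the two closest clusters are merged repeatedly until one remains. It is not the Canvas implementation. It assumes that distances are taken as 1 minus the similarity, and it covers only three commonly used linkage rules (single, complete, and average), which may differ from the choices offered in the Linkage method menu. The function name and the distance values are hypothetical.

```python
def agglomerate(dist, linkage="average"):
    """Return the merge history [(members_a, members_b, distance), ...]
    for a symmetric distance matrix given as a list of lists."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []

    def between(a, b):
        # Distance between two clusters under the chosen linkage rule
        pairs = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pairs)
        if linkage == "complete":
            return max(pairs)
        if linkage == "average":
            return sum(pairs) / len(pairs)
        raise ValueError(f"unsupported linkage: {linkage}")

    while len(clusters) > 1:
        # Find the closest pair of current clusters
        d, x, y = min(
            (between(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


# Hypothetical distances (1 - similarity) for four entries
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
for a, b, d in agglomerate(dist, "complete"):
    print(a, b, d)
```

The merge history is the information that a dendrogram displays: stopping before the last merges leaves the corresponding number of clusters.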

Clustering Results section

This section gives access to the results of the clustering. The cluster strain and the best number of clusters are reported. To view detailed results, click the buttons described below.

Clustering Statistics button

Display a plot of various statistics of the clustering as a function of the number of clusters, in the Clustering Statistics panel. The statistics are: Kelley penalty, R-squared, Semipartial R-squared, Merge distance, Separation ratio. You can click in the plot to set the number of clusters in the Number of clusters text box.

Dendrogram button

Display a dendrogram of the hierarchy of clusters, in the Dendrogram panel. You can click in the plot to set the number of clusters in the Number of clusters text box.

Distance matrix button

Display the distance matrix used for clustering graphically, with values represented by a color map, in the Distance Matrix panel. You can display the matrix in cluster order (as shown in the Dendrogram panel) or in the original (input) order. You can click in the plot to display the 2D structures in the panel, and optionally in the Workspace.

Apply Clustering section

In this section, you can apply the clustering to the selected entries in the Project Table for a particular number of clusters.

Number of clusters text box

Specify the number of clusters to use when applying the clustering results to the selected entries.

Create options

Specify the action to be taken when applying the clustering.

  • Duplicate entries to a new group for each cluster—Create new groups for each cluster, with titles set to Cluster N (or N_M if clustering has been applied M times), duplicate all the selected entries, and put the duplicates into these new groups.
  • Move entries to a new group for each cluster—Create new groups for each cluster with titles set to Cluster N (or N_M if clustering has been applied M times), and move the entries from their current location into the new groups.
  • A group containing the structures nearest the centroid in each cluster—Create a new group, entitled Representative Entries (with a suffix M if clustering has been applied M times), and move the structure that is nearest the centroid in each cluster to this group.
  • Cluster index and size properties for each entry—Create two new properties for each entry that record which cluster the entry belongs to (the index) and how big the cluster is.

Apply Clustering button

Apply the clustering to the selected entries in the Project Table using the option chosen under Create.
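
The index and size properties and the choice of a representative structure can be illustrated with the sketch below. It assumes that a cluster assignment is already available as a list of 1-based cluster indices, and it picks as representative the member with the smallest total distance to the other members of its cluster (a medoid), which is used here as a stand-in for "nearest the centroid"; the panel may define the centroid differently. All names and values are hypothetical.

```python
from collections import Counter


def index_and_size(labels):
    """labels[i] is the 1-based cluster index of entry i.
    Returns one (cluster index, cluster size) pair per entry."""
    sizes = Counter(labels)
    return [(label, sizes[label]) for label in labels]


def representatives(labels, dist):
    """For each cluster, pick the member with the smallest total distance
    to the members of the same cluster (a medoid)."""
    reps = {}
    for label in sorted(set(labels)):
        members = [i for i, l in enumerate(labels) if l == label]
        reps[label] = min(members, key=lambda i: sum(dist[i][j] for j in members))
    return reps


# Five hypothetical entries assigned to two clusters
labels = [1, 1, 1, 2, 2]
dist = [
    [0.0, 0.2, 0.3, 0.9, 0.8],
    [0.2, 0.0, 0.2, 0.8, 0.9],
    [0.3, 0.2, 0.0, 0.7, 0.8],
    [0.9, 0.8, 0.7, 0.0, 0.1],
    [0.8, 0.9, 0.8, 0.1, 0.0],
]
print(index_and_size(labels))
print(representatives(labels, dist))
```

In a two-member cluster both members are equally central, so this sketch simply returns the first one.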