canvasFPMatrix

This utility generates a pairwise similarity or distance matrix using binary or scaled fingerprints from one or two sets of molecules. See canvasFPMatrix Command Help for syntax and options.
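As a rough illustration of what a pairwise matrix from one or two sets means, the sketch below builds one in plain Python. It is not the utility's implementation: the `pairwise_matrix` helper, the representation of a fingerprint as a set of on-bit indices, and the convention for two empty fingerprints are all assumptions made for this example. The Tanimoto formula is the one given in Table 1.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence

# Assumed representation: a binary fingerprint is the set of its on-bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(fp1: Fingerprint, fp2: Fingerprint) -> float:
    """Tanimoto similarity c/(a+b-c) for two binary fingerprints."""
    c = len(fp1 & fp2)
    union = len(fp1) + len(fp2) - c
    # Assumed convention: two empty fingerprints count as identical.
    return 1.0 if union == 0 else c / union


def pairwise_matrix(
    rows: Sequence[Fingerprint],
    cols: Optional[Sequence[Fingerprint]] = None,
    metric: Callable[[Fingerprint, Fingerprint], float] = tanimoto,
) -> List[List[float]]:
    """Return metric(r, c) for every pair; with one set, compare it to itself."""
    if cols is None:
        cols = rows
    return [[metric(r, c) for c in cols] for r in rows]


set1 = [frozenset({1, 2, 3}), frozenset({2, 3, 4, 5})]
set2 = [frozenset({1, 2, 3}), frozenset({7}), frozenset({3, 4})]

square = pairwise_matrix(set1)      # one set: 2 x 2, symmetric
rect = pairwise_matrix(set1, set2)  # two sets: 2 x 3

for row in rect:
    print("  ".join(f"{value:.2f}" for value in row))
```

With one set the result is a square, symmetric matrix; with two sets it is rectangular, with one row per molecule of the first set and one column per molecule of the second.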

Table 1 lists the available metrics with their type and formula. The quantities in the formulas are defined in Table 2, except for the Tversky α and β parameters and the α, T1, and T0 terms of the modifiedTanimoto metric.

Table 1. Available metrics for the canvasFPMatrix command

Index  Name               Type        Formula
1      buser              similarity  (sqrt(cd)+c)/(sqrt(cd)+a+b-c)
2      cosine             similarity  c/sqrt(ab)
3      dice               similarity  2c/(a+b)
4      dixon              distance    (A+B)^2/(a+b-c)
5      euclidean          distance    sqrt(A+B)
6      hamann             similarity  (c+d-A-B)/N
7      hamming            distance    A+B
8      kulczynski         similarity  0.5(c/a + c/b)
9      matching           similarity  (c+d)/L
10     mcConnaughey       similarity  (c^2-(a-c)(b-c))/(ab)
11     minmax             similarity  sum{min(a,b)/max(a,b)}
12     modifiedTanimoto   similarity  α*T1 + (1.0-α)*T0
13     patternDifference  distance    AB/N^2
14     pearson            similarity  (cd-AB)/sqrt(ab(A+d)(B+d))
15     petke              similarity  c/max(a,b)
16     rogersTanimoto     similarity  (c+d)/(2(a+b)-3c+d)
17     shape              distance    (A+B)/N - ((A-B)/N)^2
18     simpson            similarity  c/min(a,b)
19     size               distance    ((A-B)/N)^2
20     soergel            distance    (A+B)/(A+B+c)
21     tanimoto           similarity  c/(a+b-c)
22     tversky            similarity  c/(α(a-c)+β(b-c)+c)
23     variance           distance    (A+B)/(4N)
24     yule               similarity  (c*d - A*B)/(c*d + A*B)

Table 2. Variables used in metric formulae

Variable  Description
a         Number of bits that are on in structure 1
b         Number of bits that are on in structure 2
c         Number of bits that are on in both structure 1 and structure 2
d         Number of bits that are off in both structure 1 and structure 2
A         Number of bits that are on in structure 1 but not in structure 2. A = a - c
B         Number of bits that are on in structure 2 but not in structure 1. B = b - c
L         Total number of bits. L = a + b - c + d
N         Restricted total number of bits. N = a + b - c + min(d,10000)
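To make the definitions in Table 2 concrete, the following sketch counts the quantities for two short bit strings and evaluates a few of the Table 1 formulas. It is an illustration only, not the utility's code: the `Counts` class, the `count_bits` helper, and the bit-string input format are hypothetical, and zero denominators (for example, two fingerprints with no bits on) are not handled, since this page does not say how the utility treats them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Bit counts for a pair of binary fingerprints, named as in Table 2."""
    a: int  # bits on in structure 1
    b: int  # bits on in structure 2
    c: int  # bits on in both
    d: int  # bits off in both

    @property
    def A(self) -> int:  # on in structure 1 only
        return self.a - self.c

    @property
    def B(self) -> int:  # on in structure 2 only
        return self.b - self.c

    @property
    def L(self) -> int:  # total number of bits
        return self.a + self.b - self.c + self.d

    @property
    def N(self) -> int:  # restricted total: d is capped at 10000
        return self.a + self.b - self.c + min(self.d, 10000)


def count_bits(fp1: str, fp2: str) -> Counts:
    """Count a, b, c, d for two equal-length strings of '0' and '1'."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    pairs = list(zip(fp1, fp2))
    return Counts(
        a=fp1.count("1"),
        b=fp2.count("1"),
        c=sum(x == "1" and y == "1" for x, y in pairs),
        d=sum(x == "0" and y == "0" for x, y in pairs),
    )


def tanimoto(k: Counts) -> float:
    return k.c / (k.a + k.b - k.c)


def dice(k: Counts) -> float:
    return 2 * k.c / (k.a + k.b)


def cosine(k: Counts) -> float:
    return k.c / math.sqrt(k.a * k.b)


def tversky(k: Counts, alpha: float, beta: float) -> float:
    return k.c / (alpha * (k.a - k.c) + beta * (k.b - k.c) + k.c)


def hamming(k: Counts) -> int:
    return k.A + k.B


def soergel(k: Counts) -> float:
    return (k.A + k.B) / (k.A + k.B + k.c)


k = count_bits("11010010", "10110000")
print(k, "A =", k.A, "B =", k.B, "L =", k.L, "N =", k.N)
print("tanimoto =", tanimoto(k), "soergel =", soergel(k), "hamming =", hamming(k))
```

For this pair, a = 4, b = 3, c = 2, and d = 3, so A = 2, B = 1, and L = N = 8. Two relationships follow directly from the formulas: the tversky metric reduces to tanimoto when α = β = 1 and to dice when α = β = 0.5, and the soergel distance equals 1 minus the tanimoto similarity because A + B + c = a + b - c.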