How SiteMap Evaluates Sites
This stage uses the site-point groups produced in the site-finding stage and the grids produced in the mapping stage to evaluate the sites in terms of a number of properties. The same modifications to van der Waals radii and formal-charge contributions and the same definition of hydrophobicity are used as in the mapping stage. The properties for each site are added to the Maestro file for the site and recorded in the log file.
To minimize grid errors, the contact, phil, and don/acc SiteMap properties are calculated explicitly as average values computed at the site-point positions (including extension points), but the more complicated phob property is obtained by interpolation from the phobic grid file produced in the site-visualization step.
To make it easy to recognize sites that appear to be unusually favorable or deficient, key properties are expressed relative to the average value found for a large number of tight-binding (≤ 1 μM) sites. The procedure by which this average was obtained is described in SiteMap Calibration. The properties and their use are described below.
The SiteScore is based on a weighted sum of several of the properties that are discussed below:
SiteScore = 0.0733 sqrt(n) + 0.6688 e - 0.20 p
where n is the number of site points (capped at 100), e is the enclosure score, and p is the hydrophilic score, which is capped at 1.0 to limit the impact of hydrophilicity in charged and highly polar sites. This score is constructed and calibrated so that the average SiteScore for 157 investigated submicromolar sites is 1.0. Thus, a score greater than 1 suggests a site of particular promise. A SiteScore of 0.80 has been found to accurately distinguish between drug-binding and non-drug-binding sites (see SiteMap Calibration).
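The SiteScore formula and its two caps can be written out as a short Python sketch. The function and argument names are illustrative only and are not part of SiteMap; the coefficients and caps are the ones quoted above.

```python
import math

def site_score(n_site_points, enclosure, hydrophilic):
    """Evaluate the SiteScore expression quoted in the text (illustrative helper)."""
    n = min(n_site_points, 100)   # number of site points is capped at 100
    p = min(hydrophilic, 1.0)     # hydrophilic score is capped at 1.0
    return 0.0733 * math.sqrt(n) + 0.6688 * enclosure - 0.20 * p

# Hypothetical input values: a large site with average enclosure and a
# hydrophilic score above the cap.
example_score = site_score(150, 0.78, 1.3)
print(f"SiteScore = {example_score:.3f}")
```

Because both caps are applied before the weighted sum, a site with 150 site points scores the same as one with 100, and a hydrophilic score of 1.3 is penalized no more than a score of 1.0.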
Dscore uses the same properties as SiteScore but different coefficients:
Dscore = 0.094 sqrt(n) + 0.60 e - 0.324 p
For Dscore, the hydrophilic score is not capped. This is one of the keys for distinguishing “difficult” and “undruggable” targets from “druggable” ones [8]. The use of different functions for binding-site identification and for classifying druggability is justified because these are different, and sometimes conflicting, tasks. For example, ligands that bind to the PTP1B phosphate pocket with nanomolar, and even subnanomolar, affinity are known [9]. But these highly active ligands have charge structures like those of the natural phosphate substrate and are not drug-like. SiteMap should recognize that such a site can bind ligands tightly but should not rate it as druggable.
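The effect of leaving the hydrophilic score uncapped can be seen in a small Python sketch of the Dscore expression. The function name is illustrative, and the cap of 100 on the number of site points is an assumption carried over from SiteScore, since the text says only that Dscore uses the same properties.

```python
import math

def dscore(n_site_points, enclosure, hydrophilic):
    """Evaluate the Dscore expression quoted in the text (illustrative helper)."""
    n = min(n_site_points, 100)   # assumption: same cap on n as for SiteScore
    # The hydrophilic score is deliberately NOT capped here.
    return 0.094 * math.sqrt(n) + 0.60 * enclosure - 0.324 * hydrophilic

# Hypothetical values: two sites that differ only in hydrophilic score.
moderate_site = dscore(100, 0.78, 1.0)
polar_site = dscore(100, 0.78, 1.6)
print(f"Dscore (p = 1.0): {moderate_site:.3f}")
print(f"Dscore (p = 1.6): {polar_site:.3f}")
```

In this sketch the highly polar site is penalized in proportion to its full hydrophilic score, which is the behavior that lets Dscore rate a tight-binding but highly charged pocket as less druggable.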
The number of site points that make up the site is a measure of the size of the site. As a rough rule of thumb, 2 to 3 site points typically correspond to each atom of the bound ligand, including hydrogens. The size of the site is often a good indicator of the preferred binding site.
The exposure and enclosure properties provide different measures of how open the site is to solvent.
To evaluate the exposure property, “extension” site points are added on the 1-Å grid. These points must lie within a given distance in x, y, or z from an original site point (by default 3 Å), and must make good contact with the receptor or lie at least 4 Å from the nearest protein atom. The value of the property is the ratio of the number of extension points to the number of original plus extension points. A shallow, open site would allow many more site points to be added, giving a high exposure score. The lower the score, the better; the average for the tight-binding sites investigated is 0.49.
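The exposure ratio itself is simple arithmetic once the extension points have been generated. The following Python sketch assumes the point counts are already known; the function name and the counts are hypothetical.

```python
def exposure_score(n_original, n_extension):
    """Ratio of extension points to original plus extension points."""
    total = n_original + n_extension
    if total == 0:
        raise ValueError("at least one site point is required")
    return n_extension / total

# Hypothetical counts: a buried site admits few extension points,
# a shallow, open site admits many.
buried = exposure_score(n_original=80, n_extension=20)
shallow = exposure_score(n_original=30, n_extension=70)
print(f"buried: {buried:.2f}, shallow: {shallow:.2f}")
```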
To evaluate the enclosure property, radial rays are drawn from the site points to sample all possible directions. The enclosure score is the fraction of rays that strike the receptor surface within a distance of 10 Å, averaged over the original and the extension site points used in the exposure evaluation. The receptor surface is the same surface that was used to classify grid points as outside or inside the protein in the site-finding step. Here, higher scores are better, with the average enclosure score for a tight-binding site being 0.78.
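The ray-casting idea can be illustrated with a minimal Python sketch. It is not SiteMap's implementation: the receptor surface is approximated here by a set of spheres (an assumption made only to keep the example self-contained), the ray directions are taken from a Fibonacci spiral, and all names and default values other than the 10 Å cutoff are hypothetical.

```python
import math

def sphere_directions(count):
    """Roughly uniform unit vectors on a sphere (Fibonacci spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    directions = []
    for i in range(count):
        z = 1.0 - 2.0 * (i + 0.5) / count
        r = math.sqrt(1.0 - z * z)
        phi = golden_angle * i
        directions.append((r * math.cos(phi), r * math.sin(phi), z))
    return directions

def ray_hits_sphere(origin, direction, center, radius, max_dist):
    """True if the ray meets the sphere at a distance of at most max_dist."""
    to_center = [c - o for c, o in zip(center, origin)]
    along = sum(a * b for a, b in zip(to_center, direction))
    perp_sq = sum(a * a for a in to_center) - along * along
    if perp_sq > radius * radius:
        return False                      # ray passes beside the sphere
    half_chord = math.sqrt(max(radius * radius - perp_sq, 0.0))
    near, far = along - half_chord, along + half_chord
    return far >= 0.0 and max(near, 0.0) <= max_dist

def enclosure_score(site_points, atoms, n_rays=200, max_dist=10.0):
    """Fraction of rays striking any sphere within max_dist, averaged over points."""
    directions = sphere_directions(n_rays)
    fractions = []
    for point in site_points:
        hits = sum(
            any(ray_hits_sphere(point, d, c, r, max_dist) for c, r in atoms)
            for d in directions
        )
        fractions.append(hits / n_rays)
    return sum(fractions) / len(fractions)

# Hypothetical geometry: one site point and one spherical "atom" 5 Å away.
near_atom = [((0.0, 0.0, 5.0), 2.0)]
far_atom = [((0.0, 0.0, 20.0), 2.0)]
near_score = enclosure_score([(0.0, 0.0, 0.0)], near_atom)
far_score = enclosure_score([(0.0, 0.0, 0.0)], far_atom)
print(near_score, far_score)
```

A single nearby sphere blocks only a small fraction of directions, and a sphere whose surface lies beyond the 10 Å cutoff contributes nothing; a well-enclosed site would have receptor surface within range in most directions.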
The contact property measures how strongly the average site point interacts with the surrounding receptor via van der Waals nonbonded interactions, when the site point is given nominal van der Waals parameters. The contact score has been calibrated so that the average score for a tight-binding site is 1.0.
These properties, labeled phob and phil, measure the relative hydrophobic and hydrophilic character of the site. The balance property expresses the ratio of the two. The phobic and philic scores have been calibrated so that the average score for a tight-binding site is 1.0. The average balance score for the investigated tight-binding sites, on the other hand, is 1.6, not 1.0, because sites that have high phobic and low philic scores make large contributions to the average.
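A short numerical sketch shows why the average balance can exceed 1.0 even when the phobic and philic scores each average 1.0. The three sites and their scores below are invented for illustration, and the balance is taken as phob divided by phil, which is inferred from the description above rather than stated explicitly.

```python
# Made-up phob and phil scores for three hypothetical sites; each list averages 1.0.
phob_scores = [1.6, 1.0, 0.4]
phil_scores = [0.4, 1.0, 1.6]

def mean(values):
    return sum(values) / len(values)

# Assumed definition: balance = phob / phil for each site.
balances = [phob / phil for phob, phil in zip(phob_scores, phil_scores)]
mean_balance = mean(balances)

print(f"mean phob    = {mean(phob_scores):.2f}")
print(f"mean phil    = {mean(phil_scores):.2f}")
print(f"mean balance = {mean_balance:.2f}")
```

The site with a high phobic and low philic score has a balance of about 4, which outweighs the site with the opposite character (balance 0.25), so the mean of the ratios lies well above 1.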
This property, labeled don/acc, indicates the degree to which a well-structured ligand might be expected to donate, rather than accept, hydrogen bonds, as inferred from the sizes and intensities of donor and acceptor SiteMap regions.
When a supplied ligand or other species is used to define the region of the receptor to be mapped, refdist, refmin, refavg, and sitemin properties are also computed. The first of these specifies the distance between the centroid of the site points and the centroid of the reference ligand. The second specifies the closest approach of a site point to a ligand atom. Both are given in angstroms.
The sitemin property is the smallest distance between an atom of the reference species used to define the site and the site-point centroid. If that reference species is the co-crystallized ligand, a value of 4 Å or less is normally taken as a “hit”, by analogy to other practice in the literature. Larger values sometimes occur even when the minimum distance of a reference atom to an individual site point (refmin) is small. In such cases the site-point set does at least partly cover the reference ligand, but the set is large and extends asymmetrically from the region occupied by the reference species.
In some cases, a small refmin value (typically < 1 Å) is accompanied by a moderately large refdist of 5 to 10 Å. These are cases in which the site extends asymmetrically beyond the reference ligand in one or more directions. In an endoprotease, such extensions may well map the channels that bind the N-terminal and C-terminal strands of the peptide undergoing cleavage, and hence are to be expected. These extensions are of interest because they may represent regions that a tight-binding ligand might usefully probe.
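The relationship between refdist, refmin, and sitemin can be made concrete with a small Python sketch that follows the definitions given above. The function name and the coordinates are hypothetical, and refavg is omitted because its definition is not given in this section.

```python
import math

def centroid(points):
    """Arithmetic mean position of a list of (x, y, z) tuples."""
    count = len(points)
    return tuple(sum(p[i] for p in points) / count for i in range(3))

def reference_distances(site_points, ligand_atoms):
    """Return (refdist, refmin, sitemin) in the units of the input coordinates."""
    site_center = centroid(site_points)
    refdist = math.dist(site_center, centroid(ligand_atoms))
    refmin = min(math.dist(s, a) for s in site_points for a in ligand_atoms)
    sitemin = min(math.dist(site_center, a) for a in ligand_atoms)
    return refdist, refmin, sitemin

# Hypothetical elongated site: points along x from 0 to 12 Å, with a small
# reference ligand sitting at one end of the channel.
site_points = [(float(x), 0.0, 0.0) for x in range(13)]
ligand_atoms = [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0), (2.0, 0.5, 0.0)]

refdist, refmin, sitemin = reference_distances(site_points, ligand_atoms)
print(f"refdist = {refdist:.2f}, refmin = {refmin:.2f}, sitemin = {sitemin:.2f}")
```

In this invented geometry the site points cover the ligand closely (refmin of 0.5 Å), yet the site-point centroid lies about 5 Å from the ligand centroid and slightly more than 4 Å from the nearest ligand atom, which is the asymmetric-extension situation described above.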
The volume of a protein site is well defined when the site is fully enclosed by the protein. More commonly, however, the site is open to solution on one or more sides. To assign the volume in such a case, what needs to be decided, as one proceeds outward from the protein surface, is where to stop counting. Understandably, different criteria will yield different site volumes.
SiteMap’s approach approximates the “shrink-wrap” volume of the site by excluding regions that protrude too far into the solvent. This is accomplished by first identifying all points on the cubic mapping grid that lie within 4 Å of any site point and are outside the protein surface. By default, the grid spacing is 0.7 Å. A large number of radial rays are then drawn from each candidate volume point, and those for which fewer than 60% of the rays strike the protein surface within 8 Å are removed. The volume of the site is then computed from the number of remaining volume points and the grid-box volume, which is (0.7 Å)³ in the default case.
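The final filtering and counting step can be sketched in Python as follows. The sketch assumes that the fraction of rays striking the protein surface within 8 Å has already been computed for each candidate volume point; the function name and the input fractions are hypothetical.

```python
def shrink_wrap_volume(hit_fractions, grid_spacing=0.7, min_fraction=0.60):
    """Volume in cubic angstroms from per-point ray-hit fractions.

    Points for which fewer than min_fraction of the rays strike the protein
    surface are removed; each remaining point contributes one grid box.
    """
    kept = [f for f in hit_fractions if f >= min_fraction]
    return len(kept) * grid_spacing ** 3

# Hypothetical ray-hit fractions for four candidate volume points.
fractions = [0.90, 0.60, 0.59, 0.20]
volume = shrink_wrap_volume(fractions)
print(f"volume = {volume:.3f} cubic angstroms")
```

With the default spacing each retained point represents 0.343 Å³, so the two points that pass the 60% test give a volume of 0.686 Å³ in this toy case.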
If you run calculations from the command line, the mapping points considered to lie within the shrink-wrap volume can be saved by including the option -keepvolpts. This option returns a PDB-format file for each site that contains the calculated volume points. You can then visualize these points by importing the file.