Hypothesis Settings Dialog Box

Make settings that determine how the hypothesis is built, what its structure is, and how it is scored.

To open this dialog box, click Hypothesis Settings in the Develop Pharmacophore Model panel.

Hypothesis Settings Dialog Box Features

Features tab
- Ligand-based and residue-based workflows
- Receptor-based (e-pharmacophore) workflows
Scoring tab
Excluded Volumes tab
Reset button

The features in this panel depend on which workflow you were in when you opened the panel (i.e. the choice from the Create pharmacophore model using option menu in the Develop Pharmacophore Model panel). The descriptions below include all features for all workflows, and are grouped and labeled by workflow.

Features tab

In this tab you can adjust the hypothesis and feature criteria that are used for constructing a hypothesis. Most of these are only present when multiple ligands are used to create the pharmacophore model.

Ligand-based and residue-based workflows

The following features are only available in the multiple ligands workflow.

Hypothesis should match at least N of M actives text box

Specify the percentage of the actives that should match the hypothesis. The number of hypotheses found decreases as you increase the percentage. As it is possible that actives bind in different modes, a specific hypothesis might not cover all of the actives. For example, a receptor might have a donor with two hydrogens, and one active binds through one hydrogen, another binds through the other hydrogen.

Number of features in the hypothesis N to M boxes

Specify the range in the number of features in the hypothesis. A larger range produces more hypotheses. Providing a range allows the optimum number to be determined in the search by the scoring.

Preferred minimum number of features option and box

Specify the preferred minimum number of features in the hypothesis. This can be different from the minimum in the range.

Hypothesis difference criterion text box

Specify the cutoff for determining when two hypotheses are considered to be the same. To determine whether two hypotheses of the same composition (e.g. ADHRR) are the same, they are aligned in every possible configuration. The maximum distance between any two corresponding sites is determined for each configuration. If the smallest value of the maximum distance in any configuration is less than the value specified in this text box, the hypotheses are considered to be the same (redundant). The hypothesis with the best survival score is kept and the other is discarded. If they are not redundant, the new hypothesis is inserted in the ranked list of hypotheses.
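
The redundancy test described above reduces to a "smallest of the largest distances" comparison. The following is a minimal sketch in Python, assuming that the site-to-site distances for each alignment configuration have already been computed by an alignment step that is not shown. The function names and the sample distances are hypothetical and are not part of the software.

```python
def min_max_site_distance(config_distances):
    """Smallest, over all configurations, of the largest site-to-site distance.

    config_distances: one list of corresponding-site distances (angstroms)
    per alignment configuration.
    """
    return min(max(distances) for distances in config_distances)


def is_redundant(config_distances, cutoff):
    """True if the two hypotheses would be considered the same."""
    return min_max_site_distance(config_distances) < cutoff


# Hypothetical distances for two configurations of a three-site hypothesis pair.
configs = [
    [0.4, 1.9, 0.7],  # worst-matched pair of sites: 1.9
    [0.6, 0.8, 0.5],  # worst-matched pair of sites: 0.8
]
print(is_redundant(configs, cutoff=1.0))
```

With these sample values, the second configuration has the smaller maximum distance (0.8), which is below the cutoff of 1.0, so the pair would be treated as redundant.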

Return (number of features) x N hypotheses text box

Specify the number of hypotheses to return as a function of the number of features. The number of hypotheses returned is the number specified here times the number of features in the hypothesis.

Feature presets option menu

Set options for the way in which features are treated. For the first three options, vector features are treated the same as nondirectional features. When screening ligands, the vector score component of the score is set to zero if any of these options is selected; in other words, information on orientation of features is not used in scoring.

Treat aromatic ring as hydrophobic—Treat aromatic rings as if they were hydrophobic features.
Treat negative charge as acceptor—Treat negatively charged features as though they were acceptors.
Treat positive charge as donor—Treat positively charged features as though they were donors.
Replace vectors with projected points (acceptors and donors)—Represent the feature in terms of the point at which the matching feature is projected to be located rather than the direction in which the matching feature should be located. See Using Projected Points for more information.

Exclude features from hypothesis option

Exclude selected features on the ligands from the development of hypotheses. When you select this option the Select Features button is enabled, so you can select the features. This option is useful when you have a set of congeneric ligands and there are common features that you do not want to include in the hypothesis. You can use the View column in the feature limits table to show only the feature types that you want to pick for exclusion.

Select Features button

Select features in the Workspace to exclude from hypothesis development. Opens the Select Features to Exclude Dialog Box.

The following features are available in the single and multiple ligands workflows and the protein residues workflow.

Feature limits table

This table lists the possible features, and allows you to set limits on the number of features allowed in a hypothesis and the tolerance for matching a feature.

View	Checkboxes for viewing the feature types in the Workspace. They are all checked when the panel opens. Hiding features may make it easier to pick them in the Workspace for exclusion from hypotheses. All features are redisplayed when you close this dialog box. This column is only present in the multiple ligands workflow.
Features	Feature name, letter, and color of feature in the Workspace. Noneditable.
Minimum	Minimum number of features of the given type allowed in any hypothesis. Editable: you can set a nonzero value to require a certain number of this type of feature in the hypothesis. The default is zero.
Maximum	Maximum number of features of the given type allowed in any hypothesis. Editable: you can set the value to allow more or less of the given feature in the hypothesis. The default is 3, even if the number of features available is higher. This default is chemically reasonable in most cases, and reduces the size of the search.
Tolerance	The tolerance in angstroms for a feature of the given type to match the hypothesis. The tolerance defines a sphere of the specified radius around the site point for the feature (i.e. the centroid of the feature) in the hypothesis. The corresponding site point on the aligned ligand must lie within this sphere for the ligand feature to match the hypothesis feature.
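
The tolerance test in the last row of the table is a point-in-sphere check. The following is a minimal sketch in Python, assuming that site points are plain (x, y, z) tuples in angstroms and that a point exactly on the sphere surface counts as a match. The function name is hypothetical, and the boundary behavior is an assumption.

```python
import math


def within_tolerance(hypothesis_site, ligand_site, tolerance):
    """True if the ligand site point lies inside the tolerance sphere.

    The sphere is centered on the hypothesis site point and has the
    given radius (angstroms). Treating the boundary as inside is an
    assumption made for this sketch.
    """
    return math.dist(hypothesis_site, ligand_site) <= tolerance


# Hypothetical site points, 1.0 and 2.0 angstroms from the hypothesis site.
print(within_tolerance((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), tolerance=1.5))
print(within_tolerance((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), tolerance=1.5))
```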

Edit Features button

Edit the definitions of the features, which are defined in terms of SMARTS patterns. This is mainly useful if the feature you are interested in is not detected. Opens the Edit Features dialog box.

Receptor-based (e-pharmacophore) workflows

The following features are available only in the receptor-ligand complex and receptor cavity workflows. These workflows use Glide XP docking of the ligand or fragments and evaluate contributions to binding from the XP descriptors.

Maximum number of features text box

Specify the maximum number of features to add to the pharmacophore model. Features are ranked by their contributions to binding, and the features with the largest contributions to binding are chosen.
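
The ranking step can be pictured as a sort followed by a truncation. The following is a minimal sketch in Python, assuming that each feature has a single numeric contribution in which a larger value means a larger contribution to binding. The feature labels, the values, and the sign convention are hypothetical; the actual XP descriptor terms are not reproduced here.

```python
def select_top_features(contributions, max_features):
    """Return the labels of the features with the largest contributions.

    contributions: dict mapping a feature label to its contribution,
    where larger means a bigger contribution (an assumed convention).
    """
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:max_features]]


# Hypothetical contributions for four candidate features.
candidates = {"A1": 1.8, "D2": 0.9, "R3": 2.4, "H4": 0.3}
print(select_top_features(candidates, max_features=2))
```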

Definitions to use options

These options allow you to specify how to treat donor atoms and how the pharmacophore features are defined.

Donors as vectors—Treat donors as vector features, in which the bond from the donor heavy atom to the hydrogen is treated as a vector along which the acceptor must lie. This implies that the acceptor can only form a hydrogen bond with a donor that lies along this vector.
Donors as projected points—Treat a donor as a point feature, located where the corresponding acceptor would be found in a complex, here set to 1.4 Å from the H atom along the X-H bond. This implies that the acceptor can accept a hydrogen bond from donors in other directions and in other locations. For the purpose of developing a common pharmacophore hypothesis, this choice allows donors to be matched as belonging to the same feature even if the donor atoms themselves are not in the same location, because the corresponding acceptor is in the same location.
Custom feature file—Specify the pharmacophore feature set to use, by reading a custom feature file. Click Browse to open a file selector, in which you can navigate to the file and open it.
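
The geometry of a projected point can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming that atom positions are (x, y, z) tuples in angstroms and that the point is placed by extending the X-H bond direction beyond the hydrogen by 1.4 Å, as stated above. The function name and the sample coordinates are hypothetical.

```python
import math


def projected_point(heavy_atom, hydrogen, distance=1.4):
    """Point located `distance` angstroms beyond H along the X-H bond."""
    bond = [h - x for x, h in zip(heavy_atom, hydrogen)]
    length = math.sqrt(sum(c * c for c in bond))
    if length == 0.0:
        raise ValueError("heavy atom and hydrogen must not coincide")
    return tuple(h + distance * c / length for h, c in zip(hydrogen, bond))


# Hypothetical N-H donor lying along the x axis with a 1.0 angstrom bond.
print(projected_point((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
```

Two donors whose heavy atoms are in different places can produce nearly the same projected point, which is why this representation allows them to be matched as one feature.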

Minimum feature-feature distance text box

Specify the minimum distance allowed between two pharmacophore features of different type. This distance can be used to screen out features on atoms that are bonded to each other, for example.

Minimum feature-feature distance for features of the same type text box

Specify the minimum distance allowed between two pharmacophore features of the same type.
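
The two distance limits can be pictured as a filter over the ranked features. The following is a minimal sketch in Python, assuming that features arrive in ranked order and that, when two features are too close, the higher-ranked one is kept. That conflict rule is an assumption, as are the function name and the sample data.

```python
import math


def filter_by_distance(ranked_features, min_dist_different, min_dist_same):
    """Keep features that are far enough from every feature already kept.

    ranked_features: list of (feature_type, (x, y, z)) in ranked order.
    Keeping the higher-ranked feature of a close pair is an assumption.
    """
    kept = []
    for ftype, xyz in ranked_features:
        too_close = any(
            math.dist(xyz, kept_xyz)
            < (min_dist_same if ftype == kept_type else min_dist_different)
            for kept_type, kept_xyz in kept
        )
        if not too_close:
            kept.append((ftype, xyz))
    return kept


# Hypothetical ranked features placed along the x axis.
ranked = [
    ("A", (0.0, 0.0, 0.0)),
    ("D", (1.0, 0.0, 0.0)),  # 1.0 from the acceptor: closer than 2.0
    ("A", (3.0, 0.0, 0.0)),  # same type, 3.0 from the first acceptor: closer than 4.0
    ("H", (5.0, 0.0, 0.0)),
]
print(filter_by_distance(ranked, min_dist_different=2.0, min_dist_same=4.0))
```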

The following features are only available in the receptor cavity workflow, in which fragments are docked to a receptor in the absence of a ligand.

Perform clustering option

Cluster the docked fragments spatially using hierarchical clustering, and choose the best-scoring fragment from each cluster to define the features.

Create N clusters text box

Specify the number of clusters to use for hypothesis generation. The default is 15.

Remove non-contributing fragments option

Remove any fragments that do not contribute to binding before choosing features.

Type of ligand efficiency normalization options

Specify the definition of ligand efficiency used to normalize the docking score. Scores are compared after dividing by the chosen ligand efficiency value.

None—Don't use any ligand efficiency normalization.
Atom—Use the number of heavy atoms to normalize the scores.
MW—Use the molecular weight to normalize the scores.
Natural log—Use (1 + ln n_heavy) to normalize the scores, where n_heavy is the number of heavy atoms.
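
The four normalization choices above amount to dividing the docking score by a different quantity. The following is a minimal sketch in Python; the function name, the mode strings, and the sample values are hypothetical.

```python
import math


def normalized_score(docking_score, mode, n_heavy=None, mol_weight=None):
    """Divide a docking score by the chosen ligand efficiency value."""
    if mode == "none":
        return docking_score
    if mode == "atom":
        return docking_score / n_heavy
    if mode == "mw":
        return docking_score / mol_weight
    if mode == "natural_log":
        return docking_score / (1.0 + math.log(n_heavy))
    raise ValueError(f"unknown normalization mode: {mode}")


# Hypothetical fragment: docking score -8.0, 20 heavy atoms, molecular weight 400.
for mode in ("none", "atom", "mw", "natural_log"):
    print(mode, normalized_score(-8.0, mode, n_heavy=20, mol_weight=400.0))
```

The natural log form grows slowly with size, so it penalizes larger fragments less strongly than division by the heavy-atom count.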

Scoring tab

Set options and parameters for the scoring of the hypotheses. The score is used to rank-order the hypotheses that are returned. This feature is only relevant when multiple ligands are used to develop a pharmacophore model, so it is only present for the multiple ligands workflow.

Scoring function option menu

Choose the scoring function to use. There are two options, Phase Hypo Score and Custom. The second allows the inclusion of a score that is mostly based on the alignment of the ligands to the hypothesis and to each other. The custom function is the default if you selected Use prealigned ligands in the Develop Pharmacophore Model Panel.

Actives and decoys section

In this section you can choose to override the default active and decoy sets used for scoring hypotheses. This can be done for both scoring functions.

Use custom active and decoy test sets option: Select this option if you want to use your own set of actives and decoys for scoring the hypothesis (using the BEDROC score), rather than the built-in defaults.
Actives text box and browse button: Enter the name of the file that contains the active compounds in the text box, or click the browse button (...) and navigate to the file. All structures in this file are used as actives.
Decoys text box and browse button: Enter the name of the file that contains the decoy compounds in the text box, or click the browse button (...) and navigate to the file. All structures in this file are used as decoys.

Scoring formula section

Adjust the weights in the formula used to score the hypotheses. This section is only shown if you choose Custom from the Scoring function option menu.

Phase Hypo Score text boxes

Set weights for the components of the Phase Hypo Score. The BEDROC score measures how well the hypothesis extracts a set of active ligands from a set of decoys.

Survival Score text boxes

These text boxes define the survival score of the hypotheses. The definition and possible range for each score are given below. The first five scores are combined as a scoring function and applied to the active ligands to give the active score. The same scoring function is applied to the inactive ligands, and the inactive and active scores can be combined to give the survival score.

vector score

This score measures how well the vectors for acceptors, donors, and aromatic rings are aligned in the structures that contribute to this hypothesis, when the structures themselves are aligned to the pharmacophore. The vector score is the average cosine of the angles formed by corresponding pairs of vector features (acceptors, donors, and aromatic rings).

Possible range is from -1.0 (perfect anti-alignment) to 1.0 (perfect alignment).
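
The average-cosine definition can be written out directly. The following is a minimal sketch in Python, assuming that each pair of corresponding vector features is given as two 3D direction vectors. The function names and the sample vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def vector_score(vector_pairs):
    """Average cosine over corresponding pairs of vector features."""
    return sum(cosine(u, v) for u, v in vector_pairs) / len(vector_pairs)


# Hypothetical pairs: one perfectly aligned, one perfectly anti-aligned.
pairs = [
    ((1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((0.0, 1.0, 0.0), (0.0, -1.0, 0.0)),
]
print(vector_score(pairs))
```

In this example the aligned pair contributes 1.0 and the anti-aligned pair contributes -1.0, so the two cancel in the average.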

site score

This score measures how closely the site points are superimposed in an alignment to the pharmacophore of each structure that contributes to the hypothesis. The site score for each ligand is calculated from the alignment score by

site_score(i) = 1.0 − alignScore(i)/alignCutoff

The site scores for the ligands are averaged to obtain the overall site score. The alignment score is the RMS deviation of the site points of a ligand from those of the reference ligand. The cutoff is 1.2 Å.

Possible range is 0.0 (alignment at RMSD threshold) to 1.0 (perfect alignment).
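
The site score formula above can be evaluated as follows. This is a minimal sketch in Python that uses the 1.2 Å cutoff stated above; the function names and the sample alignment scores are hypothetical.

```python
ALIGN_CUTOFF = 1.2  # angstroms, as stated in the text


def site_score(align_score, align_cutoff=ALIGN_CUTOFF):
    """Site score for one ligand: 1.0 - alignScore / alignCutoff."""
    return 1.0 - align_score / align_cutoff


def overall_site_score(align_scores, align_cutoff=ALIGN_CUTOFF):
    """Average of the per-ligand site scores."""
    scores = [site_score(s, align_cutoff) for s in align_scores]
    return sum(scores) / len(scores)


# Hypothetical RMS deviations (angstroms) for three contributing ligands.
print(overall_site_score([0.0, 0.6, 1.2]))
```

An alignment score of 0.0 gives a site score of 1.0, and an alignment score equal to the cutoff gives 0.0, which matches the stated range.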

volume score

Measures how much the volumes of the contributing structures overlap when aligned on the pharmacophore. The volume score is the average of the individual volume scores. The individual volume score is the overlap of the volume of an aligned ligand with that of the reference ligand, divided by the total volume occupied by the two ligands.

Possible range is 0.0 (no volume overlap) to 1.0 (perfect volume overlap).
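
The overlap ratio can be illustrated on a coarse grid. The following is a minimal sketch in Python, assuming that each molecular volume is approximated by the set of grid cells it occupies and that "total volume occupied by the two ligands" means the union of the two volumes. The grid representation, the function names, and the sample cells are all hypothetical.

```python
def pair_volume_score(ligand_cells, reference_cells):
    """Overlap volume divided by the combined (union) volume."""
    union = ligand_cells | reference_cells
    if not union:
        return 0.0
    return len(ligand_cells & reference_cells) / len(union)


def volume_score(aligned_ligand_cells, reference_cells):
    """Average of the individual volume scores against the reference ligand."""
    scores = [pair_volume_score(cells, reference_cells) for cells in aligned_ligand_cells]
    return sum(scores) / len(scores)


# Hypothetical occupied grid cells for a reference ligand and one aligned ligand.
reference = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
ligand = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
print(pair_volume_score(ligand, reference))
```

Here two cells are shared and four cells are occupied in total, so the individual volume score is 0.5.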

selectivity score

Estimate of the rarity of the hypothesis, based on the World Drug Index. The selectivity is the negative logarithm of the fraction of molecules in the Index that match the hypothesis. A selectivity of 2 means that 1 in 100 molecules match. High selectivity means that the hypothesis is more likely to be unique to the active ligands. Possible range is from 0.0 upward.
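
The selectivity definition is a negative base-10 logarithm of a match fraction. The following is a minimal sketch in Python; the function name and the sample counts are hypothetical, and no real database is queried.

```python
import math


def selectivity(n_matching, n_total):
    """Negative log10 of the fraction of molecules that match the hypothesis."""
    if n_matching <= 0:
        raise ValueError("selectivity is undefined when nothing matches")
    return -math.log10(n_matching / n_total)


# Hypothetical counts: 1 in 100 molecules match, then 1 in 1000.
print(selectivity(1, 100))
print(selectivity(1, 1000))
```

Each tenfold drop in the fraction of matching molecules raises the selectivity by 1.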

log10(number of matches)

The base-10 logarithm of the number of actives that match the hypothesis. This score is useful when the required minimum number of actives is smaller than the total number of actives.

Possible range is from 0.0 upward.

inactive score

This is the score for the specified inactive ligands. The score as defined by the five scores and weights given above is applied to the inactive ligands. It is subtracted from the score for the active ligands, i.e. the hypothesis is penalized if the inactives match it. Hypotheses with unusually high inactive scores are more likely to contain features that are just part of some common framework, rather than features that make key interactions with the protein.

Possible range is from 0.0 upward. The inactive score is only included in the survival score if you select Include inactive compounds.

Include inactive compounds option

Include the score for the inactive compounds in the survival score. The text box for the inactive score is enabled when you select this option. This allows you to generate survival scores with or without the inactives penalty.
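
The way the component scores and the inactives penalty fit together can be sketched as follows. This is a minimal illustration in Python, assuming that the scoring function is a weighted sum of the five component scores and that the inactive score is multiplied by its own weight before being subtracted. The linear form, the weights, and the sample values are assumptions made for illustration, not the software defaults.

```python
def weighted_score(components, weights):
    """Weighted sum of the component scores (assumed linear form)."""
    return sum(weights[name] * components[name] for name in weights)


def survival_score(active, inactive, weights, inactive_weight=1.0,
                   include_inactives=False):
    """Active score, optionally penalized by the inactive score."""
    score = weighted_score(active, weights)
    if include_inactives:
        score -= inactive_weight * weighted_score(inactive, weights)
    return score


# Hypothetical weights and component scores.
weights = {"vector": 1.0, "site": 1.0, "volume": 1.0, "selectivity": 1.0, "matches": 1.0}
active = {"vector": 0.9, "site": 0.8, "volume": 0.7, "selectivity": 1.5, "matches": 0.6}
inactive = {"vector": 0.5, "site": 0.4, "volume": 0.3, "selectivity": 1.5, "matches": 0.3}

print(survival_score(active, inactive, weights))
print(survival_score(active, inactive, weights, include_inactives=True))
```

Computing the score both ways shows how strongly the inactives match a given hypothesis.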

Excluded Volumes tab

In this tab you make settings for addition of excluded volumes around the hypothesis. This tab is only present for the ligand-based and receptor-based workflows. You can add, edit, and delete excluded volumes in the Manage Excluded Volumes Panel.

Create excluded volume shell option (single)

Select this option to add excluded volumes to the hypothesis. The shell is a set of overlapping spheres that are placed around the hypothesis; the overlap defines a surface that is intended to mimic the receptor surface in whole or in part. When you select this option, the other options are made available.

Ligand-based workflows

In these workflows, excluded volumes are placed around the ligand or the ligand set. The options allow you to control the sphere locations and radii. For multiple ligands there are two choices, based on either the active molecules or the active and inactive molecules. For a single ligand the volumes are placed with the same algorithm as the Actives choice for multiple ligands.

Create shell from options

Choose an option for which molecules are used to create excluded volumes. These options are only available for the multiple ligand workflow. The choice of active ligands should be as structurally diverse as possible, to avoid placing excluded volumes in regions that have no effect on activity (such as non-critical solvent regions).

Actives—Place excluded volumes in a shell around the ligands designated as active.
Actives and inactives—Place excluded volumes in any region of space in which inactives have atoms but actives do not.

Minimum number of inactives that must experience a clash text box

Specify the minimum number of inactives that must have atoms in a region that is not occupied by actives for an excluded volume sphere to be placed in that region.
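
The clash threshold can be pictured as a count over regions of space. The following is a minimal sketch in Python, assuming that space is divided into grid cells and that each ligand is described by the set of cells its atoms occupy. The grid representation, the function name, and the sample cells are hypothetical.

```python
from collections import Counter


def clash_regions(active_cell_sets, inactive_cell_sets, min_inactives):
    """Cells occupied by at least `min_inactives` inactives and by no active."""
    occupied_by_actives = set().union(*active_cell_sets)
    counts = Counter(
        cell
        for cells in inactive_cell_sets  # each inactive counts once per cell
        for cell in cells
        if cell not in occupied_by_actives
    )
    return {cell for cell, n in counts.items() if n >= min_inactives}


# Hypothetical 2D cells for one active and three inactives.
actives = [{(0, 0), (1, 0)}]
inactives = [{(1, 0), (2, 0)}, {(2, 0), (3, 0)}, {(2, 0)}]
print(clash_regions(actives, inactives, min_inactives=2))
```

Raising the threshold reduces the number of regions that qualify, because fewer regions are occupied by that many inactives.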

Minimum distance between active surface and excluded volumes text box (single)

Specify the minimum distance between the van der Waals surface of any active ligand and the surface of an excluded volume sphere, in angstroms. This buffer distance can be considered to simulate receptor flexibility.

Excluded volume sphere radii text box (single)

Specify the radius that is to be used for the excluded volume spheres. Using a larger radius produces fewer spheres, but results in a less well-defined shape for the excluded region.

Receptor-based workflows

In these workflows, excluded volume spheres are placed on receptor atoms. The options allow you to control the sphere locations and radii.

Radii sizes options

Select an option for the radii of the excluded volume spheres.

Van der Waals radii of receptor atoms—Use the van der Waals radii of the receptor atoms as the radii of the excluded volume spheres. This option produces an excluded volume that closely describes the space occupied by the receptor. However, it is a rigid volume, and the receptor has some flexibility. You can use the scaling factors to mimic a small amount of receptor flexibility.
Fixed radius—Specify the radius of all spheres in the text box. This option can help to ensure that receptor atoms are not eliminated from the excluded volume due to being too close to the ligand.
Atom-level property—Use the values of the atom-level property selected from the option menu as the sphere radii. The property must already exist for the atoms of the receptor. This option allows maximum flexibility, but requires the creation of a property with the appropriate radius values. Atoms with a zero or unspecified value of this property are skipped. This feature can be used to select the atoms on which the excluded volumes are placed.

Radii scaling factors options

Scaling the radii down can be used to simulate some amount of receptor flexibility.

Fixed scaling factor—Specify the scaling factor in the text box. This option can be useful when van der Waals radii are used, to mimic some amount of flexibility in the receptor surface.
Atom-level property—Use the values of the atom-level property selected from the option menu to scale the sphere radii. The property must already exist for the atoms of the receptor. This option allows maximum flexibility, but requires the creation of a property with the appropriate scaling factors. Atoms with a zero or unspecified value of this property are skipped. This feature can be used to select the atoms for which scaling is applied.

Ignore receptor atoms whose surfaces are within N Å of the ligand surface text box

Excluded volume spheres are not placed on receptor atoms when the sphere surface is closer to the van der Waals surface of the ligand than the specified value. This value allows a buffer space between the ligand and the receptor to simulate receptor flexibility.

Limit excluded volume shell thickness to N Å option and text box

Excluded volume spheres are not placed on receptor atoms that are more than the specified distance from any ligand atom other than hydrogen. The outside surface of the excluded volumes is therefore approximately this distance from the ligand surface (depending on the excluded volume radii). Placing spheres further away than the first shell of receptor atoms is not generally useful, as ligands will be eliminated due to occupying some volume in the first shell. In addition, the time taken to evaluate occupation of excluded volumes scales with the number of spheres.
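
The buffer and shell-thickness rules for receptor atoms can be combined into one placement test. The following is a minimal sketch in Python, assuming that the ligand surface is approximated atom by atom from van der Waals radii and that the shell thickness is measured from the receptor atom to the nearest ligand heavy atom. Both interpretations, the function name, and the sample values are assumptions.

```python
import math


def place_sphere(receptor_xyz, sphere_radius, ligand_atoms, buffer, shell_thickness):
    """Decide whether an excluded volume sphere is placed on a receptor atom.

    ligand_atoms: list of ((x, y, z), vdw_radius, is_hydrogen) tuples.
    """
    # Rule 1: skip if the sphere surface is closer than `buffer` to any ligand atom surface.
    gaps = [
        math.dist(receptor_xyz, xyz) - sphere_radius - vdw
        for xyz, vdw, _ in ligand_atoms
    ]
    if not gaps or min(gaps) < buffer:
        return False
    # Rule 2: skip if the receptor atom is beyond the shell around the ligand heavy atoms.
    heavy = [math.dist(receptor_xyz, xyz) for xyz, _, is_h in ligand_atoms if not is_h]
    return bool(heavy) and min(heavy) <= shell_thickness


# Hypothetical ligand: a single carbon atom at the origin with a 1.7 angstrom radius.
ligand = [((0.0, 0.0, 0.0), 1.7, False)]
for x in (3.5, 4.0, 8.0):
    print(x, place_sphere((x, 0.0, 0.0), 1.7, ligand, buffer=0.5, shell_thickness=5.0))
```

With these sample values, only the receptor atom at 4.0 Å receives a sphere: the atom at 3.5 Å is too close to the ligand surface, and the atom at 8.0 Å lies beyond the shell.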

Reset button

Reset all the settings in this dialog box to their defaults.

Tutorials

Ligand-Based Virtual Screening Using Phase
