Target Analysis with SiteMap and WaterMap
Tutorial Created with Software Release: 2025-2
Topics: Small Molecule Drug Discovery, Structure Prediction & Target Enablement
Products Used: SiteMap, WaterMap
This tutorial is written for use with a 3-button mouse with a scroll wheel.
Terms such as Workspace (the 3D display area in the center of the main window, where molecular structures are displayed) are defined in the Glossary of Terms at the end of this tutorial.
Abstract:
In this tutorial, you will learn how to perform a target analysis on a protein-ligand complex using SiteMap and WaterMap. You will identify and evaluate potential binding sites as well as determine and analyze the hydration in a binding site.
1. Introduction to Target Analysis
For effective computational drug discovery, particularly structure-based methods, detailed knowledge of the binding site of a target is highly advantageous. Although ligand-based virtual screening can be performed without structural information, structure-based approaches typically require a structure that includes a relevant bound ligand, or at least a clearly defined binding pocket. As part of a target analysis workflow, after determining, refining, and preparing a target so that any structural issues are resolved, the next step is to explore the druggability and hydration of possible binding sites.
SiteMap is a tool for identifying and evaluating protein binding sites, with or without a known binder. When the location of a protein-ligand (orthosteric, allosteric) or protein-protein binding site is unknown, SiteMap’s identification mode can predict potential locations. Additionally, sites can be mapped, scored, and visualized to facilitate understanding of how well existing ligands fit a binding site both in shape and interactions, and how extending ligands into nearby areas could enhance binding. This understanding can also guide the modification of compounds to design improved ligands in a lead discovery and optimization context.
Hydration of the binding site, meaning the placement of water molecules in the binding pocket, can significantly influence ligand binding, as the solvent is a direct competitor in ligand or substrate binding. The displacement of unstable water molecules can lead to large gains in potency. A tool to determine and analyze the thermodynamic properties of water molecules is WaterMap. It is based on molecular dynamics (MD) simulations to sample the placement of water molecules and to calculate their thermodynamic properties, which is not possible from just a single static structure such as a crystal structure. The MD engine for WaterMap is Desmond with the underlying force field OPLS4. Results from WaterMap can help to assess the druggability of a binding site and drive ligand modifications in a lead discovery and optimization process.
In this tutorial, you will apply SiteMap and WaterMap via the Maestro graphical user interface (GUI) to identify and evaluate ligand binding sites and their hydration thermodynamics. You will work on an ABL kinase (PDB-ID: 2HYY) co-crystallized with the inhibitor Imatinib (Gleevec), which is used in the treatment of chronic myelogenous leukemia (CML) and gastrointestinal stromal tumors, among others.
2. Creating Projects and Importing Structures
At the start of the session, change the file path to your chosen Working Directory in Maestro to make file navigation easier. Each session in Maestro begins with a default Scratch Project, which is not saved. A Maestro project stores all your data and has a .prj extension. A project may contain numerous entries corresponding to imported structures, as well as the output of modeling-related tasks. Once a project is created, the project is automatically saved each time a change is made.
Structures can be imported directly from the PDB, or from files in your Working Directory, and are automatically added to the Entry List and Project Table. The Entry List is located to the left of the Workspace. The Project Table can be accessed by Ctrl+T (Cmd+T) or Window > Project Table if you would like to see an expanded view of your project data.
- Double-click the Maestro icon
- (No icon? See Starting Maestro)
- Go to File > Change Working Directory.
- The Change Directory panel opens.
- Browse to the directory you want to use as your Working Directory, and click Select Folder.
- Pre-generated input and results files are included for running jobs or examining output. Download the zip file here: https://www.schrodinger.com/sites/default/files/s3/release/current/Tutorials/zip/target_analysis.zip.
- After downloading the zip file, unzip the contents in your Working Directory for ease of access throughout the tutorial.
- Go to File > Save Project As.
- Change the File name to Target_Analysis_2HYY, and change the Location to your Working Directory, then click Save.
- The project is now named Target_Analysis_2HYY.prj and is saved in your Working Directory.
- Go to File > Import Structures….
- The Import dialog box opens.
- Navigate to your Working Directory, select the file 2HYY_prepared.mae, and click Open. Confirm by clicking Import.
- A banner appears and a group is added to your Entry List.
Feel free to familiarize yourself with the protein before moving on, e.g. by choosing a visualization style that is most helpful to you.
Structure files obtained from the PDB, vendors, and other sources often lack necessary information for performing modeling-related tasks. Typically, these files lack hydrogens, have incorrect or missing bond orders, charge states, or side-chain orientations, or are missing loop regions. To make these structures suitable for modeling tasks, the Protein Preparation Workflow is used to resolve common structural issues.
In this tutorial, the system (Chain A of the PDB entry 2HYY) has already been prepared. If you are following along with your own structure, make sure it is fully prepared before progressing to the next section. The Introduction to Structure Preparation and Visualization tutorial as well as the Best Practices for Protein Preparation can guide you through the basics of the process. For more information tailored to MD applications, see the Introduction to All-Atom Molecular Dynamics Simulations with Desmond tutorial.
For WaterMap calculations, as for all MD-based tools, it is advisable to retain all crystallographic waters during the preparation stage. In WaterMap calculations, you have the option to specify that existing waters should be treated as solvent. This allows you to start with these waters in the right place, while fully solvating the rest of the system. With this setting, the crystal waters and the other solvent molecules are treated in the same way, and both are allowed to move freely during the simulation. However, if there is any doubt about one or more waters, it is advisable to delete them and let WaterMap place them instead. For apo WaterMap jobs, note that some crystallographic waters occupy their positions because of interactions with the ligand, which is not present during the simulation. In such a case, starting with a dry complex allows the system setup step in WaterMap to fill the ligand vacancy as needed.
3. Identifying and Evaluating Binding Sites with SiteMap
In this section, you will learn how to run SiteMap from the Maestro graphical user interface via the SiteMap panel and how to analyze and interpret the results. You can run SiteMap in two different modes, either for the identification of probable binding sites on an entire apo protein, or for the evaluation of only a single known binding site or defined region, e.g. on a holo protein.
A SiteMap calculation consists of three stages. In the search stage, sites are identified via groups of so-called site points on a grid, which must be located outside but near the surface of the receptor and must be sufficiently enclosed. In the subsequent mapping stage, contour maps are generated based on interactions with the receptor. The final evaluation stage concludes the SiteMap job by calculating various properties to determine whether a site is likely to be able to accommodate a drug-like binder favourably.
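The search stage described above can be illustrated with a small toy sketch. This is not SiteMap's algorithm: the function names, the ray-casting measure of enclosure, and every numeric threshold (`min_dist`, `max_dist`, `min_enclosure`, `hit_radius`, and so on) are assumptions chosen only to show the idea of keeping grid points that lie outside the receptor, close to its surface, and sufficiently enclosed.

```python
import math
from itertools import product

# 26 ray directions (axes and diagonals), normalized to unit length.
DIRECTIONS = []
for d in product((-1, 0, 1), repeat=3):
    if d != (0, 0, 0):
        norm = math.sqrt(sum(c * c for c in d))
        DIRECTIONS.append(tuple(c / norm for c in d))


def enclosure(point, atoms, ray_length=8.0, step=0.5, hit_radius=1.5):
    """Fraction of rays cast from `point` that pass within `hit_radius` of an atom.

    All parameter values are illustrative assumptions, not SiteMap settings.
    """
    blocked_rays = 0
    for unit in DIRECTIONS:
        t = step
        while t <= ray_length:
            probe = tuple(p + t * u for p, u in zip(point, unit))
            if any(math.dist(probe, a) <= hit_radius for a in atoms):
                blocked_rays += 1
                break
            t += step
    return blocked_rays / len(DIRECTIONS)


def grid_points(lower, upper, spacing):
    """Yield points of a regular grid between the corners `lower` and `upper`."""
    counts = [int(round((hi - lo) / spacing)) + 1 for lo, hi in zip(lower, upper)]
    for idx in product(*(range(n) for n in counts)):
        yield tuple(lo + i * spacing for lo, i in zip(lower, idx))


def find_site_points(atoms, lower, upper, spacing=1.0,
                     min_dist=2.5, max_dist=5.0, min_enclosure=0.5):
    """Keep grid points that are outside the receptor, near it, and enclosed."""
    site_points = []
    for point in grid_points(lower, upper, spacing):
        nearest = min(math.dist(point, a) for a in atoms)
        if not (min_dist <= nearest <= max_dist):
            continue  # inside the receptor or too far from its surface
        if enclosure(point, atoms) >= min_enclosure:
            site_points.append(point)
    return site_points
```

Here `atoms` is a plain list of (x, y, z) tuples. Grouping the surviving points into sites, and the mapping and evaluation stages, are not covered by this sketch.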
As an alternative to the GUI, SiteMap can be run from the command line, which provides more options for customizing your jobs. For more information on usage and options, see the SiteMap Command Help or the SiteMap Command Reference Manual.
3.1 Identifying Top-Ranked Binding Sites
To find, map, and evaluate potential binding sites, SiteMap requires a prepared structure with no ligands, solvent, or other molecules present. Everything in the Workspace is treated as part of the receptor.
- Right-click on the entry 2HYY - prepared - holo, go to Split > Into Ligands, Water, Other.
- The original entry is split into individual entries for the ligand, waters, and the protein.
- Double-click the 2HYY - prepared - holo_protein entry and rename it to 2HYY - prepared - apo.
- Include the entry 2HYY - prepared - apo.
- Go to Tasks > Browse > Structure Analysis > Binding Site Detection….
- The SiteMap panel opens.
- For Task, choose Identify top-ranked potential receptor binding sites.
- Change the job name to sitemap_2HYY_apo.
- Click Run to start the job.
- The job takes ~ 1 minute.
- A banner appears when the job has been incorporated and a new group is added to the Entry List.
Note: You can find pregenerated results of this job in the zip file for this tutorial (sitemap_2HYY_apo). You can import them via File > Import Structures… and select the file sitemap_2HYY_apo_out.maegz.
The settings section provides some controls for the site search, mapping, and evaluation, such as the required number of site points for a site to be reported and the maximum number of returned sites. You can change the grid size for the evaluation stage (standard of 0.7 Å). These settings can be used to tune the SiteMap results and adapt SiteMap for your specific target. Additionally, there is an option to find shallow binding sites, which are often found at protein-protein interaction interfaces; this option decreases the required amount of enclosure.
In this example, SiteMap has returned four probable binding sites, which you will analyze in the following steps.
- In the top right corner of the Entry List, click the three vertical dots to open the settings.
- Click on Show Properties….
- The Show Properties dialog box opens.
- Click Choose, then search for and select the following properties from the list: SiteScore, DScore, volume, balance.
- Click OK.
- The selected properties are shown in the Entry List.
The identified binding sites are returned and ranked by SiteScore. This property is useful to distinguish between molecule-binding and non-molecule-binding sites and incorporates the number of site points, the degree of enclosure, and the hydrophilicity. In contrast, the DScore (druggability score) can be used to distinguish between druggable and non-druggable binding sites. Although it is based on the same properties, the main difference is that the hydrophilic term is not capped. In principle, a site can bind molecules tightly (have a good SiteScore), but still be non-druggable (have a bad DScore), because drug-like molecules are unlikely to bind. SiteScores and DScores above 1.0 are considered good, as they are calculated relative to the average of a large number of tight binders. The volume gives an idea of the ability of a site to accommodate a ligand of a certain size, and a suitable volume is usually considered to be between 225 and 600 Å³. The balance is a measure of whether a site contains a good mixture of hydrophilic and hydrophobic regions for a ligand to bind to, and values above 0.5 are considered good. More information on how the various properties available from SiteMap results are calculated can be found in the SiteMap documentation.
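The rules of thumb above can be written down as a short check. The following is a minimal sketch, assuming the property values have been copied by hand from the Entry List; the `Site` class, the function names, and any example values passed to them are hypothetical and are not read from SiteMap output files.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One binding site with the four properties shown in the Entry List."""
    name: str
    site_score: float
    d_score: float
    volume: float   # in cubic angstroms
    balance: float


def assess(site):
    """Compare each property with the thresholds quoted in the text above."""
    return {
        "SiteScore": site.site_score > 1.0,
        "DScore": site.d_score > 1.0,
        "volume": 225.0 <= site.volume <= 600.0,
        "balance": site.balance > 0.5,
    }


def rank_sites(sites):
    """Order sites by SiteScore, highest first, as in the SiteMap results."""
    return sorted(sites, key=lambda s: s.site_score, reverse=True)
```

A call such as `assess(Site("my_site", 1.05, 1.10, 400.0, 0.9))` returns a dictionary with one True/False flag per property. The flags are a reading aid, not a replacement for inspecting the sites visually.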
- Double-click the include circle of the entries sitemap_2HYY_apo_protein and 2HYY - prepared - holo_ligand to fix both entries in the Workspace.
- Optional: Choose a visualization that is most helpful to you. For example, we have displayed the ligand in a green ball-and-stick representation, and the protein as grey ribbons.
- Include one site found by SiteMap at a time and visually analyze each in the Workspace. Take into account the corresponding properties in the Entry List.
SiteMap maps every found binding site and the results can be easily visualized in Maestro. Site points are shown as white spheres and depending on the type of target, various site maps are available as surfaces, highlighting hydrophobic regions in yellow and polar regions in green. The polar regions are further divided into hydrogen-bond acceptor regions in red, H-bond donor regions in blue, and metal-binding regions in pink. You can easily display/undisplay specific surfaces via the S behind the site entries and tweak the style of the surfaces via the Manage Surfaces panel.
Here, the top-ranked site corresponds to the active site, to which the cocrystallized inhibitor Imatinib binds. It has a SiteScore, DScore, and balance in the desired ranges. The found site points and different maps align well with the ligand. You can also use a site found by SiteMap for grid generation in Glide docking by selecting one of the site points in the center of the site to define the position of a ligand.
In case your target has flexible or even cryptic binding sites, e.g. which involve side chain reorientations and loop movements (induced fit), more advanced methods for binding site detection such as mixed solvent molecular dynamics (MxMD) can be useful. Learn more in the Exploring Protein Binding Sites with Mixed Solvent Molecular Dynamics tutorial.
3.2 Evaluating a Single Binding Site
- Include the entries 2HYY - prepared - holo_ligand and 2HYY - prepared - apo.
- In case you have closed the SiteMap panel, open it again.
- For Task, choose Evaluate a single binding site region.
- A banner appears in the Workspace, prompting you to click on an atom in the center of the region you want to map.
- Click any atom in the ligand to define the binding site.
- Change the job name to sitemap_2HYY_holo.
- Click Run to start the job.
- The job takes ~ 1 minute.
- A banner appears when the job has been incorporated and a new group is added to the Entry List.
Note: You can find pregenerated results of this job in the zip file for this tutorial (sitemap_2HYY_holo). You can import them via File > Import Structures… and select the file sitemap_2HYY_holo_out.maegz.
You can choose a buffer distance around the selected atoms to define the region to evaluate the binding site. In case you want to define this region with multiple molecules (e.g. a ligand and a cofactor), create one entry for the receptor and one entry for all molecules to define the region to evaluate.
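To make the role of the buffer distance concrete, here is a minimal geometric sketch. It assumes atoms are given as plain (x, y, z) tuples and selects the receptor atoms that lie within the buffer of any atom of the molecules defining the region. The function name and data layout are hypothetical, and SiteMap's own region definition may differ in detail.

```python
import math


def region_atom_indices(receptor_atoms, defining_atoms, buffer):
    """Indices of receptor atoms within `buffer` of any region-defining atom.

    `defining_atoms` can combine several molecules (e.g. ligand plus cofactor),
    mirroring the single entry that holds all molecules defining the region.
    """
    return [
        index
        for index, atom in enumerate(receptor_atoms)
        if any(math.dist(atom, ref) <= buffer for ref in defining_atoms)
    ]
```

Passing the concatenated ligand and cofactor coordinates as `defining_atoms` enlarges the selected region, which is the geometric effect of placing both molecules in the same defining entry.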
- Double-click a ligand entry to fix it in the Workspace, then include both sitemap_2HYY_apo_site_1 and sitemap_2HYY_holo_site_1.
- Open the Workspace Configuration panel by clicking on the plus button in the bottom right corner of the Workspace.
- Click Tile by to show both sites next to each other for comparison.
Both binding sites align well with the Imatinib ligand in terms of shape and chemistry. The site generated by the evaluation mode is smaller than the one from the identification mode, because it is more clearly defined. You can get an idea of possible modifications to the ligand in terms of growth space, as well as the type of chemistry that the added functional groups should exhibit to improve ligand binding. A tool to quickly ideate and check ligand modifications is Ligand Designer; learn more in the Forming Protein-Ligand Interactions with the Ligand Designer tutorial.
4. Assessing Druggability with WaterMap
In this section, you will learn how to run WaterMap from the Maestro graphical user interface via the WaterMap - Perform Calculation panel. Subsequently you will analyze and interpret the results with the WaterMap - Examine Results panel and then visualize the hydration sites in the workspace.
A WaterMap calculation consists of three stages. In the simulation stage, the system is simulated with Desmond. In the subsequent clustering stage, the positions of water molecules are determined from peaks in the water density. The final stage concludes the WaterMap job by calculating various thermodynamic and other properties.
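The clustering idea can be sketched with a toy example. The code below is a deliberate simplification and not WaterMap's method: it groups water oxygen positions from trajectory frames with a greedy distance criterion instead of locating peaks in the water density, and it reports the fraction of frames in which each resulting site is occupied. The function name, the input layout, and the default radius of 1.0 Å are assumptions.

```python
import math


def cluster_waters(frames, radius=1.0):
    """Greedy clustering of water positions into hydration sites.

    `frames` is a list of frames, each a list of (x, y, z) water oxygen
    positions. A position joins the first site whose center lies within
    `radius`; otherwise it starts a new site. Returns (center, occupancy)
    pairs, where occupancy is the fraction of frames with a water at the site.
    """
    if not frames:
        return []
    centers = []          # site centers, fixed at the first position seen
    occupied_frames = []  # per site: set of frame indices with a water present
    for frame_index, frame in enumerate(frames):
        for position in frame:
            for site_index, center in enumerate(centers):
                if math.dist(position, center) <= radius:
                    occupied_frames[site_index].add(frame_index)
                    break
            else:
                centers.append(position)
                occupied_frames.append({frame_index})
    total = len(frames)
    return [(c, len(f) / total) for c, f in zip(centers, occupied_frames)]
```

The thermodynamic properties calculated in the final stage are outside the scope of this sketch.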
- Include the entries 2HYY - prepared - apo and 2HYY - prepared - holo_ligand to display them in the Workspace.
- Go to Tasks > Browse > WaterMap > Perform Calculation….
- The WaterMap - Perform Calculation panel opens.
- You are prompted to pick an atom in the ligand to define the binding site. Click any atom in the ligand.
- The ligand is highlighted in the Workspace.
- Unselect the option Truncate protein.
- Change the Job name to watermap_2HYY_apo.
- Optional: Click on the Cog in the bottom of the panel and open the Job Settings Dialog box.
- Optional: Select a suitable GPU Host and click OK.
The WaterMap job is now ready to run. Because it is based on Desmond MD simulations, this job can only be run on a Linux-based host with GPUs (see Desmond System Requirements). Pre-generated result files for this job can be found in the zip file you downloaded at the beginning. Feel free to run the job if you have access to suitable hardware.
Another option to define the ligand binding site is to use selected residues of the receptor. You can change the distance from the defined binding site up to which water molecules will be analyzed. The default of 10 Å corresponds to roughly three hydration shells. Truncating the protein can be a reasonable choice for large systems. This option reduces the system size and thus speeds up the job, but should not be considered if allostery is expected to play a role in binding. Existing water molecules can either be deleted or treated as part of the receptor (solute) or solvent. Only the latter option includes existing water molecules in the analysis.
- In case you have not run the WaterMap job yourself, load the pregenerated results: File > Import Structures…, then select the file watermap_2HYY_apo_wm.maegz.
- Click on the workflow action menu (W) next to the watermap_2HYY_apo_watermap entry.
- The WaterMap - Examine Results panel opens and the entry is fixed in the Workspace.
- Explore the Water sites table.
- You can select hydration sites by clicking on a row in the table, or by clicking on Pick to select sites and then clicking on hydration sites in the Workspace.
There are different display options, e.g. displaying/undisplaying the receptor, ligand, the ligand surface, water density, and cavity map. The water site table contains columns for the site index/number and various properties. The occupancy is a measure of how often a water molecule can be found at this position throughout the simulation. The overlap shows the degree of displacement by the ligand. The difference in free energy (ΔG) as well as the enthalpic (ΔH) and entropic (-TΔS) contributions are listed. You can sort the table by a property by clicking on the respective column header.
A hydration site with a high ΔG has a large free energy in comparison to bulk water. These water molecules are unstable and displacing them with a ligand improves the binding. Some water molecules without a large ΔG value, but with a high -TΔS value, simultaneously experience good interactions with the receptor and a large entropic penalty. These water molecules are worth replacing to keep the favorable interactions and lower the entropic penalty.
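The reasoning above can be expressed as a small decision rule. This is a minimal sketch, assuming values are typed in from the Water sites table. The tutorial gives no numeric cutoffs, so `dg_cutoff` and `ts_cutoff` are placeholders that you must choose yourself, and the function names and labels are hypothetical.

```python
def free_energy(delta_h, minus_t_delta_s):
    """ΔG as the sum of the enthalpic (ΔH) and entropic (-TΔS) contributions."""
    return delta_h + minus_t_delta_s


def classify_site(delta_g, minus_t_delta_s, dg_cutoff, ts_cutoff):
    """Label a hydration site following the qualitative rules in the text.

    "displace": high ΔG relative to bulk water, i.e. an unstable water.
    "replace":  ΔG not large but a high -TΔS, i.e. good interactions with a
                large entropic penalty, so a ligand group that keeps those
                interactions is worth considering.
    "leave":    neither criterion is met.
    Both cutoffs are user-chosen placeholders, not values from WaterMap.
    """
    if delta_g >= dg_cutoff:
        return "displace"
    if minus_t_delta_s >= ts_cutoff:
        return "replace"
    return "leave"
```

The order of the two checks encodes the priority in the text: a high ΔG is decisive on its own, and the entropic criterion is considered only for sites without a large ΔG.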
- Undisplay the receptor.
- For Site label, select Site Number.
In the receptor analyzed here, there is a cluster of water molecules with a high free energy difference within the binding site. This implies that a drug-sized molecule that can be accommodated in this binding site will gain significant binding affinity simply by occupying the site and displacing these water molecules, which supports the receptor's druggability.
Overlaying the Imatinib ligand shows that many of these water molecules are already displaced (e.g. waters with the indices 57, 61, 115). It would likely be beneficial for ligand potency to additionally displace waters such as those with indices 128, 37, and 78. A candidate for replacement could be water molecule 17.
To learn more about how to use WaterMap results in a lead optimization context, see the Identifying Binding Site Requirements and Lead Optimization with WaterMap tutorial.
5. Conclusion and References
In this tutorial, you learned how to set up and run SiteMap on a protein-ligand complex to identify, map and evaluate potential binding sites from the Maestro GUI. Additionally, you learned how to set up and analyze a WaterMap job to gain insights into the thermodynamics of the hydration in the binding pocket.
For further reading:
- SiteMap User Manual
- WaterMap User Manual
- SiteMap: Halgren et al. “New Method for Fast and Accurate Binding-site Identification and Analysis” Chem. Biol. Drug Des. 2007, 69, 146–148
- SiteMap: Halgren et al. “Identifying and Characterizing Binding Sites and Assessing Druggability” J. Chem. Inf. Model. 2009, 49, 2, 377–389
- WaterMap: Young et al. “Motifs for molecular recognition exploiting hydrophobic enclosure in protein–ligand binding” Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 3, 808–813
- WaterMap: Abel et al. “The role of the active site solvent in the thermodynamics of factor Xa-ligand binding” J. Am. Chem. Soc. 2008, 130, 9, 2817–2831
6. Glossary of Terms
Entry List - a simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion
Included - the entry is represented in the Workspace, the circle in the In column is blue
incorporated - once a job is finished, output files from the working directory are added to the project and shown in the Entry List and Project Table
Project Table - displays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data
Recent actions - This is a list of your recent actions, which you can use to reopen a panel, displayed below the Browse row. (Right-click to delete.)
Scratch Project - a temporary project in which work is not saved, closing a scratch project removes all current work and begins a new scratch project
Selected - (1) the atoms are chosen in the Workspace. These atoms are referred to as "the selection" or "the atom selection". Workspace operations are performed on the selected atoms. (2) The entry is chosen in the Entry List (and Project Table) and the row for the entry is highlighted. Project operations are performed on all selected entries
Working Directory - the location where files are saved
Workspace - the 3D display area in the center of the main window, where molecular structures are displayed