Batch Homology Modeling Using the Multiple Sequence Viewer/Editor

Tutorial Created with Software Release: 2024-2
Topics: Antibody Design, Biologics Drug Discovery, Enzyme Engineering, Structure Prediction & Target Enablement
Products Used: BioLuminate


This tutorial is written for use with a 3-button mouse with a scroll wheel.
Terms such as Workspace are defined in the Glossary of Terms at the end of this tutorial.

 

Abstract:

 

In this tutorial, you will learn how to build multiple homology models using a single template with the Multiple Sequence Viewer/Editor

 

Tutorial Content
  1. Creating Projects and Importing Structures
  2. Loading and Analyzing Sequences in the Multiple Sequence Viewer/Editor
  3. Building Batch Homology Models
  4. Conclusion and References
  5. Glossary of Terms

1. Creating Projects and Importing Structures

At the start of the session, change the file path to your chosen Working Directory in Maestro to make file navigation easier. Each session in Maestro begins with a default Scratch Project, which is not saved. A Maestro project stores all your data and has a .prj extension. A project may contain numerous entries corresponding to imported structures, as well as the output of modeling-related tasks. Once a project is created, it is saved automatically each time a change is made.

Structures can be imported from the PDB directly, or from your Working Directory using File > Import Structures, and are added to the Entry List and Project Table. The Entry List is located to the left of the Workspace. The Project Table can be opened with Ctrl+T (Cmd+T) or Window > Project Table if you would like to see an expanded view of your project data.

  1. Double-click the BioLuminate icon

Figure 1-1. Change Working Directory option.

  1. Go to File > Change Working Directory
  2. Find your directory, and click Choose
  3. Pre-generated input and results files are included for running jobs or examining output. Download the zip file here: https://www.schrodinger.com/sites/default/files/s3/release/current/Tutorials/zip/batch_homology.zip
  4. After downloading the zip file, unzip the contents in your Working Directory for ease of access throughout the tutorial
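
If you prefer to unpack the archive from a script rather than a file manager, the unzip step above can be sketched with the Python standard library. This is a minimal sketch, not part of the BioLuminate workflow: the function name extract_tutorial_files and the directory name in the usage note are illustrative assumptions, and the archive is assumed to have been downloaded already.

```python
import zipfile
from pathlib import Path


def extract_tutorial_files(zip_path, working_dir):
    """Unpack a downloaded tutorial archive into the Working Directory.

    Returns the sorted list of member names that were extracted.
    """
    working_dir = Path(working_dir)
    # Create the target directory if it does not exist yet.
    working_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(working_dir)
        return sorted(archive.namelist())
```

For example, extract_tutorial_files("batch_homology.zip", "my_working_dir") would unpack the downloaded archive into a hypothetical directory named my_working_dir and return the names of the extracted files.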

 

Figure 1-2. Save Project.

  1. Go to File > Save Project As
  2. Change the File name to batch_homology, then click Save
    • The project is now named batch_homology.prj

 

 

2. Loading and Analyzing Sequences in the Multiple Sequence Viewer/Editor

In this section, we will load 8 sequences into the Multiple Sequence Viewer/Editor, perform a multiple sequence alignment, generate a logo plot, and calculate and display several sequence-based descriptors that can be used to triage large batches of sequences.

Figure 2-1. Open the Multiple Sequence Viewer/Editor.

  1. Go to Tasks > Biologics > Multiple Sequence Viewer/Editor
    • The Multiple Sequence Viewer/Editor opens
    • The sequences of the entries included in the Workspace are shown

Figure 2-2. Import Sequences from File in the Multiple Sequence Viewer/Editor.

  1. In the Multiple Sequence Viewer/Editor, go to File > Import Sequences from File

Figure 2-3. Importing sequences.

  1. Select the file pneumolysin_seqset.fasta
  2. Click Open
    • The sequences are loaded into the Multiple Sequence Viewer/Editor
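
To confirm outside the GUI that a FASTA file holds the expected number of sequences, a small parser is enough. The sketch below uses only plain Python. The function name read_fasta and the toy records are illustrative assumptions, not part of the tutorial files.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input for illustration only (not the tutorial sequences).
example = ">seq_a\nMANK\nAVND\n>seq_b\nMANR\n"
records = read_fasta(example)
```

Applied to the text of pneumolysin_seqset.fasta, len(read_fasta(...)) should be 8 if the file matches the description in this section.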

 

 

Figure 2-4. Select the reference sequence.

  1. Click, then right-click on P0C2J9
  2. Choose Set as Reference

Figure 2-5. Align sequences.

  1. Click Align
  2. For Using, choose Multiple Sequence Alignment

 

Note: Make sure Selected only is unchecked

 

  1. Click Align

 

 

 

Figure 2-6. Generate Logo plot.

Now that the sequences are aligned, we can examine the degree of conservation across the alignment using a Sequence Logo chart.

 

  1. Hover over the chart icon and click the
  2. Click Sequence Logo
    • The Sequence Logo plot is now visible above the sequence
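
A sequence logo conventionally scales each column by its information content. The sketch below computes that quantity as log2(20) minus the Shannon entropy of the column, ignoring gaps and omitting any small-sample correction. This is a generic illustration under those stated assumptions. How the Multiple Sequence Viewer/Editor computes its logo heights is not described in this tutorial, and the function name column_information is hypothetical.

```python
import math
from collections import Counter


def column_information(aligned_seqs, alphabet_size=20):
    """Return the information content (bits) of each alignment column.

    Uses log2(alphabet_size) - H, where H is the Shannon entropy of the
    residue frequencies in the column. Gap characters ('-') are ignored,
    and no small-sample correction is applied.
    """
    lengths = {len(s) for s in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    max_bits = math.log2(alphabet_size)
    info = []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != "-"]
        if not residues:
            # A column made only of gaps carries no information.
            info.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info
```

A fully conserved column reaches the maximum of log2(20) bits, while a column split evenly between two residues scores one bit less.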

Figure 2-7. Launch Compute Sequence Descriptors.

We will now compute sequence descriptors for all of the sequences. For a complete list of the available protein sequence descriptors, along with explanations and references, see the Protein Sequence Descriptors documentation page.

 

 

  1. Go to Other Tasks > Compute Sequence Descriptors

 

 

Figure 2-8. Generate sequence-based descriptors.

Note: All of the descriptors in the table will be calculated by default. Click Add to calculate additional descriptors.

 

  1. Click OK
    • The Calculate Sequence Descriptor job is launched

 

Note: The descriptors will not be added to the Multiple Sequence Viewer/Editor by default. A pop-up will appear when the job is completed.

 

Figure 2-9. Open show properties dialog.

  1. Click the + icon
  2. Click Show properties
    • A dialog appears

Figure 2-10. Display descriptors in the Multiple Sequence Viewer/Editor.

  1. Click Add
  2. Select Descriptors
  3. Type and select Bulkiness, Relative Mutability, and Transmembrane Tendency
    • The three descriptors have been added to the Multiple Sequence Viewer/Editor
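
Many sequence-based descriptors are per-residue scales that are averaged over the whole sequence or over a sliding window. The sketch below illustrates that idea with the Kyte-Doolittle hydropathy scale, used here only as a stand-in. It is not one of the three descriptors displayed above, and this is not a description of how BioLuminate computes its descriptors. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used as an example scale (an assumption
# for illustration; not one of the descriptors shown in the tutorial).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_scale_value(sequence, scale):
    """Average a per-residue scale over a sequence, skipping unknown symbols."""
    values = [scale[r] for r in sequence.upper() if r in scale]
    if not values:
        raise ValueError("no residues from the scale found in sequence")
    return sum(values) / len(values)


def windowed_profile(sequence, scale, window=3):
    """Sliding-window mean of the scale along the sequence."""
    values = [scale[r] for r in sequence.upper() if r in scale]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Passing a different dictionary as scale gives the same kind of summary for any other per-residue scale, which is how a batch of sequences can be ranked or filtered on a single number.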

 

 

Figure 2-11. Remove descriptors from the Multiple Sequence Viewer/Editor interface.

Because we will not use the calculated descriptors for now, we can hide them from the Multiple Sequence Viewer/Editor.

 

  1. Click the + icon
  2. Click Show properties
  3. Click Hide all
    • The properties are now removed from the Multiple Sequence Viewer/Editor

3. Building Batch Homology Models

In this section, we will use a BLAST search to identify a single template structure, then use that template to build homology models for the 8 loaded sequences. Batch homology modeling is appropriate only for a set of sequences with high identity.
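
One way to check that a sequence set is a reasonable candidate for batch modeling is to compute the percent identity of each aligned sequence to the reference. The sketch below counts identical residues over the columns where both sequences have a residue. Other definitions of percent identity use a different denominator, so treat this as one convention among several. The function names are hypothetical, and the tutorial does not state a numeric identity threshold.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        # Skip columns with a gap in either sequence.
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def lowest_identity_to_reference(reference, others):
    """Return the minimum percent identity between a reference and each other sequence."""
    return min(percent_identity(reference, s) for s in others)
```

The lowest value in the set is the most informative one, because a single distant sequence is enough to make a shared template a poor fit for that model.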

Figure 3-1. Open Build Homology Model panel.

  1. In the Multiple Sequence Viewer/Editor, go to Other Tasks > Build Homology Model
    • The Build Homology Model panel opens

Figure 3-2. Choose setting.

  1. For Use, choose Batch single template modeling

 

Figure 3-3. Find homolog.

 

  1. Click Find
    • A dialog appears
  2. Click the cog icon
  3. Uncheck Use local server only

 

Note: This requires internet access. The ‘Use local server only’ option is checked by default to prevent your BLAST searches and related tasks from going out to remote servers. To allow remote access (after a confirmation), clear this option. It is also available from the top-level Edit > Settings and Defaults menu.

 

  1. Click Run Search
  2. A message appears requesting remote access. Click OK
    • A BLAST search is launched
    • The search may take 1-2 minutes to complete

Figure 3-4. Choose homolog.

Many templates have 100% sequence identity, and one of those would be the best choice in a real project. For teaching purposes, we will instead select 5AOD, which also has very high sequence identity.

 

  1. Select 5AOD_A
  2. Click Import
    • 5AOD_A has been added to the Multiple Sequence Viewer/Editor

 

 

 

 

 

Figure 3-5. Set 5AOD as the reference.

  1. Click Set as Reference
    • 5AOD is set as the reference and is now at the top of the list

 

Figure 3-6. Run alignment.

  1. Click Run Alignment
    • The sequences are now aligned to the 5AOD sequence

Figure 3-7. Generate homology models.

 

  1. For Job name, write homology_modeling_batch
  2. Click Generate Model
    • This job will take ~ minutes to complete
    • 8 models will be created

 

Figure 3-8. Overlay of the 8 models.

  1. Shift-click in the Entry List to include all eight homology modeling outputs
    • The structures are included in the Workspace
    • Cyan ribbons mark positions where the residue backbone conformation is copied from the template and the side chain is mutated
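
The positions where a model carries a side chain mutation relative to the template can also be listed directly from the alignment. The sketch below reports aligned columns where both sequences have a residue and the residues differ. It assumes that these mismatches correspond to the cyan positions described above, which is an assumption about the coloring rather than something this tutorial confirms. The function name mutated_positions is hypothetical.

```python
def mutated_positions(template, target):
    """List (column, template_residue, target_residue) for aligned mismatches.

    Columns are 1-based alignment columns. Columns with a gap in either
    sequence are skipped, since they are insertions or deletions rather
    than side chain substitutions.
    """
    if len(template) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, t, q)
        for i, (t, q) in enumerate(zip(template, target), start=1)
        if t != "-" and q != "-" and t != q
    ]
```

Note that the returned numbers are alignment columns, not residue numbers in the structure, so they shift wherever the alignment contains gaps.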

 

After building homology models in batch, the following are common next steps:

  1. Calculate Protein Descriptors - If you have an observable (an experimental endpoint) associated with the sequences (and now structures), you can load the descriptors together with the observable into an ML engine to build a model that relates some combination of the descriptors to the observable, or to examine feature importance/selection. See this paper for an example.
  2. Run protein_patch_calculation.py to run Protein Surface Analysis in bulk from the command line
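
As a very simple first look at which descriptors track an observable, each descriptor can be ranked by the absolute value of its Pearson correlation with the observable. This is a minimal sketch of the feature-ranking idea only. It is a stand-in for a real ML workflow, not the method used in the referenced paper. The descriptor names and all numeric values are hypothetical toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; report 0.0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_descriptors(descriptor_table, observable):
    """Sort descriptor names by |Pearson r| against the observable, strongest first."""
    scored = [(name, pearson(values, observable))
              for name, values in descriptor_table.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy values, for illustration only (not tutorial results).
toy_descriptors = {
    "descriptor_a": [1.0, 2.0, 3.0, 4.0],
    "descriptor_b": [2.0, 1.0, 2.0, 1.0],
}
toy_observable = [2.0, 4.0, 6.0, 8.0]
ranking = rank_descriptors(toy_descriptors, toy_observable)
```

With only 8 sequences, correlations like these are noisy, so a ranking of this kind is best treated as a prompt for further analysis rather than a result.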

4. Conclusion and References

In this tutorial, we analyzed a series of sequences by first aligning them to a reference, then displaying a sequence logo plot and several sequence-based descriptors. We then successfully built homology models for all of the sequences using a very close homolog.

For further learning:

 

5. Glossary of Terms

Entry List - a simplified view of the Project Table that allows you to perform basic operations such as selection and inclusion

included - the entry is represented in the Workspace; the circle in the In column is blue

incorporated - once a job is finished, output files from the Working Directory are added to the project and shown in the Entry List and Project Table

Project Table - displays the contents of a project and is also an interface for performing operations on selected entries, viewing properties, and organizing structures and data

Scratch Project - a temporary project in which work is not saved; closing a scratch project removes all current work and begins a new scratch project

selected - (1) the atoms are chosen in the Workspace. These atoms are referred to as "the selection" or "the atom selection". Workspace operations are performed on the selected atoms. (2) The entry is chosen in the Entry List (and Project Table) and the row for the entry is highlighted. Project operations are performed on all selected entries

Working Directory - the location where files are saved

Workspace - the 3D display area in the center of the main window, where molecular structures are displayed