flowchart TD
        step_Computational_Structure_Prediction["Computational Structure Prediction"]
        style step_Computational_Structure_Prediction stroke-width:2px    
        step_Prerequisites_and_preliminary_steps("Prerequisites and preliminary steps")
        step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it{{"Do you want to predict an entire protein structure, or just parts of it?"}}
        step_Induced_Fit_Docking("Induced-Fit Docking")
        step_Knowledge_based_protein_structure_prediction_methods("Knowledge-based protein structure prediction methods")
        step_Is_a_structure_of_a_sufficiently_similar_homolog_available{{"Is a structure of a sufficiently similar homolog available?"}}
        step_Homology_modeling("Homology modeling")
        step_ML_structure_prediction("ML structure prediction")
        step_Structure_refinement_and_loop_prediction("Structure refinement 
and loop prediction")
        step_Validating_the_predicted_structure("Validating the predicted structure")
        step_Conclusion_and_next_steps("Conclusion and next steps")
    
        step_Computational_Structure_Prediction --> step_Prerequisites_and_preliminary_steps
        step_Prerequisites_and_preliminary_steps --> step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it
        step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it --> |"Only pocket reorganization"| step_Induced_Fit_Docking
        step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it --> |"Entire structure"| step_Knowledge_based_protein_structure_prediction_methods
        step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it --> |"Just refinement"| step_Structure_refinement_and_loop_prediction

        step_Knowledge_based_protein_structure_prediction_methods --> step_Is_a_structure_of_a_sufficiently_similar_homolog_available


        step_Induced_Fit_Docking --> step_Validating_the_predicted_structure
        step_Is_a_structure_of_a_sufficiently_similar_homolog_available --> |"Yes"| step_Homology_modeling
        step_Is_a_structure_of_a_sufficiently_similar_homolog_available --> |"No"| step_ML_structure_prediction


        step_Homology_modeling --> step_Validating_the_predicted_structure
        step_ML_structure_prediction --> step_Validating_the_predicted_structure
        step_Structure_refinement_and_loop_prediction --> step_Validating_the_predicted_structure
        step_Validating_the_predicted_structure --> step_Conclusion_and_next_steps

		classDef path_title stroke-width:2px,fill:#12122c,stroke:#12122c
		classDef decision_step stroke-width:2px,fill:#005aaa,stroke:#005aaa
		classDef simple_step stroke-width:2px,fill:#12122c,stroke:#12122c
		class step_Computational_Structure_Prediction path_title
		class step_Prerequisites_and_preliminary_steps,step_Induced_Fit_Docking,step_Knowledge_based_protein_structure_prediction_methods,step_Homology_modeling,step_ML_structure_prediction,step_Structure_refinement_and_loop_prediction,step_Validating_the_predicted_structure,step_Conclusion_and_next_steps simple_step
		class step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it,step_Is_a_structure_of_a_sufficiently_similar_homolog_available decision_step

Learning Path: Computational Structure Prediction

In many projects, experimentally determined structural data are very limited, e.g. because the target protein is difficult to crystallize, or because the structure of the protein itself is being optimized. In these cases, a variety of computational techniques for structure prediction and validation can fill the gap. This learning path covers techniques for obtaining, from a given sequence, protein structures of sufficient quality for use in subsequent calculations. Note that a fully physics-based structure prediction is infeasible for a complete protein of realistic size. Combining knowledge-based methods with physics-based refinement is good practice.

Target enablement, preparation, and validation
Enabling protein structures from X-ray crystallography, cryo-EM, ML methods, and homology modeling for structure-based computational workflows

Next step: Prerequisites and preliminary steps

Prerequisites and preliminary steps

To make informed decisions along the way, you need a clear picture of your goal: once you have the structure, which questions do you hope to answer with it, and which methods do you plan to use? The more you know about the system, the better you will be able to validate your prediction. At minimum, you need a target sequence, but structures of related proteins or common mutants are very helpful, as are known binders or any other data. Structure prediction is tightly connected to protein preparation and can be iterative: perfect validation is impossible, and issues with the predicted structure may only become evident downstream.

Next step: Do you want to predict an entire protein structure, or just parts of it?

Decide: Do you want to predict an entire protein structure, or just parts of it?

Which tools are at your disposal depends on how much data you have available and the nature of your target.

Introduction to computational antibody engineering
A course dedicated to the particular challenges of predicting antibody structures.

T Cell Receptor Engineering
A learning path covering techniques for predicting the structures of T cell receptors (TCRs) or peptide-MHC (pMHC) complexes on their own, as well as the ternary TCR-pMHC complex.

If you already have a structure for your protein, but need to model the reorganization of the binding pocket in response to a binding event or change in ligand: go to Induced-Fit Docking
If you want to predict the entire structure of a protein from sequence: go to Knowledge-based protein structure prediction methods
If you want to refine an experimental structure or fill gaps not resolved by experiment: go to Structure refinement and loop prediction

Induced-Fit Docking

Limited-scale reorganization of a binding site in response to a change in the bound ligand can be modeled entirely with physics-based approaches. Force-field-based energy minimization starting from a static structure does not reliably produce good bound poses, because conformational sampling is needed for both the ligand and the target. IFD-MD uses MD-based sampling to predict putative structures, which can be validated with FEP+ if potency data are available for known binders.

Structure-Based Drug Discovery Without a Structure: Enabling Accurate FEP+ Predictions for Challenging Targets and ADMET Anti-Targets

Cross-docking with IFD-MD

Using IFD-MD on a covalently-bound ligand

Using IFD-MD on a Membrane-bound protein

Designing Out Common ADMET Liabilities using Consensus IFD-MD

Next step: Validating the predicted structure

Knowledge-based protein structure prediction methods

There are two distinct approaches here: homology modeling and machine-learning (ML) based structure prediction. Homology modeling maps structural motifs from closely related proteins onto the target sequence, which is only possible if structures of homologs with sufficiently high sequence identity are available. ML structure prediction also relies on similarity to known structures, but in a much more holistic fashion. It can be helpful to use both approaches and compare the results.

Next step: Is a structure of a sufficiently similar homolog available?

Decide: Is a structure of a sufficiently similar homolog available?

Yes: go to Homology modeling
No: go to ML structure prediction
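Whether a homolog is "sufficiently similar" is usually judged from the sequence identity of an alignment between target and template. The following is a minimal sketch, not the method used by any particular tool: it assumes you already have a pairwise alignment as two equal-length strings with `-` for gaps, and it counts identity only over columns where both sequences have a residue (other conventions for the denominator exist). The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned target and template fragments
target = "MKT-AYIAKQR"
template = "MKTLAYLAK-R"
print(f"{percent_identity(target, template):.1f}% identity")  # prints "88.9% identity"
```

Which identity level counts as sufficient depends on the target and on the intended use of the model, so treat the number as one input to the decision rather than a hard cutoff.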

Homology modeling

After an initial sequence alignment, the structure of the homolog is used wherever the sequences match. The segments that differ are built using empirical heuristics for plausible structures. It is possible to use one template or several, the latter resulting in chimeric homology models. This approach is also applicable to multimeric proteins, using a separate template for each monomer.

Building Homology Models with the Multiple Sequence Viewer/Editor

Chimeric Homology Modeling using the Multiple Sequence Viewer/Editor

Heteromultimer Homology Modeling using the Multiple Sequence Viewer/Editor

Batch Homology Modeling using the Multiple Sequence Viewer/Editor

Next step: Validating the predicted structure

ML structure prediction

Tools like AlphaFold allow for direct structure prediction without an explicit template. Note that they also assume that similar sequences share similar structures, but much less explicitly than homology modeling does. State-of-the-art models (AlphaFold 2 and later) perform very well for proteins that are reasonably close to naturally occurring proteins whose structures have been solved and published. Because of this bias in the underlying training data, predictions are less reliable for protein classes with very little available data and for fully synthetic constructs.
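One quick way to see where an ML prediction is less reliable is the per-residue confidence score. AlphaFold writes its pLDDT score (0 to 100) into the B-factor column of the PDB files it produces. The sketch below assumes such a fixed-column PDB file and flags residues whose Cα pLDDT falls below a cutoff; the cutoff of 70, the function names, and the synthetic example lines are illustrative assumptions, not part of any tutorial linked here.

```python
def low_confidence_residues(pdb_text: str, cutoff: float = 70.0):
    """Return (chain, residue number, pLDDT) for CA atoms with pLDDT below cutoff.

    Assumes pLDDT is stored in the B-factor field (columns 61-66),
    as in AlphaFold-generated PDB files.
    """
    flagged = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            plddt = float(line[60:66])
            if plddt < cutoff:
                flagged.append((chain, resnum, plddt))
    return flagged


def make_ca_line(serial: int, chain: str, resnum: int, plddt: float) -> str:
    """Build a synthetic fixed-column CA record (hypothetical helper for the example)."""
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}           C")


# Synthetic four-residue example, not real prediction output
example_pdb = "\n".join(
    make_ca_line(i, "A", i, score)
    for i, score in enumerate([92.5, 65.1, 48.0, 88.3], start=1)
)
print(low_confidence_residues(example_pdb))  # [('A', 2, 65.1), ('A', 3, 48.0)]
```

Stretches of flagged residues are natural candidates for closer inspection or refinement before the model is used downstream.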

Benchmarking Refined and Unrefined AlphaFold2 Structures for Hit Discovery

Next step: Validating the predicted structure

Structure refinement and loop prediction

The raw data available for refining a structure, and the challenges that arise, differ with the experimental method (X-ray, NMR, EM). Gaps or uncertainties in experimental structures can be filled in with computational tools such as PrimeX, GlideXtal, GlideEM, or Phenix/OPLS. Whether missing loops need to be modeled at all depends on the computational methods you plan to use. For example, modeling a flexible loop near the binding site can be problematic if you plan to run a Glide screen, because Glide artificially treats the loop as rigid. On the other hand, a break in the backbone can lead to artificial hydration patterns or even cause the protein to unfold during molecular dynamics.
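Before deciding whether loops need to be modeled, it helps to know where the gaps are. The following is a rough sketch that scans the Cα records of a fixed-column PDB file and reports jumps in residue numbering within a chain. This is only a heuristic and an assumption of this example: numbering can be discontinuous on purpose, and insertion codes are ignored here, so a geometric check of backbone connectivity is more robust. The function names and the synthetic input are hypothetical.

```python
def find_numbering_gaps(pdb_text: str):
    """Return (chain, last residue before gap, first residue after gap) tuples."""
    last_seen = {}  # chain ID -> most recent residue number
    gaps = []
    for line in pdb_text.splitlines():
        if not (line.startswith("ATOM") and line[12:16].strip() == "CA"):
            continue
        chain, resnum = line[21], int(line[22:26])
        prev = last_seen.get(chain)
        if prev is not None and resnum - prev > 1:
            gaps.append((chain, prev, resnum))
        last_seen[chain] = resnum
    return gaps


def make_ca_line(serial: int, chain: str, resnum: int) -> str:
    """Build a synthetic fixed-column CA record (hypothetical helper for the example)."""
    return (f"ATOM  {serial:5d}  CA  GLY {chain}{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{20.0:6.2f}           C")


# Synthetic chain A with residues 4-6 missing
example_pdb = "\n".join(
    make_ca_line(i, "A", resnum) for i, resnum in enumerate([1, 2, 3, 7, 8], start=1)
)
print(find_numbering_gaps(example_pdb))  # [('A', 3, 7)]
```

Each reported gap can then be weighed against the planned calculation: a gap far from the binding site may be acceptable for docking but still needs to be closed or capped before molecular dynamics.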

Refining crystallographic protein-ligand structures using GlideXtal and Phenix/OPLS

Real space refinement with Phenix/OPLS3e

Docking ligands in cryo-EM maps with GlideEM

Crystallographic refinement with Phenix/OPLS3e

Next step: Validating the predicted structure

Validating the predicted structure

Depending on the prediction method used and the scale of the prediction (a few residues vs. an entire protein), validation can become very involved. It is essential to consider what you plan to use the structure for (e.g. exploratory MD, virtual screening, FEP+ calculations), because that defines what a 'good (enough) structure' means in your case. As far as possible, it is good practice to decouple validation of the predicted structure from validation of the screening or simulation setup. Completely separating the two is impossible, so you may need to come back and refine your structure prediction after running some sanity checks for your follow-up calculations (e.g. docking known binders). Most of the tutorials linked in the previous steps contain examples of different validation methods. The following is a non-exhaustive list of additional tools that can be used to probe the quality of a predicted structure.

Protein Reliability Report

Introduction to All-Atom Molecular Dynamics with Desmond

Next step: Conclusion and next steps

Conclusion and next steps

We have introduced different methods for predicting protein structures either in part or entirely using both physics- and knowledge-based methods. How you continue from here will depend on the scientific questions you want to answer.

Enabling Structure-Based Drug Discovery Utilizing Predicted Models

Using AlphaFold and Experimental Structures for the Prediction of the Structure and Binding Affinities of GPCR Complexes via Induced Fit Docking and Free Energy Perturbation

Structure-Based Drug Discovery Without a Structure webinar

Target Enablement Services

Computational Target Analysis

Virtual Screening