flowchart TD
        step_Computational_Structure_Prediction["Computational Structure Prediction"]
        style step_Computational_Structure_Prediction stroke-width:2px    
        step_Prerequisites_and_preliminary_steps("Prerequisites and preliminary steps")
        step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it{{"Do you want to predict an entire protein structure, or just parts of it?"}}
        step_Induced_Fit_Docking("Induced-Fit Docking")
        step_Knowledge_based_protein_structure_prediction_methods("Knowledge-based protein structure prediction methods")
        step_Is_a_structure_of_a_sufficiently_similar_homolog_available{{"Is a structure of a sufficiently similar homolog available?"}}
        step_Homology_modeling("Homology modeling")
        step_ML_structure_prediction("ML structure prediction")
        step_Structure_refinement_and_loop_prediction("Structure refinement and loop prediction")
        step_Validating_the_predicted_structure("Validating the predicted structure")
        step_Conclusion_and_next_steps("Conclusion and next steps")
        step_Computational_Structure_Prediction --> step_Prerequisites_and_preliminary_steps
        step_Prerequisites_and_preliminary_steps --> step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it
        step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it --> |"Only pocket reorganization"| step_Induced_Fit_Docking
        step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it --> |"Entire structure"| step_Knowledge_based_protein_structure_prediction_methods
        step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it --> |"Just refinement"| step_Structure_refinement_and_loop_prediction
        step_Knowledge_based_protein_structure_prediction_methods --> step_Is_a_structure_of_a_sufficiently_similar_homolog_available
        step_Induced_Fit_Docking --> step_Validating_the_predicted_structure
        step_Is_a_structure_of_a_sufficiently_similar_homolog_available --> |"Yes"| step_Homology_modeling
        step_Is_a_structure_of_a_sufficiently_similar_homolog_available --> |"No"| step_ML_structure_prediction
        step_Homology_modeling --> step_Validating_the_predicted_structure
        step_ML_structure_prediction --> step_Validating_the_predicted_structure
        step_Structure_refinement_and_loop_prediction --> step_Validating_the_predicted_structure
        step_Validating_the_predicted_structure --> step_Conclusion_and_next_steps
        classDef path_title stroke-width:2px,fill:#12122c,stroke:#12122c
        classDef decision_step stroke-width:2px,fill:#005aaa,stroke:#005aaa
        classDef simple_step stroke-width:2px,fill:#12122c,stroke:#12122c
        class step_Computational_Structure_Prediction path_title
        class step_Prerequisites_and_preliminary_steps,step_Induced_Fit_Docking,step_Knowledge_based_protein_structure_prediction_methods,step_Homology_modeling,step_ML_structure_prediction,step_Structure_refinement_and_loop_prediction,step_Validating_the_predicted_structure,step_Conclusion_and_next_steps simple_step
        class step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it,step_Is_a_structure_of_a_sufficiently_similar_homolog_available decision_step

Learning Path: Computational Structure Prediction

In many cases, the experimentally obtained structural data for a project is very limited, e.g. because the target protein is difficult to crystallize, or because the structure of the protein itself is being optimized. Here, a variety of computational techniques for structure prediction and validation can help. This learning path covers techniques for obtaining, from a given sequence, protein structures of sufficient quality for use in subsequent calculations. Note that for a full, reasonably sized protein, fully physics-based structure prediction is infeasible. Combining knowledge-based methods with physics-based refinement is good practice.

Target enablement, preparation, and validation
Enabling protein structures from X-ray crystallography, cryo-EM, ML methods, and homology modeling for structure-based computational workflows

Prerequisites and preliminary steps

To make informed decisions along the way, you need a clear picture of your goal: once you have the structure, which questions do you hope to answer with it, and which methods do you plan to use? Additionally, the more you know about the system, the better you will be able to validate your prediction. At minimum, you will need a target sequence, but structures of related proteins or common mutants are very helpful, as are known binders or any other data. Structure prediction is by its nature tightly connected to protein preparation, and it can be iterative: perfect validation is impossible, and issues with the predicted structure may only become evident downstream.

Decide: Do you want to predict an entire protein structure, or just parts of it?

Which tools are at your disposal depends on how much data you have available and the nature of your target.
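The two decision points of this learning path can be written down as a small lookup. The sketch below only mirrors the flowchart logic of this page; the names `Scope` and `choose_method` are hypothetical and do not belong to any tool.

```python
from enum import Enum


class Scope(Enum):
    """What needs to be predicted (hypothetical labels mirroring the flowchart)."""
    POCKET_REORGANIZATION = "only pocket reorganization"
    ENTIRE_STRUCTURE = "entire structure"
    REFINEMENT = "just refinement"


def choose_method(scope: Scope, homolog_available: bool = False) -> str:
    """Return the learning-path step suggested for a given prediction scope."""
    if scope is Scope.POCKET_REORGANIZATION:
        return "Induced-Fit Docking"
    if scope is Scope.REFINEMENT:
        return "Structure refinement and loop prediction"
    # Entire structure: the choice depends on whether the structure of a
    # sufficiently similar homolog is available.
    return "Homology modeling" if homolog_available else "ML structure prediction"


print(choose_method(Scope.ENTIRE_STRUCTURE, homolog_available=True))
```

Every branch ends in the validation step, so the returned name is a starting point rather than a final answer.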

Introduction to computational antibody engineering
A course dedicated to the particular challenges of predicting antibody structures.
T Cell Receptor Engineering
A learning path covering techniques for predicting the structures of T cell receptors (TCRs) or peptide-MHC (pMHC) complexes on their own, as well as the TCR-pMHC ternary complex.

Induced-Fit Docking

Limited-scale reorganization of a binding site in response to a change in the bound ligand can be modeled completely with physics-based approaches. Force-field-based energy minimization starting from static structures does not reliably produce good bound poses, because conformational sampling is needed for both the ligand and the target. IFD-MD uses MD-based sampling to predict putative structures, which can be validated with FEP+ if potency data is available for known binders.

Knowledge-based protein structure prediction methods

There are two distinct approaches here: homology modeling and machine learning (ML)-based structure prediction. Homology modeling maps structural motifs from closely related proteins onto the target sequence, which is only possible if structures of homologs with sufficiently high sequence identity are available. ML structure prediction also relies on similarity to known structures, but in a much more holistic fashion. It can be helpful to use both approaches and compare the results.
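One simple way to compare two models of the same sequence is the distance RMSD (dRMSD), which compares all intra-model C-alpha distances and therefore needs no superposition. The sketch below is an illustration under stated assumptions: the function name `drmsd` is hypothetical, both models are assumed to contain the same residues in the same order, and the coordinates are made-up values in angstroms.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally long lists of (x, y, z) C-alpha coordinates.

    Compares all intra-model pairwise distances, so no superposition is needed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("models must contain the same residues in the same order")
    if len(coords_a) < 2:
        raise ValueError("need at least two residues")
    squared_diffs = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))


# Made-up coordinates standing in for a homology model and an ML model.
homology_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ml_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 2.5, 0.0)]
print(f"dRMSD: {drmsd(homology_model, ml_model):.2f} A")
```

A value near zero means the two models have the same internal geometry. dRMSD cannot distinguish a structure from its mirror image, so a superposition-based RMSD is a useful complement.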

Decide: Is a structure of a sufficiently similar homolog available?

Homology modeling

After an initial sequence alignment, the structure of the homolog is used where the sequences match. The segments that differ are modeled using empirical heuristics for plausible structures. One or multiple templates can be used; combining multiple templates results in a chimeric homology model. This approach is also applicable to multimeric proteins, using separate templates for each monomer.
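The alignment step can be illustrated with two small helpers that take an already aligned target/template pair (gaps written as `-`). This is a minimal sketch, not the method of any particular homology modeling tool: the function names and example sequences are hypothetical, and sequence identity is computed over aligned non-gap columns, which is only one of several conventions in use.

```python
def percent_identity(target_aln: str, template_aln: str) -> float:
    """Percent of aligned (non-gap) column pairs with identical residues."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(t, s) for t, s in zip(target_aln, template_aln) if t != "-" and s != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for t, s in aligned if t == s)
    return 100.0 * matches / len(aligned)


def uncovered_segments(target_aln: str, template_aln: str):
    """Target residue ranges (1-based, inclusive) that have no template residue."""
    segments = []
    position = 0   # residue index in the ungapped target sequence
    start = None   # start of the currently open uncovered segment, if any
    for t, s in zip(target_aln, template_aln):
        if t == "-":
            continue  # template-only column: consumes no target residue
        position += 1
        if s == "-":
            if start is None:
                start = position
        elif start is not None:
            segments.append((start, position - 1))
            start = None
    if start is not None:
        segments.append((start, position))
    return segments


# Made-up aligned sequences for illustration only.
target = "MKTAYIAKQR--GLSE"
template = "MKSAY---QRLLGLTE"
print(f"identity: {percent_identity(target, template):.1f}%")
print("segments without template coverage:", uncovered_segments(target, template))
```

The uncovered ranges are the segments that would have to be built without template coordinates, for example by loop prediction.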

ML structure prediction

Tools like AlphaFold allow direct structure prediction without an explicit template. Note that they also assume that similar sequences share similar structures, but far less explicitly than homology modeling does. State-of-the-art models (AlphaFold 2 and later) perform very well for proteins somewhat close to those that occur in nature and have been crystallized and published. Because of this bias in the underlying training data, predictions are less reliable for protein classes with very little available data and for fully synthetic constructs.
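AlphaFold reports a per-residue confidence score (pLDDT, from 0 to 100) and writes it into the B-factor column of the PDB files it produces, which gives a first indication of where a prediction may be less reliable. The sketch below reads that column from C-alpha atoms. The function names are hypothetical, and the cutoff of 70 is an assumption taken from the commonly used confidence bands; adjust it to your needs.

```python
def per_residue_plddt(pdb_text: str):
    """Return [(chain, residue_number, residue_name, plddt)] from C-alpha ATOM records."""
    residues = []
    for line in pdb_text.splitlines():
        # PDB is a fixed-column format: atom name in columns 13-16,
        # B-factor (assumed here to hold pLDDT) in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residues.append(
                (line[21], int(line[22:26]), line[17:20].strip(), float(line[60:66]))
            )
    return residues


def low_confidence(residues, cutoff: float = 70.0):
    """Residues whose pLDDT falls below the (assumed, adjustable) cutoff."""
    return [r for r in residues if r[3] < cutoff]
```

To use it, pass the text of a predicted model to `per_residue_plddt` and inspect the residues returned by `low_confidence`. For structures from other sources the B-factor column has a different meaning, so this check only applies to models that store pLDDT there.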

Structure refinement and loop prediction

Depending on the experimental method (X-ray, NMR, EM), different raw data is available for refining a structure, and different challenges can arise. Gaps or uncertainties in experimental structures can be filled in with computational tools such as PrimeX, GlideXtal, GlideEM, or Phenix/OPLS. Whether missing loops need to be modeled at all depends on the computational methods you plan to use. For example, modeling a flexible loop near the binding site can be problematic if you plan to run a Glide screen, because of the rigidity artificially imposed by Glide. On the other hand, a break in the backbone can lead to artificial hydration patterns or even cause the protein to unfold during molecular dynamics.
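Backbone breaks can be spotted before any simulation by checking the distance between consecutive C-alpha atoms, which is about 3.8 angstroms for residues joined by a regular trans peptide bond. The following is a minimal sketch: the function name, the input layout, and the 4.2 angstrom cutoff are assumptions chosen for illustration, and the coordinates are made up.

```python
import math


def find_backbone_breaks(ca_atoms, max_ca_distance: float = 4.2):
    """Return (residue_a, residue_b, distance) for neighbors that are too far apart.

    ca_atoms: list of (residue_number, (x, y, z)) for one chain, in chain order.
    max_ca_distance: assumed cutoff in angstroms for bonded neighbors.
    """
    breaks = []
    for (num_a, xyz_a), (num_b, xyz_b) in zip(ca_atoms, ca_atoms[1:]):
        distance = math.dist(xyz_a, xyz_b)
        if distance > max_ca_distance:
            breaks.append((num_a, num_b, distance))
    return breaks


# Made-up chain with residues 4-7 missing between residues 3 and 8.
chain = [
    (1, (0.0, 0.0, 0.0)),
    (2, (3.8, 0.0, 0.0)),
    (3, (7.6, 0.0, 0.0)),
    (8, (20.0, 0.0, 0.0)),
    (9, (23.8, 0.0, 0.0)),
]
for res_a, res_b, dist in find_backbone_breaks(chain):
    print(f"possible break between residues {res_a} and {res_b} ({dist:.1f} A)")
```

Whether a reported break must be repaired still depends on the planned calculation, as discussed above.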

Validating the predicted structure

Depending on the prediction method used and the scale of the predictions (a few residues vs an entire protein), this process can become very involved. It is essential to consider what you plan to use the structure for (e.g. exploratory MD, virtual screening, FEP+ calculations) because that defines what a 'good (enough) structure' means in your case. As much as possible, it is good practice to decouple validation of the predicted structure from validation of the screening or simulation setup. Completely separating the two is impossible, so you may need to come back and refine your structure prediction after running some sanity checks for your follow-up calculations (e.g. docking known binders). Most of the tutorials linked in the previous steps contain examples of different validation methods. This is a non-exhaustive overview of additional tools which can be used to probe the quality of a predicted structure.

Conclusion and next steps

We have introduced different methods for predicting protein structures, in part or in full, using both physics-based and knowledge-based approaches. How you continue from here will depend on the scientific questions you want to answer.