flowchart TD
step_Computational_Structure_Prediction["Computational Structure Prediction"]
style step_Computational_Structure_Prediction stroke-width:2px
step_Prerequisites_and_preliminary_steps("Prerequisites and preliminary steps")
step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it{{"Do you want to predict an entire protein structure, or just parts of it?"}}
step_Induced_Fit_Docking("Induced-Fit Docking")
step_Knowledge_based_protein_structure_prediction_methods("Knowledge-based protein structure prediction methods")
step_Is_a_structure_of_a_sufficiently_similar_homolog_available{{"Is a structure of a sufficiently similar homolog available?"}}
step_Homology_modeling("Homology modeling")
step_ML_structure_prediction("ML structure prediction")
step_Structure_refinement_and_loop_prediction("Structure refinement
and loop prediction")
step_Validating_the_predicted_structure("Validating the predicted structure")
step_Conclusion_and_next_steps("Conclusion and next steps")
step_Computational_Structure_Prediction --> step_Prerequisites_and_preliminary_steps
step_Prerequisites_and_preliminary_steps --> step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it
step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it --> |"Only pocket reorganization"| step_Induced_Fit_Docking
step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it --> |"Entire structure"| step_Knowledge_based_protein_structure_prediction_methods
step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it --> |"Just refinement"| step_Structure_refinement_and_loop_prediction
step_Knowledge_based_protein_structure_prediction_methods --> step_Is_a_structure_of_a_sufficiently_similar_homolog_available
step_Induced_Fit_Docking --> step_Validating_the_predicted_structure
step_Is_a_structure_of_a_sufficiently_similar_homolog_available --> |"Yes"| step_Homology_modeling
step_Is_a_structure_of_a_sufficiently_similar_homolog_available --> |"No"| step_ML_structure_prediction
step_Homology_modeling --> step_Validating_the_predicted_structure
step_ML_structure_prediction --> step_Validating_the_predicted_structure
step_Structure_refinement_and_loop_prediction --> step_Validating_the_predicted_structure
step_Validating_the_predicted_structure --> step_Conclusion_and_next_steps
classDef path_title stroke-width:2px,fill:#12122c,stroke:#12122c
classDef decision_step stroke-width:2px,fill:#005aaa,stroke:#005aaa
classDef simple_step stroke-width:2px,fill:#12122c,stroke:#12122c
class step_Computational_Structure_Prediction path_title
class step_Prerequisites_and_preliminary_steps,step_Induced_Fit_Docking,step_Knowledge_based_protein_structure_prediction_methods,step_Homology_modeling,step_ML_structure_prediction,step_Structure_refinement_and_loop_prediction,step_Validating_the_predicted_structure,step_Conclusion_and_next_steps simple_step
class step_Do_you_want_to_predict_an_entire_protein_structure__or_just_parts_of_it,step_Is_a_structure_of_a_sufficiently_similar_homolog_available decision_step
Learning Path: Computational Structure Prediction
In many projects, experimentally determined structural data is very limited, e.g. because the target protein is difficult to crystallize, or because the structure of the protein itself is what is being optimized. In these cases, computational techniques for structure prediction and validation can fill the gap. This learning path covers techniques for obtaining, from a given sequence, protein structures of sufficient quality for use in subsequent calculations. Note that a fully physics-based structure prediction is infeasible for a full, reasonably sized protein. Combining knowledge-based methods with physics-based refinement is good practice.
Enabling protein structures from X-ray crystallography, cryo-EM, ML methods, and homology modeling for structure-based computational workflows
Next step: Prerequisites and preliminary steps
Prerequisites and preliminary steps
To make informed decisions along the way, you need a clear picture of your goal: once you have the structure, which questions do you hope to answer with it, and which methods do you plan to use? Additionally, the more you know about the system, the better you will be able to validate your prediction. At minimum, you need a target sequence, but structures of related proteins or common mutants are very helpful, as are known binders or any other data. Structure prediction is by its nature tightly connected to protein preparation. It can also be iterative, because perfect validation is impossible and issues with the predicted structure may only become evident downstream.
Next step: Do you want to predict an entire protein structure, or just parts of it?
Decide: Do you want to predict an entire protein structure, or just parts of it?
Which tools are at your disposal depends on how much data you have available and the nature of your target.
A course dedicated to the particular challenges of predicting antibody structures.
A learning path which covers available techniques for predicting the structures for T Cell receptors or peptide-MHC complexes on their own, as well as the TCR-pMHC ternary complex.
- If you already have a structure for your protein, but need to model the reorganization of the binding pocket in response to a binding event or change in ligand: go to Induced-Fit Docking
- If you want to predict the entire structure of a protein from sequence: go to Knowledge-based protein structure prediction methods
- If you want to refine an experimental structure or fill gaps not resolved by experiment: go to Structure refinement and loop prediction
Induced-Fit Docking
Limited-scale reorganization of a binding site in response to a change in the bound ligand can be modeled completely with physics-based approaches. Force-field energy minimization starting from static structures does not reliably produce good bound poses, because conformational sampling is needed for both the ligand and the target. IFD-MD uses MD-based sampling to predict putative structures, which can be validated with FEP+ if potency data is available for known binders.
Next step: Validating the predicted structure
Knowledge-based protein structure prediction methods
There are two distinct approaches here: homology modeling and machine-learning (ML) based structure prediction. Homology modeling uses closely related proteins to map structural motifs onto a sequence, which is only possible if structures of homologs with sufficiently high sequence identity are available. ML structure prediction also relies on similarity to known structures, but in a much more holistic fashion. It can be helpful to use both approaches and compare the results.
Next step: Is a structure of a sufficiently similar homolog available?
Decide: Is a structure of a sufficiently similar homolog available?
- Yes: go to Homology modeling
- No: go to ML structure prediction
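The two decision points in this learning path can be summarized as a small routing function. This is only an illustrative sketch: the names `Goal` and `next_method` are hypothetical, and whether a homolog counts as "sufficiently similar" remains a judgment you have to make for your own target.

```python
from enum import Enum


class Goal(Enum):
    """What you want to predict (hypothetical names for illustration)."""
    POCKET_REORGANIZATION = "pocket"
    ENTIRE_STRUCTURE = "entire"
    REFINEMENT = "refinement"


def next_method(goal: Goal, homolog_available: bool = False) -> str:
    """Return the method step this learning path points to.

    `homolog_available` only matters when the entire structure is predicted.
    """
    if goal is Goal.POCKET_REORGANIZATION:
        return "Induced-Fit Docking"
    if goal is Goal.REFINEMENT:
        return "Structure refinement and loop prediction"
    # Entire structure: choose between the two knowledge-based methods.
    if homolog_available:
        return "Homology modeling"
    return "ML structure prediction"


print(next_method(Goal.ENTIRE_STRUCTURE, homolog_available=False))
```

Every branch ends at "Validating the predicted structure", so the function only encodes which method you reach first.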
Homology modeling
After an initial sequence alignment, the structure of the homolog is used where the sequences match. The segments that differ are modeled in using empirical heuristics for plausible structures. It is possible to use one or multiple templates, resulting in chimeric homology models. This approach is also applicable to multimeric proteins, using separate templates for each monomer.
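To make the alignment step concrete, here is a minimal sketch of a global pairwise alignment (Needleman-Wunsch) followed by a percent-identity calculation. The scoring values (+1 match, -1 mismatch, -1 gap) and the definition of identity as identical columns divided by alignment length are simplifying assumptions for illustration. Production tools use substitution matrices and affine gap penalties, and this is not the procedure of any particular homology modeling package.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a simple linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(target: str, template: str) -> float:
    """Identical aligned columns as a percentage of the alignment length."""
    aligned_target, aligned_template = global_align(target, template)
    identical = sum(x == y and x != "-"
                    for x, y in zip(aligned_target, aligned_template))
    return 100.0 * identical / len(aligned_target)


# Toy sequences, not real proteins.
print(global_align("ACDEFG", "ACDFG"))
print(round(percent_identity("ACDEFG", "ACDFG"), 1))
```

Columns where the two aligned strings agree correspond to segments where the template structure can be used directly. Gapped or mismatched columns mark the segments that have to be modeled in.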
Next step: Validating the predicted structure
ML structure prediction
Tools like AlphaFold allow for direct structure prediction without an explicit template. Note that they also assume that similar sequences share similar structures, but much less explicitly than homology modeling does. State-of-the-art models (AlphaFold 2 and later) perform very well for proteins reasonably close to naturally occurring ones that have been crystallized and published. Because of this bias in the underlying training data, predictions are less reliable for protein classes with very little available data and for fully synthetic constructs.
Next step: Validating the predicted structure
Structure refinement and loop prediction
Depending on the experimental method (X-ray, NMR, EM), different raw data is available for refining a structure, and different challenges can arise. Gaps or uncertainties in experimental structures can be filled in with computational tools such as PrimeX, GlideXtal, GlideEM, or Phenix/OPLS. Whether missing loops need to be modeled at all depends on the computational methods you plan to use. For example, modeling a flexible loop near the binding site can be problematic if you plan to run a Glide screen, due to the rigidity artificially imposed by Glide. On the other hand, a break in the backbone can lead to artificial hydration patterns or even cause the protein to unfold during molecular dynamics.
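Before deciding whether loops need to be modeled, it helps to know where the backbone is actually broken. The following is a minimal sketch that flags unusually long distances between consecutive C-alpha atoms in PDB-format `ATOM` records. The function name `find_backbone_breaks` and the 4.2 Å cutoff are assumptions for illustration (consecutive C-alpha atoms are typically about 3.8 Å apart). The sketch also assumes a single-model file and is not a replacement for the checks built into dedicated preparation tools.

```python
import math


def find_backbone_breaks(pdb_lines, max_ca_distance=4.2):
    """Return (chain, residue_before, residue_after, distance) per break.

    A break is reported when consecutive C-alpha atoms in the same chain are
    farther apart than `max_ca_distance` (in Angstrom, an assumed cutoff).
    """
    breaks = []
    previous = None  # (chain, residue id, coordinates) of the last C-alpha
    seen = set()     # skip alternate locations of an already seen residue
    for line in pdb_lines:
        # Fixed-column PDB format: atom name in columns 13-16.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        residue = line[22:27].strip()  # residue number plus insertion code
        if (chain, residue) in seen:
            continue
        seen.add((chain, residue))
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if previous is not None and previous[0] == chain:
            distance = math.dist(previous[2], xyz)
            if distance > max_ca_distance:
                breaks.append((chain, previous[1], residue, round(distance, 2)))
        previous = (chain, residue, xyz)
    return breaks
```

A typical call would be `find_backbone_breaks(open("model.pdb"))`, where `model.pdb` is a placeholder file name. Each reported pair of residues brackets a gap that either needs loop modeling or an explicit decision to leave it open.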
Next step: Validating the predicted structure
Validating the predicted structure
Depending on the prediction method used and the scale of the predictions (a few residues vs an entire protein), this process can become very involved. It is essential to consider what you plan to use the structure for (e.g. exploratory MD, virtual screening, FEP+ calculations) because that defines what a 'good (enough) structure' means in your case. As much as possible, it is good practice to decouple validation of the predicted structure from validation of the screening or simulation setup. Completely separating the two is impossible, so you may need to come back and refine your structure prediction after running some sanity checks for your follow-up calculations (e.g. docking known binders). Most of the tutorials linked in the previous steps contain examples of different validation methods. This is a non-exhaustive overview of additional tools which can be used to probe the quality of a predicted structure.
Next step: Conclusion and next steps
Conclusion and next steps
We have introduced different methods for predicting protein structures either in part or entirely using both physics- and knowledge-based methods. How you continue from here will depend on the scientific questions you want to answer.