QSAR, AutoQSAR, descriptors, features, machine learning, ML

Protein Descriptors

Protein descriptors can be used to develop QSAR/QSPR models for proteins with tools such as the AutoQSAR panel. You can generate a set of protein descriptors with the Calculate Protein Descriptors panel. Descriptors are generated from the protein sequence, from the protein structure, and from a patch analysis, and are listed in the tables below. See Ref. 18 for details on the descriptors.

Sequence descriptors.

| Descriptor | Explanation | Comments/references |
|---|---|---|
| AGGRESCAN_Nr_hotspots | Number of aggregation hotspots, nHS (from Aggrescan) | http://bioinf.uab.es/aap/aap_help.html |
| AGGRESCAN_a3v_value | Sum of amino acid aggregation-propensity values, a3v (Aggrescan) | http://bioinf.uab.es/aap/aap_help.html |
| Aa_Composition | Sum of amino acid composition values (McCaldon & Argos) | https://web.expasy.org/protscale/pscale/A.A.composition.html |
| Aa_Composition_Swissprot | Sum of amino acid composition values based on proteins in SwissProt | https://web.expasy.org/protscale/pscale/A.A.Swiss-Prot.html |
| Aa_Flexibility_VTR | Sum of amino acid flexibility scale values (Vihinen, Torkkila & Riikonen) | https://www.ncbi.nlm.nih.gov/pubmed/8090708 |
| All_Aggrescan_a4v | Sum of average a3v over sliding window; aka a4vSS (Aggrescan) | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-65 |
| All_Aggrescan_a4v_pos | Sum of average of positive a3v values over sliding window (Aggrescan) | |
| All_AggScore | Sum of AggScore that includes all residues in the target protein | |
| All_Zyggregator_profile_smoothed | Sum of Zyggregator profile Z-scores (Zyggregator) | |
| All_Zyggregator_profile_smoothed_pos | Sum of positive Zyggregator profile Z-scores (Zyggregator) | |
| Alpha_Helix_Chou_Fasman | Sum of alpha helix propensities (Chou & Fasman) | https://web.expasy.org/protscale/pscale/alpha-helixFasman.html |
| Alpha_Helix_Deleage_Roux | Sum of alpha helix propensities (Deleage & Roux) | https://web.expasy.org/protscale/pscale/alpha-helixRoux.html |
| Alpha_Helix_Levitt | Sum of alpha helix propensities (Levitt) | https://web.expasy.org/protscale/pscale/alpha-helixLevitt.html |
| Antiparallel_Beta_Strand | Sum of conformational preference values for antiparallel beta strand (Lifson & Sander) | https://web.expasy.org/protscale/pscale/Antiparallelbeta-strand.html |
| Average_Flexibility_BP | Sum of amino acid average flexibility values (Bhaskaran & Ponnusamy) | https://web.expasy.org/protscale/pscale/Averageflexibility.html |
| Avg_Area_Buried | Sum of average area buried on transfer from standard state to folded protein (Rose et al.) | https://web.expasy.org/protscale/pscale/Averageburied.html |
| Beta_Sheet_Chou_Fasman | Sum of beta sheet propensities (Chou & Fasman) | https://web.expasy.org/protscale/pscale/beta-sheetFasman.html |
| Beta_Sheet_Deleage_Roux | Sum of beta sheet propensities (Deleage & Roux) | https://web.expasy.org/protscale/pscale/beta-sheetRoux.html |
| Beta_Sheet_Levitt | Sum of beta sheet propensities (Levitt) | https://web.expasy.org/protscale/pscale/beta-sheetLevitt.html |
| Beta_Turn_Chou_Fasman | Sum of beta turn propensities (Chou & Fasman) | https://web.expasy.org/protscale/pscale/beta-turnFasman.html |
| Beta_Turn_Deleage_Roux | Sum of beta turn propensities (Deleage & Roux) | https://web.expasy.org/protscale/pscale/beta-turnRoux.html |
| Beta_Turn_Levitt | Sum of beta turn propensities (Levitt) | https://web.expasy.org/protscale/pscale/beta-turnLevitt.html |
| Bulkiness | Sum of amino acid bulkiness values | https://web.expasy.org/protscale/pscale/Bulkiness.html |
| Coil_Deleage_Roux | Sum of conformational parameters for coil (Deleage & Roux) | https://web.expasy.org/protscale/pscale/CoilRoux.html |
| Disorder_Propensity_DisProt | Sum of disorder promotion propensities | https://www.ncbi.nlm.nih.gov/pubmed/17578581 |
| Disorder_Propensity_FoldUnfold | Sum of disorder promotion propensities (FoldUnfold) | https://www.ncbi.nlm.nih.gov/pubmed/15498936 |
| Disorder_Propensity_TOP_IDP | Sum of propensities for intrinsic disorder (TOP-IDP scale) | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2676888/ |
| HPLC_Retention_Ph_2_1 | Sum of retention coefficients in HPLC, pH 2.1 | https://web.expasy.org/protscale/pscale/HPLC2.1.html |
| HPLC_Tfa_Retention | Sum of retention coefficients in HPLC/TFA | https://web.expasy.org/protscale/pscale/HPLCTFA.html |
| Hplc_Hfba_Retention | Sum of retention coefficients in HFBA | https://web.expasy.org/protscale/pscale/HPLCHFBA.html |
| Hplc_Retention_Ph_7_4 | Sum of retention coefficients in HPLC, pH 7.4 | https://web.expasy.org/protscale/pscale/HPLC7.4.html |
| Hydrophobicity_Abraham_Leo | Sum of hydrophobicity scale values (Abraham & Leo) | https://web.expasy.org/protscale/pscale/Hphob.Leo.html |
| Hydrophobicity_Black | Sum of hydrophobicity scale values (Black) | https://web.expasy.org/protscale/pscale/Hphob.Black.html |
| Hydrophobicity_Bull_Breese | Sum of hydrophobicity scale values (Bull & Breese) | https://web.expasy.org/protscale/pscale/Hphob.Breese.html |
| Hydrophobicity_Chothia | Sum of hydrophobicity scale values based on proportion of residues 95% buried (Chothia) | https://web.expasy.org/protscale/pscale/Hphob.Chothia.html |
| Hydrophobicity_Eisenberg | Sum of normalized consensus hydrophobicity scale values (Eisenberg et al.) | https://web.expasy.org/protscale/pscale/Hphob.Eisenberg.html |
| Hydrophobicity_Fauchere | Sum of hydrophobicity scale values (Fauchere) | https://web.expasy.org/protscale/pscale/Hphob.Fauchere.html |
| Hydrophobicity_Guy | Sum of hydrophobicity scale values based on free energy of transfer (kcal/mole) (Guy) | https://web.expasy.org/protscale/pscale/Hphob.Guy.html |
| Hydrophobicity_Hopp_Woods | Sum of hydrophilicity scale values (Hopp & Woods) | https://web.expasy.org/protscale/pscale/Hphob.Woods.html |
| Hydrophobicity_Hplc_Parker | Sum of hydrophilicity scale values derived from HPLC peptide retention times (Parker et al.) | https://web.expasy.org/protscale/pscale/Hphob.Parker.html |
| Hydrophobicity_Hplc_Ph_3_4_Cowan | Sum of hydrophobicity indices at pH 3.4 determined by HPLC (Cowan & Whittaker) | https://web.expasy.org/protscale/pscale/Hphob.pH3.4.html |
| Hydrophobicity_Hplc_Ph_7_5_Cowan | Sum of hydrophobicity indices at pH 7.5 determined by HPLC (Cowan & Whittaker) | https://web.expasy.org/protscale/pscale/Hphob.pH7.5.html |
| Hydrophobicity_Hplc_Wilson | Sum of hydrophobic constants derived from HPLC peptide retention times (Wilson et al.) | https://web.expasy.org/protscale/pscale/Hphob.Wilson.html |
| Hydrophobicity_Janin | Sum of hydrophobicity scale values based on free energy of transfer (dG) from inside to outside of a globular protein (Janin) | https://web.expasy.org/protscale/pscale/Hphob.Janin.html |
| Hydrophobicity_Kyte_Doolittle | Sum of hydrophobicity scale values (Kyte & Doolittle) | https://web.expasy.org/protscale/pscale/Hphob.Doolittle.html |
| Hydrophobicity_Manavalan | Sum of average surrounding hydrophobicity scale values (Manavalan & Ponnusamy) | https://web.expasy.org/protscale/pscale/Hphob.Manavalan.html |
| Hydrophobicity_Miyazawa_Jernigan | Sum of hydrophobicity scale values (Miyazawa & Jernigan) | https://web.expasy.org/protscale/pscale/Hphob.Miyazawa.html |
| Hydrophobicity_Rao_Argos | Sum of membrane buried helix parameters (Rao & Argos) | https://web.expasy.org/protscale/pscale/Hphob.Argos.html |
| Hydrophobicity_Rf_Mobility | Sum of hydrophobicity scale values based on mobilities on chromatography paper (Aboderin) | https://web.expasy.org/protscale/pscale/Hphob.mobility.html |
| Hydrophobicity_Rose | Sum of hydrophobicity scale values based on mean fractional exposed area loss (Rose) | https://web.expasy.org/protscale/pscale/Hphob.Rose.html |
| Hydrophobicity_Roseman | Sum of hydrophobicity scale values (Roseman) | https://web.expasy.org/protscale/pscale/Hphob.Roseman.html |
| Hydrophobicity_Sweet | Sum of optimized matching hydrophobicity (OMH) scale values (Sweet) | https://web.expasy.org/protscale/pscale/Hphob.Sweet.html |
| Hydrophobicity_Tanford | Sum of hydrophobicity scale values (Tanford) | https://web.expasy.org/protscale/pscale/Hphob.Tanford.html |
| Hydrophobicity_Welling | Sum of antigenicity values (× 10) (Welling) | https://web.expasy.org/protscale/pscale/Hphob.Welling.html |
| Hydrophobicity_Wolfenden | Sum of hydration potential (kcal/mole) at 25 °C scale values (Wolfenden) | https://web.expasy.org/protscale/pscale/Hphob.Wolfenden.html |
| Molecular_Weight | Molecular weight based on a simple amino acid molecular weight scale | https://web.expasy.org/protscale/pscale/Molecularweight.html |
| Number_Of_Codons | Sum of the number of codons encoding each amino acid in the universal genetic code | https://web.expasy.org/protscale/pscale/Numbercodons.html |
| Parallel_Beta_Strand | Sum of conformational preference values for parallel beta strand (Lifson & Sander) | https://web.expasy.org/protscale/pscale/Parallelbeta-strand.html |
| Percentage_Accessible_Res | Sum of molar fraction (%) values (of 3220 accessible residues) (Janin) | https://web.expasy.org/protscale/pscale/accessibleresidues.html |
| Percentage_Buried_Res | Sum of molar fraction (%) values (of 2001 buried residues) (Janin) | https://web.expasy.org/protscale/pscale/buriedresidues.html |
| Polarity_Grantham | Sum of polarity scale values (Grantham) | https://web.expasy.org/protscale/pscale/PolarityGrantham.html |
| Polarity_Zimmerman | Sum of polarity scale values (Zimmerman) | https://web.expasy.org/protscale/pscale/PolarityZimmerman.html |
| Ratio_Hetero_End_Side | Sum of atomic weight ratios of hetero elements in end group to C in side chain | https://web.expasy.org/protscale/pscale/Ratioside.html |
| Recognition_Factors | Sum of recognition factors of each amino acid (average interaction energy with each of the 20 amino acids) (Fraga) | https://web.expasy.org/protscale/pscale/Recognitionfactors.html |
| Refractivity | Sum of refractivity index of each amino acid (Jones) | https://web.expasy.org/protscale/pscale/Refractivity.html |
| Relative_Mutability | Sum of relative mutability values of amino acids, with Ala = 100 (Dayhoff et al.) | https://web.expasy.org/protscale/pscale/Relativemutability.html |
| Total_Beta_Strand | Sum of conformational preference values for total beta strand (antiparallel + parallel) (Lifson & Sander) | https://web.expasy.org/protscale/pscale/Totalbeta-strand.html |
| Transmembrane_Tendency | Sum of transmembrane tendency values (Zhao & London) | https://web.expasy.org/protscale/pscale/Transmembranetendency.html |
| ZYGGREGATOR_p_agg | Sum of p_agg values (Zyggregator) | |
| ZYGGREGATOR_z_agg | Zyggregator z_agg value (Zyggregator) | |
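
Many of the sequence descriptors are described as a "Sum of" per-residue scale values. As a minimal sketch of what such a sum looks like, the following Python example adds up Kyte-Doolittle hydropathy values over a one-letter sequence. The function name `sum_of_scale` is hypothetical and is not part of the Calculate Protein Descriptors panel, and the panel may apply windowing or other processing that is not shown here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sum_of_scale(sequence, scale):
    """Return the sum of per-residue scale values over a sequence.

    Hypothetical helper for illustration only. Residues without a
    scale value raise ValueError instead of being silently skipped.
    """
    total = 0.0
    for residue in sequence.upper():
        if residue not in scale:
            raise ValueError(f"no scale value for residue {residue!r}")
        total += scale[residue]
    return total


# Example: a short, made-up peptide sequence.
print(round(sum_of_scale("ACDEFGHIK", KYTE_DOOLITTLE), 2))
```

Any other per-residue scale from the table can be substituted for the dictionary, provided its values are taken from the cited source.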

Structure descriptors.

| Descriptor | Explanation | Comments/references |
|---|---|---|
| All_Aromatic_SASA | Sum of SASA of aromatic residues | |
| All_Atomic_Contact_Energy | Atomic Contact Energy (ACE) according to Zhang et al. (see reference) | https://www.ncbi.nlm.nih.gov/pubmed/9126848 |
| All_Dipole_Moment | Dipole moment of molecule | |
| All_Formal_Charge | Total formal charge of molecule (sum of formal charges of individual atoms) | |
| All_Greasy_SASA | Normalized SASA of greasy atoms (extremely hydrophobic atoms with slogp > 0.05 and energy > 10) | Normalized means divided by total SASA |
| All_HB_Acceptor_SASA | Normalized SASA of hydrogen bond acceptor atoms | Normalized means divided by total SASA |
| All_HB_Donor_SASA | Normalized SASA of hydrogen bond donor atoms | Normalized means divided by total SASA |
| All_Hydrophilic_SASA | Normalized SASA of hydrophilic atoms | Normalized means divided by total SASA |
| All_Hydrophobic_Moment | Hydrophobic moment of molecule | |
| All_Hydrophobic_SASA | Normalized SASA of hydrophobic atoms | Normalized means divided by total SASA |
| All_Moment_of_Inertia | Moment of inertia of molecule | |
| All_Negative_formal_SASA | Normalized SASA of atoms with negative formal charge | Normalized means divided by total SASA |
| All_Negative_partial_SASA | Normalized SASA of atoms with negative partial charge | Normalized means divided by total SASA |
| All_Positive_formal_SASA | Normalized SASA of atoms with positive formal charge | Normalized means divided by total SASA |
| All_Positive_partial_SASA | Normalized SASA of atoms with positive partial charge | Normalized means divided by total SASA |
| All_SASA | Total solvent accessible surface area (SASA) | |
| All_Zeta_Potential | Zeta potential | |
| Apparent_Charge_eV | Apparent charge | |
| Atomic_contact_energy | Atomic Contact Energy (ACE) according to Zhang et al. (see reference) | https://www.ncbi.nlm.nih.gov/pubmed/9126848 |
| Connectivity | Connectivity index of the molecule | |
| Debye_length | Debye length | |
| Dipole_X_direction | Dipole moment along X-axis | All molecules should be aligned for this descriptor to be meaningful |
| Dipole_Y_direction | Dipole moment along Y-axis | All molecules should be aligned for this descriptor to be meaningful |
| Dipole_Z_direction | Dipole moment along Z-axis | All molecules should be aligned for this descriptor to be meaningful |
| Dipole_moment | Dipole moment | |
| Drag_coeffient | Drag coefficient | http://www.nottingham.ac.uk/ncmh/documents/papers/paper216.pdf |
| Electrophoretic_mobility | Electrophoretic mobility | |
| Exposed_agg_surf_area | Total SASA of greasy surfaces (extremely hydrophobic) | |
| Formal_Charge_eV | Total formal charge of molecule | |
| Hydrodynamic_radius | Hydrodynamic radius | http://www.nottingham.ac.uk/ncmh/documents/papers/paper216.pdf |
| Hydrophobic_X_direction | Hydrophobic moment of molecule in X-direction | All molecules should be aligned for this descriptor to be meaningful |
| Hydrophobic_Y_direction | Hydrophobic moment of molecule in Y-direction | All molecules should be aligned for this descriptor to be meaningful |
| Hydrophobic_Z_direction | Hydrophobic moment of molecule in Z-direction | All molecules should be aligned for this descriptor to be meaningful |
| Hydrophobicity_moment_KD | Hydrophobicity moment (based on Kyte-Doolittle scale) | |
| Hydrophobicity_moment_Rosman | Hydrophobicity moment (based on Roseman scale) | |
| Inertia_X_direction | Moment of inertia along X-direction | All molecules should be aligned for this descriptor to be meaningful |
| Inertia_Y_direction | Moment of inertia along Y-direction | All molecules should be aligned for this descriptor to be meaningful |
| Inertia_Z_direction | Moment of inertia along Z-direction | All molecules should be aligned for this descriptor to be meaningful |
| Molecular_weight_kDa | Molecular weight measured as sum of atomic weights | |
| Net_Charge_model_based | Net charge according to standard assignment of charges | |
| Net_Charge_propka_based | Net charge based on PROPKA charge assignment | |
| Nr_of_hydrogen_bonds | Number of hydrogen bonds | |
| Nr_rotatable_bonds | Number of rotatable bonds | |
| Radius_of_gyration | Radius of gyration | http://www.nottingham.ac.uk/ncmh/documents/papers/paper216.pdf |
| Sedimentation_constant | Theoretical sedimentation constant | |
| Total_acceptor_SASA | Total SASA of hydrogen bond acceptor atoms | |
| Total_aromatic_SASA | Total SASA of aromatic atoms | |
| Total_donor_SASA | Total SASA of hydrogen bond donor atoms | |
| Total_hydrophil_SASA_slogp | Total SASA of atoms with slogp < 0 | |
| Total_hydrophilic_SASA | Total SASA of hydrophilic atoms | |
| Total_hydrophob_SASA_slogp | Total SASA of atoms with slogp > 0 | |
| Total_hydrophobic_SASA | Total SASA of hydrophobic atoms | |
| Total_negative_SASA | Total SASA of negatively charged atoms | |
| Total_positive_SASA | Total SASA of positively charged atoms | |
| Volume_asa_based | Volume based on ASA | |
| Volume_vdw_based | Volume based on van der Waals radii | |
| Zeta_potential | Zeta potential | |
| pI_PROPKA_based | Predicted pI based on PROPKA | |
| pI_model_pKa_based | Predicted pI based on standard pKa values of titratable groups | |
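
Several structure descriptors are standard geometric quantities. As an illustration of one of them, the sketch below computes a mass-weighted radius of gyration from atomic coordinates using the textbook definition. The function name and the input layout (a list of `(x, y, z)` tuples and a matching list of masses) are assumptions made for this example. Whether the Calculate Protein Descriptors panel uses mass weighting is not stated here.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration, in the length unit of coords.

    Hypothetical helper: coords is a list of (x, y, z) tuples and
    masses is a list of the corresponding atomic masses.
    """
    total_mass = sum(masses)
    # Center of mass, computed one Cartesian component at a time.
    center = [
        sum(m * r[axis] for m, r in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    # Mass-weighted mean squared distance from the center of mass.
    mean_sq = sum(
        m * sum((r[axis] - center[axis]) ** 2 for axis in range(3))
        for m, r in zip(masses, coords)
    ) / total_mass
    return math.sqrt(mean_sq)


# Example: two equal masses placed 2 length units apart.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [12.0, 12.0]))
```

Unlike the X, Y, and Z components of the dipole, hydrophobic moment, and moment of inertia, this quantity does not change when the molecule is rotated or translated, so it does not require aligned structures.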

Patch descriptors.

| Descriptor | Explanation | Comments/references |
|---|---|---|
| All_AggScore | Schrödinger's patch-based aggregation propensity score | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Hydrophobic_Patch_Energy | Sum of residue contributions to hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Hydrophobic_Patch_Energy_gt15 | Sum of residue contributions to hydrophobic patches > 15 (strong hydrophobic) | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Hydrophobic_Patch_Energy_gt30 | Sum of residue contributions to hydrophobic patches > 30 (very strong hydrophobic) | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Negative_Patch_Energy | Sum of residue contributions to negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Negative_Patch_Energy_gt30 | Sum of residue contributions to negative patches > 30 (strong -ve) | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Negative_Patch_Energy_gt50 | Sum of residue contributions to negative patches > 50 (very strong -ve) | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Positive_Patch_Energy | Sum of residue contributions to positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Positive_Patch_Energy_gt30 | Sum of residue contributions to positive patches > 30 (strong +ve) | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Positive_Patch_Energy_gt50 | Sum of residue contributions to positive patches > 50 (very strong +ve) | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Score_Hyd_Patches | Average energy of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Score_Neg_Patches | Average energy of negative patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Score_Pos_Patches | Average energy of positive patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Size_Hyd_Patches | Average size of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Size_Neg_Patches | Average size of negative patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Size_Pos_Patches | Average size of positive patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Score_Hyd_Patches | Maximum energy of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Score_Neg_Patches | Maximum energy of negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Score_Pos_Patches | Maximum energy of positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Size_Hyd_Patches | Maximum size of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Size_Neg_Patches | Maximum size of negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Size_Pos_Patches | Maximum size of positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Hyd_Patches | Number of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Hyd_Patches_gt250 | Number of hydrophobic patches of size > 250 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Hyd_Patches_gt500 | Number of hydrophobic patches of size > 500 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Neg_Patches | Number of negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Neg_Patches_gt250 | Number of negatively charged patches of size > 250 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Neg_Patches_gt500 | Number of negatively charged patches of size > 500 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Pos_Patches | Number of positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Pos_Patches_gt250 | Number of positively charged patches of size > 250 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Pos_Patches_gt500 | Number of positively charged patches of size > 500 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Score_Hyd_Patches | Sum of residue contributions to hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Score_Neg_Patches | Sum of energies of negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Score_Pos_Patches | Sum of energies of positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Size_Hyd_Patches | Sum of sizes of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Size_Neg_Patches | Sum of sizes of negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Size_Pos_Patches | Sum of sizes of positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
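
The Nr, Sum, Avg, and Max patch descriptors are summary statistics over a list of patches. The sketch below shows how such statistics relate to each other for a set of hydrophobic patches. The `Patch` class, the `summarize_patches` function, and the example values are all hypothetical. This is not the Protein Surface Analyzer implementation, and its output format is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    size: float    # patch area in square angstroms (assumed unit)
    energy: float  # patch score/energy, arbitrary example values


def summarize_patches(patches, kind="Hyd"):
    """Collect count, sum, average, and maximum statistics for patches.

    Hypothetical helper; the dictionary keys mirror the descriptor
    names in the table for readability only.
    """
    sizes = [p.size for p in patches]
    energies = [p.energy for p in patches]
    n = len(patches)
    return {
        f"Nr_{kind}_Patches": n,
        f"Nr_{kind}_Patches_gt250": sum(1 for s in sizes if s > 250),
        f"Nr_{kind}_Patches_gt500": sum(1 for s in sizes if s > 500),
        f"Sum_Size_{kind}_Patches": sum(sizes),
        f"Avg_Size_{kind}_Patches": sum(sizes) / n if n else 0.0,
        f"Max_Size_{kind}_Patches": max(sizes, default=0.0),
        f"Sum_Score_{kind}_Patches": sum(energies),
        f"Avg_Score_{kind}_Patches": sum(energies) / n if n else 0.0,
        f"Max_Score_{kind}_Patches": max(energies, default=0.0),
    }


# Example with three made-up hydrophobic patches.
example = [Patch(100.0, 10.0), Patch(300.0, 20.0), Patch(600.0, 30.0)]
for name, value in summarize_patches(example).items():
    print(name, value)
```

Returning zero for an empty patch list is a choice made for this sketch. The value reported by the software when no patches are found may differ.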

Other descriptors.

| Descriptor | Explanation | Comments/references |
|---|---|---|
| Experiment | Experimental values | Users should replace this column with the experimental values of the target property for machine learning |
| Ionic_strength | Ionic strength of environment | This descriptor can be changed depending on experimental conditions (to be implemented later) |
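
Ionic strength matters for electrostatic descriptors such as the Debye length listed in the structure table. As a rough illustration of that dependence, the sketch below evaluates the standard Debye length expression for an aqueous solution. The function name, the default temperature of 298.15 K, and the relative permittivity of 78.4 for water are assumptions for this example. The formula used by the software is not documented here and may differ.

```python
import math

# Physical constants (SI units).
VACUUM_PERMITTIVITY = 8.8541878128e-12  # F/m
BOLTZMANN = 1.380649e-23                # J/K
AVOGADRO = 6.02214076e23                # 1/mol
ELEMENTARY_CHARGE = 1.602176634e-19     # C


def debye_length_nm(ionic_strength_molar, temperature_k=298.15,
                    relative_permittivity=78.4):
    """Debye length in nanometers for a given ionic strength in mol/L.

    Hypothetical helper using the textbook expression
    sqrt(eps_r * eps_0 * k_B * T / (2 * N_A * e^2 * I)).
    """
    if ionic_strength_molar <= 0:
        raise ValueError("ionic strength must be positive")
    # Convert mol/L to mol/m^3.
    ionic_strength_si = ionic_strength_molar * 1000.0
    numerator = (relative_permittivity * VACUUM_PERMITTIVITY
                 * BOLTZMANN * temperature_k)
    denominator = (2.0 * AVOGADRO * ELEMENTARY_CHARGE ** 2
                   * ionic_strength_si)
    return math.sqrt(numerator / denominator) * 1e9


# Example: screening length at a few ionic strengths (mol/L).
for ionic_strength in (0.01, 0.1, 0.15):
    print(ionic_strength, round(debye_length_nm(ionic_strength), 3))
```

The Debye length scales with the inverse square root of the ionic strength, so a fourfold increase in ionic strength halves the screening length.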