Protein Descriptors
Protein descriptors can be used to develop QSAR/QSPR models for proteins with tools such as the AutoQSAR panel. You can generate a set of protein descriptors with the Calculate Protein Descriptors panel. The descriptors are calculated from the protein sequence, from the protein structure, and from a surface patch analysis. They are listed in the tables below. See Ref. 18 for details on the descriptors.

| Descriptor | Explanation | Comments/references |
|---|---|---|
| AGGRESCAN_Nr_hotspots | Number of aggregation hotspots, nHS (from Aggrescan) | http://bioinf.uab.es/aap/aap_help.html |
| AGGRESCAN_a3v_value | Sum of average amino acid aggregation-propensity values, a3v (Aggrescan) | http://bioinf.uab.es/aap/aap_help.html |
| Aa_Composition | Sum of amino acid composition values (McCaldon & Argos) | https://web.expasy.org/protscale/pscale/A.A.composition.html |
| Aa_Composition_Swissprot | Sum of Amino acid composition values based on proteins in SwissProt | https://web.expasy.org/protscale/pscale/A.A.Swiss-Prot.html |
| Aa_Flexibility_VTR | Sum of amino acid flexibility scale values (Vihinen, Torkkila & Riikonen) | https://www.ncbi.nlm.nih.gov/pubmed/8090708 |
| All_Aggrescan_a4v | Sum of Average a3v over sliding window; aka a4vSS (Aggrescan) | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-65 |
| All_Aggrescan_a4v_pos | Sum of Average of a3v positive values over sliding window (Aggrescan) | |
| All_AggScore | Sum of AggScore that includes all residues in the target protein | |
| All_Zyggregator_profile_smoothed | Sum of Zyggregator profile Z-scores (Zyggregator) | |
| All_Zyggregator_profile_smoothed_pos | Sum of positive Zyggregator profile Z-scores (Zyggregator) | |
| Alpha_Helix_Chou_Fasman | Sum of alpha helix propensities (Chou & Fasman) | https://web.expasy.org/protscale/pscale/alpha-helixFasman.html |
| Alpha_Helix_Deleage_Roux | Sum of alpha helix propensities (Deleage & Roux) | https://web.expasy.org/protscale/pscale/alpha-helixRoux.html |
| Alpha_Helix_Levitt | Sum of alpha helix propensities (Levitt) | https://web.expasy.org/protscale/pscale/alpha-helixLevitt.html |
| Antiparallel_Beta_Strand | Sum of Conformational preference for antiparallel beta strand (Lifson & Sander) | https://web.expasy.org/protscale/pscale/Antiparallelbeta-strand.html |
| Average_Flexibility_BP | Sum of amino acid average flexibility values (Bhaskaran & Ponnuswamy) | https://web.expasy.org/protscale/pscale/Averageflexibility.html |
| Avg_Area_Buried | Sum of Average area buried on transfer from standard state to folded protein (Rose et al.) | https://web.expasy.org/protscale/pscale/Averageburied.html |
| Beta_Sheet_Chou_Fasman | Sum of beta sheet propensities (Chou & Fasman) | https://web.expasy.org/protscale/pscale/beta-sheetFasman.html |
| Beta_Sheet_Deleage_Roux | Sum of beta sheet propensities (Deleage & Roux) | https://web.expasy.org/protscale/pscale/beta-sheetRoux.html |
| Beta_Sheet_Levitt | Sum of beta sheet propensities (Levitt) | https://web.expasy.org/protscale/pscale/beta-sheetLevitt.html |
| Beta_Turn_Chou_Fasman | Sum of beta turn propensities (Chou & Fasman) | https://web.expasy.org/protscale/pscale/beta-turnFasman.html |
| Beta_Turn_Deleage_Roux | Sum of beta turn propensities (Deleage & Roux) | https://web.expasy.org/protscale/pscale/beta-turnRoux.html |
| Beta_Turn_Levitt | Sum of beta turn propensities (Levitt) | https://web.expasy.org/protscale/pscale/beta-turnLevitt.html |
| Bulkiness | Sum of amino acid bulkiness values | https://web.expasy.org/protscale/pscale/Bulkiness.html |
| Coil_Deleage_Roux | Sum of Conformational parameters for coil (Deleage & Roux) | https://web.expasy.org/protscale/pscale/CoilRoux.html |
| Disorder_Propensity_DisProt | Sum of disorder promotion propensities | https://www.ncbi.nlm.nih.gov/pubmed/17578581 |
| Disorder_Propensity_FoldUnfold | Sum of disorder promotion propensities (FoldUnfold) | https://www.ncbi.nlm.nih.gov/pubmed/15498936 |
| Disorder_Propensity_TOP_IDP | Sum of disorder propensity for intrinsic disorder (TOP-IDP scale) | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2676888/ |
| HPLC_Retention_Ph_2_1 | Sum of retention coefficients in HPLC, pH 2.1 | https://web.expasy.org/protscale/pscale/HPLC2.1.html |
| HPLC_Tfa_Retention | Sum of retention coefficients in HPLC/TFA | https://web.expasy.org/protscale/pscale/HPLCTFA.html |
| Hplc_Hfba_Retention | Sum of retention coefficients in HFBA | https://web.expasy.org/protscale/pscale/HPLCHFBA.html |
| Hplc_Retention_Ph_7_4 | Sum of retention coefficients in HPLC, pH 7.4 | https://web.expasy.org/protscale/pscale/HPLC7.4.html |
| Hydrophobicity_Abraham_Leo | Sum of hydrophobicity scale values (Abraham & Leo) | https://web.expasy.org/protscale/pscale/Hphob.Leo.html |
| Hydrophobicity_Black | Sum of hydrophobicity scale values (Black) | https://web.expasy.org/protscale/pscale/Hphob.Black.html |
| Hydrophobicity_Bull_Breese | Sum of hydrophobicity scale values (Bull & Breese) | https://web.expasy.org/protscale/pscale/Hphob.Breese.html |
| Hydrophobicity_Chothia | Sum of hydrophobicity scale values based on Proportion of residues 95% buried (Chothia) | https://web.expasy.org/protscale/pscale/Hphob.Chothia.html |
| Hydrophobicity_Eisenberg | Sum of Normalized consensus hydrophobicity scale values (Eisenberg et al.) | https://web.expasy.org/protscale/pscale/Hphob.Eisenberg.html |
| Hydrophobicity_Fauchere | Sum of hydrophobicity scale values (Fauchere) | https://web.expasy.org/protscale/pscale/Hphob.Fauchere.html |
| Hydrophobicity_Guy | Sum of Hydrophobicity scale values based on free energy of transfer (kcal/mole) (Guy) | https://web.expasy.org/protscale/pscale/Hphob.Guy.html |
| Hydrophobicity_Hopp_Woods | Sum of hydrophilicity scale values (Hopp & Woods) | https://web.expasy.org/protscale/pscale/Hphob.Woods.html |
| Hydrophobicity_Hplc_Parker | Sum of Hydrophilicity scale derived from HPLC peptide retention times (Parker et al.) | https://web.expasy.org/protscale/pscale/Hphob.Parker.html |
| Hydrophobicity_Hplc_Ph_3_4_Cowan | Sum of hydrophobicity indices at pH 3.4 determined by HPLC (Cowan & Whittaker) | https://web.expasy.org/protscale/pscale/Hphob.pH3.4.html |
| Hydrophobicity_Hplc_Ph_7_5_Cowan | Sum of hydrophobicity indices at pH 7.5 determined by HPLC (Cowan & Whittaker) | https://web.expasy.org/protscale/pscale/Hphob.pH7.5.html |
| Hydrophobicity_Hplc_Wilson | Sum of Hydrophobic constants derived from HPLC peptide retention times (Wilson et al.) | https://web.expasy.org/protscale/pscale/Hphob.Wilson.html |
| Hydrophobicity_Janin | Sum of hydrophobicity scale dG of transfer from inside to outside of a globular protein (Janin) | https://web.expasy.org/protscale/pscale/Hphob.Janin.html |
| Hydrophobicity_Kyte_Doolittle | Sum of hydrophobicity scale values (Kyte & Doolittle) | https://web.expasy.org/protscale/pscale/Hphob.Doolittle.html |
| Hydrophobicity_Manavalan | Sum of average surrounding hydrophobicity scale values (Manavalan & Ponnuswamy) | https://web.expasy.org/protscale/pscale/Hphob.Manavalan.html |
| Hydrophobicity_Miyazawa_Jernigan | Sum of hydrophobicity scale values (Miyazawa & Jernigan) | https://web.expasy.org/protscale/pscale/Hphob.Miyazawa.html |
| Hydrophobicity_Rao_Argos | Sum of membrane buried helix parameters (Rao & Argos) | https://web.expasy.org/protscale/pscale/Hphob.Argos.html |
| Hydrophobicity_Rf_Mobility | Sum of hydrophobicity scale values based on Mobilities on chromatography paper (Aboderin) | https://web.expasy.org/protscale/pscale/Hphob.mobility.html |
| Hydrophobicity_Rose | Sum of hydrophobicity scale values based on Mean fractional exposed area loss (Rose) | https://web.expasy.org/protscale/pscale/Hphob.Rose.html |
| Hydrophobicity_Roseman | Sum of hydrophobicity scale values (Roseman) | https://web.expasy.org/protscale/pscale/Hphob.Roseman.html |
| Hydrophobicity_Sweet | Sum of Optimized matching hydrophobicity (OMH) scale values (Sweet) | https://web.expasy.org/protscale/pscale/Hphob.Sweet.html |
| Hydrophobicity_Tanford | Sum of hydrophobicity scale values (Tanford) | https://web.expasy.org/protscale/pscale/Hphob.Tanford.html |
| Hydrophobicity_Welling | Sum of antigenicity values (×10) (Welling) | https://web.expasy.org/protscale/pscale/Hphob.Welling.html |
| Hydrophobicity_Wolfenden | Sum of Hydration potential (kcal/mole) at 25 °C scale values (Wolfenden) | https://web.expasy.org/protscale/pscale/Hphob.Wolfenden.html |
| Molecular_Weight | Molecular Weight based on a simple amino acid Mol Wt scale | https://web.expasy.org/protscale/pscale/Molecularweight.html |
| Number_Of_Codons | Sum of the number of codons encoding each amino acid in the universal genetic code | https://web.expasy.org/protscale/pscale/Numbercodons.html |
| Parallel_Beta_Strand | Sum of Conformational preference values for parallel beta strand (Lifson & Sander) | https://web.expasy.org/protscale/pscale/Parallelbeta-strand.html |
| Percentage_Accessible_Res | Sum of Molar fraction (%) values (of 3220 accessible residues) (Janin) | https://web.expasy.org/protscale/pscale/accessibleresidues.html |
| Percentage_Buried_Res | Sum of Molar fraction (%) values (of 2001 buried residues) (Janin) | https://web.expasy.org/protscale/pscale/buriedresidues.html |
| Polarity_Grantham | Sum of polarity scale values (Grantham) | https://web.expasy.org/protscale/pscale/PolarityGrantham.html |
| Polarity_Zimmerman | Sum of polarity scale values (Zimmerman) | https://web.expasy.org/protscale/pscale/PolarityZimmerman.html |
| Ratio_Hetero_End_Side | Sum of Atomic weight ratio of hetero elements in end group to C in side chain | https://web.expasy.org/protscale/pscale/Ratioside.html |
| Recognition_Factors | Sum of recognition factors of each amino acid (average interaction energy with each of the 20 amino acids) (Fraga) | https://web.expasy.org/protscale/pscale/Recognitionfactors.html |
| Refractivity | Sum of refractivity index of each aa (Jones) | https://web.expasy.org/protscale/pscale/Refractivity.html |
| Relative_Mutability | Sum of relative mutability values of aa; with Ala=100 (Dayhoff et al.) | https://web.expasy.org/protscale/pscale/Relativemutability.html |
| Total_Beta_Strand | Sum of Conformational preference for total beta strand (antiparallel+parallel) (Lifson & Sander) | https://web.expasy.org/protscale/pscale/Totalbeta-strand.html |
| Transmembrane_Tendency | Sum of transmembrane tendency values (Zhao & London) | https://web.expasy.org/protscale/pscale/Transmembranetendency.html |
| ZYGGREGATOR_p_agg | Sum of p_agg values (Zyggregator) | |
| ZYGGREGATOR_z_agg | Zyggregator z_agg value (Zyggregator) | |
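
Most of the sequence descriptors above are described as the sum of a per-residue scale over the sequence, and the Aggrescan entries mention averaging over a sliding window. The following is a minimal sketch of both ideas, using the published Kyte & Doolittle hydropathy values. The function names, the example peptide, and the window size of 5 are assumptions for illustration; the exact procedure used by the Calculate Protein Descriptors panel is not given here and may differ.

```python
from typing import Dict, List

# Kyte & Doolittle hydropathy values, one entry per standard amino acid.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sum_of_scale(sequence: str, scale: Dict[str, float]) -> float:
    """Sum the per-residue scale values over a one-letter sequence."""
    return sum(scale[residue] for residue in sequence.upper())


def window_averages(
    sequence: str, scale: Dict[str, float], window: int = 5
) -> List[float]:
    """Average the scale over every full sliding window.

    The default window size is an assumption made for this sketch.
    Returns an empty list if the sequence is shorter than the window.
    """
    values = [scale[residue] for residue in sequence.upper()]
    if window < 1 or window > len(values):
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    peptide = "MKVLAAGIVG"  # hypothetical example sequence
    print("Sum of scale:", round(sum_of_scale(peptide, KYTE_DOOLITTLE), 2))
    print("Window averages:", window_averages(peptide, KYTE_DOOLITTLE))
```

Because the descriptor is a plain sum, it grows with sequence length, so proteins of very different sizes are not directly comparable on these values alone.
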

| Descriptor | Explanation | Comments/references |
|---|---|---|
| All_Aromatic_SASA | Sum of SASA of aromatic residues | |
| All_Atomic_Contact_Energy | Atomic Contact Energy (ACE) according to Zhang et al. (see Ref) | https://www.ncbi.nlm.nih.gov/pubmed/9126848 |
| All_Dipole_Moment | Dipole moment of molecule | |
| All_Formal_Charge | Total Formal charge of molecule (Sum of formal charges of individual atoms) | |
| All_Greasy_SASA | Normalized SASA of greasy atoms (extremely hydrophobic atoms with slogp > 0.05 and energy > 10) | Normalized means divided by Total SASA |
| All_HB_Acceptor_SASA | Normalized SASA of hydrogen bond acceptor atoms | Normalized means divided by Total SASA |
| All_HB_Donor_SASA | Normalized SASA of hydrogen bond donor atoms | Normalized means divided by Total SASA |
| All_Hydrophilic_SASA | Normalized SASA of hydrophilic atoms | Normalized means divided by Total SASA |
| All_Hydrophobic_Moment | Hydrophobic Moment of molecule | |
| All_Hydrophobic_SASA | Normalized SASA of hydrophobic atoms | Normalized means divided by Total SASA |
| All_Moment_of_Inertia | Moment of Inertia of molecule | |
| All_Negative_formal_SASA | Normalized SASA of atoms with negative formal charge | Normalized means divided by Total SASA |
| All_Negative_partial_SASA | Normalized SASA of atoms with negative partial charge | Normalized means divided by Total SASA |
| All_Positive_formal_SASA | Normalized SASA of atoms with positive formal charge | Normalized means divided by Total SASA |
| All_Positive_partial_SASA | Normalized SASA of atoms with positive partial charge | Normalized means divided by Total SASA |
| All_SASA | Total Solvent Accessible Surface Area (SASA) | |
| All_Zeta_Potential | Zeta Potential | |
| Apparent_Charge_eV | Apparent charge | |
| Atomic_contact_energy | Atomic Contact Energy (ACE) according to Zhang et al. (see Ref) | https://www.ncbi.nlm.nih.gov/pubmed/9126848 |
| Connectivity | Connectivity index of the molecule | |
| Debye_length | Debye length | |
| Dipole_X_direction | Dipole moment along X-axis | All molecules should be aligned for this descriptor to be meaningful |
| Dipole_Y_direction | Dipole moment along Y-axis | All molecules should be aligned for this descriptor to be meaningful |
| Dipole_Z_direction | Dipole moment along Z-axis | All molecules should be aligned for this descriptor to be meaningful |
| Dipole_moment | Dipole moment | |
| Drag_coeffient | Drag Coefficient | http://www.nottingham.ac.uk/ncmh/documents/papers/paper216.pdf |
| Electrophoretic_mobility | Electrophoretic mobility | |
| Exposed_agg_surf_area | Total SASA of greasy surfaces (extremely hydrophobic) | |
| Formal_Charge_eV | Total formal charge of molecule | |
| Hydrodynamic_radius | Hydrodynamic radius | http://www.nottingham.ac.uk/ncmh/documents/papers/paper216.pdf |
| Hydrophobic_X_direction | Hydrophobic Moment of molecule in X-direction | All molecules should be aligned for this descriptor to be meaningful |
| Hydrophobic_Y_direction | Hydrophobic Moment of molecule in Y-direction | All molecules should be aligned for this descriptor to be meaningful |
| Hydrophobic_Z_direction | Hydrophobic Moment of molecule in Z-direction | All molecules should be aligned for this descriptor to be meaningful |
| Hydrophobicity_moment_KD | Hydrophobicity moment (based on Kyte-Doolittle scale) | |
| Hydrophobicity_moment_Rosman | Hydrophobicity moment (based on Roseman scale) | |
| Inertia_X_direction | Moment of Inertia along X-direction | All molecules should be aligned for this descriptor to be meaningful |
| Inertia_Y_direction | Moment of Inertia along Y-direction | All molecules should be aligned for this descriptor to be meaningful |
| Inertia_Z_direction | Moment of Inertia along Z-direction | All molecules should be aligned for this descriptor to be meaningful |
| Molecular_weight_kDa | Molecular weight calculated as the sum of atomic weights | |
| Net_Charge_model_based | Net charge according to standard assignment of charges | |
| Net_Charge_propka_based | Net charge based on PropKa charge assignment | |
| Nr_of_hydrogen_bonds | Number of hydrogen bonds | |
| Nr_rotatable_bonds | Number of rotatable bonds | |
| Radius_of_gyration | Radius of gyration | http://www.nottingham.ac.uk/ncmh/documents/papers/paper216.pdf |
| Sedimentation_constant | Theoretical Sedimentation constant | |
| Total_acceptor_SASA | Total SASA of hydrogen bond acceptor atoms | |
| Total_aromatic_SASA | Total SASA of aromatic atoms | |
| Total_donor_SASA | Total SASA of hydrogen bond donor atoms | |
| Total_hydrophil_SASA_slogp | Total SASA of atoms with slogp < 0 | |
| Total_hydrophilic_SASA | Total SASA of hydrophilic atoms | |
| Total_hydrophob_SASA_slogp | Total SASA of atoms with slogp > 0 | |
| Total_hydrophobic_SASA | Total SASA of hydrophobic atoms | |
| Total_negative_SASA | Total SASA of negatively charged atoms | |
| Total_positive_SASA | Total SASA of positively charged atoms | |
| Volume_asa_based | Volume based on ASA | |
| Volume_vdw_based | Volume based on vdw radii | |
| Zeta_potential | Zeta potential | |
| pI_PROPKA_based | Predicted pI based on PropKa | |
| pI_model_pKa_based | Predicted pI based on standard pKas of titratable groups | |
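
Two comments in the table above lend themselves to a short illustration: "Normalized means divided by Total SASA", and the note that molecules should be aligned for the X, Y, and Z components to be meaningful. The sketch below uses the textbook point-charge dipole (the sum of charge times position) on a hypothetical neutral two-charge system. It is not necessarily how the panel computes its dipole descriptors, and all function names are illustrative. Rotating the coordinates changes the components but not the magnitude, which is why the directional descriptors depend on a common alignment.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]


def normalized_sasa(subset_sasa: float, total_sasa: float) -> float:
    """Fraction of the total SASA contributed by a subset of atoms."""
    if total_sasa <= 0:
        raise ValueError("total SASA must be positive")
    return subset_sasa / total_sasa


def dipole_vector(charges: Sequence[float], coords: Sequence[Vector]) -> Vector:
    """Point-charge dipole: sum of q_i * r_i, in the units of the inputs."""
    return tuple(
        sum(q * r[axis] for q, r in zip(charges, coords))
        for axis in range(3)
    )


def rotate_about_z(coords: Sequence[Vector], angle_deg: float) -> List[Vector]:
    """Rotate all coordinates counterclockwise about the Z axis."""
    angle = math.radians(angle_deg)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def magnitude(vector: Vector) -> float:
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(component * component for component in vector))


if __name__ == "__main__":
    # Hypothetical neutral system: +1 and -1 charges on the X axis.
    charges = [1.0, -1.0]
    coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]

    original = dipole_vector(charges, coords)
    rotated = dipole_vector(charges, rotate_about_z(coords, 90.0))

    print("Components before rotation:", original)
    print("Components after rotation: ", rotated)
    print("Magnitudes:", magnitude(original), magnitude(rotated))
    print("Normalized SASA:", normalized_sasa(250.0, 1000.0))
```

Note that for a molecule with a net charge, the point-charge dipole also depends on the choice of origin, which this sketch does not address.
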

| Descriptor | Explanation | Comments/references |
|---|---|---|
| All_AggScore | Schrodinger's patch-based Aggregation propensity score | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Hydrophobic_Patch_Energy | Sum of residue contributions to hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Hydrophobic_Patch_Energy_gt15 | Sum of residue contributions to hydrophobic patches > 15 (strong hydrophobic) | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Hydrophobic_Patch_Energy_gt30 | Sum of residue contributions to hydrophobic patches > 30 (very strong hydrophobic) | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Negative_Patch_Energy | Sum of residue contributions to negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Negative_Patch_Energy_gt30 | Sum of residue contributions to negative patches > 30 (strong -ve) | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Negative_Patch_Energy_gt50 | Sum of residue contributions to negative patches > 50 (very strong -ve) | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Positive_Patch_Energy | Sum of residue contributions to positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Positive_Patch_Energy_gt30 | Sum of residue contributions to positive patches > 30 (strong +ve) | Calculated by Protein Surface Analyzer (BioLuminate) |
| All_Positive_Patch_Energy_gt50 | Sum of residue contributions to positive patches > 50 (very strong +ve) | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Score_Hyd_Patches | Average energy of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Score_Neg_Patches | Average energy of negative patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Score_Pos_Patches | Average energy of positive patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Size_Hyd_Patches | Average size of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Size_Neg_Patches | Average size of negative patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Avg_Size_Pos_Patches | Average size of positive patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Score_Hyd_Patches | Maximum energy of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Score_Neg_Patches | Maximum energy of negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Score_Pos_Patches | Maximum energy of positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Size_Hyd_Patches | Maximum size of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Size_Neg_Patches | Maximum size of negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Max_Size_Pos_Patches | Maximum size of positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Hyd_Patches | Number of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Hyd_Patches_gt250 | Number of hydrophobic patches of size > 250 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Hyd_Patches_gt500 | Number of hydrophobic patches of size > 500 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Neg_Patches | Number of negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Neg_Patches_gt250 | Number of negatively charged patches of size > 250 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Neg_Patches_gt500 | Number of negatively charged patches of size > 500 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Pos_Patches | Number of positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Pos_Patches_gt250 | Number of positively charged patches of size > 250 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Nr_Pos_Patches_gt500 | Number of positively charged patches of size > 500 Ų | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Score_Hyd_Patches | Sum of residue contributions to hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Score_Neg_Patches | Sum of energies of negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Score_Pos_Patches | Sum of energies of positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Size_Hyd_Patches | Sum of sizes of hydrophobic patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Size_Neg_Patches | Sum of sizes of negatively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
| Sum_Size_Pos_Patches | Sum of sizes of positively charged patches | Calculated by Protein Surface Analyzer (BioLuminate) |
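
The count, sum, average, and maximum descriptors above are simple aggregates over a list of surface patches. The sketch below shows how such values relate to one another for a hypothetical list of patches. The `Patch` class, the `summarize_patches` function, the example numbers, and the choice to report 0.0 when there are no patches are all assumptions for illustration; the actual patch detection and scoring are performed by the Protein Surface Analyzer and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Patch:
    size: float   # patch area, assumed to be in Å^2 to match the size thresholds
    score: float  # patch energy or score, in whatever units the analysis reports


def summarize_patches(patches: List[Patch], prefix: str) -> Dict[str, float]:
    """Aggregate a list of patches into count, sum, average, and maximum values.

    `prefix` is the patch type used in the descriptor names, e.g. "Hyd".
    Averages and maxima are reported as 0.0 when there are no patches
    (an assumption made for this sketch).
    """
    count = len(patches)
    sizes = [patch.size for patch in patches]
    scores = [patch.score for patch in patches]
    return {
        f"Nr_{prefix}_Patches": count,
        f"Nr_{prefix}_Patches_gt250": sum(1 for size in sizes if size > 250),
        f"Nr_{prefix}_Patches_gt500": sum(1 for size in sizes if size > 500),
        f"Sum_Size_{prefix}_Patches": sum(sizes),
        f"Avg_Size_{prefix}_Patches": sum(sizes) / count if count else 0.0,
        f"Max_Size_{prefix}_Patches": max(sizes, default=0.0),
        f"Sum_Score_{prefix}_Patches": sum(scores),
        f"Avg_Score_{prefix}_Patches": sum(scores) / count if count else 0.0,
        f"Max_Score_{prefix}_Patches": max(scores, default=0.0),
    }


if __name__ == "__main__":
    # Hypothetical hydrophobic patches.
    hydrophobic = [Patch(100.0, 10.0), Patch(300.0, 20.0), Patch(600.0, 60.0)]
    for name, value in summarize_patches(hydrophobic, "Hyd").items():
        print(f"{name}: {value}")
```
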

| Descriptor | Explanation | Comments/references |
|---|---|---|
| Experiment | Experimental values | Replace the values in this column with the experimental values of the target property before machine learning |
| Ionic_strength | Ionic strength of environment | This descriptor can be changed depending on experimental conditions (to be implemented later) |
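
The Experiment column has to be filled with measured values before the descriptors can be used to train a model. The following sketch shows one way to do that outside the panel, assuming the descriptors have been exported as CSV text with an identifier column. The CSV layout, the `Title` identifier column, the protein names, and the `fill_experiment_column` function are all hypothetical and are not part of the documented workflow.

```python
import csv
import io
from typing import Dict


def fill_experiment_column(
    descriptor_csv: str,
    measured: Dict[str, float],
    id_column: str = "Title",
) -> str:
    """Return CSV text with the Experiment column set from `measured`.

    `measured` maps each protein identifier to its experimental value.
    A missing identifier raises KeyError rather than leaving a stale value.
    """
    reader = csv.DictReader(io.StringIO(descriptor_csv))
    fieldnames = list(reader.fieldnames or [])
    if "Experiment" not in fieldnames:
        fieldnames.append("Experiment")

    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        key = row[id_column]
        if key not in measured:
            raise KeyError(f"no experimental value for {key!r}")
        row["Experiment"] = measured[key]
        writer.writerow(row)
    return output.getvalue()


if __name__ == "__main__":
    # Hypothetical exported descriptors with placeholder Experiment values.
    exported = (
        "Title,Molecular_Weight,Experiment\n"
        "mAb1,100.0,0\n"
        "mAb2,200.0,0\n"
    )
    print(fill_experiment_column(exported, {"mAb1": 1.5, "mAb2": 2.5}))
```

Raising an error for a missing identifier is a deliberate choice in this sketch, so that placeholder values are never used silently as training targets.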