Peptide QSAR — Setup Tab

In this tab you import the sequences and the observable to be fit, assign sequences to the training and test sets, set options for the model, and then build the model. You can also apply a previously built model.

Setup Tab Features

Load Sequences and Observables button

Load the peptide sequences and the observable for each sequence into the table, which makes them available for use in building or applying a QSAR model. The contents of the table are replaced by the sequences you load. Opens the Peptide QSAR - Load Sequences and Observables Dialog Box.

Sequence table

This table lists the sequences imported into the panel, showing the name, the sequence, the observable, and the membership in the training set or the test set. You can select multiple rows in the table to assign them to one of the sets. If you chose to set values for the observable, you can edit the table cells in the Observable column to enter the values.

Set selected rows as option menu and Update button

Assign the rows that are selected in the sequence table to one of four sets to be used in building the QSAR model: Test set, Training set, Either training or test, Neither training nor test. To make the assignment, select the rows, choose the set, and click Update. The Training or Test Set column in the table is then filled in with the choice.

Export table button

Export the table data to a CSV file. Opens a file selector, in which you can navigate to a location and name the file.
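If you want to work with the exported file outside the panel, it can be read with any CSV reader. The following is a minimal sketch in Python; the function name read_exported_table is hypothetical, and because the column headings written by the panel are not listed here, the code reports whatever headings it finds rather than assuming any.

```python
import csv
from typing import Dict, List, TextIO, Tuple


def read_exported_table(handle: TextIO) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read an exported table from an open text file.

    Returns the column headings found in the first line and one
    dictionary per data row, keyed by those headings.
    """
    reader = csv.DictReader(handle)
    rows = list(reader)
    # fieldnames is None for an empty file, so fall back to an empty list.
    return list(reader.fieldnames or []), rows


# Example use (the file name is a placeholder):
# with open("peptides.csv", newline="") as handle:
#     columns, rows = read_exported_table(handle)
#     print(columns)
```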

QSAR method option menu

Choose the method for building the QSAR model. The choices are Partial Least Squares (PLS) or Kernel-based Partial Least Squares (KPLS). When you choose a method, the Advanced Options button changes to reflect the choice.

Peptide descriptor type option menu

Choose the type of amino acid descriptor set to be used as the X (independent) variables in the model. The choices are:

  • zvalue—Use the three z-value variables (z1, z2, z3) of Hellberg et al. [7] for the amino acid descriptors. These are derived from a principal components analysis (PCA) of 29 physicochemical variables for the 20 coded amino acids. The variables include molecular weight, pKa, pI, side-chain vdW volumes, NMR shifts, retention times, partition coefficients, and solvent exposure. Choose this descriptor set only if the peptides in your set consist entirely of coded amino acids.

  • ezvalue—Use the five extended z-value variables of Sandberg et al. [8] for the amino acid descriptors. These are derived from a principal components analysis of 26 physicochemical descriptors for 87 amino acids (including the 20 coded amino acids). The descriptors include molecular weight, NMR shifts, partition coefficients, side-chain vdW volumes, HOMO and LUMO energies, heats of formation, polarizabilities, surface areas, hardnesses, TLC retention times, hydrogen-bond donor and acceptor counts, and side-chain charges.

  • dpps—Use the 10 divided physicochemical property scores of Tian et al. [9]. These are derived from 23 electronic, 54 hydrophobic, 37 steric, and 5 H-bond properties of the 20 coded amino acids, by applying principal components analysis to each of the four groups separately and keeping 4 electronic components and 2 components for each of the other three groups (4 + 2 + 2 + 2 = 10).

  • all—Use all three sets of descriptors described above in the model.
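The way per-residue descriptors can become the X variables for a sequence can be sketched as follows. This is an illustration only: it assumes that the descriptors of each residue are concatenated position by position, which is a common approach for peptides of equal length but is not confirmed here as the panel's method. The names TOY_DESCRIPTORS, encode_peptide, and combine_tables are hypothetical, and the numbers are placeholders, not the published z-values.

```python
from typing import Dict, List, Sequence

# Placeholder values for three residues, three descriptors each.
# These are NOT the published z1, z2, z3 values.
TOY_DESCRIPTORS: Dict[str, Sequence[float]] = {
    "A": (0.1, -0.2, 0.3),
    "G": (0.4, 0.5, -0.6),
    "K": (-0.7, 0.8, 0.9),
}


def encode_peptide(sequence: str, table: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the descriptors of each residue, in sequence order."""
    row: List[float] = []
    for residue in sequence:
        if residue not in table:
            # For example, a noncoded amino acid with a table that
            # covers only the coded ones.
            raise KeyError(f"no descriptors for residue {residue!r}")
        row.extend(table[residue])
    return row


def combine_tables(*tables: Dict[str, Sequence[float]]) -> Dict[str, tuple]:
    """Join several descriptor sets residue by residue.

    Only residues present in every set are kept.
    """
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    return {res: tuple(v for t in tables for v in t[res]) for res in shared}
```

With this layout, a peptide of n residues and a descriptor set of k values per residue gives n × k X variables, so combining descriptor sets increases the number of variables per position.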

Build a new model option

Select this option if you want to build a new model. When you choose this option, the options for building a model become available. The options for choosing which table rows to use as the training set and the test set are described below.

Use all rows in the table option

Select this option to use all rows in the table to build the model. The text boxes for setting the percentage to use as the test set and the random seed become available when you select this option.

Use only rows marked as either option

Use only the table rows that are marked Either training or test to build the model. All other rows are ignored (even if they are marked Test set or Training set).

Use rows marked as Training Set and Test Set option

Use the table rows marked as Test set for the test set, and rows marked as Training set for the training set, and ignore all other rows.
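The effect of the three row options can be sketched as a filter on the Training or Test Set marking. This is a minimal illustration, not the panel's implementation: the function name rows_for_build, the mode names "all", "either", and "marked", and the "set" key are all hypothetical. Rows returned in the pool still have to be divided into a training set and a test set.

```python
from typing import Dict, List

Row = Dict[str, str]


def rows_for_build(rows: List[Row], mode: str) -> Dict[str, List[Row]]:
    """Pick the rows used to build a model for a given row option.

    Returns a dictionary with three lists: "pool" (rows not yet
    divided into sets), "training", and "test".
    """

    def marked(label: str) -> List[Row]:
        return [row for row in rows if row.get("set") == label]

    if mode == "all":
        # Every row is a candidate, whatever its marking.
        return {"pool": list(rows), "training": [], "test": []}
    if mode == "either":
        # Rows marked Test set or Training set are ignored here.
        return {"pool": marked("Either training or test"), "training": [], "test": []}
    if mode == "marked":
        # The sets are taken directly from the markings.
        return {"pool": [], "training": marked("Training set"), "test": marked("Test set")}
    raise ValueError(f"unknown mode: {mode!r}")
```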

Settings for model building and defining the test set are described below.

Advanced Options for PLS/KPLS button

Set options for building the model. Opens the Advanced Options - PLS dialog box or the Advanced Options - KPLS dialog box, depending on the QSAR method chosen. The button text also depends on the QSAR method chosen.

Randomly select N % for test set text box

Set the percentage of the chosen rows to be used for the test set. The remainder are used for the training set.

Seed text box

Specify a seed for random selection of the rows to use in the test and training sets.
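A seeded random selection of a percentage of rows for the test set can be sketched as follows. This is an illustration under stated assumptions, not the panel's algorithm: the function name split_rows is hypothetical, the test set size is rounded to the nearest whole row, and Python's own random number generator is used, so the rows selected for a given seed will not match those chosen by the panel.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], test_percent: float, seed: int) -> Tuple[List[T], List[T]]:
    """Randomly divide rows into (training, test) lists.

    The same rows, percentage, and seed give the same division
    each time the function is called.
    """
    if not 0 <= test_percent <= 100:
        raise ValueError("test_percent must be between 0 and 100")
    rng = random.Random(seed)  # private generator, seeded for repeatability
    shuffled = list(rows)
    rng.shuffle(shuffled)
    # Assumption: round to the nearest whole number of rows.
    n_test = round(len(shuffled) * test_percent / 100)
    return shuffled[n_test:], shuffled[:n_test]
```

For example, split_rows(rows, 20, 42) with 10 rows places 2 rows in the test set and 8 in the training set.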

Apply a model option, text box and Browse button

Apply a previously saved QSAR model to the sequences in the table. Enter the file name of the QSAR model file in the text box or click Browse to locate and select the file. The options for selecting the rows to which the model is applied are described below.

Use all rows in table option

Apply the selected QSAR model to all rows in the table.

Use rows marked as option and menu

Apply the selected QSAR model to the rows in the table that are marked as specified on the option menu.

Build/Apply button

Build or apply the QSAR model. The button is labeled Build if you are building a model, and Apply if you are applying a model. A busy cursor is displayed while the model is being built.

Reset button

Clear all data from the panel and reset all settings to their defaults. If you have a model that has not been saved, you are prompted to save it before resetting the panel.