Peptide QSAR — Advanced Options - KPLS Dialog Box
Set options for the kernel-based partial least-squares (KPLS) procedure for fitting the descriptors. For more information on this method, see Model-Building Methods.
To open this panel, click Advanced Options for KPLS in the Setup tab of the Peptide QSAR panel.
- Features
- Additional Resources
Advanced Options - KPLS Dialog Box Features
- Maximum number of KPLS factors box
- Kernel nonlinearity slider, box, and Reset button
- Stop adding KPLS factors when standard deviation of the regression drops to option and text box
- Calculate uncertainty on test set predictions option
- Use N bootstrapping cycles box
- Maximum number of KPLS factors box
Specify the maximum number of KPLS factors to use in the regression model. Regression models are built for increasing numbers of KPLS factors up to this number. The maximum number that can be used is limited by the number of descriptors, which is 3 times the number of residues for the zvalue set, 5 times the number of residues for the ezvalue set, and 10 times the number of residues for the dpps set. It is rarely useful to build models with more than a few KPLS factors, because models with many factors tend to be overfit. Examine the statistics, particularly the stability and Q², to determine how many KPLS factors to use in the model you choose for application to new systems.
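As a rough illustration of the descriptor-count limit described above, the following Python sketch computes the number of descriptors for a peptide and caps a requested factor count at that number. The function names are hypothetical, and the capping behavior is an assumption about how the limit is applied, not a description of the program's internals.

```python
# Descriptors per residue for each descriptor set, as stated in the text.
DESCRIPTORS_PER_RESIDUE = {"zvalue": 3, "ezvalue": 5, "dpps": 10}


def descriptor_count(descriptor_set: str, n_residues: int) -> int:
    """Return the number of descriptors for a peptide with n_residues residues."""
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    return DESCRIPTORS_PER_RESIDUE[descriptor_set] * n_residues


def effective_max_factors(requested: int, descriptor_set: str, n_residues: int) -> int:
    """Cap the requested number of KPLS factors at the descriptor count.

    Hypothetical helper: assumes the limit is applied as a simple cap.
    """
    if requested < 1:
        raise ValueError("requested must be at least 1")
    return min(requested, descriptor_count(descriptor_set, n_residues))
```

For example, a 9-residue peptide with the zvalue set has 27 descriptors, so a request for 50 factors would be capped at 27 under this assumption.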
- Kernel nonlinearity slider, box, and Reset button
Change the kernel nonlinearity value. A Gaussian kernel, exp(−d²/σ²), is used, where d is the Euclidean distance between two X variables. The nonlinearity value is 1/σ, so small values give an almost linear model and large values give a very nonlinear one. Higher nonlinearity typically leads to tighter fitting, but it also tends to give poorer predictions on new peptides.
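The relationship between the nonlinearity value and the kernel can be sketched in Python as follows. This is a minimal illustration, not the program's implementation: the function name is hypothetical, and it assumes that the two inputs are descriptor vectors of equal length.

```python
import math


def gaussian_kernel(x, y, nonlinearity):
    """Evaluate exp(-d^2 / sigma^2), where sigma = 1 / nonlinearity.

    d is the Euclidean distance between the vectors x and y.
    """
    if nonlinearity <= 0:
        raise ValueError("nonlinearity must be positive")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    d_squared = sum((a - b) ** 2 for a, b in zip(x, y))
    sigma = 1.0 / nonlinearity
    return math.exp(-d_squared / sigma ** 2)
```

With a small nonlinearity value, σ is large and the kernel stays close to 1 over the range of the data, which is why the resulting model is nearly linear. With a large value, the kernel falls off quickly with distance, so each prediction is dominated by the nearest training peptides.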
- Stop adding KPLS factors when standard deviation of the regression drops to option and text box
Select this option to stop adding KPLS factors when the standard deviation of the regression drops below the value specified in the text box. Using this option could result in fewer KPLS factors than the number specified in the Maximum number of KPLS factors box.
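The interaction between this threshold and the maximum number of factors can be sketched as below. The function name and the input list are hypothetical, and the sketch assumes that the factor whose model first drops below the threshold is the last one added; the program may treat the boundary differently.

```python
def factors_to_keep(sd_by_factor, max_factors, sd_threshold=None):
    """Return how many KPLS factors are used under the stopping rule.

    sd_by_factor[i] is the standard deviation of the regression for the
    model built with i + 1 factors. If sd_threshold is None, the option
    is treated as deselected and only max_factors applies.
    """
    limit = min(max_factors, len(sd_by_factor))
    if sd_threshold is None:
        return limit
    for n_factors in range(1, limit + 1):
        # Stop as soon as the regression SD drops below the threshold.
        if sd_by_factor[n_factors - 1] < sd_threshold:
            return n_factors
    return limit
```

If the standard deviation never drops below the threshold, the maximum number of factors is used, so the threshold can only reduce the number of factors, never increase it.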
- Calculate uncertainty on test set predictions option
Calculate a confidence interval for each predicted value in the test set, by bootstrapping. This is done by sampling the training set randomly with replacement to generate a new training set of the same size with duplicates, building a model and making predictions for the test set, then repeating the procedure a specified number of times. The standard deviation of these predictions for each member of the original test set is then reported as the uncertainty.
- Use N bootstrapping cycles box
Specify the number of times a random sample is made and a prediction obtained in the uncertainty calculations. This number determines how many values are used in calculating the standard deviation, and should be at least 5.
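The bootstrapping procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: a one-descriptor least-squares line (`fit_line`) stands in for the KPLS model, all function and parameter names are hypothetical, and the uncertainty is taken to be the standard deviation, across cycles, of the predictions for each test set member.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares straight line; a simple stand-in for the KPLS model."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        # Degenerate resample (all x equal): fall back to the mean of y.
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_uncertainty(train_x, train_y, test_x, n_cycles, seed=0):
    """Return one standard deviation per test point over n_cycles resamples."""
    if n_cycles < 2:
        raise ValueError("at least 2 cycles are needed for a standard deviation")
    rng = random.Random(seed)
    n_train = len(train_x)
    predictions = [[] for _ in test_x]
    for _ in range(n_cycles):
        # Resample the training set with replacement, keeping its size.
        picks = [rng.randrange(n_train) for _ in range(n_train)]
        slope, intercept = fit_line(
            [train_x[i] for i in picks], [train_y[i] for i in picks]
        )
        # Predict the unchanged test set with the resampled model.
        for j, x in enumerate(test_x):
            predictions[j].append(slope * x + intercept)
    return [statistics.stdev(values) for values in predictions]
```

Each cycle contributes one prediction per test set member, so the number of cycles is the number of values that enter each standard deviation. With very few cycles the estimate is itself noisy, which is consistent with the recommendation of at least 5.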