Developing Your Own Jaguar pKa Correction Parameters

To develop your own pKa parameters, add them to the file that holds the parameters Jaguar uses to correct its calculated pKa values, $SCHRODINGER/mmshare-vversion/data/jaguar/pka_match.xml. This file also contains the SMARTS patterns that Jaguar uses to recognize functional groups, so you can extend Jaguar's pKa calculations to a new functional group by adding the appropriate SMARTS patterns and correction parameters for that group.

A description of the XML file format standard is beyond the scope of this document, but the format is very simple and resembles HTML in its use of tags to enclose sections of ordinary text. The tags identify the purpose of the enclosed text. For example, the pKa module information for carboxylic acids looks like this:

<functional_group name="carboxylic acid" jaguar_id="4">
    <jaguar f1="0.4451" f2="-0.2516"/>
    <smarts>
        [#1]O[CX3]=O
    </smarts>
    <smarts>
        [OX1-][CX3]=O
    </smarts>
</functional_group>

where name is a double-quoted string that describes the functional group, jaguar_id is an optional arbitrary index number for the functional group, and f1 and f2 are the pKa correction factors. The first SMARTS pattern describes the acidic (protonated) form of the molecule, and the second SMARTS pattern describes the basic (deprotonated) form. For more information on SMARTS patterns, see the web page http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html.
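As a rough illustration of the entry layout described above, the following Python sketch builds a functional_group element with the same attributes and child elements as the carboxylic acid example, then reads it back. The helper names make_functional_group and read_functional_group are hypothetical and are not part of Jaguar. The sketch only handles a single entry; how entries are nested inside the real pka_match.xml file is not described here and is not assumed.

```python
import xml.etree.ElementTree as ET


def make_functional_group(name, f1, f2, acid_smarts, base_smarts, jaguar_id=None):
    """Build a <functional_group> element shaped like the carboxylic acid entry.

    Hypothetical helper: the element and attribute names are copied from the
    example entry; nothing else about the file layout is assumed.
    """
    attrs = {"name": name}
    if jaguar_id is not None:  # jaguar_id is optional
        attrs["jaguar_id"] = str(jaguar_id)
    group = ET.Element("functional_group", attrs)
    ET.SubElement(group, "jaguar", {"f1": f"{f1:.4f}", "f2": f"{f2:.4f}"})
    # Acidic form first, basic form second, as in the example entry.
    for pattern in (acid_smarts, base_smarts):
        ET.SubElement(group, "smarts").text = pattern
    return group


def read_functional_group(group):
    """Return the name, correction factors, and SMARTS patterns of an entry."""
    jaguar = group.find("jaguar")
    return {
        "name": group.get("name"),
        "f1": float(jaguar.get("f1")),
        "f2": float(jaguar.get("f2")),
        "smarts": [s.text.strip() for s in group.findall("smarts")],
    }


entry = make_functional_group(
    "carboxylic acid", 0.4451, -0.2516,
    acid_smarts="[#1]O[CX3]=O", base_smarts="[OX1-][CX3]=O", jaguar_id=4,
)
xml_text = ET.tostring(entry, encoding="unicode")
parsed = read_functional_group(ET.fromstring(xml_text))
print(xml_text)
```

If you generate entries this way, make a backup copy of pka_match.xml before editing it, and paste the new entry alongside the existing functional_group entries by hand.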

The two pKa correction factors, f1 and f2, come from a linear fit of the calculated pKa values to the experimental values for a particular training set of molecules. f1 is the slope and f2 is the intercept. You can perform linear fits with many commonly available software programs.
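The fit itself can be sketched in a few lines of Python. This is a minimal ordinary least-squares example, not the procedure used to derive the parameters shipped with Jaguar. It assumes the convention experimental ≈ f1 × calculated + f2, which should be checked against how Jaguar applies the factors. The function name fit_pka_correction is hypothetical, and the numbers are made-up placeholders rather than real calculated or measured pKa values.

```python
from math import sqrt


def fit_pka_correction(calculated, experimental):
    """Least-squares fit of experimental ~ f1 * calculated + f2.

    Returns (f1, f2, rms), where f1 is the slope, f2 the intercept, and rms
    the root-mean-square error of the corrected values on the training set.
    """
    n = len(calculated)
    if n != len(experimental) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(calculated) / n
    mean_y = sum(experimental) / n
    sxx = sum((x - mean_x) ** 2 for x in calculated)
    if sxx == 0.0:
        raise ValueError("calculated values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(calculated, experimental))
    f1 = sxy / sxx               # slope
    f2 = mean_y - f1 * mean_x    # intercept
    rms = sqrt(sum((f1 * x + f2 - y) ** 2
                   for x, y in zip(calculated, experimental)) / n)
    return f1, f2, rms


# Placeholder data for illustration only.
calculated = [2.0, 4.0, 6.0, 8.0]
experimental = [2.0, 3.0, 4.0, 5.0]
f1, f2, rms = fit_pka_correction(calculated, experimental)
print(f"f1 = {f1:.4f}, f2 = {f2:.4f}, RMS error = {rms:.4f}")
```

The RMS error returned here is measured on the training set itself, so it is likely to be an optimistic estimate of the error for molecules outside that set.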

Here are some suggestions for selecting a set of molecules to use as a training set for the development of new pKa correction parameters:

  • Select molecules for which the experimental pKa values are accurately known. Aqueous pKa values near or above 14, or near or below 0, are generally not accurate, because it is difficult to measure the concentration of the acid or base in the presence of high concentrations of hydronium or hydroxide (the leveling effect).

  • All of the experimental pKa values must be measured in the same solvent and at the same temperature, to within a few degrees. pKa values measured in mixed solvents are not a good choice, because the continuum solvation model used by Jaguar requires a single solvent dielectric constant and probe radius, and it is not known how these parameters should be specified for a mixed solvent system, especially when the degree of preferential solvation of the solute is unknown.

  • Try to obtain experimental pKa values that cover as wide a pKa range as possible for the given functional group. This ensures that the fitting parameters are broadly applicable to molecules containing that functional group.

  • The more molecules you use in the training set, the more clearly you can see how well the calculated pKa correlates with the experimental pKa, and the better idea you will have of the RMS error.

  • Do not select training set molecules that contain symmetry-equivalent functional groups. An additional correction term is required in this case to account for the increased entropy when degenerate sites are present. This correction can be applied manually, as needed, after Jaguar has automatically applied the f1 and f2 correction factors. See Equivalent Sites in Jaguar pKa Calculations.
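Some of the suggestions above can be checked mechanically before fitting. The Python sketch below screens candidate measurements by solvent, temperature, and pKa range, and reports the pKa span of what remains. Everything in it is an assumption made for illustration: the Measurement record, the function name screen_training_set, the 25 °C reference with a 3 degree tolerance, and the 1 to 13 pKa window are not values taken from Jaguar. The molecule names and numbers are placeholders. Detecting symmetry-equivalent sites requires structural information and is not attempted.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One experimental pKa value (hypothetical record layout)."""
    name: str
    pka: float
    solvent: str
    temperature_c: float


def screen_training_set(measurements, solvent="water", reference_temp_c=25.0,
                        temp_tolerance_c=3.0, pka_min=1.0, pka_max=13.0):
    """Split measurements into kept and rejected; return the kept pKa span.

    All thresholds are illustrative defaults, not recommended values.
    """
    kept, rejected = [], []
    for m in measurements:
        if m.solvent != solvent:
            rejected.append((m.name, "different or mixed solvent"))
        elif abs(m.temperature_c - reference_temp_c) > temp_tolerance_c:
            rejected.append((m.name, "temperature outside tolerance"))
        elif not (pka_min <= m.pka <= pka_max):
            rejected.append((m.name, "pKa too close to the leveling limits"))
        else:
            kept.append(m)
    pkas = [m.pka for m in kept]
    span = max(pkas) - min(pkas) if pkas else 0.0
    return kept, rejected, span


# Placeholder entries for illustration only.
candidates = [
    Measurement("mol_A", 3.5, "water", 25.0),
    Measurement("mol_B", 9.0, "water", 24.0),
    Measurement("mol_C", 14.5, "water", 25.0),
    Measurement("mol_D", 5.0, "water/methanol", 25.0),
    Measurement("mol_E", 6.0, "water", 40.0),
]
kept, rejected, span = screen_training_set(candidates)
print([m.name for m in kept], rejected, span)
```

A small span or a short kept list suggests that the fitted f1 and f2 may not transfer well to other molecules containing the same functional group.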