Creating Reagent Libraries - Best Practices

General overview

The reactant libraries that are part of the Schrödinger Suite are generated from the eMolecules reagent database. The example workflow below illustrates the procedure that is used. The steps listed below use some additional scripts to standardize the reactants and filter based on properties generated in the standardization process. As these scripts are not generally available, you can instead use the -properties, -neutralize, and -desalt flags in the create_reagent_library.py script, or create your own procedures to perform filtering and standardization.

  1. The eMolecules library is standardized (stripped of salts, neutralized, etc.) and all the physicochemical properties are generated. Instead of eMolecules, you can use your own reagent libraries, internal libraries, or preferred vendors.

  2. The library is filtered using a general set of compound properties (MW < 400, AlogP < 5, rotatable bonds < 10, PSA < 120). Some libraries are not filtered using these physicochemical properties, because few or none of the reactants would survive the filters. In these cases the reactant library is generated from the unfiltered set (for example, tri-n-butyltin reagents would have too many rotatable bonds).

  3. The full library is filtered with a special set of reactant filters. This is a set of SMARTS filters to filter out some undesired chemotypes (excluding reactive handles).

  4. The reactant libraries are generated as .pfx files with the eMolecules ID as the name field, using the create_reagent_library.py script (see below).

  5. For the reactions to work as intended, it is important to use the -neutralize and -desalt options so that the resulting reactants are desalted and neutralized. This way the reactants match the SMARTS patterns of the reactions in which they are used.
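The property filter in step 2 can be sketched in a few lines of Python. This is only an illustration of the thresholds listed above, not the script used to build the shipped libraries: the property names, the dictionary layout, and the skip_filter switch are assumptions made for this example, and the property values are made up.

```python
# Thresholds from step 2. The key names are assumptions for this sketch;
# use whatever property names your standardization step produces.
LIMITS = {"MW": 400.0, "AlogP": 5.0, "rotatable_bonds": 10, "PSA": 120.0}


def passes_property_filter(props, limits=LIMITS):
    """Return True if every listed property is strictly below its limit."""
    return all(props[name] < limit for name, limit in limits.items())


def filter_library(reagents, skip_filter=False):
    """Keep the reagents that pass the property filter.

    skip_filter=True mimics the libraries that are built from the
    unfiltered set because few or no reactants would survive.
    """
    if skip_filter:
        return list(reagents)
    return [r for r in reagents if passes_property_filter(r["props"])]


# Two hypothetical reagents with made-up property values.
reagents = [
    {"id": "A", "props": {"MW": 180.2, "AlogP": 1.3, "rotatable_bonds": 3, "PSA": 63.6}},
    {"id": "B", "props": {"MW": 512.7, "AlogP": 6.1, "rotatable_bonds": 12, "PSA": 40.0}},
]
kept = [r["id"] for r in filter_library(reagents)]
print(kept)
```

Only reagent A survives here, because B exceeds the MW, AlogP, and rotatable bond limits.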

Creating reagent libraries using the create_reagent_library script

create_reagent_library.py takes a structure file and creates reagent library files in the format required by combinatorial_synthesis.py, optionally installing them to your ~/.schrodinger directory. The following options are recommended, for the reasons given:

  • -neutralize: the default reaction library assumes that functional groups are in their neutral form; for example, carboxylic acids are protonated and amines are deprotonated. Add this option to ensure that the reactants generated by the script follow this convention.

  • -desalt: reactions in PathFinder are meant to act on one molecule at a time; if your building block library contains salts, use this option to remove the counterions.

    See create_reagent_library.py Command Help for information on the command options.

Reagent sources

By default, combinatorial_synthesis.py uses the reagent sources specified in the route file. For reagents identified by reagent class, the script tries to find a file with a name based on the reagent class in each of the following directories, in order of decreasing precedence:

  1. Directories specified with the -library_path argument;

  2. Directories specified via the SCHRODINGER_REAGENT_LIB environment variable;

  3. ~/.schrodinger/reagents or its Windows equivalent;

  4. $SCHRODINGER/mmshare-v*/data/reagents

For example, if a route requires a starting material of class "alcohol", the script will look for structure files matching alcohol.*, and if one structure file is found (e.g., alcohol.sdf) in a given directory, it is used as the reagent source. If multiple structure files are found in the same directory, an error is raised.
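The lookup rule described above can be approximated with the following Python sketch. It is not the code used by combinatorial_synthesis.py: the function name find_reagent_file is hypothetical, and for simplicity the sketch treats any file matching <class>.* as a structure file, whereas the real script presumably recognizes only structure file formats.

```python
import tempfile
from pathlib import Path


def find_reagent_file(reagent_class, directories):
    """Return the first file matching <reagent_class>.* in precedence order.

    Raises ValueError if one directory holds several candidates;
    returns None if no directory holds any.
    """
    for directory in directories:
        directory = Path(directory)
        if not directory.is_dir():
            continue
        matches = sorted(p for p in directory.glob(f"{reagent_class}.*") if p.is_file())
        if len(matches) > 1:
            names = ", ".join(p.name for p in matches)
            raise ValueError(f"multiple files for '{reagent_class}' in {directory}: {names}")
        if matches:
            return matches[0]
    return None


# Demonstration with two temporary directories standing in for
# ~/.schrodinger/reagents (higher precedence) and the installation directory.
with tempfile.TemporaryDirectory() as tmp:
    user_dir = Path(tmp, "user")
    install_dir = Path(tmp, "install")
    user_dir.mkdir()
    install_dir.mkdir()
    (user_dir / "alcohol.sdf").touch()
    (install_dir / "alcohol.sdf").touch()

    # The file in the higher-precedence directory wins.
    found_in_user_dir = find_reagent_file("alcohol", [user_dir, install_dir]).parent == user_dir

    # No file for this class yet.
    missing = find_reagent_file("amine", [user_dir, install_dir])

    # Two candidates in the same directory are ambiguous.
    (install_dir / "amine.sdf").touch()
    (install_dir / "amine.mae").touch()
    try:
        find_reagent_file("amine", [user_dir, install_dir])
        ambiguous = False
    except ValueError:
        ambiguous = True
```

The search stops at the first directory that contains a match, so a file in a directory given with -library_path hides a file of the same class in the installation directory.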

You can override the default reagent sources via the -r command-line option, which allows the specification of a reagent source for each starting material, identified by index. Before running the enumeration, you can get a summary of the route by passing the -print option:

$ run combinatorial_synthesis.py -print route-6.json
Alkylation-SN2-nuc=N
   R-Transform-Alkyl-Halide_to_ROH
       1: alcohol
   Schotten-Baumann_Amide
       2: acyl_chloride
       3: amine

The reagent lines all start with an indented number (the reagent index) followed by a colon. To specify non-default reagent sources for the alcohol (index #1), you could do the following:

$ run combinatorial_synthesis.py -r 1=my_alcohols.sdf route-6.json

The -r option can be passed multiple times, once per non-default reagent source.
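If you script this step, the reagent indices can be read from the -print summary and turned into -r arguments. The Python sketch below assumes that the summary has exactly the layout shown above; the helper names and the file my_amines.sdf are hypothetical, and the sketch only builds the argument list without running anything.

```python
import re

# Summary text copied from the -print example above.
SUMMARY = """\
Alkylation-SN2-nuc=N
   R-Transform-Alkyl-Halide_to_ROH
       1: alcohol
   Schotten-Baumann_Amide
       2: acyl_chloride
       3: amine
"""

# A reagent line is an indented number, a colon, and the reagent class.
REAGENT_LINE = re.compile(r"^\s+(\d+):\s*(\S+)\s*$")


def reagent_indices(summary):
    """Map each reagent index to its reagent class."""
    result = {}
    for line in summary.splitlines():
        match = REAGENT_LINE.match(line)
        if match:
            result[int(match.group(1))] = match.group(2)
    return result


def reagent_args(overrides):
    """Build one '-r INDEX=FILE' pair per overridden reagent source."""
    args = []
    for index, path in sorted(overrides.items()):
        args += ["-r", f"{index}={path}"]
    return args


indices = reagent_indices(SUMMARY)
args = reagent_args({3: "my_amines.sdf", 1: "my_alcohols.sdf"})
print(indices)
print(args)
```

The resulting list can be placed before the route file on the combinatorial_synthesis.py command line, in the same position as the single -r option in the example above.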

Reagent classes data

The reagent classes JSON object contains additional information about each reagent class. It looks like this:

{
  "acid_chlorides": {
    "description": "Handle with care!",
    "reactive": true,
    "smarts": "[#6:102]-[#6:5](=O)-Cl"
  },
  "alcohols": {
    "description": "Everybody's favorite functional group!",
    "reactive": false,
    "smarts": "[#6;!$([#6]=O):102]-[O;H1]"
  }
}

The available fields are:

  • reactive: compounds of a reactive reagent class (for example, acid chlorides) should be starting materials for a route and can't appear in intermediate steps.

  • description: for informational purposes, for example to show in the GUI.

  • smarts: may be used by create_reagent_library.py to create a reagent file for this class.
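As an illustration of how these fields might be consumed, the following Python sketch loads the example object shown above and lists the reactive classes. The function name reactive_classes is hypothetical, and treating a missing reactive field as false is an assumption of this sketch rather than documented behavior.

```python
import json

# The example reagent classes object from above, embedded as a string.
REAGENT_CLASSES_JSON = r"""
{
  "acid_chlorides": {
    "description": "Handle with care!",
    "reactive": true,
    "smarts": "[#6:102]-[#6:5](=O)-Cl"
  },
  "alcohols": {
    "description": "Everybody's favorite functional group!",
    "reactive": false,
    "smarts": "[#6;!$([#6]=O):102]-[O;H1]"
  }
}
"""


def reactive_classes(classes):
    """Return the names of classes flagged as reactive, sorted by name.

    A class without a 'reactive' field is treated as non-reactive
    (an assumption made for this sketch).
    """
    return sorted(name for name, info in classes.items() if info.get("reactive", False))


classes = json.loads(REAGENT_CLASSES_JSON)
reactive = reactive_classes(classes)

# SMARTS pattern per class, for example to pass on to a substructure filter.
smarts_by_class = {name: info["smarts"] for name, info in classes.items()}

print(reactive)
print(smarts_by_class["alcohols"])
```

A check like this could be used to reject a route in which a reactive class appears anywhere other than as a starting material.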