combinatorial_diversity Command Help

Command: $SCHRODINGER/utilities/combinatorial_diversity

usage: combinatorial_diversity [-h] [-min_pop <m>] [-ndim <n>] [-rand <seed>]
                               [-nocopy] [-nofail] [-maxopen <n> | -nosplit]
                               [-products <p>] [-inflate <factor>]
                               [-fptype {dendritic,linear,molprint2D,radial}]
                               [-savefp] [-onlyfp] [-out <outfile>] [-no3d]
                               [-v3000] [-verbose] [-filter <file>]
                               [-list_props] [-hba <file>] [-hbd <file>]
                               [-NJOBS NJOBS] [-HOST <hostname>]
                               [-TMPDIR TMPDIR] [-JOBNAME JOBNAME]
                               <infile> <ndiverse>

Performs diverse structure selection with optional biasing of properties to
lie in specified ranges. Runs in combinatorial mode, where diverse structures
are selected after enumerating a minimum number of diverse products, or in
conventional mode, where diverse structures are selected directly from a file.

Copyright Schrodinger LLC, All Rights Reserved.

positional arguments:
  <infile>              Source of input structures. May be a combinatorial
                        synthetic route file (.json), a 32-bit Canvas
                        fingerprint file with SMILES and properties (.fp), a
                        CSV file with SMILES, titles and properties (.csv) or
                        a SMILES file (.smi).
  <ndiverse>            The number of diverse structures to select. Linear
                        scaling and distributed processing are achieved by
                        splitting chemical space into 2**N distinct regions
                        (where N is determined by -min_pop) and selecting the
                        appropriate number of diverse structures from each
                        region. To ensure speedy selection and high diversity,
                        it is strongly recommended that <ndiverse> be no
                        larger than 5% of the total pool from which selections
                        are to be made.
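
Example (illustrative sketch, not part of the tool): the 5% recommendation
for <ndiverse> can be checked before a job is launched. The function names
below are hypothetical, and the pool size is assumed to be known in advance.

```python
def max_recommended_ndiverse(pool_size: int) -> int:
    """Largest <ndiverse> that stays within 5% of the pool (integer math)."""
    return pool_size * 5 // 100


def exceeds_five_percent_rule(pool_size: int, ndiverse: int) -> bool:
    """True if <ndiverse> is larger than 5% of the pool."""
    return ndiverse > max_recommended_ndiverse(pool_size)


if __name__ == "__main__":
    # Selecting 10,000 from a pool of 100,000 exceeds the recommendation.
    print(exceeds_five_percent_rule(100_000, 10_000))
```

When the check returns True, the -min_pop option described below is the
documented way to recover speed.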

options:
  -h, --help            Show this message and exit.
  -min_pop <m>          The minimum population of each distinct region of
                        chemical space. This option is normally used to speed
                        up a job that exceeds the 5% rule (see <ndiverse>).
                        For example, if selecting 10,000 diverse structures
                        from a pool of 100,000, reducing the minimum
                        population from 10,000 to 5,000 would typically double
                        the number of regions and halve total selection time
                        (default: 10,000).
  -ndim <n>             The number of dimensions in the chemical space from
                        which the distinct regions are defined. A maximum of
                        2**(n-1) regions are possible, so if n=10, up to 512
                        regions can be defined. This parameter would normally
                        be adjusted only when the pool of structures is so
                        large that the population of each region significantly
                        exceeds 10,000, even when splitting over the maximum
                        number of regions. A good rule of thumb is to use the
                        default value of 10 for a pool of up to 5 million, and
                        increase by 1 for each doubling of the pool size,
                        e.g., 10 million -> -ndim 11, 20 million -> -ndim 12,
                        etc.
  -rand <seed>          Integer random seed for initializing the diversity
                        algorithm. Results are always the same for a given
                        random seed (default: 1).
  -nocopy               Utilize <infile> at its specified location and do not
                        copy to the job directory. This option is most useful
                        for very large input fingerprint files, as it allows a
                        given diversity subjob to directly access the
                        fingerprint rows assigned to it, without the cost of
                        copying or physically splitting the fingerprint file.
                        The file name must be specified using an absolute
                        path, and that path must be accessible to all compute
                        nodes on the host where the job is to run.
  -nofail               Exit with an error if a fingerprint generation subjob
                        or diversity selection subjob fails to complete. The
                        default behavior is to issue a warning to the log file
                        but proceed with the partial results from successfully
                        completed subjobs.
  -maxopen <n>          When physically splitting an input fingerprint file or
                        an intermediate fingerprint file generated from the
                        input structures, allow no more than <n> output
                        fingerprint files to be open at any time. A larger
                        value of <n> results in faster splitting but greater
                        memory use (default: 256). Use -nosplit to disable
                        physical splitting.
  -nosplit              Do not physically split an input fingerprint file or
                        an intermediate fingerprint file generated from the
                        input structures. Similar to -nocopy, in that it
                        avoids the expense of splitting the fingerprint file,
                        and it allows each diversity subjob to directly access
                        its fingerprint rows. Differs from -nocopy, in that it
                        does not require an absolute path, but it does result
                        in the entire fingerprint file being copied to the job
                        directory of each diversity subjob. Mutually exclusive
                        with -maxopen.
  -products <p>         The minimum number of products that must be
                        successfully enumerated before selecting diverse
                        structures. Applies only to .json input. The default
                        is 20 times the number of diverse structures. This
                        option MUST be specified if the number of diverse
                        structures is greater than 50,000.
  -inflate <factor>     Product inflation factor. This value is multiplied by
                        the minimum number of products and supplied to
                        combinatorial_synthesis to ensure that an excess of
                        products is made. Applies only to .json input
                        (default: 1.25).
  -fptype {dendritic,linear,molprint2D,radial}
                        The type of Canvas fingerprints to generate for .json,
                        .csv and .smi inputs (default: molprint2D).
  -savefp               Save generated fingerprints to <jobname>_<fptype>.fp.
                        A default set of physicochemical properties is saved
                        with the fingerprints for .json and .smi inputs if a
                        property filter is supplied (see -filter).
  -onlyfp               Save generated fingerprints and exit without selecting
                        diverse structures. This option is provided to allow
                        large fingerprint files to be moved to a cross-mounted
                        location and supplied in a subsequent job with the
                        -nocopy option.
  -out <outfile>        Output Maestro, SD, CSV or SMILES file for diverse
                        structures (default: <jobname>_diverse.csv).
  -no3d                 Skip 3D coordinate generation for diverse structures.
  -v3000                Write SD file structures in V3000 format.
  -verbose              Output details of diversity selection/property
                        biasing.
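
Example (illustrative sketch, not part of the tool): the arithmetic behind
-min_pop, -ndim, -products and -inflate can be worked through as follows.
The function names are hypothetical. The region-splitting model (keep
doubling the number of regions while the mean population stays at or above
-min_pop) is an assumption chosen to reproduce the examples in the option
descriptions, not the tool's documented algorithm. Rounding the inflated
product count up is likewise an assumption.

```python
import math

DEFAULT_MIN_POP = 10_000
DEFAULT_NDIM = 10


def max_regions(ndim: int = DEFAULT_NDIM) -> int:
    """Upper bound on regions stated for -ndim: 2**(n-1)."""
    return 2 ** (ndim - 1)


def estimated_regions(pool_size: int, min_pop: int = DEFAULT_MIN_POP,
                      ndim: int = DEFAULT_NDIM) -> int:
    """Assumed model: double the regions while mean population >= min_pop."""
    regions = 1
    while (regions * 2 <= max_regions(ndim)
           and pool_size // (regions * 2) >= min_pop):
        regions *= 2
    return regions


def suggested_ndim(pool_size: int) -> int:
    """Rule of thumb: 10 up to 5 million, plus 1 per doubling of the pool.

    Pool sizes between the quoted points are rounded up (an assumption).
    """
    ndim, limit = DEFAULT_NDIM, 5_000_000
    while pool_size > limit:
        limit *= 2
        ndim += 1
    return ndim


def enumeration_targets(ndiverse: int, products=None, inflate: float = 1.25):
    """Return (minimum products, inflated count) for .json input."""
    if products is None:
        if ndiverse > 50_000:
            raise ValueError("-products must be given when <ndiverse> > 50,000")
        products = 20 * ndiverse  # documented default
    return products, math.ceil(products * inflate)


if __name__ == "__main__":
    print(estimated_regions(100_000, 10_000), estimated_regions(100_000, 5_000))
    print(suggested_ndim(20_000_000))
    print(enumeration_targets(1_000))
```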

Property Biasing Options:
  -filter <file>        CSV file containing one or more property filters, with
                        one filter per line. Each filter consists of the name
                        of a property, followed by the preferred minimum and
                        maximum values of that property, e.g., AlogP,2.0,5.0.
                        In the case of .json or .smi input, use of this option
                        triggers the creation of a set of default
                        physicochemical properties to which filters may be
                        applied. In the case of .fp or .csv input, filters may
                        be applied only to the numeric properties present in
                        those files. Use -list_props to see available
                        properties. Note that diverse structures are selected
                        with a bias toward satisfying as many filters as
                        possible, but not necessarily all filters. Note also
                        that a given property may appear in more than one
                        filter, so that multiple desired ranges are possible.
  -list_props           List the properties available for biasing: the
                        automatically calculated properties for .json and .smi
                        inputs, or the properties present in the file for .fp
                        and .csv inputs.
  -hba <file>           Use supplied rules to assign hydrogen bond acceptor
                        counts for .json and .smi input. Default rules are in
                        the file HbondAcceptor.typ in the Schrodinger software
                        installation.
  -hbd <file>           Use supplied rules to assign hydrogen bond donor
                        counts for .json and .smi input. Default rules are in
                        the file HbondDonor.typ in the Schrodinger software
                        installation.
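
Example (illustrative sketch, not part of the tool): a -filter file holds one
"property,minimum,maximum" record per line. The helper below renders such
records as text. The function name is hypothetical, AlogP is the only
property name taken from this help text, and the second AlogP range is
included only to show that a property may appear in more than one filter.

```python
import csv
import io


def format_filters(filters) -> str:
    """Render (property, minimum, maximum) tuples as filter-file text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for name, low, high in filters:
        if low > high:
            raise ValueError(f"{name}: minimum {low} exceeds maximum {high}")
        writer.writerow([name, low, high])
    return buffer.getvalue()


if __name__ == "__main__":
    text = format_filters([("AlogP", 2.0, 5.0), ("AlogP", 6.0, 7.0)])
    # The output file name here is a placeholder.
    with open("filters.csv", "w", newline="") as handle:
        handle.write(text)
```

Use -list_props to confirm which property names are valid for a given
input before writing the file.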

Standard Options:
  -NJOBS NJOBS          Divide the overall job into NJOBS subjobs.

Job Control Options:
  -HOST <hostname>      Run job remotely on the indicated host entry.
  -TMPDIR TMPDIR        The name of the directory used to store files
                        temporarily during a job.
  -JOBNAME JOBNAME      Provide an explicit name for the job.
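
Example (illustrative sketch, not part of the tool): the constraints stated
in the option descriptions can be checked while assembling a command line.
The build_command function is hypothetical and uses only flags listed in this
help text. Treating the -products requirement as specific to .json input is
an assumption, since that option applies only to .json input.

```python
import os


def build_command(infile, ndiverse, *, products=None, nocopy=False,
                  nosplit=False, maxopen=None, filter_file=None,
                  njobs=None, host=None, jobname=None):
    """Assemble an argument list for combinatorial_diversity."""
    if nosplit and maxopen is not None:
        raise ValueError("-maxopen and -nosplit are mutually exclusive")
    if nocopy and not os.path.isabs(infile):
        raise ValueError("-nocopy requires an absolute path to <infile>")
    # Assumption: the 50,000 threshold matters only for .json input.
    if infile.endswith(".json") and ndiverse > 50_000 and products is None:
        raise ValueError("-products is required when <ndiverse> > 50,000")

    # Falls back to the literal "$SCHRODINGER" if the variable is unset.
    root = os.environ.get("SCHRODINGER", "$SCHRODINGER")
    cmd = [os.path.join(root, "utilities", "combinatorial_diversity")]
    if products is not None:
        cmd += ["-products", str(products)]
    if nocopy:
        cmd.append("-nocopy")
    if nosplit:
        cmd.append("-nosplit")
    if maxopen is not None:
        cmd += ["-maxopen", str(maxopen)]
    if filter_file is not None:
        cmd += ["-filter", filter_file]
    if njobs is not None:
        cmd += ["-NJOBS", str(njobs)]
    if host is not None:
        cmd += ["-HOST", host]
    if jobname is not None:
        cmd += ["-JOBNAME", jobname]
    # Positional arguments come last, as in the usage line.
    return cmd + [infile, str(ndiverse)]


if __name__ == "__main__":
    # The path and job name are placeholders.
    print(build_command("/data/pool.fp", 5000, nocopy=True, njobs=8,
                        jobname="div"))
```

The returned list can be passed to subprocess.run(cmd, check=True) once the
SCHRODINGER environment variable is set.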