canvasFPGen Command Help

Command: $SCHRODINGER/utilities/canvasFPGen

usage: canvasFPGen [-h] -i<fmt> <infile> -o<fmt> <outfile> [options]
                   [Fingerprint Generation Options]
                   [Bit Treatment Options]
                   [Structure Treatment Options]
                   [Job Control Options]

Generates fingerprints for the molecules in a structure file.

Copyright Schrodinger LLC, All Rights Reserved.

options:
  -h, --help            Show this message and exit.

Input/Output Format Options:
  -ismi <smiFile> |
  -icsv <csvFile> [-noHeader] [-d <delimiter>] [-smi <col>] [-name <col>] |
  -imae <maeFile> [-fieldAsName <prop> | -sequence | -stamp <value>] |
  -isd <sdFile> [-fieldAsName <prop> | -sequence | -stamp <value>]
  
  -o <fpFile> |
  -odata <fpFile> [-uniform] [-fieldOnly <prop1> [<prop2> ...]] |
  -ocsv <csvFile>
  
  -echoSMILES
  
  Exactly one -i<fmt> and one -o<fmt> specification must be provided.

  -ismi <smiFile>       Input SMILES file containing one SMILES per line and
                        no header. An optional name may appear after each
                        SMILES, following a space or tab.
  -icsv <csvFile>       Input CSV file with SMILES in the first column,
                        structure names in the second column and any
                        properties in subsequent columns. A header line with
                        column names is expected by default.
  -imae <maeFile>       Input Maestro file.
  -isd <sdFile>         Input SD file.
  -noHeader             Input CSV file does not contain a header line.
  -d <delimiter>        Input CSV file delimiter (default: ",").
  -smi <col>            Name or position of the SMILES column in <csvFile>.
                        The default is the first column (1).
  -name <col>           Name or position of the column in <csvFile> that
                        holds structure names. The default is the second
                        column (2).
  -fieldAsName <prop>   Use the indicated property in <maeFile> or <sdFile> as
                        the source of structure names.
  -sequence             Set structure names to "1", "2", etc.
  -stamp <value>        Set all structure names to the provided value.
  -o <fpFile>           Output binary file with fingerprints and structure
                        names.
  -odata <fpFile>       Output binary file with fingerprints, structure names
                        and properties. Not legal with -noHeader. Equivalent
                        to -o <fpFile> when used with -ismi since SMILES files
                        have no properties. Note that SMILES strings are
                        written to <fpFile> only if -echoSMILES is provided.
  -ocsv <csvFile>       Output CSV file for maccs or custom fingerprints.
                        Column names are the SMARTS patterns, or the pattern
                        labels if the custom <keyfile> provides them (see
                        -fptype under Fingerprint Generation Options).
  -uniform              Indicates that all structures in <maeFile> or <sdFile>
                        have exactly the same data fields. This avoids a full
                        pass through the input file to determine the union of
                        property names when -odata is used.
  -fieldOnly <prop1> [<prop2> ...]
                        Write only the specified properties to <fpFile>. The
                        default is to write all properties when -odata is
                        used.
  -echoSMILES           Write SMILES strings to <fpFile>. Legal with all
                        -i<fmt> and -o<fmt> choices.
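
The -ismi layout described above (one SMILES per line, no header, optional name after a space or tab) can be illustrated with a small reader. This is a hypothetical sketch, not Canvas code: the function names are made up, and treating everything after the first whitespace run as the name, with an empty string when no name is given, is an assumption about how such lines are split.

```python
def parse_smi_line(line):
    """Split one -ismi line into (smiles, name); None for a blank line.

    Assumption: the name is everything after the first run of spaces or
    tabs, and an absent name is returned as an empty string.
    """
    parts = line.strip().split(None, 1)
    if not parts:
        return None
    smiles = parts[0]
    name = parts[1].strip() if len(parts) > 1 else ""
    return smiles, name


def read_smi(lines):
    """Parse every non-blank line of a SMILES file (hypothetical helper)."""
    records = []
    for line in lines:
        record = parse_smi_line(line)
        if record is not None:
            records.append(record)
    return records


sample = ["c1ccccc1 benzene\n", "CCO\tethanol\n", "CC(=O)O\n", "\n"]
for smiles, name in read_smi(sample):
    print(repr(smiles), repr(name))
```

Structures without a name can be given names on the command line instead (for example via -sequence or -stamp for Maestro and SD input).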

Fingerprint Generation Options:
  [-fptype <fptype> [<keyfile>]] [-atomtype <atomtype> [<typefile>]] [-xp]
  [-maxspan <span>] [-path <max>] [-minpath <min>] [-ring <len>] [-halfstep]
  [-3D [-binwidth <width>] [-binoverlap <over>] [-minbin <min>] [-maxbin <max>]]
  [-iter <n>] [-miniter <min>] [-estatePath <len>] [-estateWidth <width>]
  [-startH] [-endH]

  -fptype <fptype> [<keyfile>]
                        Fingerprint type. Legal values are "dendritic",
                        "linear", "maccs", "molprint2D", "pairwise",
                        "quartet", "radial", "torsion", "triplet", and
                        "custom" (default: linear). If custom, a <keyfile>
                        must be supplied with a SMARTS pattern to define each
                        fingerprint bit, followed optionally by a pattern
                        label to be written as a column name when -ocsv is
                        used. For appropriate <keyfile> format, refer to
                        maccssubset.txt in the Schrodinger software
                        distribution. Use -help_fptype for detailed
                        descriptions of the fingerprint types.
  -atomtype <atomtype> [<typefile>]
                        Atom typing scheme. Legal values are 1-12, C and E. If
                        C, a <typefile> of custom atom type definitions must
                        be supplied. For appropriate <typefile> format, refer
                        to Mol2.typ in the Schrodinger software distribution.
                        Defaults for each fingerprint type are: dendritic=10,
                        linear=10, molprint2D=5, pairwise=9, quartet=10,
                        radial=4, torsion=10, triplet=10. Not legal with maccs
                        or custom fingerprint types. Use -help_atomtype for
                        detailed descriptions of the atom typing schemes.
  -xp                   Represent fingerprints with 64-bit rather than 32-bit
                        precision. This effectively eliminates bit
                        collisions (distinct features mapped to the same
                        bit), but doubles the size of the fingerprint file.
                        Not legal with -ocsv.
  -maxspan <span>       Maximum number of atoms spanned in any direction of a
                        dendritic fingerprint fragment (default: no limit).
  -path <max>           Maximum path length in bonds for dendritic, linear and
                        molprint2D fingerprints. The defaults are 7, 5 and 2,
                        respectively.
  -minpath <min>        Minimum path length in bonds for dendritic, linear and
                        molprint2D fingerprints. The defaults are 0, 0 and 2,
                        respectively.
  -ring <len>           Maximum length of a linear fingerprint path that
                        includes a ring closure (default: 14). This allows
                        ring-containing fragments to be included without
                        causing a massive proliferation in the number of
                        acyclic fragments.
  -halfstep             When generating linear fingerprints, include fragments
                        that terminate in the middle of a bond, excluding the
                        atom on the other end of that bond.
  -3D                   Use 3D distances for pairwise, triplet and quartet
                        fingerprints. Requires Maestro or SD file input.
  -binwidth <width>     Floating point distance bin width for 3D pairwise,
                        triplet and quartet fingerprints (default: 1.0).
  -binoverlap <over>    Distance bin overlap for 3D pairwise, triplet and
                        quartet fingerprints (default: 0.0). A non-zero value
                        allows a distance to be assigned to multiple bins
                        (i.e., "fuzzy" binning).
  -minbin <min>         Minimum distance bin for 3D pairwise, triplet and
                        quartet fingerprints (default: 0.0).
  -maxbin <max>         Maximum distance bin for 3D pairwise, triplet and
                        quartet fingerprints (default: no limit).
  -iter <n>             Number of radial fingerprint iterations (default: 4).
  -miniter <min>        Number of radial iterations below which features are
                        discarded (default: 0).
  -estatePath <len>     Path length for E-state atom typing (default: 2).
  -estateWidth <width>  Binning width for E-state atom typing (default: 0.25).
  -startH               Calculate molprint2D codes for hydrogen atoms.
  -endH                 Consider bonds to terminal hydrogens when assigning
                        molprint2D codes.
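
The statement under -xp that 64-bit precision effectively eliminates bit collisions can be sanity-checked with a standard birthday-problem estimate. The sketch below is an idealization, not a description of how canvasFPGen hashes fragments: it assumes each distinct feature lands independently and uniformly in one of 2^32 or 2^64 bit positions.

```python
from math import comb


def expected_collisions(num_features, bits):
    """Expected number of feature pairs that share a bit position.

    Birthday-problem estimate: C(k, 2) / 2**bits, assuming each distinct
    feature is hashed independently and uniformly into 2**bits positions.
    """
    return comb(num_features, 2) / 2**bits


for bits in (32, 64):
    print(f"{bits}-bit: {expected_collisions(100_000, bits):.3g} expected colliding pairs")
```

Under these assumptions, 100,000 distinct features give roughly one colliding pair at 32 bits and about 3e-10 at 64 bits, which is the trade-off -xp buys with its doubled file size.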

Bit Treatment Options:
  [[-min <fmin> | -noone] [-max <fmax> | -noall] [-reduce <n>]] | [-mostSig <m>]
  [-scaling <rule>] [-compress] [-truncate] [-summaryOnly | -noSummary]
  [-bs <filename> [-prefix <s>]]

  -min <fmin>           Omit bits that are ON in less than the specified
                        fraction of structures (default: 0.0).
  -noone                Omit bits that are ON in only a single structure. Not
                        legal with -min <fmin>.
  -max <fmax>           Omit bits that are ON in more than the specified
                        fraction of structures (default: 1.0).
  -noall                Omit bits that are ON in all structures. Not legal
                        with -max <fmax>.
  -reduce <n>           Reduce fingerprint precision by the indicated power of
                        2. For example, 32-bit fingerprints accommodate 2^32
                        unique bit positions, but "-reduce 22" shrinks that to
                        only 2^10 unique bit positions.
  -mostSig <m>          Keep only the <m> most informative bits. A value of
                        4294967295 (2^32 - 1) triggers the use of an
                        alternative algorithm that selects a low-rank
                        orthogonal set of bits. Not legal with -min, -noone,
                        -max, -noall or -reduce.
  -scaling <rule>       Apply a scaling rule to replace each ON bit with a
                        floating point value. Must be an integer in the range
                        0-13 (default: 0). Not legal in combination with
                        -reduce <n>. Use -help_scaling to display the scaling
                        rules.
  -compress             Use frequency-based compression to reduce the required
                        storage by approximately tenfold. Ignored with -ocsv.
  -truncate             Truncate the fingerprint file if no ON bits are
                        generated. Not legal with multiple CPUs.
  -summaryOnly          Store only the aggregate ON bit counts, not the actual
                        fingerprints. Results in only the header being written
                        with -ocsv.
  -noSummary            Omit aggregate bit counts from the fingerprint file.
                        Not legal with -compress. Ignored with -ocsv.
  -bs <filename>        Write the set of retained ON bits to a binary file for
                        use by other software that must restrict fingerprints
                        to the same set of bits. Not legal with -ocsv.
  -prefix <s>           Prefix for the names of ON bits (default: "BIT").
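
The frequency filters (-min, -max, -noone, -noall) and the -reduce arithmetic can be illustrated as follows. This is a sketch of the documented rules only, not Canvas code: the function names are hypothetical, fingerprints are modeled as Python sets of ON-bit positions, and how canvasFPGen actually maps bits when reducing precision is not shown.

```python
from collections import Counter


def filter_bits(fingerprints, fmin=0.0, fmax=1.0, noone=False, noall=False):
    """Return the sorted ON-bit positions that survive frequency filtering.

    fingerprints is a list of sets of ON-bit positions, one set per
    structure. Bits are dropped when ON in fewer than fmin of the structures
    (-min), more than fmax (-max), exactly one structure (-noone), or every
    structure (-noall).
    """
    n = len(fingerprints)
    counts = Counter(bit for fp in fingerprints for bit in fp)
    kept = []
    for bit, count in counts.items():
        fraction = count / n
        if fraction < fmin or fraction > fmax:
            continue
        if noone and count == 1:
            continue
        if noall and count == n:
            continue
        kept.append(bit)
    return sorted(kept)


def unique_positions(precision_bits=32, reduce=0):
    """Number of distinct bit positions left after -reduce <n>."""
    return 2 ** (precision_bits - reduce)


fps = [{1, 2, 3}, {1, 2}, {1, 5}]
print(filter_bits(fps, noone=True))  # [1, 2]
print(filter_bits(fps, noall=True))  # [2, 3, 5]
print(unique_positions(32, 22))      # 1024
```

The last line reproduces the -reduce example above: 32-bit fingerprints reduced by 22 leave 2^10 = 1024 positions.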

Structure Treatment Options:
  [-n <ranges>] [-strip] [-nostereo] [-fill] [-obad <filename>]

  -n <ranges>           The subset of structures to process, e.g., "1,4" =
                        structures 1 and 4; "1:10,14" = structures 1-10 and
                        14; "2:" = structures 2 through the end of the file;
                        ":5,13:18" = structures 1-5 and 13-18.
  -strip                Retain only the largest fragment in a disconnected
                        structure.
  -nostereo             Ignore any defined stereochemistry. Legal only with
                        radial fingerprints.
  -fill                 Insert an empty fingerprint record when a structure
                        cannot be processed. This ensures that the number of
                        output fingerprints is equal to the number of input
                        structures.
  -obad <filename>      Save structures that cannot be processed to the
                        supplied file, in a format determined by -i<fmt>:
                        SMILES for -ismi or -icsv, Maestro for -imae and SD
                        for -isd.
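
The -n range syntax can be expanded into explicit structure numbers with a short parser. This hypothetical sketch follows the four documented examples; it assumes the total number of structures is known (needed for open-ended ranges such as "2:") and clips ranges that run past the end of the file, which may differ from what canvasFPGen does.

```python
def parse_ranges(spec, total):
    """Expand an -n style range string into sorted 1-based structure numbers.

    Tokens are separated by commas; "a:b" is inclusive, a missing start
    means 1 and a missing end means the last structure (total).
    """
    selected = set()
    for token in spec.split(","):
        token = token.strip()
        if ":" in token:
            start_s, end_s = token.split(":", 1)
            start = int(start_s) if start_s else 1
            end = int(end_s) if end_s else total
        else:
            start = end = int(token)
        # Assumption: silently clip ranges that extend beyond the file.
        selected.update(range(start, min(end, total) + 1))
    return sorted(selected)


print(parse_ranges(":5,13:18", 20))
```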

Job Control Options:
  [-JOB <jobname> [-HOST <host>[:<n>]] [-TMPDIR <dir>] [-MINREC <n>]]

  -JOB <jobname>        Run under Schrodinger job control using the provided
                        job name. If omitted, no other job control options are
                        permitted.
  -HOST <host>[:<n>]    Run job remotely on the indicated host entry. Include
                        :<n> to split the job across <n> CPUs.
  -TMPDIR <dir>         Store temporary job files in <dir>.
  -MINREC <n>           Minimum number of records per CPU (default: 100).
                        Prevents a large number of CPUs from being utilized
                        when the total number of records is relatively small.
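
The exact rule by which -MINREC limits the CPUs requested with -HOST <host>:<n> is not spelled out here. As a hypothetical model only, one plausible reading is to allow at most one CPU per <n> records (never fewer than one) and then split the records as evenly as possible; both functions below are illustrative assumptions, not canvasFPGen behavior.

```python
def effective_cpus(requested, total_records, minrec=100):
    """Assumed rule: at most one CPU per `minrec` records, at least one CPU."""
    return max(1, min(requested, total_records // minrec))


def split_records(total_records, cpus):
    """Hypothetical even split of records across CPUs (earlier CPUs get extras)."""
    base, extra = divmod(total_records, cpus)
    return [base + (1 if i < extra else 0) for i in range(cpus)]


cpus = effective_cpus(8, 250)
print(cpus, split_records(250, cpus))
```

Under this model, requesting 8 CPUs for 250 records with the default -MINREC of 100 would use only 2 CPUs.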