canvasFPGen Command Help
Command: $SCHRODINGER/utilities/canvasFPGen
usage: canvasFPGen [-h] -i<fmt> <infile> -o<fmt> <outfile> [options]
[Fingerprint Generation Options]
[Bit Treatment Options]
[Structure Treatment Options]
[Job Control Options]
Generates fingerprints for the molecules in a structure file.
Copyright Schrodinger LLC, All Rights Reserved.
options:
-h, --help Show this message and exit.
Input/Output Format Options:
-ismi <smiFile> |
-icsv <csvFile> [-noHeader] [-d <delimiter>] [-smi <col>] [-name <col>] |
-imae <maeFile> [-fieldAsName <prop> | -sequence | -stamp <value>] |
-isd <sdFile> [-fieldAsName <prop> | -sequence | -stamp <value>]
-o <fpFile> |
-odata <fpFile> [-uniform] [-fieldOnly <prop1> [<prop2> ...]] |
-ocsv <csvFile>
[-echoSMILES]
Exactly one -i<fmt> and one -o<fmt> specification must be provided.
-ismi <smiFile> Input SMILES file containing one SMILES per line and
no header. An optional name may appear after each
SMILES, following a space or tab.
-icsv <csvFile> Input CSV file with SMILES in the first column,
structure names in the second column and any
properties in subsequent columns. A header line with
column names is expected by default.
-imae <maeFile> Input Maestro file.
-isd <sdFile> Input SD file.
-noHeader Input CSV file does not contain a header line.
-d <delimiter> Input CSV file delimiter (default: ",").
-smi <col> Name or position of the SMILES column in <csvFile>. The
default is the first column (1).
-name <col> Name or position of the column in <csvFile> that holds
structure names. The default is the second column (2).
-fieldAsName <prop> Use the indicated property in <maeFile> or <sdFile> as
the source of structure names.
-sequence Set structure names to "1", "2", etc.
-stamp <value> Set all structure names to the provided value.
-o <fpFile> Output binary file with fingerprints and structure
names.
-odata <fpFile> Output binary file with fingerprints, structure names
and properties. Not legal with -noHeader. Equivalent
to -o <fpFile> when used with -ismi since SMILES files
have no properties. Note that SMILES strings are
written to <fpFile> only if -echoSMILES is provided.
-ocsv <csvFile> Output CSV file for maccs or custom fingerprints.
Column names will be SMARTS patterns or pattern
labels, the latter being used if they are provided in
the custom <keyfile> (see -fptype in Fingerprint
Generation Options).
-uniform All structures in <maeFile> or <sdFile> have exactly
the same data fields. This avoids the need to make a
full pass through the input file to determine the
union of property names when -odata is used.
-fieldOnly <prop1> [<prop2> ...]
Write only the specified properties to <fpFile>. The
default is to write all properties when -odata is
used.
-echoSMILES Write SMILES strings to the output file. Legal with all
-i<fmt> and -o<fmt> choices.
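
Example (illustrative only): the sketch below assembles a canvasFPGen argument list from exactly one input and one output specification, as required above. The helper name build_fpgen_args, the short format keys, and the file names are hypothetical and not part of the utility. The command is only constructed, not executed.

```python
import os

# Flags taken from the Input/Output Format Options above.
INPUT_FLAGS = {"smi": "-ismi", "csv": "-icsv", "mae": "-imae", "sd": "-isd"}
OUTPUT_FLAGS = {"fp": "-o", "data": "-odata", "csv": "-ocsv"}


def build_fpgen_args(in_fmt, infile, out_fmt, outfile, echo_smiles=False):
    """Hypothetical helper: build an argv list with one -i<fmt> and one -o<fmt>."""
    if in_fmt not in INPUT_FLAGS:
        raise ValueError(f"unknown input format: {in_fmt!r}")
    if out_fmt not in OUTPUT_FLAGS:
        raise ValueError(f"unknown output format: {out_fmt!r}")
    # Fall back to the literal "$SCHRODINGER" if the variable is not set.
    root = os.environ.get("SCHRODINGER", "$SCHRODINGER")
    exe = os.path.join(root, "utilities", "canvasFPGen")
    args = [exe, INPUT_FLAGS[in_fmt], infile, OUTPUT_FLAGS[out_fmt], outfile]
    if echo_smiles:
        args.append("-echoSMILES")
    return args


# Hypothetical file names, for illustration only.
example_args = build_fpgen_args("smi", "ligands.smi", "fp", "ligands.fp",
                                echo_smiles=True)
```

The resulting list can be passed to subprocess.run once the file names are replaced with real ones.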
Fingerprint Generation Options:
[-fptype <fptype> [<keyfile>]] [-atomtype <atomtype> [<typefile>]] [-xp]
[-maxspan <span>] [-path <len>] [-minpath <min>] [-ring <len>] [-halfstep]
[-3D [-binwidth <width>] [-binoverlap <over>] [-minbin <min>] [-maxbin <max>]]
[-iter <n>] [-miniter <min>] [-estatePath <len>] [-estateWidth <width>]
[-startH] [-endH]
-fptype <fptype> [<keyfile>]
Fingerprint type. Legal values are "dendritic",
"linear", "maccs", "molprint2D", "pairwise",
"quartet", "radial", "torsion", "triplet", and
"custom" (default: linear). If custom, a <keyfile>
must be supplied with a SMARTS pattern to define each
fingerprint bit, followed optionally by a pattern
label to be written as a column name when -ocsv is
used. For appropriate <keyfile> format, refer to
maccssubset.txt in the Schrodinger software
distribution. Use -help_fptype for detailed
descriptions of the fingerprint types.
-atomtype <atomtype> [<typefile>]
Atom typing scheme. Legal values are 1-12, C and E. If
C, a <typefile> of custom atom type definitions must
be supplied. For appropriate <typefile> format, refer
to Mol2.typ in the Schrodinger software distribution.
Defaults for each fingerprint type are: dendritic=10,
linear=10, molprint2D=5, pairwise=9, quartet=10,
radial=4, torsion=10, triplet=10. Not legal with maccs
or custom fingerprint types. Use -help_atomtype for
detailed descriptions of the atom typing schemes.
-xp Represent fingerprints using 64-bit precision rather
than 32. This effectively eliminates collisions of ON
bits, but doubles the disk space of the fingerprint
file. Not legal with -ocsv.
-maxspan <span> Maximum number of atoms spanned in any direction of a
dendritic fingerprint fragment (default: no limit).
-path <len> Maximum path length in bonds for dendritic, linear and
molprint2D fingerprints. The defaults are 7, 5 and 2,
respectively.
-minpath <min> Minimum path length in bonds for dendritic, linear and
molprint2D fingerprints. The defaults are 0, 0 and 2,
respectively.
-ring <len> Maximum linear fingerprint path that includes a ring
closure (default: 14). This allows ring-containing
fragments to be included without causing a massive
proliferation in the number of acyclic fragments.
-halfstep When generating linear fingerprints, include fragments
that terminate in the middle of a bond, excluding the
atom on the other end of that bond.
-3D Use 3D distances for pairwise, triplet and quartet
fingerprints. Requires Maestro or SD file input.
-binwidth <width> Floating point distance bin width for 3D pairwise,
triplet and quartet fingerprints (default: 1.0).
-binoverlap <over> Distance bin overlap for 3D pairwise, triplet and
quartet fingerprints (default: 0.0). A non-zero value
allows a distance to be assigned to multiple bins
(i.e., "fuzzy" binning).
-minbin <min> Minimum distance bin for 3D pairwise, triplet and
quartet fingerprints (default: 0.0).
-maxbin <max> Maximum distance bin for 3D pairwise, triplet and
quartet fingerprints (default: no limit).
-iter <n> Number of radial fingerprint iterations (default: 4).
-miniter <min> Number of radial iterations below which features are
discarded (default: 0).
-estatePath <len> Path length for E-state atom typing (default: 2).
-estateWidth <width> Binning width for E-state atom typing (default: 0.25).
-startH Calculate molprint2D codes for hydrogen atoms.
-endH Consider bonds to terminal hydrogens when assigning
molprint2D codes.
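
Example (illustrative only): the sketch below shows one plausible reading of "fuzzy" distance binning as described for -binwidth and -binoverlap. It is an assumption, not the canvasFPGen implementation: bin k is taken to cover [k*width, (k+1)*width), widened by the overlap at both edges. The function name assign_bins is hypothetical, and -minbin and -maxbin are not modeled.

```python
import math


def assign_bins(distance, binwidth=1.0, binoverlap=0.0):
    """Return the indices of all bins that a distance falls into.

    Assumed model (not taken from the tool): bin k spans
    [k*binwidth - binoverlap, (k+1)*binwidth + binoverlap).
    """
    if binwidth <= 0.0:
        raise ValueError("binwidth must be positive")
    if not 0.0 <= binoverlap < binwidth:
        raise ValueError("binoverlap must satisfy 0 <= binoverlap < binwidth")
    if distance < 0.0:
        raise ValueError("distance must be non-negative")
    primary = math.floor(distance / binwidth)
    bins = []
    # With overlap < width, only the primary bin and its neighbors can match.
    for k in (primary - 1, primary, primary + 1):
        if k < 0:
            continue
        low = k * binwidth - binoverlap
        high = (k + 1) * binwidth + binoverlap
        if low <= distance < high:
            bins.append(k)
    return bins
```

Under this model, a zero overlap assigns each distance to exactly one bin, while a non-zero overlap assigns distances near a bin edge to two bins.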
Bit Treatment Options:
[[-min <fmin> | -noone] [-max <fmax> | -noall] [-reduce <n>]] | [-mostSig <m>]
[-scaling <rule>] [-compress] [-truncate] [-summaryOnly | -noSummary]
[-bs <filename> [-prefix <s>]]
-min <fmin> Omit bits that are ON in less than the specified
fraction of structures (default: 0.0).
-noone Omit bits that are ON in only a single structure. Not
legal with -min <fmin>.
-max <fmax> Omit bits that are ON in more than the specified
fraction of structures (default: 1.0).
-noall Omit bits that are ON in all structures. Not legal
with -max <fmax>.
-reduce <n> Reduce fingerprint precision by the indicated power of
2. For example, 32-bit fingerprints accommodate 2^32
unique bit positions, but "-reduce 22" shrinks that to
only 2^10 unique bit positions.
-mostSig <m> Keep only the <m> most informative bits. A value of
4294967295 triggers the use of an alternative
algorithm that selects a low-rank orthogonal set of
bits. Not legal in combination with any of the
previous filtering options.
-scaling <rule> Apply a scaling rule to replace each ON bit with a
floating point value. Must be an integer in the range
0-13 (default: 0). Not legal in combination with
-reduce <n>. Use -help_scaling to display the scaling
rules.
-compress Use frequency-based compression to reduce the required
storage by approximately tenfold. Ignored with -ocsv.
-truncate Truncate the fingerprint file if no ON bits are
generated. Not legal with multiple CPUs.
-summaryOnly Store only the aggregate ON bit counts, not the actual
fingerprints. Results in only the header being written
with -ocsv.
-noSummary Omit aggregate bit counts from the fingerprint file.
Not legal with -compress. Ignored with -ocsv.
-bs <filename> Write the set of retained ON bits to a binary file for
use by other software that must restrict fingerprints
to the same set of bits. Not legal with -ocsv.
-prefix <s> Prefix for the names of ON bits (default: "BIT").
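
Example (illustrative only): the sketch below mimics the frequency-based filters (-min, -max, -noone, -noall) on fingerprints held as Python sets of ON bit positions, and reproduces the -reduce arithmetic. The function names and the set-based representation are assumptions for illustration and do not reflect the binary fingerprint file format.

```python
from collections import Counter


def filter_bits(fingerprints, fmin=None, fmax=None, noone=False, noall=False):
    """Drop bits by ON frequency; fingerprints is a list of sets of bit positions."""
    if noone and fmin is not None:
        raise ValueError("-noone is not legal with -min")
    if noall and fmax is not None:
        raise ValueError("-noall is not legal with -max")
    n = len(fingerprints)
    # Number of structures in which each bit is ON.
    counts = Counter(bit for fp in fingerprints for bit in fp)
    keep = set()
    for bit, count in counts.items():
        fraction = count / n
        if fmin is not None and fraction < fmin:
            continue
        if fmax is not None and fraction > fmax:
            continue
        if noone and count == 1:
            continue
        if noall and count == n:
            continue
        keep.add(bit)
    return [fp & keep for fp in fingerprints]


def reduced_positions(precision_bits=32, reduce=0):
    """Number of unique bit positions after reducing precision by 2**reduce."""
    if not 0 <= reduce <= precision_bits:
        raise ValueError("reduce must be between 0 and precision_bits")
    return 2 ** (precision_bits - reduce)
```

For instance, reduced_positions(32, 22) gives 2^10 positions, which matches the "-reduce 22" example above.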
Structure Treatment Options:
[-n <ranges>] [-strip] [-nostereo] [-fill] [-obad <filename>]
-n <ranges> The subset of structures to process, e.g., "1,4" =
structures 1 and 4; "1:10,14" = structures 1-10 and
14; "2:" = structures 2 through the end of the file;
":5,13:18" = structures 1-5 and 13-18.
-strip Retain only the largest fragment in a disconnected
structure.
-nostereo Ignore any defined stereochemistry. Legal only with
radial fingerprints.
-fill Insert an empty fingerprint record when a structure
cannot be processed. This ensures that the number of
output fingerprints is equal to the number of input
structures.
-obad <filename> Save structures that cannot be processed to the
supplied file. Structures are written to <filename> in
a format that depends on -i<fmt> as follows:
-ismi,-icsv->SMILES, -imae->Maestro, -isd->SD.
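
Example (illustrative only): the sketch below expands a -n <ranges> string into explicit 1-based structure numbers, following the examples given for -n above. The function name parse_ranges is hypothetical, and clamping ranges to the number of structures in the file is an assumption about how out-of-range values would be handled.

```python
def parse_ranges(spec, n_total):
    """Expand a range string such as "1:10,14" into sorted 1-based indices.

    n_total is the number of structures in the file; it resolves open-ended
    ranges such as "2:" and (by assumption) clamps ranges that run past the end.
    """
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:
            start_text, end_text = part.split(":", 1)
            start = int(start_text) if start_text else 1      # ":5" starts at 1
            end = int(end_text) if end_text else n_total      # "2:" runs to the end
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid range: {part!r}")
        selected.update(range(start, min(end, n_total) + 1))
    return sorted(selected)
```

Returning a sorted, de-duplicated list means overlapping ranges such as "1:5,3:8" select each structure once.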
Job Control Options:
[-JOB <jobname> [-HOST <host>[:<n>]] [-TMPDIR <dir>] [-MINREC <n>]]
-JOB <jobname> Run under Schrodinger job control using the provided
job name. If omitted, no other job control options are
permitted.
-HOST <host>[:<n>] Run job remotely on the indicated host entry. Include
:<n> to split the job across <n> CPUs.
-TMPDIR <dir> Store temporary job files in <dir>.
-MINREC <n> Minimum number of records per CPU (default: 100).
Prevents a large number of CPUs from being utilized
when the total number of records is relatively small.