canvasSearch Command Help
Command: $SCHRODINGER/utilities/canvasSearch
canvasSearch - Search a list of target molecules against a set
of queries, composed of either molecules or partial
structures. By default, only targets that match all
of the queries are returned. With the -require <n>
option, targets that match at least <n> of the queries
are returned instead. For example, setting <n> to 1
returns targets that match any of the queries.
This program can also be used to filter a
target list based on either the standard REOS rules or a
user-defined file containing SMARTS/SMILES patterns and the
minimum and maximum number of times each may be matched.
Filtering and search can be performed separately or
sequentially (filtering first).
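
The filter-then-search logic described above can be sketched in Python. This is a minimal illustration only, not canvasSearch's implementation: the substructure matching itself is left out, and the match counts are supplied by hand. The function names (parse_rules, count_violations, passes) and the sample rules are hypothetical. The rule-line layout (pattern, minimum, maximum, optional quoted comment, tab-delimited) and the defaults (no violations allowed, all queries required) follow the descriptions of -file, -d, -maxVio and -require in this help text.

```python
def parse_rules(text, delimiter="\t"):
    """Parse rule lines: pattern, min count, max count, optional quoted comment."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(delimiter)
        pattern, lo, hi = fields[0], int(fields[1]), int(fields[2])
        comment = fields[3].strip('"') if len(fields) > 3 else ""
        rules.append((pattern, lo, hi, comment))
    return rules


def count_violations(rules, counts):
    """Number of rules whose match count falls outside [min, max]."""
    return sum(1 for pattern, lo, hi, _ in rules
               if not lo <= counts.get(pattern, 0) <= hi)


def passes(rules, counts, query_hits, max_vio=0, require=None):
    """Filter first, then search.

    counts     -- matches per filter pattern for one target (hand-supplied here)
    query_hits -- one boolean per query: did the target match it?
    require    -- minimum number of matched queries; None means all of them
    """
    if count_violations(rules, counts) > max_vio:
        return False
    needed = len(query_hits) if require is None else require
    return sum(query_hits) >= needed


# Hypothetical rules: at most one nitro group, no acyl chloride.
RULES = parse_rules('[N+](=O)[O-]\t0\t1\t"nitro"\nC(=O)Cl\t0\t0\t"acyl chloride"\n')

target_counts = {"[N+](=O)[O-]": 2, "C(=O)Cl": 0}  # two nitro groups: one violation
hits = [True, False, True]                          # matched 2 of 3 queries

print(passes(RULES, target_counts, hits))                        # False (strict defaults)
print(passes(RULES, target_counts, hits, max_vio=1, require=1))  # True (relaxed)
```

With the defaults the target is rejected at the filter stage. Allowing one violation and requiring only one matched query lets the same target through.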
Usage: canvasSearch [<job control options>] <program options>
Job Control Options: [-JOB <jobName> [-HOST <host>[:<n>]]
[-MINREC <nrec>]
[-LOCAL]
[-TMPDIR <dir>]
[-WAIT]
[-INTERVAL <n>]
[-NICE]]
-JOB <jobName> - Job name. If omitted, no other job control options are
permitted.
-HOST <host>[:<n>] - Run job on <host>. Include ":<n>" to split across
<n> CPUs.
-MINREC <nrec> - Minimum number of records per CPU. Prevents
submission of a large number of subjobs that each
contain only a small number of records. The default
is 100.
-LOCAL - Store temporary job files in current directory.
-TMPDIR <dir> - Store temporary job files in <dir>.
-WAIT - Do not return prompt until job completes.
-INTERVAL <n> - Update log file every <n> seconds.
-NICE - Run job at reduced priority.
Program Options: -i<fmt> <inputFile>
[-n <selection>]
[-fieldAsName <field>]
[-index <indexFile> | -newIndex <indexFile>]
[-noIndex]
[-filter [-reos]
[-file <ruleFile> [-d <delimiter>] ]
[-maxVio <n>] ]
[-helpREOS]
[-q<fmt> <queryFile> [-require <n>] ]
[-exact ]
[-o<fmt> <outputFile>]
[-o<fmt>2 <outputFile>]
[-osmi <smiFile> [-u] | -osd <sdFile> | -omae <maeFile>]
[-osmi2 <smiFile> | -osd2 <sdFile> | -omae2 <maeFile>]
[-no2DCoord]
[-v3 ]
[-useXforAromN]
[-strict]
[-allowRelative]
[-matchCount [<csvFile>] [-qmap <queryMapFile>]
-comment -prefix <queryPrefix> [-showAll] ]
-i<fmt> <inputFile> - Input file containing a list of target molecules.
-ismi = Each line must start with a SMILES
string, optionally followed by a tab or whitespace
character and the ID or name of the molecule.
-isd = SD file as input.
-imae = Maestro file as input.
-n <selection> - Molecules in <inputFile> to search. The
following are valid <selection> specifications:
1:10,14,15 - 1 through 10, 14, and 15
1,3,10: - 1, 3, and 10 through end of file
:5,20:30 - 1 through 5, and 20 through 30
By default, all molecules are included.
-fieldAsName <field> - Field in an SD file (-isd), a Maestro file (-imae)
or a Canvas project to be used as the name of each
target molecule.
-index <indexFile> - Use a previously generated index file of all
the molecules in <inputFile> for the search. A matching
fingerprint will be generated for each query.
-newIndex <indexFile> - Generate an index file of both the target molecules
and the queries before the search. The saved <indexFile>
of the target molecules can be used later with the
-index option.
-noIndex - Do not use any index for search, even if present.
Overrides the two options above.
-filter - Filter the target file based on the minimum and/or
maximum match counts for a given set of patterns.
-reos - Rapid Elimination of Swill, a set of rules to
identify lead-like molecules.
-file <ruleFile> - User-supplied rule file to use. The file must have
the following format:
Each line contains one SMARTS/SMILES string, followed
by the minimum and maximum allowed match counts
and an optional comment enclosed in double quotes.
-d <delimiter> - Delimiter used to separate each field in <ruleFile>.
The default is tab '\t'. Use -d ' ' or " "
for space. If space-delimited, consecutive spaces
will be ignored. Note that the use of ',' as a
delimiter is NOT supported.
-maxVio <n> - Maximum number of violations allowed for the rules.
By default <n> is set to 0.
-helpREOS - Print out REOS patterns to stdout, each followed by
the minimum and maximum number of counts. Tab is
used as delimiter in each line.
-q<fmt> <queryFile> - File containing a list of queries:
-qsmi = SMILES/SMARTS
-qmae = Maestro file
-qsd = SD file
-qmol = MDL mol file
By default, all queries must be matched.
-require <n> - Minimum number of queries that must be matched.
Not valid with -qmol.
-exact - Require exact match for each query. Default is to
match by substructure.
-o<fmt> <outputFile> - <outputFile> contains target molecules that passed
the filter (if -filter is used) and matched all or the
required number of queries.
-osmi = If target molecules are supplied as SMILES
(-ismi), the original SMILES will be used. Otherwise
SMILES strings are generated by Canvas. Use -u to
write unique SMILES instead.
-osd = SD format. If <outputFile> ends with .sdf.gz
or .sd.gz, writes in compressed format.
-omae = Maestro format. If <outputFile> ends with
.maegz or .gz, writes in compressed format.
-o<fmt>2 <outputFile> - <outputFile> contains target molecules that do not
satisfy the required matches. Not valid when
searching with index.
-osmi2 = If target molecules are supplied as SMILES
(-ismi), the original SMILES will be used. Otherwise
SMILES strings are generated by Canvas.
-osd2 = SD format. If <outputFile> ends with .sdf.gz
or .sd.gz, writes in compressed format.
-omae2 = Maestro format. If <outputFile> ends with
.maegz or .gz, writes in compressed format.
-no2DCoord - Do not generate coordinates in the output <sdFile>
or <maeFile> if -ismi <smiFile> is used as input.
By default, 2D coordinates are generated in this
case.
-v3 - Output MDL version 3 SD Format.
-matchCount [<csvFile>] - Calculate the number of matches to each query or
filtering pattern (0 for no match). Counts are saved
in <csvFile>. If -filter and -q<smi/mae/sd> are
both used, only queries in the latter are listed. By
default, only targets that passed the filter
(if -filter is used) and matched all or the required
number of queries are written to <csvFile>.
If <csvFile> is omitted and -osd or -omae is given,
the counts are written to the -osd or -omae output.
-comment - Set query names to the contents of the comment field
of the filter file, or to the query molecule title
for other input types. If this is blank, an automatically
generated name (query1, query2, etc.) is used.
-prefix <queryPrefix> - Each query in <csvFile> will be named using
the following format: <queryPrefix>::query<n>.
<queryPrefix> can be a search name, such as
"my_search1".
-qmap <queryMapFile> - This file provides the mapping between the
<queryPrefix>::query<n> names and the actual
SMARTS/SMILES query patterns.
-showAll - If this option is used with -matchCount, counts for
all targets are printed out to <csvFile>.
-strict - Perform additional validation of each target input
structure prior to matching. This reduces
performance.
-allowRelative - Allow relative stereochemistry matches.
-useXforAromN - Add explicit connectivities for all aromatic
nitrogens appearing in -qsmi, -qsd, and -qmae.
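
The <selection> syntax accepted by -n can be illustrated with a small Python sketch. It is not taken from canvasSearch itself: the function name parse_selection and its n_total argument (the number of records in the input file) are hypothetical, and clamping ranges to the end of the file is an assumption. The three specifications from the -n description are used as examples.

```python
def parse_selection(spec, n_total):
    """Expand a selection such as '1:10,14,15' into sorted 1-based record numbers.

    n_total is the number of records in the input file; it closes open-ended
    ranges such as '10:'. Ranges running past n_total are clamped (an assumption).
    """
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:
            start, end = part.split(":", 1)
            first = int(start) if start else 1      # ':5'  means 1 through 5
            last = int(end) if end else n_total     # '10:' means 10 through end
        else:
            first = last = int(part)
        selected.update(range(first, min(last, n_total) + 1))
    return sorted(selected)


print(parse_selection("1,3,10:", 12))   # [1, 3, 10, 11, 12]
print(parse_selection(":5,20:30", 40))  # 1 through 5, then 20 through 30
```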