oned_screen run Command Help
Command: $SCHRODINGER/oned_screen run
usage: oned_screen run [-h] -query <qfile> -screen <source>
[Screening Options]
[Hit Treatment Options]
[Job Control Options]
Optional Arguments:
-h, --help Show this message and exit.
Required Arguments:
-query <qfile> File containing one or more query structures. Supported
formats are SMILES, SMILES-CSV, Maestro, SD and Phase
hypothesis (.phypo). If a single query is supplied, hits
are returned in <jobname>-hits.csv.gz, where <jobname>
is derived from <qfile>. If multiple queries are
supplied, hits are returned in the files
<jobname>_<n>-hits.csv.gz, where <n> runs from 1 to the
number of queries.
-screen <source> 1D data file (.1dbin) to screen, or list file (.list)
containing the names of the 1D data files to screen,
with one name per line. Use of multiple data files
containing no more than 50 million rows apiece is
strongly recommended to achieve the fastest multi-CPU
screens. Absolute paths must be provided if -nocopy is
supplied.
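The file conventions above can be sketched in a few lines of Python. The data-file paths and job name below are hypothetical; only the one-name-per-line .list format and the hit-file naming patterns come from this help text.

```python
from pathlib import Path

# Hypothetical absolute paths to 1D data files (absolute paths are
# required when -nocopy is supplied).
data_files = ["/data/1d/part_01.1dbin", "/data/1d/part_02.1dbin"]

# A .list file names one 1D data file per line.
Path("screen.list").write_text("\n".join(data_files) + "\n")

def hit_file_names(jobname, n_queries):
    """Hit files produced for a job, per the naming convention above."""
    if n_queries == 1:
        return [f"{jobname}-hits.csv.gz"]
    return [f"{jobname}_{n}-hits.csv.gz" for n in range(1, n_queries + 1)]

print(hit_file_names("myqueries", 3))
# ['myqueries_1-hits.csv.gz', 'myqueries_2-hits.csv.gz', 'myqueries_3-hits.csv.gz']
```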
Screening Options:
[-norm {1,2,3,4}] [-filter <min>]
-norm {1,2,3,4} Similarity normalization scheme. For query A and
screening structure B, similarity is computed as
O(A,B)/norm(A,B), where O(A,B) is the overlap between A
and B, and norm(A,B) is a function of the self-overlaps
O(A,A) and O(B,B): 1->max{O(A,A), O(B,B)},
2->min{O(A,A), O(B,B)}, 3->O(A,A), 4->O(B,B) (default:
1).
-filter <min> Filter out molecules whose similarities fall below
<min>. The default is to apply no filter.
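The four -norm schemes can be expressed directly. This is an illustrative sketch with made-up overlap values, not part of the oned_screen API:

```python
def similarity(o_ab, o_aa, o_bb, norm=1):
    """O(A,B)/norm(A,B) for the -norm schemes described above."""
    denominators = {
        1: max(o_aa, o_bb),  # default; yields the lowest similarity
        2: min(o_aa, o_bb),
        3: o_aa,             # normalize by the query's self-overlap
        4: o_bb,             # normalize by the screening structure's self-overlap
    }
    return o_ab / denominators[norm]

# Made-up overlaps: O(A,B)=6.0, O(A,A)=10.0, O(B,B)=8.0
print(similarity(6.0, 10.0, 8.0, norm=1))  # 0.6
print(similarity(6.0, 10.0, 8.0, norm=2))  # 0.75
```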
Hit Treatment Options:
[-nosort | -keep <maxhits> [-fraction <f>] [-catgz] [-limit <maxrows>]]
-nosort Output hits in the order they are screened. The default
is to sort hits by decreasing similarity to the query.
-keep <maxhits> Cap the number of sorted hits per query at <maxhits>
(default: 1000). Must not exceed 4294967295.
-fraction <f> Cap the number of sorted hits per subjob (and per query)
at <f>*<maxhits>. For example, -keep 1000000 -fraction
0.25 would result in a maximum of 250,000 hits per
subjob. While this means only the first 250,000 hits in
the combined set are guaranteed to be optimal, the
remaining 750,000 are likely to be near optimal if the
number of subjobs is sufficiently large and high-
similarity hits are uniformly distributed over the
subjobs. Furthermore, the time required to post-process
hits could be reduced by as much as 75%. The default
fraction is 1.0.
-catgz Write the intermediate file of concatenated hits in
compressed format. The resulting file is about half as
large, but concatenation takes about 3 times longer. Note
that the final hits returned by the job are always
compressed.
-limit <maxrows> Limit on the number of rows held in memory when sorting
hits (default: 1000000). Increasing this limit is
strongly recommended when requesting more than
10,000,000 hits.
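The interaction of -keep and -fraction in the example above works out as follows. Simple truncation is assumed here; the exact rounding used by the tool is not stated in this help text.

```python
def per_subjob_cap(maxhits, fraction=1.0):
    # Per-subjob (and per-query) cap implied by -keep and -fraction,
    # assuming truncation toward zero.
    return int(maxhits * fraction)

# -keep 1000000 -fraction 0.25, as in the example above:
print(per_subjob_cap(1_000_000, 0.25))  # 250000
```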
Job Control Options:
[-HOST <host>[:<n>]] [-TMPDIR <tmpdir>] [-JOBNAME <jobname>] [-nocopy]
[-NJOBS <njobs>]
-HOST <host>[:<n>] Run job remotely on the indicated host entry. Include
:<n> to distribute the job over <n> CPUs.
-TMPDIR <tmpdir> Store temporary job files in <tmpdir>.
-JOBNAME <jobname> Override default job name.
-nocopy Do not copy source or destination 1D data files between
the local host and job host. 1D data file names must
include an absolute path that exists on the job host and
on all compute nodes of that host.
-NJOBS <njobs> Divide the overall job into <njobs> subjobs.