Running Distributed Schrödinger Jobs

A number of Schrödinger products can distribute work over multiple processors, using one of several distribution algorithms. Some jobs divide the input structures into batches and submit each batch to a processor for execution as a subjob. With this approach, the number of processors used and the number of batches (subjobs) can differ. Other jobs divide the work among the available processors, so that the number of subjobs equals the number of processors. In addition to single jobs, there are a number of workflows that submit many single jobs (e.g. Desmond or Jaguar jobs) as subjobs in the overall process.
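The difference between the number of batches and the number of processors can be illustrated with a minimal Python sketch. This is not Schrödinger code: `split_into_batches` and `assign_batches` are hypothetical helpers, and the round-robin assignment is used only to keep the example deterministic. A real scheduler would typically hand out batches as processors become free.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], nbatches: int) -> List[List[T]]:
    """Split items into at most nbatches contiguous batches whose sizes differ by at most one."""
    if nbatches < 1:
        raise ValueError("nbatches must be at least 1")
    # Cap the batch count at the number of items so that no batch is empty when there is input.
    nbatches = max(1, min(nbatches, len(items)))
    base, extra = divmod(len(items), nbatches)
    batches: List[List[T]] = []
    start = 0
    for i in range(nbatches):
        # The first `extra` batches each take one additional item.
        size = base + (1 if i < extra else 0)
        batches.append(list(items[start:start + size]))
        start += size
    return batches


def assign_batches(nbatches: int, nprocs: int) -> List[List[int]]:
    """Deal batch indices round-robin over nprocs processors; a processor may run several batches in turn."""
    queues: List[List[int]] = [[] for _ in range(nprocs)]
    for b in range(nbatches):
        queues[b % nprocs].append(b)
    return queues


# Ten structures in five batches (subjobs), run on two processors.
print(split_into_batches(list(range(10)), 5))  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
print(assign_batches(5, 2))                    # [[0, 2, 4], [1, 3]]
```

Setting the number of batches equal to the number of processors gives the second approach, in which each processor runs exactly one subjob.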

Jaguar, QSite, and Quantum ESPRESSO can use multiple processors with OpenMP threading as well as distributed processing.

For each distributed job there is a driver that is responsible for dividing up the work, submitting the subjobs, and collating the results. Thus, if N processors are requested, the job creates N+1 processes. Depending on the program, the driver can run on the local host or on the remote host. If the driver runs on the remote host, there are two separate cases to consider:

The remote host (or collection of remote hosts) does not involve a queuing system. The driver uses little CPU time, so the overall job does not necessarily run inefficiently: the operating system can swap the driver process out while it is idle and use the time for other processes.
The remote host is running a queuing system. The driver is run as a separate queued job, and the driver then submits the subjobs to the queue. The driver job can therefore occupy a processor that is idle most of the time. The exceptions are Prime loop refinement, Phase, Jaguar, and Quantum ESPRESSO, for which the driver runs (or can run) a subjob locally.
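The driver's role (divide the work, submit the subjobs, collate the results) and the reason it sits idle can be sketched in Python. This is only an analogy, not how Schrödinger job control is implemented: threads from the standard `concurrent.futures` module stand in for subjob processes on remote hosts, and `run_driver` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

B = TypeVar("B")
R = TypeVar("R")


def run_driver(batches: Sequence[B], subjob: Callable[[B], R], ncpus: int) -> List[R]:
    """Toy driver: submit one subjob per batch to ncpus workers, then collate the results."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    # Worker threads stand in for subjob processes; the calling thread plays the driver,
    # so ncpus workers plus the driver mirror the N+1 processes of a distributed job.
    with ThreadPoolExecutor(max_workers=ncpus) as pool:
        futures = [pool.submit(subjob, batch) for batch in batches]
        # The driver does no computation of its own: it only waits and collates,
        # returning results in batch order regardless of which subjob finished first.
        return [f.result() for f in futures]


print(run_driver([[1, 2], [3, 4], [5]], sum, ncpus=2))  # [3, 7, 5]
```

Because the driver spends almost all of its time blocked in `f.result()`, a processor reserved for it in a queuing system does little useful work, which is why some programs let the driver run a subjob itself.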

The hosts on which the driver and the subjobs run are set with one or more of the following options:

-HOST: General list of hosts for the job. If only one of the other two options is used, the hosts specified with -HOST are generally used for the option that is not given. If neither of the other options is used, the driver runs either on localhost or on the first host, and the subjobs run either on the remaining hosts or on all the hosts, including the first.
-DRIVERHOST: Host on which to run the driver (must be a single host name). If omitted, the driver may run on the first host specified by -HOST, or on localhost if -HOST is not used or if the application default is to run the driver locally.
-SUBHOST: hosts on which to run the subjobs. If omitted, all hosts specified by -HOST are used for the subjobs. The number of CPUs used is determined by the number of hosts or host/CPU combinations specified.
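The default rules for the three options can be summarized as a small Python function. This is a hypothetical model of the behavior described above, not Schrödinger code. It assumes a host list written as space- or comma-separated `host` or `host:ncpus` entries, a flag `driver_local_by_default` (a hypothetical name) standing in for the application default, and, where the text allows either behavior, that subjobs run on all -HOST hosts including the first.

```python
from typing import List, Optional, Tuple

HostList = List[Tuple[str, int]]


def parse_hosts(spec: str) -> HostList:
    """Parse a list such as 'hostA:4 hostB' into (host, ncpus) pairs; a bare host counts as 1 CPU."""
    pairs: HostList = []
    for item in spec.replace(",", " ").split():
        name, _, ncpus = item.partition(":")
        pairs.append((name, int(ncpus) if ncpus else 1))
    return pairs


def resolve_hosts(host: Optional[str] = None,
                  driverhost: Optional[str] = None,
                  subhost: Optional[str] = None,
                  driver_local_by_default: bool = False) -> Tuple[str, HostList, int]:
    """Return (driver host, subjob hosts, subjob CPU count) under the assumed defaults."""
    general = parse_hosts(host) if host else []
    if driverhost:
        if len(driverhost.split()) != 1:
            raise ValueError("-DRIVERHOST takes a single host name")
        driver = driverhost
    elif general and not driver_local_by_default:
        driver = general[0][0]   # first host in the -HOST list
    else:
        driver = "localhost"     # no -HOST, or the application runs its driver locally
    # -SUBHOST wins; otherwise every -HOST entry (including the first) runs subjobs.
    subjob_hosts = parse_hosts(subhost) if subhost else (general or [("localhost", 1)])
    ncpus = sum(n for _, n in subjob_hosts)
    return driver, subjob_hosts, ncpus


print(resolve_hosts(host="hostA:4 hostB:2"))
# ('hostA', [('hostA', 4), ('hostB', 2)], 6)
print(resolve_hosts(host="hostA:4", subhost="hostC:8"))
# ('hostA', [('hostC', 8)], 8)
```

The CPU count is the sum over the subjob host entries, matching the statement that the number of CPUs follows from the hosts or host/CPU combinations given.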

The syntax of these options is given in Running Jobs From the Command Line. Information on the number of subjobs, number of CPUs, and the driver job location is given in Table 1 for running distributed jobs from the command line, and in Table 2 for running distributed jobs from Maestro. The notation “Standard” means that the options listed above are used to determine the driver location and the number of CPUs.

Table 1. Distributed processing behavior for jobs submitted from the command line
Product	Program	Number of subjobs determined by	Number of CPUs for subjobs determined by	Driver location
Epik	`epik`	`-NJOBS` njobs, `-JOBCTS` maxstructs	Standard	Standard
Glide	`glide`	`-NJOBS` njobs	Standard	Standard
Induced Fit Docking	`ifd`	Number of ligands (Glide), number of poses (Prime).	`-NGLIDECPU`, `-NPRIMECPU`, or keywords in input file	Standard
Jaguar	`jaguar`	`-HOST` host:njobs	`-PARALLEL` ncpus; value is for all subjobs combined and includes the driver	Standard
LigPrep	`ligprep`	`-NJOBS` njobs, `-JOBCTS` maxstructs	Standard	Standard
MacroModel	`bmin`	`-NJOBS` njobs, `-JOBCTS` maxstructs	Standard	Standard
Phase	`phase_dbsearch` `phasedb_confsites` `phasedb_convert`	Number of processors	Number of hosts in `-HOST` list	First host in `-HOST` list; `-DRIVERHOST` ignored.
Prime	`multirefine`	Stage of process, `MAX_JOBS` keyword in input file	Host given by `HOST` keyword in input file, otherwise host given by `-HOST`	First host in `-HOST` list
QM-Polarized Ligand Docking	`qpld`	`-NJOBS` njobs	Number of hosts in `-HOST` list or `-host_`program lists.	Host specified by `-DRIVERHOST`; default first host in `-HOST` list
SiteMap	`sitemap`	Number of processors	Number of hosts in `-HOST` list	Local host
Virtual Screening Workflow	`vsw`	`-NJOBS` njobs, `-adjust`	Number of hosts in `-HOST` list or `-host_`program lists.	Host specified by `-DRIVERHOST`; default first host in `-HOST` list
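Several programs in Table 1 (Epik, LigPrep, MacroModel) accept both `-NJOBS` njobs and `-JOBCTS` maxstructs. The table does not say how the two combine when both are given, so the following Python sketch assumes one plausible rule: start from njobs and add subjobs until no subjob holds more than maxstructs structures. `count_subjobs` is a hypothetical helper, and the rule may not match what the programs actually do.

```python
import math
from typing import Optional


def count_subjobs(nstructs: int, njobs: int = 1, maxstructs: Optional[int] = None) -> int:
    """Assumed rule: use njobs subjobs, adding more if any would hold over maxstructs structures."""
    if nstructs <= 0:
        return 0
    n = max(1, njobs)
    if maxstructs:
        # Smallest subjob count that keeps every subjob at or below maxstructs structures.
        n = max(n, math.ceil(nstructs / maxstructs))
    # Never create more subjobs than there are structures.
    return min(n, nstructs)


print(count_subjobs(1000, njobs=4))                  # 4
print(count_subjobs(1000, njobs=4, maxstructs=100))  # 10
```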

Table 2. Distributed processing for jobs submitted from Maestro
Product	Job type	Number of subjobs set in or determined by	Number of CPUs set in or determined by	Driver location
Epik		Number of CPUs (passed as `-NJOBS`)	Job Settings dialog box	Selected host
Glide	Docking	Job Settings dialog box	Job Settings dialog box	Selected host
Induced Fit Docking		Number of ligands (Glide), number of poses (Prime).	Induced Fit Docking panel, Job options section.	Local host
Jaguar		Job Settings dialog box	Job Settings dialog box	Selected host
LigPrep		Number of CPUs (passed as `-NJOBS`)	Job Settings dialog box	Selected host
Phase	Clean Structures, Generate Conformers, Find Matches	Number of CPUs	Job Settings dialog box	Selected host
Prime	Loop Structure	Depends on stage of process and sampling method	Refine Structure - Options dialog box	Selected host
QM-Polarized Ligand Docking		Number of CPUs	Job Settings dialog box	Selected host
Quantum ESPRESSO	All	Number of CPUs	Job Settings dialog box	Selected host
SiteMap		Number of CPUs	Job Settings dialog box	Selected host
Virtual Screening Workflow		Job Settings dialog box	Job Settings dialog box	Selected host

For Induced Fit Docking, the launch directory must be accessible from the master job (driver) host.