Running Distributed Schrödinger Jobs

A number of Schrödinger products can distribute work over multiple processors, and several algorithms are used to perform the distribution. Some jobs divide the input structures into batches; each batch is then submitted to a processor for execution as a subjob, and the number of processors used and the number of batches (subjobs) can differ. Other jobs divide the work among the available processors, so that the number of subjobs is the same as the number of processors. In addition to single jobs, there are a number of workflows that submit many single jobs (e.g. Desmond or Jaguar jobs) as subjobs of the overall process.
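
As an illustrative sketch of the batch approach, the number of subjobs can exceed the number of processors. The LigPrep input and output file names and the host entry below are placeholders, and the exact input and output options depend on the product.

    # Split the input into 20 subjobs (batches) and run them on up to 4 processors
    # of the host entry "cluster"; subjobs are started as processors become free.
    $SCHRODINGER/ligprep -ismi ligands.smi -omae ligands-prepped.maegz -HOST cluster:4 -NJOBS 20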

Jaguar, QSite, and Quantum ESPRESSO can use multiple processors with OpenMP threading as well as distributed processing.
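
A sketch of how these two modes can be combined for Jaguar; the host entry and input file are placeholders, and the threading option (-TPP, threads per process) is an assumption that should be checked against the Jaguar documentation for your release:

    # Distributed run: -PARALLEL gives the total CPU count for the subjobs and driver combined.
    $SCHRODINGER/jaguar run -HOST cluster:4 -PARALLEL 4 h2o.in

    # Assumed combination with OpenMP threading: 4 processes with 2 threads each (8 CPUs in total).
    $SCHRODINGER/jaguar run -HOST cluster:8 -PARALLEL 4 -TPP 2 h2o.in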

For each distributed job there is a driver that is responsible for dividing up the work, submitting the subjobs, and collating the results. Thus, if N processors are requested, the job creates N+1 processes. Depending on the program, the driver can run on the local host or on the remote host. If the driver runs on the remote host, there are two separate cases to consider:

  • The remote host (or collection of remote hosts) does not involve a queuing system. Even though the driver takes up a processor, the overall job does not necessarily run inefficiently, because the operating system can swap the driver process out when it is idle and use the time for other processes.

  • The remote host is running a queuing system. The driver is run as a separate queued job, and the subjobs are then submitted to the queue by the driver. The driver job can therefore occupy a processor that is idle most of the time (see the sketch following this list). The exceptions are Prime loop refinement, Phase, Jaguar, and Quantum ESPRESSO, for which the driver runs (or can run) a subjob locally.
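
A minimal sketch of the queued case, using the -DRIVERHOST option described below. The queue entry name "cluster-queue" and the Glide input file name are placeholders chosen only for illustration.

    # Driver and subjobs all submitted to the queue; the driver occupies one mostly idle slot:
    $SCHRODINGER/glide -HOST cluster-queue:10 -NJOBS 10 dock.in

    # Driver kept on the local host; only the subjobs are submitted to the queue:
    $SCHRODINGER/glide -HOST cluster-queue:10 -DRIVERHOST localhost -NJOBS 10 dock.in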

The hosts on which the driver and the subjobs run are set with one or more of the following options (an example command is given after the list):

  • -HOST: General list of hosts for the job. If only one of the other two options is given, the hosts specified with -HOST are generally used for the other purpose as well. If neither of the other options is used, the driver runs either on localhost or on the first host, and the subjobs run on the remaining hosts or on all of the hosts, including the first.

  • -DRIVERHOST: Host on which to run the driver (must be a single host name). If this option is omitted, the driver may be run on the first host specified by -HOST, or on localhost if -HOST is not used or the application default is to run the driver locally.

  • -SUBHOST: hosts on which to run the subjobs. If omitted, all hosts specified by -HOST are used for the subjobs. The number of CPUs used is determined by the number of hosts or host/CPU combinations specified.
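
As a combined example (host names, CPU counts, and the input file are placeholders; host lists use the host:ncpus form, with multiple entries quoted on the command line):

    # Driver on one host, subjobs distributed over two other hosts:
    $SCHRODINGER/glide -DRIVERHOST headnode -SUBHOST "node1:4 node2:4" -NJOBS 8 dock.in

    # Only -HOST given: the same host list is used to place both the driver and the subjobs.
    $SCHRODINGER/glide -HOST "node1:4 node2:4" -NJOBS 8 dock.in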

The syntax of these options is given in Running Jobs From the Command Line. Information on the number of subjobs, number of CPUs, and the driver job location is given in Table 1 for running distributed jobs from the command line, and in Table 2 for running distributed jobs from Maestro. The notation “Standard” means that the options listed above are used to determine the driver location and the number of CPUs.

Table 1. Distributed processing behavior for jobs submitted from the command line

Product | Program | Number of subjobs determined by | Number of CPUs for subjobs determined by | Driver location
------- | ------- | ------------------------------- | ---------------------------------------- | ---------------
Epik | epik | -NJOBS njobs, -JOBCTS maxstructs | Standard | Standard
Glide | glide | -NJOBS njobs | Standard | Standard
Induced Fit Docking | ifd | Number of ligands (Glide), number of poses (Prime) | -NGLIDECPU, -NPRIMECPU, or keywords in input file | Standard
Jaguar | jaguar | -HOST host:njobs | -PARALLEL ncpus; value is for all subjobs combined and includes the driver | Standard
LigPrep | ligprep | -NJOBS njobs, -JOBCTS maxstructs | Standard | Standard
MacroModel | bmin | -NJOBS njobs, -JOBCTS maxstructs | Standard | Standard
Phase | phase_dbsearch, phasedb_confsites, phasedb_convert | Number of processors | Number of hosts in -HOST list | First host in -HOST list; -DRIVERHOST ignored
Prime | multirefine | Stage of process, MAX_JOBS keyword in input file | Host given by HOST keyword in input file, otherwise host given by -HOST | First host in -HOST list
QM-Polarized Ligand Docking | qpld | -NJOBS njobs | Number of hosts in -HOST list or -host_program lists | Host specified by -DRIVERHOST; default is first host in -HOST list
SiteMap | sitemap | Number of processors | Number of hosts in -HOST list | Local host
Virtual Screening Workflow | vsw | -NJOBS njobs, -adjust | Number of hosts in -HOST list or -host_program lists | Host specified by -DRIVERHOST; default is first host in -HOST list

Table 2. Distributed processing for jobs submitted from Maestro

Product | Job type | Number of subjobs set in or determined by | Number of CPUs set in or determined by | Master job location
------- | -------- | ------------------------------------------ | --------------------------------------- | --------------------
Epik |  | Number of CPUs (passed as -NJOBS) | Job Settings dialog box | Selected host
Glide | Docking | Job Settings dialog box | Job Settings dialog box | Selected host
Induced Fit Docking |  | Number of ligands (Glide), number of poses (Prime) | Induced Fit Docking panel, Job options section | Local host
Jaguar |  | Job Settings dialog box | Job Settings dialog box | Selected host
LigPrep |  | Number of CPUs (passed as -NJOBS) | Job Settings dialog box | Selected host
Phase | Clean Structures, Generate Conformers, Find Matches | Number of CPUs | Job Settings dialog box | Selected host
Prime | Loop Structure | Depends on stage of process and sampling method | Refine Structure - Options dialog box | Selected host
QM-Polarized Ligand Docking |  | Number of CPUs | Job Settings dialog box | Selected host
Quantum ESPRESSO | All | Number of CPUs | Job Settings dialog box | Selected host
SiteMap |  | Number of CPUs | Job Settings dialog box | Selected host
Virtual Screening Workflow |  | Job Settings dialog box | Job Settings dialog box | Selected host

The launch directory must be available from the master job host for Induced Fit Docking.