Running a Multithreaded Jaguar Job with OpenMP
In this method, parts of the code are executed in parallel, using threads. See http://www.openmp.org for details. Use of threads takes only one license no matter how many threads are used, but it requires extra memory, and all threads spawned from a single process must run on the same node. OpenMP is useful in speeding up expensive calculations for single jobs. Use of OpenMP is supported on all platforms.
OpenMP is resource-intensive, and you can overload a computer by requesting too many OpenMP threads. A parallel job using 2 OpenMP threads will use almost twice as much memory as a single-processor job, at least for those portions of the job that can be run in parallel.
Also keep in mind that OpenMP is not 100% efficient, nor is the scaling linear. Thus a 4-processor job does not run 4 times faster than a 1-processor job, and a 32-processor job does not run twice as fast as a 16-processor job. The efficiency depends on the computer hardware, on the kind of calculation you are doing, on the level of theory, and on what portion of the program code has been enabled to use OpenMP.
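The non-linear scaling described above is captured by Amdahl's law, which bounds the speedup obtainable when only a fraction of the runtime is parallelizable. The 90% parallel fraction below is an arbitrary illustration, not a measured value for Jaguar:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup with n threads when a fraction p
    of the runtime (0 <= p <= 1) can run in parallel."""
    return 1.0 / ((1.0 - p) + p / n)

# Illustrative only: suppose 90% of the work is parallelized.
p = 0.90
for n in (1, 4, 16, 32):
    print(f"{n:2d} threads -> speedup {amdahl_speedup(p, n):.2f}")
```

With these numbers, 4 threads give roughly a 3.1x speedup rather than 4x, and going from 16 to 32 threads improves the speedup by only about 22%, consistent with the behavior noted above.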
The command syntax for using OpenMP threads is:
jaguar run [script.py | script.bat] input-file -PARALLEL nthreads
where nthreads is the maximum number of OpenMP threads to create for the multithreaded parts of the calculation. This syntax applies to a simple job with no subjobs. In practice you should not specify more threads than there are processor cores on the host, or you risk overloading the host.
Because a team of OpenMP threads must run on the same node, you should be careful when using OpenMP on a cluster, so that you do not request more threads than there are cores on any one compute node. You can avoid this situation by specifying the maximum number of processors (cores) per node with the -procs_per_node option, or setting the maximum for each host in the hosts file—see The processors and processors_per_node Settings in the Hosts File for information.
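As a sanity check before submitting, you can compare the requested thread count against the core count of a single node. The helper below is a hypothetical sketch of that check, not part of Jaguar:

```python
import os

def safe_thread_count(requested, cores_per_node=None):
    """Clamp an OpenMP thread request to the cores available on one
    node, since all threads of a process must share a single node.
    If cores_per_node is not given, fall back to the local host."""
    if cores_per_node is None:
        cores_per_node = os.cpu_count() or 1
    return min(requested, cores_per_node)

# Requesting 64 threads on 16-core nodes would overload a node:
print(safe_thread_count(64, cores_per_node=16))  # -> 16
```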
When OpenMP is being used, the output file indicates the number of threads:
Using up to 4 threads per process
If the job has subjobs (including the case of multiple input files), you can restrict the number of simultaneous subjobs by adding -HOST hostname:njobs to the command, e.g.
jaguar run [script.py | script.bat] input-files -PARALLEL nthreads -HOST hostname:njobs
In this case, only njobs subjobs are submitted for execution by the master job at any one time. Each subjob runs with nthreads threads, so the total number of processors needed is njobs × nthreads. You can set the maximum number of threads per subjob with the -max_threads option. However, if you omit the -HOST option, the -PARALLEL option is interpreted differently—see Automatic Selection of Distribution and Threading Options for Jaguar Jobs.
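The resource arithmetic above is simply the product of the two settings, which is worth checking against your allocation before launching:

```python
def total_cores(njobs, nthreads):
    """Cores needed when njobs simultaneous subjobs each run
    nthreads OpenMP threads."""
    return njobs * nthreads

# e.g. 4 simultaneous subjobs with 8 threads each need 32 cores:
print(total_cores(4, 8))  # -> 32
```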
The use of the hostname:njobs syntax does not apply to some scripts, which request the specified number of processors on a single node for maximum efficiency. These scripts are autots, counterpoise.py, csrch.py, and hydrogen_bond.py; for these scripts the value of njobs is ignored. You can request a single node for a job with the -use_one_node option.
When running a batch of multithreaded Jaguar jobs, you can use the -optimize_cpus option to assign CPUs in an optimal way based on jaguar_timer estimates, e.g.
jaguar run <input-files> -PARALLEL <N> -optimize_cpus
Longer-running jobs are assigned more CPUs, and shorter-running jobs fewer. The -optimize_cpus option is also available for Jaguar workflows: if a batch of multithreaded jobs is launched during the workflow, their CPU assignments are also optimized.
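The idea behind this kind of assignment can be sketched as dividing a fixed CPU budget among jobs in proportion to their estimated runtimes. The scheme below is a simplified illustration, not Jaguar's actual algorithm:

```python
def allocate_cpus(estimates, total_cpus):
    """Split total_cpus across jobs in proportion to their estimated
    runtimes, giving every job at least one CPU (illustrative only)."""
    total = sum(estimates)
    shares = [max(1, round(total_cpus * t / total)) for t in estimates]
    # Trim any rounding overshoot from the largest allocations first.
    while sum(shares) > total_cpus:
        shares[shares.index(max(shares))] -= 1
    return shares

# Three jobs with runtime estimates 10, 30, 60 (arbitrary units)
# sharing a 10-CPU budget:
print(allocate_cpus([10, 30, 60], 10))  # -> [1, 3, 6]
```

The longest-running job receives the largest share, mirroring the behavior described above.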