Set up the Host Entries on Job Server
Job Server creates and submits batch jobs to the queuing system for users. Therefore, it is necessary to update the Job Server configuration file with “host entries” used for Schrödinger jobs.
REQUIRED - It is necessary to:
- set up the name of the host entry that will be visible to users submitting jobs (name)
- set up the directory for temporary or scratch files (tmpdir) on a file system that (see the verification sketch after this list):
  - has sufficient disk space (see System Requirements)
  - is mounted locally
  - is writable by all the users that will use this Job Server
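You can verify these tmpdir requirements from the Job Server machine before adding the entry. A minimal sketch, assuming a scratch directory of /scr and a representative user account jdoe (both hypothetical):

# Check available disk space on the file system holding /scr
df -h /scr

# Confirm /scr is on a locally mounted file system, not a network mount such as NFS
findmnt -T /scr -o TARGET,SOURCE,FSTYPE

# Confirm a representative user can create and remove files under /scr
sudo -u jdoe touch /scr/.write_test && sudo -u jdoe rm /scr/.write_test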
In addition, you can add optional specifications such as:
- CPU vs. GPU resource requests
- the compute node scratch space locations
- the queues/partitions for submissions
- memory usage requests
- environment variables set in the compute node execution environment
New Job Server Set-up
Follow the steps below to set up Job Server for the first time.
Name the Job Server Host Entries
The name/label for a host entry should distinguish its purpose from other entries, because only the name is exposed in Maestro Job Settings dialogs or used in command line invocations (e.g., ‘-HOST <host_entry_name>’). For example, ‘cpu_short’, ‘cpu_highmem’, ‘driver’ (i.e., long-running), or ‘gpu’.
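For example, assuming a host entry named ‘cpu_short’ has been configured, a command-line submission might look like the following (the application and input file name are placeholders):

$SCHRODINGER/glide -HOST cpu_short dock.in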
The host entries are specified via a Job Server configuration file:
<jobserver_dir>/config/hosts.yml
Each host entry block consists of ‘name’, ‘tmpdir’, and optionally one or more other keywords:
| Keyword | Description |
|---|---|
| name | Required. The name/label of the host entry, exposed to users submitting jobs from Maestro or the command line (e.g., via the -HOST argument). |
| tmpdir | Required. Base directory for temporary or scratch files, also called the scratch directory. The file system on which this directory is mounted should be large enough for the largest temporary files, should be mounted locally, and should be writable by all the users that will use this Job Server. |
| qargs | Arguments passed to the batch submission command (e.g., ‘sbatch’, ‘qsub’, ‘bsub’). Typically used to specify the queue/partition for the job and to make CPU/GPU/memory/time resource requests. The exact arguments depend on both the queuing system type and the specific cluster configuration. Default: “” (none) |
| processors | This value (1) limits the number of processors a user can request per job in Maestro and, for some applications, (2) sets the maximum number of subjobs that will run simultaneously when no specific number of processors is requested. It is often set to the number of cores (or GPU devices, for GPU entries) available in the cluster. Default: 1 |
| processors_per_node | The number of processors (cores) per node available to a batch queue. This setting is used by applications that support threaded parallel execution (OpenMP). Default: 1 |
| env | Environment variables to be set on the host at runtime, specified as variable=value, one line per variable in the env block. Each value is parsed literally, with spaces or quotes interpreted as part of the environment variable’s value. Default: [] (none) |
| cuda_cores | For GPU entries, the number of CUDA cores per GPU for the GPUs that can be used by jobs submitted via the host entry. This keyword is needed if License Checking is enabled so the correct number of license resources can be requested (see Setting Up License Checking for Queueing Systems). Default: 0 (i.e., defer to the application default of 5120, the number of cores in a Tesla V100) |
| gpgpu | Specifies that the host entry can be used for GPU submissions. If each compute node has multiple GPUs, the gpgpu block has multiple lines, one per GPU, in the form index: <index>, description: <description>, where <index> is the numerical GPU id (usually starting from 0) and <description> is a textual description of the GPU, e.g., Tesla V100. Maestro uses these lines to classify host entries as either CPU or GPU, because an application panel only exposes host entries suitable for the type(s) of compute resources it will use, and uses the number of gpgpu specifications to limit the number of GPUs that can be requested for jobs/subjobs running on a single compute node. Note that the presence of gpgpu lines does not itself request GPU resources from the queuing system; that must be done via qargs. Default: [] |
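For instance, a minimal valid hosts.yml contains a single entry with only the two required keywords; the entry name and scratch path here are illustrative:

entries:
  - name: cpu_short
    tmpdir: /scr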
The qargs can include a special %NPROC% macro that is substituted at submission time with the actual number of CPUs or GPUs requested for the batch job. This allows jobs of varying parallelization to be submitted using the same host entry.
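As a sketch of that substitution, consider the SLURM ‘cpu’ entry from the examples below, whose qargs line is:

qargs: --partition=cpu --nodes=1 --ntasks-per-node=%NPROC%

A job requesting 8 processors would then be submitted with the batch arguments --partition=cpu --nodes=1 --ntasks-per-node=8.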
Sample hosts.yml files
Below are example ‘hosts.yml’ files for the Schrödinger-supported queueing systems, each with commonly used host entries: a workflow driver job (‘driver’), a basic CPU job (‘cpu’), a CPU job with a more specific memory requirement (‘cpu_highmem’), and a basic GPU job (‘gpu’).
Taking the ‘driver’ host entry for SLURM as an example: when a user submits a job with the ‘-HOST driver’ command-line flag, the configuration directs the job’s temporary or scratch files to the ‘/scr’ directory on the compute node. The entry interacts with the SLURM queueing system by passing a specific set of arguments via the ‘qargs’ section, so that SLURM submits the job to the cluster partition named ‘cpu_driver’ and confines the entire job to a single compute node (‘--nodes=1’). The number of tasks or cores requested on that node dynamically matches the number of processors requested for the job via the %NPROC% macro (‘--ntasks-per-node=%NPROC%’). For resource allocation, the configuration informs Schrödinger that each node in this partition has 4 processors available for threaded applications (‘processors_per_node: 4’), while limiting to 2000 the total number of processors a single job can request or use (‘processors: 2000’). Finally, the ‘env’ section sets the environment variable ‘SCHRODINGER_MAX_LAUNCH_CONCURRENCY’ in the job’s runtime environment.
For a complete description of all available keywords and their functions, refer to the keyword table above.
- SLURM
- SGE
- PBS Pro
- LSF
Here is an example ‘hosts.yml’ file for SLURM submissions with four host entries (‘driver’, ‘cpu’, ‘cpu_highmem’, and ‘gpu’):
entries:
  - name: driver
    tmpdir: /scr
    qargs: --partition=cpu_driver --nodes=1 --ntasks-per-node=%NPROC%
    processors: 2000
    processors_per_node: 4
    env:
      - SCHRODINGER_MAX_LAUNCH_CONCURRENCY=4
  - name: cpu
    tmpdir: /scr
    qargs: --partition=cpu --nodes=1 --ntasks-per-node=%NPROC%
    processors: 2000
    processors_per_node: 8
  - name: cpu_highmem
    tmpdir: /scr
    qargs: --partition=cpu --nodes=1 --ntasks-per-node=%NPROC% --mem-per-cpu 8G
    processors: 500
    processors_per_node: 4
  - name: gpu
    tmpdir: /scr
    qargs: --partition=gpu --nodes=1 --ntasks-per-node=%NPROC% --gres=gpu:%NPROC%
    processors: 100
    gpgpu:
      - index: 0
        description: Tesla T4
      - index: 1
        description: Tesla T4
    cuda_cores: 2560
Here is an example ‘hosts.yml’ file for SGE submissions with four host entries (‘driver’, ‘cpu’, ‘cpu_highmem’, and ‘gpu’):
entries:
  - name: driver
    tmpdir: /scr
    qargs: -q cpu_driver.q -pe smp %NPROC%
    processors: 2000
    processors_per_node: 4
    env:
      - SCHRODINGER_MAX_LAUNCH_CONCURRENCY=4
  - name: cpu
    tmpdir: /scr
    qargs: -q cpu.q -pe smp %NPROC%
    processors: 2000
    processors_per_node: 8
  - name: cpu_highmem
    tmpdir: /scr
    qargs: -q cpu.q -pe smp %NPROC% -l h_vmem=8G
    processors: 500
    processors_per_node: 4
  - name: gpu
    tmpdir: /scr
    qargs: -q gpu.q -pe smp %NPROC% -l gpu=%NPROC%
    processors: 100
    gpgpu:
      - index: 0
        description: Tesla T4
      - index: 1
        description: Tesla T4
    cuda_cores: 2560
Here is an example ‘hosts.yml’ file for PBS Pro submissions with four host entries (‘driver’, ‘cpu’, ‘cpu_highmem’, and ‘gpu’):
entries:
  - name: driver
    tmpdir: /scr
    qargs: -q cpu_driver -l select=1:ncpus=%NPROC%
    processors: 2000
    processors_per_node: 4
    env:
      - SCHRODINGER_MAX_LAUNCH_CONCURRENCY=4
  - name: cpu
    tmpdir: /scr
    qargs: -q cpu -l select=1:ncpus=%NPROC%
    processors: 2000
    processors_per_node: 8
  - name: cpu_highmem
    tmpdir: /scr
    qargs: -q cpu -l select=1:ncpus=%NPROC%:pmem=8gb
    processors: 500
    processors_per_node: 4
  - name: gpu
    tmpdir: /scr
    qargs: -q gpu -l select=1:ncpus=%NPROC%:ngpus=%NPROC%
    processors: 100
    gpgpu:
      - index: 0
        description: Tesla T4
      - index: 1
        description: Tesla T4
    cuda_cores: 2560
Here is an example ‘hosts.yml’ file for LSF submissions with four host entries (‘driver’, ‘cpu’, ‘cpu_highmem’, and ‘gpu’):
entries:
  - name: driver
    tmpdir: /scr
    qargs: -q cpu_driver -n %NPROC% -R "span[hosts=1]"
    processors: 2000
    processors_per_node: 4
    env:
      - SCHRODINGER_MAX_LAUNCH_CONCURRENCY=4
  - name: cpu
    tmpdir: /scr
    qargs: -q cpu -n %NPROC% -R "span[hosts=1]"
    processors: 2000
    processors_per_node: 8
  - name: cpu_highmem
    tmpdir: /scr
    qargs: -q cpu -n %NPROC% -R "span[hosts=1]" -R "rusage[mem=8192]"
    processors: 500
    processors_per_node: 4
  - name: gpu
    tmpdir: /scr
    qargs: -q gpu -n %NPROC% -R "span[hosts=1]" -R "rusage[ngpus_excl_p=1]"
    processors: 100
    gpgpu:
      - index: 0
        description: Tesla T4
      - index: 1
        description: Tesla T4
    cuda_cores: 2560
Validate the hosts.yml file
Run the following command to check that the created or configured hosts.yml file is valid YAML:
sudo -u jobserver $SCHRODINGER/jsc admin check-hosts-config <jobserver_dir>/config/hosts.yml
Reload without restarting
To update Job Server to reflect the changes to the host config without restarting the server, run:
sudo -u jobserver $SCHRODINGER/jsc admin reload-hosts <hostname>
Situations to consider
Some common situations are important to consider when setting up Job Server:
Specify compute Schrödinger installations
During Job Server installation there was an option to specify the parent directory of Schrödinger installations. If that step was skipped, or if you have multiple parent directories of Schrödinger installations, follow the steps in Specify compute Schrödinger installations.
Multiple Clusters with their own Job Server
If you have multiple clusters, each with its own Job Server, and users will be submitting jobs to all of them, the host entry names must be unique across all of those Job Servers. For example, you can’t have a ‘cpu’ entry in more than one Job Server configuration, because the client would not be able to determine the cluster to which the job should be submitted. One solution is to make the names unique by prefixing them with cluster labels, such as ‘cluster1_cpu’ and ‘cluster2_cpu’. This also makes it clearer to users which compute resources they are choosing for their jobs.
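For illustration, the two clusters’ hosts.yml files might then begin as follows (entry names and scratch paths are illustrative):

# cluster1: <jobserver_dir>/config/hosts.yml
entries:
  - name: cluster1_cpu
    tmpdir: /scr
    # ... remaining keywords as usual

# cluster2: <jobserver_dir>/config/hosts.yml
entries:
  - name: cluster2_cpu
    tmpdir: /scr
    # ... remaining keywords as usual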
Converting prior versions of Job Server from schrodinger.hosts to hosts.yml
Customers who were using Job Server prior to the 2025-3 release should already have a schrodinger.hosts file configured in one or more Schrödinger installation directories. One of these files can be converted to the Job Server <jobserver_dir>/config/hosts.yml file by running the command:
sudo -u jobserver $SCHRODINGER/jsc admin convert-schrodinger-hosts --job-server-dir=<jobserver_dir> --schrod=<schrodinger_installation_with_hosts_file>
This command will create the <jobserver_dir>/config/hosts.yml file.
Note that there are a number of ‘schrodinger.hosts’ keywords that are no longer supported/relevant for the centralized Job Server ‘hosts.yml’ hosts configuration (e.g., ‘schrodinger’, ‘queue’, ‘base’). Also, this tool will convert all the host entries it finds in the file (except ‘localhost’), so you might need to remove those that don’t pertain to the cluster for which you’re configuring Job Server.
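For orientation, here is a sketch of what the conversion might produce for one typical entry; the values are illustrative and the exact output may differ:

# schrodinger.hosts entry (before conversion):
name: cpu
queue: SLURM2.1
qargs: --partition=cpu --nodes=1 --ntasks-per-node=%NPROC%
processors: 2000
tmpdir: /scr

# Resulting hosts.yml entry; the no-longer-supported 'queue' keyword is dropped:
entries:
  - name: cpu
    tmpdir: /scr
    qargs: --partition=cpu --nodes=1 --ntasks-per-node=%NPROC%
    processors: 2000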
If you do edit the hosts.yml file, be sure to validate it as described in Validate the hosts.yml file above.
Modifying host entries for an active Job Server
If the ‘hosts.yml’ file is modified (e.g., to add, remove, or alter entries) after Job Server has been started, Job Server must be reloaded by running the following command before the changes take effect:
sudo -u jobserver $SCHRODINGER/jsc admin reload-hosts <hostname>