Set up the Host Entries on Job Server

Job Server creates and submits batch jobs to the queuing system for users. Therefore, it is necessary to update the Job Server configuration file with “host entries” used for Schrödinger jobs.

 

REQUIRED - It is necessary to:

  • set up the name of the host machine that will be visible to users submitting jobs (name)

  • set up the directory for temporary or scratch files (tmpdir) on a file system that:

    • has sufficient disk space (see System Requirements)

    • is mounted locally

    • is writable by all the users that will use this Job Server

 

You can also add optional specifications such as:

  • CPU vs. GPU resource requests

  • The compute node scratch space locations

  • The queues/partitions for submissions

  • Memory usage requests

  • Environment variables set in the compute node execution environment

 

 

New Job Server Set-up

Follow the steps below to set up Job Server for the first time:

  1. Name the Job Server Host Entries

  2. Validate the hosts.yml file

 

Situations to Consider

Some common situations that are important to consider when setting up Job Server:

  • Specifying compute Schrödinger installations

  • Multiple clusters, each with their own Job Server

  • Converting prior versions of Job Server from schrodinger.hosts to hosts.yml

  • Modifying host entries for an active Job Server

 

 

Name the Job Server Host Entries

The name/label for a host entry should distinguish its purpose from other entries, because only the name is exposed in Maestro Job Settings dialogs or used in command line invocations (e.g., ‘-HOST <host_entry_name>’). For example, ‘cpu_short’, ‘cpu_highmem’, ‘driver’ (i.e., long-running), or ‘gpu’.

The host entries are specified via a Job Server configuration file:

<jobserver_dir>/config/hosts.yml

Each host entry block consists of ‘name’, ‘tmpdir’, and optionally one or more other keywords:

 

Keyword descriptions:

name
Required. The name/label of the host entry; it is exposed to users submitting jobs from Maestro or the command line (e.g., via the -HOST argument).

tmpdir
Required. Base directory for temporary or scratch files, also called the scratch directory. The file system on which this directory is mounted should be large enough for the largest temporary files, should be mounted locally, and should be writable by all the users that will use this Job Server.

qargs
Arguments passed to the batch submission command (e.g., ‘sbatch’, ‘qsub’, ‘bsub’). Typically used for specifying the queue/partition for the job, making CPU/GPU/memory/time resource requests, etc. The exact arguments depend on both the queuing system type and the specific cluster configuration.

Default: “” (none)

processors
This value 1) limits the number of processors a user can request per job in Maestro and, for some applications, 2) sets the maximum number of subjobs that will run simultaneously when no specific number of processors is requested. It is often set to the number of cores (or GPU devices, for GPU entries) available in the cluster.

Default: 1

processors_per_node
The number of processors (cores) per node available to a batch queue. This setting is used by applications that support threaded parallel execution (OpenMP).

Default: 1

env
Environment variables to be set on the host at runtime, specified as variable=value. A host entry can have multiple environment variable specification lines in the env block. Note that the value is parsed literally, with spaces or quotes interpreted as part of the environment variable’s value.

Default: [] (none)

cuda_cores
For GPU entries, the number of CUDA cores per GPU for the GPUs that can be used by jobs submitted via the host entry. This keyword is needed if License Checking is enabled so that the correct number of license resources can be requested (see Setting Up License Checking for Queueing Systems).

Default: 0 (i.e., defer to the application default of 5120, the number of CUDA cores in a Tesla V100)

gpgpu
Specifies that the host entry can be used for GPU submissions. If each compute node has multiple GPUs, the gpgpu block will have multiple lines, one for each GPU. Each line has the form index: <index>, description: <description>, where <index> is the numerical GPU id (usually starting from 0) and <description> is a textual description of the GPU, e.g., Tesla V100. Maestro uses these lines to classify host entries as either CPU or GPU, because an application panel only exposes host entries suitable for the type(s) of compute resources it will use. Maestro also uses the number of gpgpu specifications to limit the number of GPUs that can be requested for jobs/subjobs running on a single compute node.

Note that the presence of gpgpu lines does not itself request GPU resources from the queuing system; that must be done via qargs.

Default: []

 

The qargs can include a special %NPROC% macro that is substituted at submission time with the actual number of CPUs or GPUs requested for the batch job. This allows jobs of varying parallelization to be submitted using the same host entry.
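As an illustration (this entry is hypothetical), consider a Slurm host entry whose ‘qargs’ use the macro:

  - name: cpu
    tmpdir: /scr
    qargs: --partition=cpu --nodes=1 --ntasks-per-node=%NPROC%

A job submitted through this entry with 8 processors requested is handed to the queuing system as if ‘qargs’ had been written ‘--partition=cpu --nodes=1 --ntasks-per-node=8’.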

 

Sample hosts.yml files

Below are example ‘hosts.yml’ files for the Schrödinger-supported queueing systems, each containing commonly used host entries: a workflow driver job (‘driver’), a basic CPU job (‘cpu’), a CPU job with a more specific memory requirement (‘cpu_highmem’), and a basic GPU job (‘gpu’).

Taking the ‘driver’ host entry for SLURM as an example: when a user submits a job with the ‘-HOST driver’ command-line flag, the configuration directs the job’s temporary or scratch files to the ‘/scr’ directory on the compute node. The entry interacts with the SLURM queueing system by passing a specific set of arguments via the ‘qargs’ section, so that SLURM submits the job to the cluster partition named ‘cpu_driver’ and confines the entire job to a single compute node (‘--nodes=1’). The number of tasks or cores requested on that node dynamically matches the number of processors requested for the job, via the %NPROC% macro (‘--ntasks-per-node=%NPROC%’).

For resource allocation, the configuration informs Schrödinger that each node in this partition has 4 processors available for threaded applications (‘processors_per_node: 4’), while also limiting a single job to 2000 total processors (‘processors: 2000’). Finally, the ‘env’ section sets the environment variable ‘SCHRODINGER_MAX_LAUNCH_CONCURRENCY’ in the job’s runtime environment.

For a complete description of all available keywords and their functions, refer to the keyword table above.

Here is an example ‘hosts.yml’ file for Slurm submissions with four host entries (‘driver’, ‘cpu’, ‘cpu_highmem’, and ‘gpu’):

entries:

  - name: driver
    tmpdir: /scr
    qargs: --partition=cpu_driver --nodes=1 --ntasks-per-node=%NPROC%
    processors: 2000
    processors_per_node: 4
    env:
      - SCHRODINGER_MAX_LAUNCH_CONCURRENCY=4

  - name: cpu
    tmpdir: /scr
    qargs: --partition=cpu --nodes=1 --ntasks-per-node=%NPROC%
    processors: 2000
    processors_per_node: 8

  - name: cpu_highmem
    tmpdir: /scr
    qargs: --partition=cpu --nodes=1 --ntasks-per-node=%NPROC% --mem-per-cpu 8G
    processors: 500
    processors_per_node: 4

  - name: gpu
    tmpdir: /scr
    qargs: --partition=gpu --nodes=1 --ntasks-per-node=%NPROC% --gres=gpu:%NPROC%
    processors: 100
    gpgpu:
      - index: 0
        description: Tesla T4
      - index: 1
        description: Tesla T4
    cuda_cores: 2560

Here is an example ‘hosts.yml’ file for SGE submissions with four host entries (‘driver’, ‘cpu’, ‘cpu_highmem’, and ‘gpu’):

entries:

  - name: driver
    tmpdir: /scr
    qargs: -q cpu_driver.q -pe smp %NPROC%
    processors: 2000
    processors_per_node: 4
    env:
      - SCHRODINGER_MAX_LAUNCH_CONCURRENCY=4

  - name: cpu
    tmpdir: /scr
    qargs: -q cpu.q -pe smp %NPROC%
    processors: 2000
    processors_per_node: 8

  - name: cpu_highmem
    tmpdir: /scr
    qargs: -q cpu.q -pe smp %NPROC% -l h_vmem=8G
    processors: 500
    processors_per_node: 4

  - name: gpu
    tmpdir: /scr
    qargs: -q gpu.q -pe smp %NPROC% -l gpu=%NPROC%
    processors: 100
    gpgpu:
      - index: 0
        description: Tesla T4
      - index: 1
        description: Tesla T4
    cuda_cores: 2560

Here is an example ‘hosts.yml’ file for PBS Pro submissions with four host entries (‘driver’, ‘cpu’, ‘cpu_highmem’, and ‘gpu’):

entries:

  - name: driver
    tmpdir: /scr
    qargs: -q cpu_driver -l select=1:ncpus=%NPROC%
    processors: 2000
    processors_per_node: 4
    env:
      - SCHRODINGER_MAX_LAUNCH_CONCURRENCY=4

  - name: cpu
    tmpdir: /scr
    qargs: -q cpu -l select=1:ncpus=%NPROC%
    processors: 2000
    processors_per_node: 8

  - name: cpu_highmem
    tmpdir: /scr
    qargs: -q cpu -l select=1:ncpus=%NPROC%:pmem=8gb
    processors: 500
    processors_per_node: 4

  - name: gpu
    tmpdir: /scr
    qargs: -q gpu -l select=1:ncpus=%NPROC%:ngpus=%NPROC%
    processors: 100
    gpgpu:
      - index: 0
        description: Tesla T4
      - index: 1
        description: Tesla T4
    cuda_cores: 2560

Here is an example ‘hosts.yml’ file for LSF submissions with four host entries (‘driver’, ‘cpu’, ‘cpu_highmem’, and ‘gpu’):

entries:

  - name: driver
    tmpdir: /scr
    qargs: -q cpu_driver -n %NPROC% -R "span[hosts=1]"
    processors: 2000
    processors_per_node: 4
    env:
      - SCHRODINGER_MAX_LAUNCH_CONCURRENCY=4

  - name: cpu
    tmpdir: /scr
    qargs: -q cpu -n %NPROC% -R "span[hosts=1]"
    processors: 2000
    processors_per_node: 8

  - name: cpu_highmem
    tmpdir: /scr
    qargs: -q cpu -n %NPROC% -R "span[hosts=1]" -R "rusage[mem=8192]"
    processors: 500
    processors_per_node: 4

  - name: gpu
    tmpdir: /scr
    qargs: -q gpu -n %NPROC% -R "span[hosts=1]" -R "rusage[ngpus_excl_p=1]"
    processors: 100
    gpgpu:
      - index: 0
        description: Tesla T4
      - index: 1
        description: Tesla T4
    cuda_cores: 2560

NOTE: The SCHRODINGER_MAX_LAUNCH_CONCURRENCY environment variable limits the number of concurrent threads that can be used by a Schrödinger application for launching subjobs. Learn more about Job Server Environment Variables.

 

Validate the hosts.yml file

Run the following command to check that the created or edited hosts.yml file is valid with respect to YAML syntax:

sudo -u jobserver $SCHRODINGER/jsc admin check-hosts-config <jobserver_dir>/config/hosts.yml

 

Reload without restarting

To update Job Server to reflect the changes to the host config without restarting the server, run:

sudo -u jobserver $SCHRODINGER/jsc admin reload-hosts <hostname>

 


 

 

Situations to Consider

 

Specify compute Schrödinger installations

During Job Server installation, there was an option to specify the parent directory of Schrödinger installations. If that step was skipped, or if you have multiple parent directories of Schrödinger installations, follow the steps in Specify compute Schrödinger installations.

 

Multiple Clusters with their own Job Server

If you have multiple clusters, each with its own Job Server, and users will be submitting jobs to all of them, the host entry names must be unique across all of those Job Servers. For example, you can’t have a ‘cpu’ entry in more than one Job Server configuration, because the client would not be able to determine the cluster to which the job should be submitted. One solution is to make the names unique by prefixing them with cluster labels, such as ‘cluster1_cpu’ and ‘cluster2_cpu’. This also makes it clearer to users which compute resources they are choosing for their jobs.
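As an illustrative sketch (names and partitions are hypothetical), the two clusters’ ‘hosts.yml’ files might contain:

Cluster 1:

entries:

  - name: cluster1_cpu
    tmpdir: /scr
    qargs: --partition=cpu --nodes=1 --ntasks-per-node=%NPROC%

Cluster 2:

entries:

  - name: cluster2_cpu
    tmpdir: /scr
    qargs: --partition=cpu --nodes=1 --ntasks-per-node=%NPROC%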

 

Converting prior versions of Job Server from schrodinger.hosts to hosts.yml

Customers who were using Job Server prior to the 2025-3 release should already have a schrodinger.hosts file configured in one or more Schrödinger installation directories. One of these files can be converted to the Job Server <jobserver_dir>/config/hosts.yml file by running the command:

sudo -u jobserver $SCHRODINGER/jsc admin convert-schrodinger-hosts --job-server-dir=<jobserver_dir> --schrod=<schrodinger_installation_with_hosts_file>

This command will create the <jobserver_dir>/config/hosts.yml file.

 

Note that there are a number of ‘schrodinger.hosts’ keywords that are no longer supported/relevant for the centralized Job Server ‘hosts.yml’ hosts configuration (e.g., ‘schrodinger’, ‘queue’, ‘base’). Also, this tool will convert all the host entries it finds in the file (except ‘localhost’), so you might need to remove those that don’t pertain to the cluster for which you’re configuring Job Server.

If you do edit the hosts.yml file, be sure to validate it again (see Validate the hosts.yml file).

 

Modifying host entries for an active Job Server

If the ‘hosts.yml’ file is modified (e.g., to add, remove, or alter entries) after Job Server has been started, Job Server must be reloaded for the changes to take effect, by running the following command:

sudo -u jobserver $SCHRODINGER/jsc admin reload-hosts <hostname>