Configure queuing system license checking with flexible GPU licensing

 

For the queuing-system-based license checking mechanism (see Setting Up License Checking for Queueing Systems) to reserve the correct number of licenses, each GPU host entry must be configured to submit to one specific GPU type, and it must include a field that specifies the number of CUDA cores available on a single GPU of that type. (See the table below.)

In most cases, this just means adding a single line to each existing GPU entry in your schrodinger.hosts file:

cuda_cores: [enter number here]

For example, a host entry for a Slurm queue that submits to a node with 4 V100 GPUs would look like this:

name: slurm-v100-gpu
host: headnode.mycluster.com
schrodinger: /opt/bin/schrodinger2022-2
queue: SLURM2.1
qargs: --partition=SLURM-partition --ntasks=%NPROC% --gres=gpu:type:%NPROC%
tmpdir: /usr/local/tmp
gpgpu: 0, Tesla V100
gpgpu: 1, Tesla V100
gpgpu: 2, Tesla V100
gpgpu: 3, Tesla V100
cuda_cores: 5120
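
The two requirements above (one GPU type per host entry, and a cuda_cores field) can be checked mechanically. The following is a minimal Python sketch, not a Schrödinger utility: the function names parse_hosts and check_gpu_entries are hypothetical, and it assumes that entries consist of "key: value" lines, that each entry starts at a "name:" line, and that lines beginning with "#" are comments.

```python
def parse_hosts(text):
    """Split schrodinger.hosts-style text into entries.

    Assumption: every entry begins with a 'name:' line and consists of
    'key: value' lines. Each entry maps a key to the list of its values,
    because keys such as 'gpgpu' repeat within one entry.
    """
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'key: value' line
        key, value = key.strip().lower(), value.strip()
        if key == "name":
            current = {}
            entries.append(current)
        if current is None:
            continue  # ignore settings that appear before the first entry
        current.setdefault(key, []).append(value)
    return entries


def check_gpu_entries(text):
    """Return a list of problems found in GPU host entries."""
    problems = []
    for entry in parse_hosts(text):
        gpus = entry.get("gpgpu", [])
        if not gpus:
            continue  # not a GPU entry
        name = entry["name"][0]
        # 'gpgpu: 0, Tesla V100' -> model 'Tesla V100'
        models = {g.split(",", 1)[1].strip() if "," in g else g for g in gpus}
        if len(models) > 1:
            problems.append(f"{name}: mixed GPU types {sorted(models)}")
        cores = entry.get("cuda_cores", [])
        if not cores:
            problems.append(f"{name}: missing cuda_cores")
        elif not cores[0].isdigit():
            problems.append(f"{name}: cuda_cores is not an integer: {cores[0]!r}")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_hosts.py /path/to/schrodinger.hosts
    with open(sys.argv[1]) as handle:
        for problem in check_gpu_entries(handle.read()):
            print(problem)
```

Running the script against a hosts file prints one line per problem and prints nothing when every GPU entry has a single GPU type and an integer cuda_cores value. It does not check that the value matches the table below.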

 

How do I find the 'cuda_cores' value?

  1. Log in to the GPU compute node.
  2. Run the following command:

    $SCHRODINGER/utilities/query_gpgpu -a

  NOTE: From the 2024-2 release onward, use the "Discounted Cores" value reported by this command as the "cuda_cores" value. Before the 2024-2 release, use the "Total Cores" value.
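
The release rule in the note can be written as a small helper. This is an illustrative Python sketch only: the function names parse_release and cuda_cores_field are hypothetical, and it assumes release names of the form "YYYY-N" such as 2024-2. It does not parse the query_gpgpu output, because that format is not shown here.

```python
def parse_release(release):
    """Convert a release name such as '2024-2' into a comparable tuple (2024, 2)."""
    year, _, number = release.strip().partition("-")
    return int(year), int(number)


def cuda_cores_field(release):
    """Name the query_gpgpu field to copy into cuda_cores for a given release.

    Per the note: 2024-2 and later use "Discounted Cores";
    earlier releases use "Total Cores".
    """
    if parse_release(release) >= (2024, 2):
        return "Discounted Cores"
    return "Total Cores"
```

For example, cuda_cores_field("2022-2") selects "Total Cores", which matches the schrodinger2022-2 installation used in the host entry example.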

     

GPU Card                MIG Instance  CUDA Cores  License Usage  Notes
----------------------  ------------  ----------  -------------  --------------------------------------------------
P100                    -             3584        0.69           CUDA cores refer to the PCIe model (not SXM or NVL)
P40                     -             3540        0.75
V100                    -             5120        1.00           CUDA cores refer to the PCIe model (not SXM or NVL)
T4                      -             2560        0.50
A100                    -             6912        1.38           CUDA cores refer to the PCIe model (not SXM or NVL)
A100                    3g            2688        0.50           CUDA cores refer to the PCIe model (not SXM or NVL)
A100                    2g            1792        0.38           CUDA cores refer to the PCIe model (not SXM or NVL)
A100                    1g            869         0.19           CUDA cores refer to the PCIe model (not SXM or NVL)
L4                      -             5120        1.00           L4 prior to Schrödinger release 2024-2 is 7424
H100                    -             14592       2.88           CUDA cores refer to the PCIe model (not SXM or NVL)
H100                    3g            5888        1.13           CUDA cores refer to the PCIe model (not SXM or NVL)
H100                    2g            3840        0.75           CUDA cores refer to the PCIe model (not SXM or NVL)
H100                    1g            1792        0.38           CUDA cores refer to the PCIe model (not SXM or NVL)
B200 (SXM)              -             18944       3.7
B200 (SXM)              3g            8960        1.75
B200 (SXM)              2g            4608        0.9
B200 (SXM)              1g            2304        0.45
Quadro P5000            -             2560        0.50
Quadro RTX 5000         -             3072        0.63
Quadro RTX A5000        -             8192        1.63
Quadro RTX A4000        -             6144        1.19
RTX PRO 6000 Blackwell  -             24064       4.7
RTX PRO 6000 Blackwell  2g            12032       2.35
RTX PRO 6000 Blackwell  1g            5888        1.15