Active Learning Glide

System Requirements

Supported Operating Systems

NOTE: Available on Linux only. See System Requirements for Linux.

Hardware Requirements

Driver

The driver (master job) must run for the complete duration of the job without interruption, so it cannot run on a spot or preemptible cloud instance: such nodes can be preempted (terminated), and if that happens the entire job is lost.

The -DRIVERHOST argument determines where the driver runs. Select a host entry for an on-demand (i.e., non-preemptible) node type.

If sufficient licenses and computational resources are available to run multiple Active Learning Glide jobs simultaneously, configure the driver host entry to request an entire node. This prevents multiple drivers from sharing the same node and scratch filesystem, which would double (or more) the space requirement.
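For reference, a driver host entry in schrodinger.hosts might look like the following sketch. The entry name, host, queue type, and queue arguments are placeholders that depend on your cluster; the --exclusive qargs flag (for SLURM) is one way to request an entire node:

  # Hypothetical schrodinger.hosts entry for an on-demand driver node.
  # All values below are site-specific placeholders.
  name: al_glide_driver
  host: headnode.example.com
  queue: SLURM2.1
  qargs: --partition=on-demand --exclusive
  processors: 8
  tmpdir: /scratch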

Processor (CPU)

x86_64 compatible processor

For large jobs, computing on a cluster with a queueing system is recommended, with the following hardware components:

  • A highly capable file server for the external network.

  • Shared storage for the intra-cluster network, to reduce traffic to and from the external network.

  • Fast processors, large memory, and high-quality motherboards and network interfaces, especially on the management nodes.

System memory (RAM)

64 GB memory for the entire node

The RAM requirement does not depend on the input file size; only the disk space requirement does.
Disk space

The amount of scratch space required on the DRIVERHOST scales with the size of the input ligand file. See the example below.

Scratch space example

Scratch requirements for an example are provided below. All parameters are consistent with our recommendations for an ultra-large screen with AL-Glide.

For larger input files, please substitute the size of the input file to obtain correct estimates for your jobs.

Example of requirements based on inputs

Inputs for the example (combined into a sample command line below):
  • 1 billion drug-like ligands in SMILES format (100 GB)

  • 3 iterations of active learning (-iter 3)

  • A training batch size of 50,000 ligands (-train_size 50000)

  • The top 100M ligands retained after each iteration

  • Rescoring of the top 1M ligands with Glide SP (-num_rescore_ligands 1000000)

  • Output poses written in Maestro format for the rescored ligands (-write_pose)
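Taken together, these inputs correspond to an invocation along the following lines. This is a sketch only: the launch script name (al_glide_launcher.py) and the host entry names are placeholders, and -HOST is the standard Schrödinger job-control option for the subjob host; consult the AL-Glide documentation for the exact command at your site.

  # Hypothetical command line combining the example inputs above
  $SCHRODINGER/run al_glide_launcher.py ligands.smi \
      -iter 3 \
      -train_size 50000 \
      -num_rescore_ligands 1000000 \
      -write_pose \
      -DRIVERHOST on_demand_driver \
      -HOST preemptible_queue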

Required:

  • A single copy of the input file: 100 GB

  • The input ligand file split into individual sub-job input batches: 100 GB

  • A series of CSV files containing the predictions for the top 10% of each batch (sorted by uncertainty), used to select input ligands for each iteration of training: 30 GB per iteration

  • A series of CSV files containing the ligand_ml predictions for the ligands in all batches: 100 GB per iteration

  • An output file for each iteration of training containing the predictions for the number of top-scoring compounds specified by the -keep command-line argument: 30 GB

Optional:

  • If -num_rescore_ligands is specified, a single CSV file containing the top poses rescored with Glide SP, up to the number specified by -num_rescore_ligands: 200 MB

  • If -write_pose is provided, a Maestro file containing the poses of the rescored ligands: 2 GB

Total disk space required:

  • Required outputs only: 620 GB (= 100 + 100 + 130×3 + 30) for 3 iterations

  • With optional outputs: 622.2 GB (= 100 + 100 + 130×3 + 30 + 0.2 + 2)
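As a rough rule of thumb, and assuming (per the note above) that each component scales linearly with the input file size, the total can be estimated as in the sketch below. The 30 GB figures are taken from this example and should be re-measured for your own library:

  # Back-of-envelope scratch estimate in GB.
  # Assumptions: linear scaling with input size; ~30 GB of per-iteration
  # CSV overhead and ~30 GB of final output, as in the example above.
  input_gb=100                           # size of the input SMILES file
  iters=3                                # value passed to -iter
  per_iter=$(( input_gb + 30 ))          # predictions + top-10% CSVs per iteration
  total=$(( 2*input_gb + per_iter*iters + 30 ))
  echo "~${total} GB of scratch required"   # prints "~620 GB" here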

 

NOTE: The maximum recommended value for -train_size is 100,000.

Benchmarks show that the enrichment score and recovery rate become asymptotic around a training size of 100,000; there is no indication that larger training sets improve performance, even with libraries on the order of billions of compounds.

Subjob Requirements

Requirements for memory, disk space, and recommended Google Cloud instance types are listed below.

All values are based on the example workflow described above.

                                     ML Training*           ML Evaluation          Glide Docking
  Scratch space                      200 GB                 100 GB                 100 GB
  Memory                             64 GB (8 GB/CPU core)  32 GB (4 GB/CPU core)  32 GB (4 GB/CPU core)
  Compatible with preemptible nodes  No                     Yes                    Yes
  Recommended GCP node type          n1-highmem-8           n2-standard-8          n2-standard-8

* NVIDIA Tesla T4 GPUs recommended.
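For illustration, a preemptible worker for the Glide docking subjobs matching the table above could be provisioned with a gcloud command along these lines; the instance name, zone, and disk size are placeholders:

  # Hypothetical example: preemptible n2-standard-8 worker for Glide docking subjobs
  gcloud compute instances create glide-worker-1 \
      --machine-type=n2-standard-8 \
      --preemptible \
      --zone=us-central1-a \
      --boot-disk-size=100GB   # sized to the 100 GB scratch requirement above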

GPGPU Requirements

(General-purpose computing on graphics processing units)

We support the following NVIDIA solutions:

  Architecture    Server / HPC             Workstation
  Pascal          Tesla P40, Tesla P100    Quadro P5000
  Volta           Tesla V100               (none)
  Turing          Tesla T4                 Quadro RTX 5000
  Ampere          A100                     RTX A4000, RTX A5000
  Ada Lovelace    L4                       RTX 4000 SFF Ada, RTX 2000 Ada
  Hopper          H100                     (none)
  Blackwell       B200 (SXM)               RTX PRO 6000 Blackwell Workstation, RTX PRO 4000 Blackwell SFF

Unless otherwise specified, we only support and test on the PCIe variant of the cards listed above.

To check the compute capability of NVIDIA cards, see NVIDIA CUDA GPU Compute Capability.

Supported Linux drivers

  • We support only the NVIDIA-recommended / certified / "production branch" Linux drivers for these cards, with a minimum CUDA version of 12.0. Download from NVIDIA's Drivers webpage. A quick way to verify the installed driver is shown below.
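To verify that an installed driver meets this requirement, nvidia-smi reports both the driver version and the highest CUDA version it supports; on recent drivers the compute capability can be queried directly as well:

  # The nvidia-smi banner shows "Driver Version" and "CUDA Version";
  # the CUDA Version field should read 12.0 or higher.
  nvidia-smi

  # On recent drivers, query name, driver version, and compute capability:
  nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv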

Supported Multi-Instance GPU (MIG)

Pre-configured Schrödinger compatible GPU boxes

  • For information on pre-configured Schrödinger compatible GPU boxes see this article.

Notes

  • Standard support does not cover consumer-level GPU cards such as GeForce GTX cards. Learn more about our rigorous validation process and why we exclusively support professional-grade NVIDIA hardware in this article.
  • If you already have another NVIDIA GPGPU and would like to know if we have experience with it, please contact our support at help@schrodinger.com.