flowchart TD
step_1("Literature review")
step_2("Goal confirmation")
step_3a("Select and download
a ligand library")
step_3b("Library file conversions")
step_3c("Library profiling")
step_4a("Property filtering
with the
library design command")
step_4b("PDF generation")
step_4c("Filter with CanvasSearch")
step_4d("Filter with
ligand filter files
during LigPrep")
step_5a("Select a library subset
for pilot screening")
step_5b("Add actives to pilot
library for model validation")
step_5c("Calculate enrichment
post pilot screen")
step_6("Screening workflows")
step_1 --> step_2
subgraph three[" "]
subgraph three_step["Library selection & profiling"]
step_3a --> step_3b
step_3b --> step_3c
end
end
subgraph four[" "]
subgraph four_step["Filtering & labeling"]
step_4a --> step_4b
step_4b --> step_4c
step_4c --> step_4d
end
end
subgraph five[" "]
subgraph five_step["Prepare for pilot screen validation run"]
step_5a --> step_5b
step_5b --> step_5c
end
end
step_2 --> three
three <--Realign/adjust goal--> four
four <--Expand or change library--> five
five<--Expand or adjust filters-->step_6
classDef path_title stroke-width:2px,fill:#12122c,stroke:#12122c
classDef simple_step stroke-width:2px,fill:#12122c,stroke:#12122c
classDef simple_step_3 stroke-width:2px,fill:#005aaa,stroke:#005aaa
classDef simple_step_4 stroke-width:2px,fill:#4cb748,stroke:#4cb748
classDef simple_step_5 stroke-width:2px,fill:#f37c28,stroke:#f37c28
classDef subgraph_3 font-size:1.1rem,stroke-width:0px,fill:#def2fb
classDef subgraph_4 font-size:1.1rem,stroke-width:0px,fill:#baddae
classDef subgraph_5 font-size:1.1rem,stroke-width:0px,fill:#fbc69d
classDef subgraph_main_3 stroke-width:0px,fill:#def2fb
classDef subgraph_main_4 stroke-width:0px,fill:#BADDAE
classDef subgraph_main_5 stroke-width:0px,fill:#fbc69d
class step_Library_Design path_title
class step_1,step_2,step_6a,step_6b,step_6 simple_step
class step_3,step_3a,step_3b,step_3c simple_step_3
class step_4,step_4a,step_4b,step_4c,step_4d simple_step_4
class step_5,step_5a,step_5b,step_5c simple_step_5
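class three_step subgraph_3
class four_step subgraph_4
class five_step subgraph_5
class three subgraph_main_3
class four subgraph_main_4
class five subgraph_main_5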
Learning Path: Library Design
Literature Review
Before a screening campaign begins, all relevant literature and data are gathered to guide library design and project decisions. This stage involves collaboration among medicinal, computational, and synthetic chemists, biologists, and project leaders.
During the literature review, experts analyze the biological target, identifying past research, liabilities, and reasons for failures. Searches in patents, the PDB, and ChEMBL uncover existing compounds, IP concerns, synthetic challenges, false positives, and ADME optimization opportunities, informing later scaffold filtering. Gathering published 3D models and SAR aids ligand design and library selection by highlighting key protein-ligand interactions and binding site characteristics, refining filtering criteria and modeling constraints.
Goal Confirmation
The goal confirmation stage ensures team alignment before a screening campaign by validating the target selection rationale and defining hit selection criteria. For instance, the virtual screen should be designed to meet library size and diversity goals while addressing medicinal chemistry requirements. This stage also assesses whether the screening strategy and library design filters are likely to yield meaningful discoveries. Throughout goal confirmation, literature insights are revisited to refine the approach. By setting clear criteria, only relevant compounds are included in the ligand library, improving the project's chances of success.
Library selection & profiling
Select and download a ligand library
When selecting a ligand library, researchers consider diversity, property space, availability, metadata, and size to align with project goals. Libraries are typically downloaded as SDF files from public databases like ZINC and ChEMBL, purchased from vendors, or designed in-house. Some libraries contain virtual compounds generated through enumeration, while others consist of real, synthesizable compounds, which are preferred because they give actionable results and transition smoothly to lab-based assays.
Library file conversions
Vendors typically provide ligand libraries in formats like SDF or SMILES, but for effective prefiltering during library design, these large files often need to be condensed for streamlined 2D editing and analysis.
Library profiling
Ligand profiling analyzes composition, adds calculated properties, and processes millions of compounds in parallel using library_analysis.py. It requires a CSV file with a “SMILES” header and takes ~4 CPU hours per million compounds using RDKit for 2D properties. Running in 3D mode with LigPrep enables pKa calculations but is much slower (~800 CPU hours per million compounds). A YAML file can control batch size, host settings, and LigPrep parameters.
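The throughput figures above translate directly into a resource estimate. The following is a minimal sketch of that arithmetic, not part of library_analysis.py; the function name, the 200-CPU example, and the assumption of perfect parallel scaling are illustrative only.

```python
# Approximate costs quoted in the text, in CPU hours per million compounds.
CPU_HOURS_PER_MILLION = {"2d": 4.0, "3d": 800.0}


def estimate_wall_hours(n_compounds: int, mode: str, n_cpus: int) -> float:
    """Estimate wall-clock hours, assuming perfect parallel scaling (an idealization)."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    cpu_hours = CPU_HOURS_PER_MILLION[mode] * n_compounds / 1_000_000
    return cpu_hours / n_cpus


if __name__ == "__main__":
    # Hypothetical scenario: a 10 million compound library on 200 CPUs.
    for mode in ("2d", "3d"):
        hours = estimate_wall_hours(10_000_000, mode, n_cpus=200)
        print(f"{mode}: about {hours:.1f} wall-clock hours")
```

Real runs will scale less than perfectly, so an estimate like this is best treated as a lower bound when deciding whether 3D profiling is affordable.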
Filtering & labeling
Property filtering with the library design command
Ligand filtering requires a YAML file to define criteria and takes about 10 CPU minutes per million compounds. Available properties for filtering can be found in the CSV headers, with numeric properties filtered by range and non-numeric ones by exact match.
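The range and exact-match rules described above can be illustrated with a short standard-library sketch. This is not the library design command or its YAML schema: the criteria dictionary, the column names (MolWt, NumHDonors, Availability), and the three-row toy CSV are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical criteria: a (min, max) tuple means a numeric range,
# any other value means an exact string match.
CRITERIA = {
    "MolWt": (100.0, 500.0),
    "NumHDonors": (0, 5),
    "Availability": "in_stock",
}

# Toy input with a "SMILES" header plus assumed property columns.
EXAMPLE_CSV = """SMILES,MolWt,NumHDonors,Availability
CCO,46.07,1,in_stock
CC(=O)Nc1ccc(O)cc1,151.16,2,in_stock
CC(=O)Oc1ccccc1C(=O)O,180.16,1,make_on_demand
"""


def passes(row: dict, criteria: dict) -> bool:
    """Return True only if the row satisfies every criterion."""
    for name, rule in criteria.items():
        value = row.get(name)
        if value is None or value == "":
            return False  # a missing property fails the filter
        if isinstance(rule, tuple):
            low, high = rule
            try:
                number = float(value)
            except ValueError:
                return False  # non-numeric text in a numeric column
            if not (low <= number <= high):
                return False
        elif value != rule:
            return False
    return True


def filter_rows(handle, criteria: dict) -> list:
    """Read CSV rows from a file-like object and keep those that pass."""
    return [row for row in csv.DictReader(handle) if passes(row, criteria)]


if __name__ == "__main__":
    for row in filter_rows(io.StringIO(EXAMPLE_CSV), CRITERIA):
        print(row["SMILES"])
```

Treating a missing property as a failure is a design choice of this sketch; a real workflow may prefer to keep such compounds and flag them instead.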
PDF generation
The final step in the library analysis workflow is PDF generation. This process takes only a few minutes for a 10-million-compound library. Because it is fast, the library is often re-plotted many times before screening to assess the impact of each filter.
Filter with CanvasSearch
CanvasSearch matches target molecules against SMARTS queries, returning only those that meet all criteria by default, though users can specify a required match count. Substructure searches can follow “any” or “all” criteria, and recursion settings allow searching for moieties like di-ketones or exact matches. The script also supports filtering using standard REOS rules or user-defined queries. Schrödinger recommends testing multiple search configurations to optimize results.
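The "all", "any", and required-match-count behaviors differ only in how many queries a molecule must hit. The sketch below shows that decision logic in isolation. It is not the CanvasSearch interface: the function name is hypothetical, and the per-query hit flags are assumed to come from whatever SMARTS matcher is in use.

```python
from typing import Mapping, Optional


def keep_molecule(query_hits: Mapping[str, bool], min_matches: Optional[int] = None) -> bool:
    """Decide whether a molecule is returned, given per-query match flags.

    min_matches=None requires every query to match ("all", the default
    described in the text); min_matches=1 corresponds to "any".
    """
    required = len(query_hits) if min_matches is None else min_matches
    n_hits = sum(1 for hit in query_hits.values() if hit)
    return n_hits >= required


if __name__ == "__main__":
    # Hypothetical flags for one molecule against three user-defined queries.
    hits = {"amide": True, "aryl_halide": False, "carboxylic_acid": True}
    print(keep_molecule(hits))                 # all three queries required
    print(keep_molecule(hits, min_matches=2))  # at least two required
    print(keep_molecule(hits, min_matches=1))  # "any"
```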
Filter with ligand filter files during LigPrep
The ligfilter utility filters structure files based on properties and descriptors. It supports filtering by Maestro properties, predefined feature counts, and SMARTS pattern matches for functional groups. The output indicates which criteria were not met. Ligand filters can be applied during or after ligand preparation with LigPrep, and they typically specify atom-level or microstate properties such as tautomer probability, which is useful for excluding protonated states or other structures newly introduced by LigPrep.
Prepare for pilot screen validation run
Select a library subset for pilot screening
Pilot screens and validation runs benchmark performance and test filters, ligand libraries, and screening constraints, providing an opportunity to evaluate top-ranked compounds and refine the process. It’s highly recommended to run one or more smaller pilot screens with representative ligand subsets from the full library before pursuing a larger-scale virtual screen. The duration and computational resource requirements of the full screen can also be estimated from the pilot run.
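One simple way to draw a subset from a library too large to hold in memory is uniform random sampling in a single pass, followed by a linear extrapolation of the pilot cost. The sketch below uses reservoir sampling (Algorithm R). Uniform sampling is only one reading of "representative", and the function names, seed, and example numbers are assumptions; a project may prefer diversity-based or property-stratified selection.

```python
import random


def reservoir_sample(items, k: int, seed: int = 0) -> list:
    """Return k items drawn uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)  # fixed seed keeps the pilot subset reproducible
    sample = []
    for index, item in enumerate(items):
        if index < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, index)  # inclusive on both ends
            if j < k:
                sample[j] = item  # replace with probability k / (index + 1)
    return sample


def extrapolate_cpu_hours(pilot_cpu_hours: float, pilot_size: int, full_size: int) -> float:
    """Scale pilot cost to the full library, assuming constant cost per ligand."""
    return pilot_cpu_hours * full_size / pilot_size


if __name__ == "__main__":
    # Hypothetical library of one million compound identifiers.
    library_ids = (f"cmpd_{i}" for i in range(1_000_000))
    pilot = reservoir_sample(library_ids, k=10_000, seed=42)
    print(len(pilot), "compounds selected for the pilot screen")
    print(extrapolate_cpu_hours(5.0, 10_000, 1_000_000), "estimated CPU hours for the full screen")
```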
Add actives to pilot library for model validation
Model validation requires a test library of known actives and unknowns or decoy compounds. Model validation is often automated, but understanding the purpose of enrichment remains essential. Active ligands have experimentally measured binding affinity for the target or are expected to bind it preferentially. How research teams define “actives” can vary for each project.
Calculate enrichment post pilot screen
Enrichment is essentially a high true positive rate among the top-ranked compounds. The aim is to confirm that the docking model scores known actives better than decoy compounds.
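A common way to quantify this is the enrichment factor: the fraction of actives in the top-ranked slice divided by the fraction of actives in the whole pilot library. The sketch below is a generic calculation, not the output of any particular screening tool. It assumes more negative scores are better, and the ten scores and labels are invented purely for illustration.

```python
def enrichment_factor(scored, fraction: float, lower_is_better: bool = True) -> float:
    """Compute the enrichment factor for the top `fraction` of a ranked list.

    scored: iterable of (score, is_active) pairs.
    """
    scored = list(scored)
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = len(scored)
    total_actives = sum(1 for _, active in scored if active)
    if total == 0 or total_actives == 0:
        raise ValueError("need at least one compound and one active")
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=not lower_is_better)
    n_top = max(1, round(fraction * total))
    top_actives = sum(1 for _, active in ranked[:n_top] if active)
    # Hit rate in the top slice relative to the hit rate of the whole library.
    return (top_actives / n_top) / (total_actives / total)


if __name__ == "__main__":
    # Invented pilot results: (docking score, is_active); 2 actives among 10 compounds.
    results = [
        (-9.5, True), (-9.1, False), (-8.7, True), (-8.0, False), (-7.6, False),
        (-7.2, False), (-6.9, False), (-6.5, False), (-6.1, False), (-5.8, False),
    ]
    print("EF at top 20%:", enrichment_factor(results, 0.2))
```

A value of 1 means the ranking does no better than random selection, and larger values mean actives are concentrated near the top of the list.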
Screening workflows
In silico examples:
- Active learning Glide
- Property prediction machine learning models (QSAR)
- High-throughput virtual screening (HTVS)
Experimental examples:
- High-throughput lab-based screening (HTS)
- Phenotypic screening