Library Analysis Workflow
Profiling Step
This step takes a CSV file that has a “SMILES” column in its header row.
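Before submitting a large job, it can be worth checking that the input CSV really has the required column. The following is a minimal sketch using only the Python standard library; the helper name has_smiles_column is hypothetical, and it assumes the script expects the header to be spelled exactly “SMILES” (case-sensitive), which the text above does not confirm.

```python
import csv
import io


def has_smiles_column(csv_file, column="SMILES"):
    """Return True if the first row of the CSV contains the given column name.

    Assumption: the column name must match exactly (case-sensitive).
    """
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty file -> empty header
    return column in (name.strip() for name in header)


# Usage with an in-memory CSV; for a real file use open(path, newline="").
sample = io.StringIO("ID,SMILES\nmol1,CCO\n")
print(has_smiles_column(sample))
```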
The library_analysis.py script is used:
$SCHRODINGER/run library_analysis.py <database>.csv <database>_profiled.csv -simple2D -batch_size 100000 -HOST bolt_cpu
This takes about 4 CPU hours per 1M compounds. It mostly uses RDKit to compute the following 2D properties:
MW, HBA, RB, Total Rings, Aliph. Rings, Stereo Centers, Frac. CSP3, N+O, HAC, AlogP, CRB.Max, CRB.Mean, PSA, Arom. Rings, HBD, Unspecified Stereo Centers, HTL.MPO.geom, HTL.MPO.arim, HTL.MPO.sum, Class HTL ('hit_to_lead' property-based risk function).
Alternatively, you can run it in “3D” mode, which runs LigPrep to get the pKa.
$SCHRODINGER/run library_analysis.py <database>.csv <database>_profiled.csv -batch_size 10000 -HOST bolt_cpu
The full workflow with LigPrep is much slower (~800 CPU hours per 1M compounds). The default LigPrep settings are “-s 1 -nd -bvac -epik -W e,-best_neutral,-ph,7.4”, and the LigPrep output will NOT be saved.
In addition to the properties from the 2D workflow, the output of this workflow contains the following:
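To get a feel for the cost difference between the two modes, the quoted rates can be scaled linearly to the library size. The sketch below is only back-of-the-envelope arithmetic: the function names are hypothetical, linear scaling is an assumption, and the batch count is simply the library size divided by -batch_size, rounded up.

```python
import math

# Approximate rates quoted in this document (CPU hours per 1M compounds).
CPU_HOURS_PER_MILLION = {"2D": 4.0, "3D": 800.0}


def estimate_cpu_hours(n_compounds, mode="2D"):
    """Scale the quoted rate linearly with library size (an assumption)."""
    return n_compounds / 1_000_000 * CPU_HOURS_PER_MILLION[mode]


def number_of_batches(n_compounds, batch_size):
    """Number of batches implied by a given -batch_size value."""
    return math.ceil(n_compounds / batch_size)


library_size = 2_500_000
for mode, batch_size in (("2D", 100_000), ("3D", 10_000)):
    hours = estimate_cpu_hours(library_size, mode)
    batches = number_of_batches(library_size, batch_size)
    print(f"{mode}: ~{hours:.0f} CPU hours in {batches} batches")
```

For a 2.5M-compound library this gives roughly 10 CPU hours in 2D mode versus roughly 2000 CPU hours in 3D mode, so the 3D mode is best reserved for libraries that have already been narrowed down.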
InChI=1S, InChIKey, Murko Scaffold SMILES, Murko InChI=1S, Murko InChIKey, Eccentricity, Best Neutral State Penalty, AlogD@7.4, Ion Class, Max. pKa, Min. pKa, Class, CNS.MPO (CNS MPO based on Wager et al.)
You can also use a YAML file to control more settings, such as a different batch_size and host for each step, detailed LigPrep settings, etc.
Filtering Step
This step requires a YAML file that specifies the filtering criteria. It takes about 10 CPU minutes per 1M compounds.
$SCHRODINGER/run library_analysis.py -filter <database>_profiled.csv <database>_filtered.csv -config filter.yaml -HOST bolt_cpu
Below is an example of the filter.yaml file:
PropertyFilterProfiledMol:
  property_ranges:
    MW: [300, 350]
    Class: ["Druglike", "Leadlike"]
Look in the CSV headers for the available properties that can be used for filtering. The current implementation applies a range filter to numeric properties and an exact match to non-numeric properties.
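The filtering rule described above can be illustrated with a small standalone sketch. This is not the code used by library_analysis.py; the function passes_filter is hypothetical, and it assumes that range bounds are inclusive, that a list of strings means “match any of these values”, and that a row with a missing or non-numeric value fails the filter. It uses the criteria from the filter.yaml example above.

```python
def passes_filter(row, property_ranges):
    """Return True if a CSV row (dict of strings) meets every criterion.

    Assumptions (not confirmed by the source): numeric ranges are
    inclusive [low, high]; string lists match any listed value exactly.
    """
    for name, criterion in property_ranges.items():
        value = row.get(name)
        if value is None:
            return False  # property missing from this row
        if all(isinstance(c, (int, float)) for c in criterion):
            low, high = criterion
            try:
                number = float(value)
            except (TypeError, ValueError):
                return False  # value is not numeric
            if not (low <= number <= high):
                return False
        elif value not in criterion:
            return False  # exact match against the allowed values
    return True


# Criteria mirroring the filter.yaml example.
property_ranges = {"MW": [300, 350], "Class": ["Druglike", "Leadlike"]}

rows = [
    {"MW": "320.4", "Class": "Druglike"},
    {"MW": "410.2", "Class": "Leadlike"},
    {"MW": "305.0", "Class": "Other"},
]
kept = [row for row in rows if passes_filter(row, property_ranges)]
print(len(kept))
```

Only the first row is kept here: the second fails the MW range and the third fails the exact match on Class.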
PDF Generation Step
This step uses database_report.py and takes a few minutes for a ~10M library.
$SCHRODINGER/run database_report.py <database>_profiled.csv <database>_report.pdf
An example of the PDF can be seen here.