Full 300-drug HDAC2 screen: specificity achieved, BG-1003 top hit

Corrected pipeline ran clean: 1 MSA query, 299/299 screened, 0 failures,
~$5-8 (vs the fragile 2.5hr/$15 version).

Results:
- Scale validation: HDAC inhibitors rank 1-9 (>=0.99); valproic-acid 0.90.
- DECISIVE specificity: best negative control = cetirizine rank 44 (P=0.39);
  all 26 negative controls rank low. Co-folding REJECTS unrelated drugs --
  exactly what connectivity could not do (where norethindrone/ciprofloxacin
  ranked top). The modality-pivot thesis vindicated at screen scale.
- Discovery: BG-1003 (rank 5, P=0.997, random sample) is the standout
  non-obvious binder, above several known HDAC inhibitors; also JW55,
  BRD-K14666757. 11 drugs P>0.9 (8 known inhibitors + 3 non-obvious).

Caveats kept honest: BG-1003 may be a known HDAC inhibitor in the random
sample (validation, not novelty) -- needs identity check; binding != efficacy;
prodrug/macrocycle false negatives. Full ranking in docs/results/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 10:05:06 +02:00
parent 98c99bc271
commit 71069d3f10
2 changed files with 328 additions and 1 deletions
--- a/docs/structure_binding_notes.md
+++ b/docs/structure_binding_notes.md
@@ -160,8 +160,35 @@ model. The screen mishandles non-standard chemotypes (prodrugs, macrocycles).
 The screening pipeline is validated. Next: run the full set (incl. the 240 random + negatives) to
 hunt for NON-obvious HDAC2 binders (the actual discovery run), ~$15-20.

+## Step 7 — Full 300-drug discovery screen vs HDAC2 (2026-06-26)
+
+`modal run gpu/modal_app.py::screen` — 299 drugs co-folded vs HDAC2 (+Zn), ranked by P(binder).
+Corrected pipeline: 1 MSA query (computed once, reused), 299/299 screened, 0 failures, ~$5-8.
+
+**Scale validation:** the 8 known HDAC inhibitors (trichostatin-A, vorinostat, panobinostat,
+belinostat, scriptaid, mocetinostat, entinostat, apicidin; all ≥0.99) fall within ranks 1-9, and
+the weak inhibitor valproic-acid drops to 0.90. The pilot result held at full-screen scale.
+
+**DECISIVE — specificity (the thing connectivity could NOT do):** the best-scoring negative
+control is cetirizine at **rank 44, P=0.39**. All 26 negative controls (antifungals,
+antihistamines, antibiotics, hormones) rank low — co-folding REJECTS the unrelated drugs. This is
+the exact failure mode that sank the connectivity approach (negative controls ranked top there);
+structure-based binding has the specificity expression-connectivity fundamentally lacked.
+
+**Discovery hits (non-obvious high-P binders):** BG-1003 (rank 5, P=0.997, general_sample),
+BRD-K14666757 (10, 0.968), JW55 (11, 0.936, a tankyrase inhibitor), FIT (13, 0.831). 11 drugs
+score P>0.9 = 8 known HDAC inhibitors + 3 non-obvious. BG-1003 is the standout — a "random" LINCS
+compound scoring as a near-certain HDAC2 binder above several known inhibitors.
+
+**Honest caveats:** BG-1003 may be a known HDAC inhibitor that landed in the random sample (→
+validation, not novelty) — needs an identity/literature check before any claim. Several hits are
+unannotated BRD tool compounds. Binding != efficacy. The screen still mishandles prodrugs/
+macrocycles (romidepsin-type false negatives). Full ranking: docs/results/screen_HDAC2_full.csv.
+
 ## Next steps
-- [ ] Full screen (300 drugs) vs HDAC2 — discovery run for non-obvious binders.
+- [ ] Identity-check BG-1003 and the BRD hits (ChEMBL/literature): known HDAC binders or novel?
+- [ ] Pose-RMSD the top non-obvious hits (geometry sanity, like vorinostat).
+- [ ] Extend the screen to other validated HbF/hemoglobin targets; integrate with the expression layer.
 - [ ] Investigate PKR: allosteric site may need the full assembly / better pocket definition.
 - [ ] Phase 2 screen: rank the ~300-drug set against HDAC2 (the validated target) by P(binder);
  positive-control recovery test at screen scale.