Full 300-drug HDAC2 screen: specificity achieved, BG-1003 top non-obvious hit

Corrected pipeline ran clean: 1 MSA query, 299/299 screened, 0 failures,
~$5-8 (vs the fragile 2.5hr/$15 version).

Results:
- Scale validation: the 8 known HDAC inhibitors fill ranks 1-9 (all >=0.99)
  alongside BG-1003 at rank 5; weak valproic acid drops to 0.90.
- DECISIVE specificity: best negative control = cetirizine, rank 44 (P=0.39);
  all 26 negative controls rank 44 or lower. Co-folding REJECTS unrelated drugs,
  exactly what connectivity could not do (there, norethindrone/ciprofloxacin
  ranked top). The modality-pivot thesis is vindicated at screen scale.
- Discovery: BG-1003 (rank 5, P=0.997, random sample) is the standout
  non-obvious binder, above several known HDAC inhibitors; also
  BRD-K14666757 (rank 10) and JW55 (rank 11). 11 drugs score P>0.9
  (8 known inhibitors + 3 non-obvious: BG-1003, BRD-K14666757, JW55).

Caveats kept honest: BG-1003 may be a known HDAC inhibitor that landed in the
random sample (validation, not novelty) and needs an identity check; binding !=
efficacy; prodrug/macrocycle false negatives remain. Full ranking in
docs/results/screen_HDAC2_full.csv.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 10:05:06 +02:00
parent 98c99bc271
commit 71069d3f10
2 changed files with 328 additions and 1 deletions


@@ -160,8 +160,35 @@ model. The screen mishandles non-standard chemotypes (prodrugs, macrocycles).
The screening pipeline is validated. Next: run the full set (incl. the 240 random + negatives) to
hunt for NON-obvious HDAC2 binders (the actual discovery run), ~$15-20.
## Step 7 — Full 300-drug discovery screen vs HDAC2 (2026-06-26)
`modal run gpu/modal_app.py::screen` — 299 drugs co-folded vs HDAC2 (+Zn), ranked by P(binder).
Corrected pipeline: 1 MSA query (computed once, reused), 299/299 screened, 0 failures, ~$5-8.
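The key structural change, one MSA computed up front and reused for every ligand, can be sketched as a plain loop. This is a hypothetical illustration: `run_screen`, `ScreenResult`, `compute_msa` and `cofold` are placeholder names, not the actual functions in `gpu/modal_app.py`, and catching exceptions per drug is an assumption about how a "0 failures" count could be tracked.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ScreenResult:
    scores: dict[str, float] = field(default_factory=dict)  # drug -> P(binder)
    failures: dict[str, str] = field(default_factory=dict)  # drug -> error text


def run_screen(
    target_seq: str,
    drugs: dict[str, str],  # drug name -> SMILES
    compute_msa: Callable[[str], str],  # hypothetical: sequence -> MSA
    cofold: Callable[[str, str, str], float],  # hypothetical: (seq, MSA, SMILES) -> P(binder)
) -> ScreenResult:
    """Co-fold every drug against one target, building the target MSA only once."""
    msa = compute_msa(target_seq)  # single MSA query, reused for all ligands
    result = ScreenResult()
    for name, smiles in drugs.items():
        try:
            result.scores[name] = cofold(target_seq, msa, smiles)
        except Exception as exc:  # assumption: record the failure and keep screening
            result.failures[name] = repr(exc)
    return result
```

Passing the two callables in keeps the loop testable with stubs; in the real pipeline they would wrap whatever MSA service and co-folding model the Modal app uses.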
**Scale validation:** the 8 known HDAC inhibitors (trichostatin-A, vorinostat, panobinostat,
belinostat, scriptaid, mocetinostat, entinostat, apicidin; all P≥0.99) occupy ranks 1-9, sharing
the top 9 only with BG-1003 at rank 5. The weak inhibitor valproic acid is demoted to 0.90. The
pilot result holds at 300-drug scale.
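A positive-control check like this can be reproduced from the score table. A minimal sketch, assuming the scores are loaded as a `{drug: P(binder)}` dict (the CSV layout isn't shown here); `rank_by_p`, `control_ranks` and `intruders_in_top` are hypothetical helpers, and the example data are toy values, not screen results.

```python
def rank_by_p(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    """(rank, drug, P) by descending P(binder); rank 1 is best, ties broken by name."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, name, p) for rank, (name, p) in enumerate(ordered, start=1)]


def control_ranks(ranking: list[tuple[int, str, float]], controls: set[str]) -> dict[str, int]:
    """Rank of each control drug that appears in the ranking."""
    return {name: rank for rank, name, _ in ranking if name in controls}


def intruders_in_top(ranking: list[tuple[int, str, float]], positives: set[str]) -> list[str]:
    """Non-positive drugs ranked above the worst-ranked positive control."""
    worst = max(rank for rank, name, _ in ranking if name in positives)
    return [name for rank, name, _ in ranking if rank < worst and name not in positives]


# Toy scores with placeholder names (not screen data):
toy = {"pos_a": 0.999, "hit_x": 0.997, "pos_b": 0.992, "neg_y": 0.39}
ranking = rank_by_p(toy)
print(intruders_in_top(ranking, {"pos_a", "pos_b"}))  # ['hit_x']
```

With the 8 inhibitors listed above as the positive set, `intruders_in_top` on the Step 7 ranking would return only BG-1003, the one non-inhibitor inside ranks 1-9.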
**DECISIVE (specificity, the thing connectivity could NOT do):** the best-scoring negative
control is cetirizine at **rank 44, P=0.39**, so all 26 negative controls (antifungals,
antihistamines, antibiotics, hormones) rank 44 or lower: co-folding REJECTS the unrelated drugs.
This is the exact failure mode that sank the connectivity approach (negative controls ranked top
there); structure-based binding has the specificity that expression connectivity fundamentally
lacked.
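The specificity figure quoted here (overall rank of the best-scoring negative control) is easy to recompute. The sketch below also adds a rank-based ROC AUC between positive and negative controls as one possible single-number summary; that metric is an addition for illustration, not something this notebook reports. Helper names are hypothetical and the data are toy values.

```python
def best_negative(scores: dict[str, float], negatives: set[str]) -> tuple[str, int, float]:
    """Highest-scoring negative control as (drug, overall rank, P); rank 1 = highest P."""
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    for rank, drug in enumerate(ordered, start=1):
        if drug in negatives:
            return drug, rank, scores[drug]
    raise ValueError("no negative control found in scores")


def roc_auc(scores: dict[str, float], positives: set[str], negatives: set[str]) -> float:
    """Chance that a random positive control outscores a random negative (ties count 0.5).

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [scores[d] for d in positives if d in scores]
    neg = [scores[d] for d in negatives if d in scores]
    if not pos or not neg:
        raise ValueError("need at least one scored positive and one scored negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On the Step 7 scores, `best_negative` would return cetirizine at rank 44. Unlike that single rank, the AUC also drops when a positive control scores low, so romidepsin-type false negatives would show up in it.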
**Discovery hits (non-obvious high-P binders):** BG-1003 (rank 5, P=0.997, general_sample),
BRD-K14666757 (rank 10, P=0.968), JW55 (rank 11, P=0.936; a tankyrase inhibitor) and FIT (rank 13,
P=0.831, below the 0.9 cutoff). 11 drugs score P>0.9: the 8 known HDAC inhibitors plus 3
non-obvious hits (BG-1003, BRD-K14666757, JW55). BG-1003 is the standout: a "random" LINCS compound
scoring as a near-certain HDAC2 binder, above several known inhibitors.
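The P>0.9 split into known inhibitors and non-obvious hits can be sketched as below, assuming a set of known HDAC-inhibitor names is kept alongside the scores. `hits_above` is a hypothetical helper and the example data are toy values.

```python
def hits_above(
    scores: dict[str, float], known: set[str], cutoff: float = 0.9
) -> tuple[list[str], list[str]]:
    """Drugs with P(binder) > cutoff, best first, split into (known inhibitors, non-obvious hits)."""
    above = sorted((d for d, p in scores.items() if p > cutoff), key=lambda d: (-scores[d], d))
    known_hits = [d for d in above if d in known]
    novel_hits = [d for d in above if d not in known]
    return known_hits, novel_hits
```

The strict `>` matters at the boundary: whether a score reported as 0.90 (such as valproic acid's) is counted depends on its unrounded value.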
**Honest caveats:** BG-1003 may be a known HDAC inhibitor that landed in the random sample
(→ validation, not novelty); it needs an identity/literature check before any claim. Several hits
are unannotated BRD tool compounds. Binding != efficacy. The screen still mishandles prodrugs and
macrocycles (romidepsin-type false negatives). Full ranking: `docs/results/screen_HDAC2_full.csv`.
## Next steps
- [x] Full screen (300 drugs) vs HDAC2: discovery run for non-obvious binders.
- [ ] Identity-check BG-1003 and the BRD hits (ChEMBL/literature): known HDAC binders or novel?
- [ ] Pose-RMSD the top non-obvious hits (geometry sanity, like vorinostat).
- [ ] Extend the screen to other validated HbF/hemoglobin targets; integrate with the expression layer.
- [ ] Investigate PKR: allosteric site may need the full assembly / better pocket definition.
- [x] Phase 2 screen: rank the ~300-drug set against HDAC2 (the validated target) by P(binder);
  positive-control recovery test at screen scale.
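For the pose-RMSD item, a minimal sketch of an in-place heavy-atom RMSD between a predicted ligand pose and a reference pose. It assumes both poses are already in the same protein-aligned frame, that atom `i` in one list is atom `i` in the other, and it applies no symmetry correction; `pose_rmsd` and `Coord` are hypothetical names. Docking studies commonly treat roughly 2 Å as a reproduced pose.

```python
import math

Coord = tuple[float, float, float]


def pose_rmsd(pred: list[Coord], ref: list[Coord]) -> float:
    """Heavy-atom RMSD between two ligand poses, computed in place (no superposition).

    Assumes matching atom order and a shared reference frame; no symmetry correction.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty with matching atom order")
    sq = sum((a - b) ** 2 for p, r in zip(pred, ref) for a, b in zip(p, r))
    return math.sqrt(sq / len(pred))
```

Skipping superposition is deliberate: aligning the ligand onto itself would hide a pose that sits in the wrong place in the pocket, which is the failure this sanity check is meant to catch.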