Files
Reverso/docs/structure_binding_notes.md
Junior B. 71069d3f10 Full 300-drug HDAC2 screen: specificity achieved, BG-1003 top hit
Corrected pipeline ran clean: 1 MSA query, 299/299 screened, 0 failures,
~$5-8 (vs the fragile 2.5hr/$15 version).

Results:
- Scale validation: HDAC inhibitors rank 1-9 (>=0.99); valproic-acid 0.90.
- DECISIVE specificity: best negative control = cetirizine rank 44 (P=0.39);
  all 26 negative controls rank low. Co-folding REJECTS unrelated drugs --
  exactly what connectivity could not do (where norethindrone/ciprofloxacin
  ranked top). The modality-pivot thesis vindicated at screen scale.
- Discovery: BG-1003 (rank 5, P=0.997, random sample) is the standout
  non-obvious binder, above several known HDAC inhibitors; also JW55,
  BRD-K14666757. 11 drugs P>0.9 (8 known inhibitors + 3 non-obvious).

Caveats kept honest: BG-1003 may be a known HDAC inhibitor in the random
sample (validation, not novelty) -- needs identity check; binding != efficacy;
prodrug/macrocycle false negatives. Full ranking in docs/results/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 10:05:06 +02:00

203 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Structure-based binding track — working notes
Branch `structure-based-binding`. Implements PLAN §12. Baseline-first, start with the two cleanest
targets (Hemoglobin + PKR), de-risk the harness before scaling.
## Status (2026-06-23)
**Toolchain check (PLAN §12.6 pitfall 4, confirmed real):**
- ✅ RDKit installs on ARM Mac — ligand side ready.
- ❌ AutoDock Vina does NOT pip-install on ARM Mac; no docking binary available. Docking (§12.3)
is **blocked on toolchain** — must resolve via conda/micromamba (`vina`/`smina`), a GPU AF3-class
model (Boltz-2/Chai-1/DiffDock), or an x86 Vina binary under Rosetta.
**Structures obtained:** `5E83` (hemoglobin + voxelotor), `8XFD` (PKR + mitapivat) in
`data/raw/structures/`.
**Step 0 — ligand-based retrieval baseline (`scripts/binding_ligand_baseline.py`):**
RDKit Tanimoto of our 300 drugs vs known sickle binders.
- Engine VALIDATED on in-set classes: `decitabine`→azacitidine (0.62); `vorinostat`→scriptaid
(0.42), belinostat (0.28). Correctly clusters DNMT1 / HDAC HbF-inducers.
- But voxelotor / mitapivat have **no analog** in our set (max Tanimoto ~0.200.26). A 300-drug
library is too sparse to contain look-alikes of distinctive scaffolds.
**Takeaways:**
1. Ligand retrieval works but needs a **bigger drug library** to be useful for distinctive targets.
2. The targets without in-set analogs (Hb, PKR) need **actual docking** (§12.3) — which scores
binding directly, no look-alike required. That is the gating next step, and it needs the
toolchain solved.
## Step 1 — docking baseline (2026-06-24)
**Toolchain SOLVED on ARM Mac:** AutoDock Vina 1.2.5 mac binary (`tools/vina`, runs under Rosetta)
+ open-babel (brew) for prep. Docking runs end-to-end (`scripts/dock_positive_controls.py`).
Co-crystal ligands identified: 5L7 = voxelotor (5E83), WV2 = mitapivat (8XFD).
**Positive-control cross-docking — inconclusive, and instructively so.** Affinities (kcal/mol):
| ligand | hemoglobin | PKR |
|---|---|---|
| voxelotor | 8.1 | 9.3 |
| mitapivat | 10.0 | 11.2 |
| decitabine | 6.6 | 7.0 |
| hydroxyurea | 3.9 | 3.6 |
| caffeine | 6.1 | 6.4 |
The scores rank almost perfectly by **molecular size** (mitapivat > voxelotor > decitabine/caffeine
> hydroxyurea) in *both* pockets — mitapivat wins even on hemoglobin, which it doesn't target. So
raw Vina affinity is confounded by ligand size and per-pocket stickiness; it cannot yet
distinguish target-specific binding. This is the **docking analog of the connectivity specificity
problem** — raw scores carry a systematic bias (size here, broadness there) that masquerades as
signal. voxelotor *does* dock to Hb (8.1, a real score); the cross-target test just isn't the
right validation.
## Step 2 — redocking-RMSD validation FAILS across the board (2026-06-24)
Redocked each co-crystal ligand into its own structure (`scripts/dock_validate.py`); RMSD vs
crystal pose via obrms:
| redock | RMSD | note |
|---|---|---|
| voxelotor → Hb (5E83) | NA | covalent binder (Schiff base, αVal1) — out of scope §12.7 |
| mitapivat → PKR (8XFD) | 8.2 Å | allosteric, cofactor (FBP/Mg) stripped |
| **vorinostat → HDAC2 (4LXZ, Zn kept)** | **7.9 Å** | classical non-covalent target — should have worked |
**The clean target also failing means this is a systematic PIPELINE-QUALITY problem, not target
choice.** The cheap Vina + open-babel setup produces scores but does not reproduce known binding
geometry, so its affinities are not yet trustworthy. Ligand efficiency (affinity / heavy atoms)
also doesn't fix it — it over-corrects, ranking tiny hydroxyurea (0.78) "best".
Likely causes (in priority order):
1. **Low-quality receptor prep** — open-babel `-xr` is not production docking prep. Need
AutoDockTools `prepare_receptor` or **Meeko** + `reduce`/pdb2pqr for protonation, charges, and
proper AutoDock atom typing.
2. **Ligand prep** — should use Meeko (correct rotatable bonds / typing), not bare obabel `--gen3d`.
3. **RMSD metric** — obrms superimposes before RMSD; redocking validation wants symmetry-corrected
RMSD **in place** (receptor frame). Worth confirming with an in-place metric.
**Honest takeaway:** consistent with the whole project — the *quick* version of each method runs
but doesn't survive honest validation. Credible structure-based docking needs production prep
tooling (Meeko/ADFR), which is the real next investment for this track.
## Step 3 — production prep helps, but classical docking is the wrong tool here (2026-06-24)
`scripts/dock_production.py`: Meeko ligand prep (proper rotatable-bond/AD typing) + in-place
symmetry-corrected RMSD (spyrmsd, not obrms which superimposes). On the clean HDAC2/vorinostat
target (Zn kept):
- **7.9 Å → 4.76 Å** with proper ligand prep + correct metric. Prep and metric genuinely mattered.
- But still FAIL (>2 Å). The residual is the deeper problem: **vorinostat binding is defined by its
hydroxamate chelating the catalytic Zn, and Vina has no metal-coordination term** — it cannot
score the interaction that determines the pose.
**The real finding: sickle's druggable targets bind via non-classical chemistry that classical
docking handles poorly** — Hb/voxelotor (covalent), PKR/mitapivat (allosteric + cofactor),
HDAC/vorinostat (Zn chelation). This is the target landscape, not bad luck. AutoDock Vina is the
wrong tool for it.
**Redirect:** the modality that DOES handle covalent/metal/induced-fit binding is **data-driven
AF3-class co-folding** (Boltz-2 / Chai-1 / DiffDock — they learn these modes from the PDB). That is
the indicated next tool for this disease — and it's gated by the **24 GB local memory ceiling**
(PLAN §12.6 pitfall 4): needs a cloud GPU or a bigger box. The "GPU breaks all-local" prediction is
now the binding constraint of the whole track.
## Step 4 — AF3-class co-folding (Boltz-2 on Modal GPU) WORKS on the Zn case (2026-06-24)
Ran Phase 1 on Modal (L4, serverless) — `gpu/modal_app.py`, ~$0.600.80. Co-folded each known
binder + 2 negatives into each target WITH the binding-mode cofactors (HDAC2+Zn, PKR+FBP/Mg,
Hb+heme). Ranked by Boltz-2 P(binder):
| target | known binder P(binder) | negatives | verdict |
|---|---|---|---|
| **HDAC2 (+Zn)** | vorinostat **0.9994** | caffeine 0.12, hydroxyurea 0.77 | **PASS — decisive** |
| hemoglobin (+heme) | voxelotor 0.46 | caffeine 0.34, hydroxyurea 0.07 | PASS (weak) |
| PKR (+FBP/Mg) | mitapivat 0.32 | hydroxyurea 0.40 (beat it) | FAIL |
**Headline: HDAC2 + zinc — the exact case Vina failed (7.9 Å redock, no metal term) — co-folding
NAILS** (vorinostat 0.999 vs negatives ~0.1). The data-driven model handles the Zn-chelation
chemistry classical docking could not. The modality pivot is validated on its decisive test.
The first clear positive result in the project after a long string of honest negatives.
Notes: (1) affinity sign confirmed — vorinostat has the lowest affinity_pred_value (1.78,
strongest) AND highest P(binder); ranking by max(affinity) would be backwards (the P(binder) fix
was necessary). (2) 2/3: Hb weak (covalent/tetramer, as predicted), PKR miss (allosteric pocket).
(3) Engineering: had to add cuequivariance kernels to the image; serialize (max_containers=1) so
the weights download once (parallel containers corrupted the checkpoint).
## Step 5 — pose-RMSD confirmation: co-folding reproduces the geometry (2026-06-24)
`scripts/pose_rmsd.py`: superpose predicted HDAC2 onto crystal 4LXZ (Ca), transform the predicted
vorinostat + Zn, compare poses (spyrmsd, in-place).
| metric | co-folding | classical Vina |
|---|---|---|
| protein fold, Ca RMSD over 366 res | **0.14 A** | — |
| **vorinostat pose RMSD** | **0.21 A** PASS | 7.9 A FAIL |
| catalytic Zn placement | 2.73 A (minor) | no metal term |
Co-folding reproduces the fold AND the vorinostat binding pose to **0.21 A (crystal-accurate)** on
the exact Zn-chelation case Vina was off by 7.9 A. HDAC2 is now validated on BOTH axes: affinity
(P(binder)=0.999) and geometry (0.21 A). Minor blemish: the Zn ion itself lands ~2.7 A off (ligand
perfect, metal slightly less). The structure-binding modality is comprehensively validated on its
decisive metal-coordination case.
## Step 6 — Phase 2 screen pilot (HDAC2): recovers the inhibitor class decisively (2026-06-24)
`modal run gpu/modal_app.py::screen --limit 24` — co-fold 24 drugs vs HDAC2 (+Zn), parallel
(~10-wide, cached weights), ranked by P(binder). ~$1.3.
**Top 9 = all HDAC inhibitors** (trichostatin-A, vorinostat, panobinostat, belinostat, scriptaid,
mocetinostat, entinostat, apicidin all P≥0.99; valproic-acid 0.91), then a clean drop to
hydroxyurea 0.78 and non-HDAC drugs sinking to dexamethasone 0.03. The model captures the
**structure-activity gradient**: potent Zn-chelating hydroxamates ~0.99 vs weak fatty-acid
valproate 0.91 vs DNMT inhibitors / IMiDs / steroid near 0. The screen recovers the right
pharmacological class at scale.
**Honest false negative:** romidepsin (potent HDAC inhibitor) ranked low (0.43) — it is a cyclic
depsipeptide PRODRUG (must be reduced to expose its Zn-binding thiol), which co-folding doesn't
model. The screen mishandles non-standard chemotypes (prodrugs, macrocycles).
The screening pipeline is validated. Next: run the full set (incl. the 240 random + negatives) to
hunt for NON-obvious HDAC2 binders (the actual discovery run), ~$15-20.
## Step 7 — Full 300-drug discovery screen vs HDAC2 (2026-06-26)
`modal run gpu/modal_app.py::screen` — 299 drugs co-folded vs HDAC2 (+Zn), ranked by P(binder).
Corrected pipeline: 1 MSA query (computed once, reused), 299/299 screened, 0 failures, ~$5-8.
**Scale validation:** HDAC inhibitors occupy ranks 1-9 (trichostatin-A, vorinostat, panobinostat,
belinostat, scriptaid, mocetinostat, entinostat, apicidin; all ≥0.99), weak valproic-acid demoted
to 0.90. The pilot held at 300.
**DECISIVE — specificity (the thing connectivity could NOT do):** the best-scoring negative
control is cetirizine at **rank 44, P=0.39**. All 26 negative controls (antifungals,
antihistamines, antibiotics, hormones) rank low — co-folding REJECTS the unrelated drugs. This is
the exact failure mode that sank the connectivity approach (negative controls ranked top there);
structure-based binding has the specificity expression-connectivity fundamentally lacked.
**Discovery hits (non-obvious high-P binders):** BG-1003 (rank 5, P=0.997, general_sample),
BRD-K14666757 (10, 0.968), JW55 (11, 0.936, a tankyrase inhibitor), FIT (13, 0.831). 11 drugs
score P>0.9 = 8 known HDAC inhibitors + 3 non-obvious. BG-1003 is the standout — a "random" LINCS
compound scoring as a near-certain HDAC2 binder above several known inhibitors.
**Honest caveats:** BG-1003 may be a known HDAC inhibitor that landed in the random sample (→
validation, not novelty) — needs an identity/literature check before any claim. Several hits are
unannotated BRD tool compounds. Binding != efficacy. The screen still mishandles prodrugs/
macrocycles (romidepsin-type false negatives). Full ranking: docs/results/screen_HDAC2_full.csv.
## Next steps
- [ ] Identity-check BG-1003 and the BRD hits (ChEMBL/literature): known HDAC binders or novel?
- [ ] Pose-RMSD the top non-obvious hits (geometry sanity, like vorinostat).
- [ ] Extend the screen to other validated HbF/hemoglobin targets; integrate with the expression layer.
- [ ] Investigate PKR: allosteric site may need the full assembly / better pocket definition.
- [ ] Phase 2 screen: rank the ~300-drug set against HDAC2 (the validated target) by P(binder);
positive-control recovery test at screen scale.
- [ ] Add a one-time weight-warmup function so post-cache runs go back to fast parallel safely.
- [ ] (optional) Salvage one classical Vina case: PKR with FBP/Mg cofactors RETAINED, to confirm
the harness can validate on a non-metal sickle target.
- [ ] Production receptor prep (Meeko mk_prepare_receptor + protonation) if staying with Vina.
- [ ] §12.9 generative beacon — only after a validated scoring function exists.
> **Hardware note:** this machine is **24 GB** unified memory (not the 96 GB PLAN §2 assumed),
> which caps local AF3-class model inference. Classical docking (above) is unaffected.