Reverso/docs/structure_binding_notes.md

# Structure-based binding track — working notes

Branch `structure-based-binding`. Implements PLAN §12. Baseline-first, start with the two cleanest
targets (Hemoglobin + PKR), de-risk the harness before scaling.

## Status (2026-06-23)

**Toolchain check (PLAN §12.6 pitfall 4, confirmed real):**
- ✅ RDKit installs on ARM Mac — ligand side ready.
- ❌ AutoDock Vina does NOT pip-install on ARM Mac; no docking binary available. Docking (§12.3)
  is **blocked on toolchain** — must resolve via conda/micromamba (`vina`/`smina`), a GPU AF3-class
  model (Boltz-2/Chai-1/DiffDock), or an x86 Vina binary under Rosetta.

**Structures obtained:** `5E83` (hemoglobin + voxelotor), `8XFD` (PKR + mitapivat) in
`data/raw/structures/`.

**Step 0 — ligand-based retrieval baseline (`scripts/binding_ligand_baseline.py`):**
RDKit Tanimoto of our 300 drugs vs known sickle binders.
- Engine VALIDATED on in-set classes: `decitabine`→azacitidine (0.62); `vorinostat`→scriptaid
  (0.42), belinostat (0.28). Correctly clusters DNMT1 / HDAC HbF-inducers.
- But voxelotor / mitapivat have **no analog** in our set (max Tanimoto ~0.20–0.26). A 300-drug
  library is too sparse to contain look-alikes of distinctive scaffolds.

**Takeaways:**
1. Ligand retrieval works but needs a **bigger drug library** to be useful for distinctive targets.
2. The targets without in-set analogs (Hb, PKR) need **actual docking** (§12.3) — which scores
   binding directly, no look-alike required. That is the gating next step, and it needs the
   toolchain solved.

## Step 1 — docking baseline (2026-06-24)

**Toolchain SOLVED on ARM Mac:** AutoDock Vina 1.2.5 mac binary (`tools/vina`, runs under Rosetta)
+ open-babel (brew) for prep. Docking runs end-to-end (`scripts/dock_positive_controls.py`).
Co-crystal ligands identified: 5L7 = voxelotor (5E83), WV2 = mitapivat (8XFD).

**Positive-control cross-docking — inconclusive, and instructively so.** Affinities (kcal/mol):

| ligand | hemoglobin | PKR |
|---|---|---|
| voxelotor | −8.1 | −9.3 |
| mitapivat | −10.0 | −11.2 |
| decitabine | −6.6 | −7.0 |
| hydroxyurea | −3.9 | −3.6 |
| caffeine | −6.1 | −6.4 |

The scores rank almost perfectly by **molecular size** (mitapivat > voxelotor > decitabine/caffeine
> hydroxyurea) in *both* pockets — mitapivat wins even on hemoglobin, which it doesn't target. So
raw Vina affinity is confounded by ligand size and per-pocket stickiness; it cannot yet
distinguish target-specific binding. This is the **docking analog of the connectivity specificity
problem** — raw scores carry a systematic bias (size here, broadness there) that masquerades as
signal. voxelotor *does* dock to Hb (−8.1, a real score); the cross-target test just isn't the
right validation.

## Step 2 — redocking-RMSD validation FAILS across the board (2026-06-24)

Redocked each co-crystal ligand into its own structure (`scripts/dock_validate.py`); RMSD vs
crystal pose via obrms:

| redock | RMSD | note |
|---|---|---|
| voxelotor → Hb (5E83) | NA | covalent binder (Schiff base, αVal1) — out of scope §12.7 |
| mitapivat → PKR (8XFD) | 8.2 Å | allosteric, cofactor (FBP/Mg) stripped |
| **vorinostat → HDAC2 (4LXZ, Zn kept)** | **7.9 Å** | classical non-covalent target — should have worked |

**The clean target also failing means this is a systematic PIPELINE-QUALITY problem, not target
choice.** The cheap Vina + open-babel setup produces scores but does not reproduce known binding
geometry, so its affinities are not yet trustworthy. Ligand efficiency (affinity / heavy atoms)
also doesn't fix it — it over-corrects, ranking tiny hydroxyurea (−0.78) "best".

Likely causes (in priority order):
1. **Low-quality receptor prep** — open-babel `-xr` is not production docking prep. Need
   AutoDockTools `prepare_receptor` or **Meeko** + `reduce`/pdb2pqr for protonation, charges, and
   proper AutoDock atom typing.
2. **Ligand prep** — should use Meeko (correct rotatable bonds / typing), not bare obabel `--gen3d`.
3. **RMSD metric** — obrms superimposes before RMSD; redocking validation wants symmetry-corrected
   RMSD **in place** (receptor frame). Worth confirming with an in-place metric.

**Honest takeaway:** consistent with the whole project — the *quick* version of each method runs
but doesn't survive honest validation. Credible structure-based docking needs production prep
tooling (Meeko/ADFR), which is the real next investment for this track.

## Step 3 — production prep helps, but classical docking is the wrong tool here (2026-06-24)

`scripts/dock_production.py`: Meeko ligand prep (proper rotatable-bond/AD typing) + in-place
symmetry-corrected RMSD (spyrmsd, not obrms which superimposes). On the clean HDAC2/vorinostat
target (Zn kept):

- **7.9 Å → 4.76 Å** with proper ligand prep + correct metric. Prep and metric genuinely mattered.
- But still FAIL (>2 Å). The residual is the deeper problem: **vorinostat binding is defined by its
  hydroxamate chelating the catalytic Zn, and Vina has no metal-coordination term** — it cannot
  score the interaction that determines the pose.

**The real finding: sickle's druggable targets bind via non-classical chemistry that classical
docking handles poorly** — Hb/voxelotor (covalent), PKR/mitapivat (allosteric + cofactor),
HDAC/vorinostat (Zn chelation). This is the target landscape, not bad luck. AutoDock Vina is the
wrong tool for it.

**Redirect:** the modality that DOES handle covalent/metal/induced-fit binding is **data-driven
AF3-class co-folding** (Boltz-2 / Chai-1 / DiffDock — they learn these modes from the PDB). That is
the indicated next tool for this disease — and it's gated by the **24 GB local memory ceiling**
(PLAN §12.6 pitfall 4): needs a cloud GPU or a bigger box. The "GPU breaks all-local" prediction is
now the binding constraint of the whole track.

## Step 4 — AF3-class co-folding (Boltz-2 on Modal GPU) WORKS on the Zn case (2026-06-24)

Ran Phase 1 on Modal (L4, serverless) — `gpu/modal_app.py`, ~$0.60–0.80. Co-folded each known
binder + 2 negatives into each target WITH the binding-mode cofactors (HDAC2+Zn, PKR+FBP/Mg,
Hb+heme). Ranked by Boltz-2 P(binder):

| target | known binder P(binder) | negatives | verdict |
|---|---|---|---|
| **HDAC2 (+Zn)** | vorinostat **0.9994** | caffeine 0.12, hydroxyurea 0.77 | **PASS — decisive** |
| hemoglobin (+heme) | voxelotor 0.46 | caffeine 0.34, hydroxyurea 0.07 | PASS (weak) |
| PKR (+FBP/Mg) | mitapivat 0.32 | hydroxyurea 0.40 (beat it) | FAIL |

**Headline: HDAC2 + zinc — the exact case Vina failed (7.9 Å redock, no metal term) — co-folding
NAILS** (vorinostat 0.999 vs negatives ~0.1). The data-driven model handles the Zn-chelation
chemistry classical docking could not. The modality pivot is validated on its decisive test.
The first clear positive result in the project after a long string of honest negatives.

Notes: (1) affinity sign confirmed — vorinostat has the lowest affinity_pred_value (−1.78,
strongest) AND highest P(binder); ranking by max(affinity) would be backwards (the P(binder) fix
was necessary). (2) 2/3: Hb weak (covalent/tetramer, as predicted), PKR miss (allosteric pocket).
(3) Engineering: had to add cuequivariance kernels to the image; serialize (max_containers=1) so
the weights download once (parallel containers corrupted the checkpoint).

## Step 5 — pose-RMSD confirmation: co-folding reproduces the geometry (2026-06-24)

`scripts/pose_rmsd.py`: superpose predicted HDAC2 onto crystal 4LXZ (Ca), transform the predicted
vorinostat + Zn, compare poses (spyrmsd, in-place).

| metric | co-folding | classical Vina |
|---|---|---|
| protein fold, Ca RMSD over 366 res | **0.14 A** | — |
| **vorinostat pose RMSD** | **0.21 A** PASS | 7.9 A FAIL |
| catalytic Zn placement | 2.73 A (minor) | no metal term |

Co-folding reproduces the fold AND the vorinostat binding pose to **0.21 A (crystal-accurate)** on
the exact Zn-chelation case Vina was off by 7.9 A. HDAC2 is now validated on BOTH axes: affinity
(P(binder)=0.999) and geometry (0.21 A). Minor blemish: the Zn ion itself lands ~2.7 A off (ligand
perfect, metal slightly less). The structure-binding modality is comprehensively validated on its
decisive metal-coordination case.

## Step 6 — Phase 2 screen pilot (HDAC2): recovers the inhibitor class decisively (2026-06-24)

`modal run gpu/modal_app.py::screen --limit 24` — co-fold 24 drugs vs HDAC2 (+Zn), parallel
(~10-wide, cached weights), ranked by P(binder). ~$1.3.

**Top 9 = all HDAC inhibitors** (trichostatin-A, vorinostat, panobinostat, belinostat, scriptaid,
mocetinostat, entinostat, apicidin all P≥0.99; valproic-acid 0.91), then a clean drop to
hydroxyurea 0.78 and non-HDAC drugs sinking to dexamethasone 0.03. The model captures the
**structure-activity gradient**: potent Zn-chelating hydroxamates ~0.99 vs weak fatty-acid
valproate 0.91 vs DNMT inhibitors / IMiDs / steroid near 0. The screen recovers the right
pharmacological class at scale.

**Honest false negative:** romidepsin (potent HDAC inhibitor) ranked low (0.43) — it is a cyclic
depsipeptide PRODRUG (must be reduced to expose its Zn-binding thiol), which co-folding doesn't
model. The screen mishandles non-standard chemotypes (prodrugs, macrocycles).

The screening pipeline is validated. Next: run the full set (incl. the 240 random + negatives) to
hunt for NON-obvious HDAC2 binders (the actual discovery run), ~$15-20.

## Step 7 — Full 300-drug discovery screen vs HDAC2 (2026-06-26)

`modal run gpu/modal_app.py::screen` — 299 drugs co-folded vs HDAC2 (+Zn), ranked by P(binder).
Corrected pipeline: 1 MSA query (computed once, reused), 299/299 screened, 0 failures, ~$5-8.

**Scale validation:** HDAC inhibitors occupy ranks 1-9 (trichostatin-A, vorinostat, panobinostat,
belinostat, scriptaid, mocetinostat, entinostat, apicidin; all ≥0.99), weak valproic-acid demoted
to 0.90. The pilot held at 300.

**DECISIVE — specificity (the thing connectivity could NOT do):** the best-scoring negative
control is cetirizine at **rank 44, P=0.39**. All 26 negative controls (antifungals,
antihistamines, antibiotics, hormones) rank low — co-folding REJECTS the unrelated drugs. This is
the exact failure mode that sank the connectivity approach (negative controls ranked top there);
structure-based binding has the specificity expression-connectivity fundamentally lacked.

**Discovery hits (non-obvious high-P binders):** BG-1003 (rank 5, P=0.997, general_sample),
BRD-K14666757 (10, 0.968), JW55 (11, 0.936, a tankyrase inhibitor), FIT (13, 0.831). 11 drugs
score P>0.9 = 8 known HDAC inhibitors + 3 non-obvious. BG-1003 is the standout — a "random" LINCS
compound scoring as a near-certain HDAC2 binder above several known inhibitors.

**Honest caveats:** BG-1003 may be a known HDAC inhibitor that landed in the random sample (→
validation, not novelty) — needs an identity/literature check before any claim. Several hits are
unannotated BRD tool compounds. Binding != efficacy. The screen still mishandles prodrugs/
macrocycles (romidepsin-type false negatives). Full ranking: docs/results/screen_HDAC2_full.csv.

## Next steps
- [ ] Identity-check BG-1003 and the BRD hits (ChEMBL/literature): known HDAC binders or novel?
- [ ] Pose-RMSD the top non-obvious hits (geometry sanity, like vorinostat).
- [ ] Extend the screen to other validated HbF/hemoglobin targets; integrate with the expression layer.
- [ ] Investigate PKR: allosteric site may need the full assembly / better pocket definition.
- [ ] Phase 2 screen: rank the ~300-drug set against HDAC2 (the validated target) by P(binder);
  positive-control recovery test at screen scale.
- [ ] Add a one-time weight-warmup function so post-cache runs go back to fast parallel safely.
- [ ] (optional) Salvage one classical Vina case: PKR with FBP/Mg cofactors RETAINED, to confirm
  the harness can validate on a non-metal sickle target.
- [ ] Production receptor prep (Meeko mk_prepare_receptor + protonation) if staying with Vina.
- [ ] §12.9 generative beacon — only after a validated scoring function exists.

> **Hardware note:** this machine is **24 GB** unified memory (not the 96 GB PLAN §2 assumed),
> which caps local AF3-class model inference. Classical docking (above) is unaffected.