§12.4 pushed to its limit. Meeko ligand prep + in-place symmetry RMSD (spyrmsd, not obrms) on clean HDAC2/vorinostat: 7.9A -> 4.76A. Prep and metric mattered, but still FAIL. Residual cause is fundamental: vorinostat binds via hydroxamate-Zn chelation and Vina has no metal-coordination term. Real finding: sickle's druggable targets bind via non-classical chemistry classical docking handles poorly -- Hb (covalent), PKR (allosteric+cofactor), HDAC (Zn chelation). Vina is the wrong tool for this target landscape. Redirect: data-driven AF3-class co-folding (Boltz-2/Chai-1/DiffDock) handles these modes -- the indicated next tool, gated by the 24GB local memory ceiling (cloud GPU needed). The "GPU breaks all-local" §12.6 prediction is now the binding constraint of the track. Adds: scripts/dock_production.py; deps meeko, spyrmsd, gemmi. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
114 lines
6.5 KiB
Markdown
# Structure-based binding track: working notes

Branch `structure-based-binding`. Implements PLAN §12. Baseline-first: start with the two cleanest
targets (Hemoglobin + PKR) and de-risk the harness before scaling.

## Status (2026-06-23)

**Toolchain check (PLAN §12.6 pitfall 4, confirmed real):**
- ✅ RDKit installs on ARM Mac, so the ligand side is ready.
- ❌ AutoDock Vina does NOT pip-install on ARM Mac, so no docking binary is available. Docking (§12.3)
  is **blocked on toolchain**. It must be resolved via conda/micromamba (`vina`/`smina`), a GPU AF3-class
  model (Boltz-2/Chai-1/DiffDock), or an x86 Vina binary under Rosetta.

**Structures obtained:** `5E83` (hemoglobin + voxelotor), `8XFD` (PKR + mitapivat) in
`data/raw/structures/`.

**Step 0: ligand-based retrieval baseline (`scripts/binding_ligand_baseline.py`):**
RDKit Tanimoto similarity of our 300 drugs vs known sickle binders.
- Engine VALIDATED on in-set classes: `decitabine`→azacitidine (0.62); `vorinostat`→scriptaid
  (0.42), belinostat (0.28). It correctly clusters the DNMT1 / HDAC HbF-inducers.
- But voxelotor / mitapivat have **no analog** in our set (max Tanimoto ~0.20–0.26). A 300-drug
  library is too sparse to contain look-alikes of distinctive scaffolds.

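For reference, the retrieval score above is plain Tanimoto similarity between fingerprints. A minimal sketch, assuming fingerprints are represented as sets of on-bit indices (in the real script these would come from RDKit; the fingerprint type is not stated in these notes). The bit sets and the names `binder_A`, `binder_B`, `library_drug`, and `nearest_known_binder` are hypothetical and purely illustrative:

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    # |A ∩ B| / |A ∪ B|
    return shared / (len(a) + len(b) - shared)


def nearest_known_binder(query_bits: set, known: dict) -> tuple:
    """Return (name, similarity) of the known binder most similar to the query."""
    scored = ((name, tanimoto(query_bits, bits)) for name, bits in known.items())
    return max(scored, key=lambda pair: pair[1])


# Toy on-bit sets (NOT real fingerprints), just to exercise the functions.
known_binders = {
    "binder_A": {1, 2, 3, 4, 5, 8},
    "binder_B": {10, 11, 12, 13},
}
library_drug = {1, 2, 3, 4, 9}

name, score = nearest_known_binder(library_drug, known_binders)
print(name, round(score, 2))
```

A library drug whose best score against every known binder stays low (as with voxelotor / mitapivat here) simply has no look-alike in the set; the metric itself is not at fault.
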
**Takeaways:**
1. Ligand retrieval works but needs a **bigger drug library** to be useful for distinctive targets.
2. The targets without in-set analogs (Hb, PKR) need **actual docking** (§12.3), which scores
   binding directly with no look-alike required. That is the gating next step, and it needs the
   toolchain solved.

## Step 1: docking baseline (2026-06-24)

**Toolchain SOLVED on ARM Mac:** AutoDock Vina 1.2.5 mac binary (`tools/vina`, runs under Rosetta)
plus open-babel (brew) for prep. Docking runs end-to-end (`scripts/dock_positive_controls.py`).
Co-crystal ligands identified: 5L7 = voxelotor (5E83), WV2 = mitapivat (8XFD).

**Positive-control cross-docking: inconclusive, and instructively so.** Vina affinities (kcal/mol):

| ligand | hemoglobin | PKR |
|---|---|---|
| voxelotor | −8.1 | −9.3 |
| mitapivat | −10.0 | −11.2 |
| decitabine | −6.6 | −7.0 |
| hydroxyurea | −3.9 | −3.6 |
| caffeine | −6.1 | −6.4 |

The scores rank almost perfectly by **molecular size** (mitapivat > voxelotor > decitabine/caffeine > hydroxyurea)
in *both* pockets. Mitapivat wins even on hemoglobin, which it doesn't target. So
raw Vina affinity is confounded by ligand size and per-pocket stickiness; it cannot yet
distinguish target-specific binding. This is the **docking analog of the connectivity specificity
problem**: raw scores carry a systematic bias (size here, broadness there) that masquerades as
signal. Voxelotor *does* dock to Hb (−8.1, a real score); the cross-target test just isn't the
right validation.

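The size confound can be checked directly with a rank correlation between ligand size and score. A minimal sketch: the affinities are copied from the table above, while the heavy-atom counts are an assumption derived from the published molecular formulas (they are not recorded in these notes). The helper names `ranks` and `spearman` are illustrative:

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman(x, y):
    """Spearman rank correlation (no-ties formula)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# ASSUMPTION: heavy-atom counts taken from molecular formulas, not from the notes.
HEAVY_ATOMS = {"voxelotor": 25, "mitapivat": 32, "decitabine": 16,
               "hydroxyurea": 5, "caffeine": 14}

# Vina affinities (kcal/mol) from the cross-docking table.
AFFINITY = {
    "hemoglobin": {"voxelotor": -8.1, "mitapivat": -10.0, "decitabine": -6.6,
                   "hydroxyurea": -3.9, "caffeine": -6.1},
    "PKR": {"voxelotor": -9.3, "mitapivat": -11.2, "decitabine": -7.0,
            "hydroxyurea": -3.6, "caffeine": -6.4},
}

rho = {}
for pocket, scores in AFFINITY.items():
    names = list(scores)
    size = [HEAVY_ATOMS[n] for n in names]
    strength = [-scores[n] for n in names]  # larger = stronger predicted binding
    rho[pocket] = spearman(size, strength)
    print(pocket, rho[pocket])
```

Under these assumed atom counts, size alone reproduces the score ordering in both pockets, which is what a size-driven score looks like.
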
## Step 2: redocking-RMSD validation FAILS across the board (2026-06-24)

Redocked each co-crystal ligand into its own structure (`scripts/dock_validate.py`); RMSD vs
crystal pose via obrms:

| redock | RMSD | note |
|---|---|---|
| voxelotor → Hb (5E83) | NA | covalent binder (Schiff base, αVal1), out of scope per §12.7 |
| mitapivat → PKR (8XFD) | 8.2 Å | allosteric, cofactor (FBP/Mg) stripped |
| **vorinostat → HDAC2 (4LXZ, Zn kept)** | **7.9 Å** | classical non-covalent target, should have worked |

**The clean target also failing means this is a systematic PIPELINE-QUALITY problem, not target
choice.** The cheap Vina + open-babel setup produces scores but does not reproduce known binding
geometry, so its affinities are not yet trustworthy. Ligand efficiency (affinity / heavy atoms)
doesn't fix it either: it over-corrects, ranking tiny hydroxyurea (−0.78 kcal/mol per heavy atom) "best".

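The ligand-efficiency over-correction is easy to reproduce. A minimal sketch using the hemoglobin-pocket affinities from Step 1 (−3.9 / 5 heavy atoms gives the −0.78 quoted above); the heavy-atom counts are again an assumption derived from molecular formulas, not values recorded in these notes:

```python
# ASSUMPTION: heavy-atom counts taken from molecular formulas, not from the notes.
HEAVY_ATOMS = {"voxelotor": 25, "mitapivat": 32, "decitabine": 16,
               "hydroxyurea": 5, "caffeine": 14}

# Vina affinities (kcal/mol) in the hemoglobin pocket, from the Step 1 table.
HB_AFFINITY = {"voxelotor": -8.1, "mitapivat": -10.0, "decitabine": -6.6,
               "hydroxyurea": -3.9, "caffeine": -6.1}

# Ligand efficiency = affinity per heavy atom (kcal/mol per atom).
ligand_efficiency = {name: HB_AFFINITY[name] / HEAVY_ATOMS[name]
                     for name in HB_AFFINITY}

# Most negative (nominally "best") first.
ranked = sorted(ligand_efficiency, key=ligand_efficiency.get)
for name in ranked:
    print(f"{name:12s} {ligand_efficiency[name]:.2f}")
```

Dividing by size inverts the bias instead of removing it: the smallest molecule now wins and the largest comes last.
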
Likely causes (in priority order):
1. **Low-quality receptor prep:** open-babel `-xr` is not production docking prep. Need
   AutoDockTools `prepare_receptor` or **Meeko** + `reduce`/pdb2pqr for protonation, charges, and
   proper AutoDock atom typing.
2. **Ligand prep:** should use Meeko (correct rotatable bonds / typing), not bare obabel `--gen3d`.
3. **RMSD metric:** obrms superimposes before RMSD; redocking validation wants symmetry-corrected
   RMSD **in place** (receptor frame). Worth confirming with an in-place metric.

**Honest takeaway:** consistent with the whole project, the *quick* version of each method runs
but doesn't survive honest validation. Credible structure-based docking needs production prep
tooling (Meeko/ADFR), which is the real next investment for this track.

## Step 3: production prep helps, but classical docking is the wrong tool here (2026-06-24)

`scripts/dock_production.py`: Meeko ligand prep (proper rotatable-bond/AD typing) + in-place
symmetry-corrected RMSD (spyrmsd, not obrms, which superimposes). On the clean HDAC2/vorinostat
target (Zn kept):

- **7.9 Å → 4.76 Å** with proper ligand prep + correct metric. Prep and metric genuinely mattered.
- But still FAIL (>2 Å). The residual is the deeper problem: **vorinostat binding is defined by its
  hydroxamate chelating the catalytic Zn, and Vina has no metal-coordination term**, so it cannot
  score the interaction that determines the pose.

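To make the metric distinction concrete, here is a pure-Python sketch of in-place, symmetry-corrected RMSD. It is not the spyrmsd implementation used in `scripts/dock_production.py`; the coordinates, the explicit `mappings` list, and the helper names are hypothetical, and centroid removal stands in for only the translation part of a superposition:

```python
import math

REDOCK_PASS_ANGSTROM = 2.0  # pass/fail cutoff used in these notes


def rmsd_in_place(ref, pose):
    """RMSD in the receptor frame: no translation or rotation is applied."""
    sq = sum((a - b) ** 2 for p, q in zip(ref, pose) for a, b in zip(p, q))
    return math.sqrt(sq / len(ref))


def centered(coords):
    """Shift coordinates so their centroid sits at the origin."""
    n = len(coords)
    c = [sum(p[i] for p in coords) / n for i in range(3)]
    return [tuple(p[i] - c[i] for i in range(3)) for p in coords]


def symmetric_rmsd_in_place(ref, pose, mappings):
    """Minimum in-place RMSD over graph-preserving atom mappings.

    Each mapping gives, for every reference atom, the index of its pose atom.
    """
    return min(rmsd_in_place(ref, [pose[j] for j in m]) for m in mappings)


# Toy 3-atom fragment; atoms 1 and 2 are chemically equivalent.
ref = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, -1.0, 0.0)]
mappings = [[0, 1, 2], [0, 2, 1]]  # identity + swap of the equivalent pair

swapped = [ref[0], ref[2], ref[1]]               # same pose, atoms relabelled
shifted = [(x + 5.0, y, z) for x, y, z in ref]   # right shape, wrong place

naive = rmsd_in_place(ref, swapped)                       # penalises relabelling
symmetric = symmetric_rmsd_in_place(ref, swapped, mappings)
in_place_shift = rmsd_in_place(ref, shifted)              # 5 Å off in the pocket
superimposed_shift = rmsd_in_place(centered(ref), centered(shifted))

print(naive, symmetric, in_place_shift, superimposed_shift)
print("PASS" if in_place_shift <= REDOCK_PASS_ANGSTROM else "FAIL")
```

The shifted pose is the case that matters for redocking: once superimposed it looks perfect, but in the receptor frame it is 5 Å from the crystal position and fails the 2 Å cutoff.
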
**The real finding: sickle's druggable targets bind via non-classical chemistry that classical
docking handles poorly.** The three cases are Hb/voxelotor (covalent), PKR/mitapivat (allosteric + cofactor),
and HDAC/vorinostat (Zn chelation). This is the target landscape, not bad luck. AutoDock Vina is the
wrong tool for it.

**Redirect:** the modality that DOES handle covalent/metal/induced-fit binding is **data-driven
AF3-class co-folding** (Boltz-2 / Chai-1 / DiffDock; they learn these modes from the PDB). That is
the indicated next tool for this disease, and it's gated by the **24 GB local memory ceiling**
(PLAN §12.6 pitfall 4): it needs a cloud GPU or a bigger box. The "GPU breaks all-local" prediction is
now the binding constraint of the whole track.

## Next steps
- [ ] AF3-class co-folding on a GPU (Boltz-2 affinity / Chai-1 / DiffDock); redo the §12.4
  positive-control recovery test there. It should handle the metal/covalent modes Vina can't.
- [ ] (optional) Salvage one classical Vina case: PKR with FBP/Mg cofactors RETAINED, to confirm
  the harness can validate on a sickle target whose ligand does not bind by metal chelation.
- [ ] Production receptor prep (Meeko `mk_prepare_receptor` + protonation) if staying with Vina.
- [ ] §12.9 generative beacon, only after a validated scoring function exists.

> **Hardware note:** this machine has **24 GB** of unified memory (not the 96 GB PLAN §2 assumed),
> which caps local AF3-class model inference. Classical docking (above) is unaffected.