# Structure-based binding track: working notes

Branch `structure-based-binding`. Implements PLAN §12. Baseline first: start with the two cleanest targets (hemoglobin and PKR) and de-risk the harness before scaling.

## Status (2026-06-23)

**Toolchain check (PLAN §12.6 pitfall 4, confirmed real):**

- ✅ RDKit installs on ARM Mac, so the ligand side is ready.
- ❌ AutoDock Vina does NOT pip-install on ARM Mac, and no docking binary is available. Docking (§12.3) is **blocked on toolchain**. It must be resolved via conda/micromamba (`vina`/`smina`), a GPU deep-learning model (AF3-class co-folding such as Boltz-2/Chai-1, or learned docking such as DiffDock), or an x86 Vina binary under Rosetta.

**Structures obtained:** `5E83` (hemoglobin + voxelotor) and `8XFD` (PKR + mitapivat), in `data/raw/structures/`.

**Step 0: ligand-based retrieval baseline (`scripts/binding_ligand_baseline.py`).** RDKit Tanimoto similarity of our 300 drugs against known sickle-cell binders.

- Engine VALIDATED on in-set classes: `decitabine`→azacitidine (0.62); `vorinostat`→scriptaid (0.42), belinostat (0.28). It correctly clusters the DNMT1 and HDAC HbF inducers.
- But voxelotor and mitapivat have **no analog** in our set (max Tanimoto ~0.20–0.26). A 300-drug library is too sparse to contain look-alikes of distinctive scaffolds.

**Takeaways:**

1. Ligand retrieval works, but it needs a **bigger drug library** to be useful for targets whose binders have distinctive scaffolds.
2. The targets without in-set analogs (Hb, PKR) need **actual docking** (§12.3), which scores binding directly and requires no look-alike. That is the gating next step, and it depends on solving the toolchain.

## Next steps

- [ ] Resolve the docking toolchain (recommended: micromamba + smina/vina on CPU; no GPU needed for the baseline).
- [ ] Dock the known binders (voxelotor→5E83, mitapivat→8XFD) as positive controls (§12.4 recovery test).
- [ ] Expand the ligand library (full ChEMBL/LINCS) so retrieval has reach.
- [ ] Only then: deep-learning structure methods (AF3-class co-folding such as Boltz-2, learned docking such as DiffDock) vs. the docking baseline, and the §12.9 generative beacon.
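The Step 0 retrieval logic can be sketched in plain Python. This is a minimal sketch, not the contents of `scripts/binding_ligand_baseline.py`: it assumes each fingerprint has already been reduced to a set of on-bit indices (for an RDKit bit vector, `GetOnBits()` returns these), and the names `tanimoto` and `nearest_binders` are hypothetical. For each known binder it ranks the library by Tanimoto similarity, so a binder whose best score stays low (as with voxelotor and mitapivat) is one with no in-set analog.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a fingerprint is the frozenset of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    return len(a & b) / union


def nearest_binders(
    library: Dict[str, Fingerprint],
    binders: Dict[str, Fingerprint],
    k: int = 3,
) -> Dict[str, List[Tuple[str, float]]]:
    """For each known binder, return its k most similar library drugs, best first."""
    hits: Dict[str, List[Tuple[str, float]]] = {}
    for binder_name, binder_fp in binders.items():
        scored = [(drug, tanimoto(binder_fp, fp)) for drug, fp in library.items()]
        # Highest similarity first; drug name breaks ties so output is deterministic.
        scored.sort(key=lambda item: (-item[1], item[0]))
        hits[binder_name] = scored[:k]
    return hits


if __name__ == "__main__":
    # Toy fingerprints for illustration only, not real molecules.
    library = {"drug_a": frozenset({1, 2, 3, 4}), "drug_b": frozenset({10, 11})}
    binders = {"binder_x": frozenset({1, 2, 3, 5})}
    print(nearest_binders(library, binders, k=2))
    # → {'binder_x': [('drug_a', 0.6), ('drug_b', 0.0)]}
```

With RDKit installed, `DataStructs.TanimotoSimilarity` on two bit vectors computes the same quantity; the set form just keeps the ranking logic testable without the dependency. The max score per binder (`hits[name][0][1]`) is the number the "no analog" observation rests on.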