Files
Reverso/docs/structure_binding_notes.md
Junior B. 817bcda7dc Structure-binding track: scaffold + ligand-retrieval baseline
Start the structure-based binding branch (PLAN §12), baseline-first.

- src/binding.py: validated RDKit ligand retrieval (morgan_fp, tanimoto,
  retrieve_nearest = the §12.9 engine) + dock() stub documenting the
  blocked ARM-Mac toolchain
- scripts/binding_ligand_baseline.py: 300 drugs vs known binders
- docs/structure_binding_notes.md: status, toolchain blocker, next steps
- pyproject: [structure] extra (rdkit); data/raw/structures/ for PDBs

Step-0 finding: retrieval engine VALIDATED on in-set classes
(decitabine->azacitidine 0.62; vorinostat->scriptaid/belinostat) but the
distinctive binders voxelotor/mitapivat have no analog in our 300-drug
set (Tanimoto ~0.2). Needs (a) bigger library, (b) real docking (§12.3),
which is blocked on the ARM-Mac docking toolchain (§12.6 pitfall 4).
Structures 5E83 (Hb+voxelotor) and 8XFD (PKR+mitapivat) fetched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 23:53:27 +02:00

35 lines
1.9 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Structure-based binding track — working notes
Branch `structure-based-binding`. Implements PLAN §12. Baseline-first, start with the two cleanest
targets (Hemoglobin + PKR), de-risk the harness before scaling.
## Status (2026-06-23)
**Toolchain check (PLAN §12.6 pitfall 4, confirmed real):**
- ✅ RDKit installs on ARM Mac — ligand side ready.
- ❌ AutoDock Vina does NOT pip-install on ARM Mac; no docking binary available. Docking (§12.3)
is **blocked on toolchain** — must resolve via conda/micromamba (`vina`/`smina`), a GPU AF3-class
model (Boltz-2/Chai-1/DiffDock), or an x86 Vina binary under Rosetta.
**Structures obtained:** `5E83` (hemoglobin + voxelotor), `8XFD` (PKR + mitapivat) in
`data/raw/structures/`.
**Step 0 — ligand-based retrieval baseline (`scripts/binding_ligand_baseline.py`):**
RDKit Tanimoto of our 300 drugs vs known sickle binders.
- Engine VALIDATED on in-set classes: `decitabine`→azacitidine (0.62); `vorinostat`→scriptaid
(0.42), belinostat (0.28). Correctly clusters DNMT1 / HDAC HbF-inducers.
- But voxelotor / mitapivat have **no analog** in our set (max Tanimoto ~0.200.26). A 300-drug
library is too sparse to contain look-alikes of distinctive scaffolds.
**Takeaways:**
1. Ligand retrieval works but needs a **bigger drug library** to be useful for distinctive targets.
2. The targets without in-set analogs (Hb, PKR) need **actual docking** (§12.3) — which scores
binding directly, no look-alike required. That is the gating next step, and it needs the
toolchain solved.
## Next steps
- [ ] Resolve the docking toolchain (recommend: micromamba + smina/vina, CPU, no GPU needed for baseline).
- [ ] Dock the known binders (voxelotor→5E83, mitapivat→8XFD) as positive controls (§12.4 recovery test).
- [ ] Expand the ligand library (full ChEMBL/LINCS) for retrieval to have reach.
- [ ] Only then: AF3-class co-folding (Boltz-2/DiffDock) vs the docking baseline; and §12.9 generative beacon.