Structure-binding track: scaffold + ligand-retrieval baseline
Start the structure-based binding branch (PLAN §12), baseline-first.

- src/binding.py: validated RDKit ligand retrieval (morgan_fp, tanimoto, retrieve_nearest = the §12.9 engine) + dock() stub documenting the blocked ARM-Mac toolchain
- scripts/binding_ligand_baseline.py: 300 drugs vs known binders
- docs/structure_binding_notes.md: status, toolchain blocker, next steps
- pyproject: [structure] extra (rdkit); data/raw/structures/ for PDBs

Step-0 finding: the retrieval engine is VALIDATED on in-set classes (decitabine->azacitidine 0.62; vorinostat->scriptaid/belinostat), but the distinctive binders voxelotor/mitapivat have no analog in our 300-drug set (Tanimoto ~0.2). This needs (a) a bigger library and (b) real docking (§12.3), which is blocked on the ARM-Mac docking toolchain (§12.6 pitfall 4). Structures 5E83 (Hb+voxelotor) and 8XFD (PKR+mitapivat) fetched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
docs/structure_binding_notes.md
# Structure-based binding track — working notes

Branch `structure-based-binding`. Implements PLAN §12. Baseline-first: start with the two cleanest
targets (hemoglobin + PKR) and de-risk the harness before scaling.

## Status (2026-06-23)

**Toolchain check (PLAN §12.6 pitfall 4, confirmed real):**

- ✅ RDKit installs on ARM Mac; the ligand side is ready.
- ❌ AutoDock Vina does NOT pip-install on ARM Mac, and no docking binary is available. Docking (§12.3)
  is **blocked on the toolchain**. It must be resolved via conda/micromamba (`vina`/`smina`), a GPU
  AF3-class model (Boltz-2/Chai-1/DiffDock), or an x86 Vina binary under Rosetta.
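
A minimal sketch of how this toolchain check could be automated, so the harness fails fast instead of silently skipping docking. The helper names are hypothetical (not taken from `src/binding.py`), and the probed names (`smina` and `vina` executables on `PATH`, an importable `vina` Python module) are assumptions about how the backends end up installed.

```python
"""Probe which docking backends are usable on this machine (hypothetical helper)."""
import importlib.util
import shutil


def probe_docking_backends() -> dict[str, bool]:
    """Return availability flags for candidate docking backends.

    Assumption: backends are detected by executable name on PATH (smina, vina)
    or by an importable Python module (the `vina` bindings). Adjust to your setup.
    """
    return {
        # Command-line binaries, e.g. installed through conda/micromamba
        "smina_cli": shutil.which("smina") is not None,
        "vina_cli": shutil.which("vina") is not None,
        # AutoDock Vina Python bindings, if a build for this platform is installed
        "vina_python": importlib.util.find_spec("vina") is not None,
    }


def docking_available() -> bool:
    """True if at least one candidate backend was found."""
    return any(probe_docking_backends().values())
```

Given the status above, `probe_docking_backends()` on the current ARM Mac would be expected to report every entry as `False` until one of the listed options is installed.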

**Structures obtained:** `5E83` (hemoglobin + voxelotor) and `8XFD` (PKR + mitapivat), stored in
`data/raw/structures/`.
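
To make the structure fetch reproducible, it can be scripted. This is a minimal sketch that assumes the public RCSB file server (`https://files.rcsb.org/download/`); `structure_url` and `fetch_structure` are hypothetical names. mmCIF is the default because every PDB entry is distributed in that format, while legacy `.pdb` files are missing for some large entries.

```python
"""Fetch reference complexes into data/raw/structures/ (a minimal sketch)."""
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: RCSB's public download endpoint; fmt is "cif" or "pdb".
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.{fmt}"
STRUCTURE_DIR = Path("data/raw/structures")


def structure_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build the download URL for one entry (IDs are upper-cased for consistency)."""
    return RCSB_URL.format(pdb_id=pdb_id.upper(), fmt=fmt)


def fetch_structure(pdb_id: str, fmt: str = "cif", out_dir: Path = STRUCTURE_DIR) -> Path:
    """Download one entry unless it is already on disk; return the local path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    dest = out_dir / f"{pdb_id.upper()}.{fmt}"
    if not dest.exists():
        urlretrieve(structure_url(pdb_id, fmt), dest)
    return dest
```

Calling `fetch_structure("5E83")` and `fetch_structure("8XFD")` would place `5E83.cif` and `8XFD.cif` under `data/raw/structures/` and skip files that already exist. Pass `fmt="pdb"` where the legacy format is needed.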

**Step 0 (ligand-based retrieval baseline, `scripts/binding_ligand_baseline.py`):**
RDKit Tanimoto similarity of our 300 drugs vs known sickle-cell binders.

- Engine VALIDATED on in-set classes: `decitabine`→azacitidine (0.62); `vorinostat`→scriptaid
  (0.42), belinostat (0.28). It correctly clusters the DNMT1 / HDAC HbF-inducers.
- But voxelotor / mitapivat have **no analog** in our set (max Tanimoto ~0.20–0.26). A 300-drug
  library is too sparse to contain look-alikes of such distinctive scaffolds.
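
The retrieval step can be sketched with RDKit as follows. This is an illustration, not the code in `src/binding.py`. It reuses the names `morgan_fp`, `tanimoto`, and `retrieve_nearest` from the commit, but the radius (2), bit length (2048), and library format (a name → SMILES dict) are assumptions.

```python
"""Ligand-based retrieval: Morgan fingerprints + Tanimoto nearest neighbours (sketch)."""
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator


def morgan_fp(smiles: str, radius: int = 2, n_bits: int = 2048):
    """Morgan bit-vector fingerprint; radius/n_bits are assumed defaults."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"unparseable SMILES: {smiles!r}")
    gen = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    return gen.GetFingerprint(mol)


def tanimoto(smiles_a: str, smiles_b: str) -> float:
    """Tanimoto similarity between two molecules' fingerprints (0.0 to 1.0)."""
    return DataStructs.TanimotoSimilarity(morgan_fp(smiles_a), morgan_fp(smiles_b))


def retrieve_nearest(query_smiles: str, library: dict[str, str], k: int = 3):
    """Return the k (name, similarity) pairs most similar to the query, best first."""
    query_fp = morgan_fp(query_smiles)
    names = list(library)
    fps = [morgan_fp(library[name]) for name in names]
    # One bulk call scores the query against every library fingerprint.
    sims = DataStructs.BulkTanimotoSimilarity(query_fp, fps)
    ranked = sorted(zip(names, sims), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

Precomputing the library fingerprints once, rather than per query as this sketch does, becomes worthwhile once the library grows well beyond 300 drugs.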

**Takeaways:**

1. Ligand retrieval works but needs a **bigger drug library** to be useful for distinctive targets.
2. The targets without in-set analogs (Hb, PKR) need **actual docking** (§12.3), which scores
   binding directly with no look-alike required. That is the gating next step, and it needs the
   toolchain solved.
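
Docking against Hb or PKR needs a search box. A common convention for redocking a known binder is to center that box on the co-crystallized ligand. This is a minimal sketch that assumes the structure is read as legacy PDB-format text and the ligand is identified by its three-letter component ID (look it up on the RCSB entry page; it is deliberately not filled in here). `ligand_box` is a hypothetical helper, and the 4 Å padding is an assumed default.

```python
"""Derive a docking search box around a co-crystallized ligand (hypothetical helper)."""
from typing import Iterable, Optional, Tuple

Vec3 = Tuple[float, float, float]


def ligand_box(
    pdb_lines: Iterable[str],
    resname: str,
    chain: Optional[str] = None,
    padding: float = 4.0,
) -> Tuple[Vec3, Vec3]:
    """Return (center, size) in Å spanning the ligand's HETATM records plus padding.

    Assumes fixed-column PDB format: resName in columns 18-20, chainID in
    column 22, x/y/z in columns 31-54. Without `chain`, all copies of the
    ligand are pooled, which is usually too large a box for multimers.
    """
    coords = []
    for line in pdb_lines:
        if not line.startswith("HETATM") or line[17:20].strip() != resname:
            continue
        if chain is not None and line[21] != chain:
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"no HETATM records for residue {resname!r}")
    lo = [min(c[i] for c in coords) for i in range(3)]
    hi = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo[i] + hi[i]) / 2 for i in range(3))
    size = tuple(hi[i] - lo[i] + 2 * padding for i in range(3))
    return center, size
```

The returned center and size map onto Vina's `center_x/y/z` and `size_x/y/z` settings. smina can derive an equivalent box itself with `--autobox_ligand`.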

## Next steps

- [ ] Resolve the docking toolchain (recommended: micromamba + smina/vina on CPU; no GPU needed for the baseline).
- [ ] Dock the known binders (voxelotor→5E83, mitapivat→8XFD) as positive controls (§12.4 recovery test).
- [ ] Expand the ligand library (full ChEMBL/LINCS) so that retrieval has reach.
- [ ] Only then: AF3-class co-folding (Boltz-2/DiffDock) vs the docking baseline, and the §12.9 generative beacon.
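
For the positive controls, one common way to score recovery is the heavy-atom RMSD between the top docked pose and the crystal pose, with ≤ 2 Å counted as a success. Whether the §12.4 recovery test uses exactly this criterion is an assumption. The helpers below are hypothetical and assume both coordinate arrays list the same atoms in the same order.

```python
"""Positive-control check: did redocking recover the crystal pose? (sketch)"""
import numpy as np


def pose_rmsd(pred, ref) -> float:
    """Heavy-atom RMSD (Å) between two (N, 3) coordinate arrays with matched atom order.

    No superposition is applied, because docked poses already sit in the
    receptor frame. No symmetry correction either, so ring flips can inflate it.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape or pred.ndim != 2 or pred.shape[1] != 3:
        raise ValueError(f"shape mismatch: {pred.shape} vs {ref.shape}")
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1))))


def recovered(pred, ref, threshold: float = 2.0) -> bool:
    """Assumed success criterion: RMSD at or below `threshold` Å."""
    return pose_rmsd(pred, ref) <= threshold
```

A symmetry-aware RMSD would be the stricter choice for symmetric scaffolds. This plain version is enough to tell whether the harness places voxelotor and mitapivat roughly where the crystal structures put them.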