Start the structure-based binding branch (PLAN §12), baseline-first.

- src/binding.py: validated RDKit ligand retrieval (morgan_fp, tanimoto, retrieve_nearest = the §12.9 engine) + dock() stub documenting the blocked ARM-Mac toolchain
- scripts/binding_ligand_baseline.py: 300 drugs vs known binders
- docs/structure_binding_notes.md: status, toolchain blocker, next steps
- pyproject: [structure] extra (rdkit); data/raw/structures/ for PDBs

Step-0 finding: the retrieval engine is VALIDATED on in-set classes (decitabine->azacitidine 0.62; vorinostat->scriptaid/belinostat), but the distinctive binders voxelotor/mitapivat have no analog in our 300-drug set (Tanimoto ~0.2). This needs (a) a bigger library and (b) real docking (§12.3), which is blocked on the ARM-Mac docking toolchain (§12.6 pitfall 4). Structures 5E83 (Hb+voxelotor) and 8XFD (PKR+mitapivat) fetched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Structure-based binding track — working notes
Branch structure-based-binding. Implements PLAN §12. Baseline-first: start with the two cleanest
targets (hemoglobin and PKR) and de-risk the harness before scaling.
Status (2026-06-23)
Toolchain check (PLAN §12.6 pitfall 4, confirmed real):
- ✅ RDKit installs on ARM Mac — ligand side ready.
- ❌ AutoDock Vina does NOT pip-install on ARM Mac, and no docking binary is available. Docking (§12.3)
is blocked on the toolchain. Options: conda/micromamba (vina/smina), a GPU deep-learning model (AF3-class co-folding such as Boltz-2/Chai-1, or the diffusion docker DiffDock), or an x86 Vina binary under Rosetta 2.
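Below is a minimal sketch of the toolchain check. It assumes the docking engines are invoked as command-line executables named `smina` or `vina`. The executable names and the helper functions are illustrative and are not the contents of the `dock()` stub in src/binding.py. The sketch probes `PATH` and raises an error with an ARM-Mac hint, so docking is never silently skipped.

```python
import platform
import shutil
from typing import Optional, Sequence

# Hypothetical preference order: smina first, then vina (AutoDock Vina's CLI name).
DEFAULT_BACKENDS: Sequence[str] = ("smina", "vina")


def is_apple_silicon() -> bool:
    """True when the interpreter runs natively on an ARM Mac (macOS reports 'arm64')."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def find_docking_backend(candidates: Sequence[str] = DEFAULT_BACKENDS) -> Optional[str]:
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def require_docking_backend(candidates: Sequence[str] = DEFAULT_BACKENDS) -> str:
    """Fail loudly (like a blocked dock() stub) when no docking binary is installed."""
    path = find_docking_backend(candidates)
    if path is None:
        hint = " on ARM Mac; try micromamba or an x86 binary under Rosetta 2" if is_apple_silicon() else ""
        raise RuntimeError("no docking binary found" + hint)
    return path
```

Calling `require_docking_backend()` at the start of a docking run surfaces the blocker immediately. Without it, the failure would show up later as an empty score table.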
Structures obtained: 5E83 (hemoglobin + voxelotor), 8XFD (PKR + mitapivat) in
data/raw/structures/.
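The two structures can be re-fetched from the RCSB file server. This is a stdlib-only sketch, and the helper names are hypothetical. It defaults to mmCIF because RCSB serves mmCIF for every entry, whereas the legacy PDB format is not offered for some very large entries. Receptor preparation for Vina-style docking (for example, conversion to PDBQT) is not shown.

```python
from pathlib import Path
from urllib.request import urlretrieve

RCSB_DOWNLOAD = "https://files.rcsb.org/download"


def structure_url(pdb_id: str, fmt: str = "cif") -> str:
    """RCSB download URL for one entry, e.g. 5E83 -> .../download/5E83.cif."""
    if fmt not in ("cif", "pdb"):
        raise ValueError(f"unsupported format: {fmt}")
    return f"{RCSB_DOWNLOAD}/{pdb_id.upper()}.{fmt}"


def fetch_structure(pdb_id: str, out_dir: Path = Path("data/raw/structures"), fmt: str = "cif") -> Path:
    """Download one entry into out_dir unless a copy is already there."""
    out_dir.mkdir(parents=True, exist_ok=True)
    dest = out_dir / f"{pdb_id.upper()}.{fmt}"
    if not dest.exists():
        urlretrieve(structure_url(pdb_id, fmt), dest)
    return dest
```

Running `fetch_structure("5E83")` and `fetch_structure("8XFD")` would place `5E83.cif` and `8XFD.cif` under data/raw/structures/.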
Step 0 — ligand-based retrieval baseline (scripts/binding_ligand_baseline.py):
RDKit Morgan-fingerprint Tanimoto similarity of our 300 drugs vs known sickle-cell binders.
- Engine VALIDATED on in-set classes:
  - decitabine→azacitidine (0.62); vorinostat→scriptaid (0.42), belinostat (0.28).
  - Correctly clusters DNMT1 / HDAC HbF-inducers.
- But voxelotor / mitapivat have no analog in our set (max Tanimoto ~0.20–0.26). A 300-drug library is too sparse to contain look-alikes of distinctive scaffolds.
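The commit names `morgan_fp`, `tanimoto` and `retrieve_nearest` in src/binding.py. What follows is not that code. It is a stdlib-only sketch of the same retrieval idea, and it assumes each fingerprint has already been reduced to the set of its "on" bit indices. With RDKit, those bits would come from a Morgan bit vector (for example, `list(fp.GetOnBits())`), and `DataStructs.TanimotoSimilarity` computes the same quantity directly on bit vectors.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modelled as the set of "on" bit indices (assumption: radius and
# bit length are whatever the real morgan_fp uses; they do not matter here).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def retrieve_nearest(query: Fingerprint, library: Dict[str, Fingerprint], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable, so ties keep library order
    return scored[:k]
```

Under this scheme, the "max Tanimoto" reported for a known binder is simply the score of `retrieve_nearest(binder_fp, library, k=1)[0]`.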
Takeaways:
- Ligand retrieval works but needs a bigger drug library to be useful for distinctive targets.
- The targets without in-set analogs (Hb, PKR) need actual docking (§12.3), which scores binding against the target structure directly, so no look-alike is required. That is the gating next step, and it needs the toolchain solved.
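The takeaway amounts to a routing rule: use retrieval when the library holds a close analog, and dock otherwise. A minimal sketch follows. The function name and the 0.4 cut-off are hypothetical choices, not values from PLAN §12.

```python
from typing import Dict

# Hypothetical cut-off for "the library contains a usable analog" (an assumption).
# The notes' best in-set hits (0.42 and 0.62) clear it; the ~0.20-0.26 best
# matches for voxelotor/mitapivat do not.
ANALOG_THRESHOLD = 0.4


def route_targets(best_similarity: Dict[str, float], threshold: float = ANALOG_THRESHOLD) -> Dict[str, str]:
    """Send each known binder to ligand retrieval if it has an in-set analog, else to docking.

    best_similarity maps a known binder to its maximum Tanimoto score over the drug library.
    """
    return {
        binder: "retrieval" if score >= threshold else "docking"
        for binder, score in best_similarity.items()
    }
```

Making the cut-off an explicit parameter keeps the decision auditable. Once the library grows, the threshold can be re-tuned without touching the retrieval code.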
Next steps
- Resolve the docking toolchain (recommend: micromamba + smina/vina, CPU, no GPU needed for baseline).
- Dock the known binders (voxelotor→5E83, mitapivat→8XFD) as positive controls (§12.4 recovery test).
- Expand the ligand library (full ChEMBL/LINCS) for retrieval to have reach.
- Only then: AF3-class co-folding (Boltz-2) or DiffDock vs the docking baseline; and the §12.9 generative beacon.
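One plausible shape for the positive-control check is sketched below. PLAN §12.4's exact criterion is not reproduced here, so the rank-based pass rule and the function names are assumptions. The check docks the known binder together with a set of other ligands and asks whether the binder lands near the top. It relies on one convention: Vina-style scores are predicted affinities in kcal/mol, where more negative means stronger predicted binding.

```python
from typing import Dict


def binder_rank(scores: Dict[str, float], binder: str) -> int:
    """1-based rank of `binder` when ligands are sorted by docking score (most negative first).

    Ties count in the binder's favour: only strictly better scores push it down.
    """
    if binder not in scores:
        raise KeyError(f"{binder} was not docked")
    target = scores[binder]
    return 1 + sum(1 for name, s in scores.items() if name != binder and s < target)


def recovers(scores: Dict[str, float], binder: str, top_k: int = 5) -> bool:
    """Positive-control check (hypothetical rule): does the known binder land in the top_k?"""
    return binder_rank(scores, binder) <= top_k
```

For example, `recovers(scores_5e83, "voxelotor", top_k=5)` would apply this rule to a hypothetical score table from docking against 5E83.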