Reverso/docs/structure_binding_notes.md
Junior B. 817bcda7dc Structure-binding track: scaffold + ligand-retrieval baseline
Start the structure-based binding branch (PLAN §12), baseline-first.

- src/binding.py: validated RDKit ligand retrieval (morgan_fp, tanimoto,
  retrieve_nearest = the §12.9 engine) + dock() stub documenting the
  blocked ARM-Mac toolchain
- scripts/binding_ligand_baseline.py: 300 drugs vs known binders
- docs/structure_binding_notes.md: status, toolchain blocker, next steps
- pyproject: [structure] extra (rdkit); data/raw/structures/ for PDBs

Step-0 finding: retrieval engine VALIDATED on in-set classes
(decitabine->azacitidine 0.62; vorinostat->scriptaid/belinostat) but the
distinctive binders voxelotor/mitapivat have no analog in our 300-drug
set (Tanimoto ~0.2). Needs (a) bigger library, (b) real docking (§12.3),
which is blocked on the ARM-Mac docking toolchain (§12.6 pitfall 4).
Structures 5E83 (Hb+voxelotor) and 8XFD (PKR+mitapivat) fetched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 23:53:27 +02:00


Structure-based binding track — working notes

Branch: structure-based-binding. Implements PLAN §12. Baseline-first: start with the two cleanest targets (hemoglobin and PKR) and de-risk the harness before scaling.

Status (2026-06-23)

Toolchain check (PLAN §12.6 pitfall 4, confirmed real):

  • RDKit installs on ARM Mac — ligand side ready.
  • AutoDock Vina does NOT pip-install on ARM Mac, and no docking binary is available, so docking (§12.3) is blocked on the toolchain. Ways to unblock it: conda/micromamba (vina/smina), a GPU deep-learning model (AF3-class co-folding such as Boltz-2 or Chai-1, or DiffDock), or an x86 Vina binary under Rosetta.
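A quick way to re-run this toolchain check on any machine is sketched below. It only reports the host architecture and whether a docking executable is on PATH. The binary names `smina` and `vina` are assumptions about how those tools get installed, and `check_docking_toolchain` is a hypothetical helper, not part of src/binding.py.

```python
import platform
import shutil

# Assumed executable names for the docking tools mentioned in these notes.
DOCKING_BINARIES = ("smina", "vina")


def check_docking_toolchain(binaries=DOCKING_BINARIES):
    """Report the host platform and which docking binaries are on PATH."""
    machine = platform.machine().lower()
    found = {name: shutil.which(name) for name in binaries}
    return {
        "system": platform.system(),
        "machine": machine,
        # "arm64" on Apple Silicon macOS, "aarch64" on ARM Linux
        "is_arm": machine in ("arm64", "aarch64"),
        "binaries": found,
        "docking_available": any(path is not None for path in found.values()),
    }


if __name__ == "__main__":
    report = check_docking_toolchain()
    print(report)
    if not report["docking_available"]:
        print("No docking binary on PATH; docking (§12.3) stays blocked.")
```

Checking PATH rather than trying an import keeps the test honest for CLI tools like smina, which are not Python packages.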

Structures obtained: 5E83 (hemoglobin + voxelotor), 8XFD (PKR + mitapivat) in data/raw/structures/.
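The notes do not record how the two entries were downloaded. One way to reproduce the fetch is sketched below, assuming the public RCSB file-download endpoint. mmCIF is used by default because it exists for every entry, whereas the legacy .pdb format is not provided for very large structures. `structure_url` and `fetch_structure` are hypothetical helpers.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Public RCSB download endpoint (assumed to be the source of the files).
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.{fmt}"
# Directory named in these notes.
STRUCTURE_DIR = Path("data/raw/structures")


def structure_url(pdb_id: str, fmt: str = "cif") -> str:
    """Return the RCSB download URL for one entry (fmt: 'cif' or 'pdb')."""
    if fmt not in ("cif", "pdb"):
        raise ValueError(f"unsupported format: {fmt}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id.upper(), fmt=fmt)


def fetch_structure(pdb_id: str, fmt: str = "cif", out_dir: Path = STRUCTURE_DIR) -> Path:
    """Download one entry into out_dir, skipping files that already exist."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{pdb_id.upper()}.{fmt}"
    if not target.exists():
        urlretrieve(structure_url(pdb_id, fmt), target)
    return target
```

Usage (needs network): `fetch_structure("5E83")` and `fetch_structure("8XFD")` write 5E83.cif and 8XFD.cif into data/raw/structures/.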

Step 0: ligand-based retrieval baseline (scripts/binding_ligand_baseline.py). RDKit Tanimoto similarity on Morgan fingerprints between our 300 drugs and the known sickle-cell binders.

  • Engine VALIDATED on in-set classes: decitabine→azacitidine (0.62); vorinostat→scriptaid (0.42), belinostat (0.28). It correctly groups the DNMT1-inhibitor and HDAC-inhibitor HbF inducers.
  • But voxelotor / mitapivat have no analog in our set (max Tanimoto ~0.20-0.26). A 300-drug library is too sparse to contain look-alikes of distinctive scaffolds.
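The retrieval step above boils down to ranking library fingerprints by Tanimoto similarity to a query. A dependency-free sketch follows, assuming fingerprints are represented as sets of on-bit indices (the kind of set RDKit's `ExplicitBitVect.GetOnBits()` yields for a Morgan bit vector). `nearest_by_tanimoto` is a hypothetical stand-in, not the project's `retrieve_nearest`, and the toy fingerprints are not real drugs.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint modelled as the set of "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_by_tanimoto(
    query: Fingerprint, library: Dict[str, Fingerprint], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # ties broken by name
    return scored[:k]


if __name__ == "__main__":
    # Toy fingerprints for illustration only.
    library = {
        "drug_a": frozenset({1, 2, 3, 4}),
        "drug_b": frozenset({3, 4, 5, 6}),
        "drug_c": frozenset({7, 8, 9}),
    }
    query = frozenset({1, 2, 3, 5})
    for name, sim in nearest_by_tanimoto(query, library, k=2):
        print(f"{name}: {sim:.2f}")
    # drug_a: 0.60
    # drug_b: 0.33
```

With real data the same ranking can be computed with RDKit's `DataStructs.TanimotoSimilarity` on the bit vectors directly. The point the notes make still holds: the top hit is only meaningful if the library contains something scaffold-similar.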

Takeaways:

  1. Ligand retrieval works but needs a bigger drug library to be useful for distinctive targets.
  2. The targets without in-set analogs (Hb, PKR) need actual docking (§12.3) — which scores binding directly, no look-alike required. That is the gating next step, and it needs the toolchain solved.
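The two takeaways amount to a routing rule: use retrieval when a known binder has a close in-library analog, otherwise send the target to docking. A sketch of that rule follows. The 0.4 cut-off is a hypothetical value (the notes fix none), and `route_target` / `route_all` are illustrative names.

```python
from typing import Dict

# Hypothetical cut-off; below it the best in-library analog is treated
# as "no usable look-alike".
ANALOG_THRESHOLD = 0.4


def route_target(best_similarity: float, threshold: float = ANALOG_THRESHOLD) -> str:
    """Choose 'retrieval' or 'docking' from a binder's best analog Tanimoto."""
    if not 0.0 <= best_similarity <= 1.0:
        raise ValueError("Tanimoto similarity must lie in [0, 1]")
    return "retrieval" if best_similarity >= threshold else "docking"


def route_all(best_scores: Dict[str, float], threshold: float = ANALOG_THRESHOLD) -> Dict[str, str]:
    """Apply route_target to every known binder."""
    return {name: route_target(score, threshold) for name, score in best_scores.items()}


if __name__ == "__main__":
    # Best-analog scores from the Step 0 notes; 0.26 is the upper end of
    # the ~0.20-0.26 range reported for voxelotor / mitapivat.
    scores = {"decitabine": 0.62, "vorinostat": 0.42, "voxelotor": 0.26, "mitapivat": 0.26}
    print(route_all(scores))
    # {'decitabine': 'retrieval', 'vorinostat': 'retrieval', 'voxelotor': 'docking', 'mitapivat': 'docking'}
```

Any threshold between roughly 0.27 and 0.42 gives the same split on these numbers, so the exact value matters less than agreeing on one before the library is expanded.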

Next steps

  • Resolve the docking toolchain (recommend: micromamba + smina/vina, CPU, no GPU needed for baseline).
  • Dock the known binders (voxelotor→5E83, mitapivat→8XFD) as positive controls (§12.4 recovery test).
  • Expand the ligand library (full ChEMBL/LINCS) so retrieval can reach analogs of distinctive scaffolds.
  • Only then: deep-learning methods (Boltz-2 co-folding, DiffDock) benchmarked against the docking baseline, plus the §12.9 generative beacon.
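The positive-control step (redock voxelotor into 5E83 and mitapivat into 8XFD, §12.4) can be sketched as a smina redocking command plus a pose-recovery check. Assumptions: smina is the chosen tool, the receptor and crystal-ligand files have already been split out of the PDB entries (file names are hypothetical), and "recovered" means heavy-atom RMSD at or below 2.0 Å, a common convention rather than a criterion stated in PLAN §12.4. The RMSD here assumes matched atom order and ignores molecular symmetry.

```python
import math
from pathlib import Path
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Assumed success cut-off for pose recovery (heavy-atom RMSD, angstroms).
RECOVERY_RMSD_CUTOFF = 2.0


def smina_command(receptor: Path, ligand: Path, crystal_ligand: Path, out: Path,
                  exhaustiveness: int = 8, seed: int = 1) -> List[str]:
    """Build a smina redocking command with the box placed on the crystal ligand."""
    return [
        "smina",
        "-r", str(receptor),
        "-l", str(ligand),
        "--autobox_ligand", str(crystal_ligand),
        "--autobox_add", "4",  # padding (angstroms) around the crystal ligand
        "--exhaustiveness", str(exhaustiveness),
        "--seed", str(seed),  # fixed seed so control runs are reproducible
        "-o", str(out),
    ]


def pose_rmsd(docked: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """In-place RMSD (no superposition) between atom-matched heavy atoms."""
    if len(docked) != len(reference) or not docked:
        raise ValueError("need two non-empty, equally sized, atom-matched poses")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(docked, reference))
    return math.sqrt(sq / len(docked))


def recovered(docked: Sequence[Coord], reference: Sequence[Coord],
              cutoff: float = RECOVERY_RMSD_CUTOFF) -> bool:
    """True if the docked pose lies within the cut-off of the crystal pose."""
    return pose_rmsd(docked, reference) <= cutoff
```

Once smina is on PATH, `subprocess.run(smina_command(...), check=True)` runs one control. No superposition is applied because docked poses already share the receptor frame. A production check would also need symmetry-aware atom matching.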