Structure-binding track: scaffold + ligand-retrieval baseline

Start the structure-based binding branch (PLAN §12), baseline-first.

- src/binding.py: validated RDKit ligand retrieval (morgan_fp, tanimoto,
  retrieve_nearest = the §12.9 engine) + dock() stub documenting the
  blocked ARM-Mac toolchain
- scripts/binding_ligand_baseline.py: 300 drugs vs known binders
- docs/structure_binding_notes.md: status, toolchain blocker, next steps
- pyproject: [structure] extra (rdkit); data/raw/structures/ for PDBs

Step-0 finding: retrieval engine VALIDATED on in-set classes
(decitabine->azacitidine 0.62; vorinostat->scriptaid/belinostat) but the
distinctive binders voxelotor/mitapivat have no analog in our 300-drug
set (Tanimoto ~0.2). Needs (a) bigger library, (b) real docking (§12.3),
which is blocked on the ARM-Mac docking toolchain (§12.6 pitfall 4).
Structures 5E83 (Hb+voxelotor) and 8XFD (PKR+mitapivat) fetched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-23 23:53:27 +02:00
parent 6c2c71d73d
commit 817bcda7dc
5 changed files with 195 additions and 0 deletions

View File

@@ -33,6 +33,12 @@ dev = [
"pytest>=8.0",
"ruff>=0.5",
]
# Structure-based binding track (PLAN §12). Docking tool (vina/smina) is NOT pip-installable on
# ARM Mac — install via conda/micromamba or use a GPU AF3-class model; see docs/structure_binding_notes.md.
structure = [
"rdkit>=2024.3",
"requests>=2.31",
]
[tool.setuptools.packages.find]
where = ["."]