Structure-binding track: scaffold + ligand-retrieval baseline
Start the structure-based binding branch (PLAN §12), baseline-first.

- src/binding.py: validated RDKit ligand retrieval (morgan_fp, tanimoto,
  retrieve_nearest = the §12.9 engine) + dock() stub documenting the blocked
  ARM-Mac toolchain
- scripts/binding_ligand_baseline.py: 300 drugs vs known binders
- docs/structure_binding_notes.md: status, toolchain blocker, next steps
- pyproject: [structure] extra (rdkit); data/raw/structures/ for PDBs

Step-0 finding: retrieval engine VALIDATED on in-set classes
(decitabine->azacitidine 0.62; vorinostat->scriptaid/belinostat) but the
distinctive binders voxelotor/mitapivat have no analog in our 300-drug set
(Tanimoto ~0.2). Needs (a) bigger library, (b) real docking (§12.3), which is
blocked on the ARM-Mac docking toolchain (§12.6 pitfall 4).

Structures 5E83 (Hb+voxelotor) and 8XFD (PKR+mitapivat) fetched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
89
src/binding.py
Normal file
@@ -0,0 +1,89 @@
"""Structure-based binding track (PLAN §12).

Two capabilities:
- ligand-based retrieval (RDKit, works now): find existing drugs near a query molecule in
  chemical space; validated, and the engine behind §12.9 generative-guided retrieval.
- structure-based docking (§12.3): score whether a ligand binds a target pocket. Blocked on an
  ARM-Mac docking toolchain (AutoDock Vina does not pip-install); see ``dock`` for options.

Caveat carried throughout: chemical similarity != activity, and docking != efficacy (§12.6).
"""

from __future__ import annotations

from pathlib import Path

from rdkit import Chem, DataStructs, RDLogger
from rdkit.Chem import rdFingerprintGenerator

RDLogger.DisableLog("rdApp.*")
_MORGAN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

STRUCT_DIR = Path("data/raw/structures")

# Known sickle small-molecule binders, by target (positive controls for the §12.4 recovery test).
KNOWN_BINDERS = {
    "hemoglobin": "voxelotor",
    "PKR": "mitapivat",
    "DNMT1": "decitabine",
    "HDAC": "vorinostat",
}

# Curated target structures (PLAN §12.2). Add PDB ids as the harness grows.
TARGET_PDB = {
    "hemoglobin": "5E83",  # hemoglobin + voxelotor (GBT440)
    "PKR": "8XFD",  # pyruvate kinase R + mitapivat
}


def morgan_fp(smiles: str):
    """Morgan (ECFP4) fingerprint, or None for invalid / missing SMILES ('-666', '')."""
    if not isinstance(smiles, str) or smiles in ("-666", ""):
        return None
    mol = Chem.MolFromSmiles(smiles)
    return _MORGAN.GetFingerprint(mol) if mol else None


def tanimoto(smiles_a: str, smiles_b: str) -> float | None:
    fa, fb = morgan_fp(smiles_a), morgan_fp(smiles_b)
    if fa is None or fb is None:
        return None
    return DataStructs.TanimotoSimilarity(fa, fb)


def retrieve_nearest(
    query_smiles: str,
    library: dict[str, str],
    top_n: int = 5,
) -> list[tuple[float, str]]:
    """Rank a library of {name: smiles} by Tanimoto similarity to a query molecule.

    This is the §12.9 retrieval step: the query may be a known binder (positive-control beacon)
    or a generated idealised binder; the returned existing drugs are repurposing candidates that
    STILL require docking/validation (similarity != activity).
    """
    qfp = morgan_fp(query_smiles)
    if qfp is None:
        raise ValueError("invalid query SMILES")
    sims = []
    for name, smi in library.items():
        fp = morgan_fp(smi)
        if fp is not None:
            sims.append((DataStructs.TanimotoSimilarity(qfp, fp), name))
    return sorted(sims, reverse=True)[:top_n]


def dock(target: str, ligand_smiles: str) -> float:
    """Dock a ligand into a target pocket and return a binding score (PLAN §12.3).

    Blocked: AutoDock Vina does not pip-install on ARM Mac and no docking binary is on PATH.
    Resolve the toolchain first (one of):
      - conda/micromamba: ``vina`` (conda-forge) or ``smina`` (bioconda), osx-arm64 builds
      - an AF3-class co-folding model on GPU: Boltz-2 / Chai-1 / DiffDock (also predicts affinity)
      - x86 Vina binary under Rosetta 2
    Then: fetch TARGET_PDB[target], define the pocket box, prep the ligand (Meeko), score.
    """
    raise NotImplementedError(
        "Docking toolchain unresolved on ARM Mac (PLAN §12.6 pitfall 4 / §12.8). "
        "See docstring for options."
    )
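
The ranking in `retrieve_nearest` rests on the Tanimoto coefficient, |A ∩ B| / |A ∪ B| over the on-bits of two fingerprints. The sketch below reproduces that arithmetic and the top-n sort without RDKit, so it can be run anywhere. It is an illustration only: `tanimoto_bits`, `rank_by_similarity`, and the toy bit sets are hypothetical names and invented data, not part of `src/binding.py` and not real fingerprints.

```python
from __future__ import annotations


def tanimoto_bits(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| over sets of on-bit indices."""
    union = a | b
    if not union:
        # Convention chosen for this sketch when both fingerprints are empty.
        return 0.0
    return len(a & b) / len(union)


def rank_by_similarity(
    query: set[int],
    library: dict[str, set[int]],
    top_n: int = 5,
) -> list[tuple[float, str]]:
    """Return the top_n (similarity, name) pairs, most similar first."""
    sims = [(tanimoto_bits(query, bits), name) for name, bits in library.items()]
    return sorted(sims, reverse=True)[:top_n]


# Toy on-bit sets, invented for illustration only (not real molecules).
toy_library = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {3, 4, 5, 6},
    "mol_c": {7, 8},
}
query_bits = {1, 2, 3, 5}

ranked = rank_by_similarity(query_bits, toy_library, top_n=2)
for score, name in ranked:
    print(f"{name}: {score:.2f}")
```

Here `mol_a` shares three of five distinct bits with the query and ranks first, while `mol_c` shares none and falls outside the top two. As with the real engine, a high score only says the bit patterns overlap; it says nothing about binding.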
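
The `dock` docstring notes that no docking binary is on PATH. One way to make that check explicit before un-stubbing `dock` might look like the sketch below, which uses only the standard library. `find_docking_binary` and `DOCKING_BINARIES` are hypothetical names, and the assumption that the conda packages expose executables called `vina` and `smina` should be confirmed against the installed environment.

```python
from __future__ import annotations

import shutil
from typing import Callable, Iterable, Optional

# Candidate executable names taken from the options in the dock() docstring.
# The names and the lookup order are assumptions of this sketch.
DOCKING_BINARIES = ("vina", "smina")


def find_docking_binary(
    candidates: Iterable[str] = DOCKING_BINARIES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[tuple[str, str]]:
    """Return (name, path) for the first candidate found on PATH, else None."""
    for name in candidates:
        path = which(name)
        if path:
            return name, path
    return None


found = find_docking_binary()
if found is None:
    print("no docking binary on PATH; dock() stays blocked")
else:
    print(f"found {found[0]} at {found[1]}")
```

Passing `which` as a parameter keeps the lookup testable without touching the real PATH. The check only covers the conda and Rosetta 2 routes; a co-folding model on GPU would need a different probe.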