Phase 1 co-folding WORKS: HDAC2/Zn validated (Boltz-2 on Modal)

First clear positive result in the project. Ran Phase 1 on Modal L4
(~$0.70). Boltz-2 P(binder), cofactors co-folded:
- HDAC2 (+Zn): vorinostat 0.9994 vs negatives ~0.1 -> PASS, decisive
- hemoglobin (+heme): voxelotor 0.46 -> PASS (weak; covalent/tetramer)
- PKR (+FBP/Mg): mitapivat 0.32 < hydroxyurea 0.40 -> FAIL (allosteric)

HDAC2/Zn is the exact case classical Vina failed (no metal term, 7.9A
redock). Co-folding handles the Zn-chelation chemistry -> the structure-
binding modality pivot (PLAN §12) is validated on its decisive test.

Engineering fixes that got it running: image needs cuequivariance kernels;
max_containers=1 so weights download once (parallel corrupted the shared-
Volume checkpoint); rank by P(binder) not affinity_pred_value (sign).

Adds docs/results/phase1_affinity.csv (committed; raw under data/ gitignored).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-24 19:41:19 +02:00
parent 07705a5884
commit c891a78541
3 changed files with 52 additions and 10 deletions

View File

@@ -27,7 +27,8 @@ app = modal.App("reverso-binding")
image = (
modal.Image.debian_slim(python_version="3.12")
.apt_install("git", "wget")
.pip_install("boltz", "rdkit", "numpy")
# Boltz-2 needs NVIDIA cuequivariance kernels (cuda 12) for inference, plus rdkit/numpy.
.pip_install("boltz", "cuequivariance-torch", "cuequivariance-ops-torch-cu12", "rdkit", "numpy")
)
weights = modal.Volume.from_name("reverso-binding-weights", create_if_missing=True)
WEIGHTS = "/weights"
@@ -119,7 +120,9 @@ def build_boltz_yaml(protein_seq: str, ligand_smiles: str, cofactor_ccds: list[s
# ------------------------------------------------------------------------------- GPU function
@app.function(gpu="L4", image=image, volumes={WEIGHTS: weights}, timeout=3600)
# max_containers=1: run the inputs serially on one warm container so the weights download ONCE
# (no concurrent-download race that corrupts the checkpoint) and are reused for the rest.
@app.function(gpu="L4", image=image, volumes={WEIGHTS: weights}, timeout=3600, max_containers=1)
def cofold(label: str, protein_seq: str, ligand_smiles: str, cofactor_ccds: list[str]) -> dict:
"""Co-fold one complex (protein + drug + cofactors) on the GPU; return affinity + P(binder).
@@ -136,12 +139,14 @@ def cofold(label: str, protein_seq: str, ligand_smiles: str, cofactor_ccds: list
work.mkdir(parents=True, exist_ok=True)
(work / "in.yaml").write_text(build_boltz_yaml(protein_seq, ligand_smiles, cofactor_ccds))
out = work / "out"
subprocess.run(
["boltz", "predict", str(work / "in.yaml"), "--use_msa_server",
"--cache", boltz_cache, "--out_dir", str(out), "--output_format", "pdb"],
check=True,
)
weights.commit() # persist anything newly downloaded
try:
subprocess.run(
["boltz", "predict", str(work / "in.yaml"), "--use_msa_server",
"--cache", boltz_cache, "--out_dir", str(out), "--output_format", "pdb"],
check=True,
)
finally:
weights.commit() # persist downloaded weights/CCD even if this run fails, so retries skip it
# Affinity is written to a JSON under out/predictions/<name>/; parse defensively (keys vary).
aff = {"affinity_pred_value": None, "affinity_probability_binary": None}