From 08ed713cc8714761fff99768e006feec7626a4b0 Mon Sep 17 00:00:00 2001
From: "Junior B."
Date: Wed, 24 Jun 2026 16:45:04 +0200
Subject: [PATCH] GPU plan: ephemeral serverless co-folding (Modal) + app skeleton

docs/gpu_plan.md: cost-efficient plan for running AF3-class co-folding
(Boltz-2/DiffDock) on a GPU then paying nothing when idle.
- Key insight: structure-track data is tiny (MB of PDBs/SMILES); only the
  GPU + model weights are heavy -> serverless is ideal.
- Recommend Modal (per-second billing, scales to zero = nothing to kill);
  RunPod as the SSH-box alternative with idle auto-terminate.
- Lifecycle: image -> weights Volume (cache, don't re-download) -> run ->
  git push small results -> teardown automatic.
- Phase 1 validate on 3 known binders (~$1) before paying for a screen;
  Boltz-2 (affinity) on an L4/A10 (24-48GB); est total ~$5-15.

gpu/modal_app.py: Modal app skeleton (image, weights volume, GPU cofold()
function, local entrypoint); boltz invocation stubbed with TODOs.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/gpu_plan.md | 86 ++++++++++++++++++++++++++++++++++++++++++++++++
 gpu/modal_app.py | 67 +++++++++++++++++++++++++++++++++++++
 2 files changed, 153 insertions(+)
 create mode 100644 docs/gpu_plan.md
 create mode 100644 gpu/modal_app.py

diff --git a/docs/gpu_plan.md b/docs/gpu_plan.md
new file mode 100644
index 0000000..0741261
--- /dev/null
+++ b/docs/gpu_plan.md
@@ -0,0 +1,86 @@
+# GPU plan — ephemeral AF3-class co-folding for the binding track
+
+Goal: run AF3-class co-folding (Boltz-2 / DiffDock) on a GPU for the structure-binding track
+(§12), then **pay nothing when idle**. The work is *bursty* (a validation run, then a screen),
+the inputs are *tiny*, so the design optimises for zero idle cost, not for a persistent box.
+
+## What actually has to move (small!)
+
+| Thing | Size | How it gets to the GPU |
+|---|---|---|
+| Code | KB | `git clone` the gitea repo (or mount) |
+| Target structures (PDBs) | a few MB | in the repo / git |
+| Ligands (SMILES, drug set) | KB | in the repo / git |
+| **Model weights** (Boltz-2 / DiffDock) | **~2–6 GB** | downloaded once, **cached in a persistent volume** |
+| Results (poses, scores, RMSDs) | KB–MB | `git push` back / download |
+
+The 27 GB LINCS data is **not** part of this track — nothing big to upload. The only thing worth
+persisting is the model-weights cache (so we don't re-download = re-pay GPU time every run).
+
+## Provider choice
+
+| Option | Billing | Idle cost | "Kill" model | Best for |
+|---|---|---|---|---|
+| **Modal** (recommended) | per-second | **$0 (scales to zero)** | automatic — nothing to remember | bursty batch runs |
+| RunPod | per-minute, on-demand/spot | only while pod exists | manual `terminate` | interactive SSH box |
+| Vast.ai | per-minute spot | only while rented | manual destroy | cheapest, more fiddly |
+| Lambda / AWS-GCP spot | per-hour/second | until you stop it | manual stop | if you already have credits |
+
+**Recommend Modal.** You define the run as a function with a `gpu=` decorator; on call it
+provisions the GPU, runs, and releases it — **there is no GPU to forget to kill**, and you pay only
+for the seconds it ran. That *is* the "kill the GPU to save money" requirement, automated.
+
+## The lifecycle (Modal)
+
+1. **Define image** once: CUDA base + `boltz` (or DiffDock) + `rdkit`, `meeko`, `spyrmsd`, `gemmi`.
+2. **Weights volume**: `modal.Volume` mounted at `/weights`; the model downloads into it on first
+   run and is cached forever after (no re-download cost).
+3. **Run**: `modal run gpu/modal_app.py` → provisions GPU → runs the test → returns results to your
+   laptop. GPU released the moment the function returns.
+4. **Persist results**: write the returned scores/poses into `data/processed/binding/` and
+   `git commit` (small, text).
+5. **Teardown**: nothing to do — Modal scaled to zero. `modal app stop` only if a run is hung.
+
+RunPod alternative (if you want an interactive box): start pod → `git clone` → run → `git push`
+results → **`runpodctl remove pod `** (or Stop in the UI). Set an **idle auto-terminate** so a
+forgotten box can't bleed money.
+
+## What runs on the GPU (in cost order — cheap validation first)
+
+- **Phase 1 — modality validation (~minutes, ~$1):** co-fold the 3 known binders into their
+  targets (voxelotor/Hb, mitapivat/PKR, vorinostat/HDAC2) and check it reproduces the crystal pose
+  (RMSD <2 Å) where Vina failed on metal/covalent/allosteric modes. If this passes, the modality is
+  real; if not, stop before spending on a screen.
+- **Phase 2 — screen (~tens of minutes, a few $):** run the ~300-drug set (or a focused subset)
+  against the sickle targets; rank by Boltz-2 predicted affinity; redo the §12.4 positive-control
+  recovery test. Output a ranked CSV, same shape as the connectivity `ranked_candidates`.
+
+## Model choice
+
+- **Boltz-2** (MIT, pip-installable) — predicts the protein–ligand complex **and** a binding
+  affinity → directly gives a rankable score. Primary choice. Fits a 24–40 GB GPU for these
+  single-domain targets.
+- **DiffDock-L** — lighter, pose-only (needs a separate scorer); fallback if Boltz memory is tight.
+- GPU: an **L4 (24 GB, ~$0.6–0.8/hr)** or **A10/L40S (24–48 GB)** is plenty; no multi-GPU, no A100
+  needed for these sizes.
+
+## Cost controls (the save-money checklist)
+
+- Serverless (Modal) → **zero idle cost** by construction; or an **idle-timeout** auto-kill on a box.
+- **Cache weights** in a persistent volume — re-downloading 5 GB on a $1/hr GPU is wasted money.
+- **Validate on the 3 known binders before screening** — don't pay for a 300-drug screen until Phase 1 passes.
+- Prefer **spot/interruptible** for the batch screen (Phase 2 is restartable).
+- Keep prep (Meeko/RDKit) and result-scoring on the **laptop**; only the model forward pass needs GPU.
+- Estimated total to validate + a first screen: **~$5–15**, not a standing bill.
+
+## Repo integration
+
+- `gpu/modal_app.py` — the Modal app (skeleton committed alongside this plan).
+- Results land in `data/processed/binding/` (gitignored) + a small committed summary.
+- Pin model + weights version in the image for reproducibility.
+
+## Next step
+
+Scaffold `gpu/modal_app.py` for Phase-1 validation (3 known binders), do a first run with plain
+`modal run` (not `modal run --detach`), confirm cost, then Phase 2. Requires a Modal account,
+`pip install modal`, and `modal token new` (one-time auth).
diff --git a/gpu/modal_app.py b/gpu/modal_app.py
new file mode 100644
index 0000000..94b5a6f
--- /dev/null
+++ b/gpu/modal_app.py
@@ -0,0 +1,67 @@
+"""Ephemeral GPU runner for AF3-class co-folding (PLAN §12, docs/gpu_plan.md).
+
+Serverless: `modal run gpu/modal_app.py` provisions a GPU, runs, and releases it — zero idle cost,
+nothing to remember to kill. Model weights are cached in a persistent Volume so we never re-pay GPU
+time to re-download them. Prep (Meeko/RDKit) and RMSD scoring (spyrmsd) stay light; only the model
+forward pass needs the GPU.
+
+Setup (one-time): `pip install modal && modal token new`.
+Run Phase 1 (validate on 3 known binders): `modal run gpu/modal_app.py`.
+
+STATUS: scaffold. The boltz invocation (input spec + output parsing) is stubbed where marked TODO;
+wire it after a first `modal run` confirms the image builds and the GPU is reachable.
+"""
+
+from __future__ import annotations
+
+import modal
+
+app = modal.App("reverso-binding")
+
+# Debian-slim image + AF3-class model (Boltz-2) + light prep/scoring deps (no CUDA base image).
+image = (
+    modal.Image.debian_slim(python_version="3.12")
+    .apt_install("git", "wget")
+    .pip_install("boltz", "rdkit", "meeko", "spyrmsd", "gemmi", "numpy")
+)
+
+# Persist model weights across runs so we download them once, not every GPU-billed run.
+weights = modal.Volume.from_name("reverso-binding-weights", create_if_missing=True)
+
+# Known binders -> (PDB id, crystal ligand resname); ligand SMILES are supplied by the caller.
+# Phase 1 validation: does co-folding reproduce these crystal poses where Vina failed?
+KNOWN = {
+    "voxelotor_Hb": ("5E83", "5L7"),
+    "mitapivat_PKR": ("8XFD", "WV2"),
+    "vorinostat_HDAC2": ("4LXZ", "SHH"),
+}
+
+
+@app.function(gpu="L4", image=image, volumes={"/weights": weights}, timeout=3600)
+def cofold(protein_seq: str, ligand_smiles: str, weights_dir: str = "/weights") -> dict:
+    """Co-fold one protein+ligand complex and return predicted affinity + pose (PDB string).
+
+    Runs on the GPU only for this call, then the GPU is released. TODO: replace the stub with the
+    actual Boltz-2 invocation (write the YAML/FASTA input spec, call `boltz predict
+    --use_msa_server --out_dir ... --cache /weights`, parse the predicted structure + affinity).
+    """
+    import subprocess  # noqa: F401 (used once boltz is wired)
+
+    # TODO: build boltz input (protein_seq + ligand_smiles), run, parse pose+affinity.
+    raise NotImplementedError("Wire Boltz-2 here; see docs/gpu_plan.md Phase 1.")
+
+
+@app.local_entrypoint()
+def main() -> None:
+    """Phase 1 driver (runs locally; only cofold() touches the GPU).
+
+    Pulls target sequences + ligand SMILES from the repo, fans out one GPU call per known binder,
+    scores redocking RMSD vs the crystal pose locally (spyrmsd), and prints pass/fail. Results are
+    tiny — commit a summary into data/processed/binding/.
+    """
+    # TODO: load protein sequences from data/raw/structures/.pdb (gemmi) and ligand SMILES
+    # (PubChem / drug_set), then:
+    #     results = list(cofold.map(seqs, smiles))
+    # and compute in-place spyrmsd RMSD vs the crystal ligand for each.
+    print("Scaffold: fill in sequence/SMILES loading + cofold.map, then score RMSD. "
+          "See docs/gpu_plan.md.")
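Review note on the `cofold()` TODO: the "build boltz input, run" step could be sketched roughly as below. This is a minimal sketch and not part of the patch. The helper names `boltz_input_yaml` and `boltz_command` are hypothetical, and the YAML key layout (`version`, `sequences`, `properties.affinity`) is an assumption about the Boltz input schema that should be checked against the pinned `boltz` version. The CLI flags are only the ones already named in the `cofold()` docstring.

```python
from __future__ import annotations

import json
from pathlib import Path


def boltz_input_yaml(protein_seq: str, ligand_smiles: str) -> str:
    """Render a one-protein (chain A), one-ligand (chain B) input spec as YAML text.

    ASSUMPTION: the key layout below mirrors the Boltz YAML schema; verify it
    against the boltz version pinned in the image before relying on it.
    """
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        "      id: A",
        f"      sequence: {protein_seq}",
        "  - ligand:",
        "      id: B",
        # json.dumps emits a double-quoted scalar, which is also valid YAML, so
        # SMILES characters such as '#', '[' or a backslash are not misparsed.
        f"      smiles: {json.dumps(ligand_smiles)}",
        "properties:",
        "  - affinity:",
        "      binder: B",
    ]
    return "\n".join(lines) + "\n"


def boltz_command(
    input_path: Path | str, out_dir: Path | str, cache_dir: str = "/weights"
) -> list[str]:
    """Build the argv for `boltz predict` using only the flags named in cofold()."""
    return [
        "boltz", "predict", str(input_path),
        "--use_msa_server",
        "--out_dir", str(out_dir),
        "--cache", str(cache_dir),
    ]


if __name__ == "__main__":
    # Toy sequence and SMILES, for illustration only.
    print(boltz_input_yaml("MVHLTPEEK", "CCO"))
    print(" ".join(boltz_command("complex.yaml", "out")))
```

Inside `cofold()`, the spec would be written to a temporary file and the command passed to `subprocess.run(cmd, check=True)`. Output parsing is left out here because the output directory layout depends on the boltz version.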