GPU plan: ephemeral serverless co-folding (Modal) + app skeleton

docs/gpu_plan.md: cost-efficient plan for running AF3-class co-folding (Boltz-2/DiffDock) on a GPU then paying nothing when idle.
- Key insight: structure-track data is tiny (MB of PDBs/SMILES); only the GPU + model weights are heavy -> serverless is ideal.
- Recommend Modal (per-second billing, scales to zero = nothing to kill); RunPod as the SSH-box alternative with idle auto-terminate.
- Lifecycle: image -> weights Volume (cache, don't re-download) -> run -> git push small results -> teardown automatic.
- Phase 1 validate on 3 known binders (~$1) before paying for a screen; Boltz-2 (affinity) on an L4/A10 (24-48GB); est total ~$5-15.

gpu/modal_app.py: Modal app skeleton (image, weights volume, GPU cofold() function, local entrypoint); boltz invocation stubbed with TODOs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 16:45:04 +02:00
parent 7c6cef1aef
commit 08ed713cc8
2 changed files with 153 additions and 0 deletions
--- a/docs/gpu_plan.md
+++ b/docs/gpu_plan.md
@@ -0,0 +1,86 @@
+# GPU plan — ephemeral AF3-class co-folding for the binding track
+
+Goal: run AF3-class co-folding (Boltz-2 / DiffDock) on a GPU for the structure-binding track
+(§12), then **pay nothing when idle**. The work is *bursty* (a validation run, then a screen) and
+the inputs are *tiny*, so the design optimises for zero idle cost, not for a persistent box.
+
+## What actually has to move (small!)
+
+| Thing | Size | How it gets to the GPU |
+|---|---|---|
+| Code | KB | `git clone` the gitea repo (or mount) |
+| Target structures (PDBs) | a few MB | in the repo / git |
+| Ligands (SMILES, drug set) | KB | in the repo / git |
+| **Model weights** (Boltz-2 / DiffDock) | **~2–6 GB** | downloaded once, **cached in a persistent volume** |
+| Results (poses, scores, RMSDs) | KB–MB | `git push` back / download |
+
+The 27 GB LINCS data is **not** part of this track, so there is nothing big to upload. The only thing
+worth persisting is the model-weights cache, so that a run does not re-download the weights and pay GPU time for the wait.
+
+## Provider choice
+
+| Option | Billing | Idle cost | "Kill" model | Best for |
+|---|---|---|---|---|
+| **Modal** (recommended) | per-second | **$0 (scales to zero)** | automatic — nothing to remember | bursty batch runs |
+| RunPod | per-minute, on-demand/spot | only while pod exists | manual `terminate` | interactive SSH box |
+| Vast.ai | per-minute spot | only while rented | manual destroy | cheapest, more fiddly |
+| Lambda / AWS-GCP spot | per-hour/second | until you stop it | manual stop | if you already have credits |
+
+**Recommend Modal.** You define the run as a function with a `gpu=` decorator; on call it
+provisions the GPU, runs, and releases it — **there is no GPU to forget to kill**, and you pay only
+for the seconds it ran. That *is* the "kill the GPU to save money" requirement, automated.
+
+## The lifecycle (Modal)
+
+1. **Define image** once: CUDA base + `boltz` (or DiffDock) + `rdkit`, `meeko`, `spyrmsd`, `gemmi`.
+2. **Weights volume**: `modal.Volume` mounted at `/weights`; the model downloads into it on first
+   run and is cached forever after (no re-download cost).
+3. **Run**: `modal run gpu/modal_app.py` → provisions GPU → runs the test → returns results to your
+   laptop. GPU released the moment the function returns.
+4. **Persist results**: write the returned scores/poses into `data/processed/binding/` and
+   `git commit` (small, text).
+5. **Teardown**: nothing to do — Modal scaled to zero. `modal app stop` only if a run is hung.
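
The weights-volume step (2) comes down to a "download only if missing" check against the mounted cache directory. The sketch below shows that pattern with the standard library only. It is an illustration, not the contents of `gpu/modal_app.py`: the `ensure_weights` helper, the file name, and the injected `download` callable are all hypothetical, and in the real app the cache directory would be the volume mounted at `/weights`.

```python
from pathlib import Path
from typing import Callable


def ensure_weights(cache_dir: Path, filename: str,
                   download: Callable[[Path], None]) -> Path:
    """Return the cached weights file, downloading it only if it is missing.

    `download` is a hypothetical callable that writes the weights to the
    path it is given; it stands in for whatever the model's own fetch does.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if target.exists() and target.stat().st_size > 0:
        return target  # cache hit: nothing is fetched on this run
    # Write to a temporary name first, so an interrupted download never
    # leaves a truncated file that later looks like a valid cache entry.
    partial = target.with_name(target.name + ".partial")
    download(partial)
    partial.replace(target)
    return target
```

Calling `ensure_weights(Path("/weights"), "model.ckpt", fetch)` twice would invoke `fetch` only on the first call, which is the behaviour the persistent volume is there to provide.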
+
+RunPod alternative (if you want an interactive box): start pod → `git clone` → run → `git push`
+results → **`runpodctl remove pod <id>`** (or Terminate in the UI; a pod that is only stopped keeps
+its disk and can still bill for storage). Set an **idle auto-terminate** so a forgotten box can't bleed money.
+
+## What runs on the GPU (in cost order — cheap validation first)
+
+- **Phase 1 — modality validation (~minutes, ~$1):** co-fold the 3 known binders into their
+  targets (voxelotor/Hb, mitapivat/PKR, vorinostat/HDAC2) and check it reproduces the crystal pose
+  (RMSD <2 Å) where Vina failed on metal/covalent/allosteric modes. If this passes, the modality is
+  real; if not, stop before spending on a screen.
+- **Phase 2 — screen (~tens of minutes, a few $):** run the ~300-drug set (or a focused subset)
+  against the sickle targets; rank by Boltz-2 predicted affinity; redo the §12.4 positive-control
+  recovery test. Output a ranked CSV, same shape as the connectivity `ranked_candidates`.
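
The Phase 1 pass criterion (ligand RMSD below 2 Å against the crystal pose) can be checked on the laptop once the predicted coordinates are back. The sketch below is a minimal, standard-library version under two assumptions that are not in the plan: the predicted and reference ligand atoms are listed in the same order, and both complexes are already in the same frame (for example after superposing the protein). It does no symmetry correction; a symmetry-aware tool such as the `spyrmsd` package listed in the image would be the safer choice for ligands with equivalent atoms. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ligand_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Heavy-atom RMSD in Å, computed in place (no superposition).

    Assumes pred[i] and ref[i] are the same atom and that both poses
    share one coordinate frame.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (p[axis] - r[axis]) ** 2
        for p, r in zip(pred, ref)
        for axis in range(3)
    )
    return math.sqrt(squared / len(pred))


def pose_reproduced(pred: Sequence[Coord], ref: Sequence[Coord],
                    cutoff: float = 2.0) -> bool:
    """True if the predicted pose is within `cutoff` Å of the reference."""
    return ligand_rmsd(pred, ref) < cutoff
```

Running this for each of the three known binders gives a simple pass/fail table to decide whether Phase 2 is worth paying for.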
+
+## Model choice
+
+- **Boltz-2** (MIT, pip-installable) — predicts the protein–ligand complex **and** a binding
+  affinity → directly gives a rankable score. Primary choice. Fits a 24–40 GB GPU for these
+  single-domain targets.
+- **DiffDock-L** — lighter, pose-only (needs a separate scorer); fallback if Boltz memory is tight.
+- GPU: an **L4 (24 GB, ~$0.6–0.8/hr)** or **A10/L40S (24–48 GB)** is plenty; no multi-GPU, no A100
+  needed for these sizes.
+
+## Cost controls (the save-money checklist)
+
+- Serverless (Modal) → **zero idle cost** by construction; or an **idle-timeout** auto-kill on a box.
+- **Cache weights** in a persistent volume — re-downloading 5 GB on a $1/hr GPU is wasted money.
+- **Validate on the 3 known binders before screening**: don't pay for a 300-drug screen until Phase 1 passes.
+- Prefer **spot/interruptible** for the batch screen (Phase 2 is restartable).
+- Keep prep (Meeko/RDKit) and result-scoring on the **laptop**; only the model forward pass needs GPU.
+- Estimated total to validate + a first screen: **~$5–15**, not a standing bill.
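
The items in this checklist are all instances of one formula: billed seconds times the hourly rate. The sketch below makes that explicit so the figures can be re-run with real numbers. The $1/hr rate and the 5 GB weights size are the ones used in the checklist; the 50 MB/s download speed is a hypothetical value chosen for illustration, and the helper names are not part of the repo.

```python
def gpu_cost(seconds: float, usd_per_hour: float) -> float:
    """Cost in USD of holding a GPU for `seconds` under per-second billing."""
    return seconds / 3600.0 * usd_per_hour


def download_seconds(size_gb: float, mb_per_s: float) -> float:
    """Time to fetch `size_gb` (decimal GB) at a sustained `mb_per_s`."""
    return size_gb * 1000.0 / mb_per_s


RATE = 1.0         # USD/hr, the "$1/hr GPU" from the checklist
WEIGHTS_GB = 5.0   # the 5 GB figure from the checklist
BANDWIDTH = 50.0   # MB/s, hypothetical download speed

# GPU time paid while waiting for an uncached weights download, per run.
per_run = gpu_cost(download_seconds(WEIGHTS_GB, BANDWIDTH), RATE)
# A box left running for a day with no work on it.
idle_day = gpu_cost(24 * 3600, RATE)

print(f"re-download per run:    ${per_run:.3f}")
print(f"forgotten box for 24 h: ${idle_day:.2f}")
```

Under these assumptions the per-run download cost is small and scales with the number of runs, while an idle box accrues the full hourly rate for as long as it exists.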
+
+## Repo integration
+
+- `gpu/modal_app.py` — the Modal app (skeleton committed alongside this plan).
+- Results land in `data/processed/binding/` (gitignored) + a small committed summary.
+- Pin model + weights version in the image for reproducibility.
+
+## Next step
+
+Finish `gpu/modal_app.py` for Phase-1 validation (3 known binders), do a dry run from the laptop
+with plain `modal run` (not `--detach`), confirm the cost, then move to Phase 2. Requires a Modal account,
+`pip install modal`, and `modal token new` (one-time auth).