GPU plan: ephemeral serverless co-folding (Modal) + app skeleton

docs/gpu_plan.md: cost-efficient plan for running AF3-class co-folding
(Boltz-2/DiffDock) on a GPU, then paying nothing when idle.
- Key insight: structure-track data is tiny (MB of PDBs/SMILES); only the
  GPU + model weights are heavy -> serverless is ideal.
- Recommend Modal (per-second billing, scales to zero = nothing to kill);
  RunPod as the SSH-box alternative with idle auto-terminate.
- Lifecycle: image -> weights Volume (cache, don't re-download) -> run ->
  git push small results -> teardown automatic.
- Phase 1 validate on 3 known binders (~$1) before paying for a screen;
  Boltz-2 (affinity) on an L4/A10 (24-48GB); est total ~$5-15.

gpu/modal_app.py: Modal app skeleton (image, weights volume, GPU cofold()
function, local entrypoint); boltz invocation stubbed with TODOs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 16:45:04 +02:00
parent 7c6cef1aef
commit 08ed713cc8
2 changed files with 153 additions and 0 deletions

docs/gpu_plan.md Normal file

@@ -0,0 +1,86 @@
# GPU plan — ephemeral AF3-class co-folding for the binding track
Goal: run AF3-class co-folding (Boltz-2 / DiffDock) on a GPU for the structure-binding track
(§12), then **pay nothing when idle**. The work is *bursty* (a validation run, then a screen),
the inputs are *tiny*, so the design optimises for zero idle cost, not for a persistent box.
## What actually has to move (small!)
| Thing | Size | How it gets to the GPU |
|---|---|---|
| Code | KB | `git clone` the gitea repo (or mount) |
| Target structures (PDBs) | a few MB | in the repo / git |
| Ligands (SMILES, drug set) | KB | in the repo / git |
| **Model weights** (Boltz-2 / DiffDock) | **~2-6 GB** | downloaded once, **cached in a persistent volume** |
| Results (poses, scores, RMSDs) | KB-MB | `git push` back / download |
The 27 GB LINCS data is **not** part of this track — nothing big to upload. The only thing worth
persisting is the model-weights cache (so we don't re-download = re-pay GPU time every run).
## Provider choice
| Option | Billing | Idle cost | "Kill" model | Best for |
|---|---|---|---|---|
| **Modal** (recommended) | per-second | **$0 (scales to zero)** | automatic — nothing to remember | bursty batch runs |
| RunPod | per-minute, on-demand/spot | only while pod exists | manual `terminate` | interactive SSH box |
| Vast.ai | per-minute spot | only while rented | manual destroy | cheapest, more fiddly |
| Lambda / AWS-GCP spot | per-hour/second | until you stop it | manual stop | if you already have credits |
**Recommend Modal.** You define the run as a function with a `gpu=` decorator; on call it
provisions the GPU, runs, and releases it — **there is no GPU to forget to kill**, and you pay only
for the seconds it ran. That *is* the "kill the GPU to save money" requirement, automated.
## The lifecycle (Modal)
1. **Define image** once: CUDA base + `boltz` (or DiffDock) + `rdkit`, `meeko`, `spyrmsd`, `gemmi`.
2. **Weights volume**: `modal.Volume` mounted at `/weights`; the model downloads into it on first
run and is cached forever after (no re-download cost).
3. **Run**: `modal run gpu/modal_app.py` → provisions GPU → runs the test → returns results to your
laptop. GPU released the moment the function returns.
4. **Persist results**: write the returned scores/poses into `data/processed/binding/` and
`git commit` (small, text).
5. **Teardown**: nothing to do — Modal scaled to zero. `modal app stop` only if a run is hung.
RunPod alternative (if you want an interactive box): start pod → `git clone` → run → `git push`
results → **`runpodctl remove pod <id>`** (or Stop in the UI). Set an **idle auto-terminate** so a
forgotten box can't bleed money.
## What runs on the GPU (in cost order — cheap validation first)
- **Phase 1 — modality validation (~minutes, ~$1):** co-fold the 3 known binders into their
targets (voxelotor/Hb, mitapivat/PKR, vorinostat/HDAC2) and check it reproduces the crystal pose
(RMSD <2 Å) where Vina failed on metal/covalent/allosteric modes. If this passes, the modality is
real; if not, stop before spending on a screen.
- **Phase 2, screen (~tens of minutes, a few $):** run the ~300-drug set (or a focused subset)
against the sickle targets; rank by Boltz-2 predicted affinity; redo the §12.4 positive-control
recovery test. Output a ranked CSV, same shape as the connectivity `ranked_candidates`.
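
The Phase 1 pass/fail criterion can be sketched as a plain in-place RMSD check. This is a minimal illustration, not the scoring the repo uses. It assumes the predicted ligand has already been superposed into the crystal frame (via the protein) and that atoms appear in the same order in both poses. It also ignores ligand symmetry, which `spyrmsd` corrects for. The names `in_place_rmsd` and `passes_phase1` are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def in_place_rmsd(pred: list[Coord], ref: list[Coord]) -> float:
    """RMSD in Å between two poses, with no re-alignment and no symmetry handling."""
    if not pred or len(pred) != len(ref):
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum((p[i] - r[i]) ** 2 for p, r in zip(pred, ref) for i in range(3))
    return math.sqrt(squared / len(pred))


def passes_phase1(rmsd: float, threshold: float = 2.0) -> bool:
    """Phase 1 criterion from the plan: the crystal pose is reproduced if RMSD < 2 Å."""
    return rmsd < threshold


# Toy coordinates (not a real ligand): the prediction is a rigid 1 Å shift along x.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(x + 1.0, y, z) for x, y, z in crystal]
rmsd = in_place_rmsd(predicted, crystal)
print(f"RMSD = {rmsd:.2f} Å, pass = {passes_phase1(rmsd)}")
```

For real ligands with symmetric groups, a symmetry-aware RMSD is the safer measure; the naive version can overstate the error.
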
## Model choice
- **Boltz-2** (MIT, pip-installable) predicts the protein-ligand complex **and** a binding
affinity directly, which gives a rankable score. Primary choice. Fits a 24-40 GB GPU for these
single-domain targets.
- **DiffDock-L**: lighter, pose-only (needs a separate scorer); fallback if Boltz memory is tight.
- GPU: an **L4 (24 GB, ~$0.6-0.8/hr)** or **A10/L40S (24-48 GB)** is plenty; no multi-GPU, no A100
needed for these sizes.
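
Because Boltz-2 returns an affinity alongside the pose, each complex needs an input spec that names the ligand as the binder. Below is a minimal sketch of generating one. It assumes the YAML layout described in the Boltz documentation (`version`, `sequences`, and an `affinity` entry under `properties`), so verify the field names against the Boltz version pinned in the image. The helper name `boltz_affinity_yaml` and the chain ids `A`/`B` are hypothetical.

```python
def boltz_affinity_yaml(
    protein_seq: str,
    ligand_smiles: str,
    protein_id: str = "A",
    ligand_id: str = "B",
) -> str:
    """Build a YAML input spec for one protein + one ligand, requesting affinity.

    Assumed schema (check against the pinned Boltz version before relying on it).
    """
    # Single-quoted YAML scalars keep SMILES characters such as '#', '[' and '\'
    # literal; a literal single quote has to be doubled.
    smiles = ligand_smiles.replace("'", "''")
    return (
        "version: 1\n"
        "sequences:\n"
        "  - protein:\n"
        f"      id: {protein_id}\n"
        f"      sequence: {protein_seq}\n"
        "  - ligand:\n"
        f"      id: {ligand_id}\n"
        f"      smiles: '{smiles}'\n"
        "properties:\n"
        "  - affinity:\n"
        f"      binder: {ligand_id}\n"
    )


# Toy inputs for illustration only (not one of the real targets or drugs).
spec = boltz_affinity_yaml("MKTAYIAKQR", "CC#N")
print(spec)
```

Writing the spec by hand keeps the image free of an extra YAML dependency; a YAML library would work equally well if one is already installed.
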
## Cost controls (the save-money checklist)
- Serverless (Modal): **zero idle cost** by construction; or an **idle-timeout** auto-kill on a box.
- **Cache weights** in a persistent volume: re-downloading 5 GB on a $1/hr GPU is wasted money.
- **Validate on one target before screening**: don't pay for a 300-drug screen until Phase 1 passes.
- Prefer **spot/interruptible** for the batch screen (Phase 2 is restartable).
- Keep prep (Meeko/RDKit) and result-scoring on the **laptop**; only the model forward pass needs GPU.
- Estimated total to validate + a first screen: **~$5-15**, not a standing bill.
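
The cost estimate reduces to GPU-seconds times the hourly rate. Here is a rough sketch of that arithmetic. The 60 s per complex figure is a placeholder assumption (replace it with the timing measured in Phase 1), the $0.80/hr rate is taken as the upper end of the L4 range, and the function names are hypothetical. Image build and first-time weight download are not counted.

```python
def gpu_cost_usd(gpu_seconds: float, usd_per_hour: float) -> float:
    """Cost of per-second billing: seconds used, converted to hours, times the rate."""
    return gpu_seconds / 3600.0 * usd_per_hour


def screen_cost_usd(n_complexes: int, seconds_per_complex: float, usd_per_hour: float) -> float:
    """Total GPU cost of a screen. Fan-out shortens wall-clock time, not GPU-seconds."""
    return gpu_cost_usd(n_complexes * seconds_per_complex, usd_per_hour)


# Placeholder timing: 60 s per complex is an assumption, not a measurement.
estimate = screen_cost_usd(n_complexes=300, seconds_per_complex=60.0, usd_per_hour=0.80)
print(f"Estimated screen cost: ${estimate:.2f}")
```

Running this with the Phase 1 timings before launching Phase 2 gives a concrete number to check against the budget.
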
## Repo integration
- `gpu/modal_app.py`: the Modal app (skeleton committed alongside this plan).
- Results land in `data/processed/binding/` (gitignored) + a small committed summary.
- Pin model + weights version in the image for reproducibility.
## Next step
Scaffold `gpu/modal_app.py` for Phase-1 validation (3 known binders), do a dry run from the laptop
with plain `modal run` (not `modal run --detach`), confirm cost, then Phase 2. Requires a Modal
account + `pip install modal` + `modal token new` (one-time auth).

gpu/modal_app.py Normal file

@@ -0,0 +1,67 @@
"""Ephemeral GPU runner for AF3-class co-folding (PLAN §12, docs/gpu_plan.md).

Serverless: `modal run gpu/modal_app.py` provisions a GPU, runs, and releases it: zero idle cost,
nothing to remember to kill. Model weights are cached in a persistent Volume so we never re-pay GPU
time to re-download them. Prep (Meeko/RDKit) and RMSD scoring (spyrmsd) stay light; only the model
forward pass needs the GPU.

Setup (one-time): `pip install modal && modal token new`.
Run Phase 1 (validate on 3 known binders): `modal run gpu/modal_app.py`.

STATUS: scaffold. The boltz invocation (input spec + output parsing) is stubbed where marked TODO;
wire it after a first `modal run` confirms the image builds and the GPU is reachable.
"""

from __future__ import annotations

import modal

app = modal.App("reverso-binding")

# GPU image: Debian slim + AF3-class model (Boltz-2) + light prep/scoring deps.
image = (
    modal.Image.debian_slim(python_version="3.12")
    .apt_install("git", "wget")
    .pip_install("boltz", "rdkit", "meeko", "spyrmsd", "gemmi", "numpy")
)

# Persist model weights across runs so we download them once, not every GPU-billed run.
weights = modal.Volume.from_name("reverso-binding-weights", create_if_missing=True)

# Known binders -> (PDB id, crystal ligand resname); ligand SMILES are supplied by the caller.
# Phase 1 validation: does co-folding reproduce these crystal poses where Vina failed?
KNOWN = {
    "voxelotor_Hb": ("5E83", "5L7"),
    "mitapivat_PKR": ("8XFD", "WV2"),
    "vorinostat_HDAC2": ("4LXZ", "SHH"),
}


@app.function(gpu="L4", image=image, volumes={"/weights": weights}, timeout=3600)
def cofold(protein_seq: str, ligand_smiles: str, weights_dir: str = "/weights") -> dict:
    """Co-fold one protein+ligand complex and return predicted affinity + pose (PDB string).

    Runs on the GPU only for this call, then the GPU is released. TODO: replace the stub with the
    actual Boltz-2 invocation (write the YAML/FASTA input spec, call `boltz predict
    --use_msa_server --out_dir ... --cache /weights`, parse the predicted structure + affinity).
    """
    import subprocess  # noqa: F401 (used once boltz is wired)

    # TODO: build boltz input (protein_seq + ligand_smiles), run, parse pose+affinity.
    raise NotImplementedError("Wire Boltz-2 here; see docs/gpu_plan.md Phase 1.")


@app.local_entrypoint()
def main() -> None:
    """Phase 1 driver (runs locally; only cofold() touches the GPU).

    Pulls target sequences + ligand SMILES from the repo, fans out one GPU call per known binder,
    scores redocking RMSD vs the crystal pose locally (spyrmsd), and prints pass/fail. Results are
    tiny, so commit a summary into data/processed/binding/.
    """
    # TODO: load protein sequences from data/raw/structures/<pdb>.pdb (gemmi) and ligand SMILES
    # (PubChem / drug_set), then:
    #   results = list(cofold.map(seqs, smiles))
    # and compute in-place spyrmsd RMSD vs the crystal ligand for each.
    print("Scaffold: fill in sequence/SMILES loading + cofold.map, then score RMSD. "
          "See docs/gpu_plan.md.")