Reverso/docs/gpu_plan.md
Junior B. 08ed713cc8 GPU plan: ephemeral serverless co-folding (Modal) + app skeleton
docs/gpu_plan.md: cost-efficient plan for running AF3-class co-folding
(Boltz-2/DiffDock) on a GPU then paying nothing when idle.
- Key insight: structure-track data is tiny (MB of PDBs/SMILES); only the
  GPU + model weights are heavy -> serverless is ideal.
- Recommend Modal (per-second billing, scales to zero = nothing to kill);
  RunPod as the SSH-box alternative with idle auto-terminate.
- Lifecycle: image -> weights Volume (cache, don't re-download) -> run ->
  git push small results -> teardown automatic.
- Phase 1 validate on 3 known binders (~$1) before paying for a screen;
  Boltz-2 (affinity) on an L4/A10 (24-48GB); est total ~$5-15.

gpu/modal_app.py: Modal app skeleton (image, weights volume, GPU cofold()
function, local entrypoint); boltz invocation stubbed with TODOs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 16:45:04 +02:00


# GPU plan — ephemeral AF3-class co-folding for the binding track
Goal: run AF3-class co-folding (Boltz-2 / DiffDock) on a GPU for the structure-binding track
(§12), then **pay nothing when idle**. The work is *bursty* (a validation run, then a screen) and
the inputs are *tiny*, so the design optimises for zero idle cost, not for a persistent box.
## What actually has to move (small!)
| Thing | Size | How it gets to the GPU |
|---|---|---|
| Code | KB | `git clone` the Gitea repo (or mount) |
| Target structures (PDBs) | a few MB | in the repo / git |
| Ligands (SMILES, drug set) | KB | in the repo / git |
| **Model weights** (Boltz-2 / DiffDock) | **~2-6 GB** | downloaded once, **cached in a persistent volume** |
| Results (poses, scores, RMSDs) | KB-MB | `git push` back / download |

The 27 GB LINCS data is **not** part of this track, so there is nothing big to upload. The only thing worth
persisting is the model-weights cache (so we don't re-download = re-pay GPU time every run).
## Provider choice
| Option | Billing | Idle cost | "Kill" model | Best for |
|---|---|---|---|---|
| **Modal** (recommended) | per-second | **$0 (scales to zero)** | automatic — nothing to remember | bursty batch runs |
| RunPod | per-minute, on-demand/spot | only while pod exists | manual `terminate` | interactive SSH box |
| Vast.ai | per-minute spot | only while rented | manual destroy | cheapest, more fiddly |
| Lambda / AWS-GCP spot | per-hour/second | until you stop it | manual stop | if you already have credits |

**Recommend Modal.** You define the run as a function whose decorator takes a `gpu=` argument; on call it
provisions the GPU, runs, and releases it, so **there is no GPU to forget to kill** and you pay only
for the seconds it ran. That *is* the "kill the GPU to save money" requirement, automated.
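
To make the billing column concrete, here is a minimal sketch of how billing granularity changes the cost of one short run, and what a forgotten box costs. The $0.80/hr rate is an assumed L4-class price; the 450-second run and the 12-hour idle period are made-up illustrative numbers, not measurements.

```python
import math


def billed_cost(run_seconds: float, rate_per_hour: float, granularity_s: int) -> float:
    """Cost of one run when billed time is rounded up to `granularity_s` seconds."""
    billed_seconds = math.ceil(run_seconds / granularity_s) * granularity_s
    return billed_seconds * rate_per_hour / 3600


def idle_cost(idle_hours: float, rate_per_hour: float) -> float:
    """Cost of a box that is left running while doing no work."""
    return idle_hours * rate_per_hour


RATE = 0.80        # $/hr, assumed L4-class rate
RUN_SECONDS = 450  # hypothetical 7.5-minute validation run

for label, granularity in [("per-second", 1), ("per-minute", 60), ("per-hour", 3600)]:
    print(f"{label:>10}: ${billed_cost(RUN_SECONDS, RATE, granularity):.3f}")

# A persistent box forgotten for 12 h (hypothetical) costs far more than the run itself.
print(f" idle 12 h: ${idle_cost(12, RATE):.2f}")
```

For a single short run the granularity matters less than the idle term, which is why the table treats "idle cost" as the deciding column.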
## The lifecycle (Modal)
1. **Define image** once: CUDA base + `boltz` (or DiffDock) + `rdkit`, `meeko`, `spyrmsd`, `gemmi`.
2. **Weights volume**: `modal.Volume` mounted at `/weights`; the model downloads into it on first
run and is cached forever after (no re-download cost).
3. **Run**: `modal run gpu/modal_app.py` → provisions GPU → runs the test → returns results to your
laptop. GPU released the moment the function returns.
4. **Persist results**: write the returned scores/poses into `data/processed/binding/` and
`git commit` (small, text).
5. **Teardown**: nothing to do, because Modal has already scaled to zero. Use `modal app stop` only if a run is hung.

RunPod alternative (if you want an interactive box): start pod → `git clone` → run → `git push`
results → **`runpodctl remove pod <id>`** (or Stop in the UI). Set an **idle auto-terminate** so a
forgotten box can't bleed money.
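
For the RunPod route, the idle auto-terminate rule can be as simple as "terminate after N consecutive low-utilisation samples". The sketch below shows only that decision logic; the threshold, the sample count, and the `IdlePolicy` and `should_terminate` names are hypothetical, and how utilisation is sampled and how the terminate is triggered is left to the box.

```python
from dataclasses import dataclass


@dataclass
class IdlePolicy:
    util_threshold_pct: float = 5.0  # below this the GPU counts as idle (assumed value)
    max_idle_samples: int = 10       # consecutive idle samples before terminating (assumed value)


def should_terminate(gpu_util_history: list[float], policy: IdlePolicy) -> bool:
    """True when the most recent `max_idle_samples` readings are all below the threshold."""
    n = policy.max_idle_samples
    if len(gpu_util_history) < n:
        return False  # not enough evidence yet
    return all(u < policy.util_threshold_pct for u in gpu_util_history[-n:])


# Toy utilisation trace in percent: busy at first, then idle.
trace = [92.0, 88.0, 3.0, 0.0, 1.0]
print(should_terminate(trace, IdlePolicy(max_idle_samples=3)))
```

On the pod, a loop could sample utilisation once a minute (for example with `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`) and call `runpodctl remove pod <id>` once `should_terminate` returns true.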
## What runs on the GPU (in cost order — cheap validation first)
- **Phase 1: modality validation (~minutes, ~$1).** Co-fold the 3 known binders into their
targets (voxelotor/Hb, mitapivat/PKR, vorinostat/HDAC2) and check that each prediction reproduces
the crystal pose (RMSD < 2 Å) in the metal/covalent/allosteric binding modes where Vina failed. If
this passes, the modality is real; if not, stop before spending on a screen.
- **Phase 2: screen (~tens of minutes, a few $).** Run the ~300-drug set (or a focused subset)
against the sickle targets; rank by Boltz-2 predicted affinity; redo the §12.4 positive-control
recovery test. Output a ranked CSV, same shape as the connectivity `ranked_candidates`.
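
The Phase 1 pass/fail check reduces to one RMSD per complex against the crystal pose. A minimal sketch follows, assuming the predicted and reference ligand atoms are already matched one-to-one and expressed in the same frame (for example after superposing the protein). It does no symmetry correction, which is what `spyrmsd` is for, so treat its value as an upper bound on the symmetry-corrected RMSD. The coordinates are toy values, not real poses, and the function names are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def ligand_rmsd(pred: list[Coord], ref: list[Coord]) -> float:
    """Plain RMSD over matched atoms; no alignment, no symmetry correction."""
    if not pred or len(pred) != len(ref):
        raise ValueError("need two non-empty, equally long atom lists")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))


def phase1_passes(rmsd_by_complex: dict[str, float], cutoff: float = 2.0) -> bool:
    """Phase 1 passes only if every known binder is reproduced below the cutoff (in Å)."""
    return all(value < cutoff for value in rmsd_by_complex.values())


# Toy example: a three-atom "ligand" predicted 1 Å off along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(x + 1.0, y, z) for x, y, z in reference]

rmsds = {"toy_complex": ligand_rmsd(predicted, reference)}
print(rmsds, "pass" if phase1_passes(rmsds) else "fail")
```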
## Model choice
- **Boltz-2** (MIT, pip-installable) predicts the protein-ligand complex **and** a binding
affinity directly, which gives a rankable score. Primary choice. Fits a 24-40 GB GPU for these
single-domain targets.
- **DiffDock-L**: lighter, pose-only (needs a separate scorer); fallback if Boltz memory is tight.
- GPU: an **L4 (24 GB, ~$0.6-0.8/hr)** or **A10/L40S (24-48 GB)** is plenty; no multi-GPU, no A100
needed for these sizes.
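
For the Boltz-2 route, each protein-ligand pair becomes one small YAML input. The sketch below builds that text with the standard library only. The schema (a `sequences` list with `protein` and `ligand` entries, plus an `affinity` property naming the binder chain) is an assumption based on the Boltz input documentation and should be checked against whichever Boltz version gets pinned. The sequence and SMILES are placeholders, and `boltz_input_yaml` is a hypothetical helper name.

```python
import json


def boltz_input_yaml(protein_seq: str, ligand_smiles: str,
                     protein_id: str = "A", ligand_id: str = "B") -> str:
    """Return a Boltz-style YAML document for one protein-ligand co-folding job."""
    # json.dumps yields a double-quoted, escaped string, which YAML also accepts;
    # this keeps SMILES characters such as '#', '[' or a backslash from being misread.
    smiles = json.dumps(ligand_smiles)
    return "\n".join([
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: {protein_id}",
        f"      sequence: {protein_seq}",
        "  - ligand:",
        f"      id: {ligand_id}",
        f"      smiles: {smiles}",
        "properties:",
        "  - affinity:",          # assumed key for requesting an affinity prediction
        f"      binder: {ligand_id}",
        "",
    ])


# Placeholder inputs, not one of the real targets or drugs.
print(boltz_input_yaml("MKTAYIAKQR", "CCO"))
```

The resulting file would then be passed to `boltz predict`; where the MSAs come from (a server or precomputed files) is a separate choice that this sketch does not cover.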
## Cost controls (the save-money checklist)
- Serverless (Modal): **zero idle cost** by construction; or an **idle-timeout** auto-kill on a box.
- **Cache weights** in a persistent volume: re-downloading 5 GB on a $1/hr GPU is wasted money.
- **Validate on the 3 known binders before screening**: don't pay for a 300-drug screen until Phase 1 passes.
- Prefer **spot/interruptible** for the batch screen (Phase 2 is restartable).
- Keep prep (Meeko/RDKit) and result-scoring on the **laptop**; only the model forward pass needs GPU.
- Estimated total to validate + a first screen: **~$5-15**, not a standing bill.
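
One way to apply the "validate first, then pay for the screen" rule is to project the Phase 2 bill from the Phase 1 timing. This is a rough sketch that assumes cost scales linearly with the number of complexes and ignores cold-start and download time. The timings, the rate, and the budget cap are illustrative assumptions, and both function names are hypothetical.

```python
def projected_screen_cost(phase1_seconds: float, phase1_complexes: int,
                          screen_complexes: int, rate_per_hour: float) -> float:
    """Extrapolate the GPU cost of a screen from the measured per-complex time."""
    seconds_per_complex = phase1_seconds / phase1_complexes
    return seconds_per_complex * screen_complexes * rate_per_hour / 3600


def within_budget(cost: float, budget: float) -> bool:
    """Simple go/no-go gate before launching the screen."""
    return cost <= budget


# Hypothetical numbers: Phase 1 took 180 GPU-seconds for 3 complexes on a $0.80/hr GPU.
cost = projected_screen_cost(180, 3, screen_complexes=300, rate_per_hour=0.80)
print(f"projected screen cost: ${cost:.2f}, go ahead: {within_budget(cost, budget=15.0)}")
```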
## Repo integration
- `gpu/modal_app.py`: the Modal app (skeleton committed alongside this plan).
- Results land in `data/processed/binding/` (gitignored) + a small committed summary.
- Pin model + weights version in the image for reproducibility.
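
Persisting a run can be as small as one ranked CSV plus a recovery number for the committed summary. A minimal sketch under stated assumptions: the column names, the `ranked_candidates.csv` file name, and the helper names are hypothetical, and `higher_is_better` must be set to match how the chosen affinity output is defined. The demo writes to a temporary directory rather than the real `data/processed/binding/`.

```python
import csv
import tempfile
from pathlib import Path


def write_ranked(rows: list[dict], out_dir: Path, higher_is_better: bool = True) -> Path:
    """Sort rows by score and write them with a 1-based rank column."""
    ranked = sorted(rows, key=lambda r: r["score"], reverse=higher_is_better)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "ranked_candidates.csv"
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["rank", "drug", "target", "score"])
        writer.writeheader()
        for rank, row in enumerate(ranked, start=1):
            writer.writerow({"rank": rank, **row})
    return path


def recovery_at_k(path: Path, positives: set[str], k: int) -> float:
    """Fraction of positive-control drugs that appear in the top k rows."""
    with path.open(newline="") as fh:
        top = {row["drug"] for row in csv.DictReader(fh) if int(row["rank"]) <= k}
    return len(top & positives) / len(positives)


# Toy scores, not real predictions.
rows = [
    {"drug": "drug_a", "target": "target_1", "score": 0.2},
    {"drug": "drug_b", "target": "target_1", "score": 0.9},
    {"drug": "drug_c", "target": "target_1", "score": 0.5},
]
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_ranked(rows, Path(tmp) / "binding")
    print(csv_path.read_text())
    print("recovery@2:", recovery_at_k(csv_path, {"drug_a", "drug_b"}, k=2))
```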
## Next step
Fill in the `gpu/modal_app.py` skeleton for Phase-1 validation (3 known binders), do a dry run from
the laptop with plain `modal run` (not `modal run --detach`), confirm cost, then move on to Phase 2.
Requires a Modal account + `pip install modal` + `modal token new` (one-time auth).