
Junior B. 08ed713cc8 GPU plan: ephemeral serverless co-folding (Modal) + app skeleton

docs/gpu_plan.md: cost-efficient plan for running AF3-class co-folding
(Boltz-2/DiffDock) on a GPU then paying nothing when idle.
- Key insight: structure-track data is tiny (MB of PDBs/SMILES); only the
  GPU + model weights are heavy -> serverless is ideal.
- Recommend Modal (per-second billing, scales to zero = nothing to kill);
  RunPod as the SSH-box alternative with idle auto-terminate.
- Lifecycle: image -> weights Volume (cache, don't re-download) -> run ->
  git push small results -> teardown automatic.
- Phase 1 validate on 3 known binders (~$1) before paying for a screen;
  Boltz-2 (affinity) on an L4/A10 (24-48GB); est total ~$5-15.

gpu/modal_app.py: Modal app skeleton (image, weights volume, GPU cofold()
function, local entrypoint); boltz invocation stubbed with TODOs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-24 16:45:04 +02:00


GPU plan — ephemeral AF3-class co-folding for the binding track

Goal: run AF3-class co-folding (Boltz-2 / DiffDock) on a GPU for the structure-binding track (§12), then pay nothing when idle. The work is bursty (a validation run, then a screen) and the inputs are tiny, so the design optimises for zero idle cost rather than for a persistent box.

What actually has to move (small!)

| Thing | Size | How it gets to the GPU |
|---|---|---|
| Code | KB | `git clone` the gitea repo (or mount) |
| Target structures (PDBs) | a few MB | in the repo / git |
| Ligands (SMILES, drug set) | KB | in the repo / git |
| Model weights (Boltz-2 / DiffDock) | ~2–6 GB | downloaded once, cached in a persistent volume |
| Results (poses, scores, RMSDs) | KB–MB | `git push` back / download |

The 27 GB LINCS data is not part of this track, so there is nothing big to upload. The only thing worth persisting is the model-weights cache, so that a run does not re-download the weights and pay GPU time for the wait.

Provider choice

| Option | Billing | Idle cost | "Kill" model | Best for |
|---|---|---|---|---|
| Modal (recommended) | per-second | $0 (scales to zero) | automatic (nothing to remember) | bursty batch runs |
| RunPod | per-minute, on-demand/spot | only while pod exists | manual `terminate` | interactive SSH box |
| Vast.ai | per-minute spot | only while rented | manual destroy | cheapest, more fiddly |
| Lambda / AWS-GCP spot | per-hour/second | until you stop it | manual stop | if you already have credits |

Recommend Modal. You define the run as a function whose decorator takes a gpu= argument; on call, Modal provisions the GPU, runs the function, and releases it. There is no GPU to forget to kill, and you pay only for the seconds it ran. That is the "kill the GPU to save money" requirement, automated.

Define image once: CUDA base + boltz (or DiffDock) + rdkit, meeko, spyrmsd, gemmi.
Weights volume: modal.Volume mounted at /weights; the model downloads into it on first run and is cached forever after (no re-download cost).
Run: modal run gpu/modal_app.py → provisions GPU → runs the test → returns results to your laptop. GPU released the moment the function returns.
Persist results: write the returned scores/poses into data/processed/binding/ and git commit a small text summary.
Teardown: nothing to do — Modal scaled to zero. modal app stop only if a run is hung.
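
The Run step eventually has to shell out to Boltz inside the GPU function, which is the part still stubbed in the skeleton. A minimal sketch of that call follows; it is not the committed gpu/modal_app.py. The helper names are hypothetical, and the `--out_dir`, `--cache` and `--use_msa_server` flags are taken from the Boltz CLI as I understand it, so check them against the pinned Boltz version before relying on them.

```python
import subprocess
from pathlib import Path


def boltz_predict_argv(input_yaml, out_dir, cache_dir="/weights", use_msa_server=True):
    """Build the argv for one `boltz predict` call without executing it."""
    argv = [
        "boltz", "predict", str(input_yaml),
        "--out_dir", str(out_dir),
        # Assumption: pointing the cache at the mounted volume makes the
        # first run download the weights there and later runs reuse them.
        "--cache", str(cache_dir),
    ]
    if use_msa_server:
        # Assumption: MSAs come from the hosted server rather than local files.
        argv.append("--use_msa_server")
    return argv


def run_boltz(input_yaml, out_dir, cache_dir="/weights"):
    """Run Boltz and raise on a non-zero exit code."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(boltz_predict_argv(input_yaml, out_dir, cache_dir), check=True)
    return Path(out_dir)
```

Keeping the argv builder separate from the subprocess call means the command can be inspected on the laptop without a GPU or a Boltz install.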

RunPod alternative (if you want an interactive box): start pod → git clone → run → git push results → runpodctl remove pod <id> (or Terminate in the UI; a pod that is only stopped still exists and can keep billing for storage). Set an idle auto-terminate so a forgotten box can't bleed money.
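
One way an idle auto-terminate could look on a RunPod box is a small watchdog that polls GPU utilisation and removes the pod after a quiet period. This is a sketch under assumptions, not a tested setup: the thresholds are arbitrary, the `RUNPOD_POD_ID` environment variable and a configured `runpodctl` inside the pod are assumed, and all function names are hypothetical.

```python
import os
import subprocess
import time


def idle_too_long(utilisation_samples, busy_threshold=5, idle_samples_needed=30):
    """True if the most recent `idle_samples_needed` readings are all below the threshold."""
    recent = utilisation_samples[-idle_samples_needed:]
    return len(recent) == idle_samples_needed and all(u < busy_threshold for u in recent)


def read_gpu_utilisation():
    """Highest GPU utilisation (percent) across devices, as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return max(int(line) for line in out.split())


def watchdog(poll_seconds=60):
    """Poll until 30 consecutive readings are idle, then remove this pod."""
    samples = []
    while not idle_too_long(samples):
        samples.append(read_gpu_utilisation())
        time.sleep(poll_seconds)
    pod_id = os.environ["RUNPOD_POD_ID"]  # assumption: set by RunPod inside the pod
    subprocess.run(["runpodctl", "remove", "pod", pod_id], check=True)
```

With the defaults this tolerates about 30 minutes of inactivity; a single busy reading inside the window resets the countdown.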

What runs on the GPU (in cost order — cheap validation first)

Phase 1 — modality validation (~minutes, ~$1): co-fold the 3 known binders into their targets (voxelotor/Hb, mitapivat/PKR, vorinostat/HDAC2) and check it reproduces the crystal pose (RMSD <2 Å) where Vina failed on metal/covalent/allosteric modes. If this passes, the modality is real; if not, stop before spending on a screen.
Phase 2 — screen (~tens of minutes, a few $): run the ~300-drug set (or a focused subset) against the sickle targets; rank by Boltz-2 predicted affinity; redo the §12.4 positive-control recovery test. Output a ranked CSV, same shape as the connectivity ranked_candidates.
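
The Phase 1 gate can be sketched as a plain RMSD check against the crystal pose. The code assumes the predicted complex has already been superposed on the crystal structure via the protein and that ligand atoms are listed in the same order in both; it ignores molecular symmetry, which a symmetry-aware tool such as spyrmsd would handle. Requiring every binder to pass is also an assumption about how "passes" is meant, and both function names are hypothetical.

```python
import math


def rmsd(pred, ref):
    """Plain RMSD in Å between two equally ordered lists of (x, y, z) coordinates."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))


def phase1_passes(rmsd_by_complex, cutoff=2.0):
    """Gate for Phase 2: every known binder must land under the RMSD cutoff."""
    return all(value < cutoff for value in rmsd_by_complex.values())
```

Feeding `phase1_passes` a mapping from complex name to RMSD gives a single go/no-go answer before any money is spent on the screen.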

Model choice

Boltz-2 (MIT, pip-installable) — predicts the protein–ligand complex and a binding affinity → directly gives a rankable score. Primary choice. Fits a 24–40 GB GPU for these single-domain targets.
DiffDock-L — lighter, pose-only (needs a separate scorer); fallback if Boltz memory is tight.
GPU: an L4 (24 GB, ~$0.6–0.8/hr) or A10/L40S (24–48 GB) is plenty; no multi-GPU, no A100 needed for these sizes.
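
Since Boltz-2 is the primary choice, each protein-ligand pair needs an input file that also requests an affinity prediction. The sketch below renders one with the standard library only. The YAML layout (`version`, `sequences`, `properties` with an `affinity` entry naming the binder chain) follows the Boltz input schema as I understand it and should be checked against the pinned version; the function name and chain IDs are hypothetical, and a multi-chain target would need additional protein entries.

```python
def boltz_input_yaml(protein_sequence, ligand_smiles, protein_id="A", ligand_id="B"):
    """Render a one-protein, one-ligand Boltz input with an affinity request."""
    # Single-quoted YAML scalars keep backslashes in SMILES literal;
    # a single quote, if one ever appeared, is escaped by doubling it.
    smiles = ligand_smiles.replace("'", "''")
    return "\n".join([
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: {protein_id}",
        f"      sequence: {protein_sequence}",
        "  - ligand:",
        f"      id: {ligand_id}",
        f"      smiles: '{smiles}'",
        "properties:",
        "  - affinity:",
        f"      binder: {ligand_id}",
        "",
    ])
```

Generating these files on the laptop keeps the prep step off the GPU, in line with the cost checklist.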

Cost controls (the save-money checklist)

Serverless (Modal) → zero idle cost by construction; or an idle-timeout auto-kill on a box.
Cache weights in a persistent volume — re-downloading 5 GB on a $1/hr GPU is wasted money.
Validate on the three known binders before screening: don't pay for a 300-drug screen until Phase 1 passes.
Prefer spot/interruptible for the batch screen (Phase 2 is restartable).
Keep prep (Meeko/RDKit) and result-scoring on the laptop; only the model forward pass needs GPU.
Estimated total to validate + a first screen: ~$5–15, not a standing bill.
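
The arithmetic behind the checklist is simple enough to sketch. The $0.80/hr rate is the upper end of the L4 estimate above; the 20-minute run length and the 24-hour forgotten box are made-up illustrations, not measurements.

```python
def run_cost_usd(seconds, usd_per_hour):
    """Cost of GPU time billed per second at a given hourly rate."""
    return seconds * usd_per_hour / 3600


# Hypothetical 20-minute validation run on an L4 at $0.80/hr.
validation_run = run_cost_usd(20 * 60, 0.80)

# The same GPU left running for a day on a box nobody terminated.
forgotten_box = run_cost_usd(24 * 3600, 0.80)

print(f"validation run: ${validation_run:.2f}")  # validation run: $0.27
print(f"forgotten box:  ${forgotten_box:.2f}")   # forgotten box:  $19.20
```

Under these assumptions a single forgotten day costs more than the whole estimated budget, which is why zero idle cost comes first in the list.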

Repo integration

gpu/modal_app.py — the Modal app (skeleton committed alongside this plan).
Results land in data/processed/binding/ (gitignored) + a small committed summary.
Pin model + weights version in the image for reproducibility.
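
A minimal sketch of how the small committed summary could be produced from the returned scores. The column names, the `predicted_affinity` key and the "lower is better" default are assumptions for illustration; the real columns should mirror the existing ranked_candidates file, and the sort direction should match how the pinned Boltz-2 version reports affinity.

```python
import csv
from pathlib import Path


def write_ranked_csv(rows, out_path, score_key="predicted_affinity", lower_is_better=True):
    """Sort result dicts by score and write them with a 1-based rank column."""
    ranked = sorted(rows, key=lambda r: r[score_key], reverse=not lower_is_better)
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    fields = ["rank", "drug", "target", score_key]
    with out_path.open("w", newline="") as fh:
        # extrasaction="ignore" drops any extra keys the result dicts carry.
        writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for rank, row in enumerate(ranked, start=1):
            writer.writerow({**row, "rank": rank})
    return out_path
```

Writing a plain CSV keeps the committed artefact small and diffable, while the bulky poses stay in the gitignored directory.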

Next step

Flesh out the gpu/modal_app.py skeleton for Phase-1 validation (3 known binders), do a trial run with plain modal run (not modal run --detach), confirm the cost, then move on to Phase 2. This requires a Modal account: pip install modal, then modal token new (one-time auth).
