Reverso/docs/gpu_plan.md
Junior B. 07705a5884 GPU Phase 1: co-fold cofactors/metals (the binding-mode determinants)
Add metal/cofactor handling to the Boltz-2 YAML as CCD ligand entries -
the modes classical docking couldn't model:
- HDAC2 + catalytic Zn (vorinostat chelates it)
- PKR + FBP + Mg (allosteric activator + metal)
- hemoglobin + heme
Same cofactors present when co-folding negatives into a target (fair test).

build_boltz_yaml() gains a cofactor_ccds arg (emits `ligand: {ccd: ...}`
entries); TARGETS carries per-target cofactors; cofold()/main() thread them
through. Verified locally: YAML builds correctly with Zn / FBP+Mg.

Honest limitation noted: Hb's voxelotor site is at the tetramer centre and
covalent (Schiff base), so single-chain+heme only approximates it - HDAC2
(Zn) and PKR (cofactor) are the real co-folding tests. Ready for
`modal run gpu/modal_app.py`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 17:16:06 +02:00


GPU plan — ephemeral AF3-class co-folding for the binding track

Goal: run AF3-class co-folding (Boltz-2 / DiffDock) on a GPU for the structure-binding track (§12), then pay nothing when idle. The work is bursty (a validation run, then a screen) and the inputs are tiny, so the design optimises for zero idle cost rather than for a persistent box.

What actually has to move (small!)

| Thing | Size | How it gets to the GPU |
|---|---|---|
| Code | KB | git clone the gitea repo (or mount) |
| Target structures (PDBs) | a few MB | in the repo / git |
| Ligands (SMILES, drug set) | KB | in the repo / git |
| Model weights (Boltz-2 / DiffDock) | ~2–6 GB | downloaded once, cached in a persistent volume |
| Results (poses, scores, RMSDs) | KB–MB | git push back / download |

The 27 GB LINCS data is not part of this track — nothing big to upload. The only thing worth persisting is the model-weights cache (so we don't re-download = re-pay GPU time every run).

How the model weights persist (the cost-saver)

A modal.Volume is a named, cloud-backed filesystem that lives independently of any container or GPU — it survives every teardown. Mounted into the function at /weights:

  • Run 1: /weights is empty → the model downloads weights there (the one-time slow cost).
  • Run 2+: the same Volume mounts with the files already present → download skipped → no GPU-billed seconds wasted re-fetching 5 GB.

Two things make it actually cache:

  1. Point the downloader at the mount (weights only persist if written under /weights): HF_HOME=/weights/hf (HuggingFace), TORCH_HOME=/weights/torch, boltz --cache /weights/boltz.
  2. Commit semantics: writes persist on weights.commit() (modern Modal also auto-commits on a clean exit); other containers see them after weights.reload(). Pattern: reload() → run → commit().
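The reload() → run → commit() pattern can be sketched independently of the rest of the app. This is a minimal sketch, not the code in gpu/modal_app.py: `VolumeLike`, `cache_env`, `boltz_cache_dir` and `run_with_cache` are hypothetical names, and the only thing assumed about the volume is that it offers the reload() and commit() methods named above (a modal.Volume is the intended case).

```python
import os
from typing import Callable, Protocol, TypeVar

T = TypeVar("T")


class VolumeLike(Protocol):
    """Anything with reload()/commit(); a modal.Volume is the intended case."""

    def reload(self) -> None: ...
    def commit(self) -> None: ...


def cache_env(mount: str = "/weights") -> dict[str, str]:
    """Environment variables that redirect downloader caches under the mount."""
    return {"HF_HOME": f"{mount}/hf", "TORCH_HOME": f"{mount}/torch"}


def boltz_cache_dir(mount: str = "/weights") -> str:
    """Directory to pass to `boltz --cache` (path layout taken from the text)."""
    return f"{mount}/boltz"


def run_with_cache(volume: VolumeLike, job: Callable[[], T], mount: str = "/weights") -> T:
    volume.reload()                      # see weights committed by earlier runs
    os.environ.update(cache_env(mount))  # downloads now land under the mount
    result = job()                       # the GPU work; run 1 also downloads here
    volume.commit()                      # persist anything newly written
    return result
```

Committing only after the job succeeds keeps a half-finished download from being persisted as if it were a complete cache; whether that is the right trade-off for this project is a judgment call.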

The Volume itself costs pennies (~$/GB-month of storage), separate from the GPU — so caching ~5 GB of weights is near-free and saves real GPU time on every subsequent run. (Alternative: bake weights into the image at build time via image.run_function(download) — fastest cold start, but the image rebuilds when weights change. The skeleton uses the Volume approach.)

Provider choice

| Option | Billing | Idle cost | "Kill" model | Best for |
|---|---|---|---|---|
| Modal (recommended) | per-second | $0 (scales to zero) | automatic, nothing to remember | bursty batch runs |
| RunPod | per-minute, on-demand/spot | only while pod exists | manual terminate | interactive SSH box |
| Vast.ai | per-minute spot | only while rented | manual destroy | cheapest, more fiddly |
| Lambda / AWS-GCP spot | per-hour/second | until you stop it | manual stop | if you already have credits |

Recommend Modal. You define the run as a function with a gpu= decorator; on call it provisions the GPU, runs, and releases it — there is no GPU to forget to kill, and you pay only for the seconds it ran. That is the "kill the GPU to save money" requirement, automated.
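To see why billing granularity matters for short, bursty runs, the cost can be worked out as billed seconds times the hourly rate. This is a rough sketch: `billed_cost_usd` is a hypothetical helper, the round-up-to-the-increment model is a simplification of how providers bill, and the 20-minute runtime and $0.80/hr rate are assumed values for illustration, not measurements.

```python
import math


def billed_cost_usd(run_seconds: float, usd_per_hour: float, granularity_s: int = 1) -> float:
    """Cost when usage is rounded up to whole billing increments."""
    billed_seconds = math.ceil(run_seconds / granularity_s) * granularity_s
    return billed_seconds * usd_per_hour / 3600.0


# ASSUMPTION: a 20-minute run on a GPU priced at $0.80/hr.
run_s = 20 * 60
per_second = billed_cost_usd(run_s, 0.80, granularity_s=1)     # about $0.27
per_hour = billed_cost_usd(run_s, 0.80, granularity_s=3600)    # about $0.80
```

Under these assumptions the same run costs roughly three times as much when billed by the hour, before counting any time a manually managed box sits idle.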

The lifecycle (Modal)

  1. Define image once: CUDA base + boltz (or DiffDock) + rdkit, meeko, spyrmsd, gemmi.
  2. Weights volume: modal.Volume mounted at /weights; the model downloads into it on first run and is cached forever after (no re-download cost).
  3. Run: modal run gpu/modal_app.py → provisions GPU → runs the test → returns results to your laptop. GPU released the moment the function returns.
  4. Persist results: write the returned scores/poses into data/processed/binding/ and git commit (small, text).
  5. Teardown: nothing to do — Modal scaled to zero. modal app stop only if a run is hung.
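Step 4 can be as small as writing the returned scores to a CSV under data/processed/binding/. A minimal sketch, assuming the run returns one record per (target, ligand) pair; the function name, file name and column names are hypothetical and are not claimed to match the existing ranked_candidates layout.

```python
import csv
from pathlib import Path

COLUMNS = ["target", "ligand", "p_binder"]  # ASSUMPTION: illustrative columns


def write_ranked_csv(rows: list[dict], out_dir: str = "data/processed/binding",
                     name: str = "phase1_scores.csv") -> Path:
    """Write rows sorted by P(binder), highest first, and return the file path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    ranked = sorted(rows, key=lambda r: r["p_binder"], reverse=True)
    path = out / name
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(ranked)  # raises ValueError if a row has unknown keys
    return path
```

The result is small and plain text, so it diffs cleanly when committed.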

RunPod alternative (if you want an interactive box): start pod → git clone → run → git push results → runpodctl remove pod <id> (or Stop in the UI). Set an idle auto-terminate so a forgotten box can't bleed money.

What runs on the GPU (in cost order — cheap validation first)

  • Phase 1 — modality validation (~minutes, ~$1): co-fold each known binder + 2 negative controls (caffeine, hydroxyurea) into each target (Hb, PKR, HDAC2) with the binding-mode cofactors co-folded in — HDAC2 + catalytic Zn, PKR + FBP/Mg, Hb + heme (as CCD ligand entries) — and check the known binder has the highest Boltz-2 P(binder) for its own target. This is the discrimination Vina couldn't manage precisely because it can't model Zn-chelation / cofactors. (Ranking uses P(binder), not the raw affinity value, whose sign is version-dependent.) Pose-RMSD vs crystal is a deeper check but needs receptor superposition (align predicted protein to crystal, transform ligand) — a later refinement. If Phase 1 passes, the modality is real; if not, stop before paying for a screen.
  • Phase 2 — screen (~tens of minutes, a few $): run the ~300-drug set (or a focused subset) against the sickle targets; rank by Boltz-2 predicted affinity; redo the §12.4 positive-control recovery test. Output a ranked CSV, same shape as the connectivity ranked_candidates.
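The Phase 1 pass criterion can be written down as a small check. A minimal sketch, assuming the co-folding step yields one P(binder) per (target, ligand) pair; `phase1_passes`, the dictionary layout and the numbers in the example are illustrative placeholders, not results.

```python
def phase1_passes(p_binder: dict[str, dict[str, float]],
                  known: dict[str, str]) -> dict[str, bool]:
    """Per target: True if its known binder has the strictly highest P(binder)."""
    verdict = {}
    for target, scores in p_binder.items():
        binder = known[target]
        others = [p for ligand, p in scores.items() if ligand != binder]
        verdict[target] = all(scores[binder] > p for p in others)
    return verdict


# Placeholder numbers only, chosen to show one pass and one fail.
scores = {
    "HDAC2": {"vorinostat": 0.90, "caffeine": 0.20, "hydroxyurea": 0.10},
    "Hb": {"voxelotor": 0.30, "caffeine": 0.40, "hydroxyurea": 0.10},
}
known = {"HDAC2": "vorinostat", "Hb": "voxelotor"}

result = phase1_passes(scores, known)   # {"HDAC2": True, "Hb": False}
go_to_phase2 = all(result.values())     # False: stop before paying for a screen
```

Requiring a strict maximum means a tie with a negative control counts as a failure, which errs on the side of not paying for the screen.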

Model choice

  • Boltz-2 (MIT, pip-installable): predicts the protein–ligand complex and a binding affinity, which directly gives a rankable score. Primary choice. Fits a 24–40 GB GPU for these single-domain targets.
  • DiffDock-L: lighter, pose-only (needs a separate scorer); fallback if Boltz memory is tight.
  • GPU: an L4 (24 GB, ~$0.6–0.8/hr) or A10/L40S (24–48 GB) is plenty; no multi-GPU, no A100 needed for these sizes.

Cost controls (the save-money checklist)

  • Serverless (Modal) → zero idle cost by construction; or an idle-timeout auto-kill on a box.
  • Cache weights in a persistent volume — re-downloading 5 GB on a $1/hr GPU is wasted money.
  • Validate on one target before screening — don't pay for a 300-drug screen until Phase 1 passes.
  • Prefer spot/interruptible for the batch screen (Phase 2 is restartable).
  • Keep prep (Meeko/RDKit) and result-scoring on the laptop; only the model forward pass needs GPU.
  • Estimated total to validate + a first screen: ~$5–15, not a standing bill.

Repo integration

  • gpu/modal_app.py — the Modal app (skeleton committed alongside this plan).
  • Results land in data/processed/binding/ (gitignored) + a small committed summary.
  • Pin model + weights version in the image for reproducibility.
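The commit that accompanies this plan describes build_boltz_yaml() emitting cofactors as `ligand: {ccd: ...}` entries. The sketch below shows roughly what such an input could look like; it is not the function in gpu/modal_app.py. `sketch_boltz_yaml` is a hypothetical name, the key layout is assumed to follow the Boltz YAML input schema and should be checked against the pinned version, the CCD code ZN for zinc is an assumption, and the protein sequence is a placeholder rather than the real HDAC2 sequence.

```python
from string import ascii_uppercase
from typing import Sequence


def sketch_boltz_yaml(protein_seq: str, ligand_smiles: str,
                      cofactor_ccds: Sequence[str] = ()) -> str:
    """Single protein chain (A) + drug (B) + cofactors/metals (C, D, ...)."""
    # SMILES can contain YAML-special characters, so emit a single-quoted scalar.
    quoted = "'" + ligand_smiles.replace("'", "''") + "'"
    lines = [
        "version: 1",  # ASSUMPTION: schema keys as in the Boltz input format
        "sequences:",
        "  - protein:",
        "      id: A",
        f"      sequence: {protein_seq}",
        "  - ligand:",
        "      id: B",
        f"      smiles: {quoted}",
    ]
    # Cofactors take the next free chain ids after the drug.
    for chain_id, ccd in zip(ascii_uppercase[2:], cofactor_ccds):
        lines += ["  - ligand:", f"      id: {chain_id}", f"      ccd: {ccd}"]
    # Request the affinity prediction for the drug, not for the cofactors.
    lines += ["properties:", "  - affinity:", "      binder: B"]
    return "\n".join(lines) + "\n"


# Placeholder sequence; vorinostat SMILES; zinc as a CCD ligand entry.
yaml_text = sketch_boltz_yaml(
    "ACDEFGHIK", "O=C(CCCCCCC(=O)Nc1ccccc1)NO", cofactor_ccds=["ZN"]
)
```

Passing the same cofactor list when co-folding the negative controls keeps the comparison fair, as the commit message notes.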

Next step

Scaffold gpu/modal_app.py for Phase-1 validation (3 known binders), do a dry run from the laptop with plain modal run (not modal run --detach), confirm cost, then move on to Phase 2. Requires a Modal account:

  • pip install modal + modal token new (one-time auth).