Add metal/cofactor handling to the Boltz-2 YAML as CCD ligand entries -
the modes classical docking couldn't model:
- HDAC2 + catalytic Zn (vorinostat chelates it)
- PKR + FBP + Mg (allosteric activator + metal)
- hemoglobin + heme
Same cofactors present when co-folding negatives into a target (fair test).
build_boltz_yaml() gains a cofactor_ccds arg (emits `ligand: {ccd: ...}`
entries); TARGETS carries per-target cofactors; cofold()/main() thread them
through. Verified locally: YAML builds correctly with Zn / FBP+Mg.
Honest limitation noted: Hb's voxelotor site is at the tetramer centre and
covalent (Schiff base), so single-chain+heme only approximates it - HDAC2
(Zn) and PKR (cofactor) are the real co-folding tests. Ready for
`modal run gpu/modal_app.py`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
113 lines
6.6 KiB
Markdown
# GPU plan: ephemeral AF3-class co-folding for the binding track

Goal: run AF3-class co-folding (Boltz-2 / DiffDock) on a GPU for the structure-binding track
(§12), then **pay nothing when idle**. The work is *bursty* (a validation run, then a screen) and
the inputs are *tiny*, so the design optimises for zero idle cost, not for a persistent box.

## What actually has to move (small!)

| Thing | Size | How it gets to the GPU |
|---|---|---|
| Code | KB | `git clone` the gitea repo (or mount) |
| Target structures (PDBs) | a few MB | in the repo / git |
| Ligands (SMILES, drug set) | KB | in the repo / git |
| **Model weights** (Boltz-2 / DiffDock) | **~2–6 GB** | downloaded once, **cached in a persistent volume** |
| Results (poses, scores, RMSDs) | KB–MB | `git push` back / download |

The 27 GB LINCS data is **not** part of this track, so there is nothing big to upload. The only thing worth
persisting is the model-weights cache (so we don't re-download = re-pay GPU time every run).

## How the model weights persist (the cost-saver)

A `modal.Volume` is a **named, cloud-backed filesystem that lives independently of any container
or GPU**, so it survives every teardown. Mounted into the function at `/weights`:

- **Run 1:** `/weights` is empty → the model downloads weights there (the one-time slow cost).
- **Run 2+:** the same Volume mounts with the files already present → download skipped → **no
  GPU-billed seconds wasted re-fetching 5 GB.**

Two things make it actually cache:

1. **Point the downloader at the mount** (weights only persist if written under `/weights`):
   `HF_HOME=/weights/hf` (HuggingFace), `TORCH_HOME=/weights/torch`, `boltz predict --cache /weights/boltz`.
2. **Commit semantics:** writes persist on `weights.commit()` (modern Modal also auto-commits on a
   clean exit); other containers see them after `weights.reload()`. Pattern: `reload()` → run →
   `commit()`.

The Volume itself costs pennies (billed per GB-month of storage), *separate from the GPU*, so caching ~5 GB
of weights is near-free and saves real GPU time on every subsequent run.

(Alternative: bake weights into the image at build time via `image.run_function(download)`. That gives the
fastest cold start, but the image rebuilds when weights change. The skeleton uses the Volume approach.)

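
A minimal sketch of the "point the downloader at the mount" step, assuming the `/weights` mount path used above. The helper names `cache_env` and `weights_cached` are hypothetical (they are not taken from `gpu/modal_app.py`), and "any file exists under the directory" is only a cheap heuristic for "weights already downloaded".

```python
import os


def cache_env(mount: str = "/weights") -> dict[str, str]:
    """Environment variables that redirect framework caches under the mount."""
    return {
        "HF_HOME": f"{mount}/hf",        # HuggingFace cache
        "TORCH_HOME": f"{mount}/torch",  # torch hub cache
    }


def weights_cached(cache_dir: str) -> bool:
    """Heuristic: True if at least one file exists anywhere under cache_dir."""
    for _root, _dirs, files in os.walk(cache_dir):
        if files:
            return True
    return False


def apply_cache_env(mount: str = "/weights") -> None:
    """Set the variables before launching anything that downloads weights."""
    os.environ.update(cache_env(mount))
```

Calling `apply_cache_env()` at the top of the GPU function, before the model is loaded, is what keeps the downloads under the Volume instead of in the container's throwaway home directory.
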
## Provider choice

| Option | Billing | Idle cost | "Kill" model | Best for |
|---|---|---|---|---|
| **Modal** (recommended) | per-second | **$0 (scales to zero)** | automatic (nothing to remember) | bursty batch runs |
| RunPod | per-minute, on-demand/spot | only while pod exists | manual `terminate` | interactive SSH box |
| Vast.ai | per-minute spot | only while rented | manual destroy | cheapest, more fiddly |
| Lambda / AWS-GCP spot | per-hour/second | until you stop it | manual stop | if you already have credits |

**Recommend Modal.** You define the run as a function decorated with `@app.function(gpu=...)`; on call it
provisions the GPU, runs, and releases it. **There is no GPU to forget to kill**, and you pay only
for the seconds it ran. That *is* the "kill the GPU to save money" requirement, automated.

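
To make the idle-cost argument concrete, here is a rough arithmetic sketch. The hourly rate, run length, and idle time are illustrative assumptions, not quotes from any provider, and the function names are made up for this example.

```python
def serverless_cost(run_seconds: float, rate_per_hour: float) -> float:
    """Per-second billing: pay only for the seconds the function ran."""
    return run_seconds * rate_per_hour / 3600.0


def box_cost(run_seconds: float, idle_seconds: float, rate_per_hour: float) -> float:
    """A rented box bills for its whole lifetime, idle time included."""
    return (run_seconds + idle_seconds) * rate_per_hour / 3600.0


if __name__ == "__main__":
    rate = 0.80           # assumed $/hr for a 24 GB GPU
    run = 15 * 60         # assumed 15-minute validation run
    forgotten = 8 * 3600  # assumed: box left running for 8 hours afterwards
    print(f"serverless:    ${serverless_cost(run, rate):.2f}")
    print(f"forgotten box: ${box_cost(run, forgotten, rate):.2f}")
```

The two functions differ only in the `idle_seconds` term, which is exactly the term that scale-to-zero removes.
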
## The lifecycle (Modal)

1. **Define image** once: CUDA base + `boltz` (or DiffDock) + `rdkit`, `meeko`, `spyrmsd`, `gemmi`.
2. **Weights volume**: `modal.Volume` mounted at `/weights`; the model downloads into it on first
   run and is cached forever after (no re-download cost).
3. **Run**: `modal run gpu/modal_app.py` → provisions GPU → runs the test → returns results to your
   laptop. GPU released the moment the function returns.
4. **Persist results**: write the returned scores/poses into `data/processed/binding/` (gitignored) and
   `git commit` a small text summary.
5. **Teardown**: nothing to do, because Modal has already scaled to zero. Use `modal app stop` only if a run is hung.

RunPod alternative (if you want an interactive box): start pod → `git clone` → run → `git push`
results → **`runpodctl remove pod <id>`** (or Stop in the UI). Set an **idle auto-terminate** so a
forgotten box can't bleed money.

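
Step 4 could look roughly like the sketch below. It assumes the GPU function returns a list of dicts with `target`, `ligand`, and `p_binder` keys; that shape, the function name `write_results`, and the default file name are all hypothetical, so adapt them to whatever `gpu/modal_app.py` actually returns.

```python
import csv
from pathlib import Path


def write_results(rows: list[dict], out_dir: str = "data/processed/binding",
                  name: str = "phase1_scores.csv") -> Path:
    """Write one CSV row per (target, ligand) result and return the file path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / name
    fields = ["target", "ligand", "p_binder"]  # assumed result columns
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        for row in rows:
            # Keep only the known columns so extra keys don't break the writer.
            writer.writerow({key: row[key] for key in fields})
    return path
```

This runs on the laptop after the function returns, so it adds no GPU-billed time.
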
## What runs on the GPU (in cost order, cheap validation first)

- **Phase 1, modality validation (~minutes, ~$1):** co-fold each known binder + 2 negative
  controls (caffeine, hydroxyurea) into each target (Hb, PKR, HDAC2) **with the binding-mode
  cofactors co-folded in** (HDAC2 + catalytic **Zn**, PKR + **FBP/Mg**, Hb + **heme**, all as CCD
  ligand entries), and check that the **known binder has the highest Boltz-2 P(binder)** for its own
  target. This is the discrimination Vina couldn't manage, precisely because it can't model
  Zn-chelation / cofactors. (Ranking uses P(binder), not the raw affinity value, whose
  sign is version-dependent.) Pose-RMSD vs crystal is a deeper check but needs receptor
  superposition (align predicted protein to crystal, transform ligand), so it is a later refinement. If
  Phase 1 passes, the modality is real; if not, stop before paying for a screen.
- **Phase 2, screen (~tens of minutes, a few $):** run the ~300-drug set (or a focused subset)
  against the sickle targets; rank by Boltz-2 predicted affinity; redo the §12.4 positive-control
  recovery test. Output a ranked CSV, same shape as the connectivity `ranked_candidates`.

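
A sketch of how cofactors might be emitted as CCD ligand entries in a Boltz-2 input YAML. The function below is a simplified, hypothetical stand-in for the repo's `build_boltz_yaml()` (the real signature may differ). The field layout (`version`, `sequences`, `protein`/`ligand` with `id` plus `sequence`/`smiles`/`ccd`, and `properties` → `affinity` → `binder`) follows the Boltz input schema as I understand it and should be checked against the Boltz documentation for the pinned version. The protein sequence in the usage example is a toy placeholder, not a real target.

```python
import string


def build_boltz_yaml(sequence: str, ligand_smiles: str,
                     cofactor_ccds: tuple[str, ...] = ()) -> str:
    """Return YAML text with protein chain A, drug B, and cofactors C, D, ..."""
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        "      id: A",
        f"      sequence: {sequence}",
        "  - ligand:",
        "      id: B",
        f"      smiles: '{ligand_smiles}'",
    ]
    # Each cofactor becomes its own ligand entry, referenced by CCD code.
    for chain_id, ccd in zip(string.ascii_uppercase[2:], cofactor_ccds):
        lines += ["  - ligand:", f"      id: {chain_id}", f"      ccd: {ccd}"]
    # Request affinity for the drug (chain B) only, never for a cofactor.
    lines += ["properties:", "  - affinity:", "      binder: B"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequence; vorinostat SMILES; zinc as a CCD cofactor.
    print(build_boltz_yaml("MKTAYIAKQR", "O=C(CCCCCCC(=O)NO)Nc1ccccc1", ("ZN",)))
```

Passing the same `cofactor_ccds` for the negative controls as for the known binder keeps the comparison fair: every ligand is scored against the same cofactor-loaded pocket.
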
## Model choice

- **Boltz-2** (MIT, pip-installable): predicts the protein–ligand complex **and** a binding
  affinity, which directly gives a rankable score. Primary choice. Fits a 24–40 GB GPU for these
  single-domain targets.
- **DiffDock-L**: lighter, pose-only (needs a separate scorer); fallback if Boltz memory is tight.
- GPU: an **L4 (24 GB, ~$0.6–0.8/hr)** or **A10/L40S (24–48 GB)** is plenty; no multi-GPU, no A100
  needed for these sizes.

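
Because Boltz-2 yields a rankable score, the Phase 1 pass/fail check reduces to a per-target argmax over P(binder). The sketch below assumes scores have been collected into a nested dict (`target → ligand → P(binder)`); that layout and the name `phase1_passes` are hypothetical, and the numbers in the usage example are invented placeholders, not results.

```python
def phase1_passes(scores: dict[str, dict[str, float]],
                  known_binder: dict[str, str]) -> dict[str, bool]:
    """For each target, report whether the known binder is the top-scoring ligand."""
    verdict = {}
    for target, by_ligand in scores.items():
        # Highest P(binder) wins; on an exact tie the first-listed ligand is taken.
        best = max(by_ligand, key=by_ligand.get)
        verdict[target] = best == known_binder[target]
    return verdict


if __name__ == "__main__":
    # Placeholder values purely to show the call shape.
    scores = {
        "HDAC2": {"vorinostat": 0.9, "caffeine": 0.2, "hydroxyurea": 0.1},
        "Hb": {"voxelotor": 0.3, "caffeine": 0.4, "hydroxyurea": 0.1},
    }
    known = {"HDAC2": "vorinostat", "Hb": "voxelotor"}
    print(phase1_passes(scores, known))
```

A `False` for any target is the signal to stop before paying for the Phase 2 screen.
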
## Cost controls (the save-money checklist)

- Serverless (Modal) → **zero idle cost** by construction; or an **idle-timeout** auto-kill on a box.
- **Cache weights** in a persistent volume: re-downloading 5 GB on a $1/hr GPU is wasted money.
- **Validate on one target before screening**: don't pay for a 300-drug screen until Phase 1 passes.
- Prefer **spot/interruptible** for the batch screen (Phase 2 is restartable).
- Keep prep (Meeko/RDKit) and result-scoring on the **laptop**; only the model forward pass needs GPU.
- Estimated total to validate + a first screen: **~$5–15**, not a standing bill.

## Repo integration

- `gpu/modal_app.py`: the Modal app (skeleton committed alongside this plan).
- Results land in `data/processed/binding/` (gitignored) + a small committed summary.
- Pin model + weights version in the image for reproducibility.

## Next step

Scaffold `gpu/modal_app.py` for Phase-1 validation (3 known binders), do a dry run locally with plain
`modal run` (not `modal run --detach`), confirm cost, then move on to Phase 2. Requires a Modal
account, `pip install modal`, and `modal token new` (one-time auth).