GPU plan: make weight persistence concrete (Modal Volume cache)
Document and wire the weight-caching mechanism:

- modal.Volume is a cloud-backed FS independent of the GPU/container; run 1 downloads weights into /weights, run 2+ reuses them (no GPU time wasted re-downloading).
- Point downloaders at the mount: HF_HOME/TORCH_HOME/boltz --cache; persist via weights.commit(), see updates via weights.reload().
- Volume storage costs pennies, separate from GPU = near-free caching.

modal_app.py cofold(): set cache env vars to /weights, reload()/commit() around the (stubbed) boltz call.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -17,6 +17,27 @@ the inputs are *tiny*, so the design optimises for zero idle cost, not for a per
The 27 GB LINCS data is **not** part of this track, so there is nothing big to upload. The only thing
worth persisting is the model-weights cache (so we don't re-download, and re-pay GPU time for it, on
every run).

## How the model weights persist (the cost-saver)

A `modal.Volume` is a **named, cloud-backed filesystem that lives independently of any container
or GPU**, so it survives every teardown. Mounted into the function at `/weights`, it behaves like this:

- **Run 1:** `/weights` is empty → the model downloads its weights there (the one-time slow cost).
- **Run 2+:** the same Volume mounts with the files already present → the download is skipped → **no
  GPU-billed seconds are wasted re-fetching 5 GB.**
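The run-1 / run-2+ behaviour boils down to a check-then-download step on the mounted directory. A minimal sketch, assuming a hypothetical `ensure_weights(cache_dir, download)` helper and a hypothetical `.complete` marker file; neither is part of Modal or Boltz, and real downloaders such as the HuggingFace cache do their own existence checks:

```python
from pathlib import Path
from typing import Callable

# Hypothetical marker written only after a download finishes, so a
# half-finished download from a crashed run is not mistaken for a cache hit.
MARKER = ".complete"


def ensure_weights(cache_dir: str, download: Callable[[Path], None]) -> bool:
    """Download weights into cache_dir unless a previous run already did.

    Returns True if a download happened (run 1), False on a cache hit (run 2+).
    """
    root = Path(cache_dir)
    if (root / MARKER).exists():
        # Run 2+: the Volume was mounted with the files already present.
        return False
    # Run 1: the directory is empty (or incomplete), so pay the slow cost once.
    root.mkdir(parents=True, exist_ok=True)
    download(root)
    (root / MARKER).touch()
    return True
```

Calling it twice against the same directory downloads once and then returns `False`. With the directory on the Volume mount, the second call can just as well happen in a different container.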
Two things are needed for the cache to actually work:
1. **Point the downloader at the mount** (weights persist only if written under `/weights`):
   `HF_HOME=/weights/hf` (HuggingFace), `TORCH_HOME=/weights/torch`, `boltz --cache /weights/boltz`.
2. **Commit semantics:** writes persist on `weights.commit()` (modern Modal also auto-commits on a
   clean exit); other containers see them only after `weights.reload()`. Pattern: `reload()` → run →
   `commit()`.
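The two rules above can be sketched without the Modal SDK: one helper points the cache locations at the mount, and a wrapper brackets the job with `reload()` and `commit()`. `cache_env`, `boltz_cache_args`, `run_with_volume` and `VolumeLike` are hypothetical names for illustration. The `volume` argument stands in for a `modal.Volume`, which in Modal would come from `modal.Volume.from_name(name, create_if_missing=True)` and be mounted with `@app.function(volumes={"/weights": weights})`.

```python
import posixpath
from typing import Callable, Dict, List, Protocol, TypeVar

T = TypeVar("T")

# Mount point of the Volume, as used throughout this plan. posixpath keeps the
# separators Linux-style, matching the container filesystem.
WEIGHTS_ROOT = "/weights"


class VolumeLike(Protocol):
    """The two methods this pattern needs; a modal.Volume provides both."""

    def reload(self) -> None: ...

    def commit(self) -> None: ...


def cache_env(root: str = WEIGHTS_ROOT) -> Dict[str, str]:
    """Env vars that send HuggingFace and torch.hub downloads under the mount."""
    return {
        "HF_HOME": posixpath.join(root, "hf"),
        "TORCH_HOME": posixpath.join(root, "torch"),
    }


def boltz_cache_args(root: str = WEIGHTS_ROOT) -> List[str]:
    """CLI arguments pointing Boltz's --cache directory at the mount."""
    return ["--cache", posixpath.join(root, "boltz")]


def run_with_volume(volume: VolumeLike, job: Callable[[], T]) -> T:
    """reload() -> run -> commit(); the explicit commit is skipped if job raises."""
    volume.reload()  # pick up weights another container has committed
    result = job()   # files written under the mount are not yet visible elsewhere
    volume.commit()  # publish them to the Volume for later runs
    return result
```

Inside the GPU function one would apply `os.environ.update(cache_env())` before importing `huggingface_hub` (it reads `HF_HOME` when first imported), then wrap the inference call in `run_with_volume(weights, ...)`. Committing only on success is a design choice in this sketch: it avoids publishing a half-downloaded checkpoint, at the cost of re-downloading after a failed run.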
The Volume itself costs pennies (it is billed per GB-month of storage, *separately from the GPU*), so
caching ~5 GB of weights is near-free and saves real GPU time on every subsequent run.

(Alternative: bake the weights into the image at build time via `image.run_function(download)`. This
gives the fastest cold start, but the image must be rebuilt whenever the weights change. The skeleton
uses the Volume approach.)
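The "pennies of storage vs. GPU time" comparison can be written as a break-even check. This is a rough sketch with hypothetical function names. Every price and timing is an input to fill in from the provider's current pricing, not a quoted rate, and it assumes each cached run saves exactly the download time.

```python
import math


def monthly_net_saving(
    weights_gb: float,
    storage_usd_per_gb_month: float,
    download_seconds: float,
    gpu_usd_per_second: float,
    runs_per_month: int,
) -> float:
    """USD saved per month by caching weights on a Volume (negative = net cost).

    Assumes the Volume is already populated, so every run skips the download.
    """
    storage_cost = weights_gb * storage_usd_per_gb_month
    gpu_cost_avoided = runs_per_month * download_seconds * gpu_usd_per_second
    return gpu_cost_avoided - storage_cost


def break_even_runs(
    weights_gb: float,
    storage_usd_per_gb_month: float,
    download_seconds: float,
    gpu_usd_per_second: float,
) -> int:
    """Fewest runs per month at which the avoided GPU time covers the storage bill."""
    saving_per_run = download_seconds * gpu_usd_per_second
    if saving_per_run <= 0:
        raise ValueError("download time and GPU price must both be positive")
    storage_cost = weights_gb * storage_usd_per_gb_month
    return math.ceil(storage_cost / saving_per_run)
```

The plan's "near-free" claim amounts to `break_even_runs(5, ...)` coming out small once real prices are plugged in. Running it with current rates is the way to check that claim.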
## Provider choice

| Option | Billing | Idle cost | "Kill" model | Best for |