Add metal/cofactor handling to the Boltz-2 YAML as CCD ligand entries -
the binding modes classical docking couldn't model:
- HDAC2 + catalytic Zn (vorinostat chelates it)
- PKR + FBP + Mg (allosteric activator + metal)
- hemoglobin + heme
The same cofactors are included when co-folding negatives into a target,
so the comparison stays fair.
build_boltz_yaml() gains a cofactor_ccds arg (emits `ligand: {ccd: ...}`
entries); TARGETS carries per-target cofactors; cofold()/main() thread them
through. Verified locally: YAML builds correctly with Zn / FBP+Mg.
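A minimal sketch of how the cofactor entries could be emitted, assuming the
Boltz YAML schema (`sequences` with `protein`/`ligand` entries, plus an
`affinity` property naming the binder chain). The function name, argument
order and chain-ID scheme are hypothetical and are not taken from
gpu/modal_app.py:

```python
from string import ascii_uppercase


def build_boltz_yaml_sketch(sequence, smiles, cofactor_ccds=()):
    """Return Boltz-style YAML text for protein + binder + CCD cofactors.

    Hypothetical stand-in for build_boltz_yaml(); chain IDs are assigned
    alphabetically (A = protein, B = binder, C.. = cofactors).
    """
    ids = iter(ascii_uppercase)
    protein_id = next(ids)
    binder_id = next(ids)
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: {protein_id}",
        f"      sequence: {sequence}",
        "  - ligand:",
        f"      id: {binder_id}",
        f"      smiles: '{smiles}'",  # quoted so YAML leaves the SMILES alone
    ]
    # Each cofactor becomes its own ligand entry, referenced by CCD code.
    for ccd in cofactor_ccds:
        lines += [
            "  - ligand:",
            f"      id: {next(ids)}",
            f"      ccd: {ccd}",
        ]
    # Affinity is requested only for the SMILES binder, never a cofactor.
    lines += ["properties:", "  - affinity:", f"      binder: {binder_id}"]
    return "\n".join(lines) + "\n"


# Example: a placeholder sequence with vorinostat and the catalytic zinc.
example_yaml = build_boltz_yaml_sketch(
    "MKV", "O=C(CCCCCCC(=O)Nc1ccccc1)NO", cofactor_ccds=["ZN"]
)
```

Passing the same `cofactor_ccds` for a target's known binder and its
negatives keeps every ligand competing in the same cofactor context.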
Limitation: voxelotor binds Hb covalently (Schiff base) at the tetramer
centre, so a single chain plus heme only approximates that site; HDAC2
(Zn) and PKR (FBP + Mg) are the real co-folding tests. Ready for
`modal run gpu/modal_app.py`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Flesh out the Modal app into a runnable Phase-1 positive-control test and
reconcile it with the plan:
- cofold() GPU fn: build Boltz-2 YAML (protein+ligand+affinity), run
`boltz predict --use_msa_server --cache /weights/boltz`, parse affinity
JSON + predicted pose; weights persist via Volume.
- Local helpers (CPU, import-tested against our PDBs): binding_chain_sequence
(gemmi -- correctly picks the binding chain, e.g. alpha-globin for 5E83),
pubchem_smiles, build_boltz_yaml, fetch_pdb (RCSB).
- main(): fan out cofold.starmap over 3 targets x (known binder + 2
negatives); tabulate; PASS if known binder has top P(binder) for its target.
Alignment fixes:
- Rank by P(binder) (higher=better), NOT raw affinity_pred_value whose sign
(~log IC50) is version-dependent -- avoids a backwards positive-control test.
- gpu_plan.md Phase 1 updated to affinity/P(binder) ranking; pose-RMSD noted
as a later refinement (needs receptor superposition).
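The ranking rule can be sketched as follows, assuming each co-fold result
has been reduced to a dict with the hypothetical keys `target`, `ligand`,
`is_known_binder` and `p_binder` (the P(binder) read from the affinity
JSON). The numbers in the example are placeholders, not measured results:

```python
from collections import defaultdict


def positive_control(rows):
    """Map each target to True if its known binder has the top P(binder)."""
    by_target = defaultdict(list)
    for row in rows:
        by_target[row["target"]].append(row)

    verdict = {}
    for target, group in by_target.items():
        # Higher P(binder) is better, so the top-ranked ligand is the max.
        best = max(group, key=lambda r: r["p_binder"])
        verdict[target] = bool(best["is_known_binder"])
    return verdict


# Placeholder values for illustration only.
rows = [
    {"target": "HDAC2", "ligand": "vorinostat", "is_known_binder": True, "p_binder": 0.9},
    {"target": "HDAC2", "ligand": "negative_1", "is_known_binder": False, "p_binder": 0.2},
    {"target": "HDAC2", "ligand": "negative_2", "is_known_binder": False, "p_binder": 0.1},
]
passed = all(positive_control(rows).values())
```

Ranking on the probability alone means the test does not depend on the
sign convention of `affinity_pred_value`.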
Local half verified (sequence/SMILES/YAML); cofold() needs a live `modal run`
(account + `modal token new`) to validate end-to-end.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document and wire the weight-caching mechanism:
- modal.Volume is a cloud-backed FS independent of the GPU/container;
run 1 downloads weights into /weights, run 2+ reuses them (no GPU time
wasted re-downloading).
- Point downloaders at the mount: HF_HOME/TORCH_HOME/boltz --cache; persist
via weights.commit(), see updates via weights.reload().
- Volume storage costs pennies and is billed separately from GPU time, so
caching is near-free.
modal_app.py cofold(): set cache env vars to /weights, reload()/commit()
around the (stubbed) boltz call.
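A rough sketch of the reload/run/commit pattern, written against any object
that exposes `reload()` and `commit()` (as `modal.Volume` does). The helper
names, the `hf`/`torch` subdirectories and the `runner` callable are
assumptions for illustration; only the `boltz predict` flags come from the
notes above:

```python
import os

WEIGHTS_MOUNT = "/weights"  # mount point of the weights Volume


def cache_env(mount=WEIGHTS_MOUNT):
    """Env vars pointing the downloaders at the mounted Volume.

    The subdirectory names are an assumption, not a requirement.
    """
    return {"HF_HOME": f"{mount}/hf", "TORCH_HOME": f"{mount}/torch"}


def boltz_command(yaml_path, mount=WEIGHTS_MOUNT):
    """Build the boltz CLI call with its cache on the Volume."""
    return [
        "boltz", "predict", yaml_path,
        "--use_msa_server",
        "--cache", f"{mount}/boltz",
    ]


def run_with_cache(volume, yaml_path, runner):
    """Reload the Volume, run boltz via `runner`, then persist new files.

    `runner(cmd, env)` is a hypothetical callable, for example
    lambda cmd, env: subprocess.run(cmd, env=env, check=True).
    """
    volume.reload()  # see weights written by earlier runs
    env = {**os.environ, **cache_env()}
    result = runner(boltz_command(yaml_path), env)
    volume.commit()  # persist anything downloaded during this run
    return result
```

Taking the runner as an argument keeps the sketch testable on a CPU-only
machine without a Modal account.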
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
docs/gpu_plan.md: cost-efficient plan for running AF3-class co-folding
(Boltz-2/DiffDock) on a GPU, then paying nothing when idle.
- Key insight: structure-track data is tiny (MB of PDBs/SMILES); only the
GPU + model weights are heavy -> serverless is ideal.
- Recommend Modal (per-second billing, scales to zero = nothing to kill);
RunPod as the SSH-box alternative with idle auto-terminate.
- Lifecycle: image -> weights Volume (cache, don't re-download) -> run ->
git push small results -> teardown automatic.
- Phase 1 validate on 3 known binders (~$1) before paying for a screen;
Boltz-2 (affinity) on an L4/A10 (24 GB each); est. total ~$5-15.
gpu/modal_app.py: Modal app skeleton (image, weights volume, GPU cofold()
function, local entrypoint); boltz invocation stubbed with TODOs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>