GPU Phase 1 runnable: real Boltz-2 co-folding + alignment review

Flesh out the Modal app into a runnable Phase-1 positive-control test and
reconcile it with the plan:
- cofold() GPU fn: build Boltz-2 YAML (protein+ligand+affinity), run
  `boltz predict --use_msa_server --cache /weights/boltz`, parse affinity
  JSON + predicted pose; weights persist via Volume.
- Local helpers (CPU, import-tested against our PDBs): binding_chain_sequence
  (gemmi -- correctly picks the binding chain, e.g. alpha-globin for 5E83),
  pubchem_smiles, build_boltz_yaml, fetch_pdb (RCSB).
- main(): fan out cofold.starmap over 3 targets x (known binder + 2
  negatives); tabulate; PASS if known binder has top P(binder) for its target.
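
A minimal sketch of what build_boltz_yaml might emit, for reference. The signature, chain ids, and body here are assumptions rather than the committed code, and the layout (version / sequences / properties.affinity.binder) follows the Boltz-2 input schema as I understand it. The sequence and SMILES in the usage lines are placeholders.

```python
def build_boltz_yaml(sequence: str, smiles: str,
                     protein_id: str = "A", ligand_id: str = "L") -> str:
    """Return Boltz-2 input YAML for one protein chain, one ligand, and an affinity request.

    Hypothetical signature: the real helper in this repo may differ.
    """
    # Single-quote the SMILES so YAML does not misread characters such as
    # '[', '#', or ':'; a literal single quote is escaped by doubling it.
    quoted = "'" + smiles.replace("'", "''") + "'"
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: {protein_id}",
        f"      sequence: {sequence}",
        "  - ligand:",
        f"      id: {ligand_id}",
        f"      smiles: {quoted}",
        "properties:",
        "  - affinity:",
        f"      binder: {ligand_id}",  # affinity is predicted for this ligand chain
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder inputs, not one of the three real targets.
    print(build_boltz_yaml("MKTAYIAKQR", "CCO"))
```

Building the text by hand keeps the helper free of a YAML dependency; with PyYAML available, dumping a dict would work equally well.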

Alignment fixes:
- Rank by P(binder) (higher=better), NOT raw affinity_pred_value whose sign
  (~log IC50) is version-dependent -- avoids a backwards positive-control test.
- gpu_plan.md Phase 1 updated to affinity/P(binder) ranking; pose-RMSD noted
  as a later refinement (needs receptor superposition).
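
The ranking rule above can be sketched as follows. This is illustrative only: the row layout and function names are hypothetical, the JSON key `affinity_probability_binary` is assumed to be where Boltz-2 reports P(binder), and the numbers in the usage lines are made up, not results.

```python
from collections import defaultdict


def p_binder(affinity_json: dict) -> float:
    """Pull P(binder) from a parsed Boltz-2 affinity JSON (key name is an assumption)."""
    return float(affinity_json["affinity_probability_binary"])


def positive_control_pass(rows):
    """Map each target to True if its known binder has the highest P(binder).

    rows: iterable of dicts with keys target, ligand, p_binder, known (bool).
    Higher P(binder) is better; the raw affinity value is not used.
    """
    by_target = defaultdict(list)
    for row in rows:
        by_target[row["target"]].append(row)
    verdict = {}
    for target, group in by_target.items():
        top = max(group, key=lambda r: r["p_binder"])
        verdict[target] = bool(top["known"])
    return verdict


if __name__ == "__main__":
    # Made-up probabilities, purely to exercise the check.
    rows = [
        {"target": "Hb", "ligand": "voxelotor", "p_binder": 0.8, "known": True},
        {"target": "Hb", "ligand": "caffeine", "p_binder": 0.3, "known": False},
        {"target": "HDAC2", "ligand": "vorinostat", "p_binder": 0.4, "known": True},
        {"target": "HDAC2", "ligand": "caffeine", "p_binder": 0.6, "known": False},
    ]
    print(positive_control_pass(rows))
```

On an exact tie, `max` keeps the first row it saw, so a stricter test would require the known binder to win outright.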

Local half verified (sequence/SMILES/YAML); cofold() needs a live `modal run`
(account + `modal token new`) to validate end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 16:56:27 +02:00
parent 81d56b7a76
commit 4022c0cb94
2 changed files with 163 additions and 57 deletions

@@ -68,10 +68,13 @@ forgotten box can't bleed money.
 ## What runs on the GPU (in cost order — cheap validation first)
-- **Phase 1 — modality validation (~minutes, ~$1):** co-fold the 3 known binders into their
-targets (voxelotor/Hb, mitapivat/PKR, vorinostat/HDAC2) and check it reproduces the crystal pose
-(RMSD <2 Å) where Vina failed on metal/covalent/allosteric modes. If this passes, the modality is
-real; if not, stop before spending on a screen.
+- **Phase 1 — modality validation (~minutes, ~$1):** co-fold each known binder + 2 negative
+controls (caffeine, hydroxyurea) into each target (Hb, PKR, HDAC2) and check the **known binder
+has the highest Boltz-2 P(binder)** for its own target — the discrimination Vina couldn't manage
+on metal/covalent/allosteric modes. (Ranking uses P(binder), not the raw affinity value, whose
+sign is version-dependent.) Pose-RMSD vs crystal is a deeper check but needs receptor
+superposition (align predicted protein to crystal, transform ligand) — a later refinement. If
+Phase 1 passes, the modality is real; if not, stop before paying for a screen.
 - **Phase 2 — screen (~tens of minutes, a few $):** run the ~300-drug set (or a focused subset)
 against the sickle targets; rank by Boltz-2 predicted affinity; redo the §12.4 positive-control
 recovery test. Output a ranked CSV, same shape as the connectivity `ranked_candidates`.