GPU Phase 1 runnable: real Boltz-2 co-folding + alignment review

Flesh out the Modal app into a runnable Phase-1 positive-control test and reconcile it with the plan:

- cofold() GPU fn: build Boltz-2 YAML (protein+ligand+affinity), run `boltz predict --use_msa_server --cache /weights/boltz`, parse affinity JSON + predicted pose; weights persist via Volume.
- Local helpers (CPU, import-tested against our PDBs): binding_chain_sequence (gemmi -- correctly picks the binding chain, e.g. alpha-globin for 5E83), pubchem_smiles, build_boltz_yaml, fetch_pdb (RCSB).
- main(): fan out cofold.starmap over 3 targets x (known binder + 2 negatives); tabulate; PASS if known binder has top P(binder) for its target.

Alignment fixes:

- Rank by P(binder) (higher=better), NOT raw affinity_pred_value whose sign (~log IC50) is version-dependent -- avoids a backwards positive-control test.
- gpu_plan.md Phase 1 updated to affinity/P(binder) ranking; pose-RMSD noted as a later refinement (needs receptor superposition).

Local half verified (sequence/SMILES/YAML); cofold() needs a live `modal run` (account + `modal token new`) to validate end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 16:56:27 +02:00
parent 81d56b7a76
commit 4022c0cb94
2 changed files with 163 additions and 57 deletions
--- a/docs/gpu_plan.md
+++ b/docs/gpu_plan.md
@@ -68,10 +68,13 @@ forgotten box can't bleed money.

 ## What runs on the GPU (in cost order — cheap validation first)

- **Phase 1 — modality validation (~minutes, ~$1):** co-fold the 3 known binders into their
-  targets (voxelotor/Hb, mitapivat/PKR, vorinostat/HDAC2) and check it reproduces the crystal pose
-  (RMSD <2 Å) where Vina failed on metal/covalent/allosteric modes. If this passes, the modality is
-  real; if not, stop before spending on a screen.
+- **Phase 1 — modality validation (~minutes, ~$1):** co-fold each known binder + 2 negative
+  controls (caffeine, hydroxyurea) into each target (Hb, PKR, HDAC2) and check the **known binder
+  has the highest Boltz-2 P(binder)** for its own target — the discrimination Vina couldn't manage
+  on metal/covalent/allosteric modes. (Ranking uses P(binder), not the raw affinity value, whose
+  sign is version-dependent.) Pose-RMSD vs crystal is a deeper check but needs receptor
+  superposition (align predicted protein to crystal, transform ligand) — a later refinement. If
+  Phase 1 passes, the modality is real; if not, stop before paying for a screen.
 - **Phase 2 — screen (~tens of minutes, a few $):** run the ~300-drug set (or a focused subset)
  against the sickle targets; rank by Boltz-2 predicted affinity; redo the §12.4 positive-control
  recovery test. Output a ranked CSV, same shape as the connectivity `ranked_candidates`.
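
The Phase 1 pass rule described above (the known binder must have the highest P(binder) for its own target) can be sketched as follows. This is a minimal illustration, not the code from this commit: the function names `phase1_verdicts` and `phase1_passes`, the `(target, ligand, p_binder)` row shape, and the strict tie rule are all assumptions, and the scores are made-up placeholders rather than Boltz-2 output. In a real run, `p_binder` would presumably come from the affinity JSON that `cofold()` parses (likely its `affinity_probability_binary` field, which should be checked against the installed Boltz version).

```python
from collections import defaultdict

# Known binder per target, as named in the plan text.
KNOWN_BINDERS = {"Hb": "voxelotor", "PKR": "mitapivat", "HDAC2": "vorinostat"}

# PLACEHOLDER scores for illustration only; these are NOT Boltz-2 results.
PLACEHOLDER_ROWS = [
    ("Hb", "voxelotor", 0.80), ("Hb", "caffeine", 0.30), ("Hb", "hydroxyurea", 0.20),
    ("PKR", "mitapivat", 0.70), ("PKR", "caffeine", 0.40), ("PKR", "hydroxyurea", 0.10),
    ("HDAC2", "vorinostat", 0.90), ("HDAC2", "caffeine", 0.20), ("HDAC2", "hydroxyurea", 0.30),
]


def phase1_verdicts(rows, known_binders):
    """Return {target: bool}, True when the known binder strictly outranks every
    other ligand scored against that target (higher P(binder) is better)."""
    scores = defaultdict(dict)
    for target, ligand, p_binder in rows:
        scores[target][ligand] = p_binder

    verdicts = {}
    for target, known in known_binders.items():
        ligand_scores = scores.get(target, {})
        known_p = ligand_scores.get(known)
        rivals = [p for lig, p in ligand_scores.items() if lig != known]
        # Assumption: a missing score, no negatives, or a tie counts as a failure.
        verdicts[target] = (
            known_p is not None
            and bool(rivals)
            and all(known_p > p for p in rivals)
        )
    return verdicts


def phase1_passes(rows, known_binders):
    """Phase 1 passes only if every target's known binder ranks first."""
    return all(phase1_verdicts(rows, known_binders).values())


if __name__ == "__main__":
    for target, ok in phase1_verdicts(PLACEHOLDER_ROWS, KNOWN_BINDERS).items():
        print(f"{target}: {'PASS' if ok else 'FAIL'}")
```

Comparing only P(binder), and never the raw affinity value, keeps the check independent of the sign convention the commit message warns about. Treating ties and missing scores as failures is a conservative choice for a positive control; a looser rule would need to be stated explicitly in the plan.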