Add the `screen` entrypoint (parallel ~10-wide, cached weights) and run a
24-drug pilot vs HDAC2 (+Zn), ranked by Boltz-2 P(binder). Cost: ~$1.30.
Result (recovery test at scale): top 9 are ALL HDAC inhibitors
(trichostatin-A/vorinostat/panobinostat/belinostat/scriptaid/mocetinostat/
entinostat/apicidin >=0.99; valproic-acid 0.91), followed by a drop-off to
hydroxyurea (0.78) and on through the non-HDAC drugs to dexamethasone
(0.03). Captures the structure-activity gradient
(hydroxamates > weak fatty-acid > non-HDAC).
Honest false negative: romidepsin (potent HDAC inhibitor) ranks low (0.43).
It is a depsipeptide prodrug, which co-folding does not model. The screen
mishandles non-standard chemotypes.
Screening pipeline validated; next is the full 300-drug discovery run.
max_containers=10 (parallel safe once weights cached).
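A minimal sketch of the ranking step, assuming the screen yields one
P(binder) per drug. `rank_by_p_binder` is a hypothetical helper, not the
code in gpu/modal_app.py; the scores are the subset quoted above.

```python
def rank_by_p_binder(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort drugs by Boltz-2 P(binder), most likely binder first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Subset of the pilot scores quoted in this message.
pilot = {
    "hydroxyurea": 0.78,
    "romidepsin": 0.43,
    "valproic-acid": 0.91,
    "dexamethasone": 0.03,
}

for drug, p in rank_by_p_binder(pilot):
    print(f"{drug:15s} {p:.2f}")
```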
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- gpu/modal_app.py: add the `pose` local entrypoint used for the HDAC2
pose-RMSD validation (run: `modal run gpu/modal_app.py::pose`).
- pyproject [structure] extra: add the deps we actually use locally
(gemmi, spyrmsd, meeko, modal) for reproducibility; document the non-pip
tools (Vina binary, open-babel) and that Boltz/cuequivariance are
Modal-image-only.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First clear positive result in the project. Ran Phase 1 on Modal L4
(~$0.70). Boltz-2 P(binder), cofactors co-folded:
- HDAC2 (+Zn): vorinostat 0.9994 vs negatives ~0.1 -> PASS, decisive
- hemoglobin (+heme): voxelotor 0.46 -> PASS (weak; covalent/tetramer)
- PKR (+FBP/Mg): mitapivat 0.32 < hydroxyurea 0.40 -> FAIL (allosteric)
HDAC2/Zn is the exact case where classical Vina failed (no metal term,
7.9 Å redock). Co-folding handles the Zn-chelation chemistry -> the
structure-binding modality pivot (PLAN §12) is validated on its decisive
test.
Engineering fixes that got it running: image needs cuequivariance kernels;
max_containers=1 so weights download once (parallel downloads corrupted the
shared-Volume checkpoint); rank by P(binder), not affinity_pred_value
(whose sign is version-dependent).
Adds docs/results/phase1_affinity.csv (committed; raw under data/ gitignored).
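A sketch of the PASS/FAIL call, assuming PASS means the known binder has
the highest P(binder) among the ligands co-folded into its target.
`positive_control_passes` is a hypothetical helper, and the HDAC2 negative
names are placeholders (only their ~0.1 scores are reported).

```python
def positive_control_passes(known_binder: str, p_binder: dict[str, float]) -> bool:
    """True if the known binder strictly outscores every other ligand."""
    best_other = max(p for name, p in p_binder.items() if name != known_binder)
    return p_binder[known_binder] > best_other


# HDAC2 (+Zn): negatives scored ~0.1; their names here are placeholders.
hdac2 = {"vorinostat": 0.9994, "negative_1": 0.1, "negative_2": 0.1}
# PKR (+FBP/Mg): values quoted above.
pkr = {"mitapivat": 0.32, "hydroxyurea": 0.40}

print(positive_control_passes("vorinostat", hdac2))  # True
print(positive_control_passes("mitapivat", pkr))     # False
```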
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add metal/cofactor handling to the Boltz-2 YAML as CCD ligand entries -
the modes classical docking couldn't model:
- HDAC2 + catalytic Zn (vorinostat chelates it)
- PKR + FBP + Mg (allosteric activator + metal)
- hemoglobin + heme
Same cofactors present when co-folding negatives into a target (fair test).
build_boltz_yaml() gains a cofactor_ccds arg (emits `ligand: {ccd: ...}`
entries); TARGETS carries per-target cofactors; cofold()/main() thread them
through. Verified locally: YAML builds correctly with Zn / FBP+Mg.
Honest limitation noted: Hb's voxelotor site is at the tetramer centre and
covalent (Schiff base), so single-chain+heme only approximates it - HDAC2
(Zn) and PKR (cofactor) are the real co-folding tests. Ready for
`modal run gpu/modal_app.py`.
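A sketch of how cofactor entries could be emitted, not the actual
build_boltz_yaml() in gpu/modal_app.py. The signature, chain-id scheme
(A = protein, B = ligand, C onward = cofactors) and field layout are
assumptions based on my reading of the Boltz input schema and should be
checked against the Boltz docs. The sequence is a toy placeholder.

```python
from string import ascii_uppercase


def build_boltz_yaml(sequence: str, smiles: str,
                     cofactor_ccds: tuple[str, ...] = ()) -> str:
    """Build a Boltz-style YAML: protein A, ligand B, cofactors C, D, ..."""
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        "      id: A",
        f"      sequence: {sequence}",
        "  - ligand:",
        "      id: B",
        f"      smiles: '{smiles}'",
    ]
    # One CCD ligand entry per cofactor (e.g. ZN, or FBP + MG).
    for chain_id, ccd in zip(ascii_uppercase[2:], cofactor_ccds):
        lines += ["  - ligand:", f"      id: {chain_id}", f"      ccd: {ccd}"]
    # Affinity is requested for the drug (chain B), never for a cofactor.
    lines += ["properties:", "  - affinity:", "      binder: B"]
    return "\n".join(lines) + "\n"


# Toy sequence (placeholder) + vorinostat SMILES + catalytic Zn.
print(build_boltz_yaml("MKTAYIAK", "ONC(=O)CCCCCCC(=O)Nc1ccccc1",
                       cofactor_ccds=("ZN",)))
```

Passing the same cofactor_ccds for negatives keeps the comparison fair, as
noted above.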
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Flesh out the Modal app into a runnable Phase-1 positive-control test and
reconcile it with the plan:
- cofold() GPU fn: build Boltz-2 YAML (protein+ligand+affinity), run
`boltz predict --use_msa_server --cache /weights/boltz`, parse affinity
JSON + predicted pose; weights persist via Volume.
- Local helpers (CPU, import-tested against our PDBs): binding_chain_sequence
(gemmi -- correctly picks the binding chain, e.g. alpha-globin for 5E83),
pubchem_smiles, build_boltz_yaml, fetch_pdb (RCSB).
- main(): fan out cofold.starmap over 3 targets x (known binder + 2
negatives); tabulate; PASS if known binder has top P(binder) for its target.
Alignment fixes:
- Rank by P(binder) (higher=better), NOT raw affinity_pred_value whose sign
(~log IC50) is version-dependent -- avoids a backwards positive-control test.
- gpu_plan.md Phase 1 updated to affinity/P(binder) ranking; pose-RMSD noted
as a later refinement (needs receptor superposition).
Local half verified (sequence/SMILES/YAML); cofold() needs a live `modal run`
(account + `modal token new`) to validate end-to-end.
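A sketch of the CLI call inside cofold(), using only the flags named
above. `boltz_predict_argv` and `run_boltz` are hypothetical helpers, and
the YAML filename is illustrative.

```python
import subprocess
from pathlib import Path


def boltz_predict_argv(yaml_path: Path, cache_dir: str = "/weights/boltz") -> list[str]:
    """Argument list for `boltz predict` with MSA server and weight cache."""
    return ["boltz", "predict", str(yaml_path),
            "--use_msa_server", "--cache", cache_dir]


def run_boltz(yaml_path: Path) -> None:
    """Run the prediction; raises CalledProcessError on a non-zero exit."""
    subprocess.run(boltz_predict_argv(yaml_path), check=True)


# Print the command without running it (no GPU or boltz install needed).
print(" ".join(boltz_predict_argv(Path("hdac2_vorinostat.yaml"))))
```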
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Document and wire the weight-caching mechanism:
- modal.Volume is a cloud-backed FS independent of the GPU/container;
run 1 downloads weights into /weights, run 2+ reuses them (no GPU time
wasted re-downloading).
- Point downloaders at the mount: HF_HOME/TORCH_HOME/boltz --cache; persist
via weights.commit(), see updates via weights.reload().
- Volume storage costs pennies, separate from GPU = near-free caching.
modal_app.py cofold(): set cache env vars to /weights, reload()/commit()
around the (stubbed) boltz call.
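A sketch of the reload()/commit() pattern, written against any object with
those two methods so it runs without Modal installed. `with_cached_weights`
and the hf/ and torch/ subdirectories are assumptions, not the code in
modal_app.py.

```python
import os
from typing import Callable, Protocol, TypeVar

T = TypeVar("T")


class VolumeLike(Protocol):
    def reload(self) -> None: ...
    def commit(self) -> None: ...


def with_cached_weights(volume: VolumeLike, mount: str, work: Callable[[], T]) -> T:
    """Point caches at the mount, reload, run work(), then commit on success."""
    os.environ["HF_HOME"] = f"{mount}/hf"        # assumed subdirectory
    os.environ["TORCH_HOME"] = f"{mount}/torch"  # assumed subdirectory
    volume.reload()   # see weights written by earlier runs
    result = work()   # e.g. the boltz call; downloads land under the mount
    volume.commit()   # persist new files; skipped if work() raised
    return result


class FakeVolume:
    """Stand-in for a modal.Volume, recording the call order."""
    def __init__(self) -> None:
        self.calls: list[str] = []

    def reload(self) -> None:
        self.calls.append("reload")

    def commit(self) -> None:
        self.calls.append("commit")


vol = FakeVolume()
with_cached_weights(vol, "/weights", lambda: None)
print(vol.calls)  # ['reload', 'commit']
```

Committing only after work() returns avoids persisting a partial download.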
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
docs/gpu_plan.md: cost-efficient plan for running AF3-class co-folding
(Boltz-2/DiffDock) on a GPU, then paying nothing when idle.
- Key insight: structure-track data is tiny (MB of PDBs/SMILES); only the
GPU + model weights are heavy -> serverless is ideal.
- Recommend Modal (per-second billing, scales to zero = nothing to kill);
RunPod as the SSH-box alternative with idle auto-terminate.
- Lifecycle: image -> weights Volume (cache, don't re-download) -> run ->
git push small results -> teardown automatic.
- Phase 1 validate on 3 known binders (~$1) before paying for a screen;
Boltz-2 (affinity) on an L4/A10 (24 GB); est. total ~$5-15.
gpu/modal_app.py: Modal app skeleton (image, weights volume, GPU cofold()
function, local entrypoint); boltz invocation stubbed with TODOs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>