Phase 2 screen pilot: HDAC2 recovers the inhibitor class (P>=0.99)
Add the `screen` entrypoint (parallel ~10-wide, cached weights) and run a 24-drug pilot against HDAC2 (+Zn), ranked by Boltz-2 P(binder). Cost: ~$1.3.

Result (recovery test at scale): the top 9 are ALL HDAC inhibitors (trichostatin-A, vorinostat, panobinostat, belinostat, scriptaid, mocetinostat, entinostat, apicidin at >=0.99; valproic-acid at 0.91). There is a clean drop-off to hydroxyurea at 0.78, then the non-HDAC drugs down to dexamethasone at 0.03. The ranking captures the structure-activity gradient (hydroxamates and other potent inhibitors > weak fatty acid > non-HDAC).

Honest false negative: romidepsin, a potent HDAC inhibitor, ranks low (0.43). It is a depsipeptide PRODRUG, and co-folding does not model its activation. The screen can therefore mishandle non-standard chemotypes like this one.

Screening pipeline validated; next is the full 300-drug discovery run. max_containers=10 (parallel fan-out is safe once the weights are cached).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
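One way to put numbers on the recovery test is to score the ranked list directly. The following is a minimal sketch under these assumptions:

- The input is the `inclusion_reason` of each drug, already sorted by P(binder) (best first).
- `related_mechanism` marks the known actives. This is the same label the screen uses for its HDAC-inhibitor rank printout.
- `recovery_metrics` is a hypothetical helper, not part of the pipeline.
- ROC AUC is computed by plain pairwise comparison rather than with any particular library.

```python
import math


def recovery_metrics(ranked_reasons: list[str], positive: str = "related_mechanism",
                     k: int = 10) -> dict:
    """Summarise how well known actives are recovered by a ranked screen.

    ranked_reasons: inclusion_reason of each drug, sorted best P(binder) first (rank 1 = index 0).
    """
    pos_ranks = [i for i, r in enumerate(ranked_reasons, 1) if r == positive]
    neg_ranks = [i for i, r in enumerate(ranked_reasons, 1) if r != positive]
    top_k = min(k, len(ranked_reasons))
    hits = sum(1 for r in pos_ranks if r <= top_k)
    # ROC AUC as the fraction of (positive, non-positive) pairs in which the positive ranks higher.
    # Ranks are unique here, so there are no ties to split.
    pairs = len(pos_ranks) * len(neg_ranks)
    won = sum(1 for p in pos_ranks for n in neg_ranks if p < n)
    return {
        "positive_ranks": pos_ranks,
        "precision_at_k": hits / top_k if top_k else math.nan,
        "recall_at_k": hits / len(pos_ranks) if pos_ranks else math.nan,
        "auc": won / pairs if pairs else math.nan,
    }
```

For the pilot, the input list would be the `inclusion_reason` column of `data/processed/binding/screen_HDAC2.csv`, which is already written in rank order. A prodrug false negative such as romidepsin would show up as a positive with a large rank, pulling AUC and recall down.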
```diff
@@ -120,9 +120,10 @@ def build_boltz_yaml(protein_seq: str, ligand_smiles: str, cofactor_ccds: list[s
 
 # ------------------------------------------------------------------------------- GPU function
 
-# max_containers=1: run the inputs serially on one warm container so the weights download ONCE
-# (no concurrent-download race that corrupts the checkpoint) and are reused for the rest.
-@app.function(gpu="L4", image=image, volumes={WEIGHTS: weights}, timeout=3600, max_containers=1)
+# max_containers caps parallel fan-out (cost control). The download race that corrupts the
+# checkpoint only happens on a COLD volume; once weights are cached+committed (Phase 1 did this),
+# parallel containers just reload them, so a screen can safely run ~10-wide.
+@app.function(gpu="L4", image=image, volumes={WEIGHTS: weights}, timeout=3600, max_containers=10)
 def cofold(label: str, protein_seq: str, ligand_smiles: str, cofactor_ccds: list[str]) -> dict:
     """Co-fold one complex (protein + drug + cofactors) on the GPU; return affinity + P(binder).
 
```
```diff
@@ -185,6 +186,53 @@ def pose() -> None:
         print("no structure returned")
 
 
+@app.local_entrypoint()
+def screen(limit: int = 0) -> None:
+    """Phase 2: co-fold the drug set against the validated target (HDAC2 + Zn), rank by P(binder).
+
+    `modal run gpu/modal_app.py::screen --limit 24` (pilot; omit --limit for the full set).
+    Recovery check at scale: the known HDAC inhibitors (related_mechanism) should rank top.
+    Weights are cached, so this fans out ~10-wide.
+    """
+    import csv
+    import pandas as pd
+
+    target = "HDAC2"
+    pdb, res, _drug, cofactors = TARGETS[target]
+    seq = binding_chain_sequence(pdb, res)
+
+    df = pd.read_csv("data/processed/drug_set_v1.csv")
+    df = df[df["canonical_smiles"].notna() & (df["canonical_smiles"] != "-666")].copy()
+    if limit:  # pilot: prioritise mechanism + controls (incl. the HDAC inhibitors) then fill
+        pri = df[df["inclusion_reason"].isin(["ground_truth", "related_mechanism", "negative_control"])]
+        df = pd.concat([pri, df.drop(pri.index)]).head(limit)
+    jobs = [(f"{target}__{r.pert_iname}", seq, r.canonical_smiles, cofactors) for r in df.itertuples()]
+    print(f"screening {len(jobs)} drugs vs {target} (+{cofactors})")
+
+    results = list(cofold.starmap(jobs))
+    by = {j[0].split("__")[1]: r for j, r in zip(jobs, results)}
+    reason = dict(zip(df["pert_iname"], df["inclusion_reason"]))
+
+    rows = [{"drug": d, "P_binder": (r or {}).get("prob_binder"),
+             "affinity": (r or {}).get("affinity"), "inclusion_reason": reason.get(d)}
+            for d, r in by.items()]
+    rows = [x for x in rows if x["P_binder"] is not None]
+    rows.sort(key=lambda x: x["P_binder"], reverse=True)
+
+    out = Path("data/processed/binding"); out.mkdir(parents=True, exist_ok=True)
+    with (out / f"screen_{target}.csv").open("w", newline="") as f:
+        w = csv.DictWriter(f, fieldnames=["rank", "drug", "P_binder", "affinity", "inclusion_reason"])
+        w.writeheader()
+        for i, x in enumerate(rows, 1):
+            w.writerow({"rank": i, **x})
+    print(f"\nscreened {len(rows)} drugs vs {target}; top 15 by P(binder):")
+    for i, x in enumerate(rows[:15], 1):
+        print(f" {i:2d} {x['drug']:20s} P={x['P_binder']:.3f} [{x['inclusion_reason']}]")
+    hdac_like = [i for i, x in enumerate(rows, 1) if x["inclusion_reason"] == "related_mechanism"]
+    if hdac_like:
+        print(f"\nrelated-mechanism (HDAC inhibitors etc.) ranks: {hdac_like[:10]}")
+
+
 @app.local_entrypoint()
 def main() -> None:
     """Fan out one GPU call per (target, ligand) pair; tabulate affinities; positive-control test."""
```
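The pilot selection in `screen()` (`pd.concat([pri, df.drop(pri.index)]).head(limit)`) amounts to a stable partition followed by truncation. Below is a dependency-free sketch of the same ordering rule. It assumes rows are dicts with an `inclusion_reason` key. `pilot_subset` is a hypothetical helper, and `fill` is a placeholder label, not one taken from the drug set.

```python
# Inclusion reasons the pilot screens first (the same three labels screen() prioritises).
PRIORITY = ("ground_truth", "related_mechanism", "negative_control")


def pilot_subset(rows: list[dict], limit: int) -> list[dict]:
    """Priority rows first (original order kept), then the rest, truncated to `limit`.

    limit == 0 means "no pilot": return every row, mirroring `if limit:` in screen().
    """
    if not limit:
        return list(rows)
    pri = [r for r in rows if r.get("inclusion_reason") in PRIORITY]
    rest = [r for r in rows if r.get("inclusion_reason") not in PRIORITY]
    return (pri + rest)[:limit]
```

Both parts keep CSV order, so a given `--limit` always selects the same drugs. If the priority labels cover fewer than `limit` drugs, the remainder is filled from the top of the CSV.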