diff --git a/PLAN.md b/PLAN.md
index b202cea..64e2c4f 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -521,8 +521,10 @@ confidence-tiered data layer.
 
 ### 12.7 Explicitly NOT in this track
 
-Free energy perturbation / MD-based affinity; covalent docking; de novo molecule *generation*
-(that's design, not repurposing); BCL11A or any non-pocket target; biologics; combination binding.
+Free energy perturbation / MD-based affinity; covalent docking; **de novo generation of molecules
+as final candidates to synthesise** (design, not repurposing — but see §12.9 for the
+generate-then-retrieve hybrid, which *is* repurposing); BCL11A or any non-pocket target;
+biologics; combination binding.
 
 ### 12.8 Open decisions before committing
 
@@ -531,3 +533,36 @@ Free energy perturbation / MD-based affinity; covalent docking; de novo molecule
 - **Compute:** secure a GPU environment (the "all local" §2 assumption breaks here).
 - **Scope of v1:** the 7-target shortlist above, or start with just Hb + PKR (the two with the
   cleanest positive controls) to de-risk the harness before scaling targets.
+
+### 12.9 Door left open — generative-guided retrieval (generate → match existing)
+
+A legitimate way to bring generative models *into the repurposing frame* (vs the design frame
+excluded in §12.7): don't generate molecules to synthesise — generate them as a **search beacon**.
+
+**The idea.** Use a pocket-conditioned generative model (target-conditioned diffusion — e.g.
+TargetDiff, DiffSBDD, Pocket2Mol) to propose idealised binders for a sickle target. Then retrieve
+the **nearest existing drugs** to those proposals by chemical similarity (Tanimoto over Morgan
+fingerprints, or a learned molecular embedding). The retrieved neighbours — real, already-approved
+or clinical molecules — are the repurposing candidates. The generated molecule is never made; it
+only *defines a region of chemical space* worth searching. This is the user-proposed framing and
+it is sound: "generate the ideal, then find what we already have nearby."
+
+**Why it could add value.** It can point at scaffolds / regions a known-binder-seeded or
+brute-force docking sweep would miss, and it makes the target's binding requirements explicit as
+geometry rather than as a single reference ligand.
+
+**Honest caveats (why it's a *door*, not a commitment).**
+1. **Generated molecules are often synthetically unrealistic / invalid** — which is exactly why
+   they must be used only as beacons, never as candidates.
+2. **Similarity ≠ activity.** Activity cliffs mean a near-neighbour of a good binder can be inert.
+   So retrieved neighbours do **not** bypass validation — they must still be docked/scored (§12.3)
+   and clear the binding recovery test (§12.4). The generative step *proposes*; it does not *prove*.
+3. **Marginal-value question.** Directly docking the whole existing library (§12.3) already covers
+   chemical space. Whether generate-then-retrieve beats that — by efficiency or by surfacing
+   non-obvious scaffolds — is an open empirical question and needs a head-to-head before it earns
+   real investment.
+4. **Only as good as the pocket conditioning** of the generator, and the chemistry of the target.
+
+**Status:** explore only *after* the §12.3–12.4 docking harness works and is validated on the
+known binders — same discipline as everywhere else: prove the baseline, then test whether the
+fancier method adds anything.
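
The retrieval step proposed in §12.9 (rank existing drugs by Tanimoto similarity to generated "beacon" molecules) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not part of the patch: fingerprints are modelled as plain sets of on-bit indices standing in for Morgan fingerprints, which in practice would come from a cheminformatics toolkit, and every name here (`tanimoto`, `retrieve_neighbours`, `beacons`, `library`, the `drug_*` entries, the `min_similarity` cutoff) is hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is the set of "on" bit indices.
# This is a toy stand-in for a real Morgan fingerprint.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: shared bits / bits set in either."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def retrieve_neighbours(
    beacons: List[Fingerprint],
    library: Dict[str, Fingerprint],
    top_k: int = 3,
    min_similarity: float = 0.0,
) -> List[Tuple[str, float]]:
    """Rank library drugs by their best similarity to any generated beacon."""
    scored = []
    for name, fp in library.items():
        best = max((tanimoto(fp, b) for b in beacons), default=0.0)
        if best >= min_similarity:
            scored.append((name, best))
    # Highest similarity first; ties broken by name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Toy data: two generated beacons and a three-drug "library".
beacons = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
library = {
    "drug_a": frozenset({1, 2, 3, 5}),  # shares 3 of 5 bits with beacon 1
    "drug_b": frozenset({10, 11, 12}),  # identical to beacon 2
    "drug_c": frozenset({20, 21}),      # no overlap with either beacon
}
hits = retrieve_neighbours(beacons, library, top_k=2, min_similarity=0.3)
print(hits)
```

Per caveat 2, the names in `hits` would only be a shortlist: each retrieved neighbour would still need to be docked/scored (§12.3) and clear the binding recovery test (§12.4). Scoring each drug by its best match to any beacon is one possible design choice; averaging over beacons would instead favour drugs close to the whole generated set.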