PLAN §12.9: leave door open for generative-guided retrieval

Reframe de novo generation into the repurposing frame per the founder's
idea: use a pocket-conditioned generative model (TargetDiff/DiffSBDD/
Pocket2Mol) to propose an idealised binder as a SEARCH BEACON, then
retrieve the nearest EXISTING drugs by chemical similarity (Tanimoto/
embedding) as repurposing candidates — the generated molecule is never
synthesised.

Caveats kept honest: generated molecules used only as beacons (often
synthetically invalid); similarity != activity, so retrieved neighbours
still must be docked + pass the binding recovery test; open question
whether it beats brute-force docking the existing library. Explore only
after the §12.3-12.4 docking baseline is validated. §12.7 exclusion
reworded to point here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 23:43:25 +02:00
parent 7449dbeefb
commit 6c2c71d73d

PLAN.md

@@ -521,8 +521,10 @@ confidence-tiered data layer.
### 12.7 Explicitly NOT in this track
-Free energy perturbation / MD-based affinity; covalent docking; de novo molecule *generation*
-(that's design, not repurposing); BCL11A or any non-pocket target; biologics; combination binding.
+Free energy perturbation / MD-based affinity; covalent docking; **de novo generation of molecules
+as final candidates to synthesise** (design, not repurposing; but see §12.9 for the
+generate-then-retrieve hybrid, which *is* repurposing); BCL11A or any non-pocket target;
+biologics; combination binding.
### 12.8 Open decisions before committing
@@ -531,3 +533,36 @@ Free energy perturbation / MD-based affinity; covalent docking; de novo molecule
- **Compute:** secure a GPU environment (the "all local" §2 assumption breaks here).
- **Scope of v1:** the 7-target shortlist above, or start with just Hb + PKR (the two with the
cleanest positive controls) to de-risk the harness before scaling targets.
### 12.9 Door left open — generative-guided retrieval (generate → match existing)
A legitimate way to bring generative models *into the repurposing frame* (vs the design frame
excluded in §12.7): don't generate molecules to synthesise; generate them as a **search beacon**.
**The idea.** Use a pocket-conditioned generative model (e.g. target-conditioned diffusion such as
TargetDiff or DiffSBDD, or the autoregressive Pocket2Mol) to propose idealised binders for a sickle
target. Then retrieve the **nearest existing drugs** to those proposals by chemical similarity
(Tanimoto over Morgan fingerprints, or a learned molecular embedding). The retrieved neighbours
(real, already-approved or clinical molecules) are the repurposing candidates. The generated
molecule is never made; it only *defines a region of chemical space* worth searching. This is the
user-proposed framing, and it is sound: "generate the ideal, then find what we already have nearby."
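The following is a minimal sketch of the retrieval step. It assumes that fingerprints are already computed (in practice, for example, Morgan bit vectors from RDKit). Each fingerprint is represented as a set of on-bit indices, so Tanimoto is |A ∩ B| / |A ∪ B|. The function names, the max-over-beacons scoring, the `min_sim` cutoff and the toy bit sets are illustrative assumptions, not decisions made in this plan.

```python
"""Hypothetical sketch: retrieve existing drugs nearest to generated beacons.

Assumption: fingerprints are precomputed elsewhere (e.g. Morgan bits) and
stored as sets of on-bit indices. All data below is toy data.
"""
from __future__ import annotations

import heapq
from typing import Iterable, Mapping


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity of two bit sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def retrieve_neighbours(
    beacons: Iterable[frozenset[int]],
    library: Mapping[str, frozenset[int]],
    k: int = 5,
    min_sim: float = 0.0,
) -> list[tuple[str, float]]:
    """Return the k library molecules most similar to ANY generated beacon.

    Each existing molecule is scored by its maximum similarity over all
    beacons, so one good beacon is enough to surface it.
    """
    beacon_list = list(beacons)
    scored = []
    for name, fp in library.items():
        best = max((tanimoto(fp, b) for b in beacon_list), default=0.0)
        if best >= min_sim:
            scored.append((name, best))
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy usage: two generated beacons, three "existing drugs".
beacons = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
library = {
    "drug_A": frozenset({1, 2, 3, 5}),      # 3 shared / 5 in union vs beacon 1
    "drug_B": frozenset({10, 11, 20, 21}),  # 2 shared / 5 in union vs beacon 2
    "drug_C": frozenset({30, 31}),          # shares nothing with either beacon
}
print(retrieve_neighbours(beacons, library, k=2))
# [('drug_A', 0.6), ('drug_B', 0.4)]
```

Max-fusion lets a single good beacon surface a drug. Averaging over beacons would instead favour drugs near the centre of the generated set. Switching to a learned embedding would only replace `tanimoto` with, for example, cosine similarity.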
**Why it could add value.** It can point at scaffolds or regions of chemical space that a search
seeded from known binders, or a brute-force docking sweep, might miss. It also makes the target's
binding requirements explicit as geometry rather than as a single reference ligand.
**Honest caveats (why it's a *door*, not a commitment).**
1. **Generated molecules are often synthetically unrealistic / invalid**, which is exactly why
   they must be used only as beacons, never as candidates.
2. **Similarity ≠ activity.** Activity cliffs mean a near-neighbour of a good binder can be inert.
   So retrieved neighbours do **not** bypass validation: they must still be docked and scored
   (§12.3) and clear the binding recovery test (§12.4). The generative step *proposes*; it does not *prove*.
3. **Marginal-value question.** Directly docking the whole existing library (§12.3) already covers
   that library's chemical space. Whether generate-then-retrieve beats this, either on efficiency or
   by surfacing non-obvious scaffolds, is an open empirical question. It needs a head-to-head
   comparison before it earns real investment.
4. **Only as good as the pocket conditioning** of the generator, and the chemistry of the target.
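The sketch below shows one way to score the head-to-head in caveat 3. It rests on several assumptions. Each arm yields a best-first ranking: "dock the whole library" versus "generate-then-retrieve, then dock only the retrieved shortlist". The known binders (positive controls) serve as ground truth, and cost is counted in docking calls. `ArmResult`, `recall_at_n`, `compare` and every number below are hypothetical toy values, not results or a metric defined elsewhere in this plan.

```python
"""Hypothetical head-to-head: recall of known binders vs docking cost."""
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ArmResult:
    name: str
    ranked_ids: Sequence[str]  # best-first ranking produced by this arm
    docking_calls: int         # molecules this arm had to dock


def recall_at_n(ranked_ids: Sequence[str], known: set[str], n: int) -> float:
    """Fraction of known binders that appear in the arm's top-n."""
    if not known:
        raise ValueError("need at least one known binder")
    hits = len(set(ranked_ids[:n]) & known)
    return hits / len(known)


def compare(arms: Sequence[ArmResult], known: set[str], n: int) -> dict[str, tuple[float, int]]:
    """Map each arm name to (recall@n, docking calls) for a side-by-side table."""
    return {a.name: (recall_at_n(a.ranked_ids, known, n), a.docking_calls) for a in arms}


# Toy data only: IDs, rankings and costs are invented for illustration.
known_binders = {"ctrl_1", "ctrl_2"}
brute_force = ArmResult("dock_all", ["ctrl_1", "x1", "ctrl_2", "x2"], docking_calls=10_000)
beacon = ArmResult("generate_retrieve", ["x3", "ctrl_1", "x4"], docking_calls=300)
print(compare([brute_force, beacon], known_binders, n=3))
# {'dock_all': (1.0, 10000), 'generate_retrieve': (0.5, 300)}
```

Reporting recall next to cost keeps both possible wins for generate-then-retrieve visible: efficiency, or recovering binders the full sweep ranks low. In the toy data, the generate-then-retrieve arm is far cheaper but misses one control.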
**Status:** explore only *after* the §12.3-12.4 docking harness works and is validated on the
known binders. Same discipline as everywhere else: prove the baseline, then test whether the
fancier method adds anything.