Scope Phase 2 structure-based binding track into PLAN (§12)
Add a scoped (not committed) follow-on track pivoting modality from expression-connectivity to structure-based drug-target binding, motivated by the empirical finding that the expression modality is signal-dead for this task (relational-only supervised AUC = 0.49, chance).

§12 covers: the evidence for the pivot, a sickle-specific druggable target shortlist with known-binder positive controls (Hb/voxelotor, PKR/mitapivat, DNMT1/decitabine, LSD1, HDAC, EHMT2, PDE9), method (classical docking baseline -> AF3-class co-folding: Boltz-2/Chai-1/DiffDock), a pre-registered binding recovery test, integration with the expression layer as the real prize, honest pitfalls (binding != efficacy, BCL11A intractable, GPU breaks the all-local assumption), and open decisions before committing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
105
PLAN.md
@@ -426,3 +426,108 @@ This MVP exists in a broader strategic context that was built through ~7 expert
- **Synthetic trial arms and drug repurposing share data infrastructure.** This is a platform play, not a single product.

The MVP's job is to produce one credible result. Everything else follows from that.

---

## 12. Phase 2 track: Structure-based binding (scoped 2026-06-23)

> **Status: scoped, not committed.** This is a follow-on track proposed *after* the MVP and its
> follow-up experiments. It does not change the MVP's locked decisions (§2); it responds to what
> those experiments empirically showed. Read §9–11 and the experiment commits first.

### 12.1 Why pivot modality (the evidence, not a hunch)

The expression-connectivity approach was built, validated, and pushed hard (gene-space
expansion, cell-composition deconvolution, reference-library tau, supervised learning). The
empirical verdict:

- Connectivity **recovers hydroxyurea** (top ~6–8%) but **cannot achieve specificity**:
  unrelated drugs (norethindrone, ciprofloxacin) score as strong reversers. Four independent
  methods failed to fix this.
- A supervised model on indication labels hit **0.925 CV AUC**, but it was a **degree-bias
  mirage**: it learned drug popularity, not disease matching (it ranked hydroxyurea *231/300*).
- The decisive test: with drug-popularity features removed, the model trained on the actual
  drug↔disease connectivity scored **AUC 0.491, which is pure chance**. **The expression-connectivity
  modality contains essentially no disease-specific therapeutic signal for this task.**

This is a *signal* problem, not a *model* problem: no amount of model sophistication (diffusion,
GNNs, etc.) extracts signal that isn't in the data. The response is to **change modality** to one
with a strong, physical, drug-specific signal: **does a molecule bind a sickle-relevant target?**
A drug that binds HbS is mechanistically specific by construction, the opposite of a coincidental
expression reverser. Structure is also where the generative-AI frontier (AlphaFold3, which
itself generates coordinates with a diffusion module) actually has traction, because structure has
physical ground truth.

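To make the "degree-bias mirage" concrete, here is a minimal sketch of how a popularity feature can score a perfect AUC while the connectivity feature sits at chance. The six rows are invented toy data, not the project's results, and `roc_auc` is a plain rank-statistic implementation written for this illustration rather than the code used in the experiments.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy drug-disease pairs (ASSUMED values, for illustration only).
# label = 1 if the pair is a known indication.
labels = [1, 1, 1, 0, 0, 0]
# "Popularity": how many indications the drug has overall (a degree feature).
popularity = [9, 8, 7, 3, 2, 1]
# "Connectivity": the disease-specific reversal score for the pair.
connectivity = [0.2, 0.7, 0.4, 0.7, 0.2, 0.4]

auc_popularity = roc_auc(labels, popularity)      # separates perfectly
auc_connectivity = roc_auc(labels, connectivity)  # indistinguishable from chance

print(f"popularity-only AUC:   {auc_popularity:.3f}")
print(f"connectivity-only AUC: {auc_connectivity:.3f}")
```

In this toy set the popularity feature alone yields an AUC of 1.0 and the connectivity feature yields 0.5, which is the same pattern as the ablation described above: the headline AUC came from which drugs are popular, not from which drug matches which disease.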
### 12.2 Targets (sickle-specific, druggable, structurally characterised)

Small molecules only (§2). Curated shortlist with public structures and, crucially, **known
small-molecule binders to serve as positive controls**:

| Target | Mechanism in sickle cell disease | Known binder (positive control) |
|---|---|---|
| Hemoglobin (HBB/HBA tetramer, HbS) | Anti-polymerisation; R-state stabiliser | **voxelotor** (binds α-Val1) |
| PKR (PKLR, red-cell pyruvate kinase) | Activator → ↓2,3-BPG → ↑O2 affinity | **mitapivat**, etavopivat |
| DNMT1 | HbF induction (de-repress γ-globin) | **decitabine**, azacitidine |
| LSD1 / KDM1A | HbF induction | tranylcypromine analogues |
| HDAC1/2 | HbF induction | vorinostat, panobinostat |
| EHMT2 (G9a) | HbF induction | UNC0642 (tool) |
| PDE9 | ↑cGMP, anti-adhesion | PF-04447943 (sickle trial) |

Hard/excluded for v1: **BCL11A** (a transcription factor with no classic pocket; it is the γ-globin
master repressor but is not yet small-molecule-tractable) and any gene-therapy / biologic mechanism.

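One way to keep the shortlist machine-readable, so that the docking harness and the recovery test read the same source of truth, is a small registry like the sketch below. The `Target` class, its field names, and `controls_by_target` are hypothetical names introduced here; only the targets, mechanisms, and binders come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """One row of the shortlist table (field names are illustrative)."""
    name: str
    mechanism: str
    positive_controls: tuple[str, ...]


SHORTLIST = (
    Target("Hemoglobin (HbS)", "anti-polymerisation; R-state stabiliser", ("voxelotor",)),
    Target("PKR (PKLR)", "activator; lowers 2,3-BPG, raises O2 affinity", ("mitapivat", "etavopivat")),
    Target("DNMT1", "HbF induction", ("decitabine", "azacitidine")),
    Target("LSD1 / KDM1A", "HbF induction", ("tranylcypromine analogues",)),
    Target("HDAC1/2", "HbF induction", ("vorinostat", "panobinostat")),
    Target("EHMT2 (G9a)", "HbF induction", ("UNC0642",)),
    Target("PDE9", "raises cGMP; anti-adhesion", ("PF-04447943",)),
)


def controls_by_target(targets):
    """Map each target name to its known binders, for use as positive controls."""
    return {t.name: t.positive_controls for t in targets}


for name, controls in controls_by_target(SHORTLIST).items():
    print(f"{name}: {', '.join(controls)}")
```

Freezing the dataclass keeps the registry immutable once loaded, which matters if the shortlist is meant to be pre-registered before any scoring happens.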
### 12.3 Method (baseline → generative co-folding)

1. **Prepare structures.** Pull target structures from the PDB; predict any missing ones with
   AF3/Boltz.
2. **Prepare ligands.** Reuse the existing ~300-drug set (we already have canonical SMILES from
   ChEMBL); expandable to the full ChEMBL/LINCS catalogue.
3. **Dock + score**, in increasing sophistication:
   - **Baseline:** classical docking (AutoDock Vina / smina): fast, CPU-only, well understood.
   - **Generative methods:** an open AlphaFold3-class co-folding model, **Boltz-2** (predicts the
     protein–ligand complex *and* a binding-affinity estimate, MIT-licensed) or **Chai-1**; or
     **DiffDock** (a diffusion model that docks a ligand into a given protein structure, the
     legitimate home for the "diffusion" instinct). These predict the bound pose directly and
     often beat classical docking on published pose benchmarks; whether they do so on our targets
     is exactly what the baseline is there to check.
   - Report both; the baseline keeps us honest about whether the ML model adds anything.

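For the classical baseline, each target/ligand pair comes down to one AutoDock Vina invocation. The sketch below only builds the argument list; it does not run Vina. The `Box` class, `vina_command`, the file paths, and the box coordinates are placeholders assumed for illustration, and the real search box has to be derived from each target's known binding site. The flags themselves are standard Vina command-line options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Docking search box in Angstroms (values below are placeholders)."""
    center: tuple[float, float, float]
    size: tuple[float, float, float]


def vina_command(receptor, ligand, box, out, exhaustiveness=8):
    """Build the argv list for one AutoDock Vina run (receptor/ligand in PDBQT)."""
    cx, cy, cz = box.center
    sx, sy, sz = box.size
    return [
        "vina",
        "--receptor", receptor,
        "--ligand", ligand,
        "--center_x", str(cx), "--center_y", str(cy), "--center_z", str(cz),
        "--size_x", str(sx), "--size_y", str(sy), "--size_z", str(sz),
        "--exhaustiveness", str(exhaustiveness),
        "--out", out,
    ]


# Hypothetical paths and box; not taken from the project.
box = Box(center=(1.0, 2.0, 3.0), size=(20.0, 20.0, 20.0))
cmd = vina_command("targets/hbs.pdbqt", "ligands/voxelotor.pdbqt", box, "poses/hbs_voxelotor.pdbqt")
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute the run on a machine with Vina installed. Building the command as a list rather than a shell string avoids quoting problems when ligand names contain unusual characters.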
### 12.4 Validation (a real recovery test, like §6 Week 4)

Pre-register before scoring: **the known structure-based sickle drugs must rank as top binders to
their targets**: voxelotor→hemoglobin, mitapivat→PKR, decitabine→DNMT1. The rank cutoff that
counts as "top" is part of what gets fixed at pre-registration. Negative controls
(unrelated drugs) must *not* score as binders of these pockets. This is a cleaner recovery test
than the expression one, because binding is mechanistically specific: it should not have the
coincidental-reverser problem that sank the connectivity approach.

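The recovery test can be expressed as a small pass/fail check per target. The sketch below assumes a Vina-like convention where a lower score means stronger predicted binding. The scores, the `drug_a`/`drug_b` filler names, the 25% cutoff, and the choice of norethindrone and ciprofloxacin as negative controls are all assumptions made for illustration; the real cutoff and control set are whatever gets pre-registered.

```python
def rank_percentile(scores, drug):
    """Percentile rank of `drug` among all scored drugs (0.0 = strongest binder).

    Assumes lower score = stronger predicted binding.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scored drugs to rank")
    ordered = sorted(scores, key=scores.get)
    return ordered.index(drug) / (len(ordered) - 1)


def recovery_test(scores_by_target, positives, negatives, top_fraction):
    """Per target: the positive control must be in the top fraction, negatives must not."""
    results = {}
    for target, control in positives.items():
        scores = scores_by_target[target]
        control_ok = rank_percentile(scores, control) <= top_fraction
        negatives_ok = all(rank_percentile(scores, d) > top_fraction for d in negatives)
        results[target] = control_ok and negatives_ok
    return results


# Invented scores (kcal/mol-style), for illustration only.
scores_by_target = {
    "Hemoglobin": {"voxelotor": -9.1, "drug_a": -6.0, "drug_b": -5.5,
                   "norethindrone": -5.0, "ciprofloxacin": -4.8},
    "PKR": {"ciprofloxacin": -8.5, "mitapivat": -6.0, "drug_a": -5.0,
            "drug_b": -4.0, "norethindrone": -3.0},
}
positives = {"Hemoglobin": "voxelotor", "PKR": "mitapivat"}
negatives = ["norethindrone", "ciprofloxacin"]

outcome = recovery_test(scores_by_target, positives, negatives, top_fraction=0.25)
print(outcome)
```

In this toy data the hemoglobin check passes, while the PKR check fails because a negative control outranks everything else. That second case is the failure mode the test is designed to surface before any novel score is trusted.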
### 12.5 The real prize: integrate, don't replace

The long-term value is **both modalities together**: a candidate that *reverses the disease
signature* (expression) **and** *binds a sickle-relevant target* (structure) is far more credible
than either alone. Structure supplies the specificity the expression layer lacks; expression
supplies the systems-level, target-agnostic view structure lacks. The platform thesis (§11) of
two databases + a matching engine extends naturally to a third (structures) feeding the same
confidence-tiered data layer.

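As a rough sketch of what "both modalities together" could look like in the tiered data layer, the function below assigns a tier from two percentile ranks. The `Candidate` fields, the 10% cutoffs, the tier letters, and the choice to rank binding-only above expression-only (on the grounds that structure supplies the specificity) are all assumptions for illustration, not decisions recorded in this plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    drug: str
    reversal_percentile: float  # 0.0 = strongest signature reverser (expression layer)
    binding_percentile: float   # 0.0 = strongest predicted binder to a shortlisted target


def tier(candidate, reversal_cutoff=0.10, binding_cutoff=0.10):
    """Assign a confidence tier from the two modalities (cutoffs are placeholders)."""
    reverses = candidate.reversal_percentile <= reversal_cutoff
    binds = candidate.binding_percentile <= binding_cutoff
    if reverses and binds:
        return "A"  # supported by both modalities
    if binds:
        return "B"  # mechanistically specific, no expression support
    if reverses:
        return "C"  # expression support only; may be a coincidental reverser
    return "D"      # supported by neither


# Hypothetical candidates with invented percentiles.
candidates = [
    Candidate("drug_x", reversal_percentile=0.05, binding_percentile=0.02),
    Candidate("drug_y", reversal_percentile=0.03, binding_percentile=0.60),
    Candidate("drug_z", reversal_percentile=0.50, binding_percentile=0.04),
]
tiers = {c.drug: tier(c) for c in candidates}
print(tiers)
```

Keeping the two percentiles as separate fields, rather than collapsing them into one blended score, preserves the provenance of each claim so a reader can see which modality supports a candidate.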
### 12.6 Honest pitfalls (do not ignore)

1. **Binding ≠ efficacy.** A molecule can bind and do nothing therapeutic. Structure-based hits
   are still hypotheses (cf. §9.7).
2. **Only covers the enzyme/pocket subset.** Sickle's biggest lever (γ-globin reactivation via
   BCL11A) is largely transcriptional and not small-molecule-tractable, so structure-based
   screening is blind to it. Be explicit about coverage.
3. **Docking/affinity accuracy is limited.** Pose prediction is decent; absolute affinity is hard.
   Validate on known binders before trusting novel scores.
4. **Compute.** AF3-class models are GPU-heavy; the local Mac Studio (§2) may not suffice. This
   track likely needs a GPU box or cloud, which would make it the first dependency to break the
   "all local" rule.
5. **Moat.** Structures and tools are public; the proprietary value is the curated target list,
   the integration with the expression layer, and provenance/tiering, not the docking tool.

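### 12.7 Explicitly NOT in this track

Free energy perturbation / MD-based affinity; covalent docking; de novo molecule *generation*
(that's design, not repurposing); BCL11A or any non-pocket target; biologics; combination binding.

Excluding covalent docking has a consequence for the positive controls in §12.4: voxelotor forms a
reversible covalent (Schiff-base) bond with α-Val1, and decitabine inhibits DNMT1 only after it is
incorporated into DNA, so noncovalent docking models both controls only approximately.
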
### 12.8 Open decisions before committing

- **Tooling:** classical-docking baseline first, or straight to Boltz-2/DiffDock? (Recommend:
  baseline first, for an honest reference, which is the lesson of the whole expression arc.)
- **Compute:** secure a GPU environment (the "all local" §2 assumption breaks here).
- **Scope of v1:** the 7-target shortlist above, or start with just Hb + PKR (the two with the
  cleanest positive controls) to de-risk the harness before scaling targets.