Redocking-RMSD validation fails 3/3: pipeline-quality issue

§12.4 de-biased validation (scripts/dock_validate.py). Redock each co-crystal ligand into its own
structure, RMSD vs crystal:

- voxelotor->Hb: NA (covalent binder, out of scope §12.7)
- mitapivat->PKR: 8.2A (allosteric, cofactors stripped)
- vorinostat->HDAC2 (4LXZ, zinc kept): 7.9A -- a CLASSICAL target that should have worked

The clean target also failing => systematic pipeline-quality problem, not target choice. Cheap
Vina + open-babel prep gives scores but doesn't reproduce known geometry, so affinities aren't
trustworthy. Ligand efficiency over-corrects (ranks tiny hydroxyurea best).

Fix needs production prep (Meeko/AutoDockTools prepare_receptor + reduce) and an in-place RMSD
metric. Consistent with the project theme: the quick version of every method runs but fails honest
validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 07:28:47 +02:00
parent 75f5383961
commit 51bd90df41
2 changed files with 127 additions and 8 deletions
--- a/docs/structure_binding_notes.md
+++ b/docs/structure_binding_notes.md
@@ -51,15 +51,41 @@ problem** — raw scores carry a systematic bias (size here, broadness there) th
 signal. voxelotor *does* dock to Hb (−8.1, a real score); the cross-target test just isn't the
 right validation.

+## Step 2 — redocking-RMSD validation FAILS across the board (2026-06-24)
+
+Redocked each co-crystal ligand into its own structure (`scripts/dock_validate.py`); RMSD vs
+crystal pose via obrms:
+
+| redock | RMSD | note |
+|---|---|---|
+| voxelotor → Hb (5E83) | NA | covalent binder (Schiff base, αVal1) — out of scope §12.7 |
+| mitapivat → PKR (8XFD) | 8.2 Å | allosteric, cofactor (FBP/Mg) stripped |
+| **vorinostat → HDAC2 (4LXZ, Zn kept)** | **7.9 Å** | classical non-covalent target — should have worked |
+
+**The clean target also failing means this is a systematic PIPELINE-QUALITY problem, not target
+choice.** The cheap Vina + open-babel setup produces scores but does not reproduce known binding
+geometry, so its affinities are not yet trustworthy. Ligand efficiency (affinity / heavy atoms)
+also doesn't fix it — it over-corrects, ranking tiny hydroxyurea (−0.78) "best".
+
+Likely causes (in priority order):
+1. **Low-quality receptor prep** — open-babel `-xr` is not production docking prep. Need
+   AutoDockTools `prepare_receptor` or **Meeko** + `reduce`/pdb2pqr for protonation, charges, and
+   proper AutoDock atom typing.
+2. **Ligand prep** — should use Meeko (correct rotatable bonds / typing), not bare obabel `--gen3d`.
+3. **RMSD metric** — obrms superimposes before RMSD; redocking validation wants symmetry-corrected
+   RMSD **in place** (receptor frame). Worth confirming with an in-place metric.
+
+**Honest takeaway:** consistent with the whole project — the *quick* version of each method runs
+but doesn't survive honest validation. Credible structure-based docking needs production prep
+tooling (Meeko/ADFR), which is the real next investment for this track.
+
 ## Next steps
- [ ] **Redocking-RMSD validation** (the gold-standard positive control): redock the crystal ligand
-  5L7/WV2 into its own structure, compute pose RMSD vs crystal. <2 Å = geometry validated. This
-  tests pose accuracy, which size bias doesn't corrupt.
- [ ] **Ligand-efficiency normalization** (affinity / heavy-atom count) to de-bias the size effect,
-  the docking counterpart of the connectivity calibration work.
- [ ] Expand the ligand library (full ChEMBL/LINCS) for retrieval reach.
- [ ] Only then: AF3-class co-folding (Boltz-2/DiffDock via PyTorch-MPS — note 24 GB ceiling) vs the
-  docking baseline; and §12.9 generative beacon.
+- [ ] Install **Meeko** (+ reduce / pdb2pqr) and redo receptor+ligand prep; re-run redocking RMSD.
+- [ ] Fix the RMSD metric (in-place, symmetry-corrected) to rule out a measurement artifact.
+- [ ] Only once redocking validates (<2 Å) are affinity scores trustworthy — then cross-dock /
+  screen the library and revisit ligand-efficiency / pose-based scoring.
+- [ ] Later: AF3-class co-folding (Boltz-2/DiffDock via PyTorch-MPS — 24 GB ceiling) and the §12.9
+  generative beacon.

 > **Hardware note:** this machine is **24 GB** unified memory (not the 96 GB PLAN §2 assumed),
 > which caps local AF3-class model inference. Classical docking (above) is unaffected.
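
The "in place, symmetry-corrected" RMSD named in likely cause 3 and in the next steps can be sketched as below. This is a minimal illustration, not the code in `scripts/dock_validate.py`: the function name `in_place_rmsd` and its `mappings` argument are hypothetical, coordinates are assumed to be heavy atoms already expressed in the receptor frame and listed in the same atom order, and the symmetry-equivalent atom relabelings are assumed to be supplied by the caller (for example from a cheminformatics toolkit) rather than detected here.

```python
import math


def in_place_rmsd(ref, pose, mappings=None):
    """RMSD between two poses in the receptor frame, with no superposition.

    ref, pose: sequences of (x, y, z) heavy-atom coordinates, same atom order.
    mappings:  optional iterable of index permutations; mappings[m][i] is the
               pose atom matched to reference atom i under symmetry mapping m.
               Hypothetical input: the caller must supply valid automorphisms.
               Defaults to the identity mapping (no symmetry correction).
    """
    n = len(ref)
    if n == 0 or len(pose) != n:
        raise ValueError("ref and pose must be non-empty and equally long")
    if mappings is None:
        mappings = [tuple(range(n))]

    best = math.inf
    for perm in mappings:
        # Sum of squared distances between matched atoms; coordinates are
        # used as given, so a rigid shift of the pose is NOT removed.
        sq = sum(
            (r[k] - pose[j][k]) ** 2
            for r, j in zip(ref, perm)
            for k in range(3)
        )
        best = min(best, math.sqrt(sq / n))
    return best


# Toy three-atom "ligand" where atoms 1 and 2 are symmetry-equivalent.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Same conformation moved 3 units along x: a superimposing metric would
# report ~0, while the in-place metric reports the displacement.
shifted = [(x + 3.0, y, z) for x, y, z in ref]
shift_rmsd = in_place_rmsd(ref, shifted)

# Same pose with the two equivalent atoms listed in swapped order.
swapped = [ref[0], ref[2], ref[1]]
naive_rmsd = in_place_rmsd(ref, swapped)
sym_rmsd = in_place_rmsd(ref, swapped, mappings=[(0, 1, 2), (0, 2, 1)])
```

The two toy cases separate the two effects: a rigidly displaced pose keeps its full displacement as RMSD because nothing is superimposed, while a pure relabeling of equivalent atoms scores zero once the mapping is allowed. A redocked pose should only count as validated against the <2 Å threshold under a metric that behaves this way.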