Redocking-RMSD validation fails (2/2 scored, 1 NA): pipeline-quality issue

§12.4 de-biased validation (scripts/dock_validate.py).
Redock each co-crystal ligand into its own structure, RMSD vs crystal:
- voxelotor->Hb: NA (covalent binder, out of scope §12.7)
- mitapivat->PKR: 8.2 Å (allosteric, cofactors stripped)
- vorinostat->HDAC2 (4LXZ, zinc kept): 7.9 Å -- a CLASSICAL target that
  should have worked

The clean target also failing => systematic pipeline-quality problem,
not target choice. Cheap Vina + open-babel prep gives scores but doesn't
reproduce known geometry, so affinities aren't trustworthy. Ligand
efficiency over-corrects (ranks tiny hydroxyurea best). Fix needs
production prep (Meeko/AutoDockTools prepare_receptor + reduce) and an
in-place RMSD metric. Consistent with the project theme: the quick
version of every method runs but fails honest validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 07:28:47 +02:00
parent 75f5383961
commit 51bd90df41
2 changed files with 127 additions and 8 deletions

@@ -51,15 +51,41 @@ problem** — raw scores carry a systematic bias (size here, broadness there) th
signal. voxelotor *does* dock to Hb (8.1, a real score); the cross-target test just isn't the
right validation.
## Step 2 — redocking-RMSD validation FAILS across the board (2026-06-24)
Redocked each co-crystal ligand into its own structure (`scripts/dock_validate.py`); RMSD vs
crystal pose via obrms:
| redock | RMSD | note |
|---|---|---|
| voxelotor → Hb (5E83) | NA | covalent binder (Schiff base, αVal1) — out of scope §12.7 |
| mitapivat → PKR (8XFD) | 8.2 Å | allosteric, cofactor (FBP/Mg) stripped |
| **vorinostat → HDAC2 (4LXZ, Zn kept)** | **7.9 Å** | classical non-covalent target — should have worked |
**The clean target also failing means this is a systematic PIPELINE-QUALITY problem, not target
choice.** The cheap Vina + open-babel setup produces scores but does not reproduce known binding
geometry, so its affinities are not yet trustworthy. Ligand efficiency (affinity / heavy atoms)
also doesn't fix it — it over-corrects, ranking tiny hydroxyurea (0.78) "best".
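The over-correction can be seen from the arithmetic alone. The sketch below is illustrative, not the project's scoring code: `ligand_efficiency` is a hypothetical helper, scores are treated as positive magnitudes (an assumption about the sign convention), and the heavy-atom counts are taken from the molecular formulas (voxelotor C19H19N3O3, hydroxyurea CH4N2O2) rather than from these notes.

```python
def ligand_efficiency(score, heavy_atoms):
    """Score per heavy atom (affinity / heavy atoms, as defined above)."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return score / heavy_atoms


# Voxelotor -> Hb raw score of 8.1 is quoted in these notes;
# 25 heavy atoms is an assumption from its molecular formula.
vox_le = ligand_efficiency(8.1, 25)

# Hydroxyurea's efficiency of 0.78 is quoted in these notes; with 5 heavy
# atoms (assumed from its formula) that implies a raw score of about 3.9.
hu_raw = 0.78 * 5

print(round(vox_le, 3))  # 0.324
print(round(hu_raw, 2))  # 3.9
```

Under these assumptions a ligand with less than half the raw score ends up with more than twice the efficiency, so dividing by heavy-atom count replaces a bias toward large ligands with a bias toward very small ones.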
Likely causes (in priority order):
1. **Low-quality receptor prep** — open-babel `-xr` is not production docking prep. Need
AutoDockTools `prepare_receptor` or **Meeko** + `reduce`/pdb2pqr for protonation, charges, and
proper AutoDock atom typing.
2. **Ligand prep** — should use Meeko (correct rotatable bonds / typing), not bare obabel `--gen3d`.
3. **RMSD metric** — obrms superimposes before RMSD; redocking validation wants symmetry-corrected
RMSD **in place** (receptor frame). Worth confirming with an in-place metric.
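A minimal sketch of what an in-place, symmetry-corrected RMSD could look like, using only the standard library. This is not `scripts/dock_validate.py`: the function names, the toy coordinates, and the explicit `mappings` argument (a list of symmetry-equivalent atom orderings, which a real tool would derive from the molecular graph) are all assumptions for illustration. The centroid-aligned variant is a deliberately crude stand-in for full superposition, included only to show how alignment can hide a displaced pose.

```python
import math


def in_place_rmsd(ref, pose, mappings=None):
    """RMSD in the receptor frame (no superposition).

    mappings: optional list of index tuples; mapping[i] is the pose atom
    matched to reference atom i. The minimum over mappings is returned.
    """
    n = len(ref)
    if n == 0 or len(pose) != n:
        raise ValueError("ref and pose must be non-empty and equal length")
    if mappings is None:
        mappings = [tuple(range(n))]  # identity: atoms matched by order
    best = math.inf
    for mapping in mappings:
        sq = sum(math.dist(ref[i], pose[j]) ** 2 for i, j in enumerate(mapping))
        best = min(best, math.sqrt(sq / n))
    return best


def centroid_aligned_rmsd(ref, pose):
    """Translation-only alignment, then RMSD (crude superposition stand-in)."""
    n = len(ref)
    c_ref = [sum(a[k] for a in ref) / n for k in range(3)]
    c_pose = [sum(a[k] for a in pose) / n for k in range(3)]
    shifted = [tuple(p[k] - c_pose[k] + c_ref[k] for k in range(3)) for p in pose]
    return in_place_rmsd(ref, shifted)


# Toy three-atom "ligand"; coordinates in Å, purely illustrative.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
displaced = [(x, y + 4.0, z) for x, y, z in crystal]  # same shape, moved 4 Å
flipped = [crystal[2], crystal[1], crystal[0]]        # end atoms swapped

print(in_place_rmsd(crystal, displaced))          # 4.0
print(centroid_aligned_rmsd(crystal, displaced))  # 0.0
print(in_place_rmsd(crystal, flipped, [(0, 1, 2), (2, 1, 0)]))  # 0.0
```

The displaced pose scores 4.0 Å in place but 0.0 Å once aligned, which is why an aligning metric cannot validate a redock. The flipped pose shows the opposite risk: without the symmetry mapping, an equivalent pose would be penalised for atom ordering alone.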
**Honest takeaway:** consistent with the whole project — the *quick* version of each method runs
but doesn't survive honest validation. Credible structure-based docking needs production prep
tooling (Meeko/ADFR), which is the real next investment for this track.
## Next steps
- [ ] **Redocking-RMSD validation** (the gold-standard positive control): redock the crystal ligand
5L7/WV2 into its own structure, compute pose RMSD vs crystal. <2 Å = geometry validated. This
tests pose accuracy, which size bias doesn't corrupt.
- [ ] **Ligand-efficiency normalization** (affinity / heavy-atom count) to de-bias the size effect,
the docking counterpart of the connectivity calibration work.
- [ ] Expand the ligand library (full ChEMBL/LINCS) for retrieval reach.
- [ ] Only then: AF3-class co-folding (Boltz-2/DiffDock via PyTorch-MPS; note the 24 GB ceiling) vs the
docking baseline; and §12.9 generative beacon.
- [ ] Install **Meeko** (+ reduce / pdb2pqr) and redo receptor+ligand prep; re-run redocking RMSD.
- [ ] Fix the RMSD metric (in-place, symmetry-corrected) to rule out a measurement artifact.
- [ ] Only once redocking validates (<2 Å) are affinity scores trustworthy; then cross-dock /
screen the library and revisit ligand-efficiency / pose-based scoring.
- [ ] Later: AF3-class co-folding (Boltz-2/DiffDock via PyTorch-MPS, 24 GB ceiling) and the §12.9
generative beacon.
> **Hardware note:** this machine is **24 GB** unified memory (not the 96 GB PLAN §2 assumed),
> which caps local AF3-class model inference. Classical docking (above) is unaffected.