Docking baseline: toolchain solved, raw affinity is size-biased
§12.3-12.4 first binding result on ARM Mac.

- Toolchain SOLVED: AutoDock Vina 1.2.5 mac binary (Rosetta) + open-babel (brew). No conda, no MLX.
  dock_positive_controls.py runs end-to-end.
- Cross-dock known binders + negatives into Hb (5E83) and PKR (8XFD), box centered on co-crystal
  ligands (5L7=voxelotor, WV2=mitapivat).

Finding: raw Vina affinity ranks almost perfectly by MOLECULAR SIZE (mitapivat > voxelotor >
decitabine/caffeine > hydroxyurea) in both pockets; mitapivat wins even on hemoglobin, which it
doesn't target. Raw score can't distinguish target-specific binding: the docking analog of the
connectivity specificity problem.

Next: redocking-RMSD validation + ligand-efficiency normalization.

Note: machine is 24GB (not 96GB per PLAN §2), capping local AF3-class inference. tools/ gitignored
(vina binary).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -27,8 +27,39 @@ RDKit Tanimoto of our 300 drugs vs known sickle binders.
binding directly, no look-alike required. That is the gating next step, and it needs the
toolchain solved.

## Step 1 — docking baseline (2026-06-24)

**Toolchain SOLVED on ARM Mac:** AutoDock Vina 1.2.5 macOS binary (`tools/vina`, runs under Rosetta)
plus Open Babel (installed via Homebrew) for prep. Docking runs end-to-end
(`scripts/dock_positive_controls.py`). Co-crystal ligands identified: 5L7 = voxelotor (in 5E83),
WV2 = mitapivat (in 8XFD).

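
The commit message describes the docking box as centered on the co-crystal ligand. A minimal sketch of that step follows, assuming the box center is the plain centroid of the ligand's `HETATM` records and that Vina is driven through its command-line flags. This is not the content of `scripts/dock_positive_controls.py`: the function names, the 20 Å box edge, the exhaustiveness value, and the file names in the usage note are all hypothetical.

```python
def ligand_centroid(pdb_text, resname, chain=None):
    """Mean x/y/z of the HETATM atoms whose residue name is `resname`.

    Pass `chain` when the structure holds several copies of the ligand;
    averaging over all copies would put the box between the binding sites.
    """
    coords = []
    for line in pdb_text.splitlines():
        # PDB fixed columns (1-based): resName 18-20, chainID 22,
        # x 31-38, y 39-46, z 47-54.
        if not line.startswith("HETATM") or line[17:20].strip() != resname:
            continue
        if chain is not None and line[21] != chain:
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"no HETATM atoms found for residue {resname!r}")
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))


def vina_command(receptor, ligand, center, out,
                 size=20.0, exhaustiveness=8, vina="tools/vina"):
    """Build the argument list for one Vina run (cubic box, edge `size` in Å).

    ASSUMPTION: `size` and `exhaustiveness` are illustrative defaults,
    not values taken from the notes.
    """
    cx, cy, cz = center
    return [
        vina,
        "--receptor", receptor,
        "--ligand", ligand,
        "--center_x", f"{cx:.3f}",
        "--center_y", f"{cy:.3f}",
        "--center_z", f"{cz:.3f}",
        "--size_x", f"{size:.1f}",
        "--size_y", f"{size:.1f}",
        "--size_z", f"{size:.1f}",
        "--exhaustiveness", str(exhaustiveness),
        "--out", out,
    ]
```

In use, one would read the structure file, call `ligand_centroid(text, "5L7")` for the 5E83 pocket or `ligand_centroid(text, "WV2")` for 8XFD, and hand the resulting list to `subprocess.run(..., check=True)`. Receptor and ligand are assumed to have been converted to PDBQT beforehand.
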

**Positive-control cross-docking: inconclusive, and instructively so.** Affinities (kcal/mol):

| ligand | hemoglobin (5E83) | PKR (8XFD) |
|---|---|---|
| voxelotor | −8.1 | −9.3 |
| mitapivat | −10.0 | −11.2 |
| decitabine | −6.6 | −7.0 |
| hydroxyurea | −3.9 | −3.6 |
| caffeine | −6.1 | −6.4 |

The scores rank almost perfectly by **molecular size** (mitapivat > voxelotor > decitabine/caffeine >
hydroxyurea) in *both* pockets: mitapivat wins even on hemoglobin, which it doesn't target. So
raw Vina affinity is confounded by ligand size and per-pocket stickiness; it cannot yet
distinguish target-specific binding. This is the **docking analog of the connectivity specificity
problem**: raw scores carry a systematic bias (size here, broadness there) that masquerades as
signal. Voxelotor *does* dock to Hb (−8.1, a real score); the cross-target test just isn't the
right validation.
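
The size confound can be checked numerically with a rank correlation between ligand size and docking score. The sketch below does that in plain Python. The affinities are copied from the table; the heavy-atom counts are an assumption, derived here from the ligands' molecular formulas rather than taken from these notes, and all function names are illustrative.

```python
# Vina affinities in kcal/mol, copied from the table (more negative = stronger).
AFFINITY = {
    "voxelotor":   {"hemoglobin": -8.1,  "PKR": -9.3},
    "mitapivat":   {"hemoglobin": -10.0, "PKR": -11.2},
    "decitabine":  {"hemoglobin": -6.6,  "PKR": -7.0},
    "hydroxyurea": {"hemoglobin": -3.9,  "PKR": -3.6},
    "caffeine":    {"hemoglobin": -6.1,  "PKR": -6.4},
}

# ASSUMPTION: non-hydrogen atom counts from molecular formulas, not from the notes.
HEAVY_ATOMS = {
    "voxelotor": 25, "mitapivat": 32, "decitabine": 16,
    "hydroxyurea": 5, "caffeine": 14,
}


def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def size_vs_score(target):
    """Rank correlation between heavy-atom count and binding strength for one pocket."""
    names = sorted(AFFINITY)
    size = [HEAVY_ATOMS[name] for name in names]
    strength = [-AFFINITY[name][target] for name in names]  # flip sign: larger = stronger
    return spearman(size, strength)
```

Calling `size_vs_score("hemoglobin")` and `size_vs_score("PKR")` gives one number per pocket. If the assumed heavy-atom counts are right, size order and score order coincide in both columns, so both values come out at 1. Five ligands is far too few to treat this as a statistic; it only restates the observation in a form that can be re-run as the ligand set grows.
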
## Next steps
- [ ] Resolve the docking toolchain (recommend: micromamba + smina/vina, CPU, no GPU needed for baseline).
- [ ] Dock the known binders (voxelotor→5E83, mitapivat→8XFD) as positive controls (§12.4 recovery test).
- [ ] Expand the ligand library (full ChEMBL/LINCS) for retrieval to have reach.
- [ ] Only then: AF3-class co-folding (Boltz-2/DiffDock) vs the docking baseline; and §12.9 generative beacon.
- [ ] **Redocking-RMSD validation** (the gold-standard positive control): redock each crystal ligand
  (5L7, WV2) into its own structure and compute the pose RMSD vs the crystal pose. <2 Å = geometry
  validated. This tests pose accuracy, which size bias doesn't corrupt.
- [ ] **Ligand-efficiency normalization** (affinity / heavy-atom count) to de-bias the size effect,
  the docking counterpart of the connectivity calibration work.
- [ ] Expand the ligand library (full ChEMBL/LINCS) for retrieval reach.
- [ ] Only then: AF3-class co-folding (Boltz-2/DiffDock via PyTorch-MPS; note the 24 GB ceiling) vs the
  docking baseline; and §12.9 generative beacon.
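
The two validation items in this list reduce to short formulas, sketched below. Both functions are illustrative, not part of the repository. `pose_rmsd` assumes the docked and crystal poses list the same heavy atoms in the same order and compares them in place, since redocking stays in the receptor's coordinate frame. It does no symmetry correction, so a symmetry-aware tool (Open Babel's `obrms`, for example) would be the safer choice for ligands with equivalent atoms. `ligand_efficiency` flips the sign so that larger means more binding per heavy atom.

```python
import math


def pose_rmsd(docked, crystal):
    """In-place RMSD (Å) between two poses given as lists of (x, y, z) tuples.

    ASSUMPTION: identical atom order in both poses, hydrogens already removed,
    no superposition and no symmetry handling.
    """
    if not docked or len(docked) != len(crystal):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom_d, atom_c in zip(docked, crystal)
        for a, b in zip(atom_d, atom_c)
    )
    return math.sqrt(squared / len(docked))


def ligand_efficiency(affinity_kcal_mol, heavy_atoms):
    """Binding strength per heavy atom: -affinity / heavy-atom count."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return -affinity_kcal_mol / heavy_atoms


def geometry_validated(rmsd, threshold=2.0):
    """Apply the <2 Å redocking criterion from the checklist."""
    return rmsd < threshold
```

For example, voxelotor's −8.1 kcal/mol on hemoglobin with an assumed 25 heavy atoms gives `ligand_efficiency(-8.1, 25)` of about 0.32 kcal/mol per heavy atom. Plain per-atom efficiency tends to swing the bias toward very small ligands (hydroxyurea would score highest in this set), so it is better read as a first-pass correction than as a final ranking.
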

> **Hardware note:** this machine has **24 GB** of unified memory (not the 96 GB that PLAN §2 assumed),
> which caps local AF3-class model inference. Classical docking (above) is unaffected.