51bd90df416a85e85135eb71e29d27fb56d2030e
§12.4 de-biased validation (scripts/dock_validate.py). Redock each co-crystal ligand into its own structure and measure the RMSD against the crystal pose:
- voxelotor->Hb: NA (covalent binder, out of scope per §12.7)
- mitapivat->PKR: 8.2 Å (allosteric, cofactors stripped)
- vorinostat->HDAC2 (4LXZ, zinc kept): 7.9 Å, a CLASSICAL target that should have worked

The clean target also failing points to a systematic pipeline-quality problem, not a bad choice of target. Cheap Vina + Open Babel prep produces scores but does not reproduce known geometry, so the affinities are not trustworthy. Ligand efficiency over-corrects (it ranks tiny hydroxyurea best). The fix needs production prep (Meeko/AutoDockTools prepare_receptor + reduce) and an in-place RMSD metric. This is consistent with the project theme: the quick version of every method runs but fails honest validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
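The in-place RMSD mentioned above compares the docked pose with the crystal pose in the receptor's own coordinate frame, without superposition, because re-aligning the ligand would hide a pose placed in the wrong spot. The following is a minimal NumPy sketch. It assumes both poses list the same heavy atoms in the same order, and it does not handle symmetric atom mappings. The ligand-efficiency helper uses the common definition LE = -ΔG / N_heavy. Its scores are hypothetical and only show why a 5-heavy-atom molecule such as hydroxyurea can outrank a drug-sized ligand. The function names are illustrative and are not taken from scripts/dock_validate.py.

```python
import numpy as np


def in_place_rmsd(docked_xyz, crystal_xyz):
    """Heavy-atom RMSD (Å) in the receptor frame, with no superposition.

    Assumes both inputs are (n_atoms, 3) arrays with atoms in the same order.
    """
    docked = np.asarray(docked_xyz, dtype=float)
    crystal = np.asarray(crystal_xyz, dtype=float)
    if docked.ndim != 2 or docked.shape != crystal.shape or docked.shape[1] != 3:
        raise ValueError("poses must be matching (n_atoms, 3) arrays")
    # Per-atom squared displacement, averaged over atoms, then square-rooted.
    return float(np.sqrt(np.mean(np.sum((docked - crystal) ** 2, axis=1))))


def ligand_efficiency(delta_g_kcal, n_heavy_atoms):
    """LE = -ΔG / heavy-atom count (kcal/mol per heavy atom)."""
    return -delta_g_kcal / n_heavy_atoms


# Hypothetical Vina-style scores, for illustration only:
# the tiny ligand wins on LE despite a much weaker score.
print(ligand_efficiency(-4.0, 5))   # 5 heavy atoms, like hydroxyurea
print(ligand_efficiency(-8.0, 19))  # drug-sized ligand
```

Redocking is conventionally judged successful when the RMSD is below about 2 Å, so values near 8 Å mean the pose was not reproduced.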
Reverso MVP — Sickle Cell Repurposing Pipeline
A minimum viable drug repurposing pipeline for sickle cell disease: build a disease signature from public transcriptomic data, build drug profiles for ~300 small molecules, and rank them by CMap-style connectivity scoring. Validated by a recovery test — do the two known sickle cell drugs (hydroxyurea, L-glutamine) rank near the top?
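CMap-style connectivity scoring is often done with the Kolmogorov-Smirnov-like statistic from the original Connectivity Map. That statistic asks whether the disease's up-regulated genes sit at the top or the bottom of a drug's ranked expression profile, and does the same for the down-regulated genes. The following is a minimal sketch of that raw statistic. It omits the original method's normalization across drug instances, and the project's own src/scoring code may use a different variant. The function names are hypothetical. A negative score means the drug pushes expression opposite to the disease signature, which is what a repurposing screen looks for.

```python
import numpy as np


def ks_enrichment(tag_positions, n_genes):
    """Signed KS-style enrichment of a gene set in a ranked list.

    tag_positions: 1-based ranks of the set's genes in a list of n_genes.
    Positive when the set sits near the top, negative near the bottom.
    """
    v = np.sort(np.asarray(tag_positions, dtype=float))
    t = len(v)
    j = np.arange(1, t + 1)
    a = np.max(j / t - v / n_genes)
    b = np.max(v / n_genes - (j - 1) / t)
    return float(a) if a > b else float(-b)


def connectivity_score(ranked_genes, up_genes, down_genes):
    """Unnormalized CMap-style connectivity score.

    ranked_genes: drug profile genes ordered from most up- to most
    down-regulated by the drug. up_genes/down_genes: disease signature.
    """
    pos = {g: i + 1 for i, g in enumerate(ranked_genes)}
    n = len(ranked_genes)
    up = [pos[g] for g in up_genes if g in pos]
    down = [pos[g] for g in down_genes if g in pos]
    if not up or not down:
        return 0.0  # no overlap with the profile: no evidence either way
    ks_up = ks_enrichment(up, n)
    ks_down = ks_enrichment(down, n)
    # Both sets enriched in the same direction is treated as uninformative.
    if np.sign(ks_up) == np.sign(ks_down):
        return 0.0
    return ks_up - ks_down


ranked = [f"g{i}" for i in range(1, 11)]
# Disease-up genes at the bottom, disease-down genes at the top: a reversal.
print(connectivity_score(ranked, up_genes=["g9", "g10"], down_genes=["g1", "g2"]))
```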
See PLAN.md for the full specification, locked decisions, and week-by-week build plan.
Quickstart
# Requires Python >=3.11,<3.14 (see note below)
pip install -e . # or: pip install -e ".[dev]" for test/lint tooling
pytest # run unit tests
Python version note: use Python 3.11–3.13 (python3.13 -m venv .venv). Python 3.14 is not yet supported by all pipeline dependencies (pydeseq2, cmapPy).
Project layout
data/ raw (downloaded, never edited) / processed / results — gitignored
notebooks/ 01..05, run end-to-end in order
src/ identifiers, disease, drugs, scoring, provenance
tests/ scoring unit tests
docs/ recovery_test_report.md, data_sources.md, known_limitations.md
The deliverable
When complete, the artifact to share is three files:
- docs/recovery_test_report.md: the 2-page write-up
- data/results/ranked_candidates_v1.csv: the ranked drug list
- The signature + drug profile files, with provenance
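The recovery test reduces to one question: where do the known drugs land in the ranked list? The following is a minimal sketch. It assumes the ranked list has already been loaded best-first, for example from the drug-name column of ranked_candidates_v1.csv, whose exact column names are not given here. The function name, the case-insensitive name matching and the "top percent" definition are illustrative choices.

```python
def recovery_ranks(ranked_drugs, known_drugs):
    """Return {drug: (rank, top_percent)} for each known drug, or None if absent.

    rank is 1-based; top_percent = 100 * rank / len(ranked_drugs),
    so smaller values mean closer to the top of the list.
    """
    index = {}
    for i, name in enumerate(ranked_drugs, start=1):
        index.setdefault(name.strip().lower(), i)  # keep the first occurrence
    n = len(ranked_drugs)
    out = {}
    for drug in known_drugs:
        rank = index.get(drug.strip().lower())
        out[drug] = None if rank is None else (rank, 100.0 * rank / n)
    return out


ranked = ["drugA", "Hydroxyurea", "drugB", "drugC"]  # toy list
print(recovery_ranks(ranked, ["hydroxyurea", "L-glutamine"]))
# {'hydroxyurea': (2, 50.0), 'L-glutamine': None}
```

Reporting "not found" separately from "ranked low" matters here, because a drug missing from the profiles is a coverage problem rather than a scoring failure.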
Pipeline
| Notebook | Stage | Output |
|---|---|---|
| 01_setup_identifiers.ipynb | Pin disease/gene IDs | data/processed/identifiers.json |
| 02_disease_signature.ipynb | GEO + differential expression | sickle_cell_signature_v1.json |
| 03_drug_profiles.ipynb | ChEMBL + LINCS | drug_profiles_v1.parquet |
| 04_connectivity_scoring.ipynb | CMap scoring | ranked_candidates_v1.csv |
| 05_recovery_test.ipynb | Validation | docs/recovery_test_report.md |
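Stage 02 turns a differential-expression table into the up- and down-regulated gene sets that connectivity scoring consumes. The following is a minimal pandas sketch. It assumes a DE table indexed by gene ID with DESeq2-style `log2FoldChange` and `padj` columns, which should be checked against the actual DE output. The function name, the adjusted-p cutoff and the `top_n` cap are hypothetical defaults, not the project's locked values from PLAN.md.

```python
import pandas as pd


def build_signature(de, padj_max=0.05, top_n=150):
    """Pick the top up- and down-regulated genes from a DE results table.

    Assumes `de` is indexed by gene ID with 'log2FoldChange' and 'padj'
    columns. Thresholds are illustrative, not project settings.
    """
    sig = de.dropna(subset=["log2FoldChange", "padj"])
    sig = sig[sig["padj"] < padj_max]  # keep significant genes only
    up = sig[sig["log2FoldChange"] > 0].sort_values("log2FoldChange", ascending=False)
    down = sig[sig["log2FoldChange"] < 0].sort_values("log2FoldChange")
    # Strongest effects first in each list.
    return {"up": list(up.index[:top_n]), "down": list(down.index[:top_n])}


de = pd.DataFrame(
    {"log2FoldChange": [2.0, -1.5, 0.5, 3.0, -0.2],
     "padj": [0.01, 0.001, 0.2, 0.04, 0.03]},
    index=["A", "B", "C", "D", "E"],
)
print(build_signature(de, top_n=10))
```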
Every persisted artifact carries a confidence tier (A/B/C) and provenance. See PLAN.md §3.
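PLAN.md §3 defines the actual tier semantics and provenance fields, and they are not reproduced here. The following sketch shows one way a provenance sidecar could pair an artifact with its tier, its upstream sources and a content hash. The class, the field names, the helper functions and the `.provenance.json` suffix are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

TIERS = {"A", "B", "C"}


@dataclass
class Provenance:
    """Hypothetical sidecar record for one persisted artifact."""

    artifact: str       # path of the persisted file
    tier: str           # confidence tier: A, B or C
    sources: list[str]  # upstream accessions or URLs
    sha256: str         # content hash of the artifact
    created_utc: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        # Reject tiers outside A/B/C at construction time.
        if self.tier not in TIERS:
            raise ValueError(f"tier must be one of {sorted(TIERS)}, got {self.tier!r}")


def provenance_for(path, tier, sources):
    """Hash the file's bytes and build a Provenance record for it."""
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return Provenance(artifact=str(path), tier=tier, sources=list(sources), sha256=digest)


def write_sidecar(record):
    """Write the record as JSON next to the artifact; return the sidecar path."""
    sidecar = record.artifact + ".provenance.json"
    with open(sidecar, "w", encoding="utf-8") as fh:
        json.dump(asdict(record), fh, indent=2)
    return sidecar
```

Hashing the file contents lets a later notebook detect that an upstream artifact changed after its provenance was recorded.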
Languages: Python 93.8%, Jupyter Notebook 6.2%