# Reverso MVP — Sickle Cell Repurposing Pipeline

A minimum viable drug repurposing pipeline for **sickle cell disease**: build a disease signature from public transcriptomic data, build drug profiles for ~300 small molecules, and rank them by CMap-style connectivity scoring. Validated by a recovery test — do the two known sickle cell drugs (hydroxyurea, L-glutamine) rank near the top?

See [`PLAN.md`](PLAN.md) for the full specification, locked decisions, and week-by-week build plan.

## Quickstart

```bash
# Requires Python >=3.11,<3.13 (see note below)
pip install -e .          # or: pip install -e ".[dev]" for test/lint tooling
pytest                    # run unit tests
```

> **Python version note:** use Python 3.11–3.13 (`python3.13 -m venv .venv`). Python 3.14 is
> not yet supported by all pipeline dependencies (`pydeseq2`, `cmapPy`).

## Project layout

```
data/        raw (downloaded, never edited) / processed / results — gitignored
notebooks/   01..05, run end-to-end in order
src/         identifiers, disease, drugs, scoring, provenance
tests/       scoring unit tests
docs/        recovery_test_report.md, data_sources.md, known_limitations.md
```

## The deliverable

When complete, the artifact to share is three files:

1. `docs/recovery_test_report.md` — the 2-page write-up
2. `data/results/ranked_candidates_v1.csv` — the ranked drug list
3. The signature + drug profile files with provenance

## Pipeline

| Notebook | Stage | Output |
|---|---|---|
| `01_setup_identifiers.ipynb` | Pin disease/gene IDs | `data/processed/identifiers.json` |
| `02_disease_signature.ipynb` | GEO + differential expression | `sickle_cell_signature_v1.json` |
| `03_drug_profiles.ipynb` | ChEMBL + LINCS | `drug_profiles_v1.parquet` |
| `04_connectivity_scoring.ipynb` | CMap scoring | `ranked_candidates_v1.csv` |
| `05_recovery_test.ipynb` | Validation | `docs/recovery_test_report.md` |

Every persisted artifact carries a **confidence tier** (A/B/C) and provenance. See `PLAN.md` §3.