Reverso/README.md
Junior B. b731478f5d Scaffold Reverso MVP pipeline structure
Set up the project skeleton per PLAN.md §4:
- src/ package: identifiers, disease, drugs, scoring, provenance
  with pydantic schemas and confidence-tier logic (working);
  data-pull/compute functions stubbed per their build week
- 5 starter notebooks (01-05) with PLAN-referenced steps
- tests/test_scoring.py: tier-assignment tests pass; scoring
  reference test xfail until Week 3
- docs/: recovery_test_report, data_sources, known_limitations skeletons
- pyproject.toml (requires-python >=3.11,<3.14), .gitignore, README
- data/ tree preserved via .gitkeep; raw/processed/results gitignored

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 20:20:09 +02:00

# Reverso MVP — Sickle Cell Repurposing Pipeline
A minimum viable drug repurposing pipeline for **sickle cell disease**: build a disease
signature from public transcriptomic data, build drug profiles for ~300 small molecules,
and rank them by CMap-style connectivity scoring. The pipeline is validated by a recovery test: do the two
known sickle cell drugs (hydroxyurea, L-glutamine) rank near the top?
See [`PLAN.md`](PLAN.md) for the full specification, locked decisions, and week-by-week build plan.
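
As a rough illustration of what "CMap-style connectivity scoring" means, here is a minimal sketch of the classic unweighted Kolmogorov-Smirnov variant. It assumes the disease signature is reduced to a set of up-regulated and a set of down-regulated genes, and that each drug profile is a gene list ordered from most up-regulated to most down-regulated. The function names `enrichment_score` and `connectivity_score` are hypothetical and are not taken from `src/scoring`; the scoring variant actually used is whatever `PLAN.md` specifies.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Unweighted KS-style running sum; returns the signed maximum deviation."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a set member, step down otherwise.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def connectivity_score(
    ranked_genes: Sequence[str], up_genes: Set[str], down_genes: Set[str]
) -> float:
    """Negative: the drug opposes the disease signature. Positive: it mimics it."""
    es_up = enrichment_score(ranked_genes, up_genes)
    es_down = enrichment_score(ranked_genes, down_genes)
    if es_up * es_down > 0:
        # Both sets enriched at the same end of the profile: no coherent signal.
        return 0.0
    return es_up - es_down


# Toy drug profile (placeholder gene names), most up-regulated first.
drug_profile = ["g1", "g2", "g3", "g4", "g5", "g6"]
disease_up = {"g5", "g6"}    # up in disease, pushed down by this toy drug
disease_down = {"g1", "g2"}  # down in disease, pushed up by this toy drug
print(connectivity_score(drug_profile, disease_up, disease_down))  # -2.0
```

In a repurposing setting, strongly negative scores are the interesting ones, since they flag drugs whose expression changes run opposite to the disease signature.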
## Quickstart
```bash
# Requires Python >=3.11,<3.14 (see note below)
pip install -e . # or: pip install -e ".[dev]" for test/lint tooling
pytest # run unit tests
```
> **Python version note:** use Python 3.11 to 3.13 (for example, `python3.13 -m venv .venv`). Python 3.14 is
> not yet supported by all pipeline dependencies (`pydeseq2`, `cmapPy`).
## Project layout
```
data/       raw (downloaded, never edited) / processed / results (all gitignored)
notebooks/  01..05, run end-to-end in order
src/        identifiers, disease, drugs, scoring, provenance
tests/      scoring unit tests
docs/       recovery_test_report.md, data_sources.md, known_limitations.md
```
## The deliverable
When the pipeline is complete, the deliverable to share consists of three items:
1. `docs/recovery_test_report.md` — the 2-page write-up
2. `data/results/ranked_candidates_v1.csv` — the ranked drug list
3. The signature and drug profile files (`sickle_cell_signature_v1.json`, `drug_profiles_v1.parquet`), each with provenance
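
Once a ranked list exists, the core of the recovery test is a lookup of where the known drugs landed. The sketch below works on an in-memory list of drug names, because the column layout of `ranked_candidates_v1.csv` is not described here. The helper `recovery_ranks` and the placeholder names `drug_a`, `drug_b`, `drug_c` are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def recovery_ranks(
    ranked_drugs: Sequence[str], known_drugs: Sequence[str]
) -> Dict[str, Optional[Tuple[int, float]]]:
    """Map each known drug to (rank, rank / total), or None if it is not ranked.

    Rank 1 is the top candidate. Name matching is case-insensitive.
    """
    position: Dict[str, int] = {}
    for index, name in enumerate(ranked_drugs, start=1):
        # Keep the first (best) position if a name appears more than once.
        position.setdefault(name.lower(), index)
    total = len(ranked_drugs)
    result: Dict[str, Optional[Tuple[int, float]]] = {}
    for drug in known_drugs:
        rank = position.get(drug.lower())
        result[drug] = None if rank is None else (rank, rank / total)
    return result


# Placeholder ranking for illustration only, not real pipeline output.
ranking = ["drug_a", "hydroxyurea", "drug_b", "drug_c"]
print(recovery_ranks(ranking, ["hydroxyurea", "L-glutamine"]))
```

Returning `None` for a missing drug keeps "ranked poorly" distinct from "never profiled", which the write-up would need to report differently.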
## Pipeline
| Notebook | Stage | Output |
|---|---|---|
| `01_setup_identifiers.ipynb` | Pin disease/gene IDs | `data/processed/identifiers.json` |
| `02_disease_signature.ipynb` | GEO + differential expression | `sickle_cell_signature_v1.json` |
| `03_drug_profiles.ipynb` | ChEMBL + LINCS | `drug_profiles_v1.parquet` |
| `04_connectivity_scoring.ipynb` | CMap scoring | `ranked_candidates_v1.csv` |
| `05_recovery_test.ipynb` | Validation | `docs/recovery_test_report.md` |
Every persisted artifact carries a **confidence tier** (A/B/C) and provenance. See `PLAN.md` §3.
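
To make the idea concrete, a provenance record attached to an artifact might look like the sketch below. This is an assumption-laden illustration using only the standard library: the project's real schemas are pydantic models in `src/provenance`, the field names here are hypothetical, and the rules for assigning tier A, B, or C live in `PLAN.md` §3 and are deliberately not reproduced.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum


class Tier(str, Enum):
    """Confidence tier labels; assignment criteria are defined in PLAN.md, not here."""
    A = "A"
    B = "B"
    C = "C"


@dataclass(frozen=True)
class Provenance:
    # Hypothetical field names, chosen for illustration.
    artifact: str
    tier: Tier
    source: str
    sha256: str
    created_utc: str


def make_provenance(artifact: str, payload: bytes, tier: Tier, source: str) -> Provenance:
    """Build a record whose hash ties it to the exact bytes that were persisted."""
    return Provenance(
        artifact=artifact,
        tier=tier,
        source=source,
        sha256=hashlib.sha256(payload).hexdigest(),
        created_utc=datetime.now(timezone.utc).isoformat(),
    )


# Placeholder payload and source description, not real pipeline data.
payload = b'{"up": [], "down": []}'
record = make_provenance("sickle_cell_signature_v1.json", payload, Tier.B, "GEO")
print(json.dumps(asdict(record), indent=2))
```

Storing a content hash alongside the tier lets a reader confirm that a shared file is the one the record describes.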