Scaffold Reverso MVP pipeline structure

Set up the project skeleton per PLAN.md §4:

- src/ package: identifiers, disease, drugs, scoring, provenance with pydantic schemas and confidence-tier logic (working); data-pull/compute functions stubbed per their build week
- 5 starter notebooks (01-05) with PLAN-referenced steps
- tests/test_scoring.py: tier-assignment tests pass; scoring reference test xfail until Week 3
- docs/: recovery_test_report, data_sources, known_limitations skeletons
- pyproject.toml (requires-python >=3.11,<3.14), .gitignore, README
- data/ tree preserved via .gitkeep; raw/processed/results gitignored

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

README.md (`@@ -1 +1,48 @@`): the one-line `# Reverso` README is replaced with:

# Reverso MVP — Sickle Cell Repurposing Pipeline

A minimum viable drug repurposing pipeline for **sickle cell disease**: build a disease
signature from public transcriptomic data, build drug profiles for ~300 small molecules,
and rank them by CMap-style connectivity scoring. The pipeline is validated by a recovery
test: do the two known sickle cell drugs (hydroxyurea, L-glutamine) rank near the top?

See [`PLAN.md`](PLAN.md) for the full specification, locked decisions, and week-by-week build plan.
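CMap-style connectivity scoring asks whether a drug's expression profile mimics or reverses the disease signature. Below is a minimal sketch of the original Connectivity Map score (Lamb et al., 2006). It assumes the signature is a pair of up/down gene sets and that each drug profile is a list of gene IDs ranked from most up-regulated to most down-regulated. The function names are hypothetical and are not the `src/scoring` API. The sketch also leaves out CMap's final step of rescaling scores across all drug instances into [-1, 1].

```python
from collections.abc import Iterable, Sequence


def ks_statistic(ranked_genes: Sequence[str], tag_set: set[str]) -> float:
    """Unweighted KS enrichment of tag_set in a ranked gene list.

    ranked_genes runs from most up-regulated (position 1) to most
    down-regulated (position n). Tags missing from the profile are ignored.
    """
    n = len(ranked_genes)
    # 1-based positions of tags present in the profile; ascending because
    # enumerate walks the list in order
    positions = [i + 1 for i, gene in enumerate(ranked_genes) if gene in tag_set]
    t = len(positions)
    if t == 0:
        return 0.0
    a = max(j / t - v / n for j, v in enumerate(positions, start=1))
    b = max(v / n - (j - 1) / t for j, v in enumerate(positions, start=1))
    return a if a > b else -b


def connectivity_score(
    ranked_genes: Sequence[str], up_genes: Iterable[str], down_genes: Iterable[str]
) -> float:
    """Raw connectivity score: positive mimics the signature, negative reverses it."""
    ks_up = ks_statistic(ranked_genes, set(up_genes))
    ks_down = ks_statistic(ranked_genes, set(down_genes))
    # The score is zero when both sets shift in the same direction
    if ks_up * ks_down > 0:
        return 0.0
    return ks_up - ks_down


if __name__ == "__main__":
    up, down = ["g1", "g2"], ["g9", "g10"]
    drug_profile = ["g9", "g10", "g3", "g4", "g5", "g6", "g7", "g8", "g1", "g2"]
    print(connectivity_score(drug_profile, up, down))  # negative: reversal candidate
```

A negative score marks a reversal candidate: the disease's up-genes sit at the bottom of the drug's ranking and its down-genes at the top. A repurposing screen ranks these candidates first.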

## Quickstart

```bash
# Requires Python >=3.11,<3.14 (see note below)
pip install -e .   # or: pip install -e ".[dev]" for test/lint tooling
pytest             # run unit tests
```

> **Python version note:** use Python 3.11–3.13 (`python3.13 -m venv .venv`). Python 3.14 is
> not yet supported by all pipeline dependencies (`pydeseq2`, `cmapPy`).
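A notebook can fail fast on an unsupported interpreter by checking it against the same bounds as `pyproject.toml` (`requires-python >=3.11,<3.14`). This is a minimal sketch. `check_python_version` is a hypothetical helper and is not part of `src/`.

```python
import sys

# Bounds mirror pyproject.toml: requires-python = ">=3.11,<3.14"
MIN_VERSION = (3, 11)
MAX_EXCLUSIVE = (3, 14)


def check_python_version(version_info=sys.version_info) -> None:
    """Raise RuntimeError if the interpreter is outside the supported range."""
    current = tuple(version_info[:2])
    if not (MIN_VERSION <= current < MAX_EXCLUSIVE):
        raise RuntimeError(
            f"Python {current[0]}.{current[1]} is not supported; use 3.11-3.13 "
            "(pydeseq2 and cmapPy do not yet support 3.14)."
        )
```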

## Project layout

```
data/        raw (downloaded, never edited) / processed / results; contents gitignored
notebooks/   01..05, run end-to-end in order
src/         identifiers, disease, drugs, scoring, provenance
tests/       scoring unit tests
docs/        recovery_test_report.md, data_sources.md, known_limitations.md
```
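The layout implies a one-way flow. Notebooks read from `data/raw/` but never write to it, and they write derived files to `data/processed/` or `data/results/`. The minimal sketch below is a path helper that enforces that convention. `DATA_ROOT`, `WRITABLE_STAGES`, and `output_path` are hypothetical names, not functions in `src/`.

```python
from pathlib import Path

# Hypothetical helper, not part of src/: maps a pipeline stage to its data/ folder
DATA_ROOT = Path("data")
WRITABLE_STAGES = ("processed", "results")


def output_path(stage: str, filename: str, root: Path = DATA_ROOT) -> Path:
    """Return root/stage/filename, refusing to target the read-only raw/ folder.

    The caller creates the directory before writing, for example with
    path.parent.mkdir(parents=True, exist_ok=True).
    """
    if stage == "raw":
        raise PermissionError("data/raw/ holds downloaded files and is never edited")
    if stage not in WRITABLE_STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {WRITABLE_STAGES}")
    return root / stage / filename
```

For example, `output_path("results", "ranked_candidates_v1.csv")` resolves to `data/results/ranked_candidates_v1.csv`. The helper does not create directories, so it stays free of side effects.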

## The deliverable

When complete, the deliverable to share consists of three parts:

1. `docs/recovery_test_report.md`: the 2-page write-up
2. `data/results/ranked_candidates_v1.csv`: the ranked drug list
3. The signature + drug profile files with provenance
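The recovery test reduces to a rank lookup in the ranked candidate list. This minimal sketch assumes `ranked_candidates_v1.csv` has a `drug_name` column and is sorted best candidate first. The column name, the `top_fraction` pass rule, and both function names are hypothetical; the real schema and pass criterion are defined in `PLAN.md`.

```python
import csv
import io
from collections.abc import Iterable


def recovery_ranks(csv_text: str, known_drugs: Iterable[str], name_column: str = "drug_name"):
    """Map each known drug to (1-based rank, fraction of list at or above it).

    Assumes rows are sorted best candidate first. Names match case-insensitively;
    drugs missing from the list are left out of the result.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    n = len(rows)
    wanted = {name.lower(): name for name in known_drugs}
    ranks = {}
    for rank, row in enumerate(rows, start=1):
        canonical = wanted.get(row[name_column].strip().lower())
        if canonical is not None and canonical not in ranks:
            ranks[canonical] = (rank, rank / n)
    return ranks


def recovery_passes(ranks, known_drugs, top_fraction: float = 0.1) -> bool:
    """Hypothetical pass rule: every known drug is present and within the top fraction."""
    return all(name in ranks and ranks[name][1] <= top_fraction for name in known_drugs)
```

Against the real output this would be called as, for example, `recovery_ranks(Path("data/results/ranked_candidates_v1.csv").read_text(), ["hydroxyurea", "L-glutamine"])`, with `Path` imported from `pathlib`.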

## Pipeline

| Notebook | Stage | Output |
|---|---|---|
| `01_setup_identifiers.ipynb` | Pin disease/gene IDs | `data/processed/identifiers.json` |
| `02_disease_signature.ipynb` | GEO + differential expression | `sickle_cell_signature_v1.json` |
| `03_drug_profiles.ipynb` | ChEMBL + LINCS | `drug_profiles_v1.parquet` |
| `04_connectivity_scoring.ipynb` | CMap scoring | `ranked_candidates_v1.csv` |
| `05_recovery_test.ipynb` | Validation | `docs/recovery_test_report.md` |
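Stage 02 turns a differential expression table into the up/down gene sets that connectivity scoring consumes. This minimal sketch assumes the DE results are available as (gene, log2 fold change, adjusted p-value) records. The thresholds (`padj < 0.05`, `|log2FC| >= 1`, at most 150 genes per direction) and the names `DEResult` and `select_signature` are illustrative assumptions, not the values locked in `PLAN.md`.

```python
from typing import NamedTuple


class DEResult(NamedTuple):
    gene: str
    log2_fold_change: float
    adjusted_p: float


def select_signature(results, padj_max=0.05, min_abs_lfc=1.0, max_genes=150):
    """Split significant DE genes into (up, down) lists ordered by effect size.

    Defaults are illustrative only. Rows with a NaN adjusted p-value fail the
    comparison and are dropped.
    """
    significant = [
        r for r in results
        if r.adjusted_p < padj_max and abs(r.log2_fold_change) >= min_abs_lfc
    ]
    # Largest positive fold change first for up; most negative first for down
    up = sorted((r for r in significant if r.log2_fold_change > 0),
                key=lambda r: r.log2_fold_change, reverse=True)
    down = sorted((r for r in significant if r.log2_fold_change < 0),
                  key=lambda r: r.log2_fold_change)
    return [r.gene for r in up[:max_genes]], [r.gene for r in down[:max_genes]]
```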

Every persisted artifact carries a **confidence tier** (A/B/C) and provenance. See `PLAN.md` §3.
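One way to keep the tier and provenance attached to an artifact is a small typed record serialized next to it. This minimal sketch uses stdlib dataclasses instead of the project's pydantic schemas. The class and field names are hypothetical, and the sketch does not define what each tier means; that is set in `PLAN.md` §3.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Literal

Tier = Literal["A", "B", "C"]


@dataclass(frozen=True)
class Provenance:
    source: str        # where the input came from, e.g. a GEO series or ChEMBL release
    retrieved_at: str  # ISO-8601 UTC timestamp
    method: str        # pipeline step that produced the artifact

    @classmethod
    def now(cls, source: str, method: str) -> "Provenance":
        return cls(source, datetime.now(timezone.utc).isoformat(), method)


@dataclass(frozen=True)
class ArtifactRecord:
    path: str
    tier: Tier
    provenance: tuple[Provenance, ...] = ()

    def __post_init__(self) -> None:
        # Literal is not enforced at runtime, so validate the tier explicitly
        if self.tier not in ("A", "B", "C"):
            raise ValueError(f"tier must be 'A', 'B' or 'C', got {self.tier!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```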