
Junior B. 3417f85eb1 v1.1: full gene space + specificity z-score; hydroxyurea recovers

Post-hoc improvement after the pre-registered v1 recovery test failed.
Two changes, based on a diagnosis of v1's failure, plus supporting scripts and tests:
- score on the full 12,328-gene LINCS space (week2_lincs_extract.py),
  lifting signature overlap from 12% to 85% (brings erythroid markers in)
- src/scoring.py: KS connectivity + per-drug specificity z-score
  (spec_z = SDs below a 1,000 random-query null). Primary ranking is
  now spec_z. (Textbook tau saturated at +/-100 for a coherent query;
  this is documented, and fixing it needs a reference-signature library, a v2 item.)
- week3_scoring.py: spec_z primary + WTCS reference + prior-blended
- tests: tau/spec_z calibration test; 19 passing
- scripts/exp_genespace.py: the BING vs all-12,328 comparison
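For readers unfamiliar with the scoring, here is a minimal sketch of the idea in the list above. It assumes the original CMap KS connectivity score (Lamb et al., 2006), with genes ranked by the drug's effect (rank 1 = most up-regulated), and it reads spec_z as (null mean minus observed score) divided by the null SD over 1,000 random queries of the same sizes. The function names, the rank convention, and the exact spec_z formula are assumptions for illustration; src/scoring.py may differ.

```python
import numpy as np


def ks_enrichment(positions, n_genes: int) -> float:
    """Signed KS statistic from the original CMap method (Lamb et al., 2006).

    positions: 1-based ranks of the query genes in a list of n_genes genes,
    where rank 1 is the gene the drug up-regulates most.
    """
    v = np.sort(np.asarray(positions, dtype=float))
    t = len(v)
    j = np.arange(1, t + 1)
    a = np.max(j / t - v / n_genes)
    b = np.max(v / n_genes - (j - 1) / t)
    return float(a) if a > b else float(-b)


def connectivity(drug_profile, up_idx, down_idx) -> float:
    """Connectivity of a drug profile to an up/down query; negative = reversal."""
    n = len(drug_profile)
    # Rank genes by the drug's effect, most up-regulated first (rank 1).
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(-np.asarray(drug_profile))] = np.arange(1, n + 1)
    ks_up = ks_enrichment(ranks[up_idx], n)
    ks_down = ks_enrichment(ranks[down_idx], n)
    # Same-sign KS scores mean no coherent mimicry or reversal: score 0.
    if np.sign(ks_up) == np.sign(ks_down):
        return 0.0
    return ks_up - ks_down


def specificity_z(drug_profile, up_idx, down_idx, n_null=1000, seed=0) -> float:
    """SDs by which the observed score falls below a random-query null (assumed definition)."""
    rng = np.random.default_rng(seed)
    n_up, n_down = len(up_idx), len(down_idx)
    observed = connectivity(drug_profile, up_idx, down_idx)
    null = np.empty(n_null)
    for i in range(n_null):
        # Random query with the same up/down sizes, drawn without replacement.
        genes = rng.choice(len(drug_profile), size=n_up + n_down, replace=False)
        null[i] = connectivity(drug_profile, genes[:n_up], genes[n_up:])
    return float((null.mean() - observed) / null.std(ddof=1))


# Synthetic check: the query's up genes are the drug's most down-regulated genes.
profile = np.linspace(1.0, -1.0, 200)
up, down = np.arange(190, 200), np.arange(10)
score = connectivity(profile, up, down)
spec_z = specificity_z(profile, up, down)
```

On this synthetic 200-gene profile the connectivity is -1.905 (a strong reversal), and spec_z is large and positive because random queries score near 0. A z-score against a per-drug null like this is one way to avoid comparing raw KS scores across drugs whose profiles differ in spread.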

Result: hydroxyurea recovers (rank 40 -> 18, top 6%, passes the top-10% criterion),
confirming the v1 failure was the landmark-gene bottleneck, not the algorithm.
Overall the test STILL FAILS: L-glutamine does not reverse the signature (rank 213;
it is a metabolite), and the negative controls (norethindrone, ciprofloxacin) rank
in the top 3, so connectivity != therapeutic relatedness. v1.1 is post-hoc/exploratory,
not a confirmatory test, and is reported as such in recovery_test_report.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-23 22:57:30 +02:00

| Path | Last commit | Date |
|---|---|---|
| `data/` | Scaffold Reverso MVP pipeline structure | 2026-06-23 20:20:09 +02:00 |
| `docs/` | v1.1: full gene space + specificity z-score; hydroxyurea recovers | 2026-06-23 22:57:30 +02:00 |
| `notebooks/` | Scaffold Reverso MVP pipeline structure | 2026-06-23 20:20:09 +02:00 |
| `scripts/` | v1.1: full gene space + specificity z-score; hydroxyurea recovers | 2026-06-23 22:57:30 +02:00 |
| `src/` | v1.1: full gene space + specificity z-score; hydroxyurea recovers | 2026-06-23 22:57:30 +02:00 |
| `tests/` | v1.1: full gene space + specificity z-score; hydroxyurea recovers | 2026-06-23 22:57:30 +02:00 |
| `.gitignore` | Scaffold Reverso MVP pipeline structure | 2026-06-23 20:20:09 +02:00 |
| `PLAN.md` | Scaffold Reverso MVP pipeline structure | 2026-06-23 20:20:09 +02:00 |
| `pyproject.toml` | Scaffold Reverso MVP pipeline structure | 2026-06-23 20:20:09 +02:00 |
| `README.md` | Scaffold Reverso MVP pipeline structure | 2026-06-23 20:20:09 +02:00 |

README.md

Reverso MVP — Sickle Cell Repurposing Pipeline

A minimum viable drug repurposing pipeline for sickle cell disease: build a disease signature from public transcriptomic data, build drug profiles for ~300 small molecules, and rank the drugs by CMap-style connectivity scoring. The pipeline is validated by a recovery test: do the two known sickle cell drugs (hydroxyurea, L-glutamine) rank near the top?
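As an illustration, the recovery test can be expressed as a rank-percentile check. This is a minimal sketch, assuming the pass criterion is "rank within the top 10% of the candidate list" (the criterion named in the v1.1 commit message); the function name and report fields are hypothetical, and the authoritative definition lives in PLAN.md and notebook 05.

```python
def recovery_test(ranked_drugs, known_drugs, top_fraction=0.10):
    """Report rank, percentile, and pass/fail for each known drug (hypothetical helper)."""
    n = len(ranked_drugs)
    position = {name: i + 1 for i, name in enumerate(ranked_drugs)}  # 1-based ranks
    report = {}
    for drug in known_drugs:
        rank = position.get(drug)
        if rank is None:
            # A drug missing from the scored set cannot be recovered.
            report[drug] = {"rank": None, "percentile": None, "passes": False}
        else:
            pct = rank / n
            report[drug] = {"rank": rank, "percentile": pct, "passes": pct <= top_fraction}
    return report


# Illustrative list of 300 candidates, placing the two drugs at the ranks
# quoted in the v1.1 commit message (18 and 213).
ranked = [f"drug_{i}" for i in range(300)]
ranked[17] = "hydroxyurea"
ranked[212] = "L-glutamine"
report = recovery_test(ranked, ["hydroxyurea", "L-glutamine"])
```

With 300 candidates, rank 18 is the top 6% and passes, while rank 213 fails, so the overall test fails unless every known drug passes.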

See PLAN.md for the full specification, locked decisions, and week-by-week build plan.

Quickstart

# Requires Python >=3.11,<3.14 (see note below)
pip install -e .            # or: pip install -e ".[dev]" for test/lint tooling
pytest                      # run unit tests

Python version note: use Python 3.11–3.13 (python3.13 -m venv .venv). Python 3.14 is not yet supported by all pipeline dependencies (pydeseq2, cmapPy).

Project layout

data/         raw (downloaded, never edited) / processed / results — gitignored
notebooks/    01..05, run end-to-end in order
src/          identifiers, disease, drugs, scoring, provenance
tests/        scoring unit tests
docs/         recovery_test_report.md, data_sources.md, known_limitations.md

The deliverable

When complete, the deliverable is three artifacts:

docs/recovery_test_report.md: the 2-page write-up
data/results/ranked_candidates_v1.csv: the ranked drug list
the signature and drug profile files, with their provenance

Pipeline

| Notebook | Stage | Output |
|---|---|---|
| `01_setup_identifiers.ipynb` | Pin disease/gene IDs | `data/processed/identifiers.json` |
| `02_disease_signature.ipynb` | GEO + differential expression | `sickle_cell_signature_v1.json` |
| `03_drug_profiles.ipynb` | ChEMBL + LINCS | `drug_profiles_v1.parquet` |
| `04_connectivity_scoring.ipynb` | CMap scoring | `ranked_candidates_v1.csv` |
| `05_recovery_test.ipynb` | Validation | `docs/recovery_test_report.md` |
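Stage 02 turns differential-expression results into an up/down gene signature. The sketch below shows one common way to do that and how to measure the signature's overlap with a drug-profile gene space (the quantity the v1.1 commit reports rising from 12% to 85%). The thresholds (padj < 0.05, |log2FC| >= 1, at most 150 genes per direction) and all names are hypothetical, not the project's locked decisions in PLAN.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DEResult:
    gene: str      # gene identifier (symbol or Entrez ID)
    log2fc: float  # disease vs. control log2 fold change
    padj: float    # multiple-testing-adjusted p-value


def build_signature(results, padj_max=0.05, lfc_min=1.0, max_genes=150):
    """Split DE results into up/down gene sets ranked by effect size (hypothetical thresholds)."""
    hits = [r for r in results if r.padj < padj_max and abs(r.log2fc) >= lfc_min]
    up = sorted((r for r in hits if r.log2fc > 0), key=lambda r: r.log2fc, reverse=True)
    down = sorted((r for r in hits if r.log2fc < 0), key=lambda r: r.log2fc)
    return {"up": [r.gene for r in up[:max_genes]],
            "down": [r.gene for r in down[:max_genes]]}


def signature_overlap(signature, profile_genes):
    """Fraction of signature genes present in the drug-profile gene space."""
    genes = set(signature["up"]) | set(signature["down"])
    return len(genes & set(profile_genes)) / len(genes) if genes else 0.0


# Toy DE table with placeholder gene names.
results = [
    DEResult("G1", 2.0, 0.001),
    DEResult("G2", 3.0, 0.01),
    DEResult("G3", -1.5, 0.02),
    DEResult("G4", 0.5, 0.001),   # effect too small
    DEResult("G5", -2.0, 0.2),    # not significant
]
sig = build_signature(results)
overlap = signature_overlap(sig, ["G1", "G3", "G9"])
```

Genes that pass the filters but are absent from the scoring gene space silently drop out of the query, which is why a low overlap (e.g. against a landmark-only gene set) can weaken connectivity scoring.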

Every persisted artifact carries a confidence tier (A/B/C) and provenance. See PLAN.md §3.
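A provenance record attached to each artifact might look like the minimal sketch below. The field names, the content hash, and the JSON sidecar format are hypothetical; PLAN.md §3 defines the actual schema, which is not reproduced here. The only constraint taken from this README is that the tier must be A, B, or C.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

CONFIDENCE_TIERS = {"A", "B", "C"}


@dataclass(frozen=True)
class Provenance:
    artifact: str     # path of the persisted file
    tier: str         # confidence tier: "A", "B" or "C"
    sources: tuple    # upstream sources (databases, accessions, scripts)
    sha256: str       # content hash, so later edits to the file are detectable
    created_utc: str  # ISO-8601 creation timestamp

    def __post_init__(self):
        # Reject anything outside the A/B/C tier scheme.
        if self.tier not in CONFIDENCE_TIERS:
            raise ValueError(f"unknown confidence tier: {self.tier!r}")


def make_provenance(artifact, content: bytes, tier, sources):
    """Build a record for an artifact from its raw bytes (hypothetical helper)."""
    return Provenance(
        artifact=artifact,
        tier=tier,
        sources=tuple(sources),
        sha256=hashlib.sha256(content).hexdigest(),
        created_utc=datetime.now(timezone.utc).isoformat(),
    )


def provenance_json(record):
    """Serialize a record as a JSON sidecar (tuples become JSON arrays)."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


rec = make_provenance("data/processed/identifiers.json", b"{}", "A", ["hypothetical-source"])
sidecar = provenance_json(rec)
```

Hashing the content rather than only recording a timestamp lets a reviewer check that the shared file is the one the record describes.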
