Junior B. 71069d3f10 Full 300-drug HDAC2 screen: specificity achieved, BG-1003 top hit
Corrected pipeline ran clean: 1 MSA query, 299/299 screened, 0 failures,
~$5-8 (vs the fragile 2.5hr/$15 version).

Results:
- Scale validation: HDAC inhibitors rank 1-9 (>=0.99); valproic-acid 0.90.
- DECISIVE specificity: best negative control = cetirizine rank 44 (P=0.39);
  all 26 negative controls rank low. Co-folding REJECTS unrelated drugs --
  exactly what connectivity could not do (where norethindrone/ciprofloxacin
  ranked top). The modality-pivot thesis vindicated at screen scale.
- Discovery: BG-1003 (rank 5, P=0.997, random sample) is the standout
  non-obvious binder, above several known HDAC inhibitors; also JW55,
  BRD-K14666757. 11 drugs P>0.9 (8 known inhibitors + 3 non-obvious).

Caveats kept honest: BG-1003 may be a known HDAC inhibitor in the random
sample (validation, not novelty) -- needs identity check; binding != efficacy;
prodrug/macrocycle false negatives. Full ranking in docs/results/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 10:05:06 +02:00

Reverso MVP — Sickle Cell Repurposing Pipeline

A minimum viable drug repurposing pipeline for sickle cell disease: build a disease signature from public transcriptomic data, build drug profiles for ~300 small molecules, and rank them by CMap-style connectivity scoring. Validated by a recovery test — do the two known sickle cell drugs (hydroxyurea, L-glutamine) rank near the top?
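As a rough illustration of the ranking and recovery-test idea, here is a minimal sketch. It is not the code in src/scoring: it swaps the KS-style enrichment statistic used by CMap for a plain mean difference, and every name in it (connectivity_score, rank_drugs, recovery_ranks, the gene and drug labels) is hypothetical. The numbers are toy values, not real expression data.

```python
from statistics import mean


def connectivity_score(up_genes, down_genes, drug_profile):
    """Simplified stand-in for a CMap-style score (assumption, not the real statistic).

    Returns mean drug z-score over disease-up genes minus mean over
    disease-down genes. A negative value means the drug pushes expression
    in the opposite direction to the disease signature.
    """
    up = [drug_profile[g] for g in up_genes if g in drug_profile]
    down = [drug_profile[g] for g in down_genes if g in drug_profile]
    if not up or not down:
        return 0.0  # no overlap with the signature: treat as no connectivity
    return mean(up) - mean(down)


def rank_drugs(up_genes, down_genes, profiles):
    """Sort drugs from most reversing (most negative score) to least."""
    scores = {
        drug: connectivity_score(up_genes, down_genes, profile)
        for drug, profile in profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


def recovery_ranks(ranking, known_drugs):
    """1-based rank of each known drug, or None if it was not scored."""
    positions = {drug: i + 1 for i, (drug, _) in enumerate(ranking)}
    return {drug: positions.get(drug) for drug in known_drugs}


# Toy inputs (hypothetical labels and values, for illustration only).
up = ["GENE_A", "GENE_B"]      # genes up in disease
down = ["GENE_C", "GENE_D"]    # genes down in disease
profiles = {
    "drug_known": {"GENE_A": -2.0, "GENE_B": -1.0, "GENE_C": 1.5, "GENE_D": 0.5},
    "drug_x": {"GENE_A": 1.0, "GENE_B": 0.5, "GENE_C": -1.0, "GENE_D": 0.0},
    "drug_y": {"GENE_A": 0.0, "GENE_B": 0.0, "GENE_C": 0.0, "GENE_D": 0.0},
}

ranking = rank_drugs(up, down, profiles)
print(ranking)
print(recovery_ranks(ranking, ["drug_known", "drug_missing"]))
```

In this toy setup the recovery test passes when the known drug lands near rank 1; a drug absent from the profile set comes back as None rather than silently dropping out.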

See PLAN.md for the full specification, locked decisions, and week-by-week build plan.

Quickstart

# Requires Python >=3.11,<3.13 (see note below)
pip install -e .            # or: pip install -e ".[dev]" for test/lint tooling
pytest                      # run unit tests

Python version note: use Python 3.11 to 3.13 (python3.13 -m venv .venv). Python 3.14 is not yet supported by all pipeline dependencies (pydeseq2, cmapPy).

Project layout

data/         raw (downloaded, never edited) / processed / results — gitignored
notebooks/    01..05, run end-to-end in order
src/          identifiers, disease, drugs, scoring, provenance
tests/        scoring unit tests
docs/         recovery_test_report.md, data_sources.md, known_limitations.md

The deliverable

When complete, the artifacts to share are three files:

  1. docs/recovery_test_report.md — the 2-page write-up
  2. data/results/ranked_candidates_v1.csv — the ranked drug list
  3. The signature + drug profile files with provenance

Pipeline

Notebook                        Stage                           Output
01_setup_identifiers.ipynb      Pin disease/gene IDs            data/processed/identifiers.json
02_disease_signature.ipynb      GEO + differential expression   sickle_cell_signature_v1.json
03_drug_profiles.ipynb          ChEMBL + LINCS                  drug_profiles_v1.parquet
04_connectivity_scoring.ipynb   CMap scoring                    ranked_candidates_v1.csv
05_recovery_test.ipynb          Validation                      docs/recovery_test_report.md

Every persisted artifact carries a confidence tier (A/B/C) and provenance. See PLAN.md §3.
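One way such a record could look is sketched below. This is an assumption for illustration only: the authoritative scheme is in PLAN.md §3, and the Provenance class, its field names, and make_provenance are hypothetical rather than the contents of src/provenance.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

VALID_TIERS = {"A", "B", "C"}


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record attached to a persisted artifact."""

    artifact: str      # path of the file this record describes
    source: str        # where the underlying data came from
    tier: str          # confidence tier: "A", "B", or "C"
    sha256: str        # content hash, so later edits are detectable
    created_utc: str   # ISO 8601 timestamp in UTC

    def __post_init__(self):
        # Reject anything outside the three tiers named in the README.
        if self.tier not in VALID_TIERS:
            raise ValueError(f"tier must be one of {sorted(VALID_TIERS)}, got {self.tier!r}")


def make_provenance(artifact: str, source: str, tier: str, payload: bytes) -> Provenance:
    """Build a record for `payload`, the exact bytes written to `artifact`."""
    return Provenance(
        artifact=artifact,
        source=source,
        tier=tier,
        sha256=hashlib.sha256(payload).hexdigest(),
        created_utc=datetime.now(timezone.utc).isoformat(),
    )


record = make_provenance(
    artifact="data/processed/identifiers.json",
    source="example source label (placeholder)",
    tier="A",
    payload=b'{"example": true}',
)
print(asdict(record))
```

Hashing the exact bytes written to disk keeps the record tied to one version of the file, and validating the tier at construction time means an artifact cannot be persisted with an unknown tier.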
