Reverso/docs/recovery_test_report.md
Junior B. b731478f5d Scaffold Reverso MVP pipeline structure
Set up the project skeleton per PLAN.md §4:
- src/ package: identifiers, disease, drugs, scoring, provenance
  with pydantic schemas and confidence-tier logic (working);
  data-pull/compute functions stubbed per their build week
- 5 starter notebooks (01-05) with PLAN-referenced steps
- tests/test_scoring.py: tier-assignment tests pass; scoring
  reference test xfail until Week 3
- docs/: recovery_test_report, data_sources, known_limitations skeletons
- pyproject.toml (requires-python >=3.11,<3.14), .gitignore, README
- data/ tree preserved via .gitkeep; raw/processed/results gitignored

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 20:20:09 +02:00

# Sickle Cell Repurposing — Recovery Test Report
> **Status: DRAFT SCAFFOLD — not yet run.** Filled in during Week 4 from
> `notebooks/05_recovery_test.ipynb`. Target length: ~2 pages, readable by a sceptical
> pharma scientist in 5 minutes.
## Pre-registered success criteria

> ⚠️ **Commit this section to git _before_ running the recovery test** (PLAN.md §8, §10).

The MVP passes if:
- Hydroxyurea ranks in the **top 10%** (top 30 of 300), **AND**
- L-glutamine ranks in the **top 25%** (top 75 of 300) **OR** is documented as unscorable due to a
  missing LINCS signature, **AND**
- At least **4 of 5** negative-control drugs rank in the **bottom half**.

_Pre-registered on: TBD (date of commit)_

---
## Section 1 — Methodology
_5-6 sentences: what was built, the GEO dataset used, the drug-set composition, and the
scoring method (CMap connectivity, Lamb 2006 / Subramanian 2017)._
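For reference while drafting this section, here is a minimal sketch of a Kolmogorov-Smirnov-style connectivity score in the spirit of Lamb 2006. It is a simplified, unnormalised illustration and not the project's scoring module. The function names, the toy gene IDs, and the sign convention (negative score means the drug opposes the disease signature) are assumptions made for this example.

```python
from typing import Sequence


def ks_enrichment(tag_genes: set[str], ranked_genes: Sequence[str]) -> float:
    """Signed KS-style enrichment of tag_genes within a drug's ranked gene list.

    ranked_genes is assumed to run from most up-regulated to most
    down-regulated by the drug. Positive values mean the tags cluster near
    the top of the list, negative values near the bottom.
    """
    n = len(ranked_genes)
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}  # 1-based
    hits = sorted(position[g] for g in tag_genes if g in position)
    t = len(hits)
    if t == 0:
        return 0.0  # no tag gene was measured, so there is nothing to score
    a = max((j + 1) / t - v / n for j, v in enumerate(hits))
    b = max(v / n - j / t for j, v in enumerate(hits))
    return a if a > b else -b


def connectivity_score(
    disease_up: set[str], disease_down: set[str], ranked_genes: Sequence[str]
) -> float:
    """Combine up-tag and down-tag enrichment into one signed score."""
    ks_up = ks_enrichment(disease_up, ranked_genes)
    ks_down = ks_enrichment(disease_down, ranked_genes)
    if ks_up * ks_down > 0:
        return 0.0  # both tag sets enriched at the same end: no clear direction
    return ks_up - ks_down


# Toy data only: ten hypothetical genes, two up tags and two down tags.
genes = [f"g{i}" for i in range(1, 11)]
disease_up, disease_down = {"g1", "g2"}, {"g9", "g10"}

mimic = connectivity_score(disease_up, disease_down, genes)           # same direction
reversal = connectivity_score(disease_up, disease_down, genes[::-1])  # opposite direction
```

Under this convention `mimic` is positive and `reversal` is negative. Repurposing candidates are the drugs with the most negative scores.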
## Section 2 — Recovery test result
| Drug | Rank | Percentile | Pass? |
|---|---|---|---|
| Hydroxyurea | TBD | TBD | TBD |
| L-glutamine | TBD | TBD | TBD |

Negative controls (expected: bottom half):

| Control drug | Rank | Bottom half? |
|---|---|---|
| TBD | TBD | TBD |

**Overall: PASS / FAIL against pre-registered criteria (TBD)**
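The overall verdict could be derived mechanically from the ranks, so the pass/fail call cannot drift after the results are seen. The sketch below assumes rank 1 is the strongest candidate, a 300-drug set, and integer ranks. The function and parameter names are hypothetical, and the ranks in the usage line are placeholders, not results.

```python
from typing import Optional, Sequence


def percentile(rank: int, n_drugs: int = 300) -> float:
    """Rank expressed as a percentage of the drug set (rank 30 of 300 -> 10.0)."""
    return 100.0 * rank / n_drugs


def in_top_percent(rank: int, pct: int, n_drugs: int) -> bool:
    """Integer comparison avoids floating-point edge cases at the cutoff."""
    return rank * 100 <= pct * n_drugs


def passes_recovery_test(
    hydroxyurea_rank: int,
    l_glutamine_rank: Optional[int],
    control_ranks: Sequence[int],
    n_drugs: int = 300,
    l_glutamine_unscorable_documented: bool = False,
    min_controls_in_bottom_half: int = 4,
) -> bool:
    """Apply the three pre-registered criteria, all of which must hold."""
    hydroxyurea_ok = in_top_percent(hydroxyurea_rank, 10, n_drugs)

    if l_glutamine_rank is None:
        # No LINCS signature: passes only if the gap is documented.
        l_glutamine_ok = l_glutamine_unscorable_documented
    else:
        l_glutamine_ok = in_top_percent(l_glutamine_rank, 25, n_drugs)

    # Bottom half is taken here as rank strictly greater than n_drugs / 2.
    n_bottom = sum(rank * 2 > n_drugs for rank in control_ranks)
    controls_ok = n_bottom >= min_controls_in_bottom_half

    return hydroxyurea_ok and l_glutamine_ok and controls_ok


# Placeholder ranks for illustration only.
verdict = passes_recovery_test(
    hydroxyurea_rank=12,
    l_glutamine_rank=None,
    control_ranks=[200, 250, 160, 290, 40],
    l_glutamine_unscorable_documented=True,
)
```

Keeping the thresholds as function arguments makes it easy to check that the code matches the committed criteria before the test is run.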
## Section 3 — Top 10 candidates
| Rank | Drug | Score | Known mechanism | Biological plausibility |
|---|---|---|---|---|
| 1 | TBD | TBD | TBD | TBD |

_Note: HDAC inhibitors and broad kinase inhibitors often dominate connectivity rankings due
to widespread expression effects — flag these honestly (PLAN.md §9.4)._
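One possible way to make the flagging systematic is to match the "Known mechanism" text against a short list of mechanism classes. The patterns and names below are assumptions for illustration. The kinase pattern matches any kinase inhibitor, which is broader than the note above, so each hit still needs manual review.

```python
import re

# Assumed patterns; extend or tighten to match the mechanism vocabulary in use.
PROMISCUOUS_PATTERNS = {
    "HDAC inhibitor": re.compile(
        r"\b(HDAC|histone deacetylase)\b.*\binhibitor\b", re.IGNORECASE
    ),
    "kinase inhibitor": re.compile(r"\bkinase\b.*\binhibitor\b", re.IGNORECASE),
}


def promiscuity_flags(mechanism: str) -> list[str]:
    """Return the mechanism classes that the free-text description matches."""
    return [
        label
        for label, pattern in PROMISCUOUS_PATTERNS.items()
        if pattern.search(mechanism)
    ]


# Hypothetical mechanism strings, not pipeline output.
examples = {
    "drug_A": "pan-HDAC inhibitor",
    "drug_B": "multi-kinase inhibitor",
    "drug_C": "potassium channel blocker",
}
flags = {drug: promiscuity_flags(mech) for drug, mech in examples.items()}
```

A flagged drug stays in the table. The flag only records that its high rank may reflect broad expression effects and not disease-specific reversal.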
## Section 4 — One non-obvious candidate worth investigating
_A single paragraph on the most interesting result. Language must be careful: this is a
computational hypothesis to test, not a discovery (PLAN.md §9.7)._
## Section 5 — Honest limitations
- Cell-composition confound in whole-blood expression (PLAN.md §9.1)
- LINCS L1000 cell-line limitations — landmark genes measured mostly in cancer lines (§9.2)
- Missing signatures (e.g. L-glutamine) (§9.3)
- No mechanistic validation layer: this is hypothesis generation, not validated prediction (§9.6)
## Section 6 — What v2 would fix
- Cell-type deconvolution of the disease signature
- Knowledge graph fallback for missing-signature drugs
- A second disease to test generalization (the real test — sickle cell results do not prove
the platform generalizes, §9.5)