Scaffold Reverso MVP pipeline structure

Set up the project skeleton per PLAN.md §4:
- src/ package: identifiers, disease, drugs, scoring, provenance
  modules with pydantic schemas and working confidence-tier logic;
  data-pull/compute functions stubbed until their planned build week
- 5 starter notebooks (01-05) with steps referencing PLAN.md
- tests/test_scoring.py: tier-assignment tests pass; the scoring
  reference test is marked xfail until Week 3
- docs/: recovery_test_report, data_sources, known_limitations skeletons
- pyproject.toml (requires-python >=3.11,<3.14), .gitignore, README
- data/ tree preserved via .gitkeep; raw/processed/results gitignored
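
A minimal sketch of what confidence-tier assignment could look like,
using only the standard library. It is an illustration, not the code
in src/scoring: `ConfidenceTier`, `Evidence`, `assign_tier`, the tier
names, and the thresholds are all hypothetical, and a frozen dataclass
with a `__post_init__` check stands in for the pydantic schema.

```python
from dataclasses import dataclass
from enum import Enum


class ConfidenceTier(str, Enum):
    # Hypothetical tier names; the real ones live in the project's scoring module.
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class Evidence:
    # Hypothetical stand-in for a pydantic schema describing one candidate drug.
    has_lincs_signature: bool
    n_supporting_sources: int

    def __post_init__(self) -> None:
        # Plain validation check in place of pydantic field constraints.
        if self.n_supporting_sources < 0:
            raise ValueError("n_supporting_sources must be non-negative")


def assign_tier(ev: Evidence) -> ConfidenceTier:
    """Map validated evidence to a discrete tier (hypothetical thresholds)."""
    if ev.has_lincs_signature and ev.n_supporting_sources >= 2:
        return ConfidenceTier.HIGH
    if ev.has_lincs_signature or ev.n_supporting_sources >= 1:
        return ConfidenceTier.MEDIUM
    return ConfidenceTier.LOW
```

For example, `assign_tier(Evidence(has_lincs_signature=False,
n_supporting_sources=1))` returns `ConfidenceTier.MEDIUM`. Keeping tier
assignment a pure function of validated inputs would let tier tests run
now, while a reference test that depends on unbuilt scoring code stays
marked with `@pytest.mark.xfail` until Week 3.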

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 20:19:38 +02:00
parent e717cf40ed
commit b731478f5d
25 changed files with 1038 additions and 4 deletions


@@ -1,4 +1,4 @@
-# QPharma MVP — Sickle Cell Repurposing Pipeline
+# Reverso MVP — Sickle Cell Repurposing Pipeline
 > **For Claude Code:** This is the project specification. Read this entire document before suggesting actions or writing code. The decisions in section "Locked decisions" have already been made by the founder after extensive expert consultation; do not re-litigate them. Where the plan calls for a choice, propose options but default to the spec.
@@ -121,7 +121,7 @@ This is the most commercially important design decision in the whole pipeline. S
 ## 4. Directory structure
 ```
-qpharma-mvp/
+reverso-mvp/
 ├── PLAN.md # This file
 ├── README.md # Short project description
 ├── pyproject.toml # Dependencies (or requirements.txt)
@@ -375,7 +375,7 @@ These are real risks documented during planning. They are not paranoia.
 1. **Cell-composition confound in sickle cell expression data.** Whole-blood differential expression in sickle cell partly reflects different blood cell ratios, not disease biology. v1 acknowledges this; v2 should deconvolve.
-2. **LINCS L1000 cell-line limitations.** The 978 landmark genes were measured mostly in cancer cell lines (MCF7, A375, PC3, etc.). Signatures for non-oncology diseases may be noisy. This is a known field-wide limitation, not unique to QPharma.
+2. **LINCS L1000 cell-line limitations.** The 978 landmark genes were measured mostly in cancer cell lines (MCF7, A375, PC3, etc.). Signatures for non-oncology diseases may be noisy. This is a known field-wide limitation, not unique to Reverso.
 3. **L-glutamine probably has no LINCS signature.** Amino acids and metabolites weren't LINCS priorities. If true, the ground-truth test only has hydroxyurea, which is weaker. Document honestly.