Reverso/docs/known_limitations.md
Junior B. b731478f5d Scaffold Reverso MVP pipeline structure
Set up the project skeleton per PLAN.md §4:
- src/ package: identifiers, disease, drugs, scoring, provenance
  with pydantic schemas and confidence-tier logic (working);
  data-pull/compute functions stubbed per their build week
- 5 starter notebooks (01-05) with PLAN-referenced steps
- tests/test_scoring.py: tier-assignment tests pass; scoring
  reference test xfail until Week 3
- docs/: recovery_test_report, data_sources, known_limitations skeletons
- pyproject.toml (requires-python >=3.11,<3.14), .gitignore, README
- data/ tree preserved via .gitkeep; raw/processed/results gitignored

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 20:20:09 +02:00

# Known Limitations
An honest list of what would break this MVP at scale or when applied to a different disease.
Useful for the next pharma conversation: "yes, we know these are limitations, here's how v2 addresses them."
Source: PLAN.md §9.
1. **Cell-composition confound in sickle cell expression data.** Whole-blood differential
expression partly reflects shifts in blood cell-type proportions rather than disease biology.
v1 acknowledges this; v2 should deconvolve cell types.
2. **LINCS L1000 cell-line limitations.** L1000 directly measures only 978 landmark genes, and
most perturbation profiles come from cancer cell lines (MCF7, A375, PC3, …). Drug signatures
may therefore be noisy when applied to non-oncology diseases. This is a field-wide limitation,
not unique to Reverso.
3. **L-glutamine probably has no LINCS signature.** Amino acids and metabolites weren't LINCS
priorities. If true, the ground-truth test effectively rests on hydroxyurea alone, which is
weaker. _Status: TBD — record the actual finding here once LINCS is pulled (Week 2)._
4. **Connectivity scoring surfaces broad-effect drugs as false positives.** HDAC inhibitors and
broad kinase inhibitors often top connectivity rankings simply because they perturb many
genes. The mechanistic prior (Week 3) helps filter, but does not eliminate this.
5. **Hydroxyurea will probably pass the recovery test by construction.** Sickle cell +
hydroxyurea is a well-studied pair. Passing is necessary but not sufficient to claim the
platform generalizes. The next disease is the real test — do not sell sickle cell results as
proving the platform.
6. **No mechanistic validation layer.** Pure ML matching is not sufficient for extrapolation
(flagged by multiple experts). The MVP knowingly omits the mechanistic layer; it is a phase-2
addition. Position the MVP as "discovery hypothesis generation," not "validated prediction."
7. **Top-ranked novel candidates are not wet-lab validated.** They are computational hypotheses
to test, not discoveries. Use careful language in any write-up.
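
To make limitation 4 concrete, here is a minimal toy sketch of why a drug that perturbs many genes can outrank a specific one on a raw reversal count. Everything in it is an assumption for illustration: the `Signature` container, `reversal_score`, `breadth`, the 20-gene panel, and the gene IDs are hypothetical and are not the scoring in `src/`, nor real LINCS data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical container: sets of up- and down-regulated gene IDs."""
    up: frozenset
    down: frozenset


def reversal_score(disease: Signature, drug: Signature) -> int:
    """Toy score: genes the drug reverses minus genes it pushes the same way."""
    reversed_count = len(disease.up & drug.down) + len(disease.down & drug.up)
    mimicked_count = len(disease.up & drug.up) + len(disease.down & drug.down)
    return reversed_count - mimicked_count


def breadth(drug: Signature, n_measured: int) -> float:
    """Fraction of measured genes the drug perturbs in either direction."""
    return (len(drug.up) + len(drug.down)) / n_measured


N_MEASURED = 20  # toy panel of genes g0..g19, not the real landmark set

disease = Signature(
    up=frozenset({"g0", "g1", "g2", "g3"}),
    down=frozenset({"g4", "g5", "g6", "g7"}),
)

# Perturbs 3 genes, all of them reversing the disease signature.
specific_drug = Signature(up=frozenset({"g4"}), down=frozenset({"g0", "g1"}))

# Perturbs 14 genes: 5 reverse the disease, 1 mimics it, 8 are unrelated.
broad_drug = Signature(
    up=frozenset({"g3", "g4", "g5", "g12", "g13", "g14", "g15"}),
    down=frozenset({"g0", "g1", "g2", "g8", "g9", "g10", "g11"}),
)

for name, drug in [("specific", specific_drug), ("broad", broad_drug)]:
    print(name, reversal_score(disease, drug), breadth(drug, N_MEASURED))
```

In this toy setup the broad drug scores 4 against the specific drug's 3, even though it perturbs 70% of the panel versus 15%. Reporting a breadth figure next to each connectivity score is one possible way to make such hits visible during review; it does not replace the mechanistic prior.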
## Drug-specific gaps (fill in during Week 2–3)
| Drug | Issue | Handling |
|---|---|---|
| TBD | e.g. no LINCS signature | flagged "not scored, no signature available" |
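
Once the LINCS pull has run, rows for this table could be generated rather than typed by hand. The sketch below assumes a hypothetical `gap_rows` helper and a plain dict recording whether each drug has a signature; the drug names are placeholders, not findings.

```python
def gap_rows(drugs, has_signature):
    """Return Markdown table rows for drugs that lack a LINCS signature.

    `has_signature` maps drug name -> bool; drugs missing from the map
    are treated as having no signature (an assumption of this sketch).
    """
    handling = 'flagged "not scored, no signature available"'
    return [
        f"| {drug} | no LINCS signature | {handling} |"
        for drug in drugs
        if not has_signature.get(drug, False)
    ]


# Placeholder inputs only; real availability is unknown until LINCS is pulled.
availability = {"drug_a": True, "drug_b": False}
rows = gap_rows(["drug_a", "drug_b", "drug_c"], availability)
print("\n".join(rows))
```

Treating unknown drugs as "no signature" keeps them visible in the table instead of silently dropping them from the ranking.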