Set up the project skeleton per PLAN.md §4: - src/ package: identifiers, disease, drugs, scoring, provenance with pydantic schemas and confidence-tier logic (working); data-pull/compute functions stubbed per their build week - 5 starter notebooks (01-05) with PLAN-referenced steps - tests/test_scoring.py: tier-assignment tests pass; scoring reference test xfail until Week 3 - docs/: recovery_test_report, data_sources, known_limitations skeletons - pyproject.toml (requires-python >=3.11,<3.14), .gitignore, README - data/ tree preserved via .gitkeep; raw/processed/results gitignored Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
40 lines
2.1 KiB
Markdown
40 lines
2.1 KiB
Markdown
# Known Limitations
|
||
|
||
The honest list of what would break this MVP at scale or in a different disease. Useful for the
|
||
next pharma conversation: "yes, we know these are limitations, here's how v2 addresses them."
|
||
Source: PLAN.md §9.
|
||
|
||
1. **Cell-composition confound in sickle cell expression data.** Whole-blood differential
|
||
expression partly reflects different blood cell ratios, not disease biology. v1 acknowledges
|
||
this; v2 should deconvolve cell types.
|
||
|
||
2. **LINCS L1000 cell-line limitations.** The 978 landmark genes were measured mostly in cancer
|
||
cell lines (MCF7, A375, PC3, …). Signatures for non-oncology diseases may be noisy. A
|
||
field-wide limitation, not unique to Reverso.
|
||
|
||
3. **L-glutamine probably has no LINCS signature.** Amino acids and metabolites weren't LINCS
|
||
priorities. If true, the ground-truth test effectively rests on hydroxyurea alone, which is
|
||
weaker. _Status: TBD — record the actual finding here once LINCS is pulled (Week 2)._
|
||
|
||
4. **Connectivity scoring surfaces broad-effect drugs as false positives.** HDAC inhibitors and
|
||
broad kinase inhibitors often top connectivity rankings simply because they perturb many
|
||
genes. The mechanistic prior (Week 3) helps filter, but does not eliminate this.
|
||
|
||
5. **Hydroxyurea will probably pass the recovery test by construction.** Sickle cell +
|
||
hydroxyurea is a well-studied pair. Passing is necessary but not sufficient to claim the
|
||
platform generalizes. The next disease is the real test — do not sell sickle cell results as
|
||
proving the platform.
|
||
|
||
6. **No mechanistic validation layer.** Pure ML matching is not sufficient for extrapolation
|
||
(flagged by multiple experts). The MVP knowingly omits the mechanistic layer; it is a phase-2
|
||
addition. Position the MVP as "discovery hypothesis generation," not "validated prediction."
|
||
|
||
7. **Top-ranked novel candidates are not wet-lab validated.** They are computational hypotheses
|
||
to test, not discoveries. Use careful language in any write-up.
|
||
|
||
## Drug-specific gaps (fill in during Week 2–3)
|
||
|
||
| Drug | Issue | Handling |
|
||
|---|---|---|
|
||
| TBD | e.g. no LINCS signature | flagged "not scored, no signature available" |
|