Reverso/docs/known_limitations.md
Junior B. 3417f85eb1 v1.1: full gene space + specificity z-score; hydroxyurea recovers
Post-hoc improvement after the pre-registered v1 recovery test failed.
Two changes, diagnosing v1's failure:
- score on the full 12,328-gene LINCS space (week2_lincs_extract.py),
  lifting signature overlap from 12% to 85% (brings erythroid markers in)
- src/scoring.py: KS connectivity + per-drug specificity z-score
  (spec_z = SDs below a 1,000 random-query null). Primary ranking is
  now spec_z. (Textbook tau saturated at +/-100 for a coherent query —
  documented; needs a reference-signature library, a v2 item.)
- week3_scoring.py: spec_z primary + WTCS reference + prior-blended
- tests: tau/spec_z calibration test; 19 passing
- scripts/exp_genespace.py: the BING vs all-12,328 comparison

Result: hydroxyurea recovers (rank 40 -> 18, top 6%, passes top-10%),
confirming the v1 failure was the landmark bottleneck not the algorithm.
Overall STILL FAILS: L-glutamine does not reverse (rank 213, metabolite),
and negative controls (norethindrone, ciprofloxacin) rank top-3 —
connectivity != therapeutic relatedness. v1.1 is post-hoc/exploratory,
not a confirmatory test; reported as such in recovery_test_report.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 22:57:30 +02:00


# Known Limitations

The honest list of what would break this MVP at scale or in a different disease. Useful for the
next pharma conversation: "yes, we know these are limitations, here's how v2 addresses them."

Source: PLAN.md §9.

1. **Cell-composition confound in sickle cell expression data.** Whole-blood differential
expression partly reflects different blood cell ratios, not disease biology. v1 acknowledges
this; v2 should deconvolve cell types.
2. **LINCS L1000 cell-line limitations.** L1000 directly measures only 978 landmark genes, and
most profiles come from cancer cell lines (MCF7, A375, PC3, …), so connectivity to non-oncology
disease signatures may be noisy. A field-wide limitation, not unique to Reverso.
3. **L-glutamine LINCS coverage: RESOLVED, opposite of expected.** L-glutamine DOES have a
Phase I signature (hydroxyurea is Phase-II-only), so both ground-truth drugs are scorable. But
L-glutamine's v1 connectivity is **ambiguous (WTCS=0)**: its up- and down-set enrichments share
a sign, so it shows no reversal and ranks 100/300 (213 in v1.1). The ground-truth test therefore
effectively rests on hydroxyurea, which itself reaches only rank 40, top 13% (raw), in v1; see
the recovery test report.
4. **Connectivity scoring surfaces broad-effect drugs as false positives.** HDAC inhibitors and
broad kinase inhibitors often top connectivity rankings simply because they perturb many
genes. The mechanistic prior (Week 3) helps filter, but does not eliminate this.
5. **A hydroxyurea pass is expected by construction and proves little.** Sickle cell +
hydroxyurea is a well-studied pair. Passing is necessary but not sufficient to claim the
platform generalizes. The next disease is the real test: do not sell sickle cell results as
proving the platform.
6. **No mechanistic validation layer.** Pure ML matching is not sufficient for extrapolation
(flagged by multiple experts). The MVP knowingly omits the mechanistic layer; it is a phase-2
addition. Position the MVP as "discovery hypothesis generation," not "validated prediction."
7. **Top-ranked novel candidates are not wet-lab validated.** They are computational hypotheses
to test, not discoveries. Use careful language in any write-up.
8. **Gene-space bottleneck (v1 → fixed in v1.1).** v1 scored on only the 978 landmark genes (12%
signature overlap) — the main driver of the v1 failure. v1.1 uses the full 12,328-gene space
(85% overlap) and recovers hydroxyurea. HBG1/HBG2 remain absent from LINCS entirely.
9. **No reference-signature library for tau.** Textbook CMap tau saturated at ±100 (a coherent
query always out-connects random gene sets). v1.1 substitutes a per-drug specificity z-score.
Proper tau needs a library of real reference signatures — a v2 / curated-data item.
10. **Negative-control criterion may be invalid for connectivity scoring.** Unrelated drugs
(norethindrone, ciprofloxacin) rank as top specific reversers — connectivity measures
expression reversal, not therapeutic relatedness.
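
Items 3 and 9 above both depend on how the connectivity score is built, so a minimal sketch may help. It is not the code in `src/scoring.py`: it uses an unweighted KS running sum instead of the weighted variant, and the function names, the toy gene labels, and the sign convention for `spec_z` (positive means more reversing than the null) are assumptions made for illustration.

```python
import random
import statistics


def ks_enrichment(ranked_genes, gene_set):
    """Unweighted KS-style running-sum enrichment score in [-1, 1].

    ranked_genes: genes ordered from most up- to most down-regulated by the drug.
    A positive score means gene_set sits near the top of that ranking.
    """
    hits = set(gene_set) & set(ranked_genes)
    n, n_hit = len(ranked_genes), len(hits)
    if n_hit == 0 or n_hit == n:
        return 0.0
    up_step, down_step = 1.0 / n_hit, 1.0 / (n - n_hit)
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += up_step if gene in hits else -down_step
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best


def connectivity(ranked_genes, up_set, down_set):
    """Connectivity of a disease query (up_set, down_set) to one drug ranking.

    Negative values indicate reversal. If both enrichments share a sign the
    result is ambiguous and is set to 0, as described for L-glutamine in item 3.
    """
    es_up = ks_enrichment(ranked_genes, up_set)
    es_down = ks_enrichment(ranked_genes, down_set)
    if es_up * es_down > 0:
        return 0.0
    return (es_up - es_down) / 2.0


def spec_z(ranked_genes, up_set, down_set, n_null=1000, seed=0):
    """SDs by which the observed score falls below a random-query null."""
    rng = random.Random(seed)
    observed = connectivity(ranked_genes, up_set, down_set)
    n_up, n_total = len(up_set), len(up_set) + len(down_set)
    null = []
    for _ in range(n_null):
        # Random query with the same set sizes as the real one.
        sample = rng.sample(ranked_genes, n_total)
        null.append(connectivity(ranked_genes, sample[:n_up], sample[n_up:]))
    sd = statistics.pstdev(null)
    if sd == 0:
        return 0.0
    return (statistics.fmean(null) - observed) / sd


# Toy drug ranking: g0 is most up-regulated, g99 most down-regulated.
genes = [f"g{i}" for i in range(100)]
reversing = connectivity(genes, up_set=genes[95:], down_set=genes[:5])
ambiguous = connectivity(genes, up_set=genes[:5], down_set=genes[5:10])
print(reversing, ambiguous, spec_z(genes, genes[95:], genes[:5]))
```

In the toy example the first query is strongly reversed by the drug, while the second has both gene sets near the top of the ranking, so its score collapses to zero. Comparing each drug against its own null is what makes the z-score "per-drug": a drug that perturbs many genes also scores highly against random queries, which widens its null and lowers its `spec_z`.
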

## Recovery test outcome

Pre-registered test (**v1, confirmatory**): **FAILED** all three criteria (hydroxyurea rank
40/top 13%; L-glutamine rank 100; 1/5 negative controls bottom-half). Post-hoc (**v1.1,
exploratory**): hydroxyurea recovers to rank 18 (top 6%, passes the top-10% criterion), but
L-glutamine (rank 213, does not reverse) and negative controls (2/5 bottom-half) still fail, so
the overall result is still **FAIL**. See `recovery_test_report.md`.

| Drug | Issue | v1.1 status |
|---|---|---|
| hydroxyurea | needed the full gene space | rank 18 (top 6%) — recovered post-hoc |
| L-glutamine | metabolite, no reversal signal (positive connectivity) | rank 213 — genuine negative |
| neg controls | reverse the generic inflammation signature | 2/5 bottom-half — criterion questionable |
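
The percentages quoted in this section follow directly from the ranks. The sketch below reproduces that arithmetic, assuming a library of 300 scored drugs in both versions (taken from the "100/300" figure in item 3) and the top-10% threshold named in the v1.1 commit message. The helper names are hypothetical and not part of the repository.

```python
def top_percent(rank: int, n_drugs: int) -> float:
    """Rank expressed as a percentage of the library (lower is better)."""
    return 100.0 * rank / n_drugs


def in_top_fraction(rank: int, n_drugs: int, fraction: float = 0.10) -> bool:
    """True if the rank falls inside the top `fraction` of the library."""
    return rank <= fraction * n_drugs


N_DRUGS = 300  # assumed library size, from "ranks 100/300"
hydroxyurea_rank = {"v1": 40, "v1.1": 18}

for version, rank in hydroxyurea_rank.items():
    pct = top_percent(rank, N_DRUGS)
    print(f"{version}: rank {rank} = top {pct:.1f}%, "
          f"passes top-10%: {in_top_fraction(rank, N_DRUGS)}")
# v1: rank 40 = top 13.3%, passes top-10%: False
# v1.1: rank 18 = top 6.0%, passes top-10%: True
```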