Reverso/docs/known_limitations.md
Junior B. 3417f85eb1 v1.1: full gene space + specificity z-score; hydroxyurea recovers
Post-hoc improvement after the pre-registered v1 recovery test failed.
Two changes, diagnosing v1's failure:
- score on the full 12,328-gene LINCS space (week2_lincs_extract.py),
  lifting signature overlap from 12% to 85% (brings erythroid markers in)
- src/scoring.py: KS connectivity + per-drug specificity z-score
  (spec_z = SDs below a 1,000 random-query null). Primary ranking is
  now spec_z. (Textbook tau saturated at +/-100 for a coherent query —
  documented; needs a reference-signature library, a v2 item.)
- week3_scoring.py: spec_z primary + WTCS reference + prior-blended
- tests: tau/spec_z calibration test; 19 passing
- scripts/exp_genespace.py: the BING vs all-12,328 comparison

Result: hydroxyurea recovers (rank 40 -> 18, top 6%, passes top-10%),
confirming the v1 failure was the landmark bottleneck not the algorithm.
Overall STILL FAILS: L-glutamine does not reverse (rank 213, metabolite),
and negative controls (norethindrone, ciprofloxacin) rank top-3 —
connectivity != therapeutic relatedness. v1.1 is post-hoc/exploratory,
not a confirmatory test; reported as such in recovery_test_report.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 22:57:30 +02:00

Known Limitations

The honest list of what would break this MVP at scale or in a different disease. Useful for the next pharma conversation: "yes, we know these are limitations, here's how v2 addresses them." Source: PLAN.md §9.

  1. Cell-composition confound in sickle cell expression data. Whole-blood differential expression partly reflects differences in blood cell-type proportions rather than disease biology. v1 acknowledges this; v2 should deconvolve cell types.

  2. LINCS L1000 cell-line limitations. L1000 profiles (978 measured landmark genes) come mostly from cancer cell lines (MCF7, A375, PC3, …), so drug signatures may be noisy or transfer poorly to non-oncology diseases. This is a field-wide limitation, not unique to Reverso.

  3. L-glutamine LINCS coverage: RESOLVED, opposite of expected. L-glutamine DOES have a Phase I signature (hydroxyurea is Phase-II-only), so both ground-truth drugs are scorable. But L-glutamine's connectivity is ambiguous (WTCS = 0): its up- and down-set enrichments share a sign, so it shows no reversal. It ranks 100/300 in v1 (213 in v1.1). The ground-truth test therefore effectively rests on hydroxyurea, which in v1 only reaches the top 13% (raw score); see the recovery test report.

  4. Connectivity scoring surfaces broad-effect drugs as false positives. HDAC inhibitors and broad kinase inhibitors often top connectivity rankings simply because they perturb many genes. The mechanistic prior (Week 3) helps filter them out but does not eliminate the problem.

  5. Hydroxyurea was expected to pass the recovery test almost by construction: sickle cell + hydroxyurea is a well-studied pair. (In practice it failed the pre-registered v1 test and passes only post-hoc in v1.1.) Passing is necessary but not sufficient to claim the platform generalizes. The next disease is the real test; do not sell sickle cell results as proving the platform.

  6. No mechanistic validation layer. Pure ML matching is not sufficient for extrapolation (flagged by multiple experts). The MVP knowingly omits the mechanistic layer; it is a phase-2 addition. Position the MVP as "discovery hypothesis generation," not "validated prediction."

  7. Top-ranked novel candidates are not wet-lab validated. They are computational hypotheses to test, not discoveries. Use careful language in any write-up.

  8. Gene-space bottleneck (v1 → fixed in v1.1). v1 scored on only the 978 landmark genes (12% signature overlap), which was the main driver of the v1 failure. v1.1 uses the full 12,328-gene space (85% overlap) and recovers hydroxyurea. Most of those genes are computationally inferred from the 978 measured landmarks rather than measured directly. HBG1/HBG2 remain absent from LINCS entirely.

  9. No reference-signature library for tau. Textbook CMap tau saturated at ±100 when normalized against random gene sets, because a coherent query always out-connects them. v1.1 substitutes a per-drug specificity z-score: how many SDs the drug's score lies below a null of 1,000 random queries. Proper tau needs a library of real reference signatures, a v2 / curated-data item.

  10. Negative-control criterion may be invalid for connectivity scoring. Unrelated drugs (norethindrone, ciprofloxacin) rank among the top-3 specific reversers, apparently because they reverse the generic inflammation component of the disease signature. Connectivity measures expression reversal, not therapeutic relatedness.
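For limitation 1, one common way to deconvolve cell types is reference-based non-negative least squares (NNLS). Each bulk sample is modelled as a non-negative mix of reference cell-type profiles. The sketch below is hypothetical: the function name `deconvolve`, the toy 4-gene reference and the projected-gradient solver are illustrative assumptions, not Reverso code. A real v2 would need a curated blood-cell reference and would more likely call a dedicated solver such as `scipy.optimize.nnls`.

```python
"""Hypothetical sketch: reference-based cell-type deconvolution via NNLS.

Model: bulk[g] ~= sum_k p[k] * reference[g][k], with p[k] >= 0.
Solved by projected gradient descent on 0.5 * ||A p - b||^2; the estimated
proportions are then rescaled to sum to 1.
"""


def deconvolve(bulk, reference, steps=5000):
    """Estimate cell-type proportions for one bulk sample.

    bulk:      list of expression values, one per gene
    reference: list of rows (one per gene), each listing per-cell-type values
    """
    n_genes, n_types = len(bulk), len(reference[0])
    # Step size 1 / ||A||_F^2 is at most 1 / L (L = largest eigenvalue of
    # A^T A), which keeps projected gradient descent convergent.
    step = 1.0 / sum(v * v for row in reference for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(steps):
        # Residual r = A p - b, one entry per gene.
        resid = [sum(reference[g][k] * p[k] for k in range(n_types)) - bulk[g]
                 for g in range(n_genes)]
        # Gradient of the loss is A^T r, one entry per cell type.
        grad = [sum(reference[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step, then project back onto p >= 0.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    return [x / total for x in p] if total > 0 else p


if __name__ == "__main__":
    # Toy reference: 4 genes x 3 hypothetical cell types (invented numbers).
    reference = [
        [10.0, 1.0, 0.5],
        [1.0, 8.0, 0.5],
        [0.5, 1.0, 6.0],
        [2.0, 2.0, 2.0],
    ]
    true_p = [0.6, 0.3, 0.1]
    bulk = [sum(row[k] * true_p[k] for k in range(3)) for row in reference]
    print(deconvolve(bulk, reference))
```

With noise-free toy data the estimate converges to the mixing proportions used to build `bulk`. Real whole-blood data would add noise and reference mismatch.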
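For limitation 3, the WTCS = 0 outcome comes from the CMap rule for combining the two set-level scores. The result is (ES_up - ES_down) / 2 when the up- and down-set enrichment scores have opposite signs, and 0 otherwise. The minimal sketch below assumes a GSEA-style weighted KS running sum with weight exponent p = 1. The function names are hypothetical, and the actual scoring in src/scoring.py may differ in detail.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """GSEA-style weighted KS enrichment score.

    ranked_genes: drug signature genes, most up-regulated first
    scores:       per-gene scores in the same order (|score|**p are hit weights)
    gene_set:     query genes (e.g. the disease up-set)
    Returns the signed extreme of the running sum (0.0 if nothing to score).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_hit == 0 or n_miss == 0 or hit_weight == 0:
        return 0.0
    running, extreme = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Hits step up by their weight share; misses step down uniformly.
        running += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


def wtcs(es_up, es_down):
    """Combine set-level scores; 0 when both enrichments share a sign.

    Negative values mean the drug reverses the disease signature."""
    def sign(x):
        return (x > 0) - (x < 0)

    if sign(es_up) == sign(es_down):
        return 0.0
    return (es_up - es_down) / 2.0
```

A reverser pushes disease-up genes to the bottom of its profile (ES_up < 0) and disease-down genes to the top (ES_down > 0), which gives a negative WTCS. If both query sets land at the same end, as described for L-glutamine, the score is 0 regardless of how strong each enrichment is.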
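For limitation 4, the sketch below shows why a mechanistic prior can demote a broad-effect drug without removing it. The linear blend, the 0.3 weight, the min-max rescaling of spec_z, the default prior of 0 and the drug names and values are all hypothetical. week3_scoring.py's actual prior blend may work differently.

```python
def blend_scores(spec_z, prior, weight=0.3):
    """Blend connectivity specificity with a mechanistic prior (hypothetical).

    spec_z: dict drug -> specificity z (higher = stronger specific reversal)
    prior:  dict drug -> mechanistic plausibility in [0, 1]; missing = 0
    Returns drug names sorted by blended score, best first.
    """
    lo, hi = min(spec_z.values()), max(spec_z.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero if all scores tie
    blended = {}
    for drug, z in spec_z.items():
        z_norm = (z - lo) / span  # rescale connectivity to [0, 1]
        blended[drug] = (1 - weight) * z_norm + weight * prior.get(drug, 0.0)
    return sorted(blended, key=blended.get, reverse=True)


if __name__ == "__main__":
    # Invented illustrative values, not Reverso results.
    spec = {"broad_effect_drug": 6.0, "mechanistic_candidate": 5.0,
            "unrelated_drug": 1.0}
    prior = {"broad_effect_drug": 0.1, "mechanistic_candidate": 0.9,
             "unrelated_drug": 0.2}
    print(blend_scores(spec, prior, weight=0.0))  # connectivity only
    print(blend_scores(spec, prior))              # with the prior
```

In this toy case the prior moves the broad-effect drug from first to second place, but it still outranks weaker candidates. This mirrors the limitation that the prior filters but does not eliminate such drugs.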
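For limitation 8, "signature overlap" is read here as the fraction of disease-signature genes present in the scoring gene space. That definition is an assumption; week2_lincs_extract.py and scripts/exp_genespace.py may compute it differently. The sketch uses made-up gene IDs (G1 to G9) for the two spaces. Only the absence of HBG1/HBG2 from both spaces mirrors the text.

```python
def signature_overlap(signature_genes, scored_genes):
    """Fraction of the disease signature covered by the scoring gene space,
    plus the sorted list of signature genes missing from that space."""
    sig = set(signature_genes)
    covered = sig & set(scored_genes)
    missing = sorted(sig - covered)
    return (len(covered) / len(sig) if sig else 0.0), missing


if __name__ == "__main__":
    # Hypothetical toy sets; only HBG1/HBG2 reflect the documented gap.
    signature = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "HBG1", "HBG2"]
    landmark_space = {"G1", "G2"}
    full_space = {f"G{i}" for i in range(1, 10)}
    print(signature_overlap(signature, landmark_space))
    print(signature_overlap(signature, full_space))
```

Returning the missing genes alongside the fraction shows at a glance which markers a gene space cannot score, even after the overlap improves.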
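For limitation 9, the contrast between the two scores can be sketched in pure Python. Following the commit message, spec_z is read as how many standard deviations a drug's score for the real query lies below its scores for 1,000 random queries of the same size. The tau-style score computes a CMap-like percentile against that same random null, which is the setup that saturates. `connectivity` is a deliberately simple stand-in for the KS score in src/scoring.py, and all function names are hypothetical.

```python
import math
import random
import statistics


def _mean(values):
    return math.fsum(values) / len(values)


def connectivity(drug, up, down):
    """Stand-in score (a mean difference, NOT the KS statistic): mean drug
    score of the query's up genes minus that of its down genes.
    Negative values mean the drug reverses the query."""
    return _mean([drug[g] for g in up]) - _mean([drug[g] for g in down])


def random_query_null(drug, n_up, n_down, n_random=1000, seed=0):
    """Scores of `n_random` random queries with the same up/down set sizes."""
    rng = random.Random(seed)
    genes = list(drug)
    null = []
    for _ in range(n_random):
        picked = rng.sample(genes, n_up + n_down)  # disjoint up/down sets
        null.append(connectivity(drug, picked[:n_up], picked[n_up:]))
    return null


def spec_z(observed, null):
    """Number of null SDs the observed score lies below the null mean."""
    return (statistics.mean(null) - observed) / statistics.stdev(null)


def tau_vs_random(observed, null):
    """Tau-style score: sign(observed) times the percentage of null scores
    with a smaller magnitude. Saturates at +/-100 once the query beats
    every random gene set."""
    below = sum(abs(s) < abs(observed) for s in null)
    return math.copysign(100.0 * below / len(null), observed)


if __name__ == "__main__":
    # Toy drug profile: 1,000 genes, with a query it strongly reverses.
    rng = random.Random(0)
    genes = [f"g{i}" for i in range(1000)]
    up, down = genes[:50], genes[50:100]
    drug = {g: rng.uniform(-1.0, 1.0) for g in genes}
    drug.update({g: -2.0 for g in up})
    drug.update({g: 2.0 for g in down})
    observed = connectivity(drug, up, down)
    null = random_query_null(drug, len(up), len(down))
    print(observed, spec_z(observed, null), tau_vs_random(observed, null))
```

Any query that beats every random gene set gets tau = ±100, however strong the reversal. spec_z keeps growing with the margin over the null, which is why it can still rank drugs.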

Recovery test outcome

Pre-registered test (v1, confirmatory): FAILED all three criteria (hydroxyurea rank 40, top 13%; L-glutamine rank 100; only 1/5 negative controls in the bottom half).

Post-hoc (v1.1, exploratory): hydroxyurea recovers to rank 18 (top 6%, passes), but L-glutamine (rank 213, no reversal) and the negative controls (2/5 in the bottom half) still fail, so the overall result is still FAIL. See recovery_test_report.md.

| Drug | Issue | v1.1 status |
|---|---|---|
| hydroxyurea | needed the full gene space | rank 18 (top 6%), recovered post-hoc |
| L-glutamine | metabolite; no reversal signal (positive connectivity) | rank 213, genuine negative |
| negative controls | reverse the generic inflammation signature | 2/5 in bottom half, criterion questionable |
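The "top X%" figures in the recovery outcome are rank divided by library size. This sketch assumes the 300-drug library implied by "ranks 100/300" in limitation 3. It also reads the criteria as "hydroxyurea in the top 10%" and "negative controls in the bottom half", following the text and commit message; the exact pre-registered thresholds are in recovery_test_report.md. The helper names are hypothetical.

```python
N_DRUGS = 300  # assumption: library size implied by "ranks 100/300"


def top_percent(rank, n_drugs=N_DRUGS):
    """Rank as a 'top X%' figure; rank 1 is the strongest reverser."""
    return 100.0 * rank / n_drugs


def passes_top_cut(rank, cut_percent=10.0, n_drugs=N_DRUGS):
    """Positive-control criterion: rank within the top `cut_percent` percent."""
    return top_percent(rank, n_drugs) <= cut_percent


def in_bottom_half(rank, n_drugs=N_DRUGS):
    """Negative-control criterion: rank in the lower half of the library."""
    return rank > n_drugs / 2


if __name__ == "__main__":
    # Ranks as reported in the recovery test outcome above.
    for label, rank in [("hydroxyurea v1", 40), ("hydroxyurea v1.1", 18),
                        ("L-glutamine v1", 100), ("L-glutamine v1.1", 213)]:
        print(f"{label}: rank {rank}, top {top_percent(rank):.1f}%, "
              f"top-10% pass={passes_top_cut(rank)}")
```

Rank 40 of 300 is the top 13.3%, just outside the 10% cut, while rank 18 is exactly the top 6%.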