Junior B. 72f1a49de6 Week 4: recovery test (FAIL, reported honestly) + 2-page report

Run the formal recovery test against the pre-registered criteria and
write the deliverable report (PLAN §6 Week 4):
- week4_recovery_test.py: evaluate hydroxyurea/L-glutamine + 5
  pre-specified negative controls vs the committed criteria
- recovery_test_report.md: methodology, FAIL result with diagnosis,
  top-10, lisinopril as the non-obvious candidate, limitations, v2
- known_limitations.md: L-glutamine coverage resolved, 12%-overlap
  driver, recovery outcome table

Outcome: FAIL on all 3 criteria (hydroxyurea top 13%, L-glutamine
WTCS=0, 1/5 negative controls bottom-half). Root cause is signature/
assay data limitations (lost erythroid+HbF axis, 12% landmark overlap),
not the matching algorithm — reported straight per the project ethos.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-23 22:38:56 +02:00

Known Limitations

The honest list of what would break this MVP at scale or in a different disease. Useful for the next pharma conversation: "yes, we know these are limitations, here's how v2 addresses them." Source: PLAN.md §9.

Cell-composition confound in sickle cell expression data. Whole-blood differential expression partly reflects different blood cell ratios, not disease biology. v1 acknowledges this; v2 should deconvolve cell types.
LINCS L1000 cell-line limitations. The 978 landmark genes were measured mostly in cancer cell lines (MCF7, A375, PC3, …). Signatures for non-oncology diseases may be noisy. A field-wide limitation, not unique to Reverso.
L-glutamine LINCS coverage — RESOLVED, opposite of expected. L-glutamine DOES have a Phase I signature (hydroxyurea is Phase-II-only) — both ground-truth drugs are scorable. But L-glutamine's connectivity is ambiguous (WTCS=0): its up- and down-set enrichments share a sign, so it shows no reversal. It ranks 100/300. So the ground-truth test effectively rests on hydroxyurea, which itself only reaches top 13% (raw) — see the recovery test report.
Connectivity scoring surfaces broad-effect drugs as false positives. HDAC inhibitors and broad kinase inhibitors often top connectivity rankings simply because they perturb many genes. The mechanistic prior (Week 3) helps filter, but does not eliminate this.
Hydroxyurea was expected to pass the recovery test by construction. Sickle cell + hydroxyurea is a well-studied pair, so passing would have been necessary but not sufficient to claim the platform generalizes. In the Week 4 test it did not pass on the raw ranking (rank 40, top 13%). The next disease is the real test: do not sell sickle cell results as proving the platform.
No mechanistic validation layer. Pure ML matching is not sufficient for extrapolation (flagged by multiple experts). The MVP knowingly omits the mechanistic layer; it is a phase-2 addition. Position the MVP as "discovery hypothesis generation," not "validated prediction."
Top-ranked novel candidates are not wet-lab validated. They are computational hypotheses to test, not discoveries. Use careful language in any write-up.
Only 12% of the signature is LINCS-scorable (56/477 genes). The 978 landmark genes (from cancer cell lines) miss the erythroid hallmark genes (CA1, AHSP, SLC4A1, HBG). Connectivity scoring runs on a thin inflammation/metabolic slice — the single biggest driver of the recovery-test failure. v2 fix: signature prediction or a mechanism graph to score the other 88%.
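
Two of the numbers in the list above can be illustrated with a minimal sketch. It assumes the published CMap convention for the weighted connectivity score (zero when the up-set and down-set enrichment scores share a sign, otherwise half their difference); whether Reverso's scoring code follows exactly this convention is an assumption. The function names, the enrichment values, and the `LM_*` gene labels are hypothetical and chosen only for illustration.

```python
def _sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def wtcs(es_up: float, es_down: float) -> float:
    """Weighted connectivity score from two enrichment scores.

    Assumed convention: if the enrichment scores of the disease-up and
    disease-down gene sets have the same sign, the query is ambiguous
    and the score is 0. Otherwise the score is (es_up - es_down) / 2,
    which is negative when the drug reverses the disease signature.
    """
    if _sign(es_up) == _sign(es_down):
        return 0.0
    return (es_up - es_down) / 2.0


def scorable_fraction(signature_genes, landmark_genes) -> float:
    """Fraction of the disease signature that falls in the landmark set."""
    signature = set(signature_genes)
    return len(signature & set(landmark_genes)) / len(signature)


# Illustrative values only: both enrichments positive, so no reversal signal.
print(wtcs(0.4, 0.3))    # same sign, score is zero
# Illustrative values only: up-set pushed down, down-set pushed up.
print(wtcs(-0.5, 0.3))   # opposite signs, negative score (reversal)

# Toy gene lists: the erythroid genes named above are absent from the
# hypothetical landmark set, so only one of five genes is scorable.
toy_signature = ["CA1", "AHSP", "SLC4A1", "HBG", "LM_1"]
toy_landmarks = ["LM_1", "LM_2", "LM_3"]
print(scorable_fraction(toy_signature, toy_landmarks))

# The reported overlap: 56 of 477 signature genes.
print(f"{56 / 477:.1%}")
```

The same-sign branch is what produces the WTCS=0 reported for L-glutamine, and the last line reproduces the roughly 12% overlap from the counts given above.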

Recovery test outcome (Week 4)

The MVP failed all three pre-registered criteria on the primary raw ranking (hydroxyurea rank 40, top 13%; L-glutamine rank 100, WTCS=0; 1/5 negative controls in the bottom half). The root cause is the signature/assay data limitations listed above, not the matching algorithm. See recovery_test_report.md.

| Drug | Issue | Handling |
| --- | --- | --- |
| hydroxyurea | HbF mechanism not in scorable gene space | scored (rank 40); recovered only by prior-weighted ranking |
| L-glutamine | signature present but WTCS ambiguous (=0) | scored (rank 100); no reversal signal |
| all 300 | had LINCS signatures | 0 marked "not scored"; coverage was not the issue, specificity was |
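
The "top 13%" figure follows directly from the ranks in the table. The snippet below is a small sketch of that arithmetic, using only the ranks and the 300-drug total reported here; the helper name `top_percent` is hypothetical, and the pre-registered thresholds themselves are not reproduced because they are defined in the recovery test report rather than in this file.

```python
def top_percent(rank: int, n_drugs: int) -> float:
    """Convert a 1-based rank into a 'top X%' position in the ranking."""
    return 100.0 * rank / n_drugs


N_DRUGS = 300  # total scored drugs, as reported above
reported_ranks = {"hydroxyurea": 40, "L-glutamine": 100}

for drug, rank in reported_ranks.items():
    print(f"{drug}: rank {rank}/{N_DRUGS} = top {top_percent(rank, N_DRUGS):.0f}%")
# hydroxyurea: rank 40/300 = top 13%
# L-glutamine: rank 100/300 = top 33%
```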
