v1.1: full gene space + specificity z-score; hydroxyurea recovers
Post-hoc improvement after the pre-registered v1 recovery test failed.

Two changes, diagnosing v1's failure:
- score on the full 12,328-gene LINCS space (week2_lincs_extract.py), lifting signature overlap from 12% to 85% (brings erythroid markers in)
- src/scoring.py: KS connectivity + per-drug specificity z-score (spec_z = SDs below a 1,000 random-query null). Primary ranking is now spec_z. (Textbook tau saturated at +/-100 for a coherent query, which is documented; it needs a reference-signature library, a v2 item.)
- week3_scoring.py: spec_z primary + WTCS reference + prior-blended
- tests: tau/spec_z calibration test; 19 passing
- scripts/exp_genespace.py: the BING vs all-12,328 comparison

Result: hydroxyurea recovers (rank 40 -> 18, top 6%, passes top-10%), confirming the v1 failure was the landmark bottleneck, not the algorithm.

Overall STILL FAILS: L-glutamine does not reverse (rank 213, metabolite), and negative controls (norethindrone, ciprofloxacin) rank top-3: connectivity != therapeutic relatedness.

v1.1 is post-hoc/exploratory, not a confirmatory test; reported as such in recovery_test_report.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
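The commit message describes spec_z only as "SDs below a 1,000 random-query null". A minimal sketch of that idea follows, assuming a classic signed KS enrichment statistic and a null built from random gene sets of the same sizes as the query. The contents of src/scoring.py are not shown here, so every function name, the seed, and the toy profile below are hypothetical and may differ from the real implementation.

```python
import numpy as np

def ks_score(positions, n_genes):
    """Signed KS-style enrichment of a gene set, given the 1-based positions
    of its genes in a drug profile sorted from most up- to most down-regulated.
    Positive: the set sits near the top. Negative: near the bottom."""
    v = np.sort(np.asarray(positions, dtype=float))
    t = len(v)
    if t == 0:
        return 0.0
    j = np.arange(1, t + 1)
    a = np.max(j / t - v / n_genes)
    b = np.max(v / n_genes - (j - 1) / t)
    return float(a) if a > b else float(-b)

def connectivity(up_pos, down_pos, n_genes):
    """Negative values mean the drug reverses the query signature
    (query-up genes sink, query-down genes rise)."""
    ks_up = ks_score(up_pos, n_genes)
    ks_down = ks_score(down_pos, n_genes)
    if np.sign(ks_up) == np.sign(ks_down):
        return 0.0  # both sets move the same way: no coherent connection
    return ks_up - ks_down

def spec_z(up_pos, down_pos, n_genes, n_null=1000, seed=0):
    """How many null SDs the observed score lies below the null mean.
    The null is n_null random queries with the same up/down set sizes."""
    rng = np.random.default_rng(seed)
    observed = connectivity(up_pos, down_pos, n_genes)
    n_up, n_down = len(up_pos), len(down_pos)
    null = np.empty(n_null)
    for i in range(n_null):
        # Draw disjoint random positions (1-based) for the fake up/down sets.
        draw = rng.choice(n_genes, size=n_up + n_down, replace=False) + 1
        null[i] = connectivity(draw[:n_up], draw[n_up:], n_genes)
    sd = null.std()
    if sd == 0:
        return 0.0
    return float((null.mean() - observed) / sd)

# Toy profile of 200 genes: query-up genes at the bottom of the drug profile,
# query-down genes at the top, i.e. a perfect reverser.
n = 200
up, down = np.arange(191, 201), np.arange(1, 11)
score = connectivity(up, down, n)
z = spec_z(up, down, n)
```

In this sketch a larger positive spec_z means the drug reverses the query more strongly than random gene sets of the same size would, which is why it can serve as a ranking key where a saturated tau cannot.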
@@ -34,20 +34,28 @@ Source: PLAN.md §9.
 7. **Top-ranked novel candidates are not wet-lab validated.** They are computational hypotheses
 to test, not discoveries. Use careful language in any write-up.
 
-8. **Only 12% of the signature is LINCS-scorable (56/477 genes).** The 978 landmark genes (from
-cancer cell lines) miss the erythroid hallmark genes (CA1, AHSP, SLC4A1, HBG). Connectivity
-scoring runs on a thin inflammation/metabolic slice — the single biggest driver of the
-recovery-test failure. v2 fix: signature prediction or a mechanism graph to score the other 88%.
+8. **Gene-space bottleneck (v1 → fixed in v1.1).** v1 scored on only the 978 landmark genes (12%
+signature overlap) — the main driver of the v1 failure. v1.1 uses the full 12,328-gene space
+(85% overlap) and recovers hydroxyurea. HBG1/HBG2 remain absent from LINCS entirely.
 
-## Recovery test outcome (Week 4)
+9. **No reference-signature library for tau.** Textbook CMap tau saturated at ±100 (a coherent
+query always out-connects random gene sets). v1.1 substitutes a per-drug specificity z-score.
+Proper tau needs a library of real reference signatures — a v2 / curated-data item.
 
-The MVP **failed** all three pre-registered criteria on the primary raw ranking (hydroxyurea
-rank 40/top 13%; L-glutamine rank 100/WTCS=0; 1/5 negative controls in bottom half). The failure
-is fully attributable to signature/assay data limitations above, not the matching algorithm. See
+10. **Negative-control criterion may be invalid for connectivity scoring.** Unrelated drugs
+(norethindrone, ciprofloxacin) rank as top specific reversers — connectivity measures
+expression reversal, not therapeutic relatedness.
+
+## Recovery test outcome
+
+Pre-registered test (**v1, confirmatory**): **FAILED** all three criteria (hydroxyurea rank
+40/top 13%; L-glutamine rank 100; 1/5 negative controls bottom-half). Post-hoc (**v1.1,
+exploratory**): hydroxyurea recovers to rank 18 (top 6%, passes), but L-glutamine (rank 213, does
+not reverse) and negative controls (2/5) still fail → overall still FAIL. See
 `recovery_test_report.md`.
 
-| Drug | Issue | Handling |
+| Drug | Issue | v1.1 status |
 |---|---|---|
-| hydroxyurea | HbF mechanism not in scorable gene space | scored (rank 40); recovered only by prior-weighted ranking |
-| L-glutamine | signature present but WTCS ambiguous (=0) | scored (rank 100); no reversal signal |
-| all 300 | had LINCS signatures | 0 marked "not scored" — coverage was not the issue; specificity was |
+| hydroxyurea | needed the full gene space | rank 18 (top 6%) — recovered post-hoc |
+| L-glutamine | metabolite, no reversal signal (positive connectivity) | rank 213 — genuine negative |
+| neg controls | reverse the generic inflammation signature | 2/5 bottom-half — criterion questionable |
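The percentages quoted in the diff can be reproduced with a few lines of arithmetic. The sketch below assumes a ranked library of 300 drugs (taken from the "all 300" table row) and a top-10% pass cutoff for hydroxyurea (from the commit message). The helper names are hypothetical, and the L-glutamine and negative-control criteria are not encoded because their exact thresholds are not stated here.

```python
def top_fraction(rank, n_drugs):
    """Fraction of the ranked library at or above this rank (1 = best)."""
    return rank / n_drugs

def passes_top_fraction(rank, n_drugs, cutoff=0.10):
    """True if the drug lands within the top `cutoff` share of the ranking."""
    return top_fraction(rank, n_drugs) <= cutoff

N_DRUGS = 300  # assumption: library size read off the "all 300" table row

for label, rank in [("v1", 40), ("v1.1", 18)]:
    frac = top_fraction(rank, N_DRUGS)
    verdict = "pass" if passes_top_fraction(rank, N_DRUGS) else "fail"
    print(f"hydroxyurea {label}: rank {rank}/{N_DRUGS} = top {frac:.0%} ({verdict})")

# v1 signature overlap on the landmark genes: 56 of 477 signature genes.
v1_overlap = 56 / 477
print(f"v1 overlap: {v1_overlap:.0%}")
```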