v1.1: full gene space + specificity z-score; hydroxyurea recovers
Post-hoc improvement after the pre-registered v1 recovery test failed.
Two changes, each targeting a diagnosed cause of v1's failure:

- score on the full 12,328-gene LINCS space (week2_lincs_extract.py),
  lifting signature overlap from 12% to 85% and bringing the erythroid
  markers in
- src/scoring.py: KS connectivity + per-drug specificity z-score
  (spec_z = how many SDs the drug's score sits below a null built from
  1,000 random queries). spec_z is now the primary ranking. (Textbook
  tau saturated at +/-100 for a coherent query; this is documented. A
  proper tau needs a reference-signature library, which is a v2 item.)
- week3_scoring.py: spec_z as primary, WTCS as a reference, plus a
  prior-blended ranking
- tests: add a tau/spec_z calibration test; 19 passing
- scripts/exp_genespace.py: the BING vs all-12,328 comparison

Result: hydroxyurea recovers (rank 40 -> 18, top 6%, passing the
top-10% criterion), confirming that v1 failed because of the landmark
bottleneck, not the algorithm. Overall the test STILL FAILS: L-glutamine
does not reverse the signature (rank 213; it is a metabolite), and the
negative controls (norethindrone, ciprofloxacin) rank in the top 3.
Connectivity != therapeutic relatedness. v1.1 is post-hoc/exploratory,
not a confirmatory test, and is reported as such in
recovery_test_report.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
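The KS connectivity score and spec_z described in the commit message can be sketched as follows. This is a minimal sketch, not the code in src/scoring.py. It assumes the classic Connectivity Map signed-KS formulation (score = ks_up - ks_down, set to 0 when both have the same sign), a drug signature given as genes ranked from most up- to most down-regulated, and spec_z = (null mean - observed) / null SD, so a positive value means the drug reverses the query more strongly than random queries of the same sizes. The function names `ks_signed`, `connectivity`, and `specificity_z` are hypothetical.

```python
import numpy as np


def ks_signed(v, n):
    """Signed KS enrichment of a query in a ranked list (CMap-style, assumed).

    v: sorted 1-based positions of the query genes in a list of length n,
       ordered from most up- to most down-regulated by the drug.
    Positive -> query sits near the top (up end); negative -> near the bottom.
    """
    t = len(v)
    if t == 0:
        return 0.0
    j = np.arange(1, t + 1)
    a = np.max(j / t - v / n)
    b = np.max(v / n - (j - 1) / t)
    return float(a) if a > b else float(-b)


def connectivity(v_up, v_down, n):
    ks_up, ks_down = ks_signed(v_up, n), ks_signed(v_down, n)
    # Zero when up and down query sets shift the same way (no coherent signal)
    if np.sign(ks_up) == np.sign(ks_down):
        return 0.0
    return ks_up - ks_down  # negative = drug reverses the query


def specificity_z(ranked_genes, up, down, n_null=1000, seed=0):
    """Return (observed connectivity, spec_z) for one drug signature.

    spec_z (assumed definition) = (null mean - observed) / null SD, i.e. how many
    SDs the drug's score sits below scores from random queries of the same sizes.
    """
    n = len(ranked_genes)
    pos = {g: i + 1 for i, g in enumerate(ranked_genes)}
    # Query genes missing from the drug's gene space are simply dropped
    v_up = np.sort([pos[g] for g in set(up) if g in pos])
    v_down = np.sort([pos[g] for g in set(down) if g in pos])
    observed = connectivity(v_up, v_down, n)

    rng = np.random.default_rng(seed)
    k_up, k_down = len(v_up), len(v_down)
    null = np.empty(n_null)
    for i in range(n_null):
        # Random query: same up/down sizes, disjoint genes, same drug ranking
        pick = rng.choice(n, size=k_up + k_down, replace=False) + 1
        null[i] = connectivity(np.sort(pick[:k_up]), np.sort(pick[k_up:]), n)

    sd = null.std(ddof=1)
    spec_z = (null.mean() - observed) / sd if sd > 0 else 0.0
    return observed, float(spec_z)
```

With a toy 200-gene ranking where the up-query sits at the bottom and the down-query at the top, the observed score is about -1.9 and spec_z is strongly positive; swapping the two sets flips both signs. Building the null against each drug's own ranking is what makes the z-score per-drug.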
@@ -61,9 +61,10 @@ Reproduce with `scripts/week1_explore.py` (download + DE + concordance) then
38%, as expected). 43 drugs carry target annotations; 46 carry mechanism-of-action.
- **Tier:** all signature-backed drugs are Tier B (LINCS is a single source → fails Tier A's
not-single-source rule).
- **Signature↔landmark overlap:** only 56/477 (12%) of the disease signature genes are LINCS
landmarks, so connectivity scoring (Week 3) uses a 30-up/26-down query. The erythroid hallmark
genes (CA1, AHSP, SLC4A1, HBG) are NOT landmarks. This is a key limitation for the recovery test.
- **Gene space (v1.1):** scoring uses the full **12,328-gene** LINCS space, not just the 978
landmarks. Signature overlap is 406/477 (85%) vs 56/477 (12%) for landmark-only — the larger
space is what recovers hydroxyurea (see recovery_test_report.md). HBG1/HBG2 are absent from
LINCS entirely and remain unscoreable.
- Reproduce: `week2_curate_drugset.py` → `week2_chembl.py` → download Level-5 GCTX →
`week2_lincs_extract.py` → `week2_assemble.py`.
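The overlap figures above come down to a set intersection between the disease signature and whichever gene space is used for scoring. A minimal sketch, using a hypothetical `coverage` helper and toy gene sets that are NOT the real landmark or 12,328-gene lists; the toy sets only mirror what the text states (the erythroid markers are not landmarks, and HBG1/HBG2 are absent from LINCS altogether). GENE_A/B/C are placeholders.

```python
def coverage(signature, gene_space):
    """Split signature genes into those scoreable in gene_space and those dropped."""
    space = set(gene_space)
    sig = set(signature)
    kept = sorted(sig & space)
    dropped = sorted(sig - space)
    return kept, dropped, len(kept) / len(sig)


# Toy stand-ins (hypothetical): a "landmark" set lacking the erythroid markers and
# a "full" space that has CA1/AHSP/SLC4A1 but, as stated for LINCS, no HBG1/HBG2.
signature = ["CA1", "AHSP", "SLC4A1", "HBG1", "HBG2", "GENE_A", "GENE_B", "GENE_C"]
landmarks = {"GENE_A", "GENE_B"}
full_space = landmarks | {"GENE_C", "CA1", "AHSP", "SLC4A1"}

_, dropped_lm, frac_lm = coverage(signature, landmarks)
kept_full, dropped_full, frac_full = coverage(signature, full_space)
print(f"landmark-only: {frac_lm:.0%}, dropped {dropped_lm}")
print(f"full space:    {frac_full:.0%}, dropped {dropped_full}")

# The same arithmetic on the real counts quoted above: 56/477 and 406/477
print(f"{56/477:.0%} vs {406/477:.0%}")
```

The last line reproduces the 12% vs 85% comparison; genes left in `dropped` under the full space (HBG1/HBG2 here) are the ones no gene-space change can recover.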