v1.1: full gene space + specificity z-score; hydroxyurea recovers
Post-hoc improvement after the pre-registered v1 recovery test failed. Two changes, diagnosing v1's failure: - score on the full 12,328-gene LINCS space (week2_lincs_extract.py), lifting signature overlap from 12% to 85% (brings erythroid markers in) - src/scoring.py: KS connectivity + per-drug specificity z-score (spec_z = SDs below a 1,000 random-query null). Primary ranking is now spec_z. (Textbook tau saturated at +/-100 for a coherent query — documented; needs a reference-signature library, a v2 item.) - week3_scoring.py: spec_z primary + WTCS reference + prior-blended - tests: tau/spec_z calibration test; 19 passing - scripts/exp_genespace.py: the BING vs all-12,328 comparison Result: hydroxyurea recovers (rank 40 -> 18, top 6%, passes top-10%), confirming the v1 failure was the landmark bottleneck not the algorithm. Overall STILL FAILS: L-glutamine does not reverse (rank 213, metabolite), and negative controls (norethindrone, ciprofloxacin) rank top-3 — connectivity != therapeutic relatedness. v1.1 is post-hoc/exploratory, not a confirmatory test; reported as such in recovery_test_report.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -36,6 +36,7 @@ def main() -> None:
|
||||
return int(df.loc[name, "rank"]) if name in df.index else None
|
||||
|
||||
hu, glut = rk("hydroxyurea"), rk("glutamine")
|
||||
glut_z = df.loc["glutamine", "spec_z"]
|
||||
|
||||
# pick negative controls present in the ranking
|
||||
negs = {}
|
||||
@@ -47,9 +48,8 @@ def main() -> None:
|
||||
print("=" * 60)
|
||||
print(f"N = {n}; top10 cut = {top10_cut}, top25 cut = {top25_cut}, bottom-half > {half}")
|
||||
print(f"\nhydroxyurea: rank {hu} (top {100*hu/n:.1f}%) -> top-10%? {hu <= top10_cut}")
|
||||
glut_score = df.loc["glutamine", "connectivity_score"]
|
||||
print(f"L-glutamine: rank {glut} (top {100*glut/n:.1f}%), WTCS={glut_score:.3f} "
|
||||
f"-> top-25%? {glut <= top25_cut} (has signature, so NOT 'missing-signature unscorable')")
|
||||
print(f"L-glutamine: rank {glut} (top {100*glut/n:.1f}%), spec_z={glut_z:+.2f} "
|
||||
f"-> top-25%? {glut <= top25_cut} (positive z => does not reverse; has a signature)")
|
||||
print("\nnegative controls (pre-specified, 1 per category):")
|
||||
n_bottom = 0
|
||||
for d, (cat, r) in negs.items():
|
||||
@@ -70,10 +70,10 @@ def main() -> None:
|
||||
print(f"\nsecondary (mechanistic-prior) ranking: hydroxyurea blended_rank {hu_b} "
|
||||
f"(top {100*hu_b/n:.1f}%)")
|
||||
|
||||
print("\n--- TOP 10 (raw connectivity) ---")
|
||||
top10 = df.nsmallest(10, "connectivity_score")
|
||||
print("\n--- TOP 10 (primary spec_z ranking) ---")
|
||||
top10 = df.sort_values("rank").head(10)
|
||||
for name, r in top10.iterrows():
|
||||
print(f" {int(r['rank']):2d} {name:18s} {r['connectivity_score']:+.3f} "
|
||||
print(f" {int(r['rank']):2d} {name:18s} z={r['spec_z']:+.2f} "
|
||||
f"[{r['inclusion_reason']}] {str(r['known_targets'])[:45]}")
|
||||
|
||||
|
||||
|
||||
Reference in New Issue
Block a user