v1.1: full gene space + specificity z-score; hydroxyurea recovers

Post-hoc improvement after the pre-registered v1 recovery test failed.
Two changes, diagnosing v1's failure:

- score on the full 12,328-gene LINCS space (week2_lincs_extract.py), lifting signature overlap from 12% to 85% (brings erythroid markers in)
- src/scoring.py: KS connectivity + per-drug specificity z-score (spec_z = SDs below a 1,000 random-query null). Primary ranking is now spec_z. (Textbook tau saturated at +/-100 for a coherent query; documented. It needs a reference-signature library, a v2 item.)
- week3_scoring.py: spec_z primary + WTCS reference + prior-blended
- tests: tau/spec_z calibration test; 19 passing
- scripts/exp_genespace.py: the BING vs all-12,328 comparison

Result: hydroxyurea recovers (rank 40 -> 18, top 6%, passes top-10%), confirming the v1 failure was the landmark bottleneck, not the algorithm.

Overall STILL FAILS: L-glutamine does not reverse (rank 213, metabolite), and negative controls (norethindrone, ciprofloxacin) rank top-3: connectivity != therapeutic relatedness.

v1.1 is post-hoc/exploratory, not a confirmatory test; reported as such in recovery_test_report.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
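For readers unfamiliar with the scoring change, here is a minimal sketch of a specificity z-score layered on a KS connectivity score. It is not the code in src/scoring.py: the function names (`ks_score`, `connectivity`, `specificity_z`), the unweighted KS statistic, and the way the random-query null is drawn are all assumptions made for illustration. Only the idea comes from the commit message: compare a drug's connectivity for the real query against its connectivity for random queries of the same size, and report the difference in standard deviations.

```python
import numpy as np


def ks_score(positions, n):
    """Signed, unweighted KS statistic for 1-based positions in a ranked list of n genes.

    Positive: the gene set sits near the top of the list. Negative: near the bottom.
    """
    v = np.sort(np.asarray(positions))
    t = len(v)
    j = np.arange(1, t + 1)
    a = np.max(j / t - v / n)
    b = np.max(v / n - (j - 1) / t)
    return a if a > b else -b


def connectivity(profile, up, down):
    """Connectivity of one drug profile to a query (assumed form, not the repo's).

    profile: 1-D array of drug-induced expression changes.
    up, down: integer indices of the query's up- and down-regulated genes.
    Negative values mean the drug reverses the query.
    """
    n = len(profile)
    order = np.argsort(-profile)          # most up-regulated gene first
    pos = np.empty(n, dtype=int)
    pos[order] = np.arange(1, n + 1)      # 1-based rank of every gene
    ks_up = ks_score(pos[up], n)
    ks_down = ks_score(pos[down], n)
    if np.sign(ks_up) == np.sign(ks_down):
        return 0.0                        # incoherent: both sets on the same side
    return float(ks_up - ks_down)


def specificity_z(profile, up, down, n_null=1000, seed=0):
    """Observed connectivity expressed in SDs relative to a random-query null."""
    rng = np.random.default_rng(seed)
    n = len(profile)
    observed = connectivity(profile, up, down)
    null = np.empty(n_null)
    for k in range(n_null):
        # Random query with the same set sizes as the real one.
        pick = rng.choice(n, size=len(up) + len(down), replace=False)
        null[k] = connectivity(profile, pick[:len(up)], pick[len(up):])
    sd = null.std()
    return float((observed - null.mean()) / sd) if sd > 0 else 0.0
```

Under these assumptions, a drug that pushes the query's up genes to the bottom of its profile and the down genes to the top gets a strongly negative value, while a drug whose profile is unrelated to the query lands near the middle of its own null.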
@@ -114,6 +114,33 @@ class TestMechanisticPrior:
         assert mechanistic_prior(["Some unrelated kinase"]) == 0.0


+class TestTauCalibration:
+    """tau should reward a SPECIFIC reverser and give a near-zero score to a noise drug."""
+
+    @staticmethod
+    def _matrix() -> pd.DataFrame:
+        genes = [f"U{i}" for i in range(5)] + [f"D{i}" for i in range(5)] + [f"G{i}" for i in range(40)]
+        rng_vals = {g: 0.01 * ((hash(g) % 7) - 3) for g in genes}  # tiny deterministic noise
+        # specific reverser: query-up genes at the bottom, query-down at the top, rest ~0
+        specific = dict(rng_vals)
+        for i in range(5):
+            specific[f"U{i}"] = -8 - i
+            specific[f"D{i}"] = 8 + i
+        noise = dict(rng_vals)
+        return pd.DataFrame([specific, noise], index=["specific", "noise"])[genes]
+
+    def test_specific_reverser_has_strongly_negative_tau(self):
+        from src.scoring import tau_calibrate
+        up = [f"U{i}" for i in range(5)]
+        down = [f"D{i}" for i in range(5)]
+        out = tau_calibrate(up, down, self._matrix(), n_null=300, seed=0)
+        # Ranked by spec_z (continuous); the specific reverser is the most negative.
+        assert out.loc["specific", "spec_z"] < -2
+        assert out.loc["specific", "spec_z"] < out.loc["noise", "spec_z"]
+        assert out.loc["specific", "tau"] < -50  # tau also flags it (may saturate near -100)
+        assert out.loc["specific", "rank"] == 1
+
+
 def test_rank_drugs_orders_by_reversal():
     from src.scoring import rank_drugs
     genes = ["U1", "U2", "D1", "D2"] + [f"N{i}" for i in range(10)]
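One note on the `_matrix` fixture: the `rng_vals` line is commented as deterministic, but Python salts the built-in `hash()` of strings per process unless `PYTHONHASHSEED` is set, so those background values can differ from one test run to the next. The planted values of magnitude 8 or more probably dominate either way, so the assertions likely still hold, but the exact `spec_z` values may not be reproducible. A minimal sketch of a run-to-run stable alternative, assuming a CRC32 of the gene name is an acceptable stand-in (the helper name `stable_noise` is hypothetical and not part of the commit):

```python
import zlib


def stable_noise(gene: str, scale: float = 0.01) -> float:
    """Small pseudo-noise value derived from the gene name.

    zlib.crc32 returns the same value in every process, unlike the
    built-in hash() of a str, which is salted per interpreter run.
    """
    return scale * ((zlib.crc32(gene.encode("utf-8")) % 7) - 3)


# Same gene layout as the fixture in the diff above.
genes = [f"U{i}" for i in range(5)] + [f"D{i}" for i in range(5)] + [f"G{i}" for i in range(40)]
rng_vals = {g: stable_noise(g) for g in genes}
```

The values stay in the same range as the original expression, between -0.03 and 0.03, so the rest of the fixture would not need to change.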