Reverso

Files

Junior B. 649f617019 Phase D: supervised cross-disease (0.925 AUC degree-bias mirage)

Train GradientBoosting on 300 drugs x 839 GEO disease signatures with
Repurposing-Hub indications as labels (432 positives), disease-grouped CV.

Finding: the 0.925 CV AUC looks like a win but is a MIRAGE. Feature
importances are all drug-level (drug_std 0.33, drug_mean 0.30,
broadness 0.17); drug-disease connectivity importance = 0.01. The model
learned a drug-POPULARITY prior, not disease-specific matching. On
held-out sickle it ranks hydroxyurea 231/300 (worse than baseline) and
puts promiscuous drugs (dexamethasone, methotrexate) at the top. Classic
degree-bias trap. On its own, connectivity is also at ~chance AUC (0.51)
for predicting approved indications.
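
One way to test for this kind of degree bias is to score every held-out
pair with nothing but the drug's training-set indication count. If that
alone gives a high AUC, the headline number is mostly a popularity prior.
A minimal sketch in plain Python; the function names and the toy pairs
are hypothetical and are not taken from phaseD_supervised.py:

```python
from collections import Counter


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def degree_baseline_auc(train_positives, test_pairs, test_labels):
    """Score each test pair by how many training indications its drug has."""
    degree = Counter(drug for drug, _ in train_positives)
    scores = [degree[drug] for drug, _ in test_pairs]
    return auc(scores, test_labels)


# Toy data (made up): drug_A is "popular", drug_D has no known indication.
train_positives = [
    ("drug_A", "dis_1"), ("drug_A", "dis_2"), ("drug_A", "dis_3"),
    ("drug_B", "dis_1"), ("drug_B", "dis_2"),
    ("drug_C", "dis_1"),
]
# Held-out disease dis_4: only the two popular drugs are indicated.
test_pairs = [("drug_A", "dis_4"), ("drug_B", "dis_4"),
              ("drug_C", "dis_4"), ("drug_D", "dis_4")]
test_labels = [1, 1, 0, 0]

baseline = degree_baseline_auc(train_positives, test_pairs, test_labels)
print(f"degree-only AUC on held-out disease: {baseline:.3f}")
```

The baseline never looks at the disease, so any AUC it reaches is AUC
that a supervised model can also reach without learning drug-disease
matching.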

Both obvious approaches now fail instructively: unsupervised hits a
specificity ceiling; naive supervised learns degree bias. Real progress
needs degree-debiased training plus a much larger, cleaner label set
(a research effort).
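
One possible form of degree debiasing, sketched here as an assumption
rather than as the planned method: for every positive (drug, disease)
pair, draw a negative that reuses the same drug with a disease it is not
labelled for. Each drug then appears equally often among positives and
negatives, so drug-only features carry no label signal. The helper name
and the toy pairs are hypothetical, and unlabelled pairs are treated as
negatives, which is itself an assumption given incomplete labels.

```python
import random
from collections import Counter


def drug_matched_negatives(positives, diseases, seed=0):
    """For each positive pair, sample one unlabelled pair with the same drug."""
    rng = random.Random(seed)
    positive_set = set(positives)
    used = set()
    negatives = []
    for drug, _ in positives:
        # Diseases this drug is not labelled for and not already drawn.
        candidates = [d for d in diseases
                      if (drug, d) not in positive_set and (drug, d) not in used]
        if not candidates:
            continue  # drug is labelled for (almost) everything; skip it
        pick = (drug, rng.choice(candidates))
        used.add(pick)
        negatives.append(pick)
    return negatives


# Toy data (made up).
positives = [("drug_A", "dis_1"), ("drug_A", "dis_2"), ("drug_B", "dis_1")]
diseases = ["dis_1", "dis_2", "dis_3", "dis_4"]

negatives = drug_matched_negatives(positives, diseases)
pos_degree = Counter(drug for drug, _ in positives)
neg_degree = Counter(drug for drug, _ in negatives)
print(pos_degree, neg_degree)
```

With balanced per-drug counts, a model that still scores well has to use
drug-disease features such as connectivity rather than drug popularity.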

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-23 23:31:32 +02:00

exp_deconv_signature.py

Experiment: composition-adjusted signature (negative result)

2026-06-23 23:05:08 +02:00

exp_genespace.py

v1.1: full gene space + specificity z-score; hydroxyurea recovers

2026-06-23 22:57:30 +02:00

phaseA_reference_tau.py

Phase A: reference-library tau (negative result on specificity)

2026-06-23 23:19:26 +02:00

phaseD_supervised.py

Phase D: supervised cross-disease (0.925 AUC degree-bias mirage)