Reverso

Author	SHA1	Message	Date
Junior B.	72f1a49de6	Week 4: recovery test (FAIL, reported honestly) + 2-page report Run the formal recovery test against the pre-registered criteria and write the deliverable report (PLAN §6 Week 4): - week4_recovery_test.py: evaluate hydroxyurea/L-glutamine + 5 pre-specified negative controls vs the committed criteria - recovery_test_report.md: methodology, FAIL result with diagnosis, top-10, lisinopril as the non-obvious candidate, limitations, v2 - known_limitations.md: L-glutamine coverage resolved, 12%-overlap driver, recovery outcome table Outcome: FAIL on all 3 criteria (hydroxyurea top 13%, L-glutamine WTCS=0, 1/5 negative controls bottom-half). Root cause is signature/assay data limitations (lost erythroid+HbF axis, 12% landmark overlap), not the matching algorithm; reported straight per the project ethos. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 22:38:56 +02:00
Junior B.	fd4591949c	Week 3: CMap connectivity scoring engine + ranked candidates Implement the matching engine (PLAN §6 Week 3): - src/scoring.py: weighted-KS/GSEA enrichment, weighted connectivity score (WTCS, Lamb 2006 / Subramanian 2017), signed NCS normalization, rank_drugs, and a sickle-pathway mechanistic prior - tests/test_scoring.py: real reference tests for the scorer (perfect reversal<null<mimic, same-sign->0, absent-gene invariance) + prior - week3_scoring.py: score 300 drugs -> ranked_candidates_v1.csv with a raw ranking and a secondary mechanistic-prior-weighted ranking Preliminary (formal recovery test is Week 4): hydroxyurea raw rank 40/300 (top 13%, just misses pre-registered top-10%), blended rank 7; L-glutamine WTCS=0 (ambiguous). Notably anti-inflammatory SCD drugs cluster in the raw top tier — the engine reverses the inflammation axis, not the erythroid axis, traceable to the 12% landmark-overlap caveat. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 22:34:56 +02:00
Junior B.	47b0094079	Week 2: 300-drug profiles with LINCS signatures + ChEMBL Build the drug profile dataset (PLAN §6 Week 2): - week2_curate_drugset.py: 300-drug set (2 ground-truth + 32 related-mechanism + 26 negative-control + 240 random), restricted to LINCS-scorable compounds, seed=42 - week2_chembl.py: InChIKey->ChEMBL match (145/300), MoA + targets - week2_lincs_extract.py: cmapPy-slice both Level-5 GCTX phases to 978 landmark genes, mean-aggregate per drug to one consensus signature - week2_assemble.py: join into drug_profiles_v1.parquet, Tier B (LINCS single-source), scored flag per PLAN §6 Week 3 task 2 - docs/data_sources.md: drug set composition + LINCS/ChEMBL provenance Results (all gitignored data): 300/300 drugs scored, both ground-truth drugs present (hydroxyurea Phase II = CHEMBL467, L-glutamine Phase I). Key caveat recorded: only 56/477 (12%) of the disease signature genes are LINCS landmarks, so Week-3 scoring uses a 30-up/26-down query. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 22:25:00 +02:00
Junior B.	c7b6649d31	Week 1: Tier-A sickle cell signature via 2-study concordance Implement and run the Week 1 disease-signature pipeline: - src/disease.py: Welch t-test + BH DE (microarray), probe->symbol collapse, cross-study concordance filter, 2-study provenance schema - scripts/week1_explore.py: download GSE35007 + GSE16728, DE + concordance - scripts/week1_finalize.py: mygene ID mapping + persist signature - tests/test_disease.py: synthetic-data tests for DE/collapse/concordance - docs/data_sources.md: chosen datasets, group defs, reproduction steps Result: sickle_cell_signature_v1.json (gitignored), Tier A, 250 up / 227 down genes from 671 concordant (GSE35007 Illumina whole blood SS/AA + GSE16728 Affymetrix whole blood patient/control). Documented caveats: missing HbF axis (globin depletion) and reticulocyte composition confound. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 20:43:54 +02:00

4 Commits
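
The headline figures quoted in the commit messages above can be re-derived with a few lines of arithmetic. This is a minimal sketch, not code from the repository: the helper name top_fraction is hypothetical, and the pass rule (rank / total <= 0.10) is an assumed reading of the "pre-registered top-10%" criterion, which the log does not spell out.

```python
# Hypothetical helper; not part of the repository described in the log.
def top_fraction(rank: int, total: int) -> float:
    """Fraction of the ranked list at or above the given 1-based rank."""
    return rank / total

# Figures taken from the Week 2 and Week 3 commit messages.
hydroxyurea_fraction = top_fraction(40, 300)   # raw rank 40 of 300 drugs
landmark_overlap = 56 / 477                    # signature genes that are LINCS landmarks
query_size = 30 + 26                           # up + down genes in the scoring query

# Assumed criterion: the drug must rank within the top 10% of the list.
passes_top10 = hydroxyurea_fraction <= 0.10

print(f"hydroxyurea: top {hydroxyurea_fraction:.1%}, passes top-10%: {passes_top10}")
print(f"landmark overlap: {landmark_overlap:.1%} ({query_size} query genes)")
```

Under these assumptions the check reproduces the reported outcome: rank 40 of 300 falls in the top 13%, outside the 10% cutoff, and the 56 landmark genes match the 30-up/26-down query size.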