{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 03 \u2014 Drug profiles\n", "\n", "Week 2 (PLAN.md \u00a76). Curate the ~300-drug set, pull ChEMBL and LINCS L1000 data, and assemble `drug_profiles_v1.parquet`.\n", "\n", "Drug set: 2 ground-truth + ~50 related-mechanism + ~50 negative controls + ~200 randomly sampled drugs (fixed seed, `RANDOM_SEED`). Document any missing LINCS signatures in `docs/known_limitations.md`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "sys.path.insert(0, '..')  # make the src package importable from notebooks/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from src import drugs\n", "from src import RANDOM_SEED\n", "\n", "# Step 1: drugs.curate_drug_set(seed=RANDOM_SEED) -> data/processed/drug_set_v1.csv\n", "# Step 2: drugs.fetch_chembl_profile(...) for each drug -> data/raw/chembl/\n", "# Step 3: drugs.fetch_lincs_signature(...) -> data/raw/lincs/\n", "# Step 4: drugs.persist_drug_profiles(...) -> drug_profiles_v1.parquet" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }