Reverso/notebooks/03_drug_profiles.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 03 \u2014 Drug profiles\n",
    "\n",
    "Week 2 (PLAN.md \u00a76). Curate the ~300-drug set, pull ChEMBL + LINCS L1000 data, and assemble `drug_profiles_v1.parquet`.\n",
    "\n",
    "Drug set: 2 ground-truth + ~50 related-mechanism + ~50 negative controls + ~200 random (fixed seed). Document any missing LINCS signatures in `docs/known_limitations.md`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "sys.path.insert(0, '..')  # add the parent directory to the path so `src` is importable when running from notebooks/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from src import RANDOM_SEED, drugs\n",
    "\n",
    "# Step 1: drugs.curate_drug_set(seed=RANDOM_SEED) -> data/processed/drug_set_v1.csv\n",
    "# Step 2: drugs.fetch_chembl_profile(...) for each drug -> data/raw/chembl/\n",
    "# Step 3: drugs.fetch_lincs_signature(...) -> data/raw/lincs/\n",
    "# Step 4: drugs.persist_drug_profiles(...) -> drug_profiles_v1.parquet"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}