# Data Sources

> Fill in the version and download date for every source actually used. This file is the artifact
> that proves reproducibility (PLAN.md §6, Week 4 task 4), so record both for **all** downloads.

| Source | URL | Access | License | Use in MVP | Version | Download date |
|---|---|---|---|---|---|---|
| Open Targets | https://platform.opentargets.org | API, bulk Parquet | CC0 | Target-disease graph | TBD | TBD |
| MONDO | http://www.obofoundry.org/ontology/mondo.html | OBO file | CC BY 4.0 | Disease ID | TBD | TBD |
| Orphanet | https://www.orpha.net | Bulk XML | CC BY 4.0 | Rare disease metadata | TBD | TBD |
| OMIM | https://omim.org | Free for academic use | Commercial use requires a license | Disease genetics | TBD | TBD |
| GEO | https://www.ncbi.nlm.nih.gov/geo/ | GEOparse, FTP | Public domain | Expression data | TBD | TBD |
| ChEMBL | https://www.ebi.ac.uk/chembl | Python client, bulk SQLite | CC BY-SA 3.0 | Drug structures, targets | TBD | TBD |
| LINCS L1000 | https://clue.io/data | Bulk download | Restricted; free for academic use | Drug expression signatures | TBD | TBD |
| ClinicalTrials.gov | https://clinicaltrials.gov | API | Public domain | Trial history | TBD | TBD |
| FDA DailyMed | https://dailymed.nlm.nih.gov | API | Public domain | Approved labels | TBD | TBD |
| Reactome | https://reactome.org | API, bulk | CC0 | Pathway data (Week 3 prior) | TBD | TBD |

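Whether every used source has its version and download date filled in can be checked mechanically. The following is a minimal sketch, assuming the table keeps its current column headers (`Source`, `Version`, `Download date`) and the `TBD` placeholder. The helper names `parse_table` and `missing_provenance` are hypothetical and not part of the `src/` package, and the filled-in MONDO values in the example are made-up placeholders, not real release data.

```python
def parse_table(markdown: str) -> list[dict[str, str]]:
    """Parse the first pipe table found in `markdown` into row dicts keyed by header."""
    header: list[str] | None = None
    rows: list[dict[str, str]] = []
    for raw in markdown.splitlines():
        line = raw.strip()
        # Only lines of the form "| a | b |" are table lines.
        if len(line) < 2 or not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # Skip the separator row ("|---|---|").
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        if header is None:
            header = cells
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def missing_provenance(markdown: str, placeholder: str = "TBD") -> dict[str, list[str]]:
    """Map each source name to the provenance columns that are still placeholders."""
    missing: dict[str, list[str]] = {}
    for row in parse_table(markdown):
        gaps = [
            column
            for column in ("Version", "Download date")
            if row.get(column, placeholder) == placeholder
        ]
        if gaps:
            missing[row["Source"]] = gaps
    return missing


# Two-row excerpt; the MONDO version and date are illustrative placeholders only.
EXAMPLE = """
| Source | URL | Access | License | Use in MVP | Version | Download date |
|---|---|---|---|---|---|---|
| Open Targets | https://platform.opentargets.org | API, bulk Parquet | CC0 | Target-disease graph | TBD | TBD |
| MONDO | http://www.obofoundry.org/ontology/mondo.html | OBO file | CC BY 4.0 | Disease ID | example-release | 2000-01-01 |
"""

if __name__ == "__main__":
    for source, gaps in missing_provenance(EXAMPLE).items():
        print(f"{source}: missing {', '.join(gaps)}")
```

A check like this could run in CI or as a Week 4 pre-release step, so that the file cannot ship with a used source still marked `TBD`. Sources that end up unused would need an explicit marker (for example `n/a`) to pass.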
## Chosen GEO dataset

_Document the chosen study fully: accession, platform, n per group, publication, and why it was
selected over the alternatives (GSE53441, GSE35007, …)._

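One way to keep this entry complete is to treat the listed items as required fields. The sketch below is an assumption about how such a record might look, not the project's actual schema: the class `GeoDatasetRecord`, its field names, and the accession `GSE00000` are hypothetical placeholders. Only the alternative accessions come from this file.

```python
from dataclasses import dataclass, field, fields


@dataclass
class GeoDatasetRecord:
    """Hypothetical record of the items this section asks to document."""

    accession: str = ""
    platform: str = ""
    n_per_group: dict[str, int] = field(default_factory=dict)
    publication: str = ""
    rationale: str = ""
    alternatives: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_markdown(self) -> str:
        """Render the record as a bullet list, using TBD for empty fields."""
        groups = ", ".join(f"{name}: n={n}" for name, n in self.n_per_group.items())
        alternatives = ", ".join(self.alternatives)
        lines = [
            f"- Accession: {self.accession or 'TBD'}",
            f"- Platform: {self.platform or 'TBD'}",
            f"- n per group: {groups or 'TBD'}",
            f"- Publication: {self.publication or 'TBD'}",
            f"- Why selected: {self.rationale or 'TBD'}",
            f"- Alternatives considered: {alternatives or 'TBD'}",
        ]
        return "\n".join(lines)


# GSE00000 is a placeholder accession, not the chosen study.
record = GeoDatasetRecord(accession="GSE00000", alternatives=["GSE53441", "GSE35007"])

if __name__ == "__main__":
    print(record.to_markdown())
    print("Still missing:", record.missing_fields())
```

Here `missing_fields()` reports `platform`, `n_per_group`, `publication`, and `rationale` as outstanding, which mirrors the `TBD` convention used in the table.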
## Licensing note for LINCS

Read the LINCS data use terms before commercial use. For the MVP (research / proof-of-concept)
the terms are permissive. For productization this needs legal review.