Junior B. 08ed713cc8 GPU plan: ephemeral serverless co-folding (Modal) + app skeleton

docs/gpu_plan.md: cost-efficient plan for running AF3-class co-folding
(Boltz-2/DiffDock) on a GPU then paying nothing when idle.
- Key insight: structure-track data is tiny (MB of PDBs/SMILES); only the
  GPU + model weights are heavy -> serverless is ideal.
- Recommend Modal (per-second billing, scales to zero = nothing to kill);
  RunPod as the SSH-box alternative with idle auto-terminate.
- Lifecycle: image -> weights Volume (cache, don't re-download) -> run ->
  git push small results -> teardown automatic.
- Phase 1 validate on 3 known binders (~$1) before paying for a screen;
  Boltz-2 (affinity) on an L4/A10 (24-48GB); est total ~$5-15.

gpu/modal_app.py: Modal app skeleton (image, weights volume, GPU cofold()
function, local entrypoint); boltz invocation stubbed with TODOs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-24 16:45:04 +02:00

data

Structure-binding track: scaffold + ligand-retrieval baseline

2026-06-23 23:53:27 +02:00

docs

GPU plan: ephemeral serverless co-folding (Modal) + app skeleton

2026-06-24 16:45:04 +02:00

gpu

GPU plan: ephemeral serverless co-folding (Modal) + app skeleton

2026-06-24 16:45:04 +02:00

notebooks

Scaffold Reverso MVP pipeline structure

2026-06-23 20:20:09 +02:00

scripts

Production docking: prep helps (7.9->4.8A) but Vina wrong tool for sickle

2026-06-24 16:38:54 +02:00

src

Structure-binding track: scaffold + ligand-retrieval baseline

2026-06-23 23:53:27 +02:00

tests

v1.1: full gene space + specificity z-score; hydroxyurea recovers

2026-06-23 22:57:30 +02:00

.gitignore

Docking baseline: toolchain solved, raw affinity is size-biased

2026-06-24 00:03:00 +02:00

PLAN.md

PLAN §12.9: leave door open for generative-guided retrieval

2026-06-23 23:43:25 +02:00

pyproject.toml

Structure-binding track: scaffold + ligand-retrieval baseline

2026-06-23 23:53:27 +02:00

README.md

Scaffold Reverso MVP pipeline structure

2026-06-23 20:20:09 +02:00

README.md

Reverso MVP — Sickle Cell Repurposing Pipeline

A minimum viable drug repurposing pipeline for sickle cell disease: build a disease signature from public transcriptomic data, build drug profiles for ~300 small molecules, and rank them by CMap-style connectivity scoring. Validated by a recovery test — do the two known sickle cell drugs (hydroxyurea, L-glutamine) rank near the top?

See PLAN.md for the full specification, locked decisions, and week-by-week build plan.

Quickstart

# Requires Python >=3.11,<3.14 (see note below)
pip install -e .            # or: pip install -e ".[dev]" for test/lint tooling
pytest                      # run unit tests

Python version note: use Python 3.11–3.13 (python3.13 -m venv .venv). Python 3.14 is not yet supported by all pipeline dependencies (pydeseq2, cmapPy).

Project layout

data/         raw (downloaded, never edited) / processed / results — gitignored
notebooks/    01..05, run end-to-end in order
src/          identifiers, disease, drugs, scoring, provenance
tests/        scoring unit tests
docs/         recovery_test_report.md, data_sources.md, known_limitations.md

The deliverable

When complete, the deliverable to share consists of three items:

docs/recovery_test_report.md — the 2-page write-up
data/results/ranked_candidates_v1.csv — the ranked drug list
The signature + drug profile files with provenance

Pipeline

Notebook	Stage	Output
`01_setup_identifiers.ipynb`	Pin disease/gene IDs	`data/processed/identifiers.json`
`02_disease_signature.ipynb`	GEO + differential expression	`sickle_cell_signature_v1.json`
`03_drug_profiles.ipynb`	ChEMBL + LINCS	`drug_profiles_v1.parquet`
`04_connectivity_scoring.ipynb`	CMap scoring	`ranked_candidates_v1.csv`
`05_recovery_test.ipynb`	Validation	`docs/recovery_test_report.md`
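
The scoring and validation stages (notebooks 04 and 05) can be illustrated with a minimal sketch. It is not the pipeline's actual implementation: CMap-style scoring normally uses a weighted, Kolmogorov-Smirnov-like enrichment statistic, whereas this example swaps in a simpler mean-difference score to show the idea of signature reversal. The function names, gene symbols (`G1`..`G4`), drug names, and z-score values are all hypothetical.

```python
from statistics import mean

def connectivity_score(profile, up_genes, down_genes):
    """Simplified reversal score (an assumption, not the CMap statistic).

    profile maps gene -> drug-induced expression z-score. A negative score
    means the drug pushes disease-up genes down and disease-down genes up.
    """
    up = [profile[g] for g in up_genes if g in profile]
    down = [profile[g] for g in down_genes if g in profile]
    if not up or not down:
        return None  # no overlap with the signature, so no score
    return mean(up) - mean(down)

def rank_candidates(profiles, up_genes, down_genes):
    """Return (drug, score) pairs sorted from strongest reversal to weakest."""
    scored = []
    for drug, profile in profiles.items():
        score = connectivity_score(profile, up_genes, down_genes)
        if score is not None:
            scored.append((drug, score))
    return sorted(scored, key=lambda pair: pair[1])

def recovery_ranks(ranked, known_drugs):
    """Return the 1-based rank of each known drug, or None if it is unscored."""
    order = [drug for drug, _ in ranked]
    return {d: order.index(d) + 1 if d in order else None for d in known_drugs}

# Hypothetical toy inputs, for illustration only.
up_genes = ["G1", "G2"]      # genes up-regulated in the disease signature
down_genes = ["G3", "G4"]    # genes down-regulated in the disease signature
profiles = {
    "drug_a": {"G1": -1.5, "G2": -0.5, "G3": 1.0, "G4": 2.0},   # reverses
    "drug_b": {"G1": 1.0, "G2": 1.0, "G3": -1.0, "G4": -1.0},   # mimics
    "drug_c": {"G1": 0.0, "G2": 0.5, "G3": 0.5, "G4": 0.0},     # neutral
}

ranked = rank_candidates(profiles, up_genes, down_genes)
print(ranked)                                # drug_a first, drug_b last
print(recovery_ranks(ranked, ["drug_a"]))    # {'drug_a': 1}
```

In a recovery test of this shape, the known drugs would play the role of `drug_a`: the check is whether their ranks fall near the top of the list.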

Every persisted artifact carries a confidence tier (A/B/C) and provenance. See PLAN.md §3.
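
One way to attach a tier and provenance to an artifact is a small metadata record stored alongside it. The sketch below is an assumption about what such a record could look like; the field names, the `make_provenance` helper, and the example source string are hypothetical, and the actual meaning of tiers A/B/C is defined in PLAN.md §3, not here.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

VALID_TIERS = {"A", "B", "C"}  # tier semantics live in PLAN.md §3

@dataclass(frozen=True)
class Provenance:
    artifact: str      # file name of the persisted artifact
    tier: str          # confidence tier: "A", "B", or "C"
    source: str        # where the data came from (hypothetical free text)
    sha256: str        # content hash, so later edits are detectable
    created_utc: str   # ISO 8601 timestamp

def make_provenance(artifact, payload, tier, source):
    """Build a provenance record for the given artifact bytes."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown confidence tier: {tier!r}")
    return Provenance(
        artifact=artifact,
        tier=tier,
        source=source,
        sha256=hashlib.sha256(payload).hexdigest(),
        created_utc=datetime.now(timezone.utc).isoformat(),
    )

# Hypothetical usage: describe a tiny CSV payload.
record = make_provenance(
    "ranked_candidates_v1.csv",
    b"drug,score\n",
    tier="B",
    source="hypothetical: output of 04_connectivity_scoring.ipynb",
)
print(json.dumps(asdict(record), indent=2))
```

Hashing the payload keeps the record tied to the exact bytes that were written, which is what makes the "raw data is never edited" convention checkable.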
