Junior B. 4022c0cb94 GPU Phase 1 runnable: real Boltz-2 co-folding + alignment review

Flesh out the Modal app into a runnable Phase-1 positive-control test and
reconcile it with the plan:
- cofold() GPU fn: build Boltz-2 YAML (protein+ligand+affinity), run
  `boltz predict --use_msa_server --cache /weights/boltz`, parse affinity
  JSON + predicted pose; weights persist via Volume.
- Local helpers (CPU, import-tested against our PDBs): binding_chain_sequence
  (gemmi -- correctly picks the binding chain, e.g. alpha-globin for 5E83),
  pubchem_smiles, build_boltz_yaml, fetch_pdb (RCSB).
- main(): fan out cofold.starmap over 3 targets x (known binder + 2
  negatives); tabulate; PASS if known binder has top P(binder) for its target.

Alignment fixes:
- Rank by P(binder) (higher=better), NOT raw affinity_pred_value whose sign
  (~log IC50) is version-dependent -- avoids a backwards positive-control test.
- gpu_plan.md Phase 1 updated to affinity/P(binder) ranking; pose-RMSD noted
  as a later refinement (needs receptor superposition).

Local half verified (sequence/SMILES/YAML); cofold() needs a live `modal run`
(account + `modal token new`) to validate end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-24 16:56:27 +02:00
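
The PASS criterion in the commit message above (rank by P(binder), higher is better, and require the known binder to come out on top for its target) can be sketched as follows. This is a minimal illustration, not the code in `gpu/`: the function name `positive_control_verdicts`, the `(target, ligand, p_binder)` row layout, the `TARGET_2` placeholder, and all probabilities are assumptions made up for the example.

```python
from collections import defaultdict


def positive_control_verdicts(rows, known_binders):
    """Return {target: bool}; True when the known binder has the top P(binder).

    rows: iterable of (target, ligand, p_binder) tuples (hypothetical layout).
    known_binders: {target: ligand name of the known binder}.
    """
    by_target = defaultdict(list)
    for target, ligand, p_binder in rows:
        by_target[target].append((ligand, p_binder))

    verdicts = {}
    for target, known in known_binders.items():
        scored = by_target.get(target, [])
        if not scored:
            verdicts[target] = False  # nothing scored counts as a failure
            continue
        # Higher P(binder) is better, so the winner is the max, not the min.
        top_ligand, _ = max(scored, key=lambda pair: pair[1])
        verdicts[target] = top_ligand == known
    return verdicts


# Made-up probabilities for illustration only; these are not results.
rows = [
    ("5E83", "known_binder", 0.81),
    ("5E83", "negative_1", 0.22),
    ("5E83", "negative_2", 0.35),
    ("TARGET_2", "known_binder", 0.40),
    ("TARGET_2", "negative_1", 0.55),
]
known = {"5E83": "known_binder", "TARGET_2": "known_binder"}
verdicts = positive_control_verdicts(rows, known)
print(verdicts)
print("PASS" if all(verdicts.values()) else "FAIL")
```

Ranking on a probability avoids depending on the sign convention of the raw affinity value, which is the concern the commit message raises.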

Name	Last commit	Date
data	Structure-binding track: scaffold + ligand-retrieval baseline	2026-06-23 23:53:27 +02:00
docs	GPU Phase 1 runnable: real Boltz-2 co-folding + alignment review	2026-06-24 16:56:27 +02:00
gpu	GPU Phase 1 runnable: real Boltz-2 co-folding + alignment review	2026-06-24 16:56:27 +02:00
notebooks	Scaffold Reverso MVP pipeline structure	2026-06-23 20:20:09 +02:00
scripts	Production docking: prep helps (7.9->4.8A) but Vina wrong tool for sickle	2026-06-24 16:38:54 +02:00
src	Structure-binding track: scaffold + ligand-retrieval baseline	2026-06-23 23:53:27 +02:00
tests	v1.1: full gene space + specificity z-score; hydroxyurea recovers	2026-06-23 22:57:30 +02:00
.gitignore	Docking baseline: toolchain solved, raw affinity is size-biased	2026-06-24 00:03:00 +02:00
PLAN.md	PLAN §12.9: leave door open for generative-guided retrieval	2026-06-23 23:43:25 +02:00
pyproject.toml	Structure-binding track: scaffold + ligand-retrieval baseline	2026-06-23 23:53:27 +02:00
README.md	Scaffold Reverso MVP pipeline structure	2026-06-23 20:20:09 +02:00

README.md

Reverso MVP — Sickle Cell Repurposing Pipeline

A minimum viable drug repurposing pipeline for sickle cell disease: build a disease signature from public transcriptomic data, build drug profiles for ~300 small molecules, and rank them by CMap-style connectivity scoring. Validated by a recovery test — do the two known sickle cell drugs (hydroxyurea, L-glutamine) rank near the top?

See PLAN.md for the full specification, locked decisions, and week-by-week build plan.

Quickstart

# Requires Python 3.11–3.13 (see note below)
pip install -e .            # or: pip install -e ".[dev]" for test/lint tooling
pytest                      # run unit tests

Python version note: use Python 3.11–3.13 (for example, `python3.13 -m venv .venv`). Python 3.14 is not yet supported by all pipeline dependencies (pydeseq2, cmapPy).

Project layout

data/         raw (downloaded, never edited) / processed / results — gitignored
notebooks/    01..05, run end-to-end in order
src/          identifiers, disease, drugs, scoring, provenance
tests/        scoring unit tests
docs/         recovery_test_report.md, data_sources.md, known_limitations.md

The deliverable

When complete, the deliverable consists of three artifacts:

docs/recovery_test_report.md — the 2-page write-up
data/results/ranked_candidates_v1.csv — the ranked drug list
The signature and drug profile files (`sickle_cell_signature_v1.json`, `drug_profiles_v1.parquet`), with provenance

Pipeline

Notebook	Stage	Output
`01_setup_identifiers.ipynb`	Pin disease/gene IDs	`data/processed/identifiers.json`
`02_disease_signature.ipynb`	GEO + differential expression	`sickle_cell_signature_v1.json`
`03_drug_profiles.ipynb`	ChEMBL + LINCS	`drug_profiles_v1.parquet`
`04_connectivity_scoring.ipynb`	CMap scoring	`ranked_candidates_v1.csv`
`05_recovery_test.ipynb`	Validation	`docs/recovery_test_report.md`
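
To make the scoring and validation stages concrete, here is a minimal sketch of connectivity-style ranking followed by a recovery-rank lookup. It deliberately uses a simplified signed-mean score rather than the KS-based enrichment statistic of CMap, and it is not the implementation in `src/`: the function names, gene symbols, drug names, and z-scores are all hypothetical. The sign follows the CMap convention that a negative score means the drug reverses the disease signature.

```python
from statistics import mean


def connectivity_score(up_genes, down_genes, drug_profile):
    """Mean drug z-score over disease-up genes minus mean over disease-down genes.

    Negative values mean the drug pushes expression against the disease signature.
    Returns None when either gene set has no overlap with the profile.
    """
    up = [drug_profile[g] for g in up_genes if g in drug_profile]
    down = [drug_profile[g] for g in down_genes if g in drug_profile]
    if not up or not down:
        return None
    return mean(up) - mean(down)


def rank_drugs(up_genes, down_genes, profiles):
    """Return drug names ordered from strongest reversal to strongest mimicry."""
    scores = {}
    for drug, profile in profiles.items():
        score = connectivity_score(up_genes, down_genes, profile)
        if score is not None:
            scores[drug] = score
    return sorted(scores, key=scores.get)


def recovery_rank(ranking, drug):
    """1-based rank of a known drug, or None if it was not scored."""
    return ranking.index(drug) + 1 if drug in ranking else None


# Toy signature and profiles; gene and drug names are placeholders.
up_genes = ["G1", "G2"]
down_genes = ["G3", "G4"]
profiles = {
    "mimic": {"G1": 1.0, "G2": 1.0, "G3": -1.0, "G4": -1.0},
    "neutral": {"G1": 0.0, "G2": 0.0, "G3": 0.0, "G4": 0.0},
    "reverser": {"G1": -1.0, "G2": -2.0, "G3": 1.0, "G4": 2.0},
}
ranking = rank_drugs(up_genes, down_genes, profiles)
print(ranking)
print(recovery_rank(ranking, "reverser"))
```

In a recovery test of this shape, the question is simply whether the known drugs receive a small rank relative to the size of the scored library.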

Every persisted artifact carries a confidence tier (A/B/C) and provenance. See PLAN.md §3.
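
One way to attach a tier and provenance to an artifact is a small validated record serialized next to the file. The sketch below is an assumption about shape only: the `Provenance` class, its field names, and the example values (including the tier) are hypothetical, and the actual tier definitions live in PLAN.md §3.

```python
import json
from dataclasses import asdict, dataclass

VALID_TIERS = {"A", "B", "C"}


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record stored alongside a persisted artifact."""

    artifact: str
    confidence_tier: str
    source: str

    def __post_init__(self):
        # Reject anything outside the A/B/C tiers before it gets written out.
        if self.confidence_tier not in VALID_TIERS:
            raise ValueError(f"unknown confidence tier: {self.confidence_tier!r}")


def to_sidecar_json(record):
    """Serialize the record as stable, human-readable JSON."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Example values are illustrative, not taken from the repo.
record = Provenance(
    artifact="sickle_cell_signature_v1.json",
    confidence_tier="B",
    source="GEO",
)
sidecar = to_sidecar_json(record)
print(sidecar)
```

Validating the tier at construction time keeps an invalid label from ever reaching disk.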
