OncoLearn

OncoLearn is a multimodal machine learning toolkit for cancer genomics analysis and biomarker discovery. It integrates genomic, transcriptomic, clinical, and medical imaging data to enable end-to-end training of cancer classification and subtype prediction models across TCGA cohorts.

Full documentation is available on the OncoLearn Wiki.

Key Capabilities

Capability	Description
Data Acquisition	Download TCGA genomics data from the UCSC Xena Browser, imaging data from The Cancer Imaging Archive (TCIA), and clinical and molecular data from cBioPortal through a unified CLI
Multimodal Fusion	Train models that jointly learn from mRNA expression, clinical features, and MRI/pathology images
Pretrained Encoders	Uses IBM's RNA BERT (110M parameters) to encode gene expression and FM-BCMRI for hierarchical 3D image encoding
Pipeline DSL	Declare data loading pipelines in plain Python — `Load`, `Join`, `Sequence`, and transform nodes compose into arbitrary multi-source workflows
Hyperparameter Optimisation	Optuna-based HPO over the optimizer, loss, scheduler, and model hyperparameters, with optional cross-validation
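The composition model behind the Pipeline DSL can be illustrated with a small, dependency-free sketch. The classes below borrow the node names from the table (`Load`, `Join`, `Sequence`, plus a `Transform` node), but they are hypothetical stand-ins, not OncoLearn's actual API. The in-memory `Table` type (sample ID mapped to a feature dict), the inner-join semantics of `Join`, and the `log2p1` helper are all assumptions made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical table type: sample ID -> {feature name: value}
Table = Dict[str, Dict[str, float]]


class Node:
    """Base class: every pipeline node produces a Table when run."""

    def run(self) -> Table:
        raise NotImplementedError


@dataclass
class Load(Node):
    """Leaf node; stands in for reading one data source from disk."""
    source: Table

    def run(self) -> Table:
        # Copy so downstream transforms never mutate the source data
        return {sid: dict(feats) for sid, feats in self.source.items()}


@dataclass
class Join(Node):
    """Inner join on sample ID: keeps only samples present in every input."""
    inputs: List[Node]

    def run(self) -> Table:
        tables = [node.run() for node in self.inputs]
        shared = set(tables[0])
        for table in tables[1:]:
            shared &= set(table)
        joined: Table = {}
        for sid in sorted(shared):
            merged: Dict[str, float] = {}
            for table in tables:
                merged.update(table[sid])
            joined[sid] = merged
        return joined


@dataclass
class Transform:
    """Wraps a Table -> Table function so it can be chained in a Sequence."""
    fn: Callable[[Table], Table]

    def apply(self, table: Table) -> Table:
        return self.fn(table)


@dataclass
class Sequence(Node):
    """Runs a source node, then applies each transform in order."""
    source: Node
    steps: List[Transform] = field(default_factory=list)

    def run(self) -> Table:
        table = self.source.run()
        for step in self.steps:
            table = step.apply(table)
        return table


def log2p1(feature: str) -> Callable[[Table], Table]:
    """Hypothetical transform: replace one feature with log2(x + 1)."""
    def fn(table: Table) -> Table:
        return {
            sid: {**feats, feature: math.log2(feats[feature] + 1.0)}
            for sid, feats in table.items()
        }
    return fn


# Usage: join toy expression and clinical tables, then log-transform BRCA1
expression = Load({"TCGA-01": {"BRCA1": 3.0}, "TCGA-02": {"BRCA1": 7.0}})
clinical = Load({"TCGA-01": {"age": 54.0}, "TCGA-03": {"age": 61.0}})
pipeline = Sequence(Join([expression, clinical]), [Transform(log2p1("BRCA1"))])
result = pipeline.run()  # {"TCGA-01": {"BRCA1": 2.0, "age": 54.0}}
```

Because every node exposes the same `run()` method, any node (including a `Sequence` or another `Join`) can be nested as an input, which is what lets small pieces compose into multi-source workflows.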
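To show what hyperparameter search with optional cross-validation involves, here is a standard-library sketch. It samples a configuration from a categorical search space, scores it by averaging an evaluation function over k folds, and keeps the best configuration. This is plain random search, not Optuna: Optuna's default sampler is TPE and it adds features such as pruning and persistent studies. The search-space values, `kfold_indices`, and `random_search` are hypothetical and do not reflect OncoLearn's actual config keys or implementation. The folds are contiguous and unshuffled for simplicity; real pipelines usually shuffle or stratify.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical search space; OncoLearn's real options are defined in its configs.
SEARCH_SPACE: Dict[str, List[object]] = {
    "optimizer": ["adam", "adamw", "sgd"],
    "loss": ["cross_entropy", "focal"],
    "scheduler": ["cosine", "step", "none"],
    "lr": [1e-4, 3e-4, 1e-3],
}

Config = Dict[str, object]
Evaluate = Callable[[Config, List[int], List[int]], float]


def kfold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds and return (train, val) index pairs."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    # The first n % k folds get one extra sample so that all n indices are used
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, val))
        start += size
    return splits


def random_search(
    evaluate: Evaluate, n_samples: int, n_trials: int, k: int = 5, seed: int = 0
) -> Tuple[Optional[Config], float]:
    """Return the sampled config with the highest mean validation score over k folds."""
    rng = random.Random(seed)
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}
        score = mean(evaluate(cfg, train, val) for train, val in kfold_indices(n_samples, k))
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

In practice, `evaluate` would train a model on the `train` indices and return a validation metric (for example, balanced accuracy) on the `val` indices. Setting `k` to a single hold-out split corresponds to running HPO without cross-validation.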

Quickstart

git clone https://github.com/collaborativebioinformatics/OncoLearn.git
cd OncoLearn
git submodule update --init --recursive

# Start the Docker environment (choose your GPU profile)
docker compose --profile nvidia up -d    # NVIDIA
docker compose --profile amd up -d      # AMD (native Linux)
docker compose --profile amd-wsl up -d  # AMD (WSL2)

For platform-specific setup, local installation, and full CLI reference, see the Wiki.

Documentation

Comprehensive documentation is available on the OncoLearn Wiki:

Getting Started — Installation guides for Windows, Linux, and Docker
CLI Reference — train, preprocess, xena, tcia, cbioportal subcommands
Modeling — Encoder architecture, fusion model, and config reference
Pipeline DSL — Declare data loading and transformation pipelines in Python
Training Guide — Config options, variants, Docker usage, and output format
Python API — Programmatic access to Xena Browser, TCIA, and cBioPortal

Contributors

Heena Dalal (dalalhina@gmail.com / heena.dalal@kcl.ac.uk), Aryan Sharan Guda (aryanshg@andrew.cmu.edu), Seungjin Han (seungjih@andrew.cmu.edu), Seohyun Lee (seohyun4@andrew.cmu.edu), Yosen Lin (yosenl@andrew.cmu.edu), Isha Parikh (parikh.i@northeastern.edu), Diya Patidar (dpatidar@andrew.cmu.edu), Arunannamalai Sujatha Bharath Raj (asujatha@andrew.cmu.edu), Andrew Scouten (yzb2@txstate.edu), Jeffrey Wang (jdw2@andrew.cmu.edu), Qiyu (Charlie) Yang (qiyuy@andrew.cmu.edu), Zhaoyi (Zoey) You (zhaoyiyou.zoey@gmail.com), Xinru Zhang (mayzxr2203@gmail.com), River Zhu (riverz@andrew.cmu.edu)

License

This project is licensed under the MIT License — see the LICENSE file for details.

AI Disclosure

Artificial intelligence tools, including large language models (LLMs), were used during the development of this project to support writing, clarify technical concepts, and assist in generating code snippets. These tools served as an aid for idea refinement, debugging, and improving the readability of explanations and documentation. All AI-generated text and code were thoroughly reviewed, verified for correctness, and understood in full before being incorporated into this work. The responsibility for all final decisions, interpretations, and implementations remains solely with the contributors.
