Closed
Conversation
Originally from TransformerLensOrg#718.
Add OLMoE
Fix to OLMo 2 normalization
Hey @jonasrohw, looks like you've got this feature pretty much ready to go - just seeing type check failures blocking it. I'd be interested in taking a stab at fixing those type issues if you're not actively working on it.
Contributor
Author
@taziksh Yeah, I didn't have time to fix some of the type issues. Go ahead!
@jonasrohw
jlarson4 added a commit that referenced this pull request on Feb 13, 2026:
* added and tested: OLMo-1B,OLMo-7B
* fixed: numpy do not do a major upgrade!
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* added and tested: OLMo-1B,OLMo-7B
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* Adjust error message to improve testing
* conflict resolution
* Updating lock
* Fixed formatting, update error messages to properly test
* more formatting
* fixing type error
* fix format error
* Fix type issues
* Fix type issues
* Fix format issues
* Fix format issues again
* Fix format issues for black
* another attempt at black formatting
* Fix format issues for black again
* Retyping the blocks in HookedTransformer and HookedEncoder
* undo modulelist typing
* Improve type checking in test_detect_head_with_invalid_head_name
* removing unused import
* Fixing Patchscopes_Generation_Demo.ipynb
* Fixing the rest of the notebooks
* Fixing the more notebooks
* run_line_magic
* BERT ipynb fix
* Trying to fix the BERT set_grad cell
* more set_grad cell fixes
* Updated after rebase to fix missing 3.x changes
* Updating OLMo PR to work with v3.x
* Format fix
* fix model ordering

---------

Co-authored-by: Jonas Rohweder <jonas.rohw@gmail.com>
Co-authored-by: Jonas Rohweder <jonas.rohweder@stud.tu-darmstadt.de>
Co-authored-by: Joel Burget <joelburget@gmail.com>
Co-authored-by: Jonas Rohw <40701485+jonasrohw@users.noreply.github.com>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Jay Zhou <zhejianz@usc.edu>
Co-authored-by: jleechung <joseph.lee@u.nus.edu>
Co-authored-by: Jonah Larson <jlarson@equity-creative.com>
# Conflicts:
#	.gitignore
#	transformer_lens/HookedTransformer.py
#	transformer_lens/components/abstract_attention.py
#	transformer_lens/components/transformer_block.py
#	transformer_lens/config/HookedTransformerConfig.py
#	transformer_lens/loading_from_pretrained.py
#	transformer_lens/pretrained/weight_conversions/__init__.py
#	transformer_lens/pretrained/weight_conversions/olmo.py
#	transformer_lens/pretrained/weight_conversions/olmo2.py
#	transformer_lens/pretrained/weight_conversions/olmoe.py
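Several of the commit messages above ("Fix: Olmo2 uses normalization after the attention/mlp", "OLMo 2 RMS") describe the architectural detail being tracked: OLMo 2 applies RMSNorm to the attention and MLP outputs, after each sublayer, rather than to the sublayer inputs. The following is only a minimal PyTorch sketch of that ordering, not TransformerLens's actual block implementation; the class and parameter names here are invented for illustration.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Minimal RMSNorm: rescale by the reciprocal root-mean-square, with a learned gain and no bias."""

    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class PostNormBlock(nn.Module):
    """Toy block with the OLMo-2-style ordering: the sublayer output is
    normalized *after* attention/MLP and before the residual addition."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm_attn = RMSNorm(d_model)  # applied to the attention output
        self.norm_mlp = RMSNorm(d_model)   # applied to the MLP output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = x + self.norm_attn(attn_out)     # norm after attention, then residual add
        x = x + self.norm_mlp(self.mlp(x))   # norm after MLP, then residual add
        return x


# Quick shape check on random data.
block = PostNormBlock()
print(block(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```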
Collaborator
All the code that is from this PR was included via @taziksh's PR, and is now in dev-3.x. I've merged this with the dev-3.x branch and can confirm all relevant code is now in that branch. In the next 3.x beta, OLMo will be available via both HookedTransformer and TransformerBridge!
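For readers who want to try this once that beta is released, a minimal usage sketch is below. `HookedTransformer.from_pretrained` and `run_with_cache` are the standard TransformerLens entry points; the OLMo model identifier shown is an assumption, so check the library's supported-model list for the exact name.

```python
from transformer_lens import HookedTransformer

# Hypothetical model identifier; consult TransformerLens's supported-model list
# for the exact name once the 3.x beta with OLMo support is available.
model = HookedTransformer.from_pretrained("allenai/OLMo-1B-hf")

# Run a forward pass and cache all intermediate activations via hooks.
logits, cache = model.run_with_cache("OLMo is an open language model.")
print(logits.shape)                             # [batch, seq_len, d_vocab]
print(cache["blocks.0.hook_resid_post"].shape)  # residual stream after block 0
```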

Description
Adds support for the OLMo v1 model family and OLMoE.
`Transformers` > 3.40 will let `numpy` do a major upgrade; `pyproject.toml` prevents this for now. OLMo v2 will require dropping Python 3.8 support because the required `Transformers` version also drops it; it will be added in a separate PR based on TransformerLens 3. This also completes PR #718.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
Checklist: