Closed
Conversation
Originally from TransformerLensOrg#718.
Add OLMoE
Fix to OLMo 2 normalization
Hey @jonasrohw, looks like you've got this feature pretty much ready to go - just seeing type check failures blocking it. I'd be interested in taking a stab at fixing those type issues if you're not actively working on it.
Contributor
Author
@taziksh Yeah, I didn't have time to fix some of the type issues. Go ahead!
@jonasrohw
jlarson4 added a commit that referenced this pull request on Feb 13, 2026:
* added and tested: OLMo-1B,OLMo-7B
* fixed: numpy do not do a major upgrade!
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* added and tested: OLMo-1B,OLMo-7B
* fixed: dimensions of 7b to be correct
* tested: Loading checkpoints & model variations
* Reimplement OLMoE changes. Originally from #718.
* Implement TODO (norm_topk_prob)
* Disable bos token for OLMoE.
* Add q and k norm.
* Correct normalization type for OLMoE.
* ran formatting
* tmp update for olmo2
* Fix: Olmo2 uses normalization after the attention/mlp
* ran format
* fixed some type issues
* OLMo 2 RMS
* OLMo 2 RMS
* Tested Instruct models
* fix: Olmo2DecoderLayer type issues
* fix type assertions for attention
* chore: bump min Python to 3.10 for jaxtyping mypy plugin compatibility
* fix: sort imports in olmo2.py
* docs: update Colab notebook for OLMo models
* Adjust error message to improve testing
* conflict resolution
* Updating lock
* Fixed formatting, update error messages to properly test
* more formatting
* fixing type error
* fix format error
* Fix type issues
* Fix type issues
* Fix format issues
* Fix format issues again
* Fix format issues for black
* another attempt at black formatting
* Fix format issues for black again
* Retyping the blocks in HookedTransformer and HookedEncoder
* undo modulelist typing
* Improve type checking in test_detect_head_with_invalid_head_name
* removing unused import
* Fixing Patchscopes_Generation_Demo.ipynb
* Fixing the rest of the notebooks
* Fixing the more notebooks
* run_line_magic
* BERT ipynb fix
* Trying to fix the BERT set_grad cell
* more set_grad cell fixes
* Updated after rebase to fix missing 3.x changes
* Updating OLMo PR to work with v3.x
* Format fix
* fix model ordering

---------

Co-authored-by: Jonas Rohweder <jonas.rohw@gmail.com>
Co-authored-by: Jonas Rohweder <jonas.rohweder@stud.tu-darmstadt.de>
Co-authored-by: Joel Burget <joelburget@gmail.com>
Co-authored-by: Jonas Rohw <40701485+jonasrohw@users.noreply.github.com>
Co-authored-by: Bryce Meyer <bryce13950@gmail.com>
Co-authored-by: Jay Zhou <zhejianz@usc.edu>
Co-authored-by: jleechung <joseph.lee@u.nus.edu>
Co-authored-by: Jonah Larson <jlarson@equity-creative.com>
# Conflicts:
#	.gitignore
#	transformer_lens/HookedTransformer.py
#	transformer_lens/components/abstract_attention.py
#	transformer_lens/components/transformer_block.py
#	transformer_lens/config/HookedTransformerConfig.py
#	transformer_lens/loading_from_pretrained.py
#	transformer_lens/pretrained/weight_conversions/__init__.py
#	transformer_lens/pretrained/weight_conversions/olmo.py
#	transformer_lens/pretrained/weight_conversions/olmo2.py
#	transformer_lens/pretrained/weight_conversions/olmoe.py
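Several of the commit messages above ("Fix: Olmo2 uses normalization after the attention/mlp", "OLMo 2 RMS") describe the architectural detail being tracked: OLMo 2 applies RMSNorm to the attention and MLP outputs, after each sublayer, rather than to the sublayer inputs. The following is only a minimal PyTorch sketch of that ordering, not TransformerLens's actual block implementation; the class and parameter names here are invented for illustration.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Minimal RMSNorm: rescale by the reciprocal root-mean-square, with a learned gain and no bias."""

    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class PostNormBlock(nn.Module):
    """Toy block with the OLMo-2-style ordering: the sublayer output is
    normalized *after* attention/MLP and before the residual addition."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm_attn = RMSNorm(d_model)  # applied to the attention output
        self.norm_mlp = RMSNorm(d_model)   # applied to the MLP output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = x + self.norm_attn(attn_out)     # norm after attention, then residual add
        x = x + self.norm_mlp(self.mlp(x))   # norm after MLP, then residual add
        return x


# Quick shape check on random data.
block = PostNormBlock()
print(block(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```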
Collaborator
All the code that is from this PR was included via @taziksh's PR, and is now in dev-3.x. I've merged this with the dev-3.x branch and can confirm all relevant code is now in that branch. In the next 3.x beta, OLMo will be available via both HookedTransformer and TransformerBridge!
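For readers who want to try this once that beta is released, a minimal usage sketch is below. `HookedTransformer.from_pretrained` and `run_with_cache` are the standard TransformerLens entry points; the OLMo model identifier shown is an assumption, so check the library's supported-model list for the exact name.

```python
from transformer_lens import HookedTransformer

# Hypothetical model identifier; consult TransformerLens's supported-model list
# for the exact name once the 3.x beta with OLMo support is available.
model = HookedTransformer.from_pretrained("allenai/OLMo-1B-hf")

# Run a forward pass and cache all intermediate activations via hooks.
logits, cache = model.run_with_cache("OLMo is an open language model.")
print(logits.shape)                             # [batch, seq_len, d_vocab]
print(cache["blocks.0.hook_resid_post"].shape)  # residual stream after block 0
```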

Description
Adds support for the OLMo v1 model family and OLMoE.
`Transformers` > 3.40 will let `numpy` do a major upgrade; `pyproject.toml` prevents this for now. OLMo v2 will require dropping Python 3.8 support because the required `Transformers` version also drops it; it will be added in a separate PR based on TransformerLens 3. This also completes PR #718.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
Checklist: