Flatten esm-2 model package #1464

Open

pstjohn wants to merge 2 commits into NVIDIA:main from pstjohn:pstjohn/bio-237-flatten-esm2-model

Conversation

@pstjohn (Collaborator) commented Feb 12, 2026

As we're adding the ESM-C model, we'll need to install the esm package in the devcontainer to run the ESM-C model tests. This will conflict with the esm pseudo-package I created for the original ESM-2 model folder.

This PR flattens the model into a single requirements.txt-based folder, matching llama3 and the other, newer models.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added comprehensive export workflow for converting HuggingFace ESM-2 checkpoints to Transformer Engine format, including parameter count formatting and verification.
  • Chores

    • Reorganized project structure and dependencies from pyproject.toml to modular configuration files.
    • Updated import paths and development setup instructions throughout codebase.

coderabbitai bot (Contributor) commented Feb 12, 2026

📝 Walkthrough

The PR reorganizes the ESM2 module structure, flattening the codebase from src/esm/ to the root directory. It removes pyproject.toml, introduces requirements.txt, updates all imports from esm.* to direct module names, enhances the export pipeline with comprehensive functionality, and adjusts tests and CI scripts to reflect the new structure.

Changes

Cohort / File(s) Summary
Configuration & Dependency Management
bionemo-recipes/models/esm2/.ruff.toml, bionemo-recipes/models/esm2/requirements.txt, bionemo-recipes/models/esm2/pyproject.toml
Added Ruff config extending parent settings and new requirements.txt with dependencies (accelerate, datasets, hydra-core, jinja2, megatron-fsdp, omegaconf, peft, torch, torchao, transformer_engine, transformers). Removed pyproject.toml entirely.
Docker & Documentation
bionemo-recipes/models/esm2/Dockerfile, bionemo-recipes/models/esm2/README.md
Updated pip install from editable mode (-e .) to requirements.txt-based installation. Updated README with corrected import paths (esm.convert → convert, esm.modeling_esm_te → modeling_esm_te) and development container instructions.
Core Export Pipeline
bionemo-recipes/models/esm2/export.py
Introduced comprehensive export workflow replacing stub import: added BENCHMARK_RESULTS mapping, ESM_TAGS list, format_parameter_count() utility, and export_hf_checkpoint() function for HF-to-TE conversion with tokenizer saving, config patching, README templating, and smoke testing. Performs resource cleanup (gc.collect, torch.cuda.empty_cache).
Module Initialization & Import Updates
bionemo-recipes/models/esm2/src/esm/__init__.py, bionemo-recipes/models/esm2/convert.py
Removed license header from __init__.py. Updated convert.py imports from esm.state/esm.modeling_esm_te to direct module names (state, modeling_esm_te).
Removed Export Implementation
bionemo-recipes/models/esm2/src/esm/export.py
Deleted complete export module containing BENCHMARK_RESULTS, format_parameter_count(), and export_hf_checkpoint() (moved to root export.py).
Test Import Refactoring
bionemo-recipes/models/esm2/tests/conftest.py, bionemo-recipes/models/esm2/tests/test_collator.py, bionemo-recipes/models/esm2/tests/test_collator_context_parallel.py, bionemo-recipes/models/esm2/tests/test_cp_*.py, bionemo-recipes/models/esm2/tests/test_distributed_*.py, bionemo-recipes/models/esm2/tests/test_export.py, bionemo-recipes/models/esm2/tests/test_modeling_esm_te.py, bionemo-recipes/models/esm2/tests/test_peft.py
Updated all test imports from esm.* module paths to direct module names (collator, convert, modeling_esm_te, export, state). Added sys.path manipulations in several tests (conftest, cp tests, distributed tests) to enable bare imports when launched via torchrun.
CI/Build Integration
ci/scripts/check_copied_files.py
Updated SOURCE_TO_DESTINATION_MAP to reflect new source locations: shifted keys from src/esm/ paths to root-level paths (modeling_esm_te.py, collator.py, state.py), affecting file existence validation and copy logic.

Sequence Diagram

sequenceDiagram
    participant User as User/CLI
    participant Main as main()
    participant Loader as HF Model Loader
    participant Converter as ESM Converter
    participant Tokenizer as Tokenizer Handler
    participant Config as Config Patcher
    participant Template as Template Engine
    participant FileOps as File Operations
    participant TE as TE Model Loader
    
    User->>Main: Run export pipeline
    Main->>Main: Iterate ESM_TAGS
    Main->>Loader: Load HF checkpoint
    Loader-->>Converter: Return HF model
    Converter->>Converter: Align pooler layers
    Converter->>Converter: Convert to TE format
    Converter-->>FileOps: Serialized TE model
    FileOps->>FileOps: Save converted model
    Main->>Tokenizer: Load & save tokenizer
    Tokenizer-->>FileOps: Tokenizer artifacts
    Main->>Config: Patch config auto_map
    Config-->>FileOps: Updated config.json
    Main->>Template: Render README (benchmark_results, param_count)
    Template-->>FileOps: Generated README
    FileOps->>FileOps: Copy LICENSE, cleanup
    Main->>TE: Smoke test: load TE model
    TE-->>Main: Verify successful load
    Main->>Main: gc.collect(), torch.cuda.empty_cache()
    Main-->>User: Export complete
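
For orientation, here is a minimal sketch of the loop the diagram describes. It assumes the flattened layout's local convert module and the convert_esm_hf_to_te / ESM_TAGS names mentioned elsewhere in this review; the config patching, README templating, and benchmark lookup are compressed into a comment, and the tag is illustrative rather than the real list.

# Sketch only: mirrors the export flow in the diagram, not the actual export.py.
import gc
from pathlib import Path

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

from convert import convert_esm_hf_to_te  # local module in the flattened layout (assumed import)

ESM_TAGS = ["facebook/esm2_t6_8M_UR50D"]  # illustrative; the real list lives in export.py


def export_all(export_path: Path) -> None:
    for tag in ESM_TAGS:
        # Load the HF checkpoint and convert it to the Transformer Engine format.
        model_hf = AutoModelForMaskedLM.from_pretrained(tag)
        model_te = convert_esm_hf_to_te(model_hf)
        model_te.save_pretrained(export_path / tag)

        # Save tokenizer artifacts next to the converted weights.
        tokenizer = AutoTokenizer.from_pretrained(tag)
        tokenizer.save_pretrained(export_path / tag)

        # ... patch config auto_map, render the README template, copy LICENSE ...

        # Smoke test: reload the exported checkpoint through the custom modeling code.
        AutoModelForMaskedLM.from_pretrained(export_path / tag, trust_remote_code=True)

        # Free GPU/CPU memory before the next (potentially much larger) model.
        del model_hf, model_te
        gc.collect()
        torch.cuda.empty_cache()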

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Modules flattened, imports made clear,
Export pipeline blooms with cheer!
From stub to substance, features take flight,
Requirements organized, configs just right!
ESM2 hops forward with structure so bright! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Description check | ⚠️ Warning
Explanation: The PR description explains the motivation and the restructuring being done, but is missing required template sections like Type of changes, CI Pipeline Configuration, and Pre-submit Checklist.
Resolution: Add the missing template sections: mark the type of change (likely 'Refactor'), configure CI pipeline labels if needed, and complete the pre-submit checklist items.

Docstring Coverage | ⚠️ Warning
Explanation: Docstring coverage is 38.46% which is insufficient. The required threshold is 80.00%.
Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

Title check | ✅ Passed
Explanation: The title 'Flatten esm-2 model package' directly and clearly summarizes the main change in the PR, which restructures the ESM-2 model from a nested package layout to a flattened requirements.txt layout.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


pstjohn force-pushed the pstjohn/bio-237-flatten-esm2-model branch from 0156c98 to 9f32261 on February 12, 2026 at 14:41
coderabbitai bot (Contributor) commented Feb 12, 2026

Caution

Review failed

An error occurred during the review process. Please try again later.


Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
pstjohn force-pushed the pstjohn/bio-237-flatten-esm2-model branch from d99aecd to f82e07c on February 12, 2026 at 15:27
@pstjohn (Collaborator, Author) commented Feb 18, 2026

@coderabbitai review

coderabbitai bot (Contributor) commented Feb 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
bionemo-recipes/models/esm2/README.md (1)

152-159: ⚠️ Potential issue | 🟡 Minor

Inconsistent CLI command: huggingface-cli upload vs hf upload.

Line 152 uses huggingface-cli upload but Line 159 uses hf upload. While hf may be a valid alias in some installations, this is inconsistent within the same document and could confuse users.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/esm2/README.md` around lines 152 - 159, The README
shows two different CLI invocations—`huggingface-cli upload` and `hf
upload`—which is inconsistent; pick one canonical command and update both
occurrences so the upload example and the loop use the same CLI invocation
(e.g., replace `huggingface-cli upload` with `hf upload` or vice versa)
including the examples that reference `checkpoint_export` and the loop over `*/`
to ensure the document consistently uses the chosen command name.
bionemo-recipes/models/esm2/tests/conftest.py (1)

105-112: ⚠️ Potential issue | 🟡 Minor

te_model_checkpoint is now function-scoped — conversion cost per test.

Without an explicit scope=, this fixture runs the full HF download + TE conversion + save_pretrained for every test that uses it. The change is a correctness fix (function-scoped tmp_path can't be used in session-scoped fixtures), but if multiple tests consume this fixture, scope="module" with tmp_path_factory would be a reasonable middle ground.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/esm2/tests/conftest.py` around lines 105 - 112, The
te_model_checkpoint fixture currently runs per-test and downloads/converts the
HF model each time; change its scope to "module" and switch from tmp_path to
tmp_path_factory so the expensive setup runs once: update the fixture signature
for te_model_checkpoint to accept tmp_path_factory, create a module-level
temporary directory via tmp_path_factory.mktemp(...), perform
AutoModelForMaskedLM.from_pretrained(...) and convert_esm_hf_to_te(...) once,
save_pretrained into that tmp dir, and return the path; keep the fixture name
te_model_checkpoint and calls to convert_esm_hf_to_te and
model_te.save_pretrained unchanged.
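
A minimal sketch of the module-scoped variant suggested above, assuming the convert_esm_hf_to_te helper from the local convert module (its exact signature is assumed from the comment) and using an 8M ESM-2 tag purely for illustration:

import pytest
from transformers import AutoModelForMaskedLM

from convert import convert_esm_hf_to_te  # local module in the flattened layout (assumed import)


@pytest.fixture(scope="module")
def te_model_checkpoint(tmp_path_factory):
    """Convert the HF checkpoint to TE format once per test module and return the saved path."""
    checkpoint_dir = tmp_path_factory.mktemp("te_model_checkpoint")
    model_hf = AutoModelForMaskedLM.from_pretrained("facebook/esm2_t6_8M_UR50D")  # illustrative tag
    model_te = convert_esm_hf_to_te(model_hf)  # exact signature assumed from the comment above
    model_te.save_pretrained(checkpoint_dir)
    return checkpoint_dir
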
🧹 Nitpick comments (6)
bionemo-recipes/models/esm2/export.py (1)

29-36: Consider deriving benchmark results from ESM_TAGS or validating consistency.

BENCHMARK_RESULTS keys and ESM_TAGS are maintained independently. If a tag is added to ESM_TAGS without a corresponding entry in BENCHMARK_RESULTS, Line 103 (BENCHMARK_RESULTS[tag]) will raise a KeyError at runtime.

Proposed fix: add a validation
+# Validate that all ESM_TAGS have benchmark results
+assert set(ESM_TAGS) == set(BENCHMARK_RESULTS.keys()), (
+    f"ESM_TAGS and BENCHMARK_RESULTS keys must match. "
+    f"Missing: {set(ESM_TAGS) - set(BENCHMARK_RESULTS.keys())}"
+)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/esm2/export.py` around lines 29 - 36,
BENCHMARK_RESULTS is maintained separately from ESM_TAGS which can cause a
KeyError at runtime when the code does BENCHMARK_RESULTS[tag]; add a validation
step (at module import or before use) that iterates over ESM_TAGS and checks
each tag exists in BENCHMARK_RESULTS, raising a clear error or filling missing
entries with a sensible default; update the check near the code that accesses
BENCHMARK_RESULTS[tag] (referenced in the review as the access on line 103) and
ensure the validation references the symbols ESM_TAGS and BENCHMARK_RESULTS so
missing tags are detected early with an informative message.
bionemo-recipes/models/esm2/requirements.txt (1)

1-11: No version pins on dependencies.

All dependencies except torchao are unpinned. Since this runs inside the NGC PyTorch container which provides pre-installed, compatible versions of most of these packages, this is likely intentional. However, for reproducibility outside the container (e.g., local dev), unpinned deps can lead to breakage.

Consider whether adding version lower-bounds (e.g., transformers>=4.x) would be worthwhile for the local dev workflow described in the README.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/esm2/requirements.txt` around lines 1 - 11, The
requirements file currently lists open-ended dependencies which harms
reproducibility; update bionemo-recipes/models/esm2/requirements.txt to add
conservative lower-bound version pins for key packages (for example use
transformers>=4.x, datasets>=2.x, hydra-core>=1.x, jinja2>=3.x, omegaconf>=2.x,
accelerate>=0.x, peft>=0.x) while keeping the existing torchao!=0.14.0
exclusion; choose minimal compatible lower bounds that reflect the versions
tested in the README/NGC container and add them to each package line so local
dev installs are deterministic and compatible with the container-provided stack.
bionemo-recipes/models/esm2/tests/conftest.py (1)

29-30: Prefer sys.path.insert(0, ...) over sys.path.append for module lookup priority.

sys.path.append adds the local esm2/ directory at the end of the search path. If any installed package provides a module named convert, collator, or modeling_esm_te, it would shadow the local file. All other files in this PR that manipulate sys.path use sys.path.insert(0, ...) to guarantee the local directory takes precedence.

🔧 Suggested fix
-sys.path.append(Path(__file__).parent.parent.as_posix())
-sys.path.append(Path(__file__).parent.as_posix())
+sys.path.insert(0, Path(__file__).parent.parent.as_posix())
+sys.path.insert(0, Path(__file__).parent.as_posix())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/esm2/tests/conftest.py` around lines 29 - 30, Replace
sys.path.append calls so the local esm2 directories are prepended to module
search path: change the two occurrences that call sys.path.append with the
corresponding sys.path.insert(0, ...) using the same
Path(__file__).parent.parent.as_posix() and Path(__file__).parent.as_posix()
expressions, ensuring the local directories take precedence during import
resolution.
bionemo-recipes/models/esm2/tests/test_cp_dataloader.py (1)

33-35: Redundant import sys inside __main__ block after moving it to module level.

sys is now imported at module level (line 33). The existing import sys in the if __name__ == "__main__": block (around line 1104) is now redundant.

🧹 Suggested cleanup
 if __name__ == "__main__":
-    import sys
     ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/esm2/tests/test_cp_dataloader.py` around lines 33 -
35, Remove the redundant import sys inside the if __name__ == "__main__":
block—since sys is already imported at module level (top of file), delete the
second import statement near the bottom of the file (in the main block) so there
are no duplicate imports; verify no conditional import-specific behavior relies
on that inner import.
bionemo-recipes/models/esm2/tests/test_cp_bshd.py (1)

23-26: sys.path.insert at module level mutates session-wide sys.path during pytest collection.

The insertion runs unconditionally whenever pytest imports the file — not only under torchrun. If conftest.py already adds this path, a duplicate entry is created at position 0, and any module named collator, convert, or modeling_esm_te in this directory will shadow identically named modules that appear later in sys.path for the entire test session.

An idempotency guard avoids the duplicate:

♻️ Proposed fix
-sys.path.insert(0, Path(__file__).resolve().parent.parent.as_posix())
+_model_dir = Path(__file__).resolve().parent.parent.as_posix()
+if _model_dir not in sys.path:
+    sys.path.insert(0, _model_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/esm2/tests/test_cp_bshd.py` around lines 23 - 26,
Avoid unconditionally mutating session-wide sys.path by checking for the target
path before inserting: compute the path
(Path(__file__).resolve().parent.parent.as_posix()) and only call
sys.path.insert(0, ...) if that string is not already present in sys.path;
replace the current unconditional sys.path.insert(0, Path(...).as_posix()) with
this idempotent guard so tests run under torchrun still get the path without
creating duplicate entries or shadowing modules.
bionemo-recipes/models/esm2/tests/test_cp_thd.py (1)

23-26: Same unconditional sys.path.insert as in test_cp_bshd.py — same idempotency concern applies.

Both files insert the same directory at sys.path[0] unconditionally, creating two duplicate entries when pytest collects them in the same session. Consider the same guard:

♻️ Proposed fix
-sys.path.insert(0, Path(__file__).resolve().parent.parent.as_posix())
+_model_dir = Path(__file__).resolve().parent.parent.as_posix()
+if _model_dir not in sys.path:
+    sys.path.insert(0, _model_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/esm2/tests/test_cp_thd.py` around lines 23 - 26, The
unconditional sys.path.insert call in test_cp_thd.py is causing duplicate
entries when multiple tests run; change it to compute the directory via
Path(__file__).resolve().parent.parent.as_posix() (same expression already used)
and only insert it if that string is not already present in sys.path (i.e.,
check "if model_dir not in sys.path: sys.path.insert(0, model_dir)"); mirror the
idempotent guard used in the other test (test_cp_bshd.py) to avoid duplicates
while preserving the conftest fallback behavior.

Comment on lines +113 to +115
del model_hf, model_te
gc.collect()
torch.cuda.empty_cache()

⚠️ Potential issue | 🟠 Major

model_hf_masked_lm is not freed — memory leak for large models.

Line 113 deletes model_hf and model_te but model_hf_masked_lm (loaded on Line 71) is still alive. When iterating over 6 models including a 15B-parameter one, this leaks a full model's worth of GPU/CPU memory per iteration.

Proposed fix
-    del model_hf, model_te
+    del model_hf, model_hf_masked_lm, model_te
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    del model_hf, model_te
+    del model_hf, model_hf_masked_lm, model_te
     gc.collect()
     torch.cuda.empty_cache()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/esm2/export.py` around lines 113 - 115, The cleanup
currently deletes model_hf and model_te but misses freeing model_hf_masked_lm,
causing a memory leak; update the teardown in export.py to explicitly delete
model_hf_masked_lm (e.g., call del model_hf_masked_lm) before gc.collect() and
torch.cuda.empty_cache() in the same block that deletes model_hf and model_te
(refer to the symbols model_hf_masked_lm, model_hf, model_te), ensuring any
other large objects tied to that masked LM are also released prior to collecting
and emptying the CUDA cache.

Comment on lines +118 to +122
model_te = AutoModelForMaskedLM.from_pretrained(
export_path / tag,
dtype=torch.bfloat16,
trust_remote_code=True,
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

HuggingFace transformers AutoModelForMaskedLM from_pretrained torch_dtype dtype parameter

💡 Result:

In 🤗 Transformers (PyTorch), AutoModelForMaskedLM.from_pretrained(...) uses the keyword torch_dtype (not dtype) to control the floating-point dtype used when loading the model.

torch_dtype (PyTorch)

Supported values include:

  • A real torch dtype, e.g. torch.float16, torch.bfloat16, torch.float32 (fp32 is the default if you don’t set anything). (huggingface.co)
  • The string "auto", which tries (1) torch_dtype in the model’s config.json, else (2) infers dtype from the first floating-point weight in the checkpoint. (huggingface.co)

Example:

from transformers import AutoModelForMaskedLM
import torch

model = AutoModelForMaskedLM.from_pretrained(
    "bert-base-uncased",
    torch_dtype=torch.float16,   # or torch.bfloat16, or "auto"
)

(huggingface.co)

What about a dtype= parameter?

In the Transformers docs, dtype is documented for JAX/Flax model loading/computation (not PyTorch). For PyTorch you should use torch_dtype. (huggingface.co)

🏁 Script executed:

find . -name "export.py" -path "*/esm2/*" | head -5

Repository: NVIDIA/bionemo-framework

Length of output: 107


🏁 Script executed:

cat -n bionemo-recipes/models/esm2/export.py | sed -n '110,130p'

Repository: NVIDIA/bionemo-framework

Length of output: 804


Change dtype to torch_dtype in the from_pretrained() call.

The HuggingFace transformers API uses torch_dtype, not dtype, for PyTorch models. Using dtype will be silently ignored and absorbed by **kwargs, causing the model to load in the default precision (fp32) instead of bfloat16, which increases memory usage during the smoke test.

Current code (lines 118-122):
    model_te = AutoModelForMaskedLM.from_pretrained(
        export_path / tag,
        dtype=torch.bfloat16,
        trust_remote_code=True,
    )
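
For reference, a sketch of the same call with the suggested keyword (everything else unchanged):

    model_te = AutoModelForMaskedLM.from_pretrained(
        export_path / tag,
        torch_dtype=torch.bfloat16,  # documented PyTorch kwarg; `dtype` may be silently ignored
        trust_remote_code=True,
    )
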
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/models/esm2/export.py` around lines 118 - 122, The
from_pretrained call for AutoModelForMaskedLM is passing dtype instead of the
correct HuggingFace kwarg torch_dtype, so bfloat16 is ignored; update the call
that constructs model_te (AutoModelForMaskedLM.from_pretrained) to pass
torch_dtype=torch.bfloat16 (keeping trust_remote_code=True and the same path) so
the model actually loads in bfloat16 during the smoke test.
