
ValueError: not enough values to unpack (expected 6, got 5) during validation with CUDA Graphs enabled in NeMo ASR RNNT #15340

@lucasdanieldann-coder

I am encountering the following error when running training/validation of the ASR RNNT model in NeMo, with CUDA Graphs enabled for the RNNT decoder:

ValueError: not enough values to unpack (expected 6, got 5)
The error is raised in the _full_graph_compile function in tdt_label_looping.py, at the point where the result of cu_call is unpacked.

Full Traceback:

(Include the full traceback you provided, or at least the final part with the error)

Environment:

Python: 3.12.9
PyTorch: 2.6.0+cu124
NeMo Toolkit: 2.6.0
CUDA Driver: 550.163 (supports CUDA 12.4)
CUDA Toolkit: 12.6 (nvcc not found, but CUDA_HOME set to /usr/local/cuda-12.6)
Operating System: Linux (Sagemaker Studio)
GPU: (include GPU model if possible)

Installation commands tested:

bash

pip install torch==2.8.0+cu128 torchaudio==2.8.0+cu128 --index-url https://download.pytorch.org/whl/cu128
pip install nemo_toolkit[all]==2.6.0
pip install torch==2.0.1+cu117 torchaudio==2.0.1+cu117 torchvision==0.15.2+cu117
pip install torch==2.0.1 torchvision==0.15.2 --extra-index-url https://download.pytorch.org/whl/cu121
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install tensorflow==2.15.0

Environment diagnosis (env-doctor):

(Include the environment diagnosis output you ran, highlighting important points)
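For reference, a minimal sketch of how the installed versions of the relevant packages could be collected with the standard library alone. It is not the env-doctor output mentioned above. The distribution names `cuda-python` and `cuda-bindings` are assumptions about which CUDA binding packages may be present; they are not taken from my diagnosis.

```python
from importlib import metadata

# Distribution names to look up. "cuda-python" and "cuda-bindings" are
# assumptions about which CUDA binding packages might be installed.
PACKAGES = ("torch", "torchaudio", "nemo_toolkit", "cuda-python", "cuda-bindings")


def installed_versions(packages=PACKAGES):
    """Return a dict mapping each distribution name to its version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in the same kernel or virtual environment that runs the training job should show whether the packages listed under Environment are the ones actually being imported.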

Detailed description:

The error happens during validation, specifically in the validation_step function of the RNNT model in NeMo.
The issue seems to be that cu_call returns 5 values where the calling code unpacks 6.
I have tried multiple versions of PyTorch, NeMo, and CUDA, but the error persists.
The environment is set up with CUDA 12.6 and a compatible driver, but nvcc is not installed (only CUDA runtime).
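The unpacking mismatch described above can be reproduced in isolation. The sketch below does not use NeMo or CUDA: `fake_cu_call` is a hypothetical stand-in for a wrapper that returns five values, and the tuple contents are placeholders, not the real return values.

```python
def fake_cu_call():
    """Hypothetical stand-in for a wrapper that returns five values."""
    return ("status", "capture_id", "graph", "dependencies", "num_dependencies")


def strict_unpack(result):
    """Unpack into exactly six names, mirroring the failing pattern."""
    a, b, c, d, e, f = result
    return a, b, c, d, e, f


def describe_mismatch(result):
    """Return the ValueError message if six-way unpacking fails, else None."""
    try:
        strict_unpack(result)
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(describe_mismatch(fake_cu_call()))
```

This only illustrates the shape of the failure: the left-hand side has six targets while the callee hands back a 5-tuple. It says nothing about why the two sides disagree in my environment.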

Questions:

Is this error known?
Is there any incompatibility between PyTorch/NeMo/CUDA versions that could cause this?
How can I work around or fix this issue?

Thank you in advance for your help!
