
Fix ONNX Runtime CUDA fallback by preloading shared library dependencies#276

Open
Misty-Star wants to merge 2 commits into nomadkaraoke:main from Misty-Star:feat/preload-onnxruntime-cuda-deps

Conversation

Misty-Star commented Mar 25, 2026

Summary

This PR improves GPU initialization for ONNX Runtime in pip-based CUDA environments by preloading
shared library dependencies before provider setup.

It also adds a clearer warning when an ONNX Runtime session cannot activate the requested
execution provider, and defers CLI imports so audio-separator help/usage paths do not eagerly
trigger heavy runtime initialization.

Problem

In some Linux environments where CUDA and cuDNN are installed via pip wheels, onnxruntime-gpu
may report CUDAExecutionProvider as available, but actual ONNX sessions can still fall back to
CPU because the required shared libraries are not visible to the dynamic loader at session
creation time.

In this case, users may see errors similar to:

  • Failed to load library ... libonnxruntime_providers_cuda.so
  • libcudnn.so.9: cannot open shared object file
  • Failed to create CUDAExecutionProvider

This can be confusing because PyTorch may still detect and use CUDA successfully, while ONNX
Runtime silently falls back to CPU for ONNX models.
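To make the fallback concrete, a small helper can compare the provider that was requested against what the created session actually reports. This is an illustrative sketch only (the helper name is ours, not part of the PR); in a real environment the second argument would come from `session.get_providers()` on an `onnxruntime.InferenceSession`:

```python
from typing import List, Tuple


def diagnose_provider_fallback(requested: str, session_providers: List[str]) -> Tuple[bool, str]:
    """Return (fell_back, message) comparing a requested execution provider
    against the providers an InferenceSession reports via get_providers().

    ONNX Runtime does not raise when a requested provider cannot be loaded;
    it silently drops to the next provider in the list, so this check is the
    only reliable way to detect the fallback after session creation.
    """
    if requested in session_providers:
        return False, f"{requested} is active"
    return True, (
        f"{requested} was requested but the session activated "
        f"{session_providers}; ONNX Runtime fell back silently"
    )


# In a real environment this would be driven by:
#   session = onnxruntime.InferenceSession(model_path, providers=[requested])
#   diagnose_provider_fallback(requested, session.get_providers())
```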

Root Cause

The CUDA/cuDNN runtime libraries provided by pip-installed NVIDIA wheels are not always discovered
automatically by ONNX Runtime before the first CUDA execution provider session is created.

Changes

  • Call onnxruntime.preload_dlls() during accelerated device setup when the installed ONNX
    Runtime version supports it.
  • Add a warning in the MDX ONNX loading path when the requested execution provider is not actually
    activated by the created session.
  • Defer importing Separator in the CLI until it is actually needed, so no-argument/help flows do
    not eagerly trigger ONNX Runtime initialization.
  • Add unit coverage for the ONNX Runtime dependency preload path and for the CLI no-argument
    behavior.
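The preload step described above can be sketched roughly as follows. This is a standalone, best-effort sketch of the approach, not the PR's exact code: the real change lives on the Separator class, and the guard/logging details here are assumptions. `onnxruntime.preload_dlls()` only exists in recent ONNX Runtime releases, hence the `hasattr` check:

```python
import logging

logger = logging.getLogger(__name__)


def preload_onnxruntime_dependencies() -> bool:
    """Best-effort preload of CUDA/cuDNN shared libraries from pip wheels.

    Returns True only when preload_dlls() was found and ran without error.
    Failures are logged but never fatal, so CPU-only environments are
    unaffected.
    """
    try:
        import onnxruntime as ort
    except ImportError:
        logger.debug("onnxruntime not installed; skipping dependency preload")
        return False
    if not hasattr(ort, "preload_dlls"):
        logger.debug("Installed onnxruntime has no preload_dlls(); skipping")
        return False
    try:
        ort.preload_dlls()
        logger.info("Preloaded ONNX Runtime shared library dependencies")
        return True
    except Exception as exc:
        logger.warning("ONNX Runtime dependency preload failed: %s", exc)
        return False
```

Calling this before the first `InferenceSession` is created gives the dynamic loader a chance to resolve the pip-installed CUDA/cuDNN libraries.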

Why this helps

This makes pip-installed CUDA/cuDNN runtimes visible to ONNX Runtime earlier in the startup flow,
which avoids a common failure mode where ONNX Runtime advertises CUDA support but then creates
CPU-only sessions in practice.

It also makes provider fallback much more obvious in logs, which should make future GPU
troubleshooting easier.

Validation

Validated locally with a pip-based GPU environment on Linux using:

  • Python 3.12
  • PyTorch 2.11.0+cu130
  • onnxruntime-gpu 1.24.4

Before this change:

  • PyTorch detected CUDA successfully.
  • ONNX Runtime exposed CUDAExecutionProvider in provider discovery.
  • A minimal ONNX InferenceSession(..., providers=["CUDAExecutionProvider"]) failed to load CUDA
    dependencies and fell back to CPUExecutionProvider.

After this change:

  • The same environment successfully created an ONNX Runtime session using CUDAExecutionProvider
    without requiring a manual LD_LIBRARY_PATH workaround.

Notes

This PR is focused on CUDA dependency loading and provider activation.

It does not attempt to address unrelated ONNX Runtime device discovery warnings such as Linux
DRM probing messages (for example, /sys/class/drm/card0/device/vendor on systems where card0
is a framebuffer device rather than the NVIDIA device).

Summary by CodeRabbit

  • Improvements

    • Better ONNX Runtime provider diagnostics with warnings when requested acceleration providers are unavailable.
    • Added ONNX Runtime dependency preloading to improve GPU acceleration reliability.
    • CLI imports and error handling improved to surface missing-dependency/help messaging reliably.
  • Tests

    • Added unit tests covering GPU runtime setup, dependency preloading failure handling, and CLI behavior.


coderabbitai bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3515b783-5355-4e67-8a11-2b255880a77c

📥 Commits

Reviewing files that changed from the base of the PR and between 86ce059 and 51f5d6a.

📒 Files selected for processing (1)
  • tests/unit/test_gpu_runtime_setup.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/test_gpu_runtime_setup.py

Walkthrough

Adds ONNX Runtime visibility and optional shared-library preloading, defers Separator imports in the CLI, and adds unit tests for the GPU/ORT setup and CLI no-args behavior.

Changes

  • ONNX Provider Logging — audio_separator/separator/architectures/mdx_separator.py
    When loading an ONNX inference session (the segment_size == dim_t path), retrieves the session's providers via get_providers() and compares them against the requested provider; logs debug info, or emits a warning if the requested provider was not activated.
  • Dependency Preloading & Device Setup — audio_separator/separator/separator.py
    Adds Separator.preload_onnxruntime_dependencies() and a call site in setup_accelerated_inferencing_device() that conditionally calls ort.preload_dlls, logging success or a warning on exception before configuring devices.
  • CLI Lazy Imports — audio_separator/utils/cli.py
    Removes the top-level Separator import; Separator is now imported lazily inside each CLI branch and before the main separation workflow, deferring module loading.
  • CLI Test Update — tests/unit/test_cli.py
    Replaces a skipped test with a simulated no-args CLI run that patches sys.modules['audio_separator.separator'] = None, expects SystemExit(1), and asserts that help text is printed.
  • GPU/ORT Setup Tests — tests/unit/test_gpu_runtime_setup.py
    Adds two tests for Separator.setup_accelerated_inferencing_device(): one asserting normal preload and device setup with a mocked ort.preload_dlls, the other asserting that device setup still runs and a warning is logged when ort.preload_dlls raises RuntimeError.
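The lazy-import pattern from the CLI change can be sketched like this. The argument names and overall structure are assumptions for illustration; the essential part is that the heavy `Separator` import happens inside the function, after the help/usage path has already exited:

```python
import argparse
import sys


def main(argv=None):
    parser = argparse.ArgumentParser(prog="audio-separator")
    parser.add_argument("audio_file", nargs="?")
    args = parser.parse_args(argv)

    if args.audio_file is None:
        # Help/usage path: print usage and exit before any heavy runtime
        # import (and thus before any ONNX Runtime initialization) happens.
        parser.print_help()
        sys.exit(1)

    # Imported only when separation is actually requested, so merely loading
    # the CLI module never triggers ONNX Runtime initialization.
    from audio_separator.separator import Separator

    separator = Separator()
    ...
```

With this layout, `audio-separator` and `audio-separator --help` stay fast even in environments where importing ONNX Runtime is slow or broken.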

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I dug a tunnel through the code so neat,

Preloaded DLLs and providers we now greet,
Lazy imports tiptoe in at call-time's chime,
Tests hop along to make the run sublime. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 62.50%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title clearly and concisely summarizes the main change: fixing ONNX Runtime CUDA fallback by preloading shared library dependencies, which aligns with the primary objective and the code changes across multiple files.



coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/test_gpu_runtime_setup.py`:
- Line 12: The test patches "audio_separator.separator.separator.ort.preload_dlls" and assumes that attribute exists. Add create=True to the patch call that creates mock_preload — i.e. patch("audio_separator.separator.separator.ort.preload_dlls", create=True) — so the test tolerates environments where ort.preload_dlls is absent. Keep the rest of the test and the mock_preload variable unchanged.
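The effect of create=True can be demonstrated against a stand-in object. Here `fake_ort` is hypothetical and merely mimics an onnxruntime build that lacks preload_dlls; without create=True, patching a nonexistent attribute raises AttributeError:

```python
import types
from unittest.mock import patch

# Stand-in for an older onnxruntime module that has no preload_dlls.
fake_ort = types.SimpleNamespace()

# create=True installs the attribute for the duration of the context even
# though it does not exist on the target, so the test does not depend on the
# installed onnxruntime version.
with patch.object(fake_ort, "preload_dlls", create=True) as mock_preload:
    fake_ort.preload_dlls()
    mock_preload.assert_called_once()

# Because patch created the attribute, it is removed again on exit.
assert not hasattr(fake_ort, "preload_dlls")
```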

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8146c522-64a5-4451-a9fc-5b22aadd4e81

📥 Commits

Reviewing files that changed from the base of the PR and between 153b2e4 and 86ce059.

📒 Files selected for processing (5)
  • audio_separator/separator/architectures/mdx_separator.py
  • audio_separator/separator/separator.py
  • audio_separator/utils/cli.py
  • tests/unit/test_cli.py
  • tests/unit/test_gpu_runtime_setup.py
