
Silent Native Library Failure in torch::loadNativeBackEnd #1545

@hokggo

Description


Describe the bug
When the native back-end loader fails to load a CUDA DLL such as cudnn_adv64_9, the failure is silent as long as LibTorchSharp itself appears to load. The code also sets nativeBackendCudaLoaded to true, so any later attempt to load the back end (after the static constructor has run) skips the loader entirely.

Judging by the code comment, this behavior may be intentional, but it makes debugging harder because the StringBuilder trace is lost.
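To illustrate the caching concern, here is a minimal sketch of a loader that only marks the back end as loaded after a genuine success and keeps the trace around for inspection. It is not TorchSharp's code: `CudaBackendLoader`, `EnsureLoaded`, and `LastTrace` are hypothetical names, and the `loaded` field merely plays the role of nativeBackendCudaLoaded.

```csharp
using System;
using System.Text;

// Hypothetical analogue of the static loader (not TorchSharp's actual code).
// `loaded` stands in for nativeBackendCudaLoaded; `LastTrace` keeps diagnostics.
public sealed class CudaBackendLoader
{
    private readonly Func<StringBuilder, bool> loadAll;
    private bool loaded;

    public CudaBackendLoader(Func<StringBuilder, bool> loadAll) => this.loadAll = loadAll;

    public string LastTrace { get; private set; } = string.Empty;

    public bool EnsureLoaded()
    {
        if (loaded) return true; // fast path only after a genuine success

        var trace = new StringBuilder();
        bool ok = loadAll(trace);
        LastTrace = trace.ToString(); // keep the trace instead of discarding it
        loaded = ok;                  // a failure leaves the flag false, so the next call retries
        return ok;
    }
}

public static class RetryDemo
{
    public static void Main()
    {
        int attempt = 0;
        var loader = new CudaBackendLoader(trace =>
        {
            attempt++;
            bool ok = attempt > 1; // simulate: the first attempt misses cudnn_adv64_9
            trace.AppendLine(ok ? "Loaded cudnn_adv64_9" : "FAILED to load cudnn_adv64_9");
            return ok;
        });

        Console.WriteLine(loader.EnsureLoaded()); // False
        Console.Write(loader.LastTrace);          // FAILED to load cudnn_adv64_9
        Console.WriteLine(loader.EnsureLoaded()); // True: retried rather than skipped
    }
}
```

Caching only on success costs one extra load attempt per call while the back end is broken, which seems an acceptable price for getting the loader's error messages back.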

To Reproduce
Remove cudnn_adv64_9 and check whether CUDA is available. When debugging this, you'll see that the method has a chain of assignments of the form
ok = TryLoadNativeLibraryByName("cudnn_adv64_9", typeof(torch).Assembly, trace);
where each step overwrites the result of the previous one. On my end, I initialized ok to true and changed each library load to
ok &= TryLoadNativeLibraryByName("lib", typeof(torch).Assembly, trace);
for every file to load. With this change, a missing library makes ok false, which causes torch to attempt the load again and lets us obtain the error messages from the library loader.
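The difference between the two patterns can be reproduced in isolation. This is a self-contained sketch, not TorchSharp's loader: `TryLoad` is a hypothetical stand-in for TryLoadNativeLibraryByName that "succeeds" only when the name is in an `available` set, and the library order (LibTorchSharp last) is assumed for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LoaderDemo
{
    // Hypothetical stand-in for TryLoadNativeLibraryByName: a library "loads"
    // only if its name is in the available set, and every attempt is traced.
    public static bool TryLoad(string name, ISet<string> available, StringBuilder trace)
    {
        bool ok = available.Contains(name);
        trace.AppendLine(ok ? $"Loaded {name}" : $"FAILED to load {name}");
        return ok;
    }

    // Pattern described in the issue: each call overwrites `ok`,
    // so only the last library's result survives.
    public static bool LoadAllOverwrite(string[] libs, ISet<string> available, StringBuilder trace)
    {
        bool ok = false;
        foreach (var lib in libs)
            ok = TryLoad(lib, available, trace);
        return ok;
    }

    // Suggested fix: start from true and AND each result in.
    public static bool LoadAllAccumulate(string[] libs, ISet<string> available, StringBuilder trace)
    {
        bool ok = true;
        foreach (var lib in libs)
            ok &= TryLoad(lib, available, trace); // `&` on bool does not short-circuit
        return ok;
    }

    public static void Main()
    {
        var libs = new[] { "cudnn_adv64_9", "LibTorchSharp" };
        var available = new HashSet<string> { "LibTorchSharp" }; // cudnn_adv64_9 is missing
        var trace = new StringBuilder();

        Console.WriteLine(LoadAllOverwrite(libs, available, trace));  // True
        Console.WriteLine(LoadAllAccumulate(libs, available, trace)); // False
        Console.Write(trace);
    }
}
```

Because `ok &= x` expands to `ok = ok & x` and `&` on `bool` evaluates both operands, every library is still attempted after the first failure, so the trace records all missing DLLs rather than just the first one.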

Please complete the following information:

  • OS: Windows 11
  • Package Type [e.g. torchsharp-cpu, torchsharp-cuda-windows, torchsharp-cuda-linux]
  • Version 0.105.2 (same in 0.106.0)

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
