
Workflow for performance investigation? #4711

@jricker2

Not sure on the best place to ask this, just a question -

Right now I'm looking into an issue where a particular ONNX model performs about the same with fp8/int8 quantization applied as it does in fp16. To dig into this further, I have been profiling with trtexec, dumping the layer-by-layer performance, and trying to compare it against the fp16 version. One issue is that since the models are slightly different, the layer names are also slightly different. The layer fusions also differ between the two versions, which makes it difficult to compare layer performance accurately. The approach that makes sense to me is to look from the ONNX perspective, i.e. which common ONNX operators show performance discrepancies. I'm surprised there are no tools out right now to help visualize things like this (AFAIK?). My main method has been writing small Python scripts to associate the TRT metadata back to ONNX ops and print performance for both models. It is pretty inefficient and tedious.
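For reference, here is a minimal sketch of the kind of script described above. It assumes the per-layer timings, the TRT-layer-to-ONNX-node mapping, and the ONNX node-to-op-type mapping have already been parsed into plain dicts. The function names, the dict layout, and the example numbers are all hypothetical, made up for illustration. Splitting a fused layer's time evenly across its ONNX nodes is a crude heuristic of my own, not something TensorRT reports.

```python
from collections import defaultdict


def time_per_op(layer_ms, layer_nodes, node_op):
    """Attribute each TRT layer's time to ONNX op types.

    layer_ms:    {trt_layer_name: time in ms}            (assumed input shape)
    layer_nodes: {trt_layer_name: [onnx_node_name, ...]} (assumed input shape)
    node_op:     {onnx_node_name: onnx_op_type}          (assumed input shape)
    """
    totals = defaultdict(float)
    for layer, ms in layer_ms.items():
        nodes = [n for n in layer_nodes.get(layer, []) if n in node_op]
        if not nodes:
            # Layers with no ONNX origin (e.g. inserted reformats) go in one bucket.
            totals["<unmapped>"] += ms
            continue
        # Heuristic: split a fused layer's time evenly across its ONNX nodes.
        share = ms / len(nodes)
        for n in nodes:
            totals[node_op[n]] += share
    return dict(totals)


def compare(base, other):
    """Return (op, base_ms, other_ms, delta_ms) rows, largest |delta| first."""
    rows = []
    for op in set(base) | set(other):
        b, o = base.get(op, 0.0), other.get(op, 0.0)
        rows.append((op, b, o, o - b))
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows


# Made-up example data: same ONNX graph, different fusions per build.
node_op = {"conv1": "Conv", "relu1": "Relu", "matmul1": "MatMul"}

fp16 = time_per_op(
    {"conv1 + relu1": 0.40, "matmul1": 0.30},
    {"conv1 + relu1": ["conv1", "relu1"], "matmul1": ["matmul1"]},
    node_op,
)
quant = time_per_op(
    {"conv1": 0.10, "relu1": 0.05, "matmul1": 0.30, "reformat_0": 0.20},
    {"conv1": ["conv1"], "relu1": ["relu1"], "matmul1": ["matmul1"], "reformat_0": []},
    node_op,
)

rows = compare(fp16, quant)
for op, base_ms, other_ms, delta in rows:
    print(f"{op:12s} fp16={base_ms:.3f} ms  quant={other_ms:.3f} ms  delta={delta:+.3f} ms")
```

Keying everything on ONNX op type gives both builds a common axis even when layer names and fusions differ, and the `<unmapped>` bucket shows how much time goes to layers that have no ONNX counterpart.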

Does anyone have other methods for this kind of investigation? Something in DL Designer to visualize this interactively would be very cool/useful.
