Description
Not sure where the best place to ask this is; it's just a question.
Right now I'm looking into an issue where a particular ONNX model performs about the same with fp8/int8 quantization applied as the fp16 version does. To dig into this, I have been profiling with trtexec, dumping the layer-by-layer performance, and trying to compare it with the fp16 version. One problem is that the two models are slightly different, so the layer names are slightly different too. The layer fusions also differ between the two versions, which makes it hard to compare layer performance accurately. The approach that makes sense to me is to look at it from the ONNX perspective, i.e., which common ONNX operators show performance discrepancies. I'm surprised there are no tools out right now to help visualize things like this (AFAIK?). My main method has been writing small Python scripts that map the TRT layer metadata back to ONNX ops and print the performance for both models. It is pretty inefficient and tedious.
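For illustration, a minimal sketch of that kind of mapping script could look like the one below. It rests on several assumptions that should be checked against your TensorRT version: that the per-layer profile (for example from trtexec `--exportProfile`) is a list of entries with `name` and `averageMs` fields, that the layer info (for example from `--exportLayerInfo` with `--profilingVerbosity=detailed`) carries a metadata string containing `[ONNX Layer: <node name>]` tags, and that you have already built a dict from ONNX node name to op type. The function names, the `<unmapped>`/`<unknown>` buckets, and the sample numbers are all hypothetical. Splitting a fused layer's time evenly across its ONNX nodes is a simplification, not something TensorRT reports.

```python
import re
from collections import defaultdict

# Assumed metadata tag format; adjust the pattern to what your engine emits.
ONNX_NODE_RE = re.compile(r"\[ONNX Layer: ([^\]]+)\]")


def time_per_onnx_op(profile, layer_metadata, node_op_types):
    """Sum average layer time (ms) per ONNX op type.

    profile:        list of dicts with "name" and "averageMs" (assumed fields)
    layer_metadata: dict of TRT layer name -> metadata string
    node_op_types:  dict of ONNX node name -> ONNX op type
    """
    totals = defaultdict(float)
    for entry in profile:
        if "name" not in entry:  # skip header-style entries such as {"count": N}
            continue
        nodes = ONNX_NODE_RE.findall(layer_metadata.get(entry["name"], ""))
        if not nodes:
            # Layers with no ONNX origin (e.g. inserted reformats) land here.
            totals["<unmapped>"] += entry["averageMs"]
            continue
        # Simplification: split a fused layer's time evenly across its nodes.
        share = entry["averageMs"] / len(nodes)
        for node in nodes:
            totals[node_op_types.get(node, "<unknown>")] += share
    return dict(totals)


def compare(times_a, times_b):
    """Return (op, ms_a, ms_b, delta) rows, largest absolute delta first."""
    rows = [
        (op, times_a.get(op, 0.0), times_b.get(op, 0.0),
         times_b.get(op, 0.0) - times_a.get(op, 0.0))
        for op in sorted(set(times_a) | set(times_b))
    ]
    return sorted(rows, key=lambda r: abs(r[3]), reverse=True)


# Made-up sample data, only to show the expected input shapes.
node_op_types = {"/conv1/Conv": "Conv", "/relu1/Relu": "Relu", "/fc/Gemm": "Gemm"}

fp16_profile = [{"count": 10},
                {"name": "conv1 + relu1", "averageMs": 0.5},
                {"name": "gemm1", "averageMs": 0.25}]
fp16_meta = {"conv1 + relu1": "[ONNX Layer: /conv1/Conv]\x1e[ONNX Layer: /relu1/Relu]",
             "gemm1": "[ONNX Layer: /fc/Gemm]"}

int8_profile = [{"count": 10},
                {"name": "conv1", "averageMs": 0.25},
                {"name": "relu1", "averageMs": 0.125},
                {"name": "reformat0", "averageMs": 0.125},
                {"name": "gemm1", "averageMs": 0.25}]
int8_meta = {"conv1": "[ONNX Layer: /conv1/Conv]",
             "relu1": "[ONNX Layer: /relu1/Relu]",
             "gemm1": "[ONNX Layer: /fc/Gemm]"}

fp16_times = time_per_onnx_op(fp16_profile, fp16_meta, node_op_types)
int8_times = time_per_onnx_op(int8_profile, int8_meta, node_op_types)
rows = compare(fp16_times, int8_times)

if __name__ == "__main__":
    for op, a, b, delta in rows:
        print(f"{op:12s} fp16={a:.3f} ms  int8={b:.3f} ms  delta={delta:+.3f} ms")
```

Grouping by op type sidesteps the differing layer names and fusions, at the cost of hiding which specific node regressed; keying the totals on the ONNX node name instead of the op type would give the finer-grained view.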
Does anyone have other methods for this kind of investigation? Something in DL Designer to visualize this interactively would be very cool and useful.