Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167) #18167
3l1 wants to merge 1 commit into pytorch:main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18167
Note: Links to docs will display an error until the docs builds have been completed.
❌ 10 New Failures, 1 Unrelated Failure
As of commit 41e5640 with merge base 1e17e28:
NEW FAILURES: 10 jobs have failed.
FLAKY: 1 job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
digantdesai left a comment:
Review automatically exported from Phabricator review in Meta.
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

Summary:
Pull Request resolved: pytorch#18167

Two optimizations in ToTosaMemoryFormatPass to reduce TOSA TRANSPOSE nodes:

1. NHWC-safe reshape detection: When a 4D→4D view_copy has monotonic shape_indices on the raw shapes and preserves the last dimension (NHWC channel), skip inserting input/output transposes. The view_copy can operate directly on NHWC data.
2. Redundant permute_copy elimination: Model-level permute_copy ops whose permutation matches channels_last_order (NCHW→NHWC) or its inverse (NHWC→NCHW) are redundant with the tosa_dim_order annotation that already handles format conversion. Replace them with view_copy (identity reshape) to avoid generating TOSA TRANSPOSE nodes. Handles both 4D (rank>=4, sr>=2) and 3D (rank>=3, sr>=1) permutations.

This reduces Vela Transpose entries from 75→33 (-56%), Transpose op cycles from 33.4K→6.1K (-82%), and NPU operators from 367→329 (-38).

Reviewed By: digantdesai
Differential Revision: D96432610
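A quick arithmetic check of the Vela figures quoted in the commit message: the rounded percentages hold, and the drop of 38 NPU operators is roughly a 10% reduction. The helper name `pct_change` is only for illustration.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent (negative means a reduction)."""
    return 100.0 * (after - before) / before


# Figures quoted in the commit message (cycles given as 33.4K and 6.1K).
transpose_entries = pct_change(75, 33)
transpose_cycles = pct_change(33.4e3, 6.1e3)
npu_operators = pct_change(367, 329)

print(f"Transpose entries:   {transpose_entries:+.1f}%")
print(f"Transpose op cycles: {transpose_cycles:+.1f}%")
print(f"NPU operators:       {npu_operators:+.1f}% ({329 - 367:+d} ops)")
```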
Summary:
Two optimizations in ToTosaMemoryFormatPass to reduce TOSA TRANSPOSE nodes:
1. NHWC-safe reshape detection: When a 4D→4D view_copy has monotonic
   shape_indices on the raw shapes and preserves the last dimension (NHWC
   channel), skip inserting input/output transposes. The view_copy can
   operate directly on NHWC data.
2. Redundant permute_copy elimination: Model-level permute_copy ops whose
   permutation matches channels_last_order (NCHW→NHWC) or its inverse
   (NHWC→NCHW) are redundant with the tosa_dim_order annotation that
   already handles format conversion. Replace them with view_copy (identity
   reshape) to avoid generating TOSA TRANSPOSE nodes. Handles both 4D
   (rank>=4, sr>=2) and 3D (rank>=3, sr>=1) permutations.
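The two checks above can be sketched in plain Python. This is not the pass's code. `is_nhwc_safe_reshape` is a hypothetical brute-force reference: instead of inspecting shape_indices, it fills a tensor with distinct values and compares "reshape the NHWC buffer directly" against "transpose to NCHW, reshape, transpose back". `channels_last_order` and `is_redundant_permute` are also hypothetical helpers, and they assume `sr` means the number of spatial dims, so the channel dim sits just before them.

```python
from itertools import product
from math import prod
from typing import List, Sequence, Tuple

NCHW_TO_NHWC = (0, 2, 3, 1)


def _strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (contiguous) strides for `shape`."""
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= dim
    return tuple(reversed(strides))


def permute(flat: List[int], shape: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Contiguous result of permuting a row-major buffer (output dim k = input dim perm[k])."""
    st = _strides(shape)
    out_shape = [shape[p] for p in perm]
    return [
        flat[sum(i * st[p] for i, p in zip(idx, perm))]
        for idx in product(*(range(d) for d in out_shape))
    ]


def is_nhwc_safe_reshape(nchw_in: Sequence[int], nchw_out: Sequence[int]) -> bool:
    """Brute force: can a 4D NCHW reshape run directly on the NHWC buffer?"""
    assert len(nchw_in) == len(nchw_out) == 4 and prod(nchw_in) == prod(nchw_out)
    logical = list(range(prod(nchw_in)))  # distinct values expose any reordering
    # Reshaping a contiguous buffer never moves data, so the direct path's
    # bytes are just the NHWC input buffer.
    direct = permute(logical, nchw_in, NCHW_TO_NHWC)
    # Reference path: transpose to NCHW, reshape, transpose back to NHWC.
    reference = permute(logical, nchw_out, NCHW_TO_NHWC)
    return direct == reference


def channels_last_order(rank: int, spatial_rank: int) -> Tuple[int, ...]:
    """Hypothetical: move the channel dim (just before the spatial dims) to the end."""
    c = rank - spatial_rank - 1
    return tuple(range(c)) + tuple(range(c + 1, rank)) + (c,)


def inverse(perm: Sequence[int]) -> Tuple[int, ...]:
    """Inverse permutation: inverse(perm)[perm[i]] == i."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return tuple(inv)


def is_redundant_permute(perm: Sequence[int], spatial_rank: int) -> bool:
    """True if `perm` is the NCHW->NHWC order or its inverse for this rank."""
    rank = len(perm)
    if spatial_rank < 1 or rank < spatial_rank + 2:  # need batch + channel + spatial dims
        return False
    order = channels_last_order(rank, spatial_rank)
    return tuple(perm) in (order, inverse(order))
```

For example, (1, 8, 4, 4) → (1, 8, 16, 1) is NHWC-safe: in NHWC order it is (1, 4, 4, 8) → (1, 16, 1, 8), which merges H and W while C = 8 stays last. (2, 3, 1, 4) → (1, 3, 2, 4) is not, because folding the batch into H mixes channels. On the permute side, (0, 2, 3, 1) and (0, 3, 1, 2) are flagged for 4D with two spatial dims, and (0, 2, 1) for 3D with one.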
Reviewed By: digantdesai
Differential Revision: D96432610