Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167) #18167

Open
3l1 wants to merge 1 commit into pytorch:main from 3l1:export-D96432610

3l1 commented Mar 13, 2026

Summary:

Two optimizations in ToTosaMemoryFormatPass to reduce TOSA TRANSPOSE nodes:

  1. NHWC-safe reshape detection: When a 4D→4D view_copy has monotonic
    shape_indices on the raw shapes and preserves the last dimension (NHWC
    channel), skip inserting input/output transposes. The view_copy can
    operate directly on NHWC data.

  2. Redundant permute_copy elimination: Model-level permute_copy ops whose
    permutation matches channels_last_order (NCHW→NHWC) or its inverse
    (NHWC→NCHW) are redundant with the tosa_dim_order annotation that
    already handles format conversion. Replace them with view_copy (identity
    reshape) to avoid generating TOSA TRANSPOSE nodes. Handles both 4D
    (rank>=4, sr>=2) and 3D (rank>=3, sr>=1) permutations.
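
The matching rule in item 2 can be sketched as follows. This is a minimal illustration, not the code in this PR. The helper names `channels_last_order`, `inverse_permutation`, and `is_redundant_layout_permute` are hypothetical. It assumes `sr` means the number of trailing spatial dimensions, that dims are laid out as (batch..., channel, spatial...), and that the permutation contains no negative indices.

```python
from typing import Sequence, Tuple


def channels_last_order(rank: int, spatial_rank: int) -> Tuple[int, ...]:
    """Permutation that moves the channel axis to the end.

    Hypothetical helper; assumes a (batch..., channel, spatial...) layout,
    e.g. NCHW for rank=4 and spatial_rank=2.
    """
    if spatial_rank < 1 or rank < spatial_rank + 2:
        raise ValueError("need at least batch, channel and one spatial dim")
    channel_axis = rank - spatial_rank - 1
    batch_axes = list(range(channel_axis))
    spatial_axes = list(range(channel_axis + 1, rank))
    return tuple(batch_axes + spatial_axes + [channel_axis])


def inverse_permutation(perm: Sequence[int]) -> Tuple[int, ...]:
    """Return the permutation that undoes `perm`."""
    inverse = [0] * len(perm)
    for dst, src in enumerate(perm):
        inverse[src] = dst
    return tuple(inverse)


def is_redundant_layout_permute(perm: Sequence[int], spatial_rank: int) -> bool:
    """True if `perm` is exactly NCHW->NHWC or NHWC->NCHW for this rank."""
    rank = len(perm)
    if spatial_rank < 1 or rank < spatial_rank + 2:
        return False
    to_channels_last = channels_last_order(rank, spatial_rank)
    to_channels_first = inverse_permutation(to_channels_last)
    return tuple(perm) in (to_channels_last, to_channels_first)


# 4D case: (0, 2, 3, 1) is NCHW->NHWC and (0, 3, 1, 2) is its inverse.
print(is_redundant_layout_permute([0, 2, 3, 1], spatial_rank=2))  # True
print(is_redundant_layout_permute([0, 3, 1, 2], spatial_rank=2))  # True
# 3D case: (0, 2, 1) is its own inverse.
print(is_redundant_layout_permute([0, 2, 1], spatial_rank=1))     # True
# Swapping only H and W is a real transpose and must be kept.
print(is_redundant_layout_permute([0, 1, 3, 2], spatial_rank=2))  # False
```

Only an exact match against one of the two layout permutations qualifies. Any other permutation still needs a real TRANSPOSE.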

Reviewed By: digantdesai

Differential Revision: D96432610

3l1 requested a review from digantdesai as a code owner March 13, 2026 19:59

pytorch-bot bot commented Mar 13, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18167

Note: Links to docs will display an error until the docs builds have been completed.

❌ 10 New Failures, 1 Unrelated Failure

As of commit 41e5640 with merge base 1e17e28:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Mar 13, 2026

meta-codesync bot commented Mar 13, 2026

@3l1 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D96432610.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

digantdesai left a comment

Review automatically exported from Phabricator review in Meta.

3l1 added a commit to 3l1/executorch that referenced this pull request Mar 13, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

Summary:
Pull Request resolved: pytorch#18167

Two optimizations in ToTosaMemoryFormatPass to reduce TOSA TRANSPOSE nodes:

1. NHWC-safe reshape detection: When a 4D→4D view_copy has monotonic
   shape_indices on the raw shapes and preserves the last dimension (NHWC
   channel), skip inserting input/output transposes. The view_copy can
   operate directly on NHWC data.

2. Redundant permute_copy elimination: Model-level permute_copy ops whose
   permutation matches channels_last_order (NCHW→NHWC) or its inverse
   (NHWC→NCHW) are redundant with the tosa_dim_order annotation that
   already handles format conversion. Replace them with view_copy (identity
   reshape) to avoid generating TOSA TRANSPOSE nodes. Handles both 4D
   (rank>=4, sr>=2) and 3D (rank>=3, sr>=1) permutations.

Together, these changes reduce Vela Transpose entries from 75 to 33 (-56%),
Transpose op cycles from 33.4K to 6.1K (-82%), and NPU operators from 367
to 329 (-38, about -10%).
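
As a quick sanity check, the percentages follow directly from the before and after counts reported in this commit message. The sketch below only redoes that arithmetic; the helper name `reduction` is hypothetical, and 33.4K and 6.1K are written out as 33400 and 6100.

```python
def reduction(before: int, after: int) -> tuple:
    """Return (absolute drop, percentage drop) going from before to after."""
    delta = before - after
    return delta, 100.0 * delta / before


# Figures as reported above.
reported = [
    ("Vela Transpose entries", 75, 33),
    ("Transpose op cycles", 33400, 6100),
    ("NPU operators", 367, 329),
]

for name, before, after in reported:
    delta, percent = reduction(before, after)
    print(f"{name}: {before} -> {after} (-{delta}, -{percent:.0f}%)")
```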

Reviewed By: digantdesai

Differential Revision: D96432610

3l1 force-pushed the export-D96432610 branch from f9a57c8 to 6aac88a March 13, 2026 20:45
meta-codesync bot changed the title from "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass" to "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167)" Mar 13, 2026
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 13, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

3l1 force-pushed the export-D96432610 branch from 6aac88a to 7f039b2 March 13, 2026 20:47
3l1 force-pushed the export-D96432610 branch from 7f039b2 to c019a17 March 13, 2026 21:21
3l1 added a commit to 3l1/executorch that referenced this pull request Mar 13, 2026
…ansposes in ToTosaMemoryFormatPass (pytorch#18167)

3l1 force-pushed the export-D96432610 branch from c019a17 to 41e5640 March 16, 2026 20:10
Labels

ciflow/trunk, CLA Signed, fb-exported, meta-exported
