[ARM/TOSA] Dependency cycle after Q/DQ de-tagging in TOSAPartitioner breaks models with attention blocks #18190

@beomwookang

Bug Description

TOSAPartitioner._detag_boundary_nodes() removes Q/DQ nodes from partition boundaries after CapabilityBasedPartitioner has produced cycle-free partitions.
However, this de-tagging can introduce dependency cycles for models that combine CNN and Transformer (attention) blocks, such as MobileViT.

Error

AssertionError: Invalid partition, found dependency cycles at torch/fx/passes/utils/fuser_utils.py (validate_partition)

Steps to Reproduce

  1. Export a MobileViT-S model to .pt2
  2. Run Ethos-U lowering with EthosUPartitioner targeting ethos-u85-256
  3. The lowering fails on the to_backend() → _create_partitions_in_graph_module() → create_submodule_from_nodes() → fuse_as_graphmodule() → validate_partition() path

Root Cause

The partitioning flow has a gap between partition creation and submodule extraction:

  1. CapabilityBasedPartitioner.propose_partitions() creates partitions with built-in cycle detection (maybe_merge_partition → dfs_iter_find_cycle). At this point, partitions are guaranteed cycle-free.

  2. TOSAPartitioner._detag_boundary_nodes() removes:

    • Q nodes whose inputs are outside the partition
    • DQ nodes whose users are outside the partition
    • Nodes with unpartitioned floating-point inputs

    No cycle re-validation is performed after de-tagging.

  3. _create_partitions_in_graph_module() collects the remaining tagged nodes and calls fuse_as_graphmodule() → validate_partition(), which detects the cycle and raises AssertionError.

Why attention blocks trigger this

MobileViT combines CNN layers with Transformer blocks (self-attention with reshape/permute/matmul). CapabilityBasedPartitioner groups all these ops into one large partition.
When Q/DQ nodes at internal boundaries are de-tagged, the remaining partition nodes form cross-dependencies through the now-unpartitioned nodes:

[partition] Linear_Q → [de-tagged Q] → [outside] → [de-tagged DQ] → [partition] Matmul
[partition] Linear_K → [de-tagged Q] → [outside] → [de-tagged DQ] → [partition] Matmul

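This creates paths that exit the partition and re-enter it. Fusing the partition into a single submodule would then make that submodule both a producer and a consumer of the outside nodes, which is the dependency cycle that validate_partition() rejects.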
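
To make the exit-and-re-enter condition concrete, here is a minimal, framework-free sketch in plain Python. It does not use the torch.fx or ExecuTorch APIs. The node names and the `leaves_and_reenters` helper are hypothetical, and the toy graph only mirrors the Linear_Q path in the diagram above.

```python
def leaves_and_reenters(edges, partition):
    """Return True if a path exits `partition` and later re-enters it.

    `edges` is a list of (producer, consumer) pairs describing a DAG.
    This is an illustrative check, not the torch.fx implementation.
    """
    partition = set(partition)
    users = {}
    for src, dst in edges:
        users.setdefault(src, []).append(dst)

    # Seed the search with outside nodes that consume a partition output.
    stack = [u for n in partition for u in users.get(n, []) if u not in partition]
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for user in users.get(node, []):
            if user in partition:
                return True  # path came back in: fusing would create a cycle
            stack.append(user)
    return False


# Toy version of one path from the diagram (names are illustrative only).
edges = [("linear_q", "q"), ("q", "dq"), ("dq", "matmul")]

before_detag = {"linear_q", "q", "dq", "matmul"}
after_detag = {"linear_q", "matmul"}  # q and dq were de-tagged

print(leaves_and_reenters(edges, before_detag))  # False
print(leaves_and_reenters(edges, after_detag))   # True
```

With all four nodes tagged, nothing leaves the partition. Once `q` and `dq` are de-tagged, the path from `linear_q` exits through them and re-enters at `matmul`, which is the situation described above.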

Proposed Fix

After _detag_boundary_nodes(), validate each partition for dependency cycles.
When a cycle is detected, split the partition into connected components of the surviving (still-tagged) nodes.
Each component becomes a separate partition that is individually cycle-free.

I have a working implementation and will submit a PR.
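
The splitting step can be illustrated with a small sketch. This is not the implementation mentioned above and does not touch TOSAPartitioner; it is plain Python over a toy edge list, and `split_into_components` and all node names are hypothetical. It assumes "connected components" means components of the undirected graph induced by the still-tagged nodes.

```python
def split_into_components(edges, surviving):
    """Group still-tagged nodes into connected components.

    Only edges whose two endpoints are both in `surviving` are considered,
    and edge direction is ignored when grouping.
    """
    surviving = set(surviving)
    neighbors = {n: set() for n in surviving}
    for src, dst in edges:
        if src in surviving and dst in surviving:
            neighbors[src].add(dst)
            neighbors[dst].add(src)

    components = []
    unvisited = set(surviving)
    while unvisited:
        start = unvisited.pop()
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in neighbors[node]:
                if nb in unvisited:
                    unvisited.remove(nb)
                    component.add(nb)
                    stack.append(nb)
        components.append(component)
    return components


# Toy attention-like graph; q_*/dq_* stand for the de-tagged Q/DQ nodes.
edges = [
    ("linear_q", "q_q"), ("q_q", "dq_q"), ("dq_q", "matmul"),
    ("linear_k", "q_k"), ("q_k", "dq_k"), ("dq_k", "matmul"),
    ("matmul", "softmax"),
]
surviving = {"linear_q", "linear_k", "matmul", "softmax"}

parts = sorted(sorted(c) for c in split_into_components(edges, surviving))
print(parts)  # [['linear_k'], ['linear_q'], ['matmul', 'softmax']]
```

In this toy graph the surviving nodes no longer touch each other except for `matmul` and `softmax`, so the original partition falls apart into three pieces. A real implementation would presumably re-run the cycle check on each resulting component rather than take cycle-freedom for granted.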

Test Results

  • MobileViT-S on Ethos-U85: previously failed with AssertionError, now successfully produces a .pte file (5.7 MB). Nine attention-block partitions are each split into 3 sub-partitions. All sub-partitions remain on NPU (no CPU fallback).
  • CNN-only models (ResNet, MobileNetV2, EfficientNet): unaffected. No cycles appear after de-tagging, so no partition splitting occurs.

Environment

  • ExecuTorch: main branch (latest)
  • Target: Ethos-U85-256
  • Model: MobileViT-S (quantized via EthosUQuantizer)

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell
