Description
I’m converting qwen3-1_7b for SA8295 in a dockerized ExecuTorch environment.
I see two different failures depending on num_sharding:
- num_sharding=1: fails during QNN graph finalization/serialization with a limit error:
  graph requires estimated allocation of 2309576 KB, limit is 2097152 KB
  Failed to finalize Qnn Graph with error: 1002
  AssertionError: Failed to generate Qnn context binary.
- num_sharding=2: the process runs out of host memory (OOM), even on a large machine.
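For context, the numbers in the error message put the single-shard graph roughly 10% over what looks like a hard 2 GiB serialization cap:

```python
# Values taken verbatim from the error message above; the limit is exactly 2 GiB.
required_kb = 2309576
limit_kb = 2097152  # 2 * 1024**2 KB == 2 GiB

overshoot_kb = required_kb - limit_kb
print(f"required : {required_kb / 1024**2:.2f} GiB")  # 2.20 GiB
print(f"limit    : {limit_kb / 1024**2:.2f} GiB")     # 2.00 GiB
print(f"overshoot: {overshoot_kb / limit_kb:.1%}")    # 10.1%
```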
Environment:
Deployment style: Dockerized
Host OS: Linux (KVM VM)
CPU: Intel Xeon Platinum 8375C, 128 vCPUs
RAM: ~495 GiB total, no swap
Target SoC: SA8295
Model: Qwen3-1.7B
Executorch version: v1.1.0
Local modification (static_llm_quant_recipe.py, lines 575-609):
- Set default_quant_dtype = QuantDtype.use_16a8w
- Updated the conv2d quantization to QuantDtype.use_16a8w
- Updated granularity to QuantGranularity.PER_CHANNEL (instead of per-block)
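A sketch of the local change, using the enum names listed above; the import paths and the surrounding recipe structure in static_llm_quant_recipe.py are assumptions, not verified against v1.1.0:

```python
# Hedged sketch of the local modification to static_llm_quant_recipe.py
# (lines 575-609). Import paths below are assumed, not verified.
from executorch.backends.qualcomm.quantizer.quantizer import (  # assumed path
    QuantDtype,
    QuantGranularity,
)

# 1. Raise the recipe's default precision to 16-bit activations / 8-bit weights.
default_quant_dtype = QuantDtype.use_16a8w

# 2. Quantize conv2d with the same 16a8w scheme instead of its original dtype.
conv2d_quant_dtype = QuantDtype.use_16a8w

# 3. Switch quantization granularity from per-block to per-channel.
granularity = QuantGranularity.PER_CHANNEL
```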
Command used:
python examples/qualcomm/oss_scripts/llama/llama.py \
  -b build-android \
  -m SA8295 \
  --decoder_model qwen3-1_7b \
  --model_mode hybrid \
  --prefill_ar_len 128 \
  --max_seq_len 256 \
  --temperature 0 \
  --prompt "I would like to learn python, could you teach me with a simple example?" \
  --tasks wikitext \
  --limit 1 \
  --compile_only
(--max_seq_len: tried 128, 256, 512, and 1024.)
My current hypothesis is that the num_sharding=1 failure is likely a platform limitation on SA8295 (DSP arch v68), given the explicit QNN graph serialization cap (2309576 KB required vs 2097152 KB limit).
However, the num_sharding=2 behavior is confusing: instead of resolving the graph-size issue, it leads to host-side OOM even on a ~500 GiB RAM machine. I'm not sure whether this is expected compile-time memory behavior, a tooling inefficiency, or a potential memory leak. Other models with num_sharding > 1 show the same OOM behavior (e.g. Llama3.2-3B).
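To help tell a leak apart from an expected compile-time peak, a small stand-alone watcher could sample the converter's resident set size while it runs. This is a hypothetical helper (not part of the repo), Linux-only since it reads /proc:

```python
import re
import subprocess
import time

def rss_kb(pid: int) -> int:
    """Read VmRSS (resident set size, in KB) for a PID from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        match = re.search(r"VmRSS:\s+(\d+) kB", f.read())
    return int(match.group(1)) if match else 0

def watch(cmd: list[str], interval: float = 5.0) -> list[int]:
    """Run `cmd` and sample its RSS every `interval` seconds until it exits.

    Samples that grow monotonically across the run suggest leak-like
    behavior; a single hump around sharding/serialization suggests an
    expected compile-time peak.
    """
    proc = subprocess.Popen(cmd)
    samples: list[int] = []
    while proc.poll() is None:
        samples.append(rss_kb(proc.pid))
        time.sleep(interval)
    return samples
```

Running the failing llama.py command through `watch(...)` and plotting the samples would show whether memory climbs steadily as shards are processed or spikes at one specific stage.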
convert_qwen3-1_7b_SA8295_20260302_082259.txt
convert_qwen3-1_7b_SA8295_20260302_065355.txt
cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin