Conversation
This patch reimplements the evaluation feature that used to be in aot_arm_compiler.py while introducing a few improvements. The program is evaluate_model.py and it imports functions from aot_arm_compiler.py to compile a model in a similar manner, but runs its own code that is focused on evaluating a model using the evaluator classes in backends/arm/util/arm_model_evaluator.py.

The following is supported in evaluate_model.py:
- TOSA reference models (INT, FP).
- Evaluating a model that is quantized and/or lowered. I.e., it is possible to evaluate a model that is quantized but not lowered, lowered but not quantized, or both at the same time.
- The program can cast the model with the --dtype flag to evaluate a model in e.g., bf16 or fp16 format.

Signed-off-by: Martin Lindström <Martin.Lindstroem@arm.com>
Change-Id: I85f731633364da1eb71abe602a0335f531ec7e46
Add two tests that exercise evaluate_model.py with different command line arguments.

Signed-off-by: Martin Lindström <Martin.Lindstroem@arm.com>
Change-Id: I47304ea270518703dc4c826c4c6672c7aca95228
Dr. CI status (as of commit 87a2dfd with merge base 76df414): 2 workflows awaiting approval, 8 new failures, 1 cancelled job. See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18199.
@pytorchbot label ciflow/trunk
@pytorchbot label "partner: arm"
@pytorchbot label "release notes: arm"
Pull request overview
This PR reintroduces Arm backend model evaluation as a dedicated CLI (evaluate_model.py), replacing the previously embedded evaluation flow from aot_arm_compiler.py, and adds tests to exercise common invocation modes.
Changes:
- Add backends/arm/scripts/evaluate_model.py to compile and (optionally) quantize and/or delegate a model, then evaluate it via the Arm evaluator utilities.
- Add pytest coverage for running evaluate_model.py against TOSA INT/FP targets and validating the emitted metrics JSON.
- Update examples/arm/aot_arm_compiler.py messaging to point users to the new evaluation script.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| examples/arm/aot_arm_compiler.py | Updates the deprecation/error message to redirect evaluation usage to evaluate_model.py. |
| backends/arm/scripts/evaluate_model.py | Introduces the new evaluation CLI: argument parsing, compile/quantize/delegate pipeline, evaluator execution, and JSON metrics output. |
| backends/arm/test/misc/test_evaluate_model.py | Adds integration-style tests invoking the new script with representative CLI flags and checking output structure. |
```python
if args.quant_mode is not None and args.dtype is not None:
    raise ValueError("Cannot specify --dtype when --quant_mode is enabled.")
```
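A minimal sketch of how a mutual-exclusion check like this could sit in an argparse-based CLI. The flag names --quant_mode and --dtype come from the PR; the parser setup, choices, and defaults shown here are assumptions, not the script's actual code.

```python
# Illustrative only: argparse wiring for a --quant_mode/--dtype conflict
# check. Flag names are from the PR; choices and defaults are guesses.
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(
        description="Evaluate a model for the Arm backend (sketch)."
    )
    parser.add_argument("--quant_mode", default=None)
    parser.add_argument("--dtype", choices=["fp32", "fp16", "bf16"], default=None)
    args = parser.parse_args(argv)
    if args.quant_mode is not None and args.dtype is not None:
        raise ValueError("Cannot specify --dtype when --quant_mode is enabled.")
    return args
```

Doing the check after parsing (rather than with argparse's mutually exclusive groups) allows the custom error message above.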
```python
evaluators: list[Evaluator] = [
```

```python
    "The model to test must be either quantized or delegated (--quant_mode or --delegate)."
)
```
```python
# Add evaluator for compression ratio of TOSA file
intermediates_path = Path(args.intermediates)
tosa_paths = list(intermediates_path.glob("*.tosa"))
```
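The compression-ratio evaluator itself lives in backends/arm/util/arm_model_evaluator.py; the helper below is only a hypothetical sketch of what globbing the intermediates directory and comparing file sizes could look like. The function name and the size-based ratio definition are assumptions.

```python
# Hypothetical sketch: compare the on-disk size of emitted .tosa files in
# the intermediates directory against a reference size. Not the real
# evaluator; name and ratio definition are assumptions.
from pathlib import Path


def tosa_compression_ratio(intermediates: str, reference_size_bytes: int) -> float:
    tosa_paths = list(Path(intermediates).glob("*.tosa"))
    if not tosa_paths:
        raise FileNotFoundError(f"No .tosa files found in {intermediates}")
    tosa_size = sum(p.stat().st_size for p in tosa_paths)
    return reference_size_bytes / tosa_size
```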
```python
# Add Executorch root to path so this script can be run from anywhere
_EXECUTORCH_DIR = Path(__file__).resolve().parents[3]
_EXECUTORCH_DIR_STR = str(_EXECUTORCH_DIR)
if _EXECUTORCH_DIR_STR not in sys.path:
    sys.path.insert(0, _EXECUTORCH_DIR_STR)
```
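Why parents[3]: for a script located at backends/arm/scripts/evaluate_model.py, three directory levels above the script's own directory is the repository root. A quick illustration with a made-up absolute path:

```python
# parents[0] = .../backends/arm/scripts, parents[1] = .../backends/arm,
# parents[2] = .../backends, parents[3] = the executorch repo root.
# The /home/user prefix below is made up for illustration.
from pathlib import PurePosixPath

script = PurePosixPath("/home/user/executorch/backends/arm/scripts/evaluate_model.py")
repo_root = script.parents[3]
```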
| "Evaluate a model quantized and/or delegated for the Arm backend." | ||
| " Evaluations include numerical comparison to the original model" | ||
| "and/or top-1/top-5 accuracy if applicable." |
| "Evaluate a model quantized and/or delegated for the Arm backend." | ||
| " Evaluations include numerical comparison to the original model" | ||
| "and/or top-1/top-5 accuracy if applicable." |
| "provided, up to 1000 samples are used for calibration. " | ||
| "Supported files: Common image formats (e.g., .png or .jpg) if " | ||
| "using imagenet evaluator, otherwise .pt/.pth files. If not provided," | ||
| "quantized models are calibrated on their example inputs." |
zingo left a comment:
OK to merge. This adds a new file, but since the file is its own test script, buck2 files should not need updates.
Arm backend: Add evaluate_model.py

This patch reimplements the evaluation feature that used to be in
aot_arm_compiler.py while introducing a few improvements. The program is
evaluate_model.py and it imports functions from aot_arm_compiler.py to
compile a model in a similar manner, but runs its own code that is
focused on evaluating a model using the evaluator classes in
backends/arm/util/arm_model_evaluator.py.

The following is supported in evaluate_model.py:
- TOSA reference models (INT, FP).
- Evaluating a model that is quantized and/or lowered. I.e., it is
  possible to evaluate a model that is quantized but not lowered,
  lowered but not quantized, or both at the same time.
- The program can cast the model with the --dtype flag to evaluate a
  model in e.g., bf16 or fp16 format.
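As a rough illustration of the --dtype cast described above, the flag string would need to be resolved to a torch dtype before casting the model. The mapping below is a guess at plausible flag values, not the script's actual table; the real code would then cast with something like model.to(getattr(torch, DTYPE_MAP[flag])).

```python
# Hypothetical mapping from --dtype flag strings to torch dtype attribute
# names; the real script's accepted values may differ. The evaluation flow
# would then cast via model.to(getattr(torch, DTYPE_MAP[flag])).
DTYPE_MAP = {"fp32": "float32", "fp16": "float16", "bf16": "bfloat16"}


def resolve_dtype_name(flag: str) -> str:
    try:
        return DTYPE_MAP[flag]
    except KeyError:
        raise ValueError(f"Unsupported --dtype value: {flag}") from None
```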
Also add tests that exercise evaluate_model.py with different command
line arguments.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell