⚡️ Speed up method InjectPerfOnly.collect_instance_variables by 769% in PR #1418 (fix/pytorch-forward-method-instrumentation)
#1419
⚡️ This pull request contains optimizations for PR #1418
If you approve this dependent PR, these changes will be merged into the original PR branch fix/pytorch-forward-method-instrumentation.
📄 769% (7.69x) speedup for InjectPerfOnly.collect_instance_variables in codeflash/code_utils/instrument_existing_tests.py
⏱️ Runtime: 1.30 milliseconds → 150 microseconds (best of 15 runs)
📝 Explanation and details
The optimized code achieves a 769% speedup (from 1.30 ms to 150 μs) by replacing the expensive ast.walk() traversal with a targeted manual traversal.
Key Optimization:
The original code uses ast.walk(func_node), which visits every node in the AST, including expression nodes, operators, literals, and other node types that are irrelevant here. The line profiler shows this single loop consumed 87.3% of the execution time (9.2 ms out of 10.5 ms).
The optimized version implements a work-list algorithm that only traverses statement nodes (body, orelse, finalbody, handlers). This dramatically reduces the number of nodes examined.
Why This Works:
- Targeted traversal: Assignment statements (ast.Assign) can only appear as statements, not as expressions buried deep in the tree. By only following statement-level structure (body, orelse, etc.), we skip visiting thousands of irrelevant expression nodes.
- Cache-friendly: Local variables class_name and instance_vars eliminate repeated attribute lookups on self, reducing pointer indirection.
- Early filtering: The manual stack-based approach allows us to skip entire branches of the AST that can't contain assignments.
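A minimal sketch of the work-list idea described above, for illustration only. It is not the code from this PR: the function names, the sample source, and the choice to collect every ast.Assign node are assumptions. It also only follows the four fields named in the description, so statements nested under other fields (for example the cases of a match statement) would be missed by this sketch.

```python
import ast

# Fields that hold nested statement lists (the ones named in the description).
STATEMENT_FIELDS = ("body", "orelse", "finalbody", "handlers")


def assigns_via_walk(func_node):
    """Baseline: visit every node in the tree, expressions included."""
    return [n for n in ast.walk(func_node) if isinstance(n, ast.Assign)]


def assigns_via_worklist(func_node):
    """Hypothetical work-list variant: follow statement-level fields only."""
    found = []
    stack = [func_node]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Assign):
            found.append(node)
            continue  # an Assign holds no nested statements
        for field in STATEMENT_FIELDS:
            children = getattr(node, field, None)
            # Only lists are statement containers; this skips single
            # expression fields that happen to share the same name.
            if isinstance(children, list):
                stack.extend(children)
    return found


# Made-up test function body; it is only parsed, never executed.
SOURCE = '''
def test_model():
    model = Net()
    if flag:
        out = model(x)
    else:
        out = None
    for i in range(3):
        total = i + 1
    try:
        obj = Thing()
    except ValueError:
        err = 1
    finally:
        done = True
    with ctx() as c:
        inner = c.value
'''

func_node = ast.parse(SOURCE).body[0]
walk_lines = sorted(n.lineno for n in assigns_via_walk(func_node))
worklist_lines = sorted(n.lineno for n in assigns_via_worklist(func_node))
```

The two functions return the nodes in different orders (ast.walk is breadth-first, the stack is depth-first), so the comparison sorts by line number before checking that both find the same assignments.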
Performance Impact by Test Case:
The optimization preserves all functionality (same nodes discovered, same instance variables collected) while dramatically reducing the algorithmic complexity from O(all_nodes) to O(statement_nodes).
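To make the O(all_nodes) versus O(statement_nodes) comparison concrete, the following sketch counts how many nodes each strategy touches on a small, made-up function. The helper names and the sample source are assumptions for illustration and do not come from the PR.

```python
import ast

STATEMENT_FIELDS = ("body", "orelse", "finalbody", "handlers")


def count_all_nodes(tree):
    """Number of nodes ast.walk visits (statements plus all expressions)."""
    return sum(1 for _ in ast.walk(tree))


def count_statement_nodes(tree):
    """Number of nodes a statement-only work-list traversal visits."""
    count = 0
    stack = [tree]
    while stack:
        node = stack.pop()
        count += 1
        for field in STATEMENT_FIELDS:
            children = getattr(node, field, None)
            if isinstance(children, list):
                stack.extend(children)
    return count


# Hypothetical sample: few statements, many expression nodes.
SAMPLE = '''
def f(a, b):
    x = a * b + (a - b) ** 2
    if x > 0:
        y = [i * 2 for i in range(x)]
    return x
'''

sample_func = ast.parse(SAMPLE).body[0]
all_nodes = count_all_nodes(sample_func)
statement_nodes = count_statement_nodes(sample_func)
```

Here the statement-only traversal visits five nodes (the function definition, two assignments, the if, and the return), while ast.walk also visits every name, operator, and literal inside the expressions.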
✅ Correctness verification report:
To edit these changes, run git checkout codeflash/optimize-pr1418-2026-02-06T22.39.42 and push.