Refactor: rewrite multi-tensormap with shared shard core #310
jvjhfhg wants to merge 1 commit into hw-native-sys:main from
Conversation
Activity
Code Review
This pull request introduces a significant and well-executed refactoring of TensorMap to a per-ring architecture. By creating PTO2TensorMapRing for per-ring resource management and making PTO2TensorMap a facade, the design improves isolation and enables a key performance optimization in lookup with break-on-stale logic. The changes are consistent and robust across all modified files. My review includes a couple of minor suggestions to simplify memory allocation and initialization by using calloc.
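The calloc suggestion can be illustrated with a small sketch. The `RingState` struct and its fields below are hypothetical stand-ins for the real per-ring state in pto_tensormap.cpp; only the malloc-plus-memset versus calloc comparison is the point.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical per-ring bookkeeping struct; the real fields live elsewhere.
struct RingState {
    void* entries[64];
    unsigned count;
};

RingState* make_ring_state_verbose() {
    // The malloc + memset pattern the review suggests replacing.
    RingState* s = static_cast<RingState*>(std::malloc(sizeof(RingState)));
    if (s != nullptr) {
        std::memset(s, 0, sizeof(RingState));
    }
    return s;
}

RingState* make_ring_state() {
    // calloc allocates and zero-initializes in a single call.
    return static_cast<RingState*>(std::calloc(1, sizeof(RingState)));
}
```

Both versions return fully zeroed state; the calloc form simply removes one failure-prone step.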
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp (outdated, resolved)
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp (outdated, resolved)
```cpp
// Per-ring chain: entries are ordered newest-first (head-insert).
// A stale entry means all subsequent entries are also stale — break.
if (!entry_valid(*cur_entry)) {
    cur_entry = next_entry;
```
This needs to revert to the old logic: set next to nullptr here; whether to clean up the entry immediately can be decided later.
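The break-on-stale traversal under discussion can be sketched roughly as follows. `Entry`, `entry_valid`'s signature, and the version-based staleness test are simplified assumptions; the real chain lives inside the per-ring TensorMap. The invariant that makes the early break safe is head-insertion: versions are non-increasing along the chain.

```cpp
#include <cstdint>

// Hypothetical simplified chain entry (head-insert, newest-first).
struct Entry {
    uint64_t version;  // ring version captured at insert time
    Entry* next;       // older entries follow
};

// An entry is stale once the ring has retired past its version (assumed test).
static bool entry_valid(const Entry& e, uint64_t oldest_live_version) {
    return e.version >= oldest_live_version;
}

// Because the chain is newest-first, the first stale entry implies every
// later entry is stale too, so the walk can stop early.
int count_live(Entry* head, uint64_t oldest_live_version) {
    int live = 0;
    for (Entry* cur = head; cur != nullptr; cur = cur->next) {
        if (!entry_valid(*cur, oldest_live_version)) {
            break;  // all remaining entries are older, hence also stale
        }
        ++live;
    }
    return live;
}
```

The reviewer's alternative, setting next to nullptr at the stale boundary, eagerly detaches the stale tail instead of merely skipping it; the sketch above shows only the skip variant.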
Force-pushed from ec51fd1 to 0ef4a89
/gemini review
Code Review
This pull request introduces a significant and well-executed refactoring of the TensorMap to support multiple rings. The new design, which separates TensorMap into per-ring instances and a fallback map for external/cross-ring tensors, is clean and robust. The changes are consistent across the codebase, from the Tensor definition to the orchestration logic and even the examples. I have a few minor suggestions to simplify the memory allocation and initialization code.
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp (outdated, resolved)
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp (outdated, resolved)
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp (outdated, resolved)
Force-pushed from 62022dd to fbf25d0
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp (outdated, resolved)
Force-pushed from fbf25d0 to 6aac0b0
Force-pushed from 6aac0b0 to 9acb727
- route owner and fallback history through one TensorMapShardImpl template with two fixed configurations
- derive fallback cleanup from real producer ring retirement instead of storing redundant lifecycle fields on entries
- add a tracked design note under tensormap_and_ringbuffer/docs describing the committed multi-tensormap rewrite
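The "one template, two fixed configurations" idea from the commit message could look roughly like the sketch below. Only the name TensorMapShardImpl appears in the source; the config structs, their members, and the shard aliases are illustrative assumptions.

```cpp
// Hypothetical compile-time configurations for the shared shard core.
struct OwnerConfig {
    static constexpr bool is_owner = true;    // shard tied to its producing ring
    static constexpr const char* name = "owner";
};

struct FallbackConfig {
    static constexpr bool is_owner = false;   // history shard; cleanup derived
    static constexpr const char* name = "fallback";  // from producer retirement
};

// One shard implementation, specialized purely by configuration, so owner
// and fallback history share the same core logic.
template <typename Config>
class TensorMapShardImpl {
public:
    const char* name() const { return Config::name; }
    bool is_owner_shard() const { return Config::is_owner; }
};

using OwnerShard = TensorMapShardImpl<OwnerConfig>;
using FallbackShard = TensorMapShardImpl<FallbackConfig>;
```

Fixing both configurations at compile time keeps the two shard roles from drifting apart while still letting their lifecycle behavior differ where the config says so.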
Force-pushed from 9acb727 to c4051da
/gemini review
Code Review
This pull request introduces a significant refactoring of the multi-tensormap architecture, moving towards a shared shard core implementation for both owner and fallback history. The changes streamline lifecycle management by deriving cleanup semantics from producer retirement and enforce a unified shard implementation. A new design note (MULTI_TENSORMAP_REWRITE.md) has been added to document these architectural changes and their rationale. Additionally, the Tensor structure has been updated to include a ring_id, and tensor factory functions now incorporate this new field, along with the introduction of a thread-local ring ID for scope management. The orchestration logic has been updated to reflect these new tensor and tensormap behaviors.

A high-severity concern has been raised regarding the use of static thread_local for __pto2_ring_id, which could lead to inconsistent ring_id assignments and incorrect tensor routing if not managed carefully with RAII guards.
```cpp
// Thread-local scope depth for tensor factory functions.
// Incremented/decremented by PTO2ScopeGuard and standalone scope wrappers.
// Tensor ring selection clamps this depth to the runtime's valid ring range.
static thread_local uint8_t __pto2_ring_id = 0;
```
The introduction of a static thread_local variable __pto2_ring_id for managing the current tensor ring ID ties the ring selection to the thread's scope depth. While thread_local mitigates global state issues across threads, it still represents global mutable state within a single thread. Incorrect nesting of PTO2_SCOPE blocks or direct manipulation of this variable outside of the intended RAII guards could lead to an inconsistent ring_id being assigned to tensors. This could result in tensors being incorrectly routed to owner or fallback shards, leading to subtle and hard-to-debug correctness issues in the tensor mapping logic.
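The RAII discipline the review calls for can be sketched as follows. __pto2_ring_id and PTO2ScopeGuard are named in the diff; the guard body, the clamping constant, and current_ring_id are assumptions about how the pieces fit together.

```cpp
#include <algorithm>
#include <cstdint>

// Thread-local scope depth, mutated only through the RAII guard below.
static thread_local uint8_t __pto2_ring_id = 0;

constexpr uint8_t kMaxRingId = 3;  // assumed valid ring range [0, kMaxRingId]

// Non-copyable guard: the depth is always restored on scope exit, even on
// early return or exception, which is what keeps nesting consistent.
class PTO2ScopeGuard {
public:
    PTO2ScopeGuard() { ++__pto2_ring_id; }
    ~PTO2ScopeGuard() { --__pto2_ring_id; }
    PTO2ScopeGuard(const PTO2ScopeGuard&) = delete;
    PTO2ScopeGuard& operator=(const PTO2ScopeGuard&) = delete;
};

// Ring selection clamps the scope depth into the runtime's valid ring range,
// so over-deep nesting degrades gracefully instead of producing a bad id.
uint8_t current_ring_id() {
    return std::min(__pto2_ring_id, kMaxRingId);
}
```

As long as every mutation goes through the guard, each thread's ring id mirrors its lexical scope depth; the failure mode the reviewer flags is exactly bypassing the guard and writing __pto2_ring_id directly.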