Refactor: rewrite multi-tensormap with shared shard core #310
jvjhfhg wants to merge 1 commit into hw-native-sys:main from
Conversation
Activity
Code Review
This pull request introduces a significant and well-executed refactoring of TensorMap to a per-ring architecture. By creating PTO2TensorMapRing for per-ring resource management and making PTO2TensorMap a facade, the design improves isolation and enables a key performance optimization in lookup with break-on-stale logic. The changes are consistent and robust across all modified files. My review includes a couple of minor suggestions to simplify memory allocation and initialization by using calloc.
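The calloc suggestion can be illustrated with a small sketch. The `RingState` struct and its fields below are hypothetical stand-ins for the real per-ring state in pto_tensormap.cpp; only the malloc-plus-memset versus calloc comparison is the point.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical per-ring bookkeeping struct; the real fields live elsewhere.
struct RingState {
    void* entries[64];
    unsigned count;
};

RingState* make_ring_state_verbose() {
    // The malloc + memset pattern the review suggests replacing.
    RingState* s = static_cast<RingState*>(std::malloc(sizeof(RingState)));
    if (s != nullptr) {
        std::memset(s, 0, sizeof(RingState));
    }
    return s;
}

RingState* make_ring_state() {
    // calloc allocates and zero-initializes in a single call.
    return static_cast<RingState*>(std::calloc(1, sizeof(RingState)));
}
```

Both versions return fully zeroed state; the calloc form simply removes one failure-prone step.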
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp (outdated, resolved)
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp (outdated, resolved)
```cpp
// Per-ring chain: entries are ordered newest-first (head-insert).
// A stale entry means all subsequent entries are also stale — break.
if (!entry_valid(*cur_entry)) {
    cur_entry = next_entry;
```
This needs to revert to the old logic: set next to nullptr here; whether to clean up the entry immediately can be decided later.
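The break-on-stale traversal under discussion can be sketched roughly as follows. `Entry`, `entry_valid`'s signature, and the version-based staleness test are simplified assumptions; the real chain lives inside the per-ring TensorMap. The invariant that makes the early break safe is head-insertion: versions are non-increasing along the chain.

```cpp
#include <cstdint>

// Hypothetical simplified chain entry (head-insert, newest-first).
struct Entry {
    uint64_t version;  // ring version captured at insert time
    Entry* next;       // older entries follow
};

// An entry is stale once the ring has retired past its version (assumed test).
static bool entry_valid(const Entry& e, uint64_t oldest_live_version) {
    return e.version >= oldest_live_version;
}

// Because the chain is newest-first, the first stale entry implies every
// later entry is stale too, so the walk can stop early.
int count_live(Entry* head, uint64_t oldest_live_version) {
    int live = 0;
    for (Entry* cur = head; cur != nullptr; cur = cur->next) {
        if (!entry_valid(*cur, oldest_live_version)) {
            break;  // all remaining entries are older, hence also stale
        }
        ++live;
    }
    return live;
}
```

The reviewer's alternative, setting next to nullptr at the stale boundary, eagerly detaches the stale tail instead of merely skipping it; the sketch above shows only the skip variant.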
Force-pushed from ec51fd1 to 0ef4a89
/gemini review
Code Review
This pull request introduces a significant and well-executed refactoring of the TensorMap to support multiple rings. The new design, which separates TensorMap into per-ring instances and a fallback map for external/cross-ring tensors, is clean and robust. The changes are consistent across the codebase, from the Tensor definition to the orchestration logic and even the examples. I have a few minor suggestions to simplify the memory allocation and initialization code.
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp (outdated, resolved)
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp (outdated, resolved)
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp (outdated, resolved)
Force-pushed from 62022dd to fbf25d0
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp (outdated, resolved)
Force-pushed from fbf25d0 to 6aac0b0
Force-pushed from 6aac0b0 to 9acb727
- route owner and fallback history through one TensorMapShardImpl template with two fixed configurations
- derive fallback cleanup from real producer ring retirement instead of storing redundant lifecycle fields on entries
- add a tracked design note under tensormap_and_ringbuffer/docs describing the committed multi-tensormap rewrite
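The "one template, two fixed configurations" idea from the commit message could look roughly like the sketch below. Only the name TensorMapShardImpl appears in the source; the config structs, their members, and the shard aliases are illustrative assumptions.

```cpp
// Hypothetical compile-time configurations for the shared shard core.
struct OwnerConfig {
    static constexpr bool is_owner = true;    // shard tied to its producing ring
    static constexpr const char* name = "owner";
};

struct FallbackConfig {
    static constexpr bool is_owner = false;   // history shard; cleanup derived
    static constexpr const char* name = "fallback";  // from producer retirement
};

// One shard implementation, specialized purely by configuration, so owner
// and fallback history share the same core logic.
template <typename Config>
class TensorMapShardImpl {
public:
    const char* name() const { return Config::name; }
    bool is_owner_shard() const { return Config::is_owner; }
};

using OwnerShard = TensorMapShardImpl<OwnerConfig>;
using FallbackShard = TensorMapShardImpl<FallbackConfig>;
```

Fixing both configurations at compile time keeps the two shard roles from drifting apart while still letting their lifecycle behavior differ where the config says so.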
Force-pushed from 9acb727 to c4051da
/gemini review
Code Review
This pull request introduces a significant refactoring of the multi-tensormap architecture, moving towards a shared shard core implementation for both owner and fallback history. The changes streamline lifecycle management by deriving cleanup semantics from producer retirement and enforce a unified shard implementation. A new design note (MULTI_TENSORMAP_REWRITE.md) has been added to document these architectural changes and their rationale. Additionally, the Tensor structure has been updated to include a ring_id, and tensor factory functions now incorporate this new field, along with the introduction of a thread-local ring ID for scope management. The orchestration logic has been updated to reflect these new tensor and tensormap behaviors.

A high-severity concern has been raised regarding the use of static thread_local for __pto2_ring_id, which could lead to inconsistent ring_id assignments and incorrect tensor routing if not managed carefully with RAII guards.
```cpp
// Thread-local scope depth for tensor factory functions.
// Incremented/decremented by PTO2ScopeGuard and standalone scope wrappers.
// Tensor ring selection clamps this depth to the runtime's valid ring range.
static thread_local uint8_t __pto2_ring_id = 0;
```
The introduction of a static thread_local variable __pto2_ring_id for managing the current tensor ring ID ties the ring selection to the thread's scope depth. While thread_local mitigates global state issues across threads, it still represents global mutable state within a single thread. Incorrect nesting of PTO2_SCOPE blocks or direct manipulation of this variable outside of the intended RAII guards could lead to an inconsistent ring_id being assigned to tensors. This could result in tensors being incorrectly routed to owner or fallback shards, leading to subtle and hard-to-debug correctness issues in the tensor mapping logic.
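The RAII discipline the review calls for can be sketched as follows. __pto2_ring_id and PTO2ScopeGuard are named in the diff; the guard body, the clamping constant, and current_ring_id are assumptions about how the pieces fit together.

```cpp
#include <algorithm>
#include <cstdint>

// Thread-local scope depth, mutated only through the RAII guard below.
static thread_local uint8_t __pto2_ring_id = 0;

constexpr uint8_t kMaxRingId = 3;  // assumed valid ring range [0, kMaxRingId]

// Non-copyable guard: the depth is always restored on scope exit, even on
// early return or exception, which is what keeps nesting consistent.
class PTO2ScopeGuard {
public:
    PTO2ScopeGuard() { ++__pto2_ring_id; }
    ~PTO2ScopeGuard() { --__pto2_ring_id; }
    PTO2ScopeGuard(const PTO2ScopeGuard&) = delete;
    PTO2ScopeGuard& operator=(const PTO2ScopeGuard&) = delete;
};

// Ring selection clamps the scope depth into the runtime's valid ring range,
// so over-deep nesting degrades gracefully instead of producing a bad id.
uint8_t current_ring_id() {
    return std::min(__pto2_ring_id, kMaxRingId);
}
```

As long as every mutation goes through the guard, each thread's ring id mirrors its lexical scope depth; the failure mode the reviewer flags is exactly bypassing the guard and writing __pto2_ring_id directly.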