<fix>[core]: synchronize consistent hash ring to prevent dual-MN race condition#3332
Open
MatheMatrix wants to merge 1 commit into 5.5.6 from
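Walkthrough
Access to the management node hash ring and node information is now protected by synchronization. In ResourceDestinationMakerImpl, several methods are changed to synchronized. In the portal module, ManagementNodeManagerImpl adds lifecycleLock and suspectedMissingFromDb, and adjusts how heartbeat and lifecycle events are reconciled and how a missing node is confirmed.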
Sequence Diagram(s)
sequenceDiagram
participant HeartbeatReconciler as HeartbeatReconciler
participant Manager as ManagementNodeManagerImpl
participant DB as Database
participant HashRing as NodeHashRing
HeartbeatReconciler->>Manager: start reconciliation()
alt acquire lifecycleLock
Manager->>Manager: synchronized(lifecycleLock)
Manager->>HashRing: list nodes in hash ring
loop for each node
Manager->>DB: query node by uuid
alt DB has node
DB-->>Manager: node exists
Manager->>HashRing: ensure node present
Manager->>Manager: suspectedMissingFromDb.remove(node)
else DB missing node
DB-->>Manager: not found
Manager->>Manager: mark node in suspectedMissingFromDb (first round)
alt second consecutive round (still missing)
Manager->>HashRing: remove node from hash ring
Manager->>Manager: suspectedMissingFromDb.remove(node)
end
end
end
end
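The two-round confirmation in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `TwoRoundReconciler`, its methods, and the `nodesInDb` parameter (a stand-in for the database query) are hypothetical. Only the names `lifecycleLock` and `suspectedMissingFromDb` come from the walkthrough.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of two-round delayed removal; names are illustrative.
class TwoRoundReconciler {
    private final Object lifecycleLock = new Object();
    private final Set<String> hashRing = new TreeSet<>();
    private final Set<String> suspectedMissingFromDb = new HashSet<>();

    // Stand-in for a NodeJoin lifecycle callback.
    void join(String uuid) {
        synchronized (lifecycleLock) {
            hashRing.add(uuid);
            suspectedMissingFromDb.remove(uuid);
        }
    }

    // One heartbeat round. nodesInDb stands in for the per-node DB lookup.
    void reconcile(Set<String> nodesInDb) {
        synchronized (lifecycleLock) {
            // Iterate over a copy so the ring can be modified inside the loop.
            for (String uuid : new TreeSet<>(hashRing)) {
                if (nodesInDb.contains(uuid)) {
                    // Node is in the DB again: clear any earlier suspicion.
                    suspectedMissingFromDb.remove(uuid);
                } else if (!suspectedMissingFromDb.add(uuid)) {
                    // add() returned false: the node was already suspected,
                    // so this is the second consecutive miss. Remove it.
                    hashRing.remove(uuid);
                    suspectedMissingFromDb.remove(uuid);
                }
                // Otherwise this is the first miss: only marked as suspected.
            }
        }
    }

    boolean inRing(String uuid) {
        synchronized (lifecycleLock) {
            return hashRing.contains(uuid);
        }
    }
}
```

In this sketch a node that is missing from the database for a single round stays in the ring, which leaves time for a concurrent join event to land before any removal happens.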
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
Important
Pre-merge checks failed
Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 2 warnings)
✅ Passed checks (1 passed)
…talling

In dual management node scenarios, concurrent modifications to the consistent hash ring from heartbeat reconciliation and canonical event callbacks can cause NodeHash/Nodes inconsistency, leading to message routing failures and task timeouts.

Fix:
(1) synchronized all ResourceDestinationMakerImpl methods to ensure atomic nodeHash+nodes updates,
(2) added lifecycleLock in ManagementNodeManagerImpl to serialize heartbeat reconciliation with event callbacks,
(3) added two-round delayed confirmation before removing nodes from hash ring to avoid race with NodeJoin events.

Resolves: ZSTAC-77711
Change-Id: I3d33d53595dd302784dff17417a5b25f2d0f3426
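As an illustration of point (1), the sketch below shows why the ring and the node map are updated under one lock. It is a simplified stand-in, not the actual ResourceDestinationMakerImpl: the class `SyncedNodeRing`, its method names, and the use of `String.hashCode()` as the ring position are assumptions made for the example. A real ring would normally use a stronger hash and virtual nodes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: every method is synchronized on the same monitor,
// so nodeHash and nodes always change together.
class SyncedNodeRing {
    // Ring position -> node uuid. hashCode() is a simplification and
    // ignores collisions between different uuids.
    private final TreeMap<Integer, String> nodeHash = new TreeMap<>();
    // Node uuid -> host name (stand-in for a NodeInfo object).
    private final Map<String, String> nodes = new HashMap<>();

    synchronized void nodeJoin(String uuid, String hostName) {
        nodeHash.put(uuid.hashCode(), uuid);
        nodes.put(uuid, hostName);
    }

    synchronized void nodeLeft(String uuid) {
        nodeHash.remove(uuid.hashCode());
        nodes.remove(uuid);
    }

    // Pick the first node at or after the resource's ring position,
    // wrapping around to the start of the ring if necessary.
    synchronized String destinationFor(String resourceUuid) {
        if (nodeHash.isEmpty()) {
            throw new IllegalStateException("no management node in ring");
        }
        Map.Entry<Integer, String> e = nodeHash.ceilingEntry(resourceUuid.hashCode());
        if (e == null) {
            e = nodeHash.firstEntry();
        }
        return e.getValue();
    }

    // True when both structures describe the same set of nodes.
    synchronized boolean isConsistent() {
        return nodes.keySet().equals(new HashSet<>(nodeHash.values()));
    }
}
```

Without the shared monitor, a reader could observe a uuid in `nodeHash` that is not yet (or no longer) in `nodes`, which is the kind of NodeHash/Nodes mismatch the commit message describes.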
e8732a5 to 312bd83
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/org/zstack/core/cloudbus/ResourceDestinationMakerImpl.java (1)
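80-93: ⚠️ Potential issue | 🔴 Critical
Fix the null return in getNodeInfo caused by the return value of put.
Map.put returns the previous value, so the current code sets info to null and returns it directly, which is a functional bug. Create the NodeInfo first, then put it into the map, and return the new object.
🔧 Suggested fix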
 if (info == null) {
     ManagementNodeVO vo = dbf.findByUuid(nodeUuid, ManagementNodeVO.class);
     if (vo == null) {
         throw new ManagementNodeNotFoundException(nodeUuid);
     }
-    nodeHash.add(nodeUuid);
-    info = nodes.put(nodeUuid, new NodeInfo(vo));
+    info = new NodeInfo(vo);
+    nodeHash.add(nodeUuid);
+    nodes.put(nodeUuid, info);
 }
 return info;
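The behaviour behind this review comment can be reproduced in isolation. The sketch below uses a plain `Map<String, String>` in place of the real node map; the class `PutReturnValueDemo` and both method names are hypothetical and exist only to contrast the two patterns.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of the Map.put return-value pitfall.
class PutReturnValueDemo {
    // Mirrors the buggy pattern: put() returns the PREVIOUS mapping,
    // which is null when the key was absent.
    static String buggyLookup(Map<String, String> nodes, String uuid) {
        String info = nodes.get(uuid);
        if (info == null) {
            info = nodes.put(uuid, "info-" + uuid);
        }
        return info;
    }

    // Mirrors the suggested fix: build the value first, then store it.
    static String fixedLookup(Map<String, String> nodes, String uuid) {
        String info = nodes.get(uuid);
        if (info == null) {
            info = "info-" + uuid;
            nodes.put(uuid, info);
        }
        return info;
    }
}
```

With an empty map, the first `buggyLookup` call returns null even though the entry is stored, and only a second call returns the value. `fixedLookup` returns the new value on the first call.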
Summary
Files Changed
ResourceDestinationMakerImpl.java: synchronized lock on all methods
Resolves: ZSTAC-77711
sync from gitlab !9154