Skip to content

<fix>[kvm]: defer skip list cleanup on nodeLeft to prevent VM split-brain#3330

Open
ZStack-Robot wants to merge 2 commits into5.5.6from
sync/ye.zou/fix/ZSTAC-80821
Open

<fix>[kvm]: defer skip list cleanup on nodeLeft to prevent VM split-brain#3330
ZStack-Robot wants to merge 2 commits into5.5.6from
sync/ye.zou/fix/ZSTAC-80821

Conversation

@ZStack-Robot
Copy link
Collaborator

Summary

  • ZSTAC-80821: MN 重启时 nodeLeft 直接清空 vmsToSkip,正在启动的 VM 被误判为 Stopped → HA 重复拉起 → 脑裂
  • 改为 orphan 延迟清理机制:nodeLeft 移入 orphanedVmsToSkip (带时间戳),60s 周期清理超时条目(默认10分钟)
  • VmTracer 增加二次确认:orphaned + Starting/Migrating 状态的 VM 跳过判定

Files Changed

  • KvmVmSyncPingTask.java — orphan 集合 + 定时清理
  • VmTracer.java — shouldSkipMissingVmHandling hook
  • KVMGlobalConfig.java — orphanedVmSkipTimeout 配置项

Resolves: ZSTAC-80821

sync from gitlab !9152

@coderabbitai
Copy link

coderabbitai bot commented Feb 12, 2026

Walkthrough

在 KvmVmSyncPingTask 中新增对“孤立虚拟机跳过”映射及 10 分钟 TTL 支持:在同步周期清理过期条目、在节点离开时将 VM 条目标记为孤立并带时间戳、并在追踪/同步逻辑中考虑该映射以决定跳过或恢复追踪(含日志)。

Changes

Cohort / File(s) Summary
孤立虚拟机追踪管理
plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java
新增 orphanedSkipVms 并发映射与 ORPHAN_TTL_MS(10 分钟);添加 cleanupExpiredOrphanedSkipVms();在同步开始清理过期条目;在同步与追踪续接流程中将孤立映射条目并入跳过集合;nodeLeft() 将 per-node 跳过条目移入孤立映射并记录时间戳与日志;isVmDoNotNeedToTrace() 扩展以检查孤立 TTL 并在过期时修剪与记录恢复追踪。

Sequence Diagram(s)

(基于此次变更范围为单文件内的同步/跳过逻辑调整,且交互并不跨越 3+ 明确独立组件,故略过序列图。)

代码审查工作量评估

🎯 2 (Simple) | ⏱️ ~10 分钟

诗篇

🐰✨ 孤儿虚机暂且眠,
十分钟候再看天,
时戳记下不匆忙,
同步轻步去前缘,
日志低语话安然。


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)
Check name Status Explanation Resolution
Title check ❌ Error 标题格式正确遵循 [scope]: ,但长度为73字符超过了72字符的限制。 将标题缩短至72字符以内,例如改为 '[kvm]: defer skip cleanup on nodeLeft to prevent split-brain'
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed 描述清晰详细,涉及特定问题(ZSTAC-80821)、解决方案(orphan延迟清理机制)和受影响的文件,与变更集高度相关。
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch sync/ye.zou/fix/ZSTAC-80821

No actionable comments were generated in the recent review. 🎉


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java`:
- Around line 524-526: The protected method isVmOrphaned in KvmVmSyncPingTask is
unused and duplicates logic already present via
orphanedVmsToSkip.containsKey(vmUuid) in isVmDoNotNeedToTrace and
shouldSkipMissingVmHandling; remove the isVmOrphaned method to eliminate
redundancy, or if you intend to keep it for future/semantic reasons, add a clear
comment above isVmOrphaned explaining why it is retained (including planned
usage or TODO ticket reference) and mark it `@SuppressWarnings`("unused") if
desired so linters know it is intentional.
🧹 Nitpick comments (2)
plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java (2)

319-352: 周期任务资源管理问题

startOrphanedVmCleanupTask() 提交了一个周期任务,但在 stop() 方法(第 415-418 行)中没有取消该任务。这可能导致组件停止后任务仍在运行。

建议保存 PeriodicTask 的引用并在 stop() 中取消:

♻️ 建议的修改

在类中添加字段保存任务引用:

private Future<?> orphanedVmCleanupTaskFuture;

修改 startOrphanedVmCleanupTask 保存返回值:

 private void startOrphanedVmCleanupTask() {
-    thdf.submitPeriodicTask(new PeriodicTask() {
+    orphanedVmCleanupTaskFuture = thdf.submitPeriodicTask(new PeriodicTask() {
         // ... existing code
     });
 }

修改 stop() 方法取消任务:

 `@Override`
 public boolean stop() {
+    if (orphanedVmCleanupTaskFuture != null) {
+        orphanedVmCleanupTaskFuture.cancel(false);
+    }
     return true;
 }

328-328: 建议提取魔法数字为常量

周期任务的间隔 60 秒直接硬编码在代码中。建议提取为类常量或考虑使其可配置,以便于维护和调整。

♻️ 建议的修改
+private static final long ORPHANED_VM_CLEANUP_INTERVAL_SECONDS = 60;
+
 private void startOrphanedVmCleanupTask() {
     thdf.submitPeriodicTask(new PeriodicTask() {
         // ...
         `@Override`
         public long getInterval() {
-            return 60; // Check every 60 seconds
+            return ORPHANED_VM_CLEANUP_INTERVAL_SECONDS;
         }
         // ...
     });
 }

Comment on lines 524 to 526
protected boolean isVmOrphaned(String vmUuid) {
return orphanedVmsToSkip.containsKey(vmUuid);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "isVmOrphaned" --type java

Repository: MatheMatrix/zstack

Length of output: 180


🏁 Script executed:

rg "extends KvmVmSyncPingTask" --type java

Repository: MatheMatrix/zstack

Length of output: 44


🏁 Script executed:

sed -n '510,535p' plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java

Repository: MatheMatrix/zstack

Length of output: 1028


移除未使用的 isVmOrphaned 方法

搜索结果确认该 protected 方法在整个代码库中无任何调用,且不存在 KvmVmSyncPingTask 的子类使用该方法。此外,相同逻辑已在 isVmDoNotNeedToTraceshouldSkipMissingVmHandling 方法中直接调用 orphanedVmsToSkip.containsKey(vmUuid),存在冗余。请移除该未使用的方法,或添加注释说明其保留的具体原因(如未来功能规划)。

🤖 Prompt for AI Agents
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java` around lines
524 - 526, The protected method isVmOrphaned in KvmVmSyncPingTask is unused and
duplicates logic already present via orphanedVmsToSkip.containsKey(vmUuid) in
isVmDoNotNeedToTrace and shouldSkipMissingVmHandling; remove the isVmOrphaned
method to eliminate redundancy, or if you intend to keep it for future/semantic
reasons, add a clear comment above isVmOrphaned explaining why it is retained
(including planned usage or TODO ticket reference) and mark it
`@SuppressWarnings`("unused") if desired so linters know it is intentional.

…plit-brain

When a management node departs, its VM skip-trace entries were
immediately removed. If VMs were still being started by kvmagent,
the next VM sync would falsely detect them as Stopped and trigger
HA, causing split-brain.

Fix: transfer departed MN skip-trace entries to an orphaned set with
10-minute TTL instead of immediate deletion. VMs in the orphaned set
remain skip-traced until the TTL expires or they are explicitly
continued, preventing false HA triggers during MN restart scenarios.

Resolves: ZSTAC-80821

Change-Id: I3222e260b2d7b33dc43aba0431ce59a788566b34
@MatheMatrix MatheMatrix force-pushed the sync/ye.zou/fix/ZSTAC-80821 branch from b3f652d to 9e241fe Compare February 12, 2026 05:29
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java`:
- Around line 493-503: This has a TOCTOU race: after reading orphanedAt from
orphanedSkipVms you unconditionally call orphanedSkipVms.remove(vmUuid), which
can delete a newly inserted timestamp from another thread (e.g., nodeLeft). Fix
by performing an atomic conditional remove using orphanedSkipVms.remove(vmUuid,
orphanedAt) (or re-check the current value and only remove if it equals the
previously read orphanedAt) when ORPHAN_TTL_MS has expired; keep the existing
logging and ensure you reference the same orphanedAt and vmUuid values for the
CAS removal.
- Around line 72-76: The hardcoded ORPHAN_TTL_MS in KvmVmSyncPingTask reduces
operational flexibility; replace it with a KVMGlobalConfig entry (e.g.,
kvm.orphanedVmSkipTimeout) and read that value at startup (and/or subscribe to
updates) instead of using the fixed constant. Update KvmVmSyncPingTask to
retrieve the timeout from the new config key and use that value when managing
orphanedSkipVms/expiry checks (keep a sensible default of 10*60*1000 ms if the
config is missing), and ensure the code references ORPHAN_TTL_MS is replaced
with the config-backed field or accessor so the TTL becomes configurable at
runtime.
🧹 Nitpick comments (1)
plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java (1)

517-523: 建议:使用 CAS 移除以保持一致性

isVmDoNotNeedToTrace 类似,迭代器移除不保证原子性。虽然影响较小,但为保持一致性,建议使用 remove(key, expectedValue)

♻️ 可选改进
         while (it.hasNext()) {
             Map.Entry<String, Long> entry = it.next();
             if (now - entry.getValue() >= ORPHAN_TTL_MS) {
-                it.remove();
+                orphanedSkipVms.remove(entry.getKey(), entry.getValue());
                 logger.info(String.format("cleaned up expired orphaned skip entry for VM[uuid:%s]", entry.getKey()));
             }
         }

Comment on lines +72 to +76
// Orphaned skip entries from departed MN nodes. Key=vmUuid, Value=timestamp when orphaned.
// These VMs remain in skip-trace state for ORPHAN_TTL_MS to avoid false HA triggers
// when a MN restarts and its in-flight VM operations haven't completed yet. See ZSTAC-80821.
private final ConcurrentHashMap<String, Long> orphanedSkipVms = new ConcurrentHashMap<>();
private static final long ORPHAN_TTL_MS = 10 * 60 * 1000; // 10 minutes
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "KVMGlobalConfig.java" -type f

Repository: MatheMatrix/zstack

Length of output: 124


🏁 Script executed:

rg -n "orphanedVmSkipTimeout|orphanedSkipTimeout|ORPHAN.*TIMEOUT" --type java -A 2 -B 2

Repository: MatheMatrix/zstack

Length of output: 552


🏁 Script executed:

rg -n "class KVMGlobalConfig" --type java -A 50

Repository: MatheMatrix/zstack

Length of output: 6922


🏁 Script executed:

cat ./plugin/kvm/src/main/java/org/zstack/kvm/KVMGlobalConfig.java

Repository: MatheMatrix/zstack

Length of output: 9331


🏁 Script executed:

rg -n "ORPHAN_TTL_MS|orphanedVmSkipTimeout|orphanedSkipTimeout" --type java -B 3 -A 3

Repository: MatheMatrix/zstack

Length of output: 4660


建议将硬编码的超时常量改为 KVMGlobalConfig 配置项

当前使用硬编码常量 ORPHAN_TTL_MS = 10 * 60 * 1000 限制了运维灵活性。建议在 KVMGlobalConfig 中添加可配置的超时参数(如 kvm.orphanedVmSkipTimeout),使不同环境可以根据实际需求调整孤立VM的保留时间。

🤖 Prompt for AI Agents
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java` around lines
72 - 76, The hardcoded ORPHAN_TTL_MS in KvmVmSyncPingTask reduces operational
flexibility; replace it with a KVMGlobalConfig entry (e.g.,
kvm.orphanedVmSkipTimeout) and read that value at startup (and/or subscribe to
updates) instead of using the fixed constant. Update KvmVmSyncPingTask to
retrieve the timeout from the new config key and use that value when managing
orphanedSkipVms/expiry checks (keep a sensible default of 10*60*1000 ms if the
config is missing), and ensure the code references ORPHAN_TTL_MS is replaced
with the config-backed field or accessor so the TTL becomes configurable at
runtime.

…anup

Resolves: ZSTAC-80821

Change-Id: I59284c4e69f5d2ee357b1836b7c243200e30949a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants