Skip to content

Fix: eliminate Windows PID race condition by replacing concorekill.bat overwrite mechanism with safe PID registry (#391)#465

Open
GaneshPatil7517 wants to merge 2 commits intoControlCore-Project:devfrom
GaneshPatil7517:fix/windows-concorekill-race-condition
Open

Fix: eliminate Windows PID race condition by replacing concorekill.bat overwrite mechanism with safe PID registry (#391)#465
GaneshPatil7517 wants to merge 2 commits intoControlCore-Project:devfrom
GaneshPatil7517:fix/windows-concorekill-race-condition

Conversation

@GaneshPatil7517
Copy link

Problem

On Windows, concore.py wrote a single PID to concorekill.bat at module import time
using write mode ("w"). When multiple Python nodes launched simultaneously via
makestudy, each node overwrote the file — only the last node's PID survived.

This caused:

  • Race condition: N-1 of N nodes became unkillable via concorekill.bat
  • Stale PID danger: after a crash, the leftover file could contain a PID reassigned
    to an unrelated process, risking taskkill terminating the wrong program
  • No cleanup: PID file persisted after node exit

Solution

Replaced the single-PID overwrite with an append-based PID registry system:

Before After
concorekill.bat with one hardcoded taskkill line concorekill_pids.txt registry (one PID per line, append mode)
Last writer wins All nodes coexist safely
No exit cleanup atexit handler removes current PID on shutdown
Blind taskkill Generated concorekill.bat validates each PID via tasklist before killing

Key changes in concore.py:

  • _register_pid() — appends current PID to concorekill_pids.txt
  • _cleanup_pid() — removes current PID on exit; deletes both files when last node exits
  • _write_kill_script() — generates concorekill.bat that checks each PID is a running Python process before issuing taskkill

Backward compatibility:

Users still run concorekill.bat exactly as before — no workflow change required.
Only Python layer modified; C++, MATLAB, and Verilog implementations are untouched.

Tests

Added 9 tests in TestPidRegistry:

  • Registration creates file and appends (not overwrites)
  • Cleanup removes only current PID, preserves others
  • Last PID cleanup deletes both registry and kill script
  • Missing registry handled gracefully (no crash)
  • Kill script contains validation logic (tasklist + taskkill)
  • Multi-node simulation (3+ PIDs coexist)
  • Windows import-time registration

All 9 new tests pass. All pre-existing tests unaffected.

Fixes

Closes #391

…try (ControlCore-Project#391)

Problem:
On Windows, concore.py wrote a single PID to concorekill.bat at import
time using 'w' mode. When multiple Python nodes started simultaneously
(via makestudy), each overwrote the file. Only the last node's PID
survived, making the kill script unable to terminate other nodes.
Stale PIDs from crashed studies could also kill unrelated processes.

Solution:
- Replace single-PID overwrite with append-based PID registry
  (concorekill_pids.txt) storing one PID per line
- Register atexit handler to remove current PID on graceful shutdown
- Generate concorekill.bat dynamically with PID validation:
  each PID is checked via tasklist before taskkill executes
- Clean up both files when last node exits
- Backward compatible: users still run concorekill.bat

Added 9 tests covering registration, cleanup, multi-node scenarios,
missing registry handling, and kill script generation.

Fixes ControlCore-Project#391
…nature

read() returns a single value, not a tuple. The test incorrectly
unpacked into (result, ok) causing ValueError on CI.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant