[IR Container] Phase 2.6 Concurrency & Thread Safety#5971
[IR Container] Phase 2.6 Concurrency & Thread Safety#5971mdavis36 wants to merge 1 commit intomd/phase2-copy-movefrom
Conversation
|
!test |
Description
|
| Relevant files | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Enhancement |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 No relevant tests |
| ⚡ Recommended focus areas for review |
Lock ordering consistency
|
Test failures
-
(Low, 1)
Tensor numerical mismatches in nvFuser HopperMatmulTest suiteTest Name H100 Source HopperMatmulTest.PingPongPersistent ❌ Link
Add std::shared_mutex to IrContainer to protect shared mutable state during concurrent access from parallel compilation threads. - IrContainer public methods self-lock (shared_lock for reads, unique_lock for writes) - Fusion mutation methods (registerVal/Expr, removeVal/Expr, removeStatementsCreatedAfter) acquire unique_lock then delegate to lock-free ContainerMutator static methods, avoiding self-deadlock on nested calls (e.g., removeVal → removeExpr) - ContainerMutator is a PIMPL struct defined only in fusion.cpp, keeping lock-free impl methods out of the header - Remove kPhase2DisableParallelCompile guard, re-enabling parallel compilation now that the mutex is in place - Delete dead IrContainer::copy() and IrContainer::swap() methods
2cac45c to
8fb976b
Compare
192fd55 to
35b7405
Compare
|
!test |
Greptile SummaryThis PR adds
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph "Layer 1: Public API (self-locking)"
A["Fusion::registerVal()"] -->|"unique_lock(mutex_)"| CM_RV["ContainerMutator::registerVal()"]
B["Fusion::registerExpr()"] -->|"unique_lock(mutex_)"| CM_RE["ContainerMutator::registerExpr()"]
C["Fusion::removeVal()"] -->|"unique_lock(mutex_)"| CM_RMV["ContainerMutator::removeVal()"]
D["Fusion::removeExpr()"] -->|"unique_lock(mutex_)"| CM_RME["ContainerMutator::removeExpr()"]
E["Fusion::removeStatementsCreatedAfter()"] -->|"unique_lock(mutex_)"| CM_RSA["ContainerMutator::removeStatementsCreatedAfter()"]
end
subgraph "Layer 2: Lock-free internals (ContainerMutator)"
CM_RE -->|"nested call (no lock)"| CM_RME
CM_RMV -->|"nested call (no lock)"| CM_RME
CM_RME -->|"direct access"| IC["IrContainer fields: vals_up_, vals_, exprs_up_, exprs_, per_fusion_*"]
CM_RV -->|"direct access"| IC
CM_RSA -->|"direct access"| IC
end
subgraph "Read Path (shared_lock)"
R1["IrContainer::vals()"] -->|"shared_lock(mutex_)"| IC
R2["IrContainer::unordered_exprs()"] -->|"shared_lock(mutex_)"| IC
R3["IrContainer::inContainer()"] -->|"shared_lock(mutex_)"| IMPL["inContainerImpl()"]
IMPL --> IC
end
Last reviewed commit: 8fb976b |
| const std::unordered_set<Expr*>& IrContainer::unordered_exprs() const noexcept { | ||
| std::shared_lock lock(mutex_); | ||
| return exprs_; | ||
| } | ||
|
|
||
| const std::unordered_set<Val*>& IrContainer::vals() const noexcept { | ||
| std::shared_lock lock(mutex_); | ||
| return vals_; | ||
| } |
There was a problem hiding this comment.
Reference outlives lock scope
unordered_exprs(), vals(), sharingFusions(), valsOwnedBy(), and exprsOwnedBy() all return const& to internal data, but the shared_lock is released when the function returns. This means callers hold an unprotected reference to the data structure, and any concurrent writer (under unique_lock) could mutate or invalidate it.
This is safe in Phase 2 (containers are not shared across threads), but in Phase 3 where multiple threads write to the same IrContainer, this pattern will produce data races. Consider documenting this as a known limitation to address in Phase 3, or returning by value for these accessors (at the cost of a copy).
Additional Comments (1)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! |
|
!test |
Summary
Add
std::shared_mutexto IrContainer for concurrent read access during parallel compilation, remove thekPhase2DisableParallelCompileserialization guard introduced in PR 1, and validate that the full test suite passes with parallel compilation re-enabled.This is a future-proofing and defensive correctness change. Phase 2's
makeFusionpath does NOT share containers (each segment gets its own container via the default constructor), so parallel compilation is technically safe without the mutex. However, Phase 3 will changemakeFusionto use the copy constructor (shared container), at which point multiple threads will write to the sameIrContainerconcurrently. The mutex must be in place before Phase 3 can enable that.The Nested Call Problem and ContainerMutator
Five Fusion methods directly access IrContainer's internal fields because statement registration was moved from IrContainer to Fusion previously:
removeVal()callsremoveExpr(), andregisterExpr()also callsremoveExpr(). Sincestd::shared_mutexis not recursive, acquiringunique_lockin both the outer and inner methods would deadlock.The solution is a two-layer locking architecture:
ContainerMutatoris forward-declared infusion.h(2 lines) and fully defined infusion.cpp. This keeps the header clean and makes the locking architecture self-documenting: everything insideContainerMutatorassumes the lock is already held.Thread Safety Analysis
Dead Code Removal
Investigation revealed that
IrContainer::copy()andIrContainer::swap()have zero call sites — all copy/move/swap semantics are handled at the Fusion level after previous work. Removing them eliminates ~45 lines of dead code and avoids complex dual-locking patterns.Relationship to Phase 2
This PR completes the Phase 2 architectural work. With thread safety in place, the full shared scalar infrastructure is ready for Phase 3:
CI Risk
Low-medium. This is the first CI run with parallel compilation re-enabled since PR #5961 serialized it. Any latent concurrency issues would surface here. The parallel compilation path doesn't share containers in Phase 2, so the mutex is defensive — but re-enabling parallelism exercises the full concurrent codegen pipeline.