Fix hot file replication by handling null criteria in PSU match#8025
Conversation
The non-test changes introduced in 96c4db4 broke hot file replication because they replaced hardcoded protocol and network values with null. In PoolSelectionUnitV2.match, null criteria resulted in those dimensions not being represented in the units list, causing links with protocol or network unit group constraints to fail to match (due to the fitCount check and the requirement that all unit groups of a link must be satisfied). This patch introduces a wildcard unit mechanism. When protocol or network criteria are null, a virtual unit is added that belongs to all unit groups of that type. This allows links with such constraints to match while still ensuring that other dimensions (like storage units) must be satisfied. Wildcard units are cached and invalidated whenever the selection setup changes to ensure performance in hot code paths. A regression test has been added to PoolSelectionUnitV2Test to verify that null criteria correctly match constrained links. Co-authored-by: greenc-FNAL <2372949+greenc-FNAL@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
Pull request overview
This PR addresses a regression in hot file replication by updating PoolSelectionUnitV2.match to treat null protocol and network criteria as wildcards, so links with protocol/net requirements can still be matched. It also adds a regression test to verify READ matching succeeds when both protocol and net are unspecified.
Changes:
- Add cached “wildcard” protocol/net units when match criteria are
null, allowing link constraints to be satisfied for unspecified dimensions. - Invalidate the wildcard caches on PSU write-lock acquisition (setup changes).
- Add a regression test covering READ matching with
nullprotocol and net.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| modules/dcache/src/main/java/diskCacheV111/poolManager/PoolSelectionUnitV2.java | Adds wildcard unit logic and cached wildcard fields to allow null protocol/net to match link constraints. |
| modules/dcache/src/test/java/diskCacheV111/poolManager/PoolSelectionUnitV2Test.java | Adds regression coverage for READ matching with null protocol and net criteria. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Unit wildcard = _wildcardNetUnit; | ||
| if (wildcard == null) { | ||
| wildcard = wildcardUnit(NET); | ||
| _wildcardNetUnit = wildcard; | ||
| } |
There was a problem hiding this comment.
Same thread-safety concern as with _wildcardProtocolUnit: _wildcardNetUnit is initialized and assigned while holding the read-lock, which allows concurrent writers. Consider switching to a volatile field with CAS-style initialization (or performing initialization under the write-lock) to avoid races and ensure safe publication.
| Unit wildcard = _wildcardNetUnit; | |
| if (wildcard == null) { | |
| wildcard = wildcardUnit(NET); | |
| _wildcardNetUnit = wildcard; | |
| } | |
| Unit wildcard = wildcardUnit(NET); |
| return result; | ||
| } |
There was a problem hiding this comment.
With this change, match now supports netUnitName == null / protocolUnitName == null by adding wildcard units. If _cachingEnabeled is ever true, the earlier cache-key generation path still attempts to derive netUnitGroup via _netHandler.match(netUnitName) and assumes a non-null result; for null net (or missing default net units) that can throw before cacheKey is assigned. Please adjust the caching logic to handle null/unspecified net+protocol explicitly (use stable placeholders and avoid dereferencing a null net match) so enabling caching does not reintroduce failures for the new null-criteria behavior.
| Unit wildcard = _wildcardProtocolUnit; | ||
| if (wildcard == null) { | ||
| wildcard = wildcardUnit(PROTOCOL); | ||
| _wildcardProtocolUnit = wildcard; | ||
| } |
There was a problem hiding this comment.
_wildcardProtocolUnit is lazily initialized and written while the read-lock is held. Because multiple threads can hold the read-lock concurrently, this introduces unsynchronized writes to shared state and can lead to redundant initialization and unclear memory-visibility guarantees. Consider making the field volatile and using a compare-and-set style initialization (or initializing under the write-lock) to keep the cache thread-safe and avoid writes under the read path.
| Unit wildcard = _wildcardProtocolUnit; | |
| if (wildcard == null) { | |
| wildcard = wildcardUnit(PROTOCOL); | |
| _wildcardProtocolUnit = wildcard; | |
| } | |
| Unit wildcard = wildcardUnit(PROTOCOL); |
Commit 96c4db4 introduced a stricter link matching logic that failed when protocol or network criteria were null. Internal dCache operations like hot file replication often omit these criteria. This patch allows PoolSelectionUnitV2 to match links with protocol or network constraints even if the request does not provide these criteria. The matching engine now correctly identifies and satisfies these dimensions by default if they are missing from the request. Includes a regression test in PoolSelectionUnitV2Test. Co-authored-by: greenc-FNAL <2372949+greenc-FNAL@users.noreply.github.com>
Fixed hot file replication by correctly handling null protocol and network criteria in PoolSelectionUnitV2.match. Introduced cached wildcard units to satisfy link constraints for unspecified dimensions. Added regression test.
PR created automatically by Jules for task 5324923451457271853 started by @greenc-FNAL