@JinwooHwang
Contributor

Problem

PulseSecurityWithSSLTest failed intermittently on GitHub Actions with authentication errors (BAD_CREDS) even though the credentials were correct. The tests consistently passed locally but failed in CI, where JMX/ManagementService initialization is slower.

Root cause: a race condition in which authentication attempts were made before the ManagementService and its MemberMXBean were fully initialized.

Solution

Added a wait using GeodeAwaitility.await() so that each test blocks until the ManagementService is ready before attempting to authenticate:

ManagementService service = ManagementService.getExistingManagementService(locator.getLocator().getCache());
await().untilAsserted(() -> assertThat(service.getMemberMXBean()).isNotNull());

Applied to both test methods:

  • loginWithIncorrectAndThenCorrectPassword
  • loginWithDeprecatedSSLOptions

This matches the pattern already used in PulseSecurityIntegrationTest.
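The await().untilAsserted(...) call re-runs its assertion until the assertion stops failing or a timeout expires. The plain-Java sketch below illustrates that idea only; the PollUntil class, its parameters and the simulated "initializer" thread are hypothetical, and this is not how GeodeAwaitility or Awaitility is actually implemented (the real library is far more configurable).

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical illustration of "retry an assertion until it passes" (the idea behind
 * await().untilAsserted(...)). Not GeodeAwaitility's or Awaitility's implementation.
 */
public final class PollUntil {

  private PollUntil() {}

  /**
   * Runs {@code assertion} repeatedly until it completes without an AssertionError,
   * sleeping {@code pollInterval} between attempts. Rethrows the last AssertionError
   * once {@code timeout} has elapsed.
   */
  public static void untilAsserted(Runnable assertion, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      try {
        assertion.run();
        return; // condition holds, e.g. the MemberMXBean is available
      } catch (AssertionError notYet) {
        if (System.nanoTime() - deadline >= 0) {
          throw notYet; // timed out: surface the most recent failure
        }
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while polling", e);
      }
    }
  }

  public static void main(String[] args) {
    // Stand-in for service.getMemberMXBean(): null until a background "initializer" sets it.
    AtomicReference<Object> memberBean = new AtomicReference<>();
    new Thread(() -> {
      try {
        Thread.sleep(200); // simulate slow JMX/ManagementService start-up
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
      memberBean.set(new Object());
    }).start();

    untilAsserted(() -> {
      if (memberBean.get() == null) {
        throw new AssertionError("MemberMXBean not ready yet");
      }
    }, Duration.ofSeconds(5), Duration.ofMillis(50));
    System.out.println("MemberMXBean ready; safe to attempt login");
  }
}
```

The key design point is that only AssertionError is treated as "not ready yet"; any other exception propagates immediately, so a genuine bug is not masked by polling.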

Testing

  • Code compiles; both tests pass locally (27.7s total, 0 failures)
  • No timing-dependent failures observed

Impact

  • Eliminates flaky test behavior in CI
  • Improves test reliability and developer experience
  • No functional changes to production code

For all changes, please confirm:

  • Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
  • Has your PR been rebased against the latest commit within the target branch (typically develop)?
  • Is your initial contribution a single, squashed commit?
  • Does gradlew build run cleanly?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

@JinwooHwang force-pushed the feature/GEODE-10556 branch 3 times, most recently from 67e0151 to 209deba (February 11, 2026 21:30)
Increase wait time to 10 seconds after verifying ManagementService and Pulse
web app readiness, to give the SSL/JMX authentication backend time to fully initialize.

The key issue is that this test starts the locator within each test method,
unlike PulseSecurityIntegrationTest, which uses withAutoStart() so the locator
starts before any tests run. SSL handshakes and JMX registration under a
security manager need additional time on slower CI runners.

Added debug logging to diagnose timing. The 10-second wait is deliberately
conservative for slow CI runners and can be reduced once we confirm it resolves the issue.
- Replace System.out with System.err for better capture in test reports
- Add @Before method to log test start with environment details
- Debug statements will appear in HTML reports and CI artifacts when tests fail
…d retry logic

Root cause: the HTTP client had no timeouts, so when the Pulse JMX authentication backend
wasn't ready, POST /pulse/login hung indefinitely instead of timing out.

Debug output showed:
- Wrong password login worked immediately (fast auth check)
- Correct password login hung forever (JMX connection establishment not ready)
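The hang described above is what an unbounded HTTP request looks like. As a minimal sketch of bounding both the connect phase and the wait for a response, the code below uses the JDK's own java.net.http client (Java 11+) as a stand-in; GeodeHttpClientRule's actual client and configuration may differ, the 30-second values are assumed to mirror this PR, and the "username"/"password" form field names are an assumption about the Pulse login form.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

/**
 * Hypothetical sketch using the JDK's java.net.http client as a stand-in;
 * GeodeHttpClientRule's actual client and configuration may differ.
 */
public final class BoundedLoginRequest {

  // Assumed values mirroring the 30-second timeouts described in this PR.
  public static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(30);
  public static final Duration RESPONSE_TIMEOUT = Duration.ofSeconds(30);

  private BoundedLoginRequest() {}

  /** Connecting fails with HttpConnectTimeoutException if it takes longer than CONNECT_TIMEOUT. */
  public static HttpClient newClient() {
    return HttpClient.newBuilder()
        .connectTimeout(CONNECT_TIMEOUT)
        .followRedirects(HttpClient.Redirect.NEVER) // let the caller inspect the login redirect
        .build();
  }

  /**
   * POST /pulse/login with form-encoded credentials. HttpClient.send() throws
   * HttpTimeoutException if no response arrives within RESPONSE_TIMEOUT, instead of hanging.
   * The "username"/"password" field names are an assumption about the Pulse login form.
   */
  public static HttpRequest loginRequest(URI baseUri, String user, String password) {
    String form = "username=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
        + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    return HttpRequest.newBuilder(baseUri.resolve("/pulse/login"))
        .timeout(RESPONSE_TIMEOUT)
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString(form))
        .build();
  }
}
```

The connect timeout and the per-request timeout are separate settings: the first only covers establishing the TCP connection, while the second bounds how long send() waits for the server to respond, which is the case that hung here.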

Changes:
1. Add 30-second connection and response timeouts to GeodeHttpClientRule
2. Replace fixed 10-second sleep with await() polling that retries login until backend is ready
3. Login attempts now time out after 30s and are retried (up to 5 minutes total)

This allows the test to handle varying JMX backend initialization times on CI.
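The combination of a bounded attempt and an overall retry deadline can be sketched in plain Java as follows. This is a hypothetical illustration, not the PR's code (which uses GeodeAwaitility.await()); RetryWithDeadline and its parameters are names introduced here, and each attempt is modeled as a CompletableFuture.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of "bounded attempt, retry until an overall deadline".
 * The PR itself implements this with GeodeAwaitility.await(); this is not that code.
 */
public final class RetryWithDeadline {

  private RetryWithDeadline() {}

  /**
   * Starts an attempt, waits at most perAttempt for it, and retries after a pause if it
   * timed out or failed. Gives up once totalBudget has been spent.
   */
  public static <T> T retry(Supplier<CompletableFuture<T>> attempt, Duration perAttempt,
      Duration totalBudget, Duration pause) {
    long deadline = System.nanoTime() + totalBudget.toNanos();
    Exception last = null;
    while (System.nanoTime() - deadline < 0) {
      CompletableFuture<T> future = attempt.get();
      try {
        return future.get(perAttempt.toMillis(), TimeUnit.MILLISECONDS);
      } catch (TimeoutException | ExecutionException e) {
        // Hung or failed attempt. cancel() marks the future cancelled but does not stop the
        // underlying work, which is why each real HTTP call also needs its own timeout.
        future.cancel(true);
        last = e;
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while retrying", e);
      }
      sleepQuietly(pause);
    }
    throw new IllegalStateException("no successful attempt within " + totalBudget, last);
  }

  private static void sleepQuietly(Duration pause) {
    try {
      Thread.sleep(pause.toMillis());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while pausing", e);
    }
  }
}
```

With this PR's numbers, a call might look like retry(loginAttempt, Duration.ofSeconds(30), Duration.ofMinutes(5), Duration.ofSeconds(1)), where loginAttempt is a hypothetical supplier that submits the login POST asynchronously (for example via HttpClient.sendAsync(...), which returns a CompletableFuture) and the one-second pause is an arbitrary choice.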