Skip to content

Spock 382: Add per-subscription conflict statistics (PG18+)#356

Open
danolivo wants to merge 4 commits intomainfrom
spock-382
Open

Spock 382: Add per-subscription conflict statistics (PG18+)#356
danolivo wants to merge 4 commits intomainfrom
spock-382

Conversation

@danolivo
Copy link
Contributor

@danolivo danolivo commented Feb 20, 2026

Summary

Add per-subscription conflict statistics tracked via PG18 custom pgstat
infrastructure, fix flaky replication_set tests, and apply review feedback.

  • Per-subscription conflict statistics (PG18+): Register a custom pgstat
    kind (SPOCK_PGSTAT_KIND_LRCONFLICTS) via pgstat_register_kind() to count
    replication conflicts per subscription keyed by MyDatabaseId. Introduce
    Spock_Stat_StatSubEntry / Spock_Stat_PendingSubEntry structs sized to
    six conflict types and expose results through
    spock.get_subscription_stats(oid) and spock.reset_subscription_stats(oid)
    SQL functions. Currently only SPOCK_CT_UPDATE_MISSING is actively
    reported; full spock_report_conflict() integration is left as a TODO.
    On PG < 18 both SQL functions raise a feature-not-supported error.

  • Deterministic replication_set test output: Truncate exception_log on
    first subscriber visit, drop the non-deterministic command_counter column
    from queries, and order by (table_schema, table_name, remote_commit_ts)
    with COLLATE "C" for stable output across runs.

  • Review fixes (C-rabbit): Guard spock_conflict_stat.h include behind
    PG_VERSION_NUM >= 180000, bump custom pgstat kind ID to 28, replace
    Assert with ereport(ERROR) for unexpected conflict types, add refcount
    sanity check in spock_stat_create_subscription(), provide stub functions
    for PG < 18, add CI step to diagnose Docker cluster startup failures, and
    tighten a comment in the UPDATE_MISSING reporting path.

  • Review fixes (Ibrar): Add get_subscription_stats and
    reset_subscription_stats to the 5.0.4--6.0.0-devel upgrade script.

@danolivo danolivo self-assigned this Feb 20, 2026
@danolivo danolivo added the feature New feature label Feb 20, 2026
@danolivo danolivo requested a review from mason-sharp February 20, 2026 09:45
@coderabbitai
Copy link

coderabbitai bot commented Feb 20, 2026

📝 Walkthrough

Walkthrough

Adds PostgreSQL-18+-gated subscription conflict statistics: new public header and C module, SQL functions to fetch/reset stats, Makefile test gating and a regression test, lifecycle hooks in apply/create/drop paths, enum/macro adjustment, replication test output tweaks, and CI diagnostics. All stat behavior is enabled only when PostgreSQL version >= 18.

Changes

Cohort / File(s) Summary
Build & Test
contrib/spock/Makefile
Introduce REGRESS_PG18 (set to conflict_stat when PGVER >= 18) and insert into REGRESS after excluded_schema.
New Regression Test
tests/regress/sql/conflict_stat.sql
Add PG18+ regression exercising UPDATE_MISSING conflict reporting, subscription stats checks, resets, exception_log inspection, and cleanup.
Headers: conflict types & stats
include/spock_conflict.h, include/spock_conflict_stat.h
Make SPOCK_CT_INSERT_EXISTS = 0, add SPOCK_CONFLICT_NUM_TYPES; add PG18+-guarded public structs and API prototypes for conflict stats.
SQL interface
sql/spock--6.0.0-devel.sql, sql/spock--5.0.4--6.0.0-devel.sql
Add SQL-callable functions spock.get_subscription_stats(...) and spock.reset_subscription_stats(...) backed by new C implementations.
New implementation module
src/spock_conflict_stat.c
Add PG18+-guarded module implementing pgstat kind, shared/pending storage, registration, reporting/create/drop/fetch APIs, reset timestamp handling, and SQL wrappers.
Integration & hooks
src/spock.c, src/spock_apply_heap.c, src/spock_conflict.c, src/spock_functions.c
Conditionally include stat header; register stat kind at init; report UPDATE_MISSING during apply; call create/drop stat hooks on subscription create/drop — all gated by PG_VERSION_NUM >= 180000.
Replication tests adjustments
tests/regress/sql/replication_set.sql
Truncate spock.exception_log during subscriber setup, remove command_counter from outputs, and order exception_log deterministically using COLLATE "C".
CI diagnostics
.github/workflows/installcheck.yml
Add start_cluster step with continue-on-error: true and a conditional diagnostic step to gather container and PostgreSQL logs when cluster startup fails.

Poem

🐇 I hop through headers, counters in tow,
I count each conflict where the replicas go,
Shared memory hums a timestamped tune,
Subscriptions tally morning to moon,
A rabbit logs stats beneath the lune.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly summarizes the main change: adding per-subscription conflict statistics functionality for PostgreSQL 18+.
Description check ✅ Passed The PR description is clearly related to the changeset, detailing per-subscription conflict statistics for PG18+, test determinism fixes, and review feedback implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch spock-382

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/spock_apply_heap.c (1)

65-68: Add version guard for consistency with other files.
Three other files guard this include with #if PG_VERSION_NUM >= 180000, while src/spock_apply_heap.c does not. Since the header is internally guarded, this is not a build safety issue, but aligning the include guard here would maintain consistency.

🔧 Suggested guard
-#include "spock_conflict_stat.h"
+#if PG_VERSION_NUM >= 180000
+#include "spock_conflict_stat.h"
+#endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply_heap.c` around lines 65 - 68, Wrap the inclusion of the
Postgres-18-only header in spock_apply_heap.c with the same version guard used
elsewhere: add an `#if` PG_VERSION_NUM >= 180000 ... `#endif` around the `#include`
"spock_conflict_stat.h" (and any other includes that are PG‑18+ in this file) so
the file matches the pattern used in the other sources; keep the guard
conditional on PG_VERSION_NUM and leave other includes (e.g., spock_common.h,
spock_conflict.h, spock_executor.h) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_conflict_stat.c`:
- Around line 92-104: The function spock_stat_report_subscription_conflict
currently only uses Assert to validate the SpockConflictType `type`, which can
lead to an out-of-bounds write to `pending->conflict_count[type]` in non-debug
builds; add a runtime guard that checks `if (type < 0 || type >=
SPOCK_CONFLICT_NUM_TYPES)` and handle it safely (e.g., log a warning via elog or
process logger and return early or map unsupported types like
SPOCK_CT_DELETE_LATE to a safe bucket) before performing
`pending->conflict_count[type]++` to prevent buffer overflow.

---

Nitpick comments:
In `@src/spock_apply_heap.c`:
- Around line 65-68: Wrap the inclusion of the Postgres-18-only header in
spock_apply_heap.c with the same version guard used elsewhere: add an `#if`
PG_VERSION_NUM >= 180000 ... `#endif` around the `#include` "spock_conflict_stat.h"
(and any other includes that are PG‑18+ in this file) so the file matches the
pattern used in the other sources; keep the guard conditional on PG_VERSION_NUM
and leave other includes (e.g., spock_common.h, spock_conflict.h,
spock_executor.h) unchanged.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_conflict_stat.c`:
- Around line 113-124: spock_stat_create_subscription currently calls
pgstat_get_entry_ref(..., create=true) and discards the PgStat_EntryRef*,
leaking the refcount; capture the return value from pgstat_get_entry_ref (assign
to a PgStat_EntryRef *ref), call pgstat_reset_entry (as is), then call
pgstat_release_entry_ref(ref) to release the reference; keep all existing
identifiers (SPOCK_PGSTAT_KIND_LRCONFLICTS, MyDatabaseId, subid) and ensure the
release happens in the same function before returning.
- Around line 27-31: SPOCK_PGSTAT_KIND_LRCONFLICTS currently uses value 27 which
conflicts with pg_stat_plans; change the defined constant in
src/spock_conflict_stat.c to a safe temporary placeholder
PGSTAT_KIND_EXPERIMENTAL (24) for development, and for production pick and
document an unused ID from the allowed range (25,26,28–32) and update the macro
definition accordingly; after choosing a production ID, reserve it on the
PostgreSQL custom cumulative statistics registry per the project guidelines.

---

Duplicate comments:
In `@src/spock_conflict_stat.c`:
- Around line 92-108: spock_stat_report_subscription_conflict currently aborts
the apply worker via ereport(ERROR) when an unexpected SpockConflictType is
seen; change this to a non-fatal warning and early return (use elog(WARNING) or
remove the guard) so the stats path won't kill the replication transaction.
Locate the check in spock_stat_report_subscription_conflict that compares type
!= SPOCK_CT_UPDATE_MISSING and replace the ereport(ERROR, errcode(...)) with
elog(WARNING, ...) plus an immediate return (or simply drop the guard) so only
SPOCK_CT_UPDATE_MISSING proceeds to pgstat_prep_pending_entry and
pending->conflict_count[type]++.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/spock_conflict_stat.c`:
- Around line 27-32: The Kind ID defined as SPOCK_PGSTAT_KIND_LRCONFLICTS (value
28) may be already reserved — verify the PostgreSQL CustomCumulativeStats
registry (per the wiki) to find an unused kind ID, reserve it, and update the
`#define` SPOCK_PGSTAT_KIND_LRCONFLICTS to that verified value; also update or
remove the TODO comment in spock_conflict_stat.c to record the chosen/registered
ID and add a brief note or comment with the registry URL and reservation
confirmation.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
.github/workflows/installcheck.yml (2)

79-82: docker compose cp … /dev/stdout is slightly non-standard.

This works on Linux but relies on /dev/stdout being writable as a file path by the Docker CLI. A more portable and idiomatic alternative that avoids the filesystem-device trick:

♻️ Alternative
-            docker compose cp pgedge-$node:/home/pgedge/logfile.log /dev/stdout 2>/dev/null | tail -80 || echo "(not available)"
+            docker compose exec pgedge-$node cat /home/pgedge/logfile.log 2>/dev/null | tail -80 || echo "(not available)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/installcheck.yml around lines 79 - 82, Replace the
non-portable "docker compose cp pgedge-$node:/home/pgedge/logfile.log
/dev/stdout" pattern with a portable exec+cat approach: run "docker compose exec
-T pgedge-$node cat /home/pgedge/logfile.log | tail -80" (or use "docker cp
pgedge-$node:/home/pgedge/logfile.log - | tail -80" if you prefer docker cp) so
the logfile is streamed to stdout without relying on /dev/stdout; update the
loop where "docker compose cp pgedge-$node:/home/pgedge/logfile.log /dev/stdout
2>/dev/null | tail -80 || echo \"(not available)\"" appears to use the exec/cat
(or docker cp -) form and preserve the error fallback.

72-74: docker compose ps -a missing || true could truncate diagnostic output.

GitHub Actions uses bash -eo pipefail by default. If docker compose ps -a unexpectedly returns non-zero, the script exits before collecting any per-node logs — the most useful part of the diagnostic.

🛡️ Suggested fix
-          docker compose ps -a
+          docker compose ps -a || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/installcheck.yml around lines 72 - 74, The docker compose
status command in the workflow step ("docker compose ps -a") can return non‑zero
and abort the job due to bash -eo pipefail; change the command to ignore
failures (append a tolerant operator such as "|| true") so the step never fails
and subsequent diagnostics (per-node log collection) still run; update the line
containing "docker compose ps -a" in the installcheck workflow to append the
failure-tolerant suffix.
src/spock_apply_heap.c (1)

1073-1082: The comment wording "successful pathway" is misleading in this error branch.

elog(ERROR) is unconditional immediately after this block, so "successful" is confusing. The actual intent — only count when the error is not caught by a try block (first-occurrence path) — is explained in the body comment but the subject phrase could be clearer.

🔧 Suggested comment rewording
-#if PG_VERSION_NUM >= 180000
-		if (!MyApplyWorker->use_try_block)
-			/*
-			 * To avoid duplicate messages, only report the conflict on the
-			 * successful pathway.  We skip counting when the update logic has
-			 * already failed because the conflict would be misleading.
-			 */
-			spock_stat_report_subscription_conflict(MyApplyWorker->subid,
-													SPOCK_CT_UPDATE_MISSING);
-#endif
+#if PG_VERSION_NUM >= 180000
+		if (!MyApplyWorker->use_try_block)
+		{
+			/*
+			 * Report the conflict only on the first-occurrence (non-try-block)
+			 * path to avoid counting the same event twice when a try block
+			 * catches the error and retries with use_try_block=true.
+			 */
+			spock_stat_report_subscription_conflict(MyApplyWorker->subid,
+													SPOCK_CT_UPDATE_MISSING);
+		}
+#endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply_heap.c` around lines 1073 - 1082, The comment "successful
pathway" is misleading because elog(ERROR) is unconditional; update the comment
above the spock_stat_report_subscription_conflict call to clarify that we only
increment the conflict counter when the error will not be caught by a
surrounding try block (i.e., when MyApplyWorker->use_try_block is false), so
this counts only the first-occurrence / uncaught error path before the
unconditional elog(ERROR) is invoked; reference MyApplyWorker->use_try_block,
spock_stat_report_subscription_conflict(..., SPOCK_CT_UPDATE_MISSING) and the
subsequent elog(ERROR) in the revised wording.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/spock_conflict_stat.c`:
- Around line 128-147: The new call to
pgstat_get_entry_ref(SPOCK_PGSTAT_KIND_LRCONFLICTS, MyDatabaseId, subid, true,
NULL) increments the shared entry refcount but the function never calls
pgstat_release_entry_ref(ref, ...) before returning, leaking the ref and
preventing reclamation; fix by calling pgstat_release_entry_ref(ref,
/*appropriate flags*/ ) after the diagnostic check and before calling
pgstat_reset_entry (i.e., release the ref obtained in ref prior to returning
from this function), ensuring the refcount is decremented so
pgstat_drop_transactional can eventually free the shared entry.
- Around line 27-33: Replace the hard-coded unregistered kind ID by using the
PostgreSQL experimental constant: change the macro definition
SPOCK_PGSTAT_KIND_LRCONFLICTS from the literal 28 to PGSTAT_KIND_EXPERIMENTAL
(i.e., use PGSTAT_KIND_EXPERIMENTAL in spock_conflict_stat.c where
SPOCK_PGSTAT_KIND_LRCONFLICTS is defined) so the extension uses the reserved
experimental slot until you reserve a permanent ID on the CustomCumulativeStats
wiki.

---

Nitpick comments:
In @.github/workflows/installcheck.yml:
- Around line 79-82: Replace the non-portable "docker compose cp
pgedge-$node:/home/pgedge/logfile.log /dev/stdout" pattern with a portable
exec+cat approach: run "docker compose exec -T pgedge-$node cat
/home/pgedge/logfile.log | tail -80" (or use "docker cp
pgedge-$node:/home/pgedge/logfile.log - | tail -80" if you prefer docker cp) so
the logfile is streamed to stdout without relying on /dev/stdout; update the
loop where "docker compose cp pgedge-$node:/home/pgedge/logfile.log /dev/stdout
2>/dev/null | tail -80 || echo \"(not available)\"" appears to use the exec/cat
(or docker cp -) form and preserve the error fallback.
- Around line 72-74: The docker compose status command in the workflow step
("docker compose ps -a") can return non‑zero and abort the job due to bash -eo
pipefail; change the command to ignore failures (append a tolerant operator such
as "|| true") so the step never fails and subsequent diagnostics (per-node log
collection) still run; update the line containing "docker compose ps -a" in the
installcheck workflow to append the failure-tolerant suffix.

In `@src/spock_apply_heap.c`:
- Around line 1073-1082: The comment "successful pathway" is misleading because
elog(ERROR) is unconditional; update the comment above the
spock_stat_report_subscription_conflict call to clarify that we only increment
the conflict counter when the error will not be caught by a surrounding try
block (i.e., when MyApplyWorker->use_try_block is false), so this counts only
the first-occurrence / uncaught error path before the unconditional elog(ERROR)
is invoked; reference MyApplyWorker->use_try_block,
spock_stat_report_subscription_conflict(..., SPOCK_CT_UPDATE_MISSING) and the
subsequent elog(ERROR) in the revised wording.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/spock_conflict_stat.c (1)

29-32: Use PGSTAT_KIND_EXPERIMENTAL as a development placeholder; register a production ID on the PostgreSQL wiki before release.

The current value (28) is an unregistered custom stats kind ID that hasn't been reserved on the PostgreSQL wiki's CustomCumulativeStats registry. The previous value (27) was changed due to a confirmed conflict with pg_stat_plans. Per PostgreSQL best practices, development code should use PGSTAT_KIND_EXPERIMENTAL (24 in PostgreSQL 18+) as a temporary placeholder. Only when approaching a production release should you select an unused ID from the available range, document it in the code, and register it on the wiki to prevent collisions with other extensions in the ecosystem.

🔧 Suggested interim change
-#define SPOCK_PGSTAT_KIND_LRCONFLICTS	28
+/* Development placeholder — switch to a registered ID before production release */
+#define SPOCK_PGSTAT_KIND_LRCONFLICTS	PGSTAT_KIND_EXPERIMENTAL
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_conflict_stat.c` around lines 29 - 32, Replace the hard-coded
custom stats ID 28 used in the SPOCK_PGSTAT_KIND_LRCONFLICTS macro with the
development placeholder constant PGSTAT_KIND_EXPERIMENTAL; locate the definition
of SPOCK_PGSTAT_KIND_LRCONFLICTS in src/spock_conflict_stat.c and change its
value to PGSTAT_KIND_EXPERIMENTAL, and add a brief comment reminding to pick and
register a permanent unused ID on the PostgreSQL CustomCumulativeStats wiki
before production release.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_conflict_stat.c`:
- Around line 291-295: spock_get_subscription_stats currently returns
PG_RETURN_VOID() for PG < 18, but the SQL signature expects a composite/record
so returning void is incorrect and can crash callers; replace the
PG_RETURN_VOID() with an error that clearly states the function is unsupported
on PostgreSQL versions < 18 by calling PostgreSQL's error reporting API (e.g.,
ereport/errmsg) inside spock_get_subscription_stats to raise an ERROR with a
message like "spock_get_subscription_stats is not supported on PostgreSQL
versions before 18"; ensure you reference the function name
spock_get_subscription_stats in the message so the failure is clear.
- Around line 99-107: The guard that reports unexpected conflict types uses
ereport(FATAL, ...) which will crash the apply worker and cause an infinite
restart loop; change that to ereport(ERROR, ...) so the error aborts only the
current transaction and keeps the worker alive. Locate the check comparing type
against SPOCK_CT_UPDATE_MISSING in spock_conflict_stat.c (the branch that
currently calls ereport(FATAL, (errcode(...), errmsg("unexpected conflict type
%d reported for subscription %u", type, subid)))); replace the FATAL error level
with ERROR and keep the same errcode/errmsg payload so behavior and diagnostics
remain unchanged.

---

Duplicate comments:
In `@src/spock_conflict_stat.c`:
- Around line 128-146: The code calls
pgstat_get_entry_ref(SPOCK_PGSTAT_KIND_LRCONFLICTS, MyDatabaseId, subid, true,
NULL) which increments the shared entry refcount but never releases it, leaking
a reference; after doing the diagnostic pg_atomic_read_u32 check and before
calling pgstat_reset_entry (or before returning from the function), call
pgstat_release_entry_ref(ref, false) to drop the temporary ref acquired by
pgstat_get_entry_ref so the refcount returns to the expected value and the
pgstat infrastructure can reclaim the entry later; ensure this release happens
on all control paths that use the ref variable (i.e., immediately after the
warning/diagnostic and before any early exits).

---

Nitpick comments:
In `@src/spock_conflict_stat.c`:
- Around line 29-32: Replace the hard-coded custom stats ID 28 used in the
SPOCK_PGSTAT_KIND_LRCONFLICTS macro with the development placeholder constant
PGSTAT_KIND_EXPERIMENTAL; locate the definition of SPOCK_PGSTAT_KIND_LRCONFLICTS
in src/spock_conflict_stat.c and change its value to PGSTAT_KIND_EXPERIMENTAL,
and add a brief comment reminding to pick and register a permanent unused ID on
the PostgreSQL CustomCumulativeStats wiki before production release.

@danolivo danolivo force-pushed the spock-382 branch 4 times, most recently from c13381a to aed824d Compare February 20, 2026 12:26
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_conflict_stat.c`:
- Around line 30-35: The code currently uses a hard-coded custom stats ID
(SPOCK_PGSTAT_KIND_LRCONFLICTS defined as 28) without wiki registration; change
the temporary value to PGSTAT_KIND_EXPERIMENTAL (24) in the definition of
SPOCK_PGSTAT_KIND_LRCONFLICTS, and update the TODO comment near that symbol to
state "use PGSTAT_KIND_EXPERIMENTAL (24) until ID 28 is registered on the
PostgreSQL CustomCumulativeStats wiki, then switch SPOCK_PGSTAT_KIND_LRCONFLICTS
to 28 before production and remove the TODO."

---

Duplicate comments:
In `@src/spock_conflict_stat.c`:
- Around line 128-147: spock_stat_create_subscription currently calls
pgstat_get_entry_ref(...) and never calls pgstat_release_entry_ref, leaking the
shared ref; after you finish using the ref (you already check
ref->shared_entry->refcount and call pgstat_reset_entry), add a call to
pgstat_release_entry_ref(ref) to drop the extra reference so the entry can be
reclaimed; locate the ref variable returned by pgstat_get_entry_ref and invoke
pgstat_release_entry_ref(ref) (or equivalent release API) before the function
returns.

PgStat_EntryRef *entry_ref;
Spock_Stat_PendingSubEntry *pending;

if (type != SPOCK_CT_UPDATE_MISSING)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's just for development, then it must be an Assert, not an ERROR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually yes. But we are concerned about the lack of test coverage. So, I use ERROR to stop potentially SEGFAULTing logging and, at the same time, expose the problem (LOG and WARNING messages sometimes aren't caught for a long time).

@danolivo danolivo requested a review from ibrarahmad February 24, 2026 14:39
Register a custom pgstat kind (SPOCK_PGSTAT_KIND_LRCONFLICTS) via the
PG18 pgstat_register_kind() infrastructure to track replication conflict
counts per subscription.  Unlike vanilla PostgreSQL subscription stats,
spock statistics use MyDatabaseId as the key because spock node, origin,
and subscription IDs are unique only within a database.

The implementation introduces spock-specific stat types
(Spock_Stat_StatSubEntry, Spock_Stat_PendingSubEntry) sized to
SPOCK_CONFLICT_NUM_TYPES (6 types, excluding SPOCK_CT_DELETE_LATE which
is not yet tracked).  Column names in the SQL-callable function are kept
in sync with the SpockConflictType enum via designated initializers in
SpockConflictStatColNames[].

Currently only SPOCK_CT_UPDATE_MISSING conflicts are actively reported
(from spock_apply_heap.c).  Reporting from spock_report_conflict() is
left as a TODO until SPOCK_CT_DELETE_LATE is either included in the
stats array or filtered out to prevent overflow.

SQL functions: spock.get_subscription_stats(oid),
               spock.reset_subscription_stats(oid).

Includes a regression test exercising UPDATE_MISSING counting, counter
increment, and stats reset.
Truncate the log on the first subscriber visit so subsequent checks are
not polluted.
Also drop the non-deterministic command_counter column from exception_log
queries and order by (table_schema, table_name, remote_commit_ts) with
COLLATE "C" instead, making the output stable across runs.
Guard spock_conflict_stat.h include and improve conflict type validation
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
src/spock_conflict_stat.c (2)

118-147: pgstat_get_entry_ref return value is never released — refcount leak persists.

The ref obtained on line 128 increments the shared entry's refcount, but the function returns without calling pgstat_release_entry_ref(ref). This was flagged in a prior review. While ref is now captured (previously it was discarded), the release call is still missing — each spock_stat_create_subscription call permanently inflates the refcount by 1.

Proposed fix
 	if (pg_atomic_read_u32(&ref->shared_entry->refcount) != 2)
 		/*
 		 * Should never happen: a new subscription stats entry should have
 		 * exactly two references (the hashtable entry and our own).  A higher
 		 * count means a stale entry from a previous subscription with the same
 		 * OID was not properly cleaned up.
 		 */
 		ereport(WARNING,
 				(errmsg("conflict statistics entry for subscription %u "
 						"already has %u references",
 						subid,
 						pg_atomic_read_u32(&ref->shared_entry->refcount)),
 				 errhint("This may indicate that a previous subscription with "
 						 "the same OID was not fully dropped.")));

 	pgstat_reset_entry(SPOCK_PGSTAT_KIND_LRCONFLICTS, MyDatabaseId, subid, 0);
+	pgstat_release_entry_ref(SPOCK_PGSTAT_KIND_LRCONFLICTS, ref, false);
 }
#!/bin/bash
# Verify pgstat_release_entry_ref signature and whether it takes kind as first arg or just ref
# Also check if there's any auto-release mechanism we might be missing
rg -n --type=c -B2 -A8 'pgstat_release_entry_ref' | head -60
PostgreSQL 18 pgstat_release_entry_ref function signature parameters
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_conflict_stat.c` around lines 118 - 147,
spock_stat_create_subscription is leaking a reference obtained from
pgstat_get_entry_ref (ref) because it never calls pgstat_release_entry_ref;
after performing the refcount check and before exiting the function (after
pgstat_reset_entry call), call pgstat_release_entry_ref(ref) to release the
reference and avoid inflating shared_entry->refcount—ensure you use the proper
pgstat_release_entry_ref signature that accepts the PgStat_EntryRef* returned by
pgstat_get_entry_ref.

30-35: Wiki registration TODO is still open — track it before production release.

ID 28 is currently unassigned and safe for development, but the TODO to register it on the PostgreSQL CustomCumulativeStats wiki should be completed before shipping. This was noted in a prior review and remains outstanding.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_conflict_stat.c` around lines 30 - 35, The TODO about registering
the custom stats kind must be resolved before production: update the comment
around SPOCK_PGSTAT_KIND_LRCONFLICTS to indicate that ID 28 is temporary for
development and add a tracking task/reference (e.g., a Bug/Issue ID or a TODO
with owner and deadline) to complete registration on the PostgreSQL
CustomCumulativeStats wiki; ensure the comment references the symbol
SPOCK_PGSTAT_KIND_LRCONFLICTS and the wiki URL so the release checklist can
verify the registration is completed prior to shipping.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/spock_conflict_stat.c`:
- Around line 118-147: spock_stat_create_subscription is leaking a reference
obtained from pgstat_get_entry_ref (ref) because it never calls
pgstat_release_entry_ref; after performing the refcount check and before exiting
the function (after pgstat_reset_entry call), call pgstat_release_entry_ref(ref)
to release the reference and avoid inflating shared_entry->refcount—ensure you
use the proper pgstat_release_entry_ref signature that accepts the
PgStat_EntryRef* returned by pgstat_get_entry_ref.
- Around line 30-35: The TODO about registering the custom stats kind must be
resolved before production: update the comment around
SPOCK_PGSTAT_KIND_LRCONFLICTS to indicate that ID 28 is temporary for
development and add a tracking task/reference (e.g., a Bug/Issue ID or a TODO
with owner and deadline) to complete registration on the PostgreSQL
CustomCumulativeStats wiki; ensure the comment references the symbol
SPOCK_PGSTAT_KIND_LRCONFLICTS and the wiki URL so the release checklist can
verify the registration is completed prior to shipping.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aed824d and 9a37393.

⛔ Files ignored due to path filters (2)
  • tests/regress/expected/conflict_stat.out is excluded by !**/*.out
  • tests/regress/expected/replication_set.out is excluded by !**/*.out
📒 Files selected for processing (13)
  • .github/workflows/installcheck.yml
  • Makefile
  • include/spock_conflict.h
  • include/spock_conflict_stat.h
  • sql/spock--5.0.4--6.0.0-devel.sql
  • sql/spock--6.0.0-devel.sql
  • src/spock.c
  • src/spock_apply_heap.c
  • src/spock_conflict.c
  • src/spock_conflict_stat.c
  • src/spock_functions.c
  • tests/regress/sql/conflict_stat.sql
  • tests/regress/sql/replication_set.sql
🚧 Files skipped from review as they are similar to previous changes (7)
  • tests/regress/sql/conflict_stat.sql
  • .github/workflows/installcheck.yml
  • src/spock_functions.c
  • sql/spock--6.0.0-devel.sql
  • src/spock.c
  • include/spock_conflict.h
  • tests/regress/sql/replication_set.sql

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature New feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants