Conversation
…fy implementation
Walkthrough

Adds batch UpdateNodes plumbing and BatchOption support; implements Neo4j dual-buffer and hashing-keyed batched updates; adds PostgreSQL large-node bulk update via staging table + COPY FROM + MERGE; adds a cypher utility to stringify deleted properties; updates drivers, graph API, ops, and tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App
    participant Driver
    participant Batch
    participant Txn as Transaction
    participant DB as Database

    App->>Driver: BatchOperation(ctx, delegate, options...)
    Driver->>Driver: apply BatchOptions (BatchSize)
    Driver->>DB: open connection / start transaction
    Driver->>Batch: new batch (config.BatchSize)
    Driver->>Batch: delegate(batch)
    App->>Batch: UpdateNodes([node1,node2,...])
    Batch->>Batch: accumulate in nodeUpdateBuffer / nodeUpdateByBuffer
    Batch->>Batch: compute updateKey (xxhash) and group params
    Batch->>Batch: if threshold exceeded -> flushNodeUpdateBuffer

    alt Neo4j batched path
        Batch->>Txn: cypherBuildNodeUpdateQueryBatch(params)
        Txn->>DB: EXECUTE UNWIND-based update statements
        DB->>Txn: ack
        Txn->>Batch: logWrites
    else per-id path
        Batch->>Txn: flushNodeUpdateByBuffer -> execute per-id updates
    end

    Driver->>Batch: Commit()
    Batch->>Txn: flush remaining buffers
    Txn->>DB: commit
    DB->>Driver: commit ack
    Driver->>App: return
```
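The `compute updateKey (xxhash) and group params` step above groups nodes whose updates share the same shape so they can ride one UNWIND-based statement. A minimal sketch of that grouping follows — with assumptions clearly labeled: it uses stdlib `hash/fnv` in place of xxhash, and the `node` struct and `updateKey` helper are illustrative stand-ins, not the driver's actual types.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// node is an illustrative stand-in for graph.Node, not the driver's type.
type node struct {
	addedKinds   []string
	deletedKinds []string
}

// updateKey hashes the shape of a node's kind changes so that nodes with
// identical updates can share one batched statement. Both added AND deleted
// kinds must feed the key; hashing addedKinds twice is exactly the critical
// bug flagged later in this review.
func updateKey(n node) uint64 {
	kindSet := map[string]struct{}{}

	for _, kind := range n.addedKinds {
		kindSet["+"+kind] = struct{}{}
	}

	for _, kind := range n.deletedKinds { // deletedKinds, not addedKinds again
		kindSet["-"+kind] = struct{}{}
	}

	keys := make([]string, 0, len(kindSet))
	for key := range kindSet {
		keys = append(keys, key)
	}
	sort.Strings(keys) // deterministic hash input regardless of map order

	hasher := fnv.New64a() // stdlib stand-in for xxhash
	hasher.Write([]byte(strings.Join(keys, "|")))
	return hasher.Sum64()
}

func main() {
	withDelete := node{addedKinds: []string{"User"}, deletedKinds: []string{"Stale"}}
	withoutDelete := node{addedKinds: []string{"User"}}

	// Same AddedKinds, different DeletedKinds: these must not share a batch.
	fmt.Println(updateKey(withDelete) == updateKey(withoutDelete))
}
```

Nodes sharing a key can then be collected into one parameter list for a single batched update, while differing keys keep incompatible label operations apart.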
```mermaid
sequenceDiagram
    participant App
    participant Driver
    participant Batch
    participant Staging as StagingTable
    participant Txn as Transaction
    participant DB as Database

    App->>Driver: BatchOperation(ctx, delegate)
    Driver->>DB: open connection
    Driver->>Batch: new batch
    App->>Batch: UpdateNodes([>LargeNodeUpdateThreshold nodes])
    Batch->>Batch: detect large batch -> largeUpdate flow
    Batch->>Staging: CREATE TEMP TABLE (staging)
    Batch->>Staging: COPY FROM staging (stream rows)
    Staging->>Batch: rows loaded
    Batch->>Txn: Format MERGE from staging -> target node table
    Txn->>DB: EXECUTE MERGE (update kind_ids / properties)
    DB->>Txn: ack
    Txn->>Staging: ON COMMIT DROP (cleanup)
    Driver->>App: return
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
superlinkx left a comment:

No major logical concerns, would like to see a couple small things patched up, but otherwise this looks good.
@coderabbitai full review
Actionable comments posted: 5
♻️ Duplicate comments (1)
drivers/neo4j/cypher.go (1)
261-263: ⚠️ Potential issue | 🔴 Critical

Bug: Iterating `AddedKinds` twice instead of `DeletedKinds`.

The second loop incorrectly iterates over `node.AddedKinds.Strings()` when it should iterate over `node.DeletedKinds.Strings()`. This causes nodes with different `DeletedKinds` but identical `AddedKinds` to be incorrectly grouped into the same batch, leading to incorrect label removal operations.

🐛 Proposed fix

```diff
-	for _, removedKindStr := range node.AddedKinds.Strings() {
+	for _, removedKindStr := range node.DeletedKinds.Strings() {
 		kindSet[removedKindStr] = struct{}{}
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/neo4j/cypher.go` around lines 261 - 263, The loop building kindSet mistakenly iterates node.AddedKinds.Strings() twice; change the second loop to iterate node.DeletedKinds.Strings() so that kindSet includes both added and deleted kind strings. Locate the loops around kindSet construction (reference symbols: node.AddedKinds, node.DeletedKinds, kindSet) in the code that prepares batches for label updates and replace the second occurrence of node.AddedKinds.Strings() with node.DeletedKinds.Strings() so label removal grouping uses the correct deleted kinds.
🧹 Nitpick comments (6)
ops/ops.go (2)
606-611: Consider using `var` declaration for empty slice.

Minor style suggestion for idiomatic Go initialization.

♻️ Proposed change

```diff
 func UpdateNodes(ctx context.Context, graphDB graph.Database, nodes []*graph.Node, batchSize ...int) error {
-	options := []graph.BatchOption{}
+	var options []graph.BatchOption

 	if len(batchSize) > 0 {
 		options = append(options, graph.WithBatchSize(batchSize[0]))
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ops/ops.go` around lines 606 - 611, The slice options in UpdateNodes is initialized with a composite literal; change it to a var declaration for idiomatic Go by declaring "var options []graph.BatchOption" instead of "options := []graph.BatchOption{}", leaving the rest of the logic (appending graph.WithBatchSize when batchSize provided) unchanged; update references to options in the function to use this new variable.
613-619: Simplify the return statement.

The if-else block checking the error can be simplified by returning the error directly.

♻️ Proposed simplification

```diff
-	if err := graphDB.BatchOperation(ctx, func(batch graph.Batch) error {
+	return graphDB.BatchOperation(ctx, func(batch graph.Batch) error {
 		return batch.UpdateNodes(nodes)
-	}, options...); err != nil {
-		return err
-	}
-
-	return nil
+	}, options...)
 }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@ops/ops.go` around lines 613 - 619, Replace the verbose error-checking around graphDB.BatchOperation with a direct return of the call result: call graphDB.BatchOperation(ctx, func(batch graph.Batch) error { return batch.UpdateNodes(nodes) }, options...) and return its error directly instead of the current if err := ...; if err != nil { return err } return nil pattern to simplify the control flow.

drivers/neo4j/cypher.go (2)
316-327: Missing empty kind check when removing node labels.

Same concern as above: this should skip `graph.EmptyKind` to maintain consistency with the relationship update function and avoid generating invalid Cypher statements.

♻️ Proposed fix

```diff
 if len(batch.nodeKindsToRemove) > 0 {
 	output.WriteString(" remove ")

 	for idx, kindToRemove := range batch.nodeKindsToRemove {
+		if kindToRemove == graph.EmptyKind {
+			continue
+		}
+
 		if idx > 0 {
 			output.WriteString(",")
 		}

 		output.WriteString("n:")
 		output.WriteString(kindToRemove.String())
 	}
 }
```

Note: With this change, you'll also need to track whether any non-empty kinds have been written to correctly handle the comma separator logic.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/neo4j/cypher.go` around lines 316 - 327, Skip graph.EmptyKind when iterating batch.nodeKindsToRemove to avoid generating invalid Cypher; update the loop in the block that writes the " remove " labels so it filters out kindToRemove == graph.EmptyKind and only writes commas between previously-written non-empty kinds (track a boolean like wroteKind or use a separate counter) before calling output.WriteString("n:") and output.WriteString(kindToRemove.String()).
309-314: Missing empty kind check when adding node labels.

Unlike `cypherBuildRelationshipUpdateQueryBatch` (lines 174-180), which explicitly skips `graph.EmptyKind`, this function doesn't check for empty kinds before writing labels. This inconsistency could generate invalid Cypher if an empty kind is present.

♻️ Proposed fix

```diff
 if len(batch.nodeKindsToAdd) > 0 {
 	for _, kindToAdd := range batch.nodeKindsToAdd {
+		if kindToAdd == graph.EmptyKind {
+			continue
+		}
+
 		output.WriteString(", n:")
 		output.WriteString(kindToAdd.String())
 	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/neo4j/cypher.go` around lines 309 - 314, The code that appends node labels from batch.nodeKindsToAdd should skip empty kinds to avoid generating invalid Cypher; update the loop in the function that writes to output (the block using output.WriteString and kindToAdd.String()) to check if kindToAdd == graph.EmptyKind and continue/skip when true (same logic used in cypherBuildRelationshipUpdateQueryBatch), so no "n:" labels are written for graph.EmptyKind values.

drivers/neo4j/batch_integration_test.go (1)
23-25: Avoid overriding environment variables in `init()`.

The `init()` function unconditionally sets `NEO4J_CONNECTION`, which will override any value set externally (e.g., in CI/CD pipelines). Consider removing this or making it conditional.

♻️ Proposed fix: only set if not already configured

```diff
 func init() {
-	os.Setenv(Neo4jConnectionStringEnv, "neo4j://neo4j:neo4jj@localhost:7687")
+	if os.Getenv(Neo4jConnectionStringEnv) == "" {
+		os.Setenv(Neo4jConnectionStringEnv, "neo4j://neo4j:neo4jj@localhost:7687")
+	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/neo4j/batch_integration_test.go` around lines 23 - 25, The init() currently unconditionally sets the Neo4j connection env var (Neo4jConnectionStringEnv), which can override external configuration; change init() so it only sets os.Setenv(Neo4jConnectionStringEnv, ...) when os.Getenv(Neo4jConnectionStringEnv) is empty (i.e., if not already configured), or remove the init() entirely if tests should always rely on externally provided NEO4J_CONNECTION; update the init() surrounding code that references Neo4jConnectionStringEnv accordingly.

graph/graph.go (1)
286-288: Consider expanding the `UpdateNodes` documentation.

The comment "Updates nodes by ID" is brief. Consider documenting behavior details like whether nodes must have valid IDs, what properties are updated, and how kind modifications (AddedKinds/DeletedKinds) are handled.
📝 Suggested documentation improvement
```diff
-	// Updates nodes by ID
+	// UpdateNodes updates multiple nodes in batch by their IDs. Each node's modified properties,
+	// added kinds, and deleted kinds will be persisted. Nodes must have valid IDs assigned.
 	UpdateNodes(nodes []*Node) error
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@graph/graph.go` around lines 286 - 288, Expand the UpdateNodes method comment to clearly state expectations and behavior: indicate that each Node in the slice must include a valid non-empty ID (reference Node.ID), list which fields are updated (e.g., Name, Metadata, Properties) and whether unspecified fields are left unchanged or cleared, and document how AddedKinds/DeletedKinds are applied (are kinds appended/removed from the existing kinds set or does the Node.Kinds replace existing kinds). Also describe partial-failure semantics and return value (does UpdateNodes fail fast, return an aggregate error, or skip invalid nodes), and note any concurrency or transactional guarantees provided by UpdateNodes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@drivers/neo4j/batch_integration_test.go`:
- Around line 125-128: The assertion is wrong: after deleting HeatProperty the
test should assert it was removed; update the loop over nodes in the test (the
block referencing nodes, HeatProperty and NewProperty) to check that
HeatProperty no longer exists (e.g., use assert.False or assert.Equal to false
against node.Properties.Exists(HeatProperty) or
node.Properties.Get(HeatProperty).Any()) while keeping the NewProperty assertion
intact.
In `@drivers/neo4j/batch.go`:
- Around line 61-68: The UpdateNodes method appends the entire nodes slice to
s.nodeUpdateBuffer before checking s.batchWriteSize, allowing a single call to
bypass batching; change s.UpdateNodes (and use s.nodeUpdateBuffer,
s.batchWriteSize, and flushNodeUpdateBuffer) to append nodes in chunks: iterate
over the incoming nodes slice, append only as many items as fit until
s.batchWriteSize is reached, call s.flushNodeUpdateBuffer() when full, then
continue appending remaining items—repeat until all input nodes are consumed—so
the buffer never grows past batchWriteSize and existing flushNodeUpdateBuffer
logic is reused.
In `@drivers/pg/batch.go`:
- Around line 114-125: The large-update path currently builds a [][]any for
every node via NewLargeNodeUpdateRows and nodeRows.AppendAll and then passes
nodeRows.Rows() into tx.Conn().CopyFrom, which preallocates and retains the
entire payload; instead implement a streaming CopyFromSource (e.g.,
NodeUpdateStream or NodeUpdateCopySource) that implements pgx.CopyFromSource
(Next/Values/Error) to yield Values() per-row on demand (or iterate in bounded
chunks) and replace the nodeRows.AppendAll + nodeRows.Rows() call with
tx.Conn().CopyFrom(s.ctx, pgx.Identifier{sql.NodeUpdateStagingTable},
sql.NodeUpdateStagingColumns, nodeUpdateStream), so rows are encoded and emitted
lazily (or in small batches) rather than prebuilding [][]any in memory.
- Around line 199-202: Before taking the large update fast path, ensure any
buffered/ batched writes are flushed so updates see prior creations/changes: in
UpdateNodes, before calling s.largeUpdate(nodes), call the batch flush/commit
method (e.g., s.Commit()) and handle its error; if your codebase has a dedicated
flush method (e.g., flushPendingBuffers or flushWrites) use that instead—update
UpdateNodes to call that flush function, return any error, then proceed to
s.largeUpdate(nodes).
In `@graph/node.go`:
- Around line 121-135: The method StripAllPropertiesExcept currently
dereferences s.Properties (calling Exists, Get and accessing Deleted) without a
nil check; add a nil guard at the start of StripAllPropertiesExcept to handle a
nil s.Properties (e.g., return early or initialize s.Properties via
NewProperties()), then proceed to build newProperties and copy/exclude entries
as before; ensure references to s.Properties.Exists, s.Properties.Get and
s.Properties.Deleted are only done after the nil check and replace s.Properties
with newProperties at the end.
---
Duplicate comments:
In `@drivers/neo4j/cypher.go`:
- Around line 261-263: The loop building kindSet mistakenly iterates
node.AddedKinds.Strings() twice; change the second loop to iterate
node.DeletedKinds.Strings() so that kindSet includes both added and deleted kind
strings. Locate the loops around kindSet construction (reference symbols:
node.AddedKinds, node.DeletedKinds, kindSet) in the code that prepares batches
for label updates and replace the second occurrence of node.AddedKinds.Strings()
with node.DeletedKinds.Strings() so label removal grouping uses the correct
deleted kinds.
---
Nitpick comments:
In `@drivers/neo4j/batch_integration_test.go`:
- Around line 23-25: The init() currently unconditionally sets the Neo4j
connection env var (Neo4jConnectionStringEnv), which can override external
configuration; change init() so it only sets os.Setenv(Neo4jConnectionStringEnv,
...) when os.Getenv(Neo4jConnectionStringEnv) is empty (i.e., if not already
configured), or remove the init() entirely if tests should always rely on
externally provided NEO4J_CONNECTION; update the init() surrounding code that
references Neo4jConnectionStringEnv accordingly.
In `@drivers/neo4j/cypher.go`:
- Around line 316-327: Skip graph.EmptyKind when iterating
batch.nodeKindsToRemove to avoid generating invalid Cypher; update the loop in
the block that writes the " remove " labels so it filters out kindToRemove ==
graph.EmptyKind and only writes commas between previously-written non-empty
kinds (track a boolean like wroteKind or use a separate counter) before calling
output.WriteString("n:") and output.WriteString(kindToRemove.String()).
- Around line 309-314: The code that appends node labels from
batch.nodeKindsToAdd should skip empty kinds to avoid generating invalid Cypher;
update the loop in the function that writes to output (the block using
output.WriteString and kindToAdd.String()) to check if kindToAdd ==
graph.EmptyKind and continue/skip when true (same logic used in
cypherBuildRelationshipUpdateQueryBatch), so no "n:" labels are written for
graph.EmptyKind values.
In `@graph/graph.go`:
- Around line 286-288: Expand the UpdateNodes method comment to clearly state
expectations and behavior: indicate that each Node in the slice must include a
valid non-empty ID (reference Node.ID), list which fields are updated (e.g.,
Name, Metadata, Properties) and whether unspecified fields are left unchanged or
cleared, and document how AddedKinds/DeletedKinds are applied (are kinds
appended/removed from the existing kinds set or does the Node.Kinds replace
existing kinds). Also describe partial-failure semantics and return value (does
UpdateNodes fail fast, return an aggregate error, or skip invalid nodes), and
note any concurrency or transactional guarantees provided by UpdateNodes.
In `@ops/ops.go`:
- Around line 606-611: The slice options in UpdateNodes is initialized with a
composite literal; change it to a var declaration for idiomatic Go by declaring
"var options []graph.BatchOption" instead of "options := []graph.BatchOption{}",
leaving the rest of the logic (appending graph.WithBatchSize when batchSize
provided) unchanged; update references to options in the function to use this
new variable.
- Around line 613-619: Replace the verbose error-checking around
graphDB.BatchOperation with a direct return of the call result: call
graphDB.BatchOperation(ctx, func(batch graph.Batch) error { return
batch.UpdateNodes(nodes) }, options...) and return its error directly instead of
the current if err := ...; if err != nil { return err } return nil pattern to
simplify the control flow.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 44bf29f5-9c5f-4a1b-8c7d-058900fa49e6
📒 Files selected for processing (18)
- cypher/models/pgsql/type.go
- cypher/models/pgsql/type_test.go
- drivers/neo4j/batch.go
- drivers/neo4j/batch_integration_test.go
- drivers/neo4j/cypher.go
- drivers/neo4j/cypher_internal_test.go
- drivers/neo4j/driver.go
- drivers/neo4j/transaction.go
- drivers/pg/batch.go
- drivers/pg/driver.go
- drivers/pg/query/format.go
- drivers/pg/query/format_test.go
- graph/graph.go
- graph/graph_test.go
- graph/node.go
- graph/node_test.go
- graph/switch.go
- ops/ops.go
✅ Actions performed

Full review triggered.
Actionable comments posted: 4
♻️ Duplicate comments (6)
graph/node.go (1)
121-131: ⚠️ Potential issue | 🟠 Major

Add a nil guard before dereferencing `s.Properties`.

At lines 125 and 129, `s.Properties` is used without a nil check. A node created with nil properties will panic here.

🛡️ Proposed fix

```diff
 func (s *Node) StripAllPropertiesExcept(except ...string) {
 	newProperties := NewProperties()

+	if s.Properties == nil {
+		s.Properties = newProperties
+		return
+	}
+
 	for _, exclusion := range except {
 		if s.Properties.Exists(exclusion) {
 			newProperties.Set(exclusion, s.Properties.Get(exclusion).Any())
 		}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@graph/node.go` around lines 121 - 131, The method StripAllPropertiesExcept on struct Node dereferences s.Properties without checking for nil, which will panic for nodes with nil Properties; update Node.StripAllPropertiesExcept to first guard that s.Properties is non-nil (e.g., return early or initialize newProperties appropriately) before calling s.Properties.Exists, s.Properties.Get, or accessing s.Properties.Deleted, so that NewProperties, property copying (Exists/Get/Any/Set) and Delete logic only run when s.Properties is present.

drivers/pg/batch.go (2)
199-202: ⚠️ Potential issue | 🟠 Major

Flush pending batch buffers before entering `largeUpdate`.

At line 200, the code branches directly to `largeUpdate`. Buffered writes already queued in this batch may not be applied yet, causing ordering inconsistencies for same-batch operations.

🧩 Proposed fix

```diff
 func (s *batch) UpdateNodes(nodes []*graph.Node) error {
 	if len(nodes) > LargeNodeUpdateThreshold {
+		if err := s.tryFlush(0); err != nil {
+			return err
+		}
+
 		return s.largeUpdate(nodes)
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/pg/batch.go` around lines 199 - 202, The UpdateNodes method on type batch currently jumps to s.largeUpdate when node count exceeds LargeNodeUpdateThreshold without flushing any buffered writes; before invoking s.largeUpdate(nodes) ensure pending batch buffers are flushed (e.g., call the existing flush/commit method on batch such as s.flush or s.flushPendingBuffers) so earlier queued writes are applied and ordering is preserved, then proceed to call s.largeUpdate(nodes).
114-125: ⚠️ Potential issue | 🟠 Major

Large-update path still materializes the full COPY payload in memory.

At lines 114-125 and 145-197, all rows are encoded into `[][]any` before `CopyFrom`. At the 1,000,000 threshold this can create large memory spikes/OOM risk.

♻️ Suggested direction (stream rows lazily)

```diff
-	nodeRows := NewLargeNodeUpdateRows(len(nodes))
-
-	if err := nodeRows.AppendAll(s.ctx, nodes, s.schemaManager, s.kindIDEncoder); err != nil {
-		return err
-	}
+	nodeSource := NewLargeNodeUpdateCopySource(s.ctx, nodes, s.schemaManager, s.kindIDEncoder)

 	if _, err := tx.Conn().CopyFrom(
 		s.ctx,
 		pgx.Identifier{sql.NodeUpdateStagingTable},
 		sql.NodeUpdateStagingColumns,
-		pgx.CopyFromRows(nodeRows.Rows()),
+		nodeSource,
 	); err != nil {
 		return fmt.Errorf("copying nodes into staging table: %w", err)
 	}
```

Add a `pgx.CopyFromSource` implementation to encode one row at a time.

Also applies to: 145-197
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/pg/batch.go` around lines 114 - 125, The current large-update path builds a [][]any payload via NewLargeNodeUpdateRows + nodeRows.AppendAll and then passes nodeRows.Rows() into tx.Conn().CopyFrom, which materializes all rows in memory; instead implement a pgx.CopyFromSource that streams/encodes one row at a time (e.g., add a type LargeNodeUpdateSource that implements CopyFromSource with Next(), Values(), Err()), change NewLargeNodeUpdateRows (or add a factory like NewLargeNodeUpdateSource) to return that source, and call tx.Conn().CopyFrom(s.ctx, pgx.Identifier{sql.NodeUpdateStagingTable}, sql.NodeUpdateStagingColumns, largeNodeUpdateSource) so rows are produced lazily; apply the same pattern to the other large-update block that currently uses nodeRows.Rows().drivers/neo4j/batch_integration_test.go (1)
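The streaming direction suggested above can be sketched as follows. The `copyFromSource` interface here mirrors the shape of pgx's `CopyFromSource` (`Next`/`Values`/`Err`) so the example stays dependency-free; `nodeUpdate` and `nodeUpdateSource` are hypothetical names for illustration, not the driver's actual types.

```go
package main

import "fmt"

// copyFromSource mirrors pgx.CopyFromSource so this sketch needs no external
// dependency; real code would satisfy pgx's interface directly and pass the
// source to tx.Conn().CopyFrom.
type copyFromSource interface {
	Next() bool
	Values() ([]any, error)
	Err() error
}

// nodeUpdate is a hypothetical stand-in for one staged node row.
type nodeUpdate struct {
	id         int64
	properties string
}

// nodeUpdateSource yields one encoded row per Next/Values call instead of
// materializing a [][]any payload for every node up front.
type nodeUpdateSource struct {
	nodes []nodeUpdate
	idx   int
	err   error
}

func (s *nodeUpdateSource) Next() bool {
	s.idx++
	return s.err == nil && s.idx <= len(s.nodes)
}

func (s *nodeUpdateSource) Values() ([]any, error) {
	n := s.nodes[s.idx-1]
	// Encoding happens lazily, per row, at the moment COPY asks for it.
	return []any{n.id, n.properties}, nil
}

func (s *nodeUpdateSource) Err() error { return s.err }

func main() {
	src := copyFromSource(&nodeUpdateSource{nodes: []nodeUpdate{
		{id: 1, properties: `{"name":"a"}`},
		{id: 2, properties: `{"name":"b"}`},
	}})

	// The COPY protocol would drive this same Next/Values loop.
	for src.Next() {
		row, _ := src.Values()
		fmt.Println(row[0], row[1])
	}
}
```

With this shape, peak memory is bounded by one row's encoding (plus pgx's internal buffering) rather than the full node slice.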
125-127: ⚠️ Potential issue | 🟠 Major

Assertion is inverted for `HeatProperty` after update.

`HeatProperty` is deleted before `batch.UpdateNodes(nodes)`, so this should assert absence, not presence.

🐛 Proposed fix

```diff
-	for _, node := range nodes {
-		assert.True(t, node.Properties.Exists(HeatProperty))
-		assert.Equal(t, true, node.Properties.Get(NewProperty).Any())
-	}
+	for _, node := range nodes {
+		assert.False(t, node.Properties.Exists(HeatProperty), "HeatProperty should have been deleted")
+		assert.Equal(t, true, node.Properties.Get(NewProperty).Any())
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/neo4j/batch_integration_test.go` around lines 125 - 127, The test currently asserts that HeatProperty exists on each node after calling batch.UpdateNodes(nodes), but HeatProperty is deleted before the update; change the assertion to assert that HeatProperty does not exist by replacing assert.True(t, node.Properties.Exists(HeatProperty)) with an assertion that it is false (e.g., assert.False or equivalent) while keeping the NewProperty assertion (assert.Equal(t, true, node.Properties.Get(NewProperty).Any())) intact; locate the check in the loop over nodes in the test that calls batch.UpdateNodes(nodes).

drivers/neo4j/batch.go (1)
61-66: ⚠️ Potential issue | 🟠 Major

`UpdateNodes` currently bypasses `batchWriteSize` on large slices.

Appending all nodes before checking the threshold allows one call to exceed configured batch limits.

🐛 Proposed fix

```diff
 func (s *batchTransaction) UpdateNodes(nodes []*graph.Node) error {
-	s.nodeUpdateBuffer = append(s.nodeUpdateBuffer, nodes...)
-
-	if len(s.nodeUpdateBuffer) > s.batchWriteSize {
-		return s.flushNodeUpdateBuffer()
-	}
-
-	return nil
+	for _, node := range nodes {
+		s.nodeUpdateBuffer = append(s.nodeUpdateBuffer, node)
+
+		if len(s.nodeUpdateBuffer) >= s.batchWriteSize {
+			if err := s.flushNodeUpdateBuffer(); err != nil {
+				return err
+			}
+		}
+	}
+
+	return nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/neo4j/batch.go` around lines 61 - 66, UpdateNodes currently appends the entire nodes slice to s.nodeUpdateBuffer and only then checks s.batchWriteSize, which can allow a single call to exceed the batch limit; modify UpdateNodes to append and flush in chunks so the buffer never grows beyond s.batchWriteSize by repeatedly appending up to the remaining capacity and calling s.flushNodeUpdateBuffer() as needed (i.e., loop over the input slice, add a portion to s.nodeUpdateBuffer, flush when len(s.nodeUpdateBuffer) >= s.batchWriteSize, then continue until all nodes are consumed), referencing UpdateNodes, s.nodeUpdateBuffer, s.batchWriteSize and flushNodeUpdateBuffer.drivers/neo4j/cypher.go (1)
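The chunked append-and-flush behavior described above can be demonstrated with a self-contained sketch; `batcher` and its fields are illustrative stand-ins for the driver's `batchTransaction`, and `maxBuffered` is added here purely to observe that the buffer never grows past the limit.

```go
package main

import "fmt"

// batcher is a minimal stand-in for the driver's batchTransaction, showing
// the chunked append-and-flush behavior the review asks for.
type batcher struct {
	buffer         []int
	batchWriteSize int
	flushed        [][]int
	maxBuffered    int // instrumentation for this sketch only
}

func (b *batcher) flush() {
	chunk := make([]int, len(b.buffer))
	copy(chunk, b.buffer)
	b.flushed = append(b.flushed, chunk)
	b.buffer = b.buffer[:0]
}

// updateNodes appends one item at a time and flushes whenever the buffer
// reaches batchWriteSize, so a single large call can no longer bypass the
// configured batch limit.
func (b *batcher) updateNodes(nodes []int) {
	for _, n := range nodes {
		b.buffer = append(b.buffer, n)

		if len(b.buffer) > b.maxBuffered {
			b.maxBuffered = len(b.buffer)
		}

		if len(b.buffer) >= b.batchWriteSize {
			b.flush()
		}
	}
}

func main() {
	b := &batcher{batchWriteSize: 100}

	b.updateNodes(make([]int, 250)) // one call far larger than the batch size

	// Two full flushes of 100, 50 left buffered, buffer peaked at 100.
	fmt.Println(len(b.flushed), len(b.buffer), b.maxBuffered)
}
```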
261-263: ⚠️ Potential issue | 🔴 Critical

Bug: Iterating over `AddedKinds` instead of `DeletedKinds`.

This should iterate over `node.DeletedKinds` to correctly compute the update key. Currently, nodes with different `DeletedKinds` but identical `AddedKinds` will be batched together, causing incorrect label removals.

🐛 Proposed fix

```diff
-	for _, removedKindStr := range node.AddedKinds.Strings() {
+	for _, removedKindStr := range node.DeletedKinds.Strings() {
 		kindSet[removedKindStr] = struct{}{}
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/neo4j/cypher.go` around lines 261 - 263, The code is populating kindSet from the wrong field: change the loop that currently iterates node.AddedKinds.Strings() to iterate node.DeletedKinds.Strings() so the update key reflects label removals correctly; specifically update the loop that assigns to kindSet (the one using node.AddedKinds.Strings()) to use node.DeletedKinds.Strings() instead so nodes are grouped by their deleted kinds when computing the update key.
🧹 Nitpick comments (1)
drivers/pg/query/format.go (1)
124-125: Avoid exposing mutable global staging column state.

At line 125, `NodeUpdateStagingColumns` is exported as a mutable slice. External mutation can silently break COPY column alignment at runtime. Prefer an unexported backing value with a getter that returns a copy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/pg/query/format.go` around lines 124 - 125, Replace the exported mutable slice NodeUpdateStagingColumns with an unexported backing variable (e.g., nodeUpdateStagingColumns) and add an exported getter function NodeUpdateStagingColumns() []string that returns a copy of the slice; specifically, rename the current var to nodeUpdateStagingColumns and implement NodeUpdateStagingColumns() to allocate a new slice, copy the contents of nodeUpdateStagingColumns into it, and return that copy to prevent external mutation from breaking COPY column alignment.
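The unexported-backing-slice-plus-getter pattern suggested above looks like this; the column names are illustrative, not the driver's actual list.

```go
package main

import "fmt"

// Unexported backing slice: external packages cannot reorder or mutate it,
// so COPY column alignment stays fixed. Column names are illustrative.
var nodeUpdateStagingColumns = []string{"id", "kind_ids", "properties"}

// NodeUpdateStagingColumns returns a defensive copy of the column list, so
// callers can do whatever they like with the returned slice.
func NodeUpdateStagingColumns() []string {
	out := make([]string, len(nodeUpdateStagingColumns))
	copy(out, nodeUpdateStagingColumns)
	return out
}

func main() {
	cols := NodeUpdateStagingColumns()
	cols[0] = "mutated" // only the caller's copy changes

	fmt.Println(NodeUpdateStagingColumns()[0]) // backing slice is unchanged
}
```

The copy costs one small allocation per call, which is negligible next to a COPY FROM round trip.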
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cypher/models/pgsql/type.go`:
- Around line 60-69: DeletedPropertiesToString currently calls
properties.DeletedProperties() without guarding for a nil receiver which will
panic; update DeletedPropertiesToString to return "{}" immediately if properties
== nil (mirroring the nil-check used by PropertiesToJSONB/MapOrEmpty), then
proceed to call properties.DeletedProperties(), build the quoted slice and join
as before; also add a unit test that calls DeletedPropertiesToString(nil) and
expects "{}" to ensure nil-safety.
In `@drivers/neo4j/batch_integration_test.go`:
- Around line 23-25: The init() function unconditionally calls
os.Setenv(Neo4jConnectionStringEnv, "...") which overrides any existing
NEO4J_CONNECTION value; remove that unconditional override and instead only set
a default when the env var is empty (e.g., check
os.Getenv(Neo4jConnectionStringEnv) and only call os.Setenv if it's ""), or
delete the init() entirely so tests respect CI/local configuration; update the
code around the init() function and references to Neo4jConnectionStringEnv
accordingly.
In `@graph/graph.go`:
- Around line 369-372: The WithBatchSize option currently sets
BatchConfig.BatchSize unconditionally; guard against non-positive values by
validating the incoming size in WithBatchSize and only assigning it when size >
0 (otherwise leave the existing config.BatchSize unchanged or set a sensible
default). Update the WithBatchSize(func(config *BatchConfig)) to check size <= 0
and return early (or set to a default constant) so BatchConfig.BatchSize is
never set to a non-positive value; reference BatchConfig.BatchSize,
WithBatchSize, and BatchOption when making the change.
In `@ops/ops.go`:
- Around line 609-611: The code silently ignores extra variadic batchSize
values; update the handling around batchSize to explicitly reject multiple
values: if len(batchSize) > 1 return an error (e.g., fmt.Errorf("only one
batchSize allowed, got %d", len(batchSize))) so misuse is visible, and keep the
existing branch that appends graph.WithBatchSize(batchSize[0]) when exactly one
value is provided; modify the enclosing function signature to return an error if
it doesn't already so you can propagate this validation failure.
---
Duplicate comments:
In `@drivers/neo4j/batch_integration_test.go`:
- Around line 125-127: The test currently asserts that HeatProperty exists on
each node after calling batch.UpdateNodes(nodes), but HeatProperty is deleted
before the update; change the assertion to assert that HeatProperty does not
exist by replacing assert.True(t, node.Properties.Exists(HeatProperty)) with an
assertion that it is false (e.g., assert.False or equivalent) while keeping the
NewProperty assertion (assert.Equal(t, true,
node.Properties.Get(NewProperty).Any())) intact; locate the check in the loop
over nodes in the test that calls batch.UpdateNodes(nodes).
In `@drivers/neo4j/batch.go`:
- Around line 61-66: UpdateNodes currently appends the entire nodes slice to
s.nodeUpdateBuffer and only then checks s.batchWriteSize, which can allow a
single call to exceed the batch limit; modify UpdateNodes to append and flush in
chunks so the buffer never grows beyond s.batchWriteSize by repeatedly appending
up to the remaining capacity and calling s.flushNodeUpdateBuffer() as needed
(i.e., loop over the input slice, add a portion to s.nodeUpdateBuffer, flush
when len(s.nodeUpdateBuffer) >= s.batchWriteSize, then continue until all nodes
are consumed), referencing UpdateNodes, s.nodeUpdateBuffer, s.batchWriteSize and
flushNodeUpdateBuffer.
In `@drivers/neo4j/cypher.go`:
- Around line 261-263: The code is populating kindSet from the wrong field:
change the loop that currently iterates node.AddedKinds.Strings() to iterate
node.DeletedKinds.Strings() so the update key reflects label removals correctly;
specifically update the loop that assigns to kindSet (the one using
node.AddedKinds.Strings()) to use node.DeletedKinds.Strings() instead so nodes
are grouped by their deleted kinds when computing the update key.
In `@drivers/pg/batch.go`:
- Around line 199-202: The UpdateNodes method on type batch currently jumps to
s.largeUpdate when node count exceeds LargeNodeUpdateThreshold without flushing
any buffered writes; before invoking s.largeUpdate(nodes) ensure pending batch
buffers are flushed (e.g., call the existing flush/commit method on batch such
as s.flush or s.flushPendingBuffers) so earlier queued writes are applied and
ordering is preserved, then proceed to call s.largeUpdate(nodes).
- Around line 114-125: The current large-update path builds a [][]any payload
via NewLargeNodeUpdateRows + nodeRows.AppendAll and then passes nodeRows.Rows()
into tx.Conn().CopyFrom, which materializes all rows in memory; instead
implement a pgx.CopyFromSource that streams/encodes one row at a time (e.g., add
a type LargeNodeUpdateSource that implements CopyFromSource with Next(),
Values(), Err()), change NewLargeNodeUpdateRows (or add a factory like
NewLargeNodeUpdateSource) to return that source, and call
tx.Conn().CopyFrom(s.ctx, pgx.Identifier{sql.NodeUpdateStagingTable},
sql.NodeUpdateStagingColumns, largeNodeUpdateSource) so rows are produced
lazily; apply the same pattern to the other large-update block that currently
uses nodeRows.Rows().
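The streaming pattern suggested here can be sketched against a local copy of the `pgx.CopyFromSource` contract (`Next`/`Values`/`Err`) so the example runs without the pgx dependency; the `node` fields and the `largeNodeUpdateSource` name are illustrative, not the real DAWGS types:

```go
package main

import "fmt"

// copyFromSource mirrors pgx.CopyFromSource so the sketch is self-contained.
type copyFromSource interface {
	Next() bool
	Values() ([]any, error)
	Err() error
}

// node is an illustrative row shape for the staging table.
type node struct {
	ID         int64
	Properties string
}

// largeNodeUpdateSource lazily encodes one row per Next/Values call instead
// of materializing the whole [][]any payload in memory up front.
type largeNodeUpdateSource struct {
	nodes []node
	idx   int
}

func (s *largeNodeUpdateSource) Next() bool {
	s.idx++
	return s.idx <= len(s.nodes)
}

func (s *largeNodeUpdateSource) Values() ([]any, error) {
	n := s.nodes[s.idx-1]
	return []any{n.ID, n.Properties}, nil
}

func (s *largeNodeUpdateSource) Err() error { return nil }

func main() {
	var src copyFromSource = &largeNodeUpdateSource{nodes: []node{{1, "a"}, {2, "b"}}}
	for src.Next() {
		row, _ := src.Values()
		fmt.Println(row...)
	}
}
```

In the real driver the source would be handed directly to `tx.Conn().CopyFrom`, which drives the same `Next`/`Values` loop internally, so each row is produced only when COPY asks for it.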
In `@graph/node.go`:
- Around line 121-131: The method StripAllPropertiesExcept on struct Node
dereferences s.Properties without checking for nil, which will panic for nodes
with nil Properties; update Node.StripAllPropertiesExcept to first guard that
s.Properties is non-nil (e.g., return early or initialize newProperties
appropriately) before calling s.Properties.Exists, s.Properties.Get, or
accessing s.Properties.Deleted, so that NewProperties, property copying
(Exists/Get/Any/Set) and Delete logic only run when s.Properties is present.
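A minimal sketch of the nil guard, using a simplified stand-in for `graph.Properties` (the real type and its `Exists`/`Get`/`Deleted` members live in graph/node.go):

```go
package main

import "fmt"

// properties is a simplified stand-in for graph.Properties.
type properties struct {
	Map     map[string]any
	Deleted map[string]struct{}
}

type node struct {
	Properties *properties
}

// StripAllPropertiesExcept returns early when Properties is nil, so the
// copy/delete logic below never dereferences a nil pointer.
func (n *node) StripAllPropertiesExcept(except ...string) {
	if n.Properties == nil {
		return // nothing to strip
	}
	kept := map[string]any{}
	for _, key := range except {
		if value, ok := n.Properties.Map[key]; ok {
			kept[key] = value
		}
	}
	n.Properties.Map = kept
}

func main() {
	var bare node
	bare.StripAllPropertiesExcept("name") // no panic on nil Properties
	full := node{Properties: &properties{Map: map[string]any{"name": "a", "heat": 1}}}
	full.StripAllPropertiesExcept("name")
	fmt.Println(len(full.Properties.Map))
}
```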
---
Nitpick comments:
In `@drivers/pg/query/format.go`:
- Around line 124-125: Replace the exported mutable slice
NodeUpdateStagingColumns with an unexported backing variable (e.g.,
nodeUpdateStagingColumns) and add an exported getter function
NodeUpdateStagingColumns() []string that returns a copy of the slice;
specifically, rename the current var to nodeUpdateStagingColumns and implement
NodeUpdateStagingColumns() to allocate a new slice, copy the contents of
nodeUpdateStagingColumns into it, and return that copy to prevent external
mutation from breaking COPY column alignment.
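The defensive-copy getter can be sketched like this; the column names are placeholders, not the actual staging schema:

```go
package main

import "fmt"

// nodeUpdateStagingColumns is the unexported backing slice; callers only
// ever receive a copy, so external mutation cannot break COPY column
// alignment. Column names here are illustrative.
var nodeUpdateStagingColumns = []string{"id", "kind_ids", "properties"}

// NodeUpdateStagingColumns returns a fresh copy of the staging columns.
func NodeUpdateStagingColumns() []string {
	columns := make([]string, len(nodeUpdateStagingColumns))
	copy(columns, nodeUpdateStagingColumns)
	return columns
}

func main() {
	columns := NodeUpdateStagingColumns()
	columns[0] = "mutated" // only the caller's copy changes
	fmt.Println(nodeUpdateStagingColumns[0])
}
```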
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 07233c5b-1a0e-4303-b794-a862fc49d584
📒 Files selected for processing (18)
cypher/models/pgsql/type.go, cypher/models/pgsql/type_test.go, drivers/neo4j/batch.go, drivers/neo4j/batch_integration_test.go, drivers/neo4j/cypher.go, drivers/neo4j/cypher_internal_test.go, drivers/neo4j/driver.go, drivers/neo4j/transaction.go, drivers/pg/batch.go, drivers/pg/driver.go, drivers/pg/query/format.go, drivers/pg/query/format_test.go, graph/graph.go, graph/graph_test.go, graph/node.go, graph/node_test.go, graph/switch.go, ops/ops.go
🧹 Nitpick comments (3)
graph/graph.go (1)
286-288: Minor doc comment convention. Go doc comments for interface methods should start with the method name. Consider:

```diff
- // Updates nodes by ID
+ // UpdateNodes updates nodes by ID.
  UpdateNodes(nodes []*Node) error
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@graph/graph.go` around lines 286 - 288, The interface method comment "Updates nodes by ID" should follow Go doc conventions and start with the method name; update the comment to begin with "UpdateNodes ..." so it reads like "UpdateNodes updates nodes by ID" (or similar), placing it immediately above the UpdateNodes method declaration in the interface to satisfy GoDoc style and linters.

ops/ops.go (1)
617-624: Simplify the return statement. The `if err != nil { return err }; return nil` pattern can be replaced by directly returning the result.

♻️ Suggested simplification

```diff
- if err := graphDB.BatchOperation(ctx, func(batch graph.Batch) error {
+ return graphDB.BatchOperation(ctx, func(batch graph.Batch) error {
  	return batch.UpdateNodes(nodes)
- }, options...); err != nil {
- 	return err
- }
-
- return nil
+ }, options...)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ops/ops.go` around lines 617 - 624, The code calls graphDB.BatchOperation(...) and currently does "if err := ...; err != nil { return err } return nil"; replace that pattern by directly returning the result of the BatchOperation call. Locate the block containing graphDB.BatchOperation(ctx, func(batch graph.Batch) error { return batch.UpdateNodes(nodes) }, options...) and remove the conditional check so the function simply returns the BatchOperation call's error value.

graph/node.go (1)
115-140: LGTM! Nil guard addressed; minor naming clarity suggestion. The implementation correctly handles the nil `Properties` case and the logic for preserving specified properties is sound. Direct map access on `s.Properties.Deleted` is safe in Go for nil maps.

One optional refinement: the loop variable `exclusion` is semantically inverted: these are properties to keep (excluded from removal), not properties being excluded. A name like `propertyToKeep` or `key` would improve readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@graph/node.go` around lines 115 - 140, The loop variable name "exclusion" in the StripAllPropertiesExcept method is misleading because it represents keys to keep; rename it to something clearer (e.g., propertyToKeep or key) to improve readability: update the for loop header and all uses inside the loop (the Exists/Get/Delete calls) inside the StripAllPropertiesExcept function to use the new identifier, leaving all logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@graph/graph.go`:
- Around line 286-288: The interface method comment "Updates nodes by ID" should
follow Go doc conventions and start with the method name; update the comment to
begin with "UpdateNodes ..." so it reads like "UpdateNodes updates nodes by ID"
(or similar), placing it immediately above the UpdateNodes method declaration in
the interface to satisfy GoDoc style and linters.
In `@graph/node.go`:
- Around line 115-140: The loop variable name "exclusion" in the
StripAllPropertiesExcept method is misleading because it represents keys to
keep; rename it to something clearer (e.g., propertyToKeep or key) to improve
readability: update the for loop header and all uses inside the loop (the
Exists/Get/Delete calls) inside the StripAllPropertiesExcept function to use the
new identifier, leaving all logic unchanged.
In `@ops/ops.go`:
- Around line 617-624: The code calls graphDB.BatchOperation(...) and currently
does "if err := ...; err != nil { return err } return nil"; replace that pattern
by directly returning the result of the BatchOperation call. Locate the block
containing graphDB.BatchOperation(ctx, func(batch graph.Batch) error { return
batch.UpdateNodes(nodes) }, options...) and remove the conditional check so the
function simply returns the BatchOperation call's error value.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0f34a6b0-f49d-4083-8cf3-0edb4fa04dbb
📒 Files selected for processing (4)
cypher/models/pgsql/type.go, graph/graph.go, graph/node.go, ops/ops.go
ff7a3d6 to 1ac2011
Actionable comments posted: 1
🧹 Nitpick comments (2)
drivers/neo4j/cypher.go (2)
308-313: Consider adding an EmptyKind check for consistency. The relationship update function (`cypherBuildRelationshipUpdateQueryBatch`, lines 175-176) skips `graph.EmptyKind` when adding labels, but this function does not. If `AddedKinds` could ever contain an empty kind, this would produce invalid Cypher like `n:`.

🔧 Proposed fix for defensive consistency

```diff
  if len(batch.nodeKindsToAdd) > 0 {
  	for _, kindToAdd := range batch.nodeKindsToAdd {
+ 		if kindToAdd == nil || kindToAdd.Is(graph.EmptyKind) {
+ 			continue
+ 		}
  		output.WriteString(", n:")
  		output.WriteString(kindToAdd.String())
  	}
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/neo4j/cypher.go` around lines 308 - 313, The loop that appends node labels from batch.nodeKindsToAdd can emit an empty label (`n:`) if a kind equals graph.EmptyKind; update the logic in cypherBuildNodeUpdateQueryBatch (the block iterating over batch.nodeKindsToAdd) to skip any kind where kindToAdd == graph.EmptyKind before calling kindToAdd.String(), mirroring the check used in cypherBuildRelationshipUpdateQueryBatch, so no empty label tokens are written to output.
315-326: Same EmptyKind consideration applies to removed kinds. For symmetry with the relationship batch logic and defensive coding, consider adding the same guard here.

🔧 Proposed fix

```diff
  if len(batch.nodeKindsToRemove) > 0 {
  	output.WriteString(" remove ")
+ 	firstRemoval := true
  	for idx, kindToRemove := range batch.nodeKindsToRemove {
- 		if idx > 0 {
+ 		if kindToRemove == nil || kindToRemove.Is(graph.EmptyKind) {
+ 			continue
+ 		}
+ 		if !firstRemoval {
  			output.WriteString(",")
+ 		} else {
+ 			firstRemoval = false
  		}
  		output.WriteString("n:")
  		output.WriteString(kindToRemove.String())
  	}
  }
```

Note: If empty kinds are filtered earlier in the pipeline and cannot reach this function, this is purely defensive and can be skipped.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@drivers/neo4j/cypher.go` around lines 315 - 326, The loop that appends removed node kinds should defensively skip EmptyKind entries like the relationship batch logic does; in the block iterating batch.nodeKindsToRemove, add a guard that continues the loop when kindToRemove == EmptyKind (or equivalent sentinel) so you don't emit "n:" for empty kinds, ensuring you still handle comma placement correctly when some entries are skipped; use the batch.nodeKindsToRemove identifier and kindToRemove.String() to locate and adjust the loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@drivers/neo4j/cypher_internal_test.go`:
- Line 50: Fix the typo in the comment containing "Digest 3 and 4 should have a
distinct update key from digests 1 and 2 becaus the query" by changing "becaus"
to "because" so the sentence reads correctly; locate that comment in
cypher_internal_test.go (search for the exact comment text) and update it in
place (no behavior changes required).
---
Nitpick comments:
In `@drivers/neo4j/cypher.go`:
- Around line 308-313: The loop that appends node labels from
batch.nodeKindsToAdd can emit an empty label (`n:`) if a kind equals
graph.EmptyKind; update the logic in cypherBuildNodeUpdateQueryBatch (the block
iterating over batch.nodeKindsToAdd) to skip any kind where kindToAdd ==
graph.EmptyKind before calling kindToAdd.String(), mirroring the check used in
cypherBuildRelationshipUpdateQueryBatch, so no empty label tokens are written to
output.
- Around line 315-326: The loop that appends removed node kinds should
defensively skip EmptyKind entries like the relationship batch logic does; in
the block iterating batch.nodeKindsToRemove, add a guard that continues the loop
when kindToRemove == EmptyKind (or equivalent sentinel) so you don't emit "n:"
for empty kinds, ensuring you still handle comma placement correctly when some
entries are skipped; use the batch.nodeKindsToRemove identifier and
kindToRemove.String() to locate and adjust the loop.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4c2d5175-a592-400c-8b4b-45bc2d28dddd
📒 Files selected for processing (2)
drivers/neo4j/cypher.go, drivers/neo4j/cypher_internal_test.go
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@drivers/neo4j/batch_integration_test.go`:
- Line 64: Fix the typo in the inline comment that currently reads "Regsiter a
cleanup step to wipe the database" by changing "Regsiter" to "Register" so the
comment reads "Register a cleanup step to wipe the database"; update the comment
near the cleanup registration code (the comment containing the exact text
"Regsiter a cleanup step to wipe the database") to correct the spelling.
In `@drivers/neo4j/cypher.go`:
- Around line 308-326: The node batch Cypher generation must skip EmptyKind to
avoid producing labels like "n:" or "remove n:"; update the loops that iterate
batch.nodeKindsToAdd and batch.nodeKindsToRemove in the function building the
Cypher (where output.WriteString is used) to check for and continue on kinds
equal to EmptyKind (the same guard used in the relationship batch logic),
ensuring only non-empty kind.String() values are appended; this mirrors the
handling in the AddKinds/DeleteKinds flow and prevents invalid Cypher when
AddKinds()/DeleteKinds() leave EmptyKind in the slices.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: daae105c-679c-4884-aeeb-f986099ee261
📒 Files selected for processing (3)
drivers/neo4j/batch_integration_test.go, drivers/neo4j/cypher.go, drivers/neo4j/cypher_internal_test.go
zinic left a comment
I'm impressed at the plumbing of the copy table method. I figured it would be too ugly to plumb to make it approachable but I can see that I was wrong. This looks really tight.
I'm good with this approach on the PG side and the Neo4j side is patched up but will want for one last stamp before this goes in.
This PR adds the ability to batch update nodes.
A couple things to note:
- `UpdateNodes(nodes []*Node) error` signature for the Batch interface, because this allows me to use two different implementations for updating depending on the number of nodes being updated. If the update is below a certain threshold, `LargeNodeUpdateThreshold` (currently set to 1M nodes), we do the typical batched SQL `UPDATE...SET...`. Above that threshold, we instead do a `COPY...FROM...` into a temporary table for the transaction, and then `MERGE INTO` the nodes table. This seems to be marginally faster for 1M+ updates, and should have even better scalability for much larger updates.
- A `BatchOption` parameter to the `BatchOperation` function signature that allows one to set a `BatchWriteSize` for the `BatchOperation`.
- A helper in ops.go: `func UpdateNodes(ctx context.Context, graphDB graph.Database, nodes []*graph.Node, batchSize ...int) error` that I think should be the first tool users of DAWGS reach for when updating nodes.
- `(s *Node) StripAllPropertiesExcept(except ...string)`. The reason for this is if a developer is working with fully hydrated nodes in memory, but they only change one property and want to update all these nodes, we can cut down on a lot of network traffic to the DB by removing all but the properties we care about before sending the nodes to the DB to be updated. The DB also does not need to bother parsing unnecessary information as well.
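The staging-table path described in the first bullet can be sketched roughly as the following SQL; table and column names here are illustrative, not the actual DAWGS schema:

```sql
-- Stage the incoming rows in a transaction-scoped temp table.
CREATE TEMPORARY TABLE node_update_staging (
    id         bigint,
    kind_ids   smallint[],
    properties jsonb
) ON COMMIT DROP;

-- Rows are bulk-loaded via the driver's COPY support, which is far cheaper
-- than per-row INSERTs at 1M+ rows:
-- COPY node_update_staging (id, kind_ids, properties) FROM STDIN;

-- Apply the staged updates to the node table in one statement.
MERGE INTO node AS n
USING node_update_staging AS s ON n.id = s.id
WHEN MATCHED THEN
    UPDATE SET kind_ids = s.kind_ids, properties = s.properties;
```

Because the staging table is declared `ON COMMIT DROP`, cleanup happens automatically when the batch transaction commits.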
New Features
Tests