Polars equivalent of the pandas compare module. Key differences:

- Uses `pl.DataFrame.join()` with `coalesce=False` to detect membership via null patterns on join keys (polars has no merge indicator)
- Maps `'outer'` to `'full'` for polars join API compatibility
- Uses `eq_missing`/`ne_missing` for null-aware comparisons
- Validates join key uniqueness via `pl.struct().is_unique()`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0c2ca5af1a
buckaroo/polars_compare.py (outdated)

    left_key = join_columns[0]
    right_key = f"{left_key}{df2_suffix}"
    m_df = m_df.with_columns(
        pl.when(pl.col(left_key).is_not_null() & pl.col(right_key).is_not_null())

Track row origin without nullable join keys
Membership is derived from whether `join_columns[0]` and its suffixed counterpart are null, but that logic fails when the first join key itself can be null. In an outer/full join, a df1-only row with a null first key has both key columns null after the join, so it is labeled 2 (df2-only) instead of 1. This misattributes row provenance and can distort downstream comparisons that depend on membership. Derive origin from explicit per-side marker columns added before the join (or another non-null indicator) rather than from key nullness.
📦 TestPyPI package published

    pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.12.12.dev22427399358

or with uv:

    uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.12.12.dev22427399358

MCP server for Claude Code:

    claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.12.12.dev22427399358" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table
Addresses Codex review: null join keys broke membership detection when it relied on key null patterns. Now adds non-null boolean marker columns (`__bk_left`, `__bk_right`) before the join, derives membership from those, then drops them. This is immune to null join keys. Adds a test for the nullable join key edge case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Adds `buckaroo/polars_compare.py` with `col_join_dfs` for Polars DataFrames, mirroring the pandas compare module from "Fix BuckarooCompare for arbitrary join keys" (#589)
- Uses `pl.DataFrame.join()` with `coalesce=False` to detect membership via null patterns on join keys (polars has no merge indicator)
- Maps `"outer"` to `"full"` for polars join API compatibility
- Uses `eq_missing`/`ne_missing` for null-aware comparisons (polars `!=` returns null when either operand is null)
- Validates join key uniqueness via `pl.struct().is_unique()`

Test plan
- `tests/unit/polars_compare_test.py` covering:
  - `join_columns` normalization
  - `outer`/`full` `how` alias

🤖 Generated with Claude Code