Skip to content

Perfect hash join panics with overflow on full-range signed integer keys #20995

@mbutrovich

Description

@mbutrovich

Describe the bug

Comet's CometFuzzTestSuite "join" test uses a mix of extreme and random values to exercise operators, which found this bug while testing DF 53.0.0-rc2 (apache/datafusion-comet#3629).

try_build_perfect_hash_map panics with attempt to add with overflow when a signed integer join key spans the full range of its type (e.g. both i64::MIN and i64::MAX are present).

Introduced in #19411 (c98fa5616).

To Reproduce

Self-join on an Int64 column containing i64::MIN and i64::MAX:

#[tokio::test]
async fn test_perfect_hash_join_overflow_full_int64_range() -> Result<()> {
    let task_ctx = prepare_task_ctx(8192, true);
    let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int64, true)]));
    let batch = RecordBatch::try_new(
        Arc::clone(&schema),
        vec![Arc::new(Int64Array::from(vec![i64::MIN, i64::MAX]))],
    )?;
    let left = TestMemoryExec::try_new_exec(&[vec![batch.clone()]], Arc::clone(&schema), None)?;
    let right = TestMemoryExec::try_new_exec(&[vec![batch]], schema, None)?;
    let on: JoinOn = vec![(
        Arc::new(Column::new_with_schema("a", &left.schema())?) as _,
        Arc::new(Column::new_with_schema("a", &right.schema())?) as _,
    )];
    let (_columns, _batches, _metrics) = join_collect(
        left, right, on, &JoinType::Inner,
        NullEquality::NullEqualsNothing, task_ctx,
    ).await?;
    Ok(())
}

Root cause

In exec.rs, try_build_perfect_hash_map:

let range = ArrayMap::calculate_range(min_val, max_val);  // line 156
let num_row: usize = batches.iter().map(|x| x.num_rows()).sum();
let dense_ratio = (num_row as f64) / ((range + 1) as f64);  // line 158 — OVERFLOW
// ...
if range == usize::MAX as u64 {  // line 174 — guard is too late
    return Ok(None);
}

When key_to_u64(i64::MIN) = 9223372036854775808 and key_to_u64(i64::MAX) = 9223372036854775807, calculate_range returns u64::MAX via wrapping_sub. The range + 1 on line 158 then overflows (panic in debug, wrap-to-zero in release → division by zero).

The existing guard on line 174 correctly checks for this case but is placed after the overflow.

Proposed fix

Move the range == usize::MAX as u64 guard before line 158, or replace line 158 with:

let Some(range_plus_one) = range.checked_add(1) else {
    return Ok(None);
};
let dense_ratio = (num_row as f64) / (range_plus_one as f64);

Claude helped me write the issue after I found it while testing Comet. Claude also provided the proposed fix, so please double check the latter before opening a PR with that fix. I can look at it in more detail tomorrow if no one else can get to it.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions