-
Notifications
You must be signed in to change notification settings - Fork 2k
Description
Describe the bug
Comet's CometFuzzTestSuite "join" test uses a mix of extreme and random values to exercise operators, which found this bug while testing DF 53.0.0-rc2 (apache/datafusion-comet#3629).
try_build_perfect_hash_map panics with attempt to add with overflow when a signed integer join key spans the full range of its type (e.g. both i64::MIN and i64::MAX are present).
Introduced in #19411 (c98fa5616).
To Reproduce
Self-join on an Int64 column containing i64::MIN and i64::MAX:
#[tokio::test]
async fn test_perfect_hash_join_overflow_full_int64_range() -> Result<()> {
let task_ctx = prepare_task_ctx(8192, true);
let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int64, true)]));
let batch = RecordBatch::try_new(
Arc::clone(&schema),
vec![Arc::new(Int64Array::from(vec![i64::MIN, i64::MAX]))],
)?;
let left = TestMemoryExec::try_new_exec(&[vec![batch.clone()]], Arc::clone(&schema), None)?;
let right = TestMemoryExec::try_new_exec(&[vec![batch]], schema, None)?;
let on: JoinOn = vec![(
Arc::new(Column::new_with_schema("a", &left.schema())?) as _,
Arc::new(Column::new_with_schema("a", &right.schema())?) as _,
)];
let (_columns, _batches, _metrics) = join_collect(
left, right, on, &JoinType::Inner,
NullEquality::NullEqualsNothing, task_ctx,
).await?;
Ok(())
}Root cause
In exec.rs, try_build_perfect_hash_map:
let range = ArrayMap::calculate_range(min_val, max_val); // line 156
let num_row: usize = batches.iter().map(|x| x.num_rows()).sum();
let dense_ratio = (num_row as f64) / ((range + 1) as f64); // line 158 — OVERFLOW
// ...
if range == usize::MAX as u64 { // line 174 — guard is too late
return Ok(None);
}When key_to_u64(i64::MIN) = 9223372036854775808 and key_to_u64(i64::MAX) = 9223372036854775807, calculate_range returns u64::MAX via wrapping_sub. The range + 1 on line 158 then overflows (panic in debug, wrap-to-zero in release → division by zero).
The existing guard on line 174 correctly checks for this case but is placed after the overflow.
Proposed fix
Move the range == usize::MAX as u64 guard before line 158, or replace line 158 with:
let Some(range_plus_one) = range.checked_add(1) else {
return Ok(None);
};
let dense_ratio = (num_row as f64) / (range_plus_one as f64);Claude helped me write the issue after I found it while testing Comet. Claude also provided the proposed fix, so please double check the latter before opening a PR with that fix. I can look at it in more detail tomorrow if no one else can get to it.