Unroll interleave -25-30% #9542

Open
Dandandan wants to merge 3 commits into apache:main from Dandandan:take_native

Conversation

@Dandandan
Contributor

@Dandandan Dandandan commented Mar 12, 2026

Which issue does this PR close?

  • Closes #NNN.

Rationale for this change


🤖: Benchmark completed

Details

group                                                                                        main                                   interleave
-----                                                                                        ----                                   -----------
interleave dict(20, 0.0) 100 [0..100, 100..230, 450..1000]                                   1.08    805.6±8.28ns        ? ?/sec    1.00   748.5±14.05ns        ? ?/sec
interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                         1.18      2.6±0.00µs        ? ?/sec    1.00      2.2±0.01µs        ? ?/sec
interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000]                                  1.21      2.6±0.01µs        ? ?/sec    1.00      2.2±0.02µs        ? ?/sec
interleave dict(20, 0.0) 400 [0..100, 100..230, 450..1000]                                   1.16   1431.6±3.11ns        ? ?/sec    1.00  1232.9±14.26ns        ? ?/sec
interleave dict_distinct 100                                                                 1.03      2.9±0.12µs        ? ?/sec    1.00      2.9±0.07µs        ? ?/sec
interleave dict_distinct 1024                                                                1.02      2.9±0.06µs        ? ?/sec    1.00      2.8±0.03µs        ? ?/sec
interleave dict_distinct 2048                                                                1.03      2.9±0.02µs        ? ?/sec    1.00      2.8±0.08µs        ? ?/sec
interleave dict_sparse(20, 0.0) 100 [0..100, 100..230, 450..1000]                            1.00      2.7±0.26µs        ? ?/sec    1.02      2.8±0.21µs        ? ?/sec
interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                  1.11      5.3±0.31µs        ? ?/sec    1.00      4.8±0.40µs        ? ?/sec
interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000]                           1.16      4.8±0.25µs        ? ?/sec    1.00      4.1±0.23µs        ? ?/sec
interleave dict_sparse(20, 0.0) 400 [0..100, 100..230, 450..1000]                            1.05      3.5±0.31µs        ? ?/sec    1.00      3.3±0.29µs        ? ?/sec
interleave i32(0.0) 100 [0..100, 100..230, 450..1000]                                        1.21    313.8±1.03ns        ? ?/sec    1.00    258.9±4.98ns        ? ?/sec
interleave i32(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                              1.34  1856.5±17.40ns        ? ?/sec    1.00  1385.9±32.73ns        ? ?/sec
interleave i32(0.0) 1024 [0..100, 100..230, 450..1000]                                       1.34   1848.6±8.80ns        ? ?/sec    1.00  1382.4±48.64ns        ? ?/sec
interleave i32(0.0) 400 [0..100, 100..230, 450..1000]                                        1.37    843.3±7.37ns        ? ?/sec    1.00   615.5±22.71ns        ? ?/sec
interleave i32(0.5) 100 [0..100, 100..230, 450..1000]                                        1.09    604.2±5.60ns        ? ?/sec    1.00    555.1±4.48ns        ? ?/sec
interleave i32(0.5) 1024 [0..100, 100..230, 450..1000, 0..1000]                              1.12      4.3±0.01µs        ? ?/sec    1.00      3.8±0.04µs        ? ?/sec
interleave i32(0.5) 1024 [0..100, 100..230, 450..1000]                                       1.13      4.4±0.06µs        ? ?/sec    1.00      3.9±0.17µs        ? ?/sec
interleave i32(0.5) 400 [0..100, 100..230, 450..1000]                                        1.12  1889.4±19.68ns        ? ?/sec    1.00  1691.5±17.15ns        ? ?/sec
interleave list<i64>(0.0,0.0,20) 100 [0..100, 100..230, 450..1000]                           1.07      2.7±0.03µs        ? ?/sec    1.00      2.5±0.03µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000, 0..1000]                 1.06     26.2±0.11µs        ? ?/sec    1.00     24.6±0.31µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000]                          1.06     25.9±0.14µs        ? ?/sec    1.00     24.5±0.29µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 400 [0..100, 100..230, 450..1000]                           1.07     10.5±0.21µs        ? ?/sec    1.00      9.9±0.06µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 100 [0..100, 100..230, 450..1000]                           1.05      5.8±0.25µs        ? ?/sec    1.00      5.5±0.06µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000, 0..1000]                 1.05     47.4±2.23µs        ? ?/sec    1.00     45.2±0.14µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000]                          1.06     48.0±2.35µs        ? ?/sec    1.00     45.5±0.64µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 400 [0..100, 100..230, 450..1000]                           1.05     19.2±0.90µs        ? ?/sec    1.00     18.2±0.03µs        ? ?/sec
interleave str(20, 0.0) 100 [0..100, 100..230, 450..1000]                                    1.01    786.8±1.50ns        ? ?/sec    1.00    779.4±4.35ns        ? ?/sec
interleave str(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                          1.04      6.3±0.12µs        ? ?/sec    1.00      6.0±0.02µs        ? ?/sec
interleave str(20, 0.0) 1024 [0..100, 100..230, 450..1000]                                   1.04      6.2±0.08µs        ? ?/sec    1.00      6.0±0.01µs        ? ?/sec
interleave str(20, 0.0) 400 [0..100, 100..230, 450..1000]                                    1.09      2.7±0.01µs        ? ?/sec    1.00      2.4±0.01µs        ? ?/sec
interleave str(20, 0.5) 100 [0..100, 100..230, 450..1000]                                    1.04  1064.4±19.37ns        ? ?/sec    1.00   1023.8±3.56ns        ? ?/sec
interleave str(20, 0.5) 1024 [0..100, 100..230, 450..1000, 0..1000]                          1.03     10.3±0.06µs        ? ?/sec    1.00     10.1±0.13µs        ? ?/sec
interleave str(20, 0.5) 1024 [0..100, 100..230, 450..1000]                                   1.02     10.3±0.05µs        ? ?/sec    1.00     10.1±0.54µs        ? ?/sec
interleave str(20, 0.5) 400 [0..100, 100..230, 450..1000]                                    1.04      3.7±0.03µs        ? ?/sec    1.00      3.6±0.17µs        ? ?/sec
interleave str_view(0.0) 100 [0..100, 100..230, 450..1000]                                   1.01    856.9±2.90ns        ? ?/sec    1.00    849.1±7.00ns        ? ?/sec
interleave str_view(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                         1.00      5.0±0.15µs        ? ?/sec    1.02      5.1±0.02µs        ? ?/sec
interleave str_view(0.0) 1024 [0..100, 100..230, 450..1000]                                  1.00      4.9±0.05µs        ? ?/sec    1.04      5.1±0.02µs        ? ?/sec
interleave str_view(0.0) 400 [0..100, 100..230, 450..1000]                                   1.00      2.2±0.05µs        ? ?/sec    1.03      2.2±0.01µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 100 [0..100, 100..230, 450..1000]                       1.20    874.3±4.12ns        ? ?/sec    1.00   729.1±12.04ns        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]             1.34      4.0±0.01µs        ? ?/sec    1.00      3.0±0.02µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 1024 [0..100, 100..230, 450..1000]                      1.31      4.0±0.04µs        ? ?/sec    1.00      3.0±0.01µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 400 [0..100, 100..230, 450..1000]                       1.24  1905.1±19.48ns        ? ?/sec    1.00  1532.8±33.13ns        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 100 [0..100, 100..230, 450..1000]                   1.00   1340.9±6.76ns        ? ?/sec    1.01  1347.8±12.50ns        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]         1.08      8.3±0.16µs        ? ?/sec    1.00      7.7±0.02µs        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 1024 [0..100, 100..230, 450..1000]                  1.08      8.3±0.06µs        ? ?/sec    1.00      7.7±0.06µs        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 400 [0..100, 100..230, 450..1000]                   1.09      3.7±0.13µs        ? ?/sec    1.00      3.4±0.02µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 100 [0..100, 100..230, 450..1000]              1.05   1927.3±9.31ns        ? ?/sec    1.00  1842.2±18.19ns        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 1024 [0..100, 100..230, 450..1000, 0..1000]    1.04     12.6±0.06µs        ? ?/sec    1.00     12.1±0.08µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 1024 [0..100, 100..230, 450..1000]             1.04     12.6±0.03µs        ? ?/sec    1.00     12.1±0.14µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 400 [0..100, 100..230, 450..1000]              1.04      5.4±0.07µs        ? ?/sec    1.00      5.2±0.04µs        ? ?/sec
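
As a reading aid for the table above: assuming the usual convention for this comparison output, where the leading factor on each side is that side's time divided by the faster side's time, a factor converts to a runtime reduction as 1 - 1/factor. A minimal Rust sketch of that arithmetic follows; the function names are hypothetical and not part of arrow-rs or the benchmark tooling.

```rust
/// Percentage change in runtime going from `baseline_ns` to `candidate_ns`.
/// A negative result means the candidate is faster than the baseline.
fn percent_change(baseline_ns: f64, candidate_ns: f64) -> f64 {
    (candidate_ns / baseline_ns - 1.0) * 100.0
}

/// Runtime reduction (in percent) implied by a relative factor,
/// assuming the factor is slower_time / faster_time.
fn reduction_from_factor(factor: f64) -> f64 {
    (1.0 - 1.0 / factor) * 100.0
}
```

For the `interleave i32(0.0) 400` row, `percent_change(843.3, 615.5)` is roughly -27%, which matches `reduction_from_factor(1.37)` and falls inside the -25-30% range in the PR title.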

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@Dandandan
Contributor Author

Dandandan commented Mar 12, 2026

run benchmark take_kernels

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 12, 2026
@Dandandan Dandandan changed the title Unroll take_native Unroll take_native -25% Mar 12, 2026
@apache apache deleted a comment from alamb-ghbot Mar 12, 2026
@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing take_native (50c5da7) to 9d0e8be diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=take_native
Results will be posted here when complete

@Dandandan Dandandan changed the title Unroll take_native -25% Unroll take_native, interleave -25% Mar 12, 2026
@Dandandan
Contributor Author

run benchmark interleave_kernels

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                     main                                   take_native
-----                                                                     ----                                   -----------
take bool 1024                                                            1.00   1329.1±2.25ns        ? ?/sec    1.00  1332.5±11.11ns        ? ?/sec
take bool 512                                                             1.00   730.0±11.00ns        ? ?/sec    1.00    730.4±9.07ns        ? ?/sec
take bool null indices 1024                                               1.01   1649.7±9.24ns        ? ?/sec    1.00  1627.9±38.00ns        ? ?/sec
take bool null values 1024                                                1.00      2.6±0.04µs        ? ?/sec    1.00      2.6±0.02µs        ? ?/sec
take bool null values null indices 1024                                   1.00      3.2±0.02µs        ? ?/sec    1.16      3.6±0.06µs        ? ?/sec
take check bounds i32 1024                                                1.00    844.2±9.82ns        ? ?/sec    1.03   868.7±14.00ns        ? ?/sec
take check bounds i32 512                                                 1.01    535.4±3.41ns        ? ?/sec    1.00    527.9±2.64ns        ? ?/sec
take i32 1024                                                             1.08   715.4±10.47ns        ? ?/sec    1.00    665.0±2.75ns        ? ?/sec
take i32 512                                                              1.00    381.9±1.94ns        ? ?/sec    1.02    388.8±2.61ns        ? ?/sec
take i32 null indices 1024                                                1.00    995.2±2.99ns        ? ?/sec    1.00    995.6±3.85ns        ? ?/sec
take i32 null values 1024                                                 1.03      2.0±0.01µs        ? ?/sec    1.00   1951.0±3.34ns        ? ?/sec
take i32 null values null indices 1024                                    1.00      2.6±0.02µs        ? ?/sec    1.02      2.6±0.03µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.00      3.6±0.01µs        ? ?/sec    1.00      3.6±0.02µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.01      5.0±0.06µs        ? ?/sec    1.00      5.0±0.10µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     20.2±0.13µs        ? ?/sec    1.20     24.2±0.07µs        ? ?/sec
take str 1024                                                             1.01     11.0±0.10µs        ? ?/sec    1.00     10.9±0.10µs        ? ?/sec
take str 512                                                              1.01      5.4±0.03µs        ? ?/sec    1.00      5.3±0.04µs        ? ?/sec
take str null indices 1024                                                1.02      6.9±0.04µs        ? ?/sec    1.00      6.8±0.03µs        ? ?/sec
take str null indices 512                                                 1.01      3.4±0.32µs        ? ?/sec    1.00      3.3±0.03µs        ? ?/sec
take str null values 1024                                                 1.01      8.7±0.16µs        ? ?/sec    1.00      8.6±0.10µs        ? ?/sec
take str null values null indices 1024                                    1.00      6.4±0.05µs        ? ?/sec    1.02      6.5±0.09µs        ? ?/sec
take stringview 1024                                                      1.00    883.5±2.10ns        ? ?/sec    1.56   1374.3±8.02ns        ? ?/sec
take stringview 512                                                       1.00    587.8±1.65ns        ? ?/sec    1.23    721.3±5.73ns        ? ?/sec
take stringview null indices 1024                                         1.00  1437.7±19.71ns        ? ?/sec    1.00  1437.1±23.03ns        ? ?/sec
take stringview null indices 512                                          1.00    737.0±3.13ns        ? ?/sec    1.06    778.8±3.59ns        ? ?/sec
take stringview null values 1024                                          1.00      2.1±0.02µs        ? ?/sec    1.22      2.6±0.00µs        ? ?/sec
take stringview null values null indices 1024                             1.00      2.9±0.02µs        ? ?/sec    1.01      2.9±0.04µs        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing take_native (01a1161) to 9d0e8be diff
BENCH_NAME=interleave_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench interleave_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=take_native
Results will be posted here when complete

@Dandandan
Contributor Author

Hmm, this actually looks worse on this machine (probably x86 vs ARM).
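
For readers unfamiliar with the technique being benchmarked, here is a minimal sketch of what manually unrolling a gather-style loop can look like for an interleave over primitive values. This is an illustration only, not the diff in this PR: the function name, the plain-slice signature, and the unroll width of 4 are all assumptions. Whether this shape beats the straightforward loop depends on the compiler and the target CPU, which is consistent with results differing between machines.

```rust
/// Gather `sources[array][row]` for each `(array, row)` pair in `indices`.
/// Hypothetical sketch: processes four index pairs per loop iteration.
fn interleave_unrolled(sources: &[&[i32]], indices: &[(usize, usize)]) -> Vec<i32> {
    let mut out = Vec::with_capacity(indices.len());
    let mut chunks = indices.chunks_exact(4);
    for c in &mut chunks {
        // Four independent loads with no dependency between them,
        // so the CPU is free to overlap them.
        let v0 = sources[c[0].0][c[0].1];
        let v1 = sources[c[1].0][c[1].1];
        let v2 = sources[c[2].0][c[2].1];
        let v3 = sources[c[3].0][c[3].1];
        out.extend_from_slice(&[v0, v1, v2, v3]);
    }
    // Handle the 0..=3 index pairs left over after the unrolled body.
    for &(array, row) in chunks.remainder() {
        out.push(sources[array][row]);
    }
    out
}
```

The sketch keeps bounds-checked indexing, so an out-of-range pair panics instead of reading invalid memory. A take-style gather has the same shape with a single source slice.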

@Dandandan Dandandan closed this Mar 12, 2026
@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                        main                                   take_native
-----                                                                                        ----                                   -----------
interleave dict(20, 0.0) 100 [0..100, 100..230, 450..1000]                                   1.08    805.6±8.28ns        ? ?/sec    1.00   748.5±14.05ns        ? ?/sec
interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                         1.18      2.6±0.00µs        ? ?/sec    1.00      2.2±0.01µs        ? ?/sec
interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000]                                  1.21      2.6±0.01µs        ? ?/sec    1.00      2.2±0.02µs        ? ?/sec
interleave dict(20, 0.0) 400 [0..100, 100..230, 450..1000]                                   1.16   1431.6±3.11ns        ? ?/sec    1.00  1232.9±14.26ns        ? ?/sec
interleave dict_distinct 100                                                                 1.03      2.9±0.12µs        ? ?/sec    1.00      2.9±0.07µs        ? ?/sec
interleave dict_distinct 1024                                                                1.02      2.9±0.06µs        ? ?/sec    1.00      2.8±0.03µs        ? ?/sec
interleave dict_distinct 2048                                                                1.03      2.9±0.02µs        ? ?/sec    1.00      2.8±0.08µs        ? ?/sec
interleave dict_sparse(20, 0.0) 100 [0..100, 100..230, 450..1000]                            1.00      2.7±0.26µs        ? ?/sec    1.02      2.8±0.21µs        ? ?/sec
interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                  1.11      5.3±0.31µs        ? ?/sec    1.00      4.8±0.40µs        ? ?/sec
interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000]                           1.16      4.8±0.25µs        ? ?/sec    1.00      4.1±0.23µs        ? ?/sec
interleave dict_sparse(20, 0.0) 400 [0..100, 100..230, 450..1000]                            1.05      3.5±0.31µs        ? ?/sec    1.00      3.3±0.29µs        ? ?/sec
interleave i32(0.0) 100 [0..100, 100..230, 450..1000]                                        1.21    313.8±1.03ns        ? ?/sec    1.00    258.9±4.98ns        ? ?/sec
interleave i32(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                              1.34  1856.5±17.40ns        ? ?/sec    1.00  1385.9±32.73ns        ? ?/sec
interleave i32(0.0) 1024 [0..100, 100..230, 450..1000]                                       1.34   1848.6±8.80ns        ? ?/sec    1.00  1382.4±48.64ns        ? ?/sec
interleave i32(0.0) 400 [0..100, 100..230, 450..1000]                                        1.37    843.3±7.37ns        ? ?/sec    1.00   615.5±22.71ns        ? ?/sec
interleave i32(0.5) 100 [0..100, 100..230, 450..1000]                                        1.09    604.2±5.60ns        ? ?/sec    1.00    555.1±4.48ns        ? ?/sec
interleave i32(0.5) 1024 [0..100, 100..230, 450..1000, 0..1000]                              1.12      4.3±0.01µs        ? ?/sec    1.00      3.8±0.04µs        ? ?/sec
interleave i32(0.5) 1024 [0..100, 100..230, 450..1000]                                       1.13      4.4±0.06µs        ? ?/sec    1.00      3.9±0.17µs        ? ?/sec
interleave i32(0.5) 400 [0..100, 100..230, 450..1000]                                        1.12  1889.4±19.68ns        ? ?/sec    1.00  1691.5±17.15ns        ? ?/sec
interleave list<i64>(0.0,0.0,20) 100 [0..100, 100..230, 450..1000]                           1.07      2.7±0.03µs        ? ?/sec    1.00      2.5±0.03µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000, 0..1000]                 1.06     26.2±0.11µs        ? ?/sec    1.00     24.6±0.31µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000]                          1.06     25.9±0.14µs        ? ?/sec    1.00     24.5±0.29µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 400 [0..100, 100..230, 450..1000]                           1.07     10.5±0.21µs        ? ?/sec    1.00      9.9±0.06µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 100 [0..100, 100..230, 450..1000]                           1.05      5.8±0.25µs        ? ?/sec    1.00      5.5±0.06µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000, 0..1000]                 1.05     47.4±2.23µs        ? ?/sec    1.00     45.2±0.14µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000]                          1.06     48.0±2.35µs        ? ?/sec    1.00     45.5±0.64µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 400 [0..100, 100..230, 450..1000]                           1.05     19.2±0.90µs        ? ?/sec    1.00     18.2±0.03µs        ? ?/sec
interleave str(20, 0.0) 100 [0..100, 100..230, 450..1000]                                    1.01    786.8±1.50ns        ? ?/sec    1.00    779.4±4.35ns        ? ?/sec
interleave str(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                          1.04      6.3±0.12µs        ? ?/sec    1.00      6.0±0.02µs        ? ?/sec
interleave str(20, 0.0) 1024 [0..100, 100..230, 450..1000]                                   1.04      6.2±0.08µs        ? ?/sec    1.00      6.0±0.01µs        ? ?/sec
interleave str(20, 0.0) 400 [0..100, 100..230, 450..1000]                                    1.09      2.7±0.01µs        ? ?/sec    1.00      2.4±0.01µs        ? ?/sec
interleave str(20, 0.5) 100 [0..100, 100..230, 450..1000]                                    1.04  1064.4±19.37ns        ? ?/sec    1.00   1023.8±3.56ns        ? ?/sec
interleave str(20, 0.5) 1024 [0..100, 100..230, 450..1000, 0..1000]                          1.03     10.3±0.06µs        ? ?/sec    1.00     10.1±0.13µs        ? ?/sec
interleave str(20, 0.5) 1024 [0..100, 100..230, 450..1000]                                   1.02     10.3±0.05µs        ? ?/sec    1.00     10.1±0.54µs        ? ?/sec
interleave str(20, 0.5) 400 [0..100, 100..230, 450..1000]                                    1.04      3.7±0.03µs        ? ?/sec    1.00      3.6±0.17µs        ? ?/sec
interleave str_view(0.0) 100 [0..100, 100..230, 450..1000]                                   1.01    856.9±2.90ns        ? ?/sec    1.00    849.1±7.00ns        ? ?/sec
interleave str_view(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                         1.00      5.0±0.15µs        ? ?/sec    1.02      5.1±0.02µs        ? ?/sec
interleave str_view(0.0) 1024 [0..100, 100..230, 450..1000]                                  1.00      4.9±0.05µs        ? ?/sec    1.04      5.1±0.02µs        ? ?/sec
interleave str_view(0.0) 400 [0..100, 100..230, 450..1000]                                   1.00      2.2±0.05µs        ? ?/sec    1.03      2.2±0.01µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 100 [0..100, 100..230, 450..1000]                       1.20    874.3±4.12ns        ? ?/sec    1.00   729.1±12.04ns        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]             1.34      4.0±0.01µs        ? ?/sec    1.00      3.0±0.02µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 1024 [0..100, 100..230, 450..1000]                      1.31      4.0±0.04µs        ? ?/sec    1.00      3.0±0.01µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 400 [0..100, 100..230, 450..1000]                       1.24  1905.1±19.48ns        ? ?/sec    1.00  1532.8±33.13ns        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 100 [0..100, 100..230, 450..1000]                   1.00   1340.9±6.76ns        ? ?/sec    1.01  1347.8±12.50ns        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]         1.08      8.3±0.16µs        ? ?/sec    1.00      7.7±0.02µs        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 1024 [0..100, 100..230, 450..1000]                  1.08      8.3±0.06µs        ? ?/sec    1.00      7.7±0.06µs        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 400 [0..100, 100..230, 450..1000]                   1.09      3.7±0.13µs        ? ?/sec    1.00      3.4±0.02µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 100 [0..100, 100..230, 450..1000]              1.05   1927.3±9.31ns        ? ?/sec    1.00  1842.2±18.19ns        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 1024 [0..100, 100..230, 450..1000, 0..1000]    1.04     12.6±0.06µs        ? ?/sec    1.00     12.1±0.08µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 1024 [0..100, 100..230, 450..1000]             1.04     12.6±0.03µs        ? ?/sec    1.00     12.1±0.14µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 400 [0..100, 100..230, 450..1000]              1.04      5.4±0.07µs        ? ?/sec    1.00      5.2±0.04µs        ? ?/sec

@Dandandan
Contributor Author

run benchmark take_kernels

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing take_native (01a1161) to 9d0e8be diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=take_native
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                     main                                   take_native
-----                                                                     ----                                   -----------
take bool 1024                                                            1.00  1334.0±36.78ns        ? ?/sec    1.00   1328.9±7.22ns        ? ?/sec
take bool 512                                                             1.00    729.2±5.61ns        ? ?/sec    1.00    727.1±5.66ns        ? ?/sec
take bool null indices 1024                                               1.02  1658.4±92.98ns        ? ?/sec    1.00  1632.9±33.60ns        ? ?/sec
take bool null values 1024                                                1.00      2.6±0.01µs        ? ?/sec    1.00      2.6±0.02µs        ? ?/sec
take bool null values null indices 1024                                   1.00      3.2±0.08µs        ? ?/sec    1.17      3.7±0.07µs        ? ?/sec
take check bounds i32 1024                                                1.00    859.3±6.12ns        ? ?/sec    1.00    860.0±9.54ns        ? ?/sec
take check bounds i32 512                                                 1.02    537.0±4.52ns        ? ?/sec    1.00    528.6±5.18ns        ? ?/sec
take i32 1024                                                             1.08    720.4±6.62ns        ? ?/sec    1.00    664.6±1.81ns        ? ?/sec
take i32 512                                                              1.00    386.3±2.20ns        ? ?/sec    1.01    389.2±1.92ns        ? ?/sec
take i32 null indices 1024                                                1.01   1002.5±8.12ns        ? ?/sec    1.00    995.8±2.22ns        ? ?/sec
take i32 null values 1024                                                 1.03      2.0±0.02µs        ? ?/sec    1.00  1956.5±33.84ns        ? ?/sec
take i32 null values null indices 1024                                    1.00      2.6±0.03µs        ? ?/sec    1.01      2.6±0.06µs        ? ?/sec
take primitive fsb value len: 12, indices: 1024                           1.00      3.6±0.01µs        ? ?/sec    1.00      3.6±0.01µs        ? ?/sec
take primitive fsb value len: 12, null values, indices: 1024              1.00      5.0±0.01µs        ? ?/sec    1.00      5.0±0.01µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     20.1±0.16µs        ? ?/sec    1.19     23.9±0.11µs        ? ?/sec
take str 1024                                                             1.02     11.2±0.30µs        ? ?/sec    1.00     10.9±0.07µs        ? ?/sec
take str 512                                                              1.02      5.4±0.13µs        ? ?/sec    1.00      5.3±0.03µs        ? ?/sec
take str null indices 1024                                                1.02      7.0±0.12µs        ? ?/sec    1.00      6.8±0.11µs        ? ?/sec
take str null indices 512                                                 1.03      3.4±0.09µs        ? ?/sec    1.00      3.3±0.02µs        ? ?/sec
take str null values 1024                                                 1.01      8.8±0.11µs        ? ?/sec    1.00      8.7±0.23µs        ? ?/sec
take str null values null indices 1024                                    1.00      6.4±0.04µs        ? ?/sec    1.01      6.5±0.05µs        ? ?/sec
take stringview 1024                                                      1.00    886.9±4.19ns        ? ?/sec    1.56  1379.8±10.17ns        ? ?/sec
take stringview 512                                                       1.00    589.1±1.01ns        ? ?/sec    1.22    719.1±3.59ns        ? ?/sec
take stringview null indices 1024                                         1.01  1430.9±23.75ns        ? ?/sec    1.00  1413.1±29.05ns        ? ?/sec
take stringview null indices 512                                          1.00   732.0±10.19ns        ? ?/sec    1.07    786.9±7.05ns        ? ?/sec
take stringview null values 1024                                          1.00      2.1±0.01µs        ? ?/sec    1.22      2.6±0.05µs        ? ?/sec
take stringview null values null indices 1024                             1.00      2.9±0.02µs        ? ?/sec    1.00      2.9±0.03µs        ? ?/sec

@Dandandan Dandandan changed the title Unroll take_native, interleave -25% Unroll interleave -25% Mar 12, 2026
@Dandandan Dandandan reopened this Mar 12, 2026
@Dandandan
Contributor Author

run benchmark interleave_kernels

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing take_native (01a1161) to 9d0e8be diff
BENCH_NAME=interleave_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench interleave_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=take_native
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                        main                                   take_native
-----                                                                                        ----                                   -----------
interleave dict(20, 0.0) 100 [0..100, 100..230, 450..1000]                                   1.04    805.2±7.13ns        ? ?/sec    1.00    777.5±7.56ns        ? ?/sec
interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                         1.17      2.6±0.02µs        ? ?/sec    1.00      2.2±0.01µs        ? ?/sec
interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000]                                  1.19      2.6±0.01µs        ? ?/sec    1.00      2.2±0.01µs        ? ?/sec
interleave dict(20, 0.0) 400 [0..100, 100..230, 450..1000]                                   1.12   1418.5±8.89ns        ? ?/sec    1.00   1264.8±6.99ns        ? ?/sec
interleave dict_distinct 100                                                                 1.00      2.8±0.03µs        ? ?/sec    1.03      2.9±0.05µs        ? ?/sec
interleave dict_distinct 1024                                                                1.00      2.8±0.02µs        ? ?/sec    1.02      2.9±0.04µs        ? ?/sec
interleave dict_distinct 2048                                                                1.00      2.8±0.05µs        ? ?/sec    1.02      2.9±0.02µs        ? ?/sec
interleave dict_sparse(20, 0.0) 100 [0..100, 100..230, 450..1000]                            1.00      2.7±0.21µs        ? ?/sec    1.07      2.8±0.29µs        ? ?/sec
interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                  1.17      5.5±0.30µs        ? ?/sec    1.00      4.7±0.36µs        ? ?/sec
interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000]                           1.14      4.8±0.28µs        ? ?/sec    1.00      4.2±0.25µs        ? ?/sec
interleave dict_sparse(20, 0.0) 400 [0..100, 100..230, 450..1000]                            1.07      3.4±0.27µs        ? ?/sec    1.00      3.2±0.20µs        ? ?/sec
interleave i32(0.0) 100 [0..100, 100..230, 450..1000]                                        1.24    316.6±3.70ns        ? ?/sec    1.00    256.1±0.49ns        ? ?/sec
interleave i32(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                              1.35   1849.9±4.01ns        ? ?/sec    1.00   1374.3±4.54ns        ? ?/sec
interleave i32(0.0) 1024 [0..100, 100..230, 450..1000]                                       1.36  1843.1±18.43ns        ? ?/sec    1.00   1354.8±1.82ns        ? ?/sec
interleave i32(0.0) 400 [0..100, 100..230, 450..1000]                                        1.40    842.3±7.51ns        ? ?/sec    1.00    603.6±1.46ns        ? ?/sec
interleave i32(0.5) 100 [0..100, 100..230, 450..1000]                                        1.10    607.4±5.88ns        ? ?/sec    1.00   554.6±35.14ns        ? ?/sec
interleave i32(0.5) 1024 [0..100, 100..230, 450..1000, 0..1000]                              1.13      4.3±0.06µs        ? ?/sec    1.00      3.8±0.02µs        ? ?/sec
interleave i32(0.5) 1024 [0..100, 100..230, 450..1000]                                       1.14      4.4±0.03µs        ? ?/sec    1.00      3.8±0.03µs        ? ?/sec
interleave i32(0.5) 400 [0..100, 100..230, 450..1000]                                        1.12  1893.6±12.56ns        ? ?/sec    1.00  1687.5±23.57ns        ? ?/sec
interleave list<i64>(0.0,0.0,20) 100 [0..100, 100..230, 450..1000]                           1.07      2.7±0.03µs        ? ?/sec    1.00      2.5±0.02µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000, 0..1000]                 1.06     26.1±0.12µs        ? ?/sec    1.00     24.7±0.28µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000]                          1.05     25.9±0.10µs        ? ?/sec    1.00     24.6±0.46µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 400 [0..100, 100..230, 450..1000]                           1.07     10.6±0.14µs        ? ?/sec    1.00      9.9±0.04µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 100 [0..100, 100..230, 450..1000]                           1.03      5.7±0.02µs        ? ?/sec    1.00      5.5±0.01µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000, 0..1000]                 1.03     46.4±0.36µs        ? ?/sec    1.00     45.0±0.12µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000]                          1.04     47.0±1.18µs        ? ?/sec    1.00     45.4±0.09µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 400 [0..100, 100..230, 450..1000]                           1.03     18.7±0.10µs        ? ?/sec    1.00     18.2±0.10µs        ? ?/sec
interleave str(20, 0.0) 100 [0..100, 100..230, 450..1000]                                    1.02    791.5±9.05ns        ? ?/sec    1.00    778.5±1.54ns        ? ?/sec
interleave str(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                          1.04      6.3±0.03µs        ? ?/sec    1.00      6.1±0.15µs        ? ?/sec
interleave str(20, 0.0) 1024 [0..100, 100..230, 450..1000]                                   1.06      6.3±0.04µs        ? ?/sec    1.00      6.0±0.02µs        ? ?/sec
interleave str(20, 0.0) 400 [0..100, 100..230, 450..1000]                                    1.04      2.5±0.01µs        ? ?/sec    1.00      2.4±0.00µs        ? ?/sec
interleave str(20, 0.5) 100 [0..100, 100..230, 450..1000]                                    1.03  1060.4±10.47ns        ? ?/sec    1.00  1024.7±11.65ns        ? ?/sec
interleave str(20, 0.5) 1024 [0..100, 100..230, 450..1000, 0..1000]                          1.03     10.4±0.20µs        ? ?/sec    1.00     10.2±0.08µs        ? ?/sec
interleave str(20, 0.5) 1024 [0..100, 100..230, 450..1000]                                   1.06     10.5±0.45µs        ? ?/sec    1.00     10.0±0.10µs        ? ?/sec
interleave str(20, 0.5) 400 [0..100, 100..230, 450..1000]                                    1.06      3.7±0.31µs        ? ?/sec    1.00      3.5±0.04µs        ? ?/sec
interleave str_view(0.0) 100 [0..100, 100..230, 450..1000]                                   1.05    898.7±7.81ns        ? ?/sec    1.00   854.0±34.76ns        ? ?/sec
interleave str_view(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                         1.00      5.1±0.01µs        ? ?/sec    1.01      5.1±0.12µs        ? ?/sec
interleave str_view(0.0) 1024 [0..100, 100..230, 450..1000]                                  1.00      5.0±0.02µs        ? ?/sec    1.00      5.0±0.04µs        ? ?/sec
interleave str_view(0.0) 400 [0..100, 100..230, 450..1000]                                   1.00      2.2±0.01µs        ? ?/sec    1.00      2.2±0.01µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 100 [0..100, 100..230, 450..1000]                       1.29   940.0±10.08ns        ? ?/sec    1.00    730.4±9.49ns        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]             1.34      4.1±0.06µs        ? ?/sec    1.00      3.0±0.01µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 1024 [0..100, 100..230, 450..1000]                      1.28      3.9±0.01µs        ? ?/sec    1.00      3.0±0.01µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 400 [0..100, 100..230, 450..1000]                       1.29  1964.3±19.31ns        ? ?/sec    1.00  1528.5±18.37ns        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 100 [0..100, 100..230, 450..1000]                   1.02   1342.7±6.89ns        ? ?/sec    1.00  1318.3±12.41ns        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]         1.09      8.3±0.05µs        ? ?/sec    1.00      7.7±0.02µs        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 1024 [0..100, 100..230, 450..1000]                  1.08      8.3±0.10µs        ? ?/sec    1.00      7.7±0.02µs        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 400 [0..100, 100..230, 450..1000]                   1.03      3.6±0.05µs        ? ?/sec    1.00      3.5±0.01µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 100 [0..100, 100..230, 450..1000]              1.00  1839.7±51.76ns        ? ?/sec    1.00  1841.1±23.00ns        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 1024 [0..100, 100..230, 450..1000, 0..1000]    1.03     12.7±0.16µs        ? ?/sec    1.00     12.3±0.08µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 1024 [0..100, 100..230, 450..1000]             1.03     12.6±0.13µs        ? ?/sec    1.00     12.3±0.24µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 400 [0..100, 100..230, 450..1000]              1.01      5.5±0.16µs        ? ?/sec    1.00      5.4±0.04µs        ? ?/sec

@Dandandan Dandandan changed the title from "Unroll interleave -25%" to "Unroll interleave -25-30%" on Mar 12, 2026
@Dandandan
Contributor Author

@comphead @andygrove FYI as you seem to be interested in sort performance (which is one of the consumers)

@Dandandan
Contributor Author

@mbutrovich as well :)

@mbutrovich
Contributor

I'm curious how SIMD gather intrinsics (AVX2, SVE) perform vs. manual unrolling on modern architectures. The former likely ends up as similar μops anyway. I don't love specializing the code like this (in either direction, SIMD intrinsics or unrolling, especially with ARM increasingly prevalent in the cloud), but if it's faster today I'm okay with it. It's also a pretty well-understood technique at this point.

let chunks = indices.chunks_exact(8);
let remainder = chunks.remainder();
for chunk in chunks {
    let v0 = arrays[chunk[0].0].value(chunk[0].1);
Contributor

Thinking aloud: is 8 a fixed constant? Does it depend on the number of CPUs/threads, etc.?

Contributor Author

It's just a loop-unrolling constant: the higher it is, the more code is generated inside the loop, so it can hide more latency by running/pipelining the instructions that don't depend on the loads.
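
To make the unrolling idea above concrete, here is a minimal sketch, not the code from this PR. It assumes plain `&[i32]` slices as stand-ins for Arrow arrays and a hypothetical function name `interleave_unrolled`. Each group of 8 `(array, row)` pairs issues eight loads that do not depend on one another, which is what gives the CPU room to overlap their latency. Unlike the PR, it keeps the bounds checks and writes through a safe `Vec`.

```rust
/// Hypothetical sketch: gather values from several slices using an
/// 8-way manually unrolled loop. `indices` holds (array, row) pairs.
fn interleave_unrolled(arrays: &[&[i32]], indices: &[(usize, usize)]) -> Vec<i32> {
    let mut out = Vec::with_capacity(indices.len());

    // Split the indices into full groups of 8 plus a short tail.
    let chunks = indices.chunks_exact(8);
    let remainder = chunks.remainder();

    for chunk in chunks {
        // Eight loads with no data dependency between them.
        let vals = [
            arrays[chunk[0].0][chunk[0].1],
            arrays[chunk[1].0][chunk[1].1],
            arrays[chunk[2].0][chunk[2].1],
            arrays[chunk[3].0][chunk[3].1],
            arrays[chunk[4].0][chunk[4].1],
            arrays[chunk[5].0][chunk[5].1],
            arrays[chunk[6].0][chunk[6].1],
            arrays[chunk[7].0][chunk[7].1],
        ];
        out.extend_from_slice(&vals);
    }

    // Handle the remaining (fewer than 8) indices one at a time.
    for &(array, row) in remainder {
        out.push(arrays[array][row]);
    }
    out
}
```

Whether 8 is the best factor depends on the target CPU and on what the compiler already does with the simple loop, so it is worth benchmarking rather than assuming.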

Contributor

Yeah, that's what I was thinking: might a user want to modify this value depending on their hardware/OS setup?

Or if someone would like to benchmark take or interleave with different chunk sizes, would they have to rewrite the method accordingly? Maybe a macro could help? It can be done as a follow-up if needed.
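
One possible alternative to a macro, sketched here purely as an assumption and not something this PR implements, is a const generic parameter for the chunk size. The name `gather_chunked` and the plain-slice signature are hypothetical. `std::array::from_fn` builds a fixed-size array of `N` loads, which the compiler is free to unroll.

```rust
/// Hypothetical sketch: a gather whose unroll factor `N` is chosen at
/// compile time. `N` must be non-zero, because `chunks_exact(0)` panics.
fn gather_chunked<const N: usize>(values: &[i32], indices: &[usize]) -> Vec<i32> {
    let mut out = Vec::with_capacity(indices.len());

    let chunks = indices.chunks_exact(N);
    let remainder = chunks.remainder();

    for chunk in chunks {
        // N loads whose count is known at compile time.
        let vals: [i32; N] = std::array::from_fn(|i| values[chunk[i]]);
        out.extend_from_slice(&vals);
    }

    // Tail: fewer than N indices left.
    out.extend(remainder.iter().map(|&i| values[i]));
    out
}
```

A benchmark could then compare `gather_chunked::<4>`, `gather_chunked::<8>` and `gather_chunked::<16>` without duplicating the function body.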

.collect::<Vec<_>>();
for idx in remainder {
    // SAFETY: base < len == output capacity
    unsafe { dst.add(base).write(arrays[idx.0].value(idx.1)) };
Contributor

For unsafe calls, would it make sense to `debug_assert` the safety requirements, to catch unexpected things in CI debug builds?

Contributor Author

@Dandandan Dandandan Mar 12, 2026

Could do that
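
A minimal sketch of what that suggestion might look like, assuming a plain `&[i32]` source and a hypothetical function name `gather_with_debug_assert` (this is not the PR's code): the invariant stated in the `SAFETY` comment is also checked with `debug_assert!`, which compiles to nothing in release builds.

```rust
/// Hypothetical sketch: write gathered values through a raw pointer,
/// checking the safety invariant in debug builds only.
fn gather_with_debug_assert(values: &[i32], indices: &[usize]) -> Vec<i32> {
    let len = indices.len();
    let mut out: Vec<i32> = Vec::with_capacity(len);
    let dst = out.as_mut_ptr();

    for (base, &idx) in indices.iter().enumerate() {
        // Debug-only check of the invariant the unsafe write relies on.
        debug_assert!(base < len, "write position {} exceeds capacity {}", base, len);
        // SAFETY: base < len == output capacity; `values[idx]` is bounds-checked.
        unsafe { dst.add(base).write(values[idx]) };
    }

    // SAFETY: the loop above initialized exactly `len` elements.
    unsafe { out.set_len(len) };
    out
}
```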

@Dandandan
Contributor Author

Dandandan commented Mar 12, 2026

I'm curious how SIMD gather intrinsics (AVX2, SVE) perform vs. manual unrolling on modern architectures. The former likely ends up as similar μops anyway. I don't love specializing the code like this (in either direction, SIMD intrinsics or unrolling, especially with ARM increasingly prevalent in the cloud), but if it's faster today I'm okay with it. It's also a pretty well-understood technique at this point.

Agreed. Auto-vectorization is preferred; in an ideal scenario we wouldn't need to specialize the code and could instead write our APIs so that LLVM generates good code automatically.

Ideally, I think, the interleave and take APIs would also have a (safe) way of skipping bounds checks. In that case the compiler could do a much better job of vectorizing the code and eliminating the branches.
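
As an illustration only, one shape such a safe API could take is a wrapper type that validates indices once at construction, so the hot loop can use unchecked access behind a safe signature. The names `CheckedIndices` and `take_checked` are hypothetical and are not part of arrow-rs. The validation pass has its own cost, so this only pays off when the same indices are reused.

```rust
/// Hypothetical sketch: indices proven to be `< bound` at construction.
pub struct CheckedIndices<'a> {
    indices: &'a [usize],
    bound: usize,
}

impl<'a> CheckedIndices<'a> {
    /// Returns `None` if any index is out of range for `bound`.
    pub fn new(indices: &'a [usize], bound: usize) -> Option<Self> {
        indices
            .iter()
            .all(|&i| i < bound)
            .then_some(Self { indices, bound })
    }
}

/// Gather without per-element bounds checks, behind a safe signature.
pub fn take_checked(values: &[i32], idx: &CheckedIndices<'_>) -> Vec<i32> {
    // One length check up front instead of one per element.
    assert!(values.len() >= idx.bound);
    idx.indices
        .iter()
        // SAFETY: every index is < idx.bound <= values.len().
        .map(|&i| unsafe { *values.get_unchecked(i) })
        .collect()
}
```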

I am thinking perhaps we can at least remove the bounds checks for interleave of the outer batch dimension by doing it upfront - I'll check that now.

@Dandandan
Contributor Author

run benchmark interleave_kernels

This reverts commit fc1adb9.
@Dandandan
Contributor Author

I am thinking perhaps we can at least remove the bounds checks for interleave of the outer batch dimension by doing it upfront - I'll check that now.

Oh wait, that is a not-so-useful idea, as we get them for all of the rows (so it will be more expensive to check upfront than to check "on the go").

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing take_native (8ff8bbf) to 9d0e8be diff
BENCH_NAME=interleave_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench interleave_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=take_native
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                        main                                   take_native
-----                                                                                        ----                                   -----------
interleave dict(20, 0.0) 100 [0..100, 100..230, 450..1000]                                   1.00    757.1±3.84ns        ? ?/sec    1.03    781.8±2.85ns        ? ?/sec
interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                         1.00      2.2±0.00µs        ? ?/sec    1.18      2.6±0.01µs        ? ?/sec
interleave dict(20, 0.0) 1024 [0..100, 100..230, 450..1000]                                  1.00      2.2±0.00µs        ? ?/sec    1.17      2.6±0.00µs        ? ?/sec
interleave dict(20, 0.0) 400 [0..100, 100..230, 450..1000]                                   1.00   1242.5±3.19ns        ? ?/sec    1.12   1396.2±5.93ns        ? ?/sec
interleave dict_distinct 100                                                                 1.04      3.0±0.02µs        ? ?/sec    1.00      2.8±0.02µs        ? ?/sec
interleave dict_distinct 1024                                                                1.05      3.0±0.02µs        ? ?/sec    1.00      2.8±0.02µs        ? ?/sec
interleave dict_distinct 2048                                                                1.05      3.0±0.03µs        ? ?/sec    1.00      2.8±0.04µs        ? ?/sec
interleave dict_sparse(20, 0.0) 100 [0..100, 100..230, 450..1000]                            1.00      2.5±0.20µs        ? ?/sec    1.08      2.7±0.19µs        ? ?/sec
interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                  1.01      5.0±0.36µs        ? ?/sec    1.00      5.0±0.25µs        ? ?/sec
interleave dict_sparse(20, 0.0) 1024 [0..100, 100..230, 450..1000]                           1.00      4.2±0.22µs        ? ?/sec    1.06      4.5±0.15µs        ? ?/sec
interleave dict_sparse(20, 0.0) 400 [0..100, 100..230, 450..1000]                            1.00      3.2±0.21µs        ? ?/sec    1.03      3.3±0.17µs        ? ?/sec
interleave i32(0.0) 100 [0..100, 100..230, 450..1000]                                        1.23    315.3±2.70ns        ? ?/sec    1.00    256.8±3.27ns        ? ?/sec
interleave i32(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                              1.35  1857.3±30.30ns        ? ?/sec    1.00  1377.1±21.39ns        ? ?/sec
interleave i32(0.0) 1024 [0..100, 100..230, 450..1000]                                       1.35  1845.9±16.19ns        ? ?/sec    1.00  1362.8±49.93ns        ? ?/sec
interleave i32(0.0) 400 [0..100, 100..230, 450..1000]                                        1.39    843.9±6.31ns        ? ?/sec    1.00    605.4±3.22ns        ? ?/sec
interleave i32(0.5) 100 [0..100, 100..230, 450..1000]                                        1.15    636.1±7.00ns        ? ?/sec    1.00    555.5±2.24ns        ? ?/sec
interleave i32(0.5) 1024 [0..100, 100..230, 450..1000, 0..1000]                              1.13      4.3±0.06µs        ? ?/sec    1.00      3.8±0.01µs        ? ?/sec
interleave i32(0.5) 1024 [0..100, 100..230, 450..1000]                                       1.15      4.4±0.07µs        ? ?/sec    1.00      3.8±0.01µs        ? ?/sec
interleave i32(0.5) 400 [0..100, 100..230, 450..1000]                                        1.12  1901.5±13.38ns        ? ?/sec    1.00   1693.7±6.90ns        ? ?/sec
interleave list<i64>(0.0,0.0,20) 100 [0..100, 100..230, 450..1000]                           1.15      2.9±0.01µs        ? ?/sec    1.00      2.5±0.01µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000, 0..1000]                 1.18     29.0±0.14µs        ? ?/sec    1.00     24.6±0.11µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 1024 [0..100, 100..230, 450..1000]                          1.18     28.9±0.29µs        ? ?/sec    1.00     24.5±0.24µs        ? ?/sec
interleave list<i64>(0.0,0.0,20) 400 [0..100, 100..230, 450..1000]                           1.17     11.6±0.08µs        ? ?/sec    1.00     10.0±0.22µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 100 [0..100, 100..230, 450..1000]                           1.08      6.0±0.10µs        ? ?/sec    1.00      5.6±0.28µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000, 0..1000]                 1.07     48.8±0.19µs        ? ?/sec    1.00     45.4±0.53µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 1024 [0..100, 100..230, 450..1000]                          1.08     49.5±0.84µs        ? ?/sec    1.00     45.7±0.35µs        ? ?/sec
interleave list<i64>(0.1,0.1,20) 400 [0..100, 100..230, 450..1000]                           1.08     19.8±0.35µs        ? ?/sec    1.00     18.3±0.24µs        ? ?/sec
interleave str(20, 0.0) 100 [0..100, 100..230, 450..1000]                                    1.01   769.5±12.22ns        ? ?/sec    1.00    761.8±1.21ns        ? ?/sec
interleave str(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                          1.00      5.9±0.03µs        ? ?/sec    1.02      6.0±0.03µs        ? ?/sec
interleave str(20, 0.0) 1024 [0..100, 100..230, 450..1000]                                   1.00      5.9±0.20µs        ? ?/sec    1.01      6.0±0.05µs        ? ?/sec
interleave str(20, 0.0) 400 [0..100, 100..230, 450..1000]                                    1.00      2.4±0.01µs        ? ?/sec    1.00      2.4±0.00µs        ? ?/sec
interleave str(20, 0.5) 100 [0..100, 100..230, 450..1000]                                    1.01   1063.5±9.67ns        ? ?/sec    1.00   1053.0±3.78ns        ? ?/sec
interleave str(20, 0.5) 1024 [0..100, 100..230, 450..1000, 0..1000]                          1.00      9.9±0.04µs        ? ?/sec    1.01      9.9±0.06µs        ? ?/sec
interleave str(20, 0.5) 1024 [0..100, 100..230, 450..1000]                                   1.00      9.8±0.05µs        ? ?/sec    1.02      9.9±0.11µs        ? ?/sec
interleave str(20, 0.5) 400 [0..100, 100..230, 450..1000]                                    1.00      3.5±0.02µs        ? ?/sec    1.00      3.5±0.02µs        ? ?/sec
interleave str_view(0.0) 100 [0..100, 100..230, 450..1000]                                   1.00   830.7±19.98ns        ? ?/sec    1.01    842.1±5.24ns        ? ?/sec
interleave str_view(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]                         1.00      4.8±0.06µs        ? ?/sec    1.06      5.1±0.03µs        ? ?/sec
interleave str_view(0.0) 1024 [0..100, 100..230, 450..1000]                                  1.00      4.7±0.01µs        ? ?/sec    1.07      5.0±0.05µs        ? ?/sec
interleave str_view(0.0) 400 [0..100, 100..230, 450..1000]                                   1.00      2.1±0.07µs        ? ?/sec    1.07      2.2±0.02µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 100 [0..100, 100..230, 450..1000]                       1.22    872.0±9.63ns        ? ?/sec    1.00    716.9±2.32ns        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]             1.33      4.1±0.16µs        ? ?/sec    1.00      3.0±0.02µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 1024 [0..100, 100..230, 450..1000]                      1.32      4.0±0.05µs        ? ?/sec    1.00      3.0±0.01µs        ? ?/sec
interleave struct(i32(0.0), i32(0.0) 400 [0..100, 100..230, 450..1000]                       1.25  1900.5±20.74ns        ? ?/sec    1.00  1525.3±11.31ns        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 100 [0..100, 100..230, 450..1000]                   1.06   1365.6±3.87ns        ? ?/sec    1.00   1292.1±9.39ns        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 1024 [0..100, 100..230, 450..1000, 0..1000]         1.07      8.1±0.09µs        ? ?/sec    1.00      7.6±0.06µs        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 1024 [0..100, 100..230, 450..1000]                  1.06      8.0±0.02µs        ? ?/sec    1.00      7.6±0.04µs        ? ?/sec
interleave struct(i32(0.0), str(20, 0.0) 400 [0..100, 100..230, 450..1000]                   1.02      3.6±0.01µs        ? ?/sec    1.00      3.5±0.01µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 100 [0..100, 100..230, 450..1000]              1.02  1886.9±14.41ns        ? ?/sec    1.00  1854.1±25.88ns        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 1024 [0..100, 100..230, 450..1000, 0..1000]    1.03     12.6±0.20µs        ? ?/sec    1.00     12.2±0.09µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 1024 [0..100, 100..230, 450..1000]             1.01     12.3±0.08µs        ? ?/sec    1.00     12.2±0.10µs        ? ?/sec
interleave struct(str(20, 0.0), str(20, 0.0)) 400 [0..100, 100..230, 450..1000]              1.00      5.2±0.11µs        ? ?/sec    1.02      5.3±0.09µs        ? ?/sec

Labels

arrow Changes to the arrow crate

4 participants