unroll nextMany scalar fill loop by 4 #514

Merged
lemire merged 2 commits into RoaringBitmap:master from ahxxm:unroll-scalar-fill
Feb 27, 2026

@ahxxm
Contributor

ahxxm commented Feb 27, 2026

Description

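This seems to be a hot path. Unrolling the nextMany scalar fill loop by 4 to amortize branching makes BenchmarkNextsRLE/nextmany about 33% faster on my CPU (AMD Ryzen 7 6800H, under WSL2), while BenchmarkNextsRLE/next is unchanged. On run-heavy real datasets the improvement is 17-32%, with no clear regressions on the others.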
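For readers who want the general shape of the change, here is a minimal sketch of a fill loop unrolled by 4. It is not the code from this PR: the function names `fillRunScalar` and `fillRunUnrolled4` are hypothetical, and the sketch only assumes that the hot loop writes consecutive values `base, base+1, ...` into a `[]uint32` buffer.

```go
package main

import "fmt"

// fillRunScalar writes base, base+1, ... into buf, one element per iteration.
// Hypothetical stand-in for the original scalar fill loop.
func fillRunScalar(buf []uint32, base uint32) {
	for i := range buf {
		buf[i] = base + uint32(i)
	}
}

// fillRunUnrolled4 produces the same output, but stores four elements per
// iteration, so the loop condition is evaluated once per four stores.
func fillRunUnrolled4(buf []uint32, base uint32) {
	i := 0
	for ; i+4 <= len(buf); i += 4 {
		// Re-slicing to exactly four elements gives the compiler a chance
		// to drop the per-store bounds checks (not guaranteed).
		chunk := buf[i : i+4 : i+4]
		v := base + uint32(i)
		chunk[0] = v
		chunk[1] = v + 1
		chunk[2] = v + 2
		chunk[3] = v + 3
	}
	// Tail: at most three leftover elements.
	for ; i < len(buf); i++ {
		buf[i] = base + uint32(i)
	}
}

func main() {
	a := make([]uint32, 10)
	b := make([]uint32, 10)
	fillRunScalar(a, 100)
	fillRunUnrolled4(b, 100)
	fmt.Println(a)
	fmt.Println(b)
}
```

Both functions should fill the buffer identically. Whether the unrolled form is actually faster depends on the compiler and CPU, so it is worth confirming with a benchmark like the ones below.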

Type of Change

  • Performance improvement

Benchmarks

BenchmarkNextsRLE baseline
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring/v2
cpu: AMD Ryzen 7 6800H with Radeon Graphics
BenchmarkNextsRLE/next-16         	     178	   7062349 ns/op	     112 B/op	       1 allocs/op
BenchmarkNextsRLE/next-16         	     172	   7306398 ns/op	     112 B/op	       1 allocs/op
BenchmarkNextsRLE/next-16         	     169	   7223803 ns/op	     112 B/op	       1 allocs/op
BenchmarkNextsRLE/next-16         	     169	   7055468 ns/op	     112 B/op	       1 allocs/op
BenchmarkNextsRLE/next-16         	     162	   6897329 ns/op	     112 B/op	       1 allocs/op
BenchmarkNextsRLE/nextmany-16     	    1131	   1157612 ns/op	    8320 B/op	       2 allocs/op
BenchmarkNextsRLE/nextmany-16     	    1078	   1164884 ns/op	    8320 B/op	       2 allocs/op
BenchmarkNextsRLE/nextmany-16     	    1088	   1180406 ns/op	    8320 B/op	       2 allocs/op
BenchmarkNextsRLE/nextmany-16     	    1028	   1145430 ns/op	    8320 B/op	       2 allocs/op
BenchmarkNextsRLE/nextmany-16     	    1112	   1146625 ns/op	    8320 B/op	       2 allocs/op
PASS
ok  	github.com/RoaringBitmap/roaring/v2	17.469s
BenchmarkNextsRLE after
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring/v2
cpu: AMD Ryzen 7 6800H with Radeon Graphics
BenchmarkNextsRLE/next-16         	     181	   7009443 ns/op	     112 B/op	       1 allocs/op
BenchmarkNextsRLE/next-16         	     165	   6994907 ns/op	     112 B/op	       1 allocs/op
BenchmarkNextsRLE/next-16         	     170	   7212088 ns/op	     112 B/op	       1 allocs/op
BenchmarkNextsRLE/next-16         	     178	   7269497 ns/op	     112 B/op	       1 allocs/op
BenchmarkNextsRLE/next-16         	     170	   6876003 ns/op	     112 B/op	       1 allocs/op
BenchmarkNextsRLE/nextmany-16     	    1497	    808012 ns/op	    8320 B/op	       2 allocs/op
BenchmarkNextsRLE/nextmany-16     	    1596	    774690 ns/op	    8320 B/op	       2 allocs/op
BenchmarkNextsRLE/nextmany-16     	    1573	    780475 ns/op	    8320 B/op	       2 allocs/op
BenchmarkNextsRLE/nextmany-16     	    1338	    769944 ns/op	    8320 B/op	       2 allocs/op
BenchmarkNextsRLE/nextmany-16     	    1418	    766276 ns/op	    8320 B/op	       2 allocs/op
PASS
ok  	github.com/RoaringBitmap/roaring/v2	16.812s

Real benchmarks and details

Dataset                 Baseline (ns/op)  After (ns/op)
census-income_srt              8,077,007      5,687,244
census-income                 14,479,076     15,422,757
census1881_srt                 1,236,144      1,019,923
census1881                     1,556,954      1,544,594
dimension_003                  6,917,545      5,758,556
dimension_008                  4,133,790      2,999,982
dimension_033                  4,798,393      3,269,909
uscensus2000                      61,391         62,133
weather_sept_85_srt           21,223,028     15,980,821
weather_sept_85               30,582,768     29,055,214
wikileaks-noquotes_srt           485,889        419,402
wikileaks-noquotes               924,636        951,535
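
To make the percentages easier to check, here is a small sketch that computes the relative change in ns/op for each row of the table. The `result` type and `reduction` helper are hypothetical names introduced only for this illustration. Each row is a single run, so small differences may just be noise.

```go
package main

import "fmt"

// result holds one row of the table: ns/op before and after the change.
type result struct {
	name            string
	baseline, after float64
}

// reduction returns the relative drop in ns/op as a percentage.
// Positive means faster after the change, negative means slower.
func reduction(baseline, after float64) float64 {
	return (baseline - after) / baseline * 100
}

func main() {
	rows := []result{
		{"census-income_srt", 8077007, 5687244},
		{"census-income", 14479076, 15422757},
		{"census1881_srt", 1236144, 1019923},
		{"census1881", 1556954, 1544594},
		{"dimension_003", 6917545, 5758556},
		{"dimension_008", 4133790, 2999982},
		{"dimension_033", 4798393, 3269909},
		{"uscensus2000", 61391, 62133},
		{"weather_sept_85_srt", 21223028, 15980821},
		{"weather_sept_85", 30582768, 29055214},
		{"wikileaks-noquotes_srt", 485889, 419402},
		{"wikileaks-noquotes", 924636, 951535},
	}
	for _, r := range rows {
		fmt.Printf("%-24s %+6.1f%%\n", r.name, reduction(r.baseline, r.after))
	}
}
```

For more than a single run per benchmark, a statistical comparison of repeated runs would be more reliable than these raw ratios.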
Baseline
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring/v2
cpu: AMD Ryzen 7 6800H with Radeon Graphics
BenchmarkRealDataNextMany/census-income_srt-16     	     148	   8077007 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/census-income-16         	      84	  14479076 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/census1881_srt-16        	     944	   1236144 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/census1881-16            	     734	   1556954 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/dimension_003-16         	     182	   6917545 ns/op	 1998119 B/op	   15483 allocs/op
BenchmarkRealDataNextMany/dimension_008-16         	     297	   4133790 ns/op	  687114 B/op	    5241 allocs/op
BenchmarkRealDataNextMany/dimension_033-16         	     266	   4798393 ns/op	   38528 B/op	     174 allocs/op
BenchmarkRealDataNextMany/uscensus2000-16          	   18696	     61391 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/weather_sept_85_srt-16   	      61	  21223028 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/weather_sept_85-16       	      40	  30582768 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/wikileaks-noquotes_srt-16         	    2812	    485889 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/wikileaks-noquotes-16             	    1308	    924636 ns/op	   41984 B/op	     201 allocs/op
PASS
ok  	github.com/RoaringBitmap/roaring/v2	176.780s
After
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring/v2
cpu: AMD Ryzen 7 6800H with Radeon Graphics
BenchmarkRealDataNextMany/census-income_srt-16     	     195	   5687244 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/census-income-16         	      88	  15422757 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/census1881_srt-16        	    1209	   1019923 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/census1881-16            	     762	   1544594 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/dimension_003-16         	     207	   5758556 ns/op	 1998120 B/op	   15483 allocs/op
BenchmarkRealDataNextMany/dimension_008-16         	     376	   2999982 ns/op	  687114 B/op	    5241 allocs/op
BenchmarkRealDataNextMany/dimension_033-16         	     339	   3269909 ns/op	   38528 B/op	     174 allocs/op
BenchmarkRealDataNextMany/uscensus2000-16          	   18458	     62133 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/weather_sept_85_srt-16   	      79	  15980821 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/weather_sept_85-16       	      42	  29055214 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/wikileaks-noquotes_srt-16         	    3120	    419402 ns/op	   41984 B/op	     201 allocs/op
BenchmarkRealDataNextMany/wikileaks-noquotes-16             	    1236	    951535 ns/op	   41984 B/op	     201 allocs/op
PASS
ok  	github.com/RoaringBitmap/roaring/v2	174.272s

@lemire
Member

lemire commented Feb 27, 2026

Running tests.

lemire merged commit 6e38489 into RoaringBitmap:master on Feb 27, 2026
8 checks passed