Skip to content

Redesign intrinsic-test to use simple comparison#2063

Draft
sayantn wants to merge 4 commits intorust-lang:mainfrom
sayantn:intrinsic-test
Draft

Redesign intrinsic-test to use simple comparison#2063
sayantn wants to merge 4 commits intorust-lang:mainfrom
sayantn:intrinsic-test

Conversation

@sayantn
Copy link
Contributor

@sayantn sayantn commented Mar 16, 2026

Currently intrinsic-test prints the outputs and then compares the outputs manually. This PR uses a different approach -- generate C wrappers for the intrinsics, link to them from Rust, and then just use simple rust tests to compare outputs

@sayantn sayantn force-pushed the intrinsic-test branch 3 times, most recently from fc52b8d to feb1dcd Compare March 16, 2026 00:27
@sayantn
Copy link
Contributor Author

sayantn commented Mar 16, 2026

---- test_vdupq_n_f16 stdout ----

thread 'test_vdupq_n_f16' (2187) panicked at mod_0/src/lib.rs:13773:17:
assertion `left == right` failed: 
  left: [NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0)]
 right: [NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(0.0), NiceF16(1.43e-5), NiceF16(0.0), NiceF16(-50430.0), NiceF16(1.79e-5)]

This seems weird (left is the Rust output, right is the C one, and NiceF16 is a wrapper which implements PartialEq as a == b || (a.is_nan() && b.is_nan())). This looks like ABI-related issue. For reference, the declaration looks like

unsafe extern "C" {
    fn vdup_n_f16_wrapper(value: f16) -> float16x4_t;
}

In fact most f16 tests fail in armv7. @folkertdev can you help?

Edit:

To work around this issue I have modified the tool to communicate with C via pointers (e.g. the C wrapper for _mm_add_ps looks like void _mm_add_ps_wrapper(__m128 *dst, const __m128* a, const __m128* b). This fixed the AArch64 and ARMv7 problems, but now the AArch64BE tests are failing, because apparently C and Rust have different pointer load semantics for matrix-like vectors (e.g. uint64x2x2_t) https://godbolt.org/z/j1d16z1P9

@Amanieu is this intended behavior or a bug?

@sayantn sayantn force-pushed the intrinsic-test branch 4 times, most recently from e2346ff to db1b2ca Compare March 16, 2026 06:16
@sayantn
Copy link
Contributor Author

sayantn commented Mar 16, 2026

Btw the time gains are significant, it reduces the Arm and aarch64 times to 2-3 minutes, and the full x86 run (we did 20% previously) to around 12 mins for release and 17 mins for dev

@folkertdev
Copy link
Contributor

Great work. Quick sanity check on f16, perhaps we're still using LLVM 21 to compile the C? If LLVM 21 is used (and also on windows apparently still with LLVM 22) then on some targets the ABI is inconsistent.

@sayantn
Copy link
Contributor Author

sayantn commented Mar 16, 2026

@folkertdev ooh, that makes sense. I don't particularly care about windows, but we are using LLVM20 in the CI. I can change it to use the build from kernel.org

@folkertdev
Copy link
Contributor

I'm seeing clang-18 here even https://triage.rust-lang.org/gha-logs/rust-lang/stdarch/67182959392?pr=2063. I'm not sure what the best solution is really. You could ask T-infra if they have ideas.

@sayantn
Copy link
Contributor Author

sayantn commented Mar 16, 2026

yeah, but I can use the LLVM github builds or the kernel.org builds

@tgross35
Copy link
Contributor

Can f16 tests just be gated with #[cfg(target_has_reliable_f16)]? That's likely easier than working around the Windows and old LLVM failures.

@sayantn
Copy link
Contributor Author

sayantn commented Mar 16, 2026

@tgross35 the f16 tests are mostly fine now. More concerning is that a lot of tests are failing in all 3 arm archs, e.g. vzipq. The C version seems to return all zeros

edit: sorry, my mistake, they are still failing in ARMv7. I will gate them against the flag

@folkertdev
Copy link
Contributor

With LLVM 22 f16 should work on armv7 though?

@tgross35
Copy link
Contributor

FTZ/DAZ-related perhaps?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants