Skip to content

feat: Add SVE kernels for TopKV#1256

Open
morgolock wants to merge 1 commit intomainfrom
pr/topkv_sve_kernels
Open

feat: Add SVE kernels for TopKV#1256
morgolock wants to merge 1 commit intomainfrom
pr/topkv_sve_kernels

Conversation

@morgolock
Copy link
Contributor

Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799

ARM_COMPUTE_ERROR_ON(c >= C);

uint32_t t = 0;
std::memcpy(&t, tgt_it.ptr(), sizeof(uint32_t)); // target sample idx for this class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memcpy seems to be a complicated way to read data from this pointer. why not reinterpret cast & de-reference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in next patch

filelist.json Outdated
"src/cpu/kernels/topkv/generic/neon/qasymm8_signed.cpp"
]
}
"files": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the indentation change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in next patch

while (idx < N)
{
const uint32_t remaining = N - idx;
const uint32_t bw = block_width<Scalar>(remaining);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function gets the SVE vector length for each element in the tensor and compares it with remaining. I think we should get it once before the execute_window_loop and reuse. The svcnt in the wrapper namespace can help with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in next patch

template <>
inline uint32_t count_gt_block<int32_t>(const int32_t *ptr, int32_t thr, uint32_t remaining)
{
const uint32_t bw = block_width<int32_t>(remaining);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to do this here again. The parameter remaining is already the min(vector_len, remaining-in-the-caller). We're taking another min on top of this result, i.e. if I'm not mistaken, we're doing smth like

min(min(x,y)) = min(x,y)

Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 1deb127 to 7944860 Compare February 6, 2026 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants