[linux-nvidia-6.17] Use architecture specific HBM training status register #331
nvmochs left a comment
This looks good to me.
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
@ankita-nv Are there plans to upstream this patch?
Yeah, I'll post it shortly after internal review.
Ankit requested that we hold on getting this integrated.
58fc644 to 4b04466
…diness check

Blackwell-Next GPUs report device readiness via the CXL DVSEC Range 1 Low register (offset 0x1C) instead of the BAR0 HBM training register used by GB200. GPU memory readiness is checked by polling for the Memory_Active bit (bit 1) within the period given by the Memory_Active_Timeout field (bits 15:13).

Add runtime detection by checking for the presence of the DVSEC register. Route to the new method if it is present; otherwise continue using the legacy approach.

Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
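The readiness check described in the commit message can be sketched roughly as follows. This is a minimal userspace illustration, not the driver code from this PR: the bit positions come from the commit message, while `reg_read_fn`, `poll_memory_active`, `struct fake_dev`, and `fake_read` are hypothetical names introduced here. The poll budget is passed in as a read count; a real driver would presumably sleep between reads and derive the budget from the Memory_Active_Timeout field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Layout taken from the commit message: the register sits at DVSEC offset
 * 0x1C, Memory_Active is bit 1, Memory_Active_Timeout is bits 15:13. */
#define DVSEC_RANGE1_LOW_OFFSET      0x1Cu
#define MEMORY_ACTIVE_BIT            (1u << 1)
#define MEMORY_ACTIVE_TIMEOUT_SHIFT  13
#define MEMORY_ACTIVE_TIMEOUT_MASK   (0x7u << MEMORY_ACTIVE_TIMEOUT_SHIFT)

/* Hypothetical accessor type: returns the current 32-bit register value. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* True when the Memory_Active bit is set in the register value. */
static bool memory_active(uint32_t reg)
{
    return (reg & MEMORY_ACTIVE_BIT) != 0;
}

/* Extract the raw 3-bit Memory_Active_Timeout field (not decoded to time). */
static unsigned int memory_active_timeout_field(uint32_t reg)
{
    return (reg & MEMORY_ACTIVE_TIMEOUT_MASK) >> MEMORY_ACTIVE_TIMEOUT_SHIFT;
}

/* Read the register up to max_polls times.
 * Returns 0 once Memory_Active is seen, -1 if the budget runs out. */
static int poll_memory_active(reg_read_fn read_reg, void *ctx,
                              unsigned int max_polls)
{
    assert(read_reg != NULL);
    for (unsigned int i = 0; i < max_polls; i++) {
        if (memory_active(read_reg(ctx)))
            return 0;
    }
    return -1;
}

/* Simulated device (illustration only): reports Memory_Active after
 * reads_until_ready not-ready reads. */
struct fake_dev {
    unsigned int reads_until_ready;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_dev *d = ctx;
    /* Arbitrary value in the timeout field, just to populate bits 15:13. */
    uint32_t reg = 0x2u << MEMORY_ACTIVE_TIMEOUT_SHIFT;

    if (d->reads_until_ready == 0)
        return reg | MEMORY_ACTIVE_BIT;
    d->reads_until_ready--;
    return reg;
}
```

Passing the accessor as a function pointer keeps the polling logic independent of how the register is actually read, so the same loop can be exercised against the simulated device above.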
4b04466 to f400624
Updated the code now. Verified by Nirmoy and Manish. Please continue with the process.
LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.17/+bug/2144760
clsotog left a comment
Acked-by: Carol L Soto <csoto@nvidia.com>
PR sent to Canonical.
@nirmoy whom do we need to ping to get this into Canonical?
We'll ping Canonical today for an update.
Blackwell-Next GPUs use a different BAR0 offset (0xAD00BC) for the HBM
training status register than GB200 (0x200BC). Add runtime detection by
reading the architecture field from PMC BOOT_42 and selecting the
appropriate offset when polling for device readiness.
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
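The offset selection described above might look roughly like the sketch below. Only the two BAR0 offsets come from the PR description. The position of the architecture field within PMC BOOT_42 (`BOOT_42_ARCH_SHIFT`, `BOOT_42_ARCH_MASK`) is an assumption made for illustration, and the architecture ID that identifies Blackwell-Next is not stated in this thread, so it is passed in as a parameter rather than hard-coded. The helper names are hypothetical.

```c
#include <stdint.h>

/* BAR0 offsets of the HBM training status register, from the PR description. */
#define HBM_TRAINING_OFFSET_GB200           0x200BCu
#define HBM_TRAINING_OFFSET_BLACKWELL_NEXT  0xAD00BCu

/* ASSUMPTION (not from this PR): the architecture field occupies
 * bits 28:24 of the PMC BOOT_42 value. */
#define BOOT_42_ARCH_SHIFT  24
#define BOOT_42_ARCH_MASK   (0x1Fu << BOOT_42_ARCH_SHIFT)

/* Extract the architecture field from a raw BOOT_42 register value. */
static uint32_t boot_42_arch(uint32_t boot_42)
{
    return (boot_42 & BOOT_42_ARCH_MASK) >> BOOT_42_ARCH_SHIFT;
}

/* Pick the HBM training status offset for the detected architecture.
 * next_arch_id is the caller-supplied architecture ID for Blackwell-Next;
 * any other architecture falls back to the GB200 offset. */
static uint32_t hbm_training_offset(uint32_t boot_42, uint32_t next_arch_id)
{
    if (boot_42_arch(boot_42) == next_arch_id)
        return HBM_TRAINING_OFFSET_BLACKWELL_NEXT;
    return HBM_TRAINING_OFFSET_GB200;
}
```

Defaulting to the GB200 offset keeps existing hardware on the established path whenever the architecture field does not match.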