[linux-nvidia-6.17-next] Add CXL Type-2 device support, RAS error handling, reset, state save/restore, and interleaving support#342
Open
JiandiAnNVIDIA wants to merge 642 commits into NVIDIA:24.04_linux-nvidia-6.17-next from
Conversation
BugLink: https://bugs.launchpad.net/bugs/2139315 This reverts commit 93c54a5 so that it can be replaced by the upstream equivalents. Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315 Add cpu part and model macro definitions for NVIDIA Olympus core. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit e185c8a) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315 Add the part number and MIDR for NVIDIA Olympus. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> (cherry picked from commit d5e4c71) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315 Add NVIDIA Olympus MIDR to neoverse_spe range list. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> (backported from commit d852b83) [mochs: Minor context cleanup due to absence of "perf arm_spe: Add CPU variants supporting common data source packet"] Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315 The documentation in nvidia-pmu.rst describes PMUs specific to the NVIDIA Tegra241 SoC. Rename the file for this specific SoC to better distinguish it from other NVIDIA SoCs. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315 Adds Unified Coherent Fabric PMU support in Tegra410 SOC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315 Add interface to get ACPI device associated with the PMU. This ACPI device may contain additional properties not covered by the standard properties. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315 Adds PCIE PMU support in Tegra410 SOC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315 Adds PCIE-TGT PMU support in Tegra410 SOC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315 Adds CPU Memory (CMEM) Latency PMU support in Tegra410 SOC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315 Adds NVIDIA C2C PMU support in Tegra410 SOC. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139315 Enable driver for NVIDIA TEGRA410 CMEM Latency and C2C PMU device. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260126181155.2776097-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
… events BugLink: https://bugs.launchpad.net/bugs/2139315 Add JSON files for NVIDIA Tegra410 Olympus core PMU events. Also updated the common-and-microarch.json. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> (backported from https://lore.kernel.org/all/20260127225909.3296202-1-bwicaksono@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
…EGRA410_CMEM_LATENCY_PMU BugLink: https://bugs.launchpad.net/bugs/2139315 Set the following kconfigs to enable these PMUs on T410: CONFIG_NVIDIA_TEGRA410_C2C_PMU=m CONFIG_NVIDIA_TEGRA410_CMEM_LATENCY_PMU=m Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2138131 Properly pass the variadic arguments so it can be called with or without them depending on the format. Signed-off-by: Lucas De Marchi <ldemarchi@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
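For reference, a minimal sketch (not the patch itself) of the standard C pattern this fix implies: the public logging entry point captures its variadic arguments once and forwards them as a va_list to a v-variant, so callers may pass arguments or not depending on the format. The names log_msg/vlog_msg are illustrative.

#include <stdarg.h>
#include <stdio.h>

/* v-variant does the real work on an already-captured va_list */
static void vlog_msg(const char *fmt, va_list args)
{
        vfprintf(stderr, fmt, args);
}

/* callable with or without trailing arguments, depending on fmt */
static void log_msg(const char *fmt, ...)
{
        va_list args;

        va_start(args, fmt);
        vlog_msg(fmt, args);    /* forward the va_list, never "..." */
        va_end(args);
}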
BugLink: https://bugs.launchpad.net/bugs/2140343 A vDEVICE has been a hard requirement for attaching a nested domain to the device. This makes sense when installing a guest STE, since a vSID must be present and given to the kernel during the vDEVICE allocation. But, when CR0.SMMUEN is disabled, VM doesn't really need a vSID to program the vSMMU behavior as GBPA will take effect, in which case the vSTE in the nested domain could have carried the bypass or abort configuration in GBPA register. Thus, having such a hard requirement doesn't work well for GBPA. Skip vmaster allocation in arm_smmu_attach_prepare_vmaster() for an abort or bypass vSTE. Note that device on this attachment won't report vevents. Update the uAPI doc accordingly. Link: https://patch.msgid.link/r/20251103172755.2026145-1-nicolinc@nvidia.com Tested-by: Shameer Kolothum <skolothumtho@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> (backported from commit 81c45c6) Signed-off-by: Nathan Chen <nathanc@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2140343 The function hugetlb_reserve_pages() returns the number of pages added to the reservation map on success and a negative error code on failure (e.g. -EINVAL, -ENOMEM). However, in some error paths, it may return -1 directly. For example, a failure at: if (hugetlb_acct_memory(h, gbl_reserve) < 0) goto out_put_pages; results in returning -1 (since add = -1), which may be misinterpreted in userspace as -EPERM. Fix this by explicitly capturing and propagating the return values from helper functions, and using -EINVAL for all other failure cases. Link: https://lkml.kernel.org/r/20251125171350.86441-1-skolothumtho@nvidia.com Fixes: 986f5f2 ("mm/hugetlb: make hugetlb_reserve_pages() return nr of entries updated") Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Matthew R. Ochs <mochs@nvidia.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicolin Chen <nicolinc@nvidia.com> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (backported from commit 9ee5d17) Signed-off-by: Nathan Chen <nathanc@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2140343 The Enable bits in CMDQV/VINTF/VCMDQ_CONFIG registers do not actually reset the HW registers. So, the driver explicitly clears all the registers when a VINTF or VCMDQ is being initialized calling its hw_deinit() function. However, a userspace VCMDQ is not properly reset, unlike an in-kernel VCMDQ getting reset in tegra241_vcmdq_hw_init(). Meanwhile, tegra241_vintf_hw_init() calling tegra241_vintf_hw_deinit() will not deinit any VCMDQ, since there is no userspace VCMDQ mapped to the VINTF at that stage. Then, this may result in dirty VCMDQ registers, which can fail the VM. Like tegra241_vcmdq_hw_init(), reset a VCMDQ in tegra241_vcmdq_hw_deinit() to fix this bug. This is required by a host kernel. Fixes: 6717f26ab1e7 ("iommu/tegra241-cmdqv: Add user-space use support") Cc: stable@vger.kernel.org Reported-by: Bao Nguyen <ncqb@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> (backported from commit 80f1a2c) Signed-off-by: Nathan Chen <nathanc@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
… transfer BugLink: https://bugs.launchpad.net/bugs/2139640 When the ISR thread wakes up late and finds that the timeout handler has already processed the transfer (curr_xfer is NULL), return IRQ_HANDLED instead of IRQ_NONE. Use a similar approach to tegra_qspi_handle_timeout() by reading QSPI_TRANS_STATUS and checking the QSPI_RDY bit to determine if the hardware actually completed the transfer. If QSPI_RDY is set, the interrupt was legitimate and triggered by real hardware activity. The fact that the timeout path handled it first doesn't make it spurious. Returning IRQ_NONE incorrectly suggests the interrupt wasn't for this device, which can cause issues with shared interrupt lines and interrupt accounting. Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-1-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit aabd8ea linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139640 Move the assignment of the transfer pointer from curr_xfer inside the spinlock critical section in both handle_cpu_based_xfer() and handle_dma_based_xfer(). Previously, curr_xfer was read before acquiring the lock, creating a window where the timeout path could clear curr_xfer between reading it and using it. By moving the read inside the lock, the handlers are guaranteed to see a consistent value that cannot be modified by the timeout path. Fixes: 921fc18 ("spi: tegra210-quad: Add support for Tegra210 QSPI controller") Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Thierry Reding <treding@nvidia.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-2-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit ef13ba3 linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
…transfer_one BugLink: https://bugs.launchpad.net/bugs/2139640 When the timeout handler processes a completed transfer and signals completion, the transfer thread can immediately set up the next transfer and assign curr_xfer to point to it. If a delayed ISR from the previous transfer then runs, it checks if (!tqspi->curr_xfer) (currently without the lock also -- to be fixed soon) to detect stale interrupts, but this check passes because curr_xfer now points to the new transfer. The ISR then incorrectly processes the new transfer's context. Protect the curr_xfer assignment with the spinlock to ensure the ISR either sees NULL (and bails out) or sees the new value only after the assignment is complete. Fixes: 921fc18 ("spi: tegra210-quad: Add support for Tegra210 QSPI controller") Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-3-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit f5a4d7f linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139640 The curr_xfer field is read by the IRQ handler without holding the lock to check if a transfer is in progress. When clearing curr_xfer in the combined sequence transfer loop, protect it with the spinlock to prevent a race with the interrupt handler. Protect the curr_xfer clearing at the exit path of tegra_qspi_combined_seq_xfer() with the spinlock to prevent a race with the interrupt handler that reads this field. Without this protection, the IRQ handler could read a partially updated curr_xfer value, leading to NULL pointer dereference or use-after-free. Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling") Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-4-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit bf4528a linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
…ined_seq_xfer BugLink: https://bugs.launchpad.net/bugs/2139640 Protect the curr_xfer clearing in tegra_qspi_non_combined_seq_xfer() with the spinlock to prevent a race with the interrupt handler that reads this field to check if a transfer is in progress. Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling") Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-5-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit 6d7723e linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139640 Now that all other accesses to curr_xfer are done under the lock, protect the curr_xfer NULL check in tegra_qspi_isr_thread() with the spinlock. Without this protection, the following race can occur: CPU0 (ISR thread) CPU1 (timeout path) ---------------- ------------------- if (!tqspi->curr_xfer) // sees non-NULL spin_lock() tqspi->curr_xfer = NULL spin_unlock() handle_*_xfer() spin_lock() t = tqspi->curr_xfer // NULL! ... t->len ... // NULL dereference! With this patch, all curr_xfer accesses are now properly synchronized. Although all accesses to curr_xfer are done under the lock, in tegra_qspi_isr_thread() it checks for NULL, releases the lock and reacquires it later in handle_cpu_based_xfer()/handle_dma_based_xfer(). There is a potential for an update in between, which could cause a NULL pointer dereference. To handle this, add a NULL check inside the handlers after acquiring the lock. This ensures that if the timeout path has already cleared curr_xfer, the handler will safely return without dereferencing the NULL pointer. Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling") Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260126-tegra_xfer-v2-6-6d2115e4f387@debian.org Signed-off-by: Mark Brown <broonie@kernel.org> (cherry picked from commit edf9088 linux-next) Signed-off-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
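Condensing the series, the resulting access pattern looks roughly like the sketch below (illustrative types and names, not the literal diff): curr_xfer is read only under the lock and re-checked for NULL, so a clear by the timeout path is always observed.

#include <linux/spinlock.h>
#include <linux/spi/spi.h>

struct qspi_sketch {
        spinlock_t lock;
        struct spi_transfer *curr_xfer;
};

static void handle_xfer_sketch(struct qspi_sketch *tqspi)
{
        struct spi_transfer *t;
        unsigned long flags;

        spin_lock_irqsave(&tqspi->lock, flags);
        t = tqspi->curr_xfer;           /* read only under the lock */
        if (!t) {                       /* timeout path got here first */
                spin_unlock_irqrestore(&tqspi->lock, flags);
                return;
        }
        /* safe to use t->len etc. while the lock is held */
        spin_unlock_irqrestore(&tqspi->lock, flags);
}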
BugLink: https://bugs.launchpad.net/bugs/2139648 Currently the cpu-clock event always returns a 0 count, e.g., perf stat -e cpu-clock -- sleep 1 Performance counter stats for 'sleep 1': 0 cpu-clock # 0.000 CPUs utilized 1.002308394 seconds time elapsed The root cause is that commit 'bc4394e5e79c ("perf: Fix the throttle error of some clock events")' adds a PERF_EF_UPDATE flag check before calling cpu_clock_event_update() to update the count; however, the PERF_EF_UPDATE flag is never set when the cpu-clock event is stopped in counting mode (pmu->del() -> cpu_clock_event_del() -> cpu_clock_event_stop()). This leads to the cpu-clock event count never being updated. To fix this issue, force-set the PERF_EF_UPDATE flag for the cpu-clock event just like task-clock does. Fixes: bc4394e ("perf: Fix the throttle error of some clock events") Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://patch.msgid.link/20251112080526.3971392-1-dapeng1.mi@linux.intel.com (cherry picked from commit f1f9651) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
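The shape of the fix, as described, is a one-liner in the del path; this sketch mirrors what task-clock does and may not be the exact upstream hunk:

static void cpu_clock_event_del(struct perf_event *event, int flags)
{
        /* force an update so the final count is folded in */
        cpu_clock_event_stop(event, flags | PERF_EF_UPDATE);
}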
BugLink: https://bugs.launchpad.net/bugs/2093957 Signed-off-by: Jeremy Szu <jszu@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2138892 Remove this declaration which is now used within the file after merging upstream "vfio/nvgrace-gpu: register device memory for poison handling". Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2136828 Add PCI_VENDOR_ID_ASPEED to the shared pci_ids.h header and remove the duplicate local definition from ehci-pci.c. This prepares for adding a PCI quirk for ASPEED devices. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> (backported from https://lore.kernel.org/linux-iommu/20251217154529.377586-1-nirmoyd@nvidia.com/) Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Noah Wager <noah.wager@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2136828 ASPEED BMC controllers have VGA and USB functions behind a PCIe-to-PCI bridge that causes them to share the same stream ID: [e0]---00.0-[e1-e2]----00.0-[e2]--+-00.0 ASPEED Graphics Family \-02.0 ASPEED USB Controller Both devices get stream ID 0x5e200 due to bridge aliasing, causing the USB controller to be rejected with 'Aliasing StreamID unsupported'. Per ASPEED, the AST1150 doesn't use a real PCI bus and always forwards the original requester ID from downstream devices rather than replacing it with any alias. Add a new PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES flag and apply it to the AST1150. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> (backported from https://lore.kernel.org/linux-iommu/20251217154529.377586-2-nirmoyd@nvidia.com/) [nirmoy: set PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES to (1 << 15) instead of (1 << 14)] Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Noah Wager <noah.wager@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
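For illustration, the usual shape of such a quirk; the AST1150 PCI device ID (0x1150) below is an assumption, not taken from the patch:

static void quirk_ast1150_no_aliases(struct pci_dev *pdev)
{
        /* AST1150 forwards the original requester ID downstream,
         * so do not generate DMA aliases for devices behind it */
        pdev->dev_flags |= PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES;
}
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ASPEED, 0x1150, /* assumed ID */
                         quirk_ast1150_no_aliases);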
…ivation (LFA) BugLink: https://bugs.launchpad.net/bugs/2138342 The Arm Live Firmware Activation (LFA) is a specification [1] to describe activating firmware components without a reboot. Those components (like TF-A's BL31, EDK-II, TF-RMM, secure payloads) would be updated the usual way: via fwupd, FF-A or other secure storage methods, or via some IMPDEF Out-Of-Band method. The user can then activate this new firmware, at system runtime, without requiring a reboot. The specification covers the SMCCC interface to list and query available components and eventually trigger the activation. Add a new directory under /sys/firmware to present firmware components capable of live activation. Each of them is a directory under lfa/, and is identified via its GUID. The activation will be triggered by echoing "1" into the "activate" file: ========================================== /sys/firmware/lfa # ls -l . 6c* .: total 0 drwxr-xr-x 2 0 0 0 Jan 19 11:33 47d4086d-4cfe-9846-9b95-2950cbbd5a00 drwxr-xr-x 2 0 0 0 Jan 19 11:33 6c0762a6-12f2-4b56-92cb-ba8f633606d9 drwxr-xr-x 2 0 0 0 Jan 19 11:33 d6d0eea7-fcea-d54b-9782-9934f234b6e4 6c0762a6-12f2-4b56-92cb-ba8f633606d9: total 0 --w------- 1 0 0 4096 Jan 19 11:33 activate -r--r--r-- 1 0 0 4096 Jan 19 11:33 activation_capable -r--r--r-- 1 0 0 4096 Jan 19 11:33 activation_pending --w------- 1 0 0 4096 Jan 19 11:33 cancel -r--r--r-- 1 0 0 4096 Jan 19 11:33 cpu_rendezvous -r--r--r-- 1 0 0 4096 Jan 19 11:33 current_version -rw-r--r-- 1 0 0 4096 Jan 19 11:33 force_cpu_rendezvous -r--r--r-- 1 0 0 4096 Jan 19 11:33 may_reset_cpu -r--r--r-- 1 0 0 4096 Jan 19 11:33 name -r--r--r-- 1 0 0 4096 Jan 19 11:33 pending_version /sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # grep . * grep: activate: Permission denied activation_capable:1 activation_pending:1 grep: cancel: Permission denied cpu_rendezvous:1 current_version:0.0 force_cpu_rendezvous:1 may_reset_cpu:0 name:TF-RMM pending_version:0.0 /sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # echo 1 > activate [ 2825.797871] Arm LFA: firmware activation succeeded. /sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # ========================================== [1] https://developer.arm.com/documentation/den0147/latest/ Signed-off-by: Salman Nabi <salman.nabi@arm.com> Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> (backported from https://lore.kernel.org/all/20260119122729.287522-2-salman.nabi@arm.com/) [nirmoyd: Added image_name fallback to fw_uuid in update_fw_image_node()] Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>
Use the CXL API to create a region using the endpoint decoder related to a DPA range. Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> (backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
A PIO buffer is a region of device memory to which the driver can write a packet for TX, with the device handling the transmit doorbell without requiring a DMA for getting the packet data, which helps reduce latency in certain exchanges. With the CXL mem protocol this latency can be lowered further. With a device supporting CXL and successfully initialised, use the cxl region to map the memory range and use this mapping for PIO buffers. Disable those CXL-based PIO buffers if the CXL code invokes the callback for potential endpoint removal. Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> (backported from https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
…smaller granularities for lower levels The CXL specification supports multi-level interleaving "as long as all the levels use different, but consecutive, HPA bits to select the target and no Interleave Set has more than 8 devices" (from 3.2). Currently the kernel expects that a decoder's "interleave granularity is a multiple of @parent_port granularity". That is, the granularity of a lower level is bigger than that of the parent and uses the outer HPA bits as selector. It works e.g. for the following 8-way config: * cross-link (cross-hostbridge config in CFMWS): * 4-way * 256 granularity * Selector: HPA[8:9] * sub-link (CXL Host bridge config of the HDM): * 2-way * 1024 granularity * Selector: HPA[10] Now, if the outer HPA bits are used for the cross-hostbridge, an 8-way config could look like this: * cross-link (cross-hostbridge config in CFMWS): * 4-way * 512 granularity * Selector: HPA[9:10] * sub-link (CXL Host bridge config of the HDM): * 2-way * 256 granularity * Selector: HPA[8] The enumeration of decoders for this configuration then fails with the following error: cxl region0: pci0000:00:port1 cxl_port_setup_targets expected iw: 2 ig: 1024 [mem 0x10000000000-0x1ffffffffff flags 0x200] cxl region0: pci0000:00:port1 cxl_port_setup_targets got iw: 2 ig: 256 state: enabled 0x10000000000:0x1ffffffffff cxl_port endpoint12: failed to attach decoder12.0 to region0: -6 Note that this happens only if firmware is setting up the decoders (CXL_REGION_F_AUTO). For userspace region assembly the granularities are chosen to increase from root down to the lower levels. That is, outer HPA bits are always used for lower interleaving levels. Rework the implementation to also support multi-level interleaving with smaller granularities for lower levels. Determine the interleave set of autodetected decoders. Check that it is a subset of the root interleave. The HPA selector bits are extracted for all decoders of the set and checked that there is no overlap and bits are consecutive. All decoders can be programmed now to use any bit range within the region's target selector. Signed-off-by: Robert Richter <rrichter@amd.com> (backported from https://lore.kernel.org/all/20251028094754.72816-1-rrichter@amd.com/) [jan: Resolved minor conflicts] Signed-off-by: Jiandi An <jan@nvidia.com>
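To make the selector arithmetic concrete: a decoder with interleave granularity ig and ways iw consumes HPA bits [ilog2(ig), ilog2(ig) + ilog2(iw) - 1]. A small helper (illustrative only, not from the patch) that reproduces both example configs above:

#include <linux/log2.h>

/* ig = 256, iw = 4 -> bits 8..9  (HPA[8:9])
 * ig = 512, iw = 4 -> bits 9..10 (HPA[9:10])
 * ig = 256, iw = 2 -> bit  8     (HPA[8])     */
static void hpa_selector_bits(unsigned int ig, unsigned int iw,
                              unsigned int *lo, unsigned int *hi)
{
        *lo = ilog2(ig);
        *hi = *lo + ilog2(iw) - 1;
}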
…er definitions
PCI: Add CXL DVSEC control, lock, and range register definitions
Add register offset and field definitions for CXL DVSEC registers needed
by CXL state save/restore across resets:
- CTRL2 (offset 0x10) and LOCK (offset 0x14) registers
- CONFIG_LOCK bit in the LOCK register
- RWL (read-write-when-locked) field masks for CTRL and range base
registers.
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
… to include/cxl/cxl.h Move CXL HDM decoder register defines, register map structs (cxl_reg_map, cxl_component_reg_map, cxl_device_reg_map, cxl_pmu_reg_map, cxl_register_map), cxl_hdm_decoder_count(), enum cxl_regloc_type, and cxl_find_regblock()/cxl_setup_regs() declarations from internal CXL headers to include/cxl/pci.h. This makes them accessible to code outside the CXL subsystem, in particular the PCI core CXL state save/restore support added in a subsequent patch. No functional change. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/) [jan: Resolve conflicts by moving certain definitions to include/cxl/cxl.h instead of to include/cxl/pci.h to align with its dependency of Alejandro's series] Signed-off-by: Jiandi An <jan@nvidia.com>
…state Add pci_add_virtual_ext_cap_save_buffer() to allocate save buffers using virtual cap IDs (above PCI_EXT_CAP_ID_MAX) that don't require a real capability in config space. The existing pci_add_ext_cap_save_buffer() cannot be used for CXL DVSEC state because it calls pci_find_saved_ext_cap() which searches for a matching capability in PCI config space. The CXL state saved here is a synthetic snapshot (DVSEC+HDM) and should not be tied to a real extended-cap instance. A virtual extended-cap save buffer API (cap IDs above PCI_EXT_CAP_ID_MAX) allows PCI to track this state without a backing config space capability. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
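A hedged usage sketch; the helper's exact signature and the virtual cap ID value below are assumptions based on the description, not the patch:

/* assumed: IDs above PCI_EXT_CAP_ID_MAX never match a real capability */
#define PCI_VIRT_CAP_CXL_STATE  (PCI_EXT_CAP_ID_MAX + 1)

static int alloc_cxl_state_buffer(struct pci_dev *pdev, u16 size)
{
        /* unlike pci_add_ext_cap_save_buffer(), no config-space
         * capability lookup happens for a virtual ID */
        return pci_add_virtual_ext_cap_save_buffer(pdev,
                        PCI_VIRT_CAP_CXL_STATE, size);
}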
Save and restore CXL DVSEC control registers (CTRL, CTRL2), range base registers, and lock state across PCI resets. When the DVSEC CONFIG_LOCK bit is set, certain DVSEC fields become read-only and hardware may have updated them. Blindly restoring saved values would be silently ignored or conflict with hardware state. Instead, a read-merge-write approach is used: current hardware values are read for the RWL (read-write-when-locked) fields and merged with saved state, so only writable bits are restored while locked bits retain their hardware values. Hooked into pci_save_state()/pci_restore_state() so all PCI reset paths automatically preserve CXL DVSEC configuration. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/) [jan: Resolve minor conflict in drivers/pci/Makefile due to code line shifts ] Signed-off-by: Jiandi An <jan@nvidia.com>
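The read-merge-write step in miniature (a sketch; the mask name is illustrative):

static u32 merge_locked_state(u32 saved, u32 hw, u32 writable_mask)
{
        /* restore only bits still writable under CONFIG_LOCK; locked
         * bits keep whatever hardware currently reports */
        return (saved & writable_mask) | (hw & ~writable_mask);
}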
Save and restore CXL HDM decoder registers (global control, per-decoder base/size/target-list, and commit state) across PCI resets. On restore, decoders that were committed are reprogrammed and recommitted with a 10ms timeout. Locked decoders that are already committed are skipped, since their state is protected by hardware and reprogramming them would fail. The Register Locator DVSEC is parsed directly via PCI config space reads rather than calling cxl_find_regblock()/cxl_setup_regs(), since this code lives in the PCI core and must not depend on CXL module symbols. MSE is temporarily enabled during save/restore to allow MMIO access to the HDM decoder register block. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/) [jan: Include <cxl/cxl.h> in drivers/pci/cxl.c due to conflict resolution in "4acbc27592b8 NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h"] Signed-off-by: Jiandi An <jan@nvidia.com>
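A sketch of the recommit wait described above; CXL_HDM_DECODER0_CTRL_COMMITTED is the in-tree field definition, but the loop shape here is illustrative:

static int wait_decoder_committed(void __iomem *ctrl)
{
        unsigned long timeout = jiffies + msecs_to_jiffies(10);

        while (!(readl(ctrl) & CXL_HDM_DECODER0_CTRL_COMMITTED)) {
                if (time_after(jiffies, timeout))
                        return -ETIMEDOUT;
                cpu_relax();
        }
        return 0;
}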
…efinitions Add CXL DVSEC register definitions needed for CXL device reset per CXL r3.2 section 8.1.3.1: - Capability bits: RST_CAPABLE, CACHE_CAPABLE, CACHE_WBI_CAPABLE, RST_TIMEOUT, RST_MEM_CLR_CAPABLE - Control2 register: DISABLE_CACHING, INIT_CACHE_WBI, INIT_CXL_RST, RST_MEM_CLR_EN - Status2 register: CACHE_INV, RST_DONE, RST_ERR - Non-CXL Function Map DVSEC register offset Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) [jan: Resolve conflicts where PCI_DVSEC_CXL_CACHE_CAPABLE is already added by "72bd823fb4f1 NVIDIA: VR: SAUCE: PCI: Allow ATS to be always on for CXL.cache capable devices"] Signed-off-by: Jiandi An <jan@nvidia.com>
…_restore() Export pci_dev_save_and_disable() and pci_dev_restore() so that subsystems performing non-standard reset sequences (e.g. CXL) can reuse the PCI core standard pre/post reset lifecycle: driver reset_prepare/reset_done callbacks, PCI config space save/restore, and device disable/re-enable. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
Add infrastructure for quiescing the CXL data path before reset: - Memory offlining: check if CXL-backed memory is online and offline it via offline_and_remove_memory() before reset, per CXL spec requirement to quiesce all CXL.mem transactions before issuing CXL Reset. - CPU cache flush: invalidate cache lines before reset as a safety measure after memory offline. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
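A minimal sketch of that quiesce order, assuming the CXL-backed physical range has already been looked up; offline_and_remove_memory() and cpu_cache_invalidate_memregion() are existing kernel helpers, the wrapper itself is illustrative:

static int quiesce_cxl_mem(u64 start, u64 size)
{
        /* no CXL.mem transactions may be in flight across CXL Reset */
        int rc = offline_and_remove_memory(start, size);

        if (rc)
                return rc;
        /* safety measure: drop cache lines covering the range
         * (assumes the range is described as IORES_DESC_CXL) */
        return cpu_cache_invalidate_memregion(IORES_DESC_CXL);
}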
…XL reset Add sibling PCI function save/disable/restore coordination for CXL reset. Before reset, all CXL.cachemem sibling functions are locked, saved, and disabled; after reset they are restored. The Non-CXL Function Map DVSEC and per-function DVSEC capability register are consulted to skip non-CXL and CXL.io-only functions. A global mutex serializes concurrent resets to prevent deadlocks between sibling functions. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
…ration
cxl_dev_reset() implements the hardware reset sequence:
optionally enable memory clear, initiate reset via
CTRL2, wait for completion, and re-enable caching.
cxl_do_reset() orchestrates the full reset flow:
1. CXL pre-reset: mem offlining and cache flush (when memdev present)
2. PCI save/disable: pci_dev_save_and_disable() automatically saves
CXL DVSEC and HDM decoder state via PCI core hooks
3. Sibling coordination: save/disable CXL.cachemem sibling functions
4. Execute CXL DVSEC reset
5. Sibling restore: always runs to re-enable sibling functions
6. PCI restore: pci_dev_restore() automatically restores CXL state
The CXL-specific DVSEC and HDM save/restore is handled
by the PCI core's CXL save/restore infrastructure (drivers/pci/cxl.c).
Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
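The six-step flow above, condensed into a non-authoritative sketch; cxl_dev_reset(), cxl_do_reset(), pci_dev_save_and_disable(), and pci_dev_restore() are named by this series, while the remaining helpers are paraphrased:

static int cxl_do_reset_sketch(struct pci_dev *pdev)
{
        int rc;

        rc = cxl_pre_reset(pdev);               /* 1: offline mem, flush */
        if (rc)
                return rc;
        pci_dev_save_and_disable(pdev);         /* 2: saves DVSEC + HDM  */
        cxl_siblings_save_and_disable(pdev);    /* 3                     */
        rc = cxl_dev_reset(pdev);               /* 4: CTRL2-initiated    */
        cxl_siblings_restore(pdev);             /* 5: always runs        */
        pci_dev_restore(pdev);                  /* 6: restores CXL state */
        return rc;
}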
Add a "cxl_reset" sysfs attribute to PCI devices that support CXL Reset (CXL r3.2 section 8.1.3.1). The attribute is visible only on devices with both CXL.cache and CXL.mem capabilities and the CXL Reset Capable bit set in the DVSEC. Writing "1" to the attribute triggers the full CXL reset flow via cxl_do_reset(). The interface is decoupled from memdev creation: when a CXL memdev exists, memory offlining and cache flush are performed; otherwise reset proceeds without the memory management. The sysfs attribute is managed entirely by the CXL module using sysfs_create_group() / sysfs_remove_group() rather than the PCI core's static attribute groups. This avoids cross-module symbol dependencies between the PCI core (always built-in) and CXL_BUS (potentially modular). At module init, existing PCI devices are scanned and a PCI bus notifier handles hot-plug/unplug. kernfs_drain() makes sure that any in-flight store() completes before sysfs_remove_group() returns, preventing use-after-free during module unload. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
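The attribute wiring in sketch form, using standard sysfs idioms; the store body is illustrative rather than the literal patch:

static ssize_t cxl_reset_store(struct device *dev,
                               struct device_attribute *attr,
                               const char *buf, size_t count)
{
        struct pci_dev *pdev = to_pci_dev(dev);
        bool val;
        int rc;

        rc = kstrtobool(buf, &val);
        if (rc)
                return rc;
        if (!val)
                return -EINVAL;
        rc = cxl_do_reset(pdev);        /* full flow from the prior patch */
        return rc ? rc : count;
}
static DEVICE_ATTR_WO(cxl_reset);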
…tribute Document the cxl_reset sysfs attribute added to PCI devices that support CXL Reset. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com>
…and RAS support
Add Ubuntu kernel config annotations for CXL-related configs introduced
or changed by the following cherry-picked patch series:
- drivers/cxl changes between v6.17.9 and upstream 7.0 (which includes
a portion of Terry Bowman's v14 CXL RAS series merged via
for-7.0/cxl-aer-prep)
- Alejandro Lucero's v23 CXL Type-2 device support series
- Smita Koralahalli's v6 patch 3/9 (cxl/region: Skip decoder reset on
detach for autodiscovered regions)
CONFIG_CXL_BUS: Enable CXL bus support built-in; required for
CXL Type-2 device and RAS support
CONFIG_CXL_PCI: Enable CXL PCI management built-in; auto-selects
CXL_MEM; required for CXL Type-2 device support
CONFIG_CXL_MEM: Auto-selected by CXL_PCI; required for CXL
memory expansion and Type-2 device support
CONFIG_CXL_PORT: Required for CXL port enumeration; defaults to
CXL_BUS value
CONFIG_FWCTL: Selected by CXL_BUS when CXL_FEATURES is enabled;
required for CXL feature mailbox access
CONFIG_CXL_RAS: New def_bool replacing PCIEAER_CXL (Terry Bowman
v14); auto-enabled with ACPI_APEI_GHES+PCIEAER+
CXL_BUS for CXL RAS error handling
CONFIG_SFC_CXL: Solarflare SFC9100-family CXL Type-2 device
support; not needed for NVIDIA platforms (n)
CONFIG_ACPI_APEI_EINJ: Required prerequisite for CONFIG_ACPI_APEI_EINJ_CXL
CONFIG_ACPI_APEI_EINJ_CXL: CXL protocol error injection support via APEI EINJ
CONFIG_PCIEAER_CXL: Remove it from debian.master policy. This config
was removed from Kconfig by upstream commit d18f1b7
(PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS) which is included
in this port.
CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION: Override debian.master
amd64-only policy to include arm64. Commit 4d873c5 added
'select ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION' to arch/arm64/Kconfig,
making this y on arm64 as well.
CONFIG_GENERIC_CPU_CACHE_MAINTENANCE: New bool config defined by
c460697 in lib/Kconfig. Selected by arm64 via 4d873c5;
not selected by x86. Set arm64: y, amd64: -.
CONFIG_CACHEMAINT_FOR_HOTPLUG: New optional menuconfig defined by
2ec3b54 in drivers/cache/Kconfig. Depends on
GENERIC_CPU_CACHE_MAINTENANCE so becomes visible on arm64. Defaults
to n; HiSilicon HHA driver not needed for NVIDIA platforms.
Set arm64: n, amd64: -.
Signed-off-by: Jiandi An <jan@nvidia.com>
…memory access
Override debian.master policy (m->y) for DEV_DAX, DEV_DAX_CXL, and
DEV_DAX_KMEM to ensure CXL memory regions are accessible as both raw
DAX devices and hotplugged System-RAM nodes.
debian.master sets these to 'm' (modules). For NVIDIA platforms with
CXL Type-2 devices, built-in (y) is required to ensure CXL memory
regions provisioned early in boot are immediately accessible without
relying on module loading order.
CONFIG_DEV_DAX: Override m->y; prerequisite for DEV_DAX_CXL and
DEV_DAX_KMEM to be built-in; depends on
TRANSPARENT_HUGEPAGE (already y in debian.master)
CONFIG_DEV_DAX_CXL: Override m->y; creates /dev/daxX.Y devices for CXL
RAM regions not in the default system memory map
(Soft Reserved or dynamically provisioned regions);
depends on CXL_BUS+CXL_REGION+DEV_DAX (all y)
CONFIG_DEV_DAX_KMEM: Override m->y; onlines CXL DAX devices as System-RAM
NUMA nodes via memory hotplug, making CXL memory
available for normal kernel and userspace allocation
Signed-off-by: Jiandi An <jan@nvidia.com>
…/restore
Add Ubuntu kernel config annotation for CONFIG_PCI_CXL introduced by
the CXL DVSEC and HDM state save/restore series (Srirangan Madhavan).
CONFIG_PCI_CXL: Hidden bool in drivers/pci/Kconfig; auto-enabled when
CXL_BUS=y. Gates compilation of drivers/pci/cxl.o which
saves and restores CXL DVSEC control/range registers and
HDM decoder state across PCI resets and link transitions.
Signed-off-by: Jiandi An <jan@nvidia.com>
Author
Fixed.
Collaborator
I see the compiling issue is fixed. That was my concern.
nvmochs
approved these changes
Mar 24, 2026
Collaborator
nvmochs
left a comment
I reviewed the name change fix in "PCI: Update CXL DVSEC definitions" and confirmed it builds successfully for arm64.
No further issues or concerns from me.
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
clsotog
approved these changes
Mar 24, 2026
Collaborator
clsotog
left a comment
Acked-by: Carol L Soto <csoto@nvidia.com>
Collaborator
Tried this on GB300 yesterday with the compilation issue manually fixed. Ran CUDA DVS tests like http://10.112.214.250:8002/. We still need to make sure that there are no regressions with the older RM driver. With that
Collaborator
PR sent to Canonical.
Description
This patch series adds comprehensive CXL (Compute Express Link) support to the
nvidia-6.17 kernel, including:
- CXL Type-2 device support, allowing accelerators (e.g. SmartNICs) to use CXL for coherent memory access via firmware-provisioned regions
- CXL RAS error handling: implements PCIe Port Protocol error handling and logging for CXL Root Ports, Downstream Switch Ports, and Upstream Switch Ports
- CXL state save/restore: preserves CXL DVSEC control/range registers and HDM decoder programming across PCI resets and link transitions, enabling device re-initialization after reset for firmware-provisioned configurations
- CXL device reset (CXL r3.2 Sections 8.1.3, 9.6, 9.7) via a sysfs interface for Type-2 devices, including memory offlining, cache flushing, multi-function sibling coordination, and DVSEC reset sequencing
- Multi-level interleaving where lower levels use smaller granularities than parent ports (reverse HPA bit ordering)
- Upstream cherry-picks from torvalds/master covering the range from v6.17.9 to the merge point of Terry Bowman's v14 series into v7.0
- DAX support for mapping CXL DAX devices as System-RAM
Key Features Added:
- CXL RAS error handling (replacing the old PCIEAER_CXL symbol with the new CXL_RAS def_bool)
- CXL Reset sysfs interface (/sys/bus/pci/devices/.../cxl_reset) for Type-2 devices with the Reset Capable bit set
- Sibling function coordination via the Non-CXL Function Map DVSEC
- Cache flush via cpu_cache_invalidate_memregion() during reset
- Multi-level interleaving with smaller granularities for lower levels (firmware-provisioned configurations)
- CXL RAM region DAX access (DEV_DAX_CXL) and System-RAM mapping (DEV_DAX_KMEM)
- CXL protocol error injection via APEI EINJ (ACPI_APEI_EINJ_CXL)
Justification
CXL Type-2 device support is critical for next-generation NVIDIA accelerators
and data center workloads.
Source
Patch Breakdown (139 patches + 1 revert):
- Upstream cherry-picks from torvalds/master (v6.17.9 → merge of Terry Bowman v14 into v7.0)
Notes on the upstream cherry-picks (item 2):
The 103 upstream commits span 1bfd0faa78d0 (v6.17.9) to 0da3050bdded (merge of for-7.0/cxl-aer-prep into cxl-for-next). This range includes 17 out of 34 patches from Terry Bowman's v14 series that were reworked by the CXL maintainer and merged into v7.0 via the for-7.0/cxl-aer-prep branch. The remaining 17 patches from Terry's v14 were refactored into v15 (9 patches, not yet merged) and are not included in this port.
Notes on the save/restore and reset series (items 6–7):
Srirangan's patches were authored against upstream v7.0-rc1 (which does not include Alejandro's v23 Type-2 series). For this port, the header reorganization in patch 2/5 of the save/restore series was adapted to align with Alejandro's v23 approach: HDM decoder and register map definitions were moved to include/cxl/cxl.h (not include/cxl/pci.h as in the original patch) to follow the convention established by Alejandro's series. Upstream reviewers have indicated that Srirangan's series should be rebased on top of Alejandro's once it merges.
Lore Links:
Terry Bowman's CXL RAS series (v14, partially merged into v7.0):
https://lore.kernel.org/all/20260114182055.46029-1-terry.bowman@amd.com/
Smita Koralahalli's CXL EINJ series (v6, patch 3/9 only):
https://lore.kernel.org/linux-cxl/20260210064501.157591-1-Smita.KoralahalliChannabasappa@amd.com/
Alejandro Lucero's CXL Type-2 series (v23):
https://lore.kernel.org/linux-cxl/20260201155438.2664640-1-alejandro.lucero-palau@amd.com/
Robert Richter's multi-level interleaving fix (v1):
https://lore.kernel.org/all/20251028094754.72816-1-rrichter@amd.com/
Srirangan Madhavan's CXL state save/restore series:
https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/
Srirangan Madhavan's CXL reset series (v5):
https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/
Upstream Status:
- Upstream cherry-picks: merged in torvalds/master (v7.0 range) via for-7.0/cxl-aer-prep
Testing
Build Validation:
Config Verification:
CXL-related configs enabled as expected:
Runtime Testing:
- CXL device enumeration (ls /sys/bus/cxl/devices/)
- CXL reset trigger (echo 1 > /sys/bus/pci/devices/<dev>/cxl_reset)
Notes
- CONFIG_PCIEAER_CXL has been removed from Kconfig by upstream commit d18f1b7beadf (PCI/AER: Replace PCIEAER_CXL symbol with CXL_RAS). The debian.master annotation for PCIEAER_CXL=y is overridden to '-' in debian.nvidia-6.17/config/annotations.
- CONFIG_CXL_BUS, CONFIG_CXL_PCI, CONFIG_CXL_MEM, CONFIG_CXL_PORT remain tristate (not bool) — the v14 series kept them as tristate, unlike earlier draft versions.
- CONFIG_DEV_DAX, CONFIG_DEV_DAX_CXL, and CONFIG_DEV_DAX_KMEM are overridden from m (debian.master default) to y to support built-in CXL RAM region DAX access and System-RAM mapping.
- CONFIG_PCI_CXL is a new hidden bool introduced by the save/restore series; auto-enabled when CXL_BUS=y. It gates compilation of drivers/pci/cxl.o for DVSEC and HDM state save/restore.
- CONFIG_GENERIC_CPU_CACHE_MAINTENANCE and CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION are new configs introduced by the upstream cherry-picks; arm64 auto-selects both. cpu_cache_invalidate_memregion() is also used by the CXL reset series for cache flushing during reset.
- debian.nvidia-6.17/config/annotations has been updated to reflect all of the above changes.
- The header reorganization in patch 2/5 was adapted to align with Alejandro's v23 approach (include/cxl/cxl.h instead of include/cxl/pci.h). See the commit message on that patch for details.