[linux-nvidia-6.17] Backport MPAM fixes and support for CPU-less NUMA nodes by fyu1 · Pull Request #348 · NVIDIA/NV-Kernels

fyu1 · 2026-03-20T18:07:13Z

This PR replaces #328

This branch fixes a few MPAM issues including:

Performance issue due to small MBW_MIN on Grace: https://nvbugspro.nvidia.com/bug/5928376
Performance issue due to 0 CMAX on Vera: https://nvbugspro.nvidia.com/bug/5717435
Stress Online/offline issue on Vera: https://nvbugspro.nvidia.com/bug/5919525
Clean up numa node MBA/MBM code to avoid future issues.

There are total 49 patches:

The first 10 patches revert ARM's extra patches which are numa node, event filter, and mem hotplug patches. The patches are buggy and cause most of the above issues.
The patches 11 and 12 revert old buggy T241-MPAM-4 Grace erratum workaround and apply an updated one.
The patches 13-42 are from resctrl upstream for mainly alignment of monitoring type for the later numa patches.
The patches 43-49 are mainly supporting CPU-less and numa node, plus fixing IOMMU, MSC tear down, MBWU type issues.

This is patches list:
0001-Revert-NVIDIA-SAUCE-untested-arm_mpam-resctrl-Allow-.patch
0002-Revert-NVIDIA-SAUCE-arm_mpam-resctrl-Add-NUMA-node-n.patch
0003-Revert-NVIDIA-SAUCE-untested-arm_mpam-resctrl-Split-.patch
0004-Revert-NVIDIA-SAUCE-arm_mpam-resctrl-Change-domain_h.patch
0005-Revert-NVIDIA-SAUCE-arm_mpam-resctrl-Pick-whether-MB.patch
0006-Revert-NVIDIA-SAUCE-Fix-unused-variable-warning.patch
0007-Revert-NVIDIA-SAUCE-fs-resctrl-Add-mount-option-for-.patch
0008-Revert-NVIDIA-SAUCE-fs-resctrl-Take-memory-hotplug-l.patch
0009-Revert-NVIDIA-SAUCE-mm-memory_hotplug-Add-lockdep-as.patch
0010-Revert-NVIDIA-SAUCE-untested-arm_mpam-resctrl-Allow-.patch
0011-Revert-NVIDIA-SAUCE-arm_mpam-Add-workaround-for-T241.patch
0012-NVIDIA-SAUCE-arm_mpam-Add-workaround-for-T241-MPAM-4.patch
0013-x86-fs-resctrl-Improve-domain-type-checking.patch
0014-x86-resctrl-Move-L3-initialization-into-new-helper-f.patch
0015-x86-resctrl-Refactor-domain_remove_cpu_mon-ready-for.patch
0016-x86-resctrl-Clean-up-domain_remove_cpu_ctrl.patch
0017-x86-fs-resctrl-Refactor-domain-create-remove-using-s.patch
0018-fs-resctrl-Split-L3-dependent-parts-out-of-mon_eve.patch
0019-x86-fs-resctrl-Use-struct-rdt_domain_hdr-when-readin.patch
0020-x86-fs-resctrl-Rename-struct-rdt_mon_domain-and-rdt.patch
0021-x86-fs-resctrl-Rename-some-L3-specific-functions.patch
0022-fs-resctrl-Make-event-details-accessible-to-function.patch
0023-x86-fs-resctrl-Handle-events-that-can-be-read-from-a.patch
0024-x86-fs-resctrl-Support-binary-fixed-point-event-coun.patch
0025-x86-fs-resctrl-Add-an-architectural-hook-called-for-.patch
0026-x86-fs-resctrl-Add-and-initialize-a-resource-for-pac.patch
0027-fs-resctrl-Emphasize-that-L3-monitoring-resource-is-.patch
0028-x86-resctrl-Discover-hardware-telemetry-events.patch
0029-x86-fs-resctrl-Fill-in-details-of-events-for-perform.patch
0030-x86-fs-resctrl-Add-architectural-event-pointer.patch
0031-x86-resctrl-Find-and-enable-usable-telemetry-events.patch
0032-x86-resctrl-Read-telemetry-events.patch
0033-fs-resctrl-Refactor-mkdir_mondata_subdir.patch
0034-fs-resctrl-Refactor-rmdir_mondata_subdir_allrdtgrp.patch
0035-x86-fs-resctrl-Handle-domain-creation-deletion-for-R.patch
0036-x86-resctrl-Add-energy-perf-choices-to-rdt-boot-opti.patch
0037-x86-resctrl-Handle-number-of-RMIDs-supported-by-RDT.patch
0038-fs-resctrl-Move-allocation-free-of-closid_num_dirty_.patch
0039-x86-fs-resctrl-Compute-number-of-RMIDs-as-minimum-ac.patch
0040-fs-resctrl-Move-RMID-initialization-to-first-mount.patch
0041-x86-resctrl-Enable-RDT_RESOURCE_PERF_PKG.patch
0042-x86-fs-resctrl-Update-documentation-for-telemetry-ev.patch
0043-NVIDIA-VR-SAUCE-arm_mpam-Fix-compilation-errors.patch
0044-NVIDIA-SAUCE-arm_mpam-Avoid-MSC-teardown-for-the-SW-.patch
0045-NVIDIA-VR-SAUCE-arm_mpam-Handle-CPU-less-numa-nodes.patch
0046-NVIDIA-VR-SAUCE-arm_mpam-Include-all-associated-MSC-.patch
0047-NVIDIA-SAUCE-resctrl-mpam-reset-RIS-by-applying-expl.patch
0048-NVIDIA-SAUCE-iommu-arm-smmu-v3-Fix-MPAM-for-indentit.patch
0049-NVIDIA-VR-SAUCE-arm_mpam-Resolve-MBWU-type-before-fe.patch

Test results are in http://10.112.214.86/vera/tests/ including

init registers test
iommu assignment test
online/offline test
Spec2017 performance test
CXL test

GPU MPAM test is not covered because as of now there is SBIOS support for the feature yet.

LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.17/+bug/2146389

Ignore: yes Signed-off-by: Ian May <ianm@nvidia.com> Signed-off-by: Jacob Martin <jacob.martin@canonical.com>

… DGX Spark BugLink: https://bugs.launchpad.net/bugs/2138269 This driver manages PCIe link for NVIDIA ConnectX-7 (CX7) hot-plug/unplug on DGX Spark systems with GB10 SoC. It disables the PCIe link on cable removal and enables it on cable insertion. Upstream-friendly improvements over 6.14 driver: - Separated from MTK pinctrl driver into NVIDIA platform driver - Configuration via ACPI (_CRS and _DSD), no hardcoded values - Device-managed resources (devm_*) for automatic cleanup - Thread-safe state management with locking - Enhanced error handling and logging - Uses standard Linux kernel APIs The driver exposes a sysfs interface to emulate cable plug in/out: echo 1 > /sys/devices/platform/MTKP0001:00/pcie_hotplug/debug_state # plug in echo 0 > /sys/devices/platform/MTKP0001:00/pcie_hotplug/debug_state # plug out It also provides a runtime enable/disable switch via sysfs: echo 1 > /sys/devices/platform/MTKP0001:00/pcie_hotplug/hotplug_enabled # Enable echo 0 > /sys/devices/platform/MTKP0001:00/pcie_hotplug/hotplug_enabled # Disable This allows enabling/disabling hotplug functionality. Hotplug is disabled by default and must be explicitly enabled via userspace. It also implements uevent notifications for coordination with userspace: * cable plug-in: Report plug-in uevent (driver) Enable PCIe link (driver) Rescan CX7 devices (application) * cable removal: Report removal uevent (driver) Remove CX7 devices (application) Disable PCIe link (driver) Signed-off-by: Vaibhav Vyas <vavyas@nvidia.com> Signed-off-by: Scott Fudally <sfudally@nvidia.com> Signed-off-by: Surabhi Chythanya Kumar <schythanyaku@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Noah Wager <noah.wager@canonical.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2059814 Signed-off-by: Brad Figg <bfigg@nvidia.com> Acked-by: Brad Figg <bfigg@nvidia.com> Acked-by: Ian May <ian.may@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Jacob Martin <jacob.martin@canonical.com> (cherry picked from commit a64b597 linux-nvidia-6.14) Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

Ignore: yes Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2137561 Properties: no-test-build Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

…ernel-versions (main/d2025.12.18) BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

… control BugLink: https://bugs.launchpad.net/bugs/2138755 The selection of MLO mode should depend on the capabilities of the STA rather than those of the peer AP to avoid compatibility issues with certain APs, such as Xiaomi BE5000 WiFi7 router. Fixes: 69acd6d ("wifi: mt76: mt7925: add mt7925_change_vif_links") Signed-off-by: Leon Yen <leon.yen@mediatek.com> (backported from https://lore.kernel.org/all/20251211123836.4169436-1-leon.yen@mediatek.com/) Signed-off-by: Muteeb Akram <mdoctor@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jacob Martin <jacob.martin@canonical.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

Ignore: yes Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2138765 Properties: no-test-build Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

BugLink: https://bugs.launchpad.net/bugs/2138329 Tegra410 and Tegra241 have deprecated HIDREV register. It is recommended to use ARM SMCCC calls to get chip_id, major and minor revisions. Use ARM SMCCC to get chip_id, major and minor revision. Signed-off-by: Kartik Rajput <kkartik@nvidia.com> Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

…ERIC_ARCH_TOPOLOGY BugLink: https://bugs.launchpad.net/bugs/2138375 The arm_pmu driver is using topology_core_has_smt() for retrieving the SMT implementation which depends on CONFIG_GENERIC_ARCH_TOPOLOGY. The config is optional on arm platforms so provide a !CONFIG_GENERIC_ARCH_TOPOLOGY stub for topology_core_has_smt(). Fixes: c3d78c3 ("perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511041757.vuCGOmFc-lkp@intel.com/ Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Yicong Yang <yangyccccc@gmail.com> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> (cherry picked from commit 7ab06ea) Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138266 Type 2 devices are being introduced and will require finer-grained reset mechanisms beyond bus-wide reset methods. Add support for CXL reset per CXL v3.2 Section 9.6/9.7 Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (backported from https://lore.kernel.org/all/20250221043906.1593189-3-smadhavan@nvidia.com/) [Nirmoy: Add #include "../cxl/cxlpci.h" and fix a compile error with if (reg & CXL_DVSEC_CXL_RST_CAPABLE == 0)] Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138266 The cxl core in linux updated to supported committed decoders of zero size, because this is allowed by the CXL spec. This patch updates cxl_test to enable decoders 1 and 2 in the host-bridge 0 port, in a switch uport under hb0, and the endpoints ports with size zero simulating committed zero sized decoders. Signed-off-by: Vishal Aslot <vaslot@nvidia.com> (backported from https://lore.kernel.org/all/20251015024019.1189713-1-vaslot@nvidia.com/) Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138266 CXL spec permits committing zero sized decoders. Linux currently considers them as an error. Zero-sized decoders are helpful when the BIOS is committing them. Often BIOS will also lock them to prevent them being changed due to the TSP requirement. For example, if the type 3 device is part of a TCB. The host bridge, switch, and end-point decoders can all be committed with zero-size. If they are locked along the VH, it is often to prevent hotplugging of a new device that could not be attested post boot and cannot be included in TCB. The caller leaves the decoder allocated but does not add it. It simply continues to the next decoder. Signed-off-by: Vishal Aslot <vaslot@nvidia.com> (backported from https://lore.kernel.org/all/20251015024019.1189713-1-vaslot@nvidia.com/) Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138266 The loop condition in __cxl_dpa_reserve() is missing the comparison operator, causing potential infinite loop and array out-of-bounds: for (int i = 0; cxlds->nr_partitions; i++) Should be: for (int i = 0; i < cxlds->nr_partitions; i++) Without the '<' operator, if no partition matches the decoder's DPA resource, 'i' increments beyond the part[] array bounds (size 2), triggering UBSAN errors and corrupting the part index. Fixes: be5cbd0 ("cxl: Kill enum cxl_decoder_mode") Signed-off-by: Koba Ko <kobak@nvidia.com> Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

…access BugLink: https://bugs.launchpad.net/bugs/2138266 Check partition index bounds before accessing cxlds->part[] to prevent out-of-bounds when part is -1 or invalid. Fixes: 5ec6759) cxl/region: Drop goto pattern of construct_region() Signed-off-by: Koba Ko <kobak@nvidia.com> Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138266 Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138238 Add compatible and the hardware struct for Tegra256. Tegra256 controllers use a different parent clock. Hence the timing parameters are different from the previous generations to meet the expected frequencies. Signed-off-by: Akhil R <akhilrajeev@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> (cherry picked from commit 6e3cb25) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138238 On Tegra264, not all I2C controllers have the necessary interface to GPC DMA, this causes failures when function tegra_i2c_init_dma() is called. Ensure that "dmas" device-tree property is present before initializing DMA in function tegra_i2c_init_dma(). Signed-off-by: Kartik Rajput <kkartik@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> (backported from https://lore.kernel.org/linux-tegra/20251118140620.549-1-akhilrajeev@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

…stplus BugLink: https://bugs.launchpad.net/bugs/2138238 The current implementation uses a single value of THIGH, TLOW and setup hold time for both fast and fastplus. But these values can be different for each speed mode and should be using separate variables. Split the variables used for fast and fast plus mode. Signed-off-by: Akhil R <akhilrajeev@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> (backported from https://lore.kernel.org/linux-tegra/20251118140620.549-1-akhilrajeev@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138238 Update the timing parameters of Tegra256 so that the signals are complaint with the I2C specification for SCL low time. Signed-off-by: Akhil R <akhilrajeev@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> (backported from https://lore.kernel.org/linux-tegra/20251118140620.549-1-akhilrajeev@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138238 Add support for High Speed (HS) mode transfers for Tegra194 and later chips. While HS mode has been documented in the technical reference manuals since Tegra20, the hardware implementation appears to be broken on all chips prior to Tegra194. When HS mode is not supported, set the frequency to FM+ instead. Signed-off-by: Akhil R <akhilrajeev@nvidia.com> Signed-off-by: Kartik Rajput <kkartik@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> (backported from https://lore.kernel.org/linux-tegra/20251118140620.549-1-akhilrajeev@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138238 Add support for SW mutex register introduced in Tegra264 to provide an option to share the interface between multiple firmwares and/or VMs. This involves following steps: - A firmware/OS writes its unique ID to the mutex REQUEST field. - Ownership is established when reading the GRANT field returns the same ID. - If GRANT shows a different non-zero ID, the firmware/OS retries until timeout. - After completing access, it releases the mutex by writing 0. However, the hardware does not ensure any protection based on the values. The driver/firmware should honor the peer who already holds the mutex. Signed-off-by: Kartik Rajput <kkartik@nvidia.com> Signed-off-by: Akhil R <akhilrajeev@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> (backported from https://lore.kernel.org/linux-tegra/20251118140620.549-1-akhilrajeev@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138238 Add support for Tegra264 SoC which supports 17 generic I2C controllers, two of which are in the AON (always-on) partition of the SoC. In addition to the features supported by Tegra194 it also supports a SW mutex register to allow sharing the same I2C instance across multiple firmware. Signed-off-by: Akhil R <akhilrajeev@nvidia.com> Signed-off-by: Kartik Rajput <kkartik@nvidia.com> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> (backported from https://lore.kernel.org/linux-tegra/20251118140620.549-1-akhilrajeev@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

…y DVC and VI BugLink: https://bugs.launchpad.net/bugs/2138238 Replace the per-instance boolean flags with an enum tegra_i2c_variant since DVC and VI are mutually exclusive. Update IS_DVC/IS_VI and variant initialization accordingly. Suggested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Kartik Rajput <kkartik@nvidia.com> (backported from https://lore.kernel.org/all/20260107142649.14917-1-kkartik@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138238 Move the variant field into tegra_i2c_hw_feature and populate it for all SoCs. Add dedicated SoC data for "nvidia,tegra20-i2c-dvc" and "nvidia,tegra210-i2c-vi" compatibles. Drop the compatible-string checks from tegra_i2c_parse_dt to initialize the Tegra I2C variant. Signed-off-by: Kartik Rajput <kkartik@nvidia.com> (backported from https://lore.kernel.org/all/20260107142649.14917-1-kkartik@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

…r offsets BugLink: https://bugs.launchpad.net/bugs/2138238 Tegra410 use different offsets for existing I2C registers, update the logic to use appropriate offsets per SoC. As the registers offsets are now also defined for dvc and vi, following function are not required and they are removed: - tegra_i2c_reg_addr(): No translation required. - dvc_writel(): Replaced with i2c_writel() with DVC check. - dvc_readl(): Replaced with i2c_readl(). Signed-off-by: Kartik Rajput <kkartik@nvidia.com> (backported from https://lore.kernel.org/all/20260107142649.14917-1-kkartik@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2138238 Add support for the Tegra410 SoC, which has 4 I2C controllers. The controllers are feature-equivalent to Tegra264; only the register offsets differ. Signed-off-by: Kartik Rajput <kkartik@nvidia.com> (backported from https://lore.kernel.org/all/20260107142649.14917-1-kkartik@nvidia.com/) Signed-off-by: Matthew R. Ochs <mochs@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Abdur Rahman <abdur.rahman@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Signed-off-by: Brad Figg <bfigg@nvidia.com>

Each CPU collects data for telemetry events that it sends to the nearest telemetry event aggregator either when the value of MSR_IA32_PQR_ASSOC.RMID changes, or when a two millisecond timer expires. There is a feature type ("energy" or "perf"), GUID, and MMIO region associated with each aggregator. This combination links to an XML description of the set of telemetry events tracked by the aggregator. XML files are published by Intel in a GitHub repository¹. The telemetry event aggregators maintain per-RMID per-event counts of the total seen for all the CPUs. There may be multiple telemetry event aggregators per package. There are separate sets of aggregators for each feature type. Aggregators in a set may have different GUIDs. All aggregators with the same feature type and GUID are symmetric keeping counts for the same set of events for the CPUs that provide data to them. The XML file for each aggregator provides the following information: 0) Feature type of the events ("perf" or "energy") 1) Which telemetry events are tracked by the aggregator. 2) The order in which the event counters appear for each RMID. 3) The value type of each event counter (integer or fixed-point). 4) The number of RMIDs supported. 5) Which additional aggregator status registers are included. 6) The total size of the MMIO region for an aggregator. Introduce struct event_group that condenses the relevant information from an XML file. Hereafter an "event group" refers to a group of events of a particular feature type (event_group::pfname set to "energy" or "perf") with a particular GUID. Use event_group::pfname to determine the feature id needed to obtain the aggregator details. It will later be used in console messages and with the rdt= boot parameter. The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events. This driver provides intel_pmt_get_regions_by_feature() to list all available telemetry event aggregators of a given feature type. The list includes the "guid", the base address in MMIO space for the region where the event counters are exposed, and the package id where the all the CPUs that report to this aggregator are located. Call INTEL_PMT_TELEMETRY's intel_pmt_get_regions_by_feature() for each event group to obtain a private copy of that event group's aggregator data. Duplicate the aggregator data between event groups that have the same feature type but different GUID. Further processing on this private copy will be unique to the event group. ¹https://github.com/intel/Intel-PMT [ bp: Zap text explaining the code, s/guid/GUID/g ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit 1fb2daa) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…GUIDs The telemetry event aggregators of the Intel Clearwater Forest CPU support two RMID-based feature types: "energy" with GUID 0x26696143¹, and "perf" with GUID 0x26557651². The event counter offsets in an aggregator's MMIO space are arranged in groups for each RMID. E.g., the "energy" counters for GUID 0x26696143 are arranged like this: MMIO offset:0x0000 Counter for RMID 0 PMT_EVENT_ENERGY MMIO offset:0x0008 Counter for RMID 0 PMT_EVENT_ACTIVITY MMIO offset:0x0010 Counter for RMID 1 PMT_EVENT_ENERGY MMIO offset:0x0018 Counter for RMID 1 PMT_EVENT_ACTIVITY ... MMIO offset:0x23F0 Counter for RMID 575 PMT_EVENT_ENERGY MMIO offset:0x23F8 Counter for RMID 575 PMT_EVENT_ACTIVITY After all counters there are three status registers that provide indications of how many times an aggregator was unable to process event counts, the time stamp for the most recent loss of data, and the time stamp of the most recent successful update. MMIO offset:0x2400 AGG_DATA_LOSS_COUNT MMIO offset:0x2408 AGG_DATA_LOSS_TIMESTAMP MMIO offset:0x2410 LAST_UPDATE_TIMESTAMP Define event_group structures for both of these aggregator types and define the events tracked by the aggregators in the file system code. PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY are produced in fixed point format. File system code must output as floating point values. ¹https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml ²https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml [ bp: Massage commit message. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit 8f6b6ad) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

The resctrl file system layer passes the domain, RMID, and event id to the architecture to fetch an event counter. Fetching a telemetry event counter requires additional information that is private to the architecture, for example, the offset into MMIO space from where the counter should be read. Add mon_evt::arch_priv that architecture can use for any private data related to the event. The resctrl filesystem initializes mon_evt::arch_priv when the architecture enables the event and passes it back to architecture when needing to fetch an event counter. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (backported from commit 8ccb1f8) [fenghuay: fix minor conflicts in __check_limbo()] Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Every event group has a private copy of the data of all telemetry event aggregators (aka "telemetry regions") tracking its feature type. Included may be regions that have the same feature type but tracking different GUID from the event group's. Traverse the event group's telemetry region data and mark all regions that are not usable by the event group as unusable by clearing those regions' MMIO addresses. A region is considered unusable if: 1) GUID does not match the GUID of the event group. 2) Package ID is invalid. 3) The enumerated size of the MMIO region does not match the expected value from the XML description file. Hereafter any telemetry region with an MMIO address is considered valid for the event group it is associated with. Enable all the event group's events as long as there is at least one usable region from where data for its events can be read. Enabling of an event can fail if the same event has already been enabled as part of another event group. It should never happen that the same event is described by different GUID supported by the same system so just WARN (via resctrl_enable_mon_event()) and skip the event. Note that it is architecturally possible that some telemetry events are only supported by a subset of the packages in the system. It is not expected that systems will ever do this. If they do the user will see event files in resctrl that always return "Unavailable". Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit 7e6df96) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Introduce intel_aet_read_event() to read telemetry events for resource RDT_RESOURCE_PERF_PKG. There may be multiple aggregators tracking each package, so scan all of them and add up all counters. Aggregators may return an invalid data indication if they have received no records for a given RMID. The user will see "Unavailable" if none of the aggregators on a package provide valid counts. Resctrl now uses readq() so depends on X86_64. Update Kconfig. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit 51541f6) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Population of a monitor group's mon_data directory is unreasonably complicated because of the support for Sub-NUMA Cluster (SNC) mode. Split out the SNC code into a helper function to make it easier to add support for a new telemetry resource. Move all the duplicated code to make and set owner of domain directories into the mon_add_all_files() helper and rename to _mkdir_mondata_subdir(). Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit 0ec1db4) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Clearing a monitor group's mon_data directory is complicated because of the support for Sub-NUMA Cluster (SNC) mode. Refactor the SNC case into a helper function to make it easier to add support for a new telemetry resource. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit 93d9fd8) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…_PKG The L3 resource has several requirements for domains. There are per-domain structures that hold the 64-bit values of counters, and elements to keep track of the overflow and limbo threads. None of these are needed for the PERF_PKG resource. The hardware counters are wide enough that they do not wrap around for decades. Define a new rdt_perf_pkg_mon_domain structure which just consists of the standard rdt_domain_hdr to keep track of domain id and CPU mask. Update resctrl_online_mon_domain() for RDT_RESOURCE_PERF_PKG. The only action needed for this resource is to create and populate domain directories if a domain is added while resctrl is mounted. Similarly resctrl_offline_mon_domain() only needs to remove domain directories. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit f4e0cd8) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Legacy resctrl features are enumerated by X86_FEATURE_* flags. These may be overridden by quirks to disable features in the case of errata. Users can use kernel command line options to either disable a feature, or to force enable a feature that was disabled by a quirk. A different approach is needed for hardware features that do not have an X86_FEATURE_* flag. Update parsing of the "rdt=" boot parameter to call the telemetry driver directly to handle new "perf" and "energy" options that controls activation of telemetry monitoring of the named type. By itself a "perf" or "energy" option controls the forced enabling or disabling (with ! prefix) of all event groups of the named type. A ":guid" suffix allows for fine grained control per event group. [ bp: s/intel_aet_option/intel_handle_aet_option/g ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (backported from commit 842e7f9) [fenghuay: fix a minor conflict in kernel-parameters.txt doc] Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

There are now three meanings for "number of RMIDs": 1) The number for legacy features enumerated by CPUID leaf 0xF. This is the maximum number of distinct values that can be loaded into MSR_IA32_PQR_ASSOC. Note that systems with Sub-NUMA Cluster mode enabled will force scaling down the CPUID enumerated value by the number of SNC nodes per L3-cache. 2) The number of registers in MMIO space for each event. This is enumerated in the XML files and is the value initialized into event_group::num_rmid. 3) The number of "hardware counters" (this isn't a strictly accurate description of how things work, but serves as a useful analogy that does describe the limitations) feeding to those MMIO registers. This is enumerated in telemetry_region::num_rmids returned by intel_pmt_get_regions_by_feature(). Event groups with insufficient "hardware counters" to track all RMIDs are difficult for users to use, since the system may reassign "hardware counters" at any time. This means that users cannot reliably collect two consecutive event counts to compute the rate at which events are occurring. Disable such event groups by default. The user may override this with a command line "rdt=" option. In this case limit an under-resourced event group's number of possible monitor resource groups to the lowest number of "hardware counters". Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG resource "num_rmid" value to the smallest of these values as this value will be used later to compare against the number of RMIDs supported by other resources to determine how many monitoring resource groups are supported. N.B. Change type of resctrl_mon::num_rmid to u32 to match its usage and the type of event_group::num_rmid so that min(r->num_rmid, e->num_rmid) won't complain about mixing signed and unsigned types. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit 67640e3) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

closid_num_dirty_rmid[] and rmid_ptrs[] are allocated together during resctrl initialization and freed together during resctrl exit. Telemetry events are enumerated on resctrl mount so only at resctrl mount will the number of RMID supported by all monitoring resources and needed as size for rmid_ptrs[] be known. Separate closid_num_dirty_rmid[] and rmid_ptrs[] allocation and free in preparation for rmid_ptrs[] to be allocated on resctrl mount. Keep the rdtgroup_mutex protection around the allocation and free of closid_num_dirty_rmid[] as ARM needs this to guarantee memory ordering. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit ee7f6af) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

resctrl assumes that only the L3 resource supports monitor events, so it simply takes the rdt_resource::num_rmid from RDT_RESOURCE_L3 as the system's number of RMIDs. The addition of telemetry events in a different resource breaks that assumption. Compute the number of available RMIDs as the minimum value across all mon_capable resources (analogous to how the number of CLOSIDs is computed across alloc_capable resources). Note that mount time enumeration of the telemetry resource means that this number can be reduced. If this happens, then some memory will be wasted as the allocations for rdt_l3_mon_domain::mbm_states[] and rdt_l3_mon_domain::rmid_busy_llc created during resctrl initialization will be larger than needed. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit 0ecc988) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

L3 monitor features are enumerated during resctrl initialization and rmid_ptrs[] that tracks all RMIDs and depends on the number of supported RMIDs is allocated during this time. Telemetry monitor features are enumerated during first resctrl mount and may support a different number of RMIDs compared to L3 monitor features. Delay allocation and initialization of rmid_ptrs[] until first mount. Since the number of RMIDs cannot change on later mounts, keep the same set of rmid_ptrs[] until resctrl_exit(). This is required because the limbo handler keeps running after resctrl is unmounted and needs to access rmid_ptrs[] as it keeps tracking busy RMIDs after unmount. Rename routines to match what they now do: dom_data_init() -> setup_rmid_lru_list() dom_data_exit() -> free_rmid_lru_list() Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (backported from commit d089164) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> [fenghuay: fix minor conflicts in setup_rmid_lru_list() and dom_data_exit()] Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Since telemetry events are enumerated on resctrl mount the RDT_RESOURCE_PERF_PKG resource is not considered "monitoring capable" during early resctrl initialization. This means that the domain list for RDT_RESOURCE_PERF_PKG is not built when the CPU hotplug notifiers are registered and run for the first time right after resctrl initialization. Mark the RDT_RESOURCE_PERF_PKG as "monitoring capable" upon successful telemetry event enumeration to ensure future CPU hotplug events include this resource and initialize its domain list for CPUs that are already online. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit 4bbfc90) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Update resctrl filesystem documentation with the details about the resctrl files that support telemetry events. [ bp: Drop the debugfs hunk of the documentation until a better debugging solution is found. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com (cherry picked from commit a8848c4) Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…rl L3 domain and arch API updates Upstream resctrl renamed the L3 monitor domain type and extended the arch hooks: 1. Use struct rdt_l3_mon_domain in MPAM's resctrl integration, 2. Pass struct rdt_domain_hdr * into resctrl_online_mon_domain() / resctrl_offline_mon_domain(), 3. Match the new resctrl_arch_rmid_read() prototype (header pointer + arch_priv). 4. Update resctrl_arch_cntr_read(), resctrl_arch_reset_rmid(), resctrl_arch_reset_cntr(), and resctrl_arch_config_cntr() to take struct rdt_l3_mon_domain *. 5. Call the new resctrl_enable_mon_event() signature when wiring monitor events and set mon_capable from its return value. 6. Add a no-op resctrl_arch_pre_mount() so MPAM builds with the generic resctrl mount path. Fixes: a42549e ("NVIDIA: SAUCE: arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation") Fixes: ae2a29c ("NVIDIA: SAUCE: arm_mpam: resctrl: Add support for csu counters") Fixes: 1cbc0f2 ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use") Fixes: dd44394 ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid()") Fixes: 8429670 ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_cntr_read() & resctrl_arch_reset_cntr()") Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…rors No need to destory MSC instance for the user/admin programming errors sicne it's not causing any functional issues. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (cherry picked from 316e5833ccb2ef66f50290e48c45b70bf286c8fd dev/dev-main-nvidia-pset-linux-6.19.6) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

In a NUMA system, each node may include CPUs, memory, MPAM MSC instances, or any combination thereof. Some high-end servers may have NUMA nodes that include MPAM MSC but no CPUs. In such cases, associate all possible CPUs for those MSCs. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (cherry picked from f902b5abf39fe10a50b7062dc9ae9d2cfc723248 dev/dev-main-nvidia-pset-linux-6.19.6) Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…ring domain setup The current MPAM driver only considers the first component associated with an online/offline CPU during domain creation and teardown. This is insufficient, as CPU-initiated traffic may traverse multiple MSCs before reaching the target, and each MSC must be programmed consistently for proper resource partitioning. Update the MPAM driver to include all components associated with a given CPU during domain setup/teardown to expose expected schemata to userspace for effective resource control. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (backported from 4309ce9856f87170670c9db40546d9f2fc9dbb86 dev/dev-main-nvidia-pset-linux-6.19.6) [fenghuay: In addition to the core change, this backport includes the following adaptations to bridge the gap between the 24.04 (6.17) MPAM driver and the 6.19.6 base the original was written against: - Add for_each_mpam_resctrl_control() and for_each_mpam_resctrl_mon() iteration macros (from pset c15c066 and 4f42221) - Add MPAM_MAX_EVENT constant to bound the monitor event array - Add traffic_matches_l3() to validate that a memory-class MSC's traffic matches L3 egress topology (from pset ebc0760) Remove redundant if (class->type != MPAM_CLASS_MEMORY) - Replace exposed_alloc_capable/exposed_mon_capable static bools with dynamic resctrl_arch_alloc_capable()/resctrl_arch_mon_capable() that iterate over resources - Change mpam_resctrl_offline_cpu() return type from int to void - Change mpam_resctrl_monitor_init() return type from void to int and propagate errors - Change num_rmid from mpam_pmg_max + 1 to resctrl_arch_system_num_rmid_idx() - Use guard(mutex) for domain_list_lock - Use INIT_LIST_HEAD_RCU for domain lists - Fix not found mba issue on GMEM by only checking traffic_matches_l3() in mpam_resctrl_pick_mba() on class that doesn't have NUMA node] Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…onfig Reset an RIS by building a default mpam_config and applying it via mpam_reprogram_ris_partid(), like any other config. - mpam_init_reset_cfg(): set features and default values only for controls supported by the RIS (cpor_part, mbw_part, mbw_max, mbw_prop, cmax_cmax, cmax_cmin). Use full masks for CPBM/MBW_PBM and MPAMCFG_* defaults for MBW_MAX, CMAX, CMIN. - mpam_reprogram_ris_partid(): apply cfg for all supported controls (no separate reset path). Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (backported from c076b208842db87ed50b1c63cff302975a9c8f67 dev/dev-main-nvidia-pset-linux-6.19.6) [fenghuay: Fix porting conflicts and compilaton errors. Remove this sentence in the commit message to avoid confusion because MBW_PROP feature is not supported on Vera/Grace: "Include mpam_feat_mbw_prop when supported so MBW_PROP is written to 0 on reset."] Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

There is no struct arm_smmu_domain context for domains configured with identity mappings. Use the device to obtain the necessary information to program PARTID and PMGID. Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> (backported from e5020b38475ef58c5bb3d1a92028d4e0dd7aff4d dev/dev-main-nvidia-pset-linux-6.19.6) [fenghuay: Koba Ko fixes a typo in iommu_group_get_qos_params(): s/!ops->set_group_qos_params/!ops->get_group_qos_params/] Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…n mpam_msmon_read Resolve mpam_feat_msmon_mbwu to the concrete counter type (31/44/63) before mpam_has_feature() and before filling the mon_read arg. This avoids -EOPNOTSUPP when only a specific MBWU feature is set, and ensures _msmon_read() gets the resolved type in arg.type. Fixes: 5b91005 ("NVIDIA: SAUCE: arm_mpam: Use long MBWU counters if supported") Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

fyu1 · 2026-03-25T22:24:12Z

I fixed a blocking issue on GMEM test failure in the patch "NVIDIA: VR: SAUCE: arm_mpam: Include all associated MSC components during domain setup" and updated its commit message. Here is the fix patch:
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index f7c2bf8aba99..0accede8cc09 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -1162,7 +1162,9 @@ static void mpam_resctrl_pick_mba(void)
continue;
}

```
  if (!traffic_matches_l3(class)) {
```

  /* Check memory at egress from L3 for MSC with L3 */

  if (!cpumask_equal(&class->affinity, cpu_possible_mask) &&

      !traffic_matches_l3(class)) {
  	pr_debug("class %u traffic doesn't match L3 egress\n",
  		 class->level);
  	continue;

With this fix, I don't see MBA/MBM issue on GMEM test with an engineer built SBIOS enabling GPU MPAM.

If this PR is good for you, please merge it to 6.17 BaseOS.

Thank you very much for your help!

nvmochs · 2026-03-25T22:51:27Z

I fixed a blocking issue on GMEM test failure in the patch "NVIDIA: VR: SAUCE: arm_mpam: Include all associated MSC components during domain setup" and updated its commit message. Here is the fix patch: diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c index f7c2bf8aba99..0accede8cc09 100644 --- a/drivers/resctrl/mpam_resctrl.c +++ b/drivers/resctrl/mpam_resctrl.c @@ -1162,7 +1162,9 @@ static void mpam_resctrl_pick_mba(void) continue; }
  if (!traffic_matches_l3(class)) {
  /* Check memory at egress from L3 for MSC with L3 */
  if (!cpumask_equal(&class->affinity, cpu_possible_mask) &&
      !traffic_matches_l3(class)) {
  	pr_debug("class %u traffic doesn't match L3 egress\n",
  		 class->level);
  	continue;
With this fix, I don't see MBA/MBM issue on GMEM test with an engineer built SBIOS enabling GPU MPAM.

If this PR is good for you, please merge it to 6.17 BaseOS.

Thank you very much for your help!

Re-reviewed and confirmed this was the only change. No issues with the change.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

jamieNguyenNVIDIA · 2026-03-25T23:11:40Z

Acked-by: Jamie Nguyen <jamien@nvidia.com>

clsotog · 2026-03-24T14:52:28Z

drivers/resctrl/mpam_resctrl.c

+		return false;
+	}
+
+	cpu = cpumask_any_and(&class->affinity, cpu_online_mask);


Should we put a check cpu >= nr_cpu_ids like in function topology_matches_l3.

Although adding another sanity checking doesn't hurt, without the sanity checking, there won't be any issue because the next statements will check any invalid cpu anyway:
err = find_l3_equivalent_bitmask(cpu, tmp_cpumask);
if (err) {

Great, thanks for looking.

nvmochs · 2026-03-25T23:21:40Z

PR sent to Canonical.

ianm-nv and others added 30 commits February 13, 2026 16:49

NVIDIA: [Config]: Update annotations

b45d6cb

Ignore: yes Signed-off-by: Ian May <ianm@nvidia.com> Signed-off-by: Jacob Martin <jacob.martin@canonical.com>

UBUNTU: Start new release

bd20a6b

Ignore: yes Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

UBUNTU: link-to-tracker: update tracking bug

c639128

BugLink: https://bugs.launchpad.net/bugs/2137561 Properties: no-test-build Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

UBUNTU: [Packaging] debian.nvidia-6.17/dkms-versions -- update from k…

e8d685e

…ernel-versions (main/d2025.12.18) BugLink: https://bugs.launchpad.net/bugs/1786013 Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

UBUNTU: Ubuntu-nvidia-6.17-6.17.0-1007.7

92bdfd3

Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

UBUNTU: Start new release

1fad63d

Ignore: yes Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

UBUNTU: link-to-tracker: update tracking bug

5633069

BugLink: https://bugs.launchpad.net/bugs/2138765 Properties: no-test-build Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

UBUNTU: Ubuntu-nvidia-6.17-6.17.0-1008.8

93fceb2

Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

aegl and others added 22 commits March 25, 2026 22:18

fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras.fixes2 branch from cc8ab11 to ecd11fd Compare March 25, 2026 22:19

nvmochs changed the title ~~Please merge MPAM fixes branch: 24.04 linux nvidia 6.17 next.mpam.extras.fixes2~~ [linux-nvidia-6.17] Backport MPAM fixes and support for CPU-less NUMA nodes Mar 25, 2026

clsotog reviewed Mar 25, 2026

View reviewed changes

nvidia-bfigg force-pushed the 24.04_linux-nvidia-6.17-next branch from 9364d8b to 8dab82a Compare April 2, 2026 12:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[linux-nvidia-6.17] Backport MPAM fixes and support for CPU-less NUMA nodes#348

[linux-nvidia-6.17] Backport MPAM fixes and support for CPU-less NUMA nodes#348
fyu1 wants to merge 563 commits intoNVIDIA:24.04_linux-nvidia-6.17-nextfrom
fyu1:24.04_linux-nvidia-6.17-next.mpam.extras.fixes2

fyu1 commented Mar 20, 2026 •

edited by nvmochs

Loading

Uh oh!

fyu1 commented Mar 25, 2026 •

edited

Loading

Uh oh!

nvmochs commented Mar 25, 2026

Uh oh!

jamieNguyenNVIDIA commented Mar 25, 2026

Uh oh!

clsotog Mar 24, 2026

Uh oh!

fyu1 Mar 26, 2026

Uh oh!

clsotog Mar 26, 2026

Uh oh!

nvmochs commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

fyu1 commented Mar 20, 2026 • edited by nvmochs Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fyu1 commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nvmochs commented Mar 25, 2026

Uh oh!

jamieNguyenNVIDIA commented Mar 25, 2026

Uh oh!

clsotog Mar 24, 2026

Choose a reason for hiding this comment

Uh oh!

fyu1 Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

clsotog Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

nvmochs commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

fyu1 commented Mar 20, 2026 •

edited by nvmochs

Loading

fyu1 commented Mar 25, 2026 •

edited

Loading