Background
==========
On x86, resctrl allows creating up to num_rmids monitoring groups
under a parent control group. However, ARM64 MPAM is currently limited by
the PMG (Performance Monitoring Group) count, which is typically much
smaller than the theoretical RMID limit. This creates a significant
scalability gap: users expecting fine-grained per-process or per-thread
monitoring quickly exhaust the PMG space, even when plenty of reqPARTIDs
remain available.
The Narrow-PARTID feature, defined in the ARM MPAM architecture,
addresses this by associating reqPARTIDs with intPARTIDs through a
programmable many-to-one mapping. This allows the kernel to present more
logical monitoring contexts.
Design Overview
===============
The implementation extends the RMID encoding to carry reqPARTID
information:
RMID = reqPARTID * NUM_PMG + PMG
In this patchset, a monitoring group is uniquely identified by the
combination of reqPARTID and PMG. The closid is represented by intPARTID,
which is exactly the original PARTID.
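As an illustration only (this is not code from the series), the encoding
above can be sketched in C. The helper names and the NUM_PMG value below
are hypothetical; in the driver the PMG count is discovered from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value: the real PMG count comes from the hardware. */
#define NUM_PMG 4u

/* Pack (reqPARTID, PMG) into an RMID: RMID = reqPARTID * NUM_PMG + PMG. */
static inline uint32_t rmid_encode(uint32_t reqpartid, uint32_t pmg)
{
	assert(pmg < NUM_PMG);
	return reqpartid * NUM_PMG + pmg;
}

/* Recover the reqPARTID component of an RMID. */
static inline uint32_t rmid_to_reqpartid(uint32_t rmid)
{
	return rmid / NUM_PMG;
}

/* Recover the PMG component of an RMID. */
static inline uint32_t rmid_to_pmg(uint32_t rmid)
{
	return rmid % NUM_PMG;
}
```

Because PMG is always smaller than NUM_PMG, the division and remainder
recover exactly the pair that was encoded.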
For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
driver exposes the full reqPARTID range directly. For heterogeneous
systems where some MSCs lack Narrow-PARTID support, the driver utilizes
PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring
capability. The sole exception is an MSC type that lacks Narrow-PARTID
support but uses percentage-based controls: such controls prevent
PARTIDs from being used as reqPARTIDs.
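The enablement rule described above can be sketched roughly as follows.
This is a simplified model under stated assumptions, not the driver's
code: `struct msc_class_caps` and `reqpartid_expansion_allowed()` are
hypothetical names, and each MSC type is reduced to two booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-MSC-type summary; not the driver's real data structure. */
struct msc_class_caps {
	bool has_narrowing;       /* supports Narrow-PARTID */
	bool has_percent_control; /* uses a percentage-based control */
};

/*
 * Expansion stays enabled unless some MSC type both lacks Narrow-PARTID
 * and relies on a percentage-based control.
 */
static inline bool reqpartid_expansion_allowed(const struct msc_class_caps *c,
					       size_t n)
{
	assert(c != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!c[i].has_narrowing && c[i].has_percent_control)
			return false;
	}
	return true;
}
```

An MSC type without narrowing but also without percentage-based controls
does not block the expansion in this model.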
Capability Improvements
=======================
--------------------------------------------------------------------------
The maximum        | Sub-monitoring groups             | System-wide
number of          | under a control group             | monitoring groups
--------------------------------------------------------------------------
Without reqPARTID  | PMG                               | intPARTID * PMG
--------------------------------------------------------------------------
reqPARTID          |                                   |
static allocation  | (reqPARTID // intPARTID) * PMG    | reqPARTID * PMG
--------------------------------------------------------------------------
reqPARTID          |                                   |
dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
--------------------------------------------------------------------------
Note: The number of intPARTIDs can be capped via the boot parameter
mpam.intpartid_max. Under MPAM, reqPARTID count is always greater than
or equal to intPARTID count.
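The formulas in the table can be checked with a small C sketch. The
struct and function names are hypothetical and exist only for this
illustration. With example counts of 256 reqPARTIDs, 16 intPARTIDs and
4 PMGs, the per-control-group maxima come out as 4 (no reqPARTID),
64 (static) and 964 (dynamic).

```c
#include <assert.h>

/* Hypothetical container for the three counts used by the table. */
struct mpam_counts {
	unsigned int reqpartid; /* number of reqPARTIDs */
	unsigned int intpartid; /* number of intPARTIDs (control groups) */
	unsigned int pmg;       /* number of PMGs */
};

/* Per-control-group maximum without reqPARTID expansion. */
static inline unsigned int per_group_baseline(struct mpam_counts c)
{
	return c.pmg;
}

/* Per-control-group maximum with static reqPARTID allocation. */
static inline unsigned int per_group_static(struct mpam_counts c)
{
	assert(c.intpartid > 0 && c.reqpartid >= c.intpartid);
	return (c.reqpartid / c.intpartid) * c.pmg;
}

/* Per-control-group maximum with dynamic reqPARTID allocation. */
static inline unsigned int per_group_dynamic(struct mpam_counts c)
{
	assert(c.intpartid > 0 && c.reqpartid >= c.intpartid);
	return (c.reqpartid - c.intpartid + 1) * c.pmg;
}

/* System-wide maximum without reqPARTID expansion. */
static inline unsigned int system_baseline(struct mpam_counts c)
{
	return c.intpartid * c.pmg;
}

/* System-wide maximum with static or dynamic reqPARTID allocation. */
static inline unsigned int system_expanded(struct mpam_counts c)
{
	return c.reqpartid * c.pmg;
}
```

The dynamic case leaves one reqPARTID with every other control group and
lets a single group use the rest, which is where the "+ 1" comes from.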
Series Structure
================
Patch 1: Fix pre-existing out-of-range PARTID issue between mount sessions.
Patches 2-6: Implement static reqPARTID allocation.
Patches 7-10: Implement dynamic reqPARTID allocation.
Changes
=======
Compared with v7:
- Add boot parameter to limit mpam_intpartid_max.
- Update the Narrow-PARTID enablement condition checks.
- Add default group detection in mpam_thread_switch().
- Correct patch series revision tag for consistency.
Compared with v6:
- Add dynamic reqPARTID allocation implementation.
- Add Patch 1 to fix pre-existing out-of-range PARTID issue.
- Drop original patch 4 which has been merged into the baseline.
Compared with v5:
- Redefine the RMID information.
- Refactor the resctrl_arch_rmid_idx_decode() and
resctrl_arch_rmid_idx_encode().
- Simplify closid_rmid2reqpartid() to rmid2reqpartid() and replace it
accordingly.
Compared with RFC-v4:
- Rebase the patch set on the v6.14-rc1 branch.
Compared with RFC-v3:
- Add limitation of the Narrow-PARTID feature (See Patch 2).
- Remove redundant reqpartid2closid() and reqpartid_pmg2rmid().
- Refactor closid_rmid2reqpartid() partially.
- Merge the PARTID conversion-related patches into a single patch for
bisectability.
- Skip adaptation of resctrl_arch_set_rmid() which is going to be
removed.
Compared with RFC-v2:
- Refactor closid/rmid pair translation.
- Simplify the configuration synchronization logic.
- Remove reqPARTID source bitmap.
Compared with RFC-v1:
- Rebase this patch set on latest MPAM driver of the v6.12-rc1 branch.
Previous Versions
=================
v7: https://lore.kernel.org/all/20260317132141.1272506-1-zengheng4@huawei.com/
v6: https://lore.kernel.org/all/20250222112448.2438586-1-zengheng4@huawei.com/
v5: https://lore.kernel.org/all/20250217031852.2014939-1-zengheng4@huawei.com/
RFC-v4: https://lore.kernel.org/all/20250104101224.873926-1-zengheng4@huawei.com/
RFC-v3: https://lore.kernel.org/all/20241207092136.2488426-1-zengheng4@huawei.com/
RFC-v2: https://lore.kernel.org/all/20241119135104.595630-1-zengheng4@huawei.com/
RFC-v1: https://lore.kernel.org/all/20241114135037.918470-1-zengheng4@huawei.com/
---
Zeng Heng (10):
fs/resctrl: Fix MPAM Partid parsing errors by preserving CDP state
during umount
arm_mpam: Add intPARTID and reqPARTID support for Narrow-PARTID
feature
arm_mpam: Disable reqPARTID expansion when Narrow-PARTID is
unavailable
arm_mpam: Refactor rmid to reqPARTID/PMG mapping
arm_mpam: Propagate control group config to sub-monitoring groups
arm_mpam: Add boot parameter to limit mpam_intpartid_max
fs/resctrl: Add rmid_entry state helpers
arm_mpam: Implement dynamic reqPARTID allocation for monitoring groups
fs/resctrl: Wire up rmid expansion and reclaim functions
arm_mpam: Add mpam_sync_config() for dynamic rmid expansion
arch/arm64/include/asm/mpam.h | 12 +-
arch/x86/include/asm/resctrl.h | 7 +
drivers/resctrl/mpam_devices.c | 103 ++++++++---
drivers/resctrl/mpam_internal.h | 6 +-
drivers/resctrl/mpam_resctrl.c | 294 ++++++++++++++++++++++++++++----
fs/resctrl/monitor.c | 50 +++++-
fs/resctrl/rdtgroup.c | 24 ++-
include/linux/arm_mpam.h | 17 ++
include/linux/resctrl.h | 21 +++
9 files changed, 469 insertions(+), 65 deletions(-)
--
2.25.1
On 2026/4/13 16:53, Zeng Heng wrote:
> Background
> ==========
>
> On x86, the resctrl allows creating up to num_rmids monitoring groups
> under parent control group. However, ARM64 MPAM is currently limited by
> the PMG (Performance Monitoring Group) count, which is typically much
> smaller than the theoretical RMID limit. This creates a significant
> scalability gap: users expecting fine-grained per-process or per-thread
> monitoring quickly exhaust the PMG space, even when plenty of reqPARTIDs
> remain available.
>
> The Narrow-PARTID feature, defined in the ARM MPAM architecture,
> addresses this by associating reqPARTIDs with intPARTIDs through a
> programmable many-to-one mapping. This allows the kernel to present more
> logical monitoring contexts.
>
> Design Overview
> ===============
>
> The implementation extends the RMID encoding to carry reqPARTID
> information:
>
> RMID = reqPARTID * NUM_PMG + PMG
>
> In this patchset, a monitoring group is uniquely identified by the
> combination of reqPARTID and PMG. The closid is represented by intPARTID,
> which is exactly the original PARTID.
>
> For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
> driver exposes the full reqPARTID range directly. For heterogeneous
> systems where some MSCs lack Narrow-PARTID support, the driver utilizes
> PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring
> capability. The sole exception is when any type of MSCs lack Narrow-PARTID
> support, their percentage-based control mechanism prevents the use of
> PARTIDs as reqPARTIDs.
>
> Capability Improvements
> =======================
>
> --------------------------------------------------------------------------
> The maximum | Sub-monitoring groups | System-wide
> number of | under a control group | monitoring groups
> --------------------------------------------------------------------------
> Without reqPARTID | PMG | intPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID | |
> static allocation | (reqPARTID // intPARTID) * PMG | reqPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID | |
> dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
> --------------------------------------------------------------------------
>
> Note: The number of intPARTIDs can be capped via the boot parameter
> mpam.intpartid_max. Under MPAM, reqPARTID count is always greater than
> or equal to intPARTID count.
>
Thank you for your review. Please find the test report below for this
patch set.
Test Environment:
- Kernel: linux/master (commit: 6596a02b207886e9e00bb0161c7fd59fea53c081)
- Architecture: arm64
- Platform: Kunpeng 950
- MPAM Features: 256 reqPARTIDs, 32 intPARTIDs, 4 PMGs
Functional Test Example:
~~~
# dmesg | grep -i pmg
[ 11.528919] MPAM enabled with 256 reqPARTIDs, 16 intPARTIDs and 4 PMGs
# cat /proc/cmdline
<...> mpam.intpartid_max=15
# cat /sys/module/mpam/parameters/intpartid_max
15
# mount -t resctrl resctrl /sys/fs/resctrl/ -o debug
# cd /sys/fs/resctrl/
# mkdir p1
# cd p1/mon_groups/
# mkdir m{1..963}
# ls m963
cpus cpus_list mon_data mon_hw_id tasks
~~~
We were able to create 964 resource groups ((256 - 15) * 4 = 964), which
matches the expected result.
MPAM KUnit Test Log:
~~~
[ 11.515338] KTAP version 1
[ 11.515340] 1..2
[ 11.515398] KTAP version 1
[ 11.515400] # Subtest: mpam_devices_test_suite
[ 11.515402] # module: mpam
[ 11.515405] 1..3
[ 11.515497] ok 1 test_mpam_reset_msc_bitmap
[ 11.515568] mpam:__props_mismatch: __props_mismatch took the min cmax_wd
[ 11.515579] mpam:__props_mismatch: cleared cpor_part
[ 11.515581] mpam:__props_mismatch: cleared mbw_part
[ 11.515592] mpam:__props_mismatch: took the min bwa_wd
[ 11.515594] mpam:__props_mismatch: __props_mismatch took the min cmax_wd
[ 11.515613] mpam:__props_mismatch: __props_mismatch took the min cmax_wd
[ 11.515658] ok 2 test_mpam_enable_merge_features
[ 11.515722] ok 3 test__props_mismatch
[ 11.515725] # mpam_devices_test_suite: pass:3 fail:0 skip:0 total:3
[ 11.515726] # Totals: pass:3 fail:0 skip:0 total:3
[ 11.515728] ok 1 mpam_devices_test_suite
[ 11.515730] KTAP version 1
[ 11.515732] # Subtest: mpam_resctrl_test_suite
[ 11.515734] # module: mpam
[ 11.515735] 1..6
[ 11.515792] ok 1 test_get_mba_granularity
[ 11.515795] KTAP version 1
[ 11.515797] # Subtest: test_mbw_max_to_percent
[ 11.515850] ok 1 pc=1, width=8, value=0x01
[ 11.515905] ok 2 pc=1, width=12, value=0x027
[ 11.515959] ok 3 pc=1, width=16, value=0x028e
[ 11.516011] ok 4 pc=25, width=8, value=0x3f
[ 11.516069] ok 5 pc=25, width=12, value=0x3ff
[ 11.516127] ok 6 pc=25, width=16, value=0x3fff
[ 11.516182] ok 7 pc=33, width=8, value=0x53
[ 11.516236] ok 8 pc=33, width=12, value=0x546
[ 11.516290] ok 9 pc=33, width=16, value=0x5479
[ 11.516345] ok 10 pc=35, width=8, value=0x58
[ 11.516401] ok 11 pc=35, width=12, value=0x598
[ 11.516456] ok 12 pc=35, width=16, value=0x5998
[ 11.516511] ok 13 pc=45, width=8, value=0x72
[ 11.516565] ok 14 pc=45, width=12, value=0x732
[ 11.516617] ok 15 pc=45, width=16, value=0x7332
[ 11.516673] ok 16 pc=50, width=8, value=0x7f
[ 11.516726] ok 17 pc=50, width=12, value=0x7ff
[ 11.516781] ok 18 pc=50, width=16, value=0x7fff
[ 11.516835] ok 19 pc=52, width=8, value=0x84
[ 11.516889] ok 20 pc=52, width=12, value=0x850
[ 11.516943] ok 21 pc=52, width=16, value=0x851d
[ 11.516997] ok 22 pc=55, width=8, value=0x8b
[ 11.517073] ok 23 pc=55, width=12, value=0x8cb
[ 11.517139] ok 24 pc=55, width=16, value=0x8ccb
[ 11.517196] ok 25 pc=58, width=8, value=0x93
[ 11.517254] ok 26 pc=58, width=12, value=0x946
[ 11.517309] ok 27 pc=58, width=16, value=0x9479
[ 11.517364] ok 28 pc=75, width=8, value=0xbf
[ 11.517418] ok 29 pc=75, width=12, value=0xbff
[ 11.517474] ok 30 pc=75, width=16, value=0xbfff
[ 11.517529] ok 31 pc=80, width=8, value=0xcb
[ 11.517583] ok 32 pc=80, width=12, value=0xccb
[ 11.517638] ok 33 pc=80, width=16, value=0xcccb
[ 11.517693] ok 34 pc=88, width=8, value=0xe0
[ 11.517750] ok 35 pc=88, width=12, value=0xe13
[ 11.517806] ok 36 pc=88, width=16, value=0xe146
[ 11.517861] ok 37 pc=95, width=8, value=0xf2
[ 11.517918] ok 38 pc=95, width=12, value=0xf32
[ 11.517972] ok 39 pc=95, width=16, value=0xf332
[ 11.518027] ok 40 pc=100, width=8, value=0xff
[ 11.518084] ok 41 pc=100, width=12, value=0xfff
[ 11.518141] ok 42 pc=100, width=16, value=0xffff
[ 11.518144] # test_mbw_max_to_percent: pass:42 fail:0 skip:0 total:42
[ 11.518146] ok 2 test_mbw_max_to_percent
[ 11.518149] KTAP version 1
[ 11.518151] # Subtest: test_percent_to_mbw_max
[ 11.518204] ok 1 pc=1, width=8, value=0x01
[ 11.518259] ok 2 pc=1, width=12, value=0x027
[ 11.518321] ok 3 pc=1, width=16, value=0x028e
[ 11.518380] ok 4 pc=25, width=8, value=0x3f
[ 11.518434] ok 5 pc=25, width=12, value=0x3ff
[ 11.518492] ok 6 pc=25, width=16, value=0x3fff
[ 11.518546] ok 7 pc=33, width=8, value=0x53
[ 11.518603] ok 8 pc=33, width=12, value=0x546
[ 11.518662] ok 9 pc=33, width=16, value=0x5479
[ 11.518718] ok 10 pc=35, width=8, value=0x58
[ 11.518776] ok 11 pc=35, width=12, value=0x598
[ 11.518832] ok 12 pc=35, width=16, value=0x5998
[ 11.518887] ok 13 pc=45, width=8, value=0x72
[ 11.518943] ok 14 pc=45, width=12, value=0x732
[ 11.519000] ok 15 pc=45, width=16, value=0x7332
[ 11.519055] ok 16 pc=50, width=8, value=0x7f
[ 11.519110] ok 17 pc=50, width=12, value=0x7ff
[ 11.519166] ok 18 pc=50, width=16, value=0x7fff
[ 11.519219] ok 19 pc=52, width=8, value=0x84
[ 11.519276] ok 20 pc=52, width=12, value=0x850
[ 11.519331] ok 21 pc=52, width=16, value=0x851d
[ 11.519387] ok 22 pc=55, width=8, value=0x8b
[ 11.519444] ok 23 pc=55, width=12, value=0x8cb
[ 11.519504] ok 24 pc=55, width=16, value=0x8ccb
[ 11.519561] ok 25 pc=58, width=8, value=0x93
[ 11.519616] ok 26 pc=58, width=12, value=0x946
[ 11.519676] ok 27 pc=58, width=16, value=0x9479
[ 11.519730] ok 28 pc=75, width=8, value=0xbf
[ 11.519785] ok 29 pc=75, width=12, value=0xbff
[ 11.519840] ok 30 pc=75, width=16, value=0xbfff
[ 11.519893] ok 31 pc=80, width=8, value=0xcb
[ 11.519946] ok 32 pc=80, width=12, value=0xccb
[ 11.519999] ok 33 pc=80, width=16, value=0xcccb
[ 11.520054] ok 34 pc=88, width=8, value=0xe0
[ 11.520106] ok 35 pc=88, width=12, value=0xe13
[ 11.520161] ok 36 pc=88, width=16, value=0xe146
[ 11.520217] ok 37 pc=95, width=8, value=0xf2
[ 11.520270] ok 38 pc=95, width=12, value=0xf32
[ 11.520323] ok 39 pc=95, width=16, value=0xf332
[ 11.520376] ok 40 pc=100, width=8, value=0xff
[ 11.520428] ok 41 pc=100, width=12, value=0xfff
[ 11.520481] ok 42 pc=100, width=16, value=0xffff
[ 11.520483] # test_percent_to_mbw_max: pass:42 fail:0 skip:0 total:42
[ 11.520485] ok 3 test_percent_to_mbw_max
[ 11.520487] KTAP version 1
[ 11.520488] # Subtest: test_mbw_max_to_percent_limits
[ 11.520540] ok 1 wd=1
[ 11.520594] ok 2 wd=2
[ 11.520647] ok 3 wd=3
[ 11.520700] ok 4 wd=4
[ 11.520752] ok 5 wd=5
[ 11.520807] ok 6 wd=6
[ 11.520858] ok 7 wd=7
[ 11.520909] ok 8 wd=8
[ 11.520962] ok 9 wd=9
[ 11.521017] ok 10 wd=10
[ 11.521078] ok 11 wd=11
[ 11.521138] ok 12 wd=12
[ 11.521193] ok 13 wd=13
[ 11.521245] ok 14 wd=14
[ 11.521296] ok 15 wd=15
[ 11.521346] ok 16 wd=16
[ 11.521348] # test_mbw_max_to_percent_limits: pass:16 fail:0 skip:0 total:16
[ 11.521350] ok 4 test_mbw_max_to_percent_limits
[ 11.521375] # test_percent_to_max_rounding: Round-up rate: 43% (18/42)
[ 11.521409] ok 5 test_percent_to_max_rounding
[ 11.521411] KTAP version 1
[ 11.521413] # Subtest: test_percent_max_roundtrip_stability
[ 11.521465] ok 1 wd=1
[ 11.521515] ok 2 wd=2
[ 11.521565] ok 3 wd=3
[ 11.521614] ok 4 wd=4
[ 11.521667] ok 5 wd=5
[ 11.521719] ok 6 wd=6
[ 11.521770] ok 7 wd=7
[ 11.521820] ok 8 wd=8
[ 11.521871] ok 9 wd=9
[ 11.521922] ok 10 wd=10
[ 11.521973] ok 11 wd=11
[ 11.522022] ok 12 wd=12
[ 11.522075] ok 13 wd=13
[ 11.522126] ok 14 wd=14
[ 11.522178] ok 15 wd=15
[ 11.522231] ok 16 wd=16
[ 11.522233] # test_percent_max_roundtrip_stability: pass:16 fail:0 skip:0 total:16
[ 11.522235] ok 6 test_percent_max_roundtrip_stability
[ 11.522237] # mpam_resctrl_test_suite: pass:6 fail:0 skip:0 total:6
[ 11.522238] # Totals: pass:118 fail:0 skip:0 total:118
[ 11.522240] ok 2 mpam_resctrl_test_suite
~~~
All 118 MPAM KUnit tests passed.
Tested-by: Zeng Heng <zengheng4@huawei.com>
Best regards,
Zeng Heng
Hi Shaopeng,

> Hello Zeng Heng,
>
> Could you tell me which branch this patch series based on?
>
> Best regards,
> Shaopeng TAN

As indicated in the patch series tags, this patch set applies to the
linux-next repository, specifically the master branch at:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git

Please keep me on the mailing list for follow-up responses if you want
timely feedback from me. I was accidentally dropped from the mailing list
in a previous thread (see
https://lore.kernel.org/all/TY4PR01MB16930EB1ACB3A3356A92169BC8B232@TY4PR01MB16930.jpnprd01.prod.outlook.com/).

Kind regards,
Zeng Heng
Hi Zeng,

(beware this is the first version I've seen - arm have been silently deleting your mail,
it looks like a problem with DKIM signatures)

On 13/04/2026 09:53, Zeng Heng wrote:
> Background
> ==========
>
> On x86, the resctrl allows creating up to num_rmids monitoring groups
> under parent control group. However, ARM64 MPAM is currently limited by
> the PMG (Performance Monitoring Group) count, which is typically much
> smaller than the theoretical RMID limit.

The MPAM PMG limit is 255. Is that not enough?

I think the real problem is the CHI interconnect protocol is forcing people
to only have 1 bit of PMG - regardless of what the architecture says. This
isn't an MPAM problem as such - it's an implementation issue.

(but we can try and work around it)

> This creates a significant
> scalability gap: users expecting fine-grained per-process or per-thread
> monitoring quickly exhaust the PMG space, even when plenty of reqPARTIDs
> remain available.

This is more about MPAM's philosophical stance that PMG extends PARTID, whereas
on x86 RMID is an independent number.

Please don't muddle these - it results in muddled patches!
If we want to try and attack both with narrowing, we should do them separately.

> The Narrow-PARTID feature, defined in the ARM MPAM architecture,
> addresses this by associating reqPARTIDs with intPARTIDs through a
> programmable many-to-one mapping. This allows the kernel to present more
> logical monitoring contexts.

I'd put this as "can be abused to avoid this problem"! We still have a problem with
controls that don't alias and need to be removed from MSC that don't support narrowing.
This isn't what the feature was designed for - but it is a really cool trick, it works
for some real platforms, and solves a problem seen in user-space.

However - throughout this series you seem to be discarding all the control-group support
for a monitoring-only setup that allocates intPARTID for everything. This might work for
your use-case on your platform, but it doesn't generalise to platforms without narrowing
or where multiple control-groups are needed.

> Design Overview
> ===============
>
> The implementation extends the RMID encoding to carry reqPARTID
> information:
>
> RMID = reqPARTID * NUM_PMG + PMG
>
> In this patchset, a monitoring group is uniquely identified by the
> combination of reqPARTID and PMG. The closid is represented by intPARTID,
> which is exactly the original PARTID.

The way I think of this is 'RMID' bits being spilled into PARTID. This
means each control group has a set of PARTID. For MSC using narrowing,
CLOSID would be the intPARTID value. But as you note, we need to support
mismatches:

> For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
> driver exposes the full reqPARTID range directly. For heterogeneous
> systems where some MSCs lack Narrow-PARTID support, the driver utilizes
> PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring
> capability. The sole exception is when any type of MSCs lack Narrow-PARTID
> support, their percentage-based control mechanism prevents the use of
> PARTIDs as reqPARTIDs.

It'd be good to have some discussion about what the interface between the
mpam_devices code and any other user (like resctrl) should be.

As a hypothetical system to think about:
64 PARTID at the L3, which support CPOR and CCAP
64 PARTID and narrowing to 16 at the SLC, which supports CPOR
64 PARTID and narrowing to 32 at the memory-controller, which support MBWU_MAX

I think whether using intPARTID is a benefit needs to be user-space policy.
You've likely got a platform where that choice is obvious - but it is a
trade-off as you lose the non-aliasing controls. In the example above, using
narrowing on this system means losing the CCAP controls on L3 as they don't alias [*].
Where it's a policy, it's likely to be one policy for resctrl, and another for any other
user.
We can get the resctrl glue code to turn it on unconditionally if there is no trade off,
I think that means: no non-aliasing controls in any class that doesn't support narrowing
- including 'unknown'. (we couldn't add them to resctrl in the future if you already chose
to enable this).

As for the interface with mpam_devices:
I think this means the resctrl glue code needs to be able to discover which
classes support intPARTID, and how many controls they actually have. From there
it can apply policy to determine whether it's better to support fewer features
in resctrl to get more RMID. (the alternative is always to ignore the MSC with
narrowing - narrowing lets hardware lie about the features it supports).

Currently the resctrl glue code has to program a configuration for two PARTID
when CDP is being hidden on the MB resource. This is ugly and fragile. I'd like
to explore generalising it as this narrowing stuff will also need to apply a
configuration to a set of PARTID when that MSC doesn't support narrowing.
In the example above, we'd need to discard the CCAP controls and write the same
CPOR bitmap to each PARTID that is mapped together by narrowing.

I think this means the resctrl glue code will need to be able to write a configuration
to controls using the full partid_max range as it does today. But also be able to set
the narrowing mapping on classes that support it.
For the monitors, the resctrl glue code will need to allocate and configure a set of
monitors, and read and sum them. This will be regardless of whether narrowing is
supported.

I think this means allocating a table of CLOSID to PARTID(s). the intPARTID would
always match the CLOSID. Monitors and non-narrowing MSC would need to walk the list.
I'm hoping we can make CDP a subset of this problem.
Some clever arithmetic may save allocating memory for a table - but if we change resctrl
to do this dynamically, the numbers become arbitrary forcing it to be a table.
It might also be possible to support moving monitor-groups between control groups with
the table driven approach. (see what you think on how complex it ends up ...)

I'd like to keep that grouping static for now, the table needs creating at setup time,
(+/- CDP), to avoid problems like you've found with CDP. This means the intpartid mappings
can be written once at setup time.

I'd like to avoid exposing user ABI to control this until we get it working, then we can
talk about whether to try making the grouping dynamically managed by resctrl. (there were
some proposals in that area - but I can't find them on lore).
If there are platforms where it's certainly not a trade-off, we can enable it
unconditionally - but I'm wary of this being "what we care about now", requiring user-abi
to enable features that were detectable.
e.g. we ignore an unknown MSC, and add a resctrl schema for it later - only we can't
expose it if we were using narrowing. Now it's a trade-off.

> Capability Improvements
> =======================
>
> --------------------------------------------------------------------------
> The maximum        | Sub-monitoring groups             | System-wide
> number of          | under a control group             | monitoring groups
> --------------------------------------------------------------------------
> Without reqPARTID  | PMG                               | intPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID          |                                   |
> static allocation  | (reqPARTID // intPARTID) * PMG    | reqPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID          |                                   |
> dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
> --------------------------------------------------------------------------
>
> Note: The number of intPARTIDs can be capped via the boot parameter
> mpam.intpartid_max. Under MPAM, reqPARTID count is always greater than
> or equal to intPARTID count.
>
> Series Structure
> ================
>
> Patch 1: Fix pre-existing out-of-range PARTID issue between mount sessions.
> Patches 2-6: Implement static reqPARTID allocation.
> Patches 7-10: Implement dynamic reqPARTID allocation.

I've had a hard time following this series. You dive in with invasive changes, then
unbreak things in later patches.

Please add the needed infrastructure in mpam_devices.c first. This should be free of
resctrl-isms, and 'only' needs reviewing against the architecture.
Then add the resctrl glue code stuff. That needs to comply with what resctrl expects.

I think the cleanest way to think about this is to break the mapping between CLOSID and
PARTID. We're effectively moving bits of RMID out of PMG into PARTID. Adding helpers to
explicitly do this early in those patches will make your changes clearer.
Please avoid spraying the narrowing terms for things everywhere.

Thanks,

James

[*] It's terminology from discussing this with Dave, just in case a summary is needed:
aliasing controls are like CPOR where two different PARTID with the same bitmap compete
for the same resource. If you give them each the same 50% of the portions, they can't
exceed that together.
non-aliasing controls are like CCAP where two different PARTID with the same fraction
compete for different resources. If you give them each 50% of the capacity, it adds up to
100%. You can't represent 'the same' 50% using these controls.

Narrowing papers over this problem with its remapping table, which gives you a 'same'
property. For MSC that have controls of that shape - and where more monitors are desired
- we'd have to drop the controls.
I think "more monitors are desired" is going to need to be user-space policy. But we can
come back to how to do that later.
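
For illustration only, the table-driven idea discussed above (a CLOSID
owning a list of PARTIDs that monitor reads walk and sum) might look
roughly like the C sketch below. Every name here, including
`closid_partids`, `closid_mon_sum()` and the stand-in reader
`fake_mon_read()`, is hypothetical and is not taken from the MPAM driver
or from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARTIDS_PER_CLOSID 8 /* hypothetical bound for this sketch */

/* Hypothetical table entry: the PARTIDs grouped under one CLOSID. */
struct closid_partids {
	uint16_t partid[MAX_PARTIDS_PER_CLOSID];
	size_t nr;
};

/* Hypothetical callback that reads one monitor for a (PARTID, PMG) pair. */
typedef uint64_t (*mon_read_fn)(uint16_t partid, uint8_t pmg);

/* Stand-in reader so the sketch can be exercised without hardware. */
static inline uint64_t fake_mon_read(uint16_t partid, uint8_t pmg)
{
	return (uint64_t)partid * 10u + pmg;
}

/* Walk the CLOSID's PARTID list and sum the per-PARTID monitor values. */
static inline uint64_t closid_mon_sum(const struct closid_partids *e,
				      uint8_t pmg, mon_read_fn read)
{
	uint64_t sum = 0;

	assert(e && read && e->nr <= MAX_PARTIDS_PER_CLOSID);
	for (size_t i = 0; i < e->nr; i++)
		sum += read(e->partid[i], pmg);
	return sum;
}
```

A table like this keeps the grouping arbitrary, at the cost of memory and
a walk per read, whereas a purely arithmetic mapping would avoid the
table but fix which PARTIDs belong to which CLOSID.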
Hi James, On 2026/5/15 1:06, James Morse wrote: > Hi Zeng, > > (beware this is the first version I've seen - arm have been silently deleting your mail, > it looks like a problem with DKIM signatures) > Thanks for your informing. I will try to send community mails using huaweicloud email to avoid DKIM signature issues. Hope it works. > On 13/04/2026 09:53, Zeng Heng wrote: >> Background >> ========== >> >> On x86, the resctrl allows creating up to num_rmids monitoring groups >> under parent control group. However, ARM64 MPAM is currently limited by >> the PMG (Performance Monitoring Group) count, which is typically much >> smaller than the theoretical RMID limit. > > The MPAM PMG limit is 255. Is that not enough? > > I think the real problem is the CHI interconnect protocol is forcing people > to only have 1 bit of PMG - regardless of what the architecture says. This > isn't an MPAM problem as such - its an implementation issue. > > (but we can try and work around it) > Yes, the architecture theoretically allows PMG to be up to 8 bits wide, but many platforms I've worked with (not just Kunpeng) implement far fewer bits in practice. > >> This creates a significant >> scalability gap: users expecting fine-grained per-process or per-thread >> monitoring quickly exhaust the PMG space, even when plenty of reqPARTIDs >> remain available. > > This is more about MPAM's philosophical stance that PMG extents PARTID, whereas > on x86 RMID is an independent number. > No value judgment here. ARM seeks to expand the number of monitoring groups by combining PARTID and PMG within limited bit-width constraints, which inherently introduces coupling between the two. > Please don't muddle these - it results in muddled patches! > If we want to try and attack both with narrowing, we should do them separately. > > >> The Narrow-PARTID feature, defined in the ARM MPAM architecture, >> addresses this by associating reqPARTIDs with intPARTIDs through a >> programmable many-to-one mapping. 
This allows the kernel to present more >> logical monitoring contexts. > > I'd put this as "can be abused to avoid this problem"! We still have a problem with > controls that don't alias and need to be removed from MSC that don't support narrowing. > This isn't what the feature was designed for - but it is a really cool trick, it works > for some real platforms, and solves a problem seen in user-space. > > However - throughout this series you seem to be discarding all the control-group support > for a monitoring-only setup that allocates intPARTID for everything. This might work for > your use-case on your platform, but it doesn't generalise to platforms without narrowing > or where multiple control-groups are needed. > Currently, for MSCs that have non-aliasing controls but do not support the Narrow PARTID feature, this solution will directly disable itself, rather than hiding the non-aliasing control capabilities (Patch 3: https://lore.kernel.org/all/20260413085405.1166412-4-zengheng4@huawei.com/). This does indeed affect the enablement of this solution on MSC systems without narrowing capability. On the contrary, the solution attempts to preserve as many intPARTIDs (i.e., control groups) as original. In principle, I hope that on systems where narrow PARTID was not previously enabled, this patch set can create as many monitoring groups as possible without changing any other functionality. And also allows users to limit the intpartid_max count via boot parameters. (Patch 6: https://lore.kernel.org/all/20260413085405.1166412-7-zengheng4@huawei.com/) > >> Design Overview >> =============== >> >> The implementation extends the RMID encoding to carry reqPARTID >> information: >> >> RMID = reqPARTID * NUM_PMG + PMG >> >> In this patchset, a monitoring group is uniquely identified by the >> combination of reqPARTID and PMG. The closid is represented by intPARTID, >> which is exactly the original PARTID. 
> > The way I think of this is 'RMID' bits being spilled into PARTID. This > means each control group has a set of PARTID. For MSC using narrowing, > CLOSID would be the intPARTID value. But as you note, we need to support > mismatches: > > Yes. >> For systems with homogeneous MSCs (all supporting Narrow-PARTID), the >> driver exposes the full reqPARTID range directly. For heterogeneous >> systems where some MSCs lack Narrow-PARTID support, the driver utilizes >> PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring >> capability. The sole exception is when any type of MSCs lack Narrow-PARTID >> support, their percentage-based control mechanism prevents the use of >> PARTIDs as reqPARTIDs. > > It'd be good to have some discussion about what the interface between the > mpam_devices code and any other user (like resctrl) should be. > > As a hypothetical system to think about: > 64 PARTID at the L3, which support CPOR and CCAP > 64 PARTID and narrowing to 16 at the SLC, which supoprts CPOR > 64 PARTID and narrowing to 32 at the memory-controller, which support MBWU_MAX > By the way, in this case, the L3 does not support NP and has CCAP, so the PARTID mapping extension(PME) is not enabled by default. If we exclude the L3 CCAP, the solution would support 16 control groups and (64 * PMG) monitoring groups. > I think whether using intPARTID is a benefit needs to be user-space policy. > You've likely got a platform where that choice is obvious - but it is a > trade-off as you lose the non-aliasing controls. In the example above, using > narrowing on this system means losing the CCAP controls on L3 as they don't alias [*]. > Where its a policy, its likely to be one policy for resctrl, and another for any other > user. > We can get the resctrl glue code to turn it on unconditionally if there is no trade off, > I think that means: no non-aliasing controls in any class that doesn't support narrowing > - including 'unknown'. 
> (we couldn't add them to resctrl in the future if you already chose
> to enable this).
>

Currently, after MPAM initialization, the PARTID mapping extension (PME)
is enabled by default unless some MSC both lacks NP support and has
non-aliasing controls. In that default case, enabling it is purely
beneficial, with no downsides. Going forward, we may consider adding a
`force_reqpartid` option to forcibly enable the feature and disable
non-aliasing controls.

> As for the interface with mpam_devices:
> I think this means the resctrl glue code needs to be able to discover which
> classes support intPARTID, and how many controls they actually have. From there
> it can apply to policy to determine whether it's better to support fewer features
> in resctrl to get more RMID. (the alternative is always to ignore the MSC with
> narrowing - narrowing lets hardware lie about the features it supports).
>
> Currently the resctrl glue code has to program a configuration for two PARTID
> when CDP is being hidden on the MB resource. This is ugly and fragile. I'd like
> to explore generalising it as this narrowing stuff will also need to apply a
> configuration to a set of PARTID when that MSC doesn't support narrowing.
> In the example above, we'd need to discard the CCAP controls and write the same
> CPOR bitmap to each PARTID that is mapped together by narrowing.
>

One option is to extend CDP compatibility using PME: L3DATA and L3CODE
would still be controlled separately, while MB control would be
consolidated onto a single intPARTID through the narrowing mapping. Of
course, this requires that the MB resource supports narrowing.

>
> I think this means the resctrl glue code will need to be able to write a configuration
> to controls using the full partid_max range as it does today. But also be able to set
> the narrowing mapping on classes that support it.
> For the monitors, the resctrl glue code will need to allocate and configure a set of
> monitors, and read and sum them.
> This will be regardless of whether narrowing is
> supported.
>
> I think this means allocating a table of CLOSID to PARTID(s). the intPARTID would
> always match the CLOSID. Monitors and non-narrowing MSC would need to walk the list.
> I'm hoping we can make CDP a subset of this problem.
> Some clever arithmetic may save allocating memory for a table - but if we change resctrl
> to do this dynamically, the numbers become arbitrary forcing it to be a table.
> It might also be possible to support moving monitor-groups between control groups with
> the table driven approach. (see what you think on how complex it ends up ...)
>

In the current patch series, static allocation uses a straightforward
arithmetic intPARTID-to-reqPARTID translation, while dynamic allocation
tracks the mappings in the `reqpartid_map` table.

> I'd like to keep that grouping static for now, the table needs creating at setup time,
> (+/- CDP), to avoid problems like you've found with CDP. This means the intpartid mappings
> can be written once at setup time.
>
> I'd like to avoid exposing user ABI to control this until we get it working, then we can
> talk about whether to try making the grouping dynamically managed by resctrl. (there were
> some proposals in that area - but I can't find them on lore).
> If there are platforms where it's certainly not a trade-off, we can enable it
> unconditionally - but I'm wary of this being "what we care about now", requiring user-abi
> to enable features that were detectable.
> e.g. we ignore an unknown MSC, and add a resctrl schema for it later - only we can't
> expose it if we were using narrowing. Now it's a trade-off.
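
For the static grouping discussed above, the "clever arithmetic"
alternative to a table could look roughly like the sketch below. The
strided layout (reqPARTID = CLOSID + i * number of intPARTIDs) is only an
assumption chosen so that the first PARTID of each set equals the
CLOSID/intPARTID; it is not necessarily the layout the series uses, and
the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of reqPARTIDs each control group gets under static allocation. */
static inline uint32_t partids_per_closid(uint32_t num_reqpartid,
					  uint32_t num_intpartid)
{
	assert(num_intpartid > 0 && num_reqpartid >= num_intpartid);
	return num_reqpartid / num_intpartid;
}

/*
 * Fill out[] with the PARTIDs grouped under one CLOSID and return how
 * many were written. ASSUMED layout: closid, closid + num_intpartid,
 * closid + 2 * num_intpartid, ... so out[0] is always the intPARTID.
 */
static inline size_t closid_to_partids(uint32_t closid,
				       uint32_t num_reqpartid,
				       uint32_t num_intpartid,
				       uint32_t *out, size_t out_len)
{
	uint32_t n = partids_per_closid(num_reqpartid, num_intpartid);
	size_t i;

	assert(closid < num_intpartid);
	for (i = 0; i < n && i < out_len; i++)
		out[i] = closid + (uint32_t)i * num_intpartid;
	return i;
}
```

For the hypothetical SLC above (64 PARTIDs narrowed to 16), each control
group gets 4 PARTIDs, and CLOSID 3 maps to {3, 19, 35, 51}. An MSC
without narrowing would have the same configuration written to each
PARTID in that set, and a monitor read would walk and sum over it.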
>
>
>> Capability Improvements
>> =======================
>>
>> --------------------------------------------------------------------------
>> The maximum        | Sub-monitoring groups             | System-wide
>> number of          | under a control group             | monitoring groups
>> --------------------------------------------------------------------------
>> Without reqPARTID  | PMG                               | intPARTID * PMG
>> --------------------------------------------------------------------------
>> reqPARTID          |                                   |
>> static allocation  | (reqPARTID // intPARTID) * PMG    | reqPARTID * PMG
>> --------------------------------------------------------------------------
>> reqPARTID          |                                   |
>> dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
>> --------------------------------------------------------------------------
>>
>> Note: The number of intPARTIDs can be capped via the boot parameter
>> mpam.intpartid_max. Under MPAM, reqPARTID count is always greater than
>> or equal to intPARTID count.
>>
>> Series Structure
>> ================
>>
>> Patch 1: Fix pre-existing out-of-range PARTID issue between mount sessions.
>> Patches 2-6: Implement static reqPARTID allocation.
>> Patches 7-10: Implement dynamic reqPARTID allocation.
>
> I've had a hard time following this series. You dive in with invasive changes, then
> unbreak things in later patches.
>
> Please add the needed infrastructure in mpam_devices.c first. This should be free of
> resctrl-isms, and 'only' needs reviewing against the architecture.
>
> Then add the resctrl glue code stuff. That needs to comply with what resctrl expects.
>
> I think the cleanest way to think about this is to break the mapping between CLOSID and
> PARTID. We're effectively moving bits of RMID out of PMG into PARTID. Adding helpers
> to explicitly do this early in those patches will make your changes clearer.
> Please avoid spraying the narrowing terms for things everywhere.
>
>

Sure, I'll reorder the series to introduce the core infrastructure in mpam_devices.c first.
Should I drop the dynamic allocation part from this series for now?

>
>
> [*] It's terminology from discussing this with Dave, just in case a summary is needed:
> aliasing controls are like CPOR where two different PARTID with the same bitmap
> compete for the same resource. If you give them each the same 50% of the portions,
> they can't exceed that together.
> non-aliasing controls are like CCAP where two different PARTID with the same fraction
> compete for different resources. If you give them each 50% of the capacity, it adds
> up to 100%. You can't represent 'the same' 50% using these controls.
>
> Narrowing papers over this problem with its remapping table, which gives you a 'same'
> property. For MSC that have controls of that shape - and where more monitors are
> desired - we'd have to drop the controls.
>
> I think "more monitors are desired" is going to need to be user-space policy. But
> we can come back to how to do that later.
>
>

I'm not sure whether anyone has formalized these as terminology before,
but I fully agree with the terms "aliasing controls" and "non-aliasing
controls"; they're instantly intuitive for software developers.

Best regards,
Zeng Heng