[PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature

Posted by Zeng Heng 1 month, 4 weeks ago
Background
==========

On x86, resctrl allows creating up to num_rmids monitoring groups
under a parent control group. However, ARM64 MPAM is currently limited by
the PMG (Performance Monitoring Group) count, which is typically much
smaller than the theoretical RMID limit. This creates a significant
scalability gap: users expecting fine-grained per-process or per-thread
monitoring quickly exhaust the PMG space, even when plenty of reqPARTIDs
remain available.

The Narrow-PARTID feature, defined in the ARM MPAM architecture,
addresses this by associating reqPARTIDs with intPARTIDs through a
programmable many-to-one mapping. This allows the kernel to present more
logical monitoring contexts.

Design Overview
===============

The implementation extends the RMID encoding to carry reqPARTID
information:

  RMID = reqPARTID * NUM_PMG + PMG

In this patchset, a monitoring group is uniquely identified by the
combination of reqPARTID and PMG. The closid is represented by intPARTID,
which is exactly the original PARTID.
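
As an illustration of this encoding (a minimal sketch, not code from the
series: the helper names rmid_encode() and rmid_decode() are hypothetical,
and num_pmg stands for NUM_PMG), the mapping and its inverse could be
written as:

```python
from typing import Tuple


def rmid_encode(reqpartid: int, pmg: int, num_pmg: int) -> int:
    """Pack (reqPARTID, PMG) into one RMID: reqPARTID * NUM_PMG + PMG."""
    if reqpartid < 0:
        raise ValueError("reqpartid must be non-negative")
    if not 0 <= pmg < num_pmg:
        raise ValueError("pmg must be in [0, num_pmg)")
    return reqpartid * num_pmg + pmg


def rmid_decode(rmid: int, num_pmg: int) -> Tuple[int, int]:
    """Unpack an RMID into (reqPARTID, PMG); inverse of rmid_encode()."""
    if rmid < 0 or num_pmg <= 0:
        raise ValueError("rmid must be non-negative and num_pmg positive")
    return divmod(rmid, num_pmg)
```

With 4 PMGs, for example, reqPARTID 5 and PMG 3 encode to RMID 23, and
RMID 23 decodes back to (5, 3).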

For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
driver exposes the full reqPARTID range directly. For heterogeneous
systems where some MSCs lack Narrow-PARTID support, the driver utilizes
PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring
capability. The sole exception is when an MSC class that lacks
Narrow-PARTID support uses a percentage-based control mechanism, which
prevents the use of PARTIDs as reqPARTIDs.

Capability Improvements
=======================

--------------------------------------------------------------------------
The maximum        |  Sub-monitoring groups            | System-wide
number of          |  under a control group            | monitoring groups
--------------------------------------------------------------------------
Without reqPARTID  |  PMG                              | intPARTID * PMG
--------------------------------------------------------------------------
reqPARTID          |                                   |
static allocation  | (reqPARTID // intPARTID) * PMG    | reqPARTID * PMG
--------------------------------------------------------------------------
reqPARTID          |                                   |
dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
--------------------------------------------------------------------------

Note: The number of intPARTIDs can be capped via the boot parameter
mpam.intpartid_max. Under MPAM, reqPARTID count is always greater than
or equal to intPARTID count.
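
The table formulas can be evaluated with a small sketch. The function
name monitoring_capacity() and the example counts are hypothetical and
chosen only for illustration. Each entry is a pair
(groups under one control group, groups system-wide):

```python
from typing import Dict, Tuple


def monitoring_capacity(num_reqpartid: int, num_intpartid: int,
                        num_pmg: int) -> Dict[str, Tuple[int, int]]:
    """Evaluate the three rows of the capability table."""
    if num_intpartid <= 0 or num_pmg <= 0:
        raise ValueError("counts must be positive")
    if num_reqpartid < num_intpartid:
        raise ValueError("reqPARTID count must be >= intPARTID count")
    system_wide = num_reqpartid * num_pmg
    return {
        # No reqPARTID expansion: only PMG distinguishes monitoring groups.
        "without_reqpartid": (num_pmg, num_intpartid * num_pmg),
        # Static: reqPARTIDs are split evenly between the intPARTIDs.
        "static": ((num_reqpartid // num_intpartid) * num_pmg, system_wide),
        # Dynamic: one control group may take every spare reqPARTID.
        "dynamic": ((num_reqpartid - num_intpartid + 1) * num_pmg,
                    system_wide),
    }
```

For a hypothetical system with 64 reqPARTIDs, 16 intPARTIDs and 2 PMGs
this gives 2, 8 and 98 monitoring groups under one control group for the
three rows respectively, and 32, 128 and 128 system-wide.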

Series Structure
================

Patch 1: Fix pre-existing out-of-range PARTID issue between mount sessions.
Patches 2-6: Implement static reqPARTID allocation.
Patches 7-10: Implement dynamic reqPARTID allocation.

Changes
=======

Compared with v7:
  - Add boot parameter to limit mpam_intpartid_max.
  - Update the Narrow-PARTID enablement condition checks.
  - Add default group detection in mpam_thread_switch().
  - Correct patch series revision tag for consistency.

Compared with v6:
  - Add dynamic reqPARTID allocation implementation.
  - Add Patch 1 to fix pre-existing out-of-range PARTID issue.
  - Drop original patch 4 which has been merged into the baseline.

Compared with v5:
  - Redefine the RMID information.
  - Refactor the resctrl_arch_rmid_idx_decode() and
    resctrl_arch_rmid_idx_encode().
  - Simplify closid_rmid2reqpartid() to rmid2reqpartid() and replace it
    accordingly.

Compared with RFC-v4:
  - Rebase the patch set on the v6.14-rc1 branch.

Compared with RFC-v3:
  - Add limitation of the Narrow-PARTID feature (See Patch 2).
  - Remove redundant reqpartid2closid() and reqpartid_pmg2rmid().
  - Refactor closid_rmid2reqpartid() partially.
  - Merge the PARTID conversion-related patches into a single patch for
    bisectability.
  - Skip adaptation of resctrl_arch_set_rmid() which is going to be
    removed.

Compared with RFC-v2:
  - Refactor closid/rmid pair translation.
  - Simplify the logic of synchronize configuration.
  - Remove reqPARTID source bitmap.

Compared with RFC-v1:
  - Rebase this patch set on latest MPAM driver of the v6.12-rc1 branch.

Previous Versions
=================

    v7: https://lore.kernel.org/all/20260317132141.1272506-1-zengheng4@huawei.com/
    v6: https://lore.kernel.org/all/20250222112448.2438586-1-zengheng4@huawei.com/
    v5: https://lore.kernel.org/all/20250217031852.2014939-1-zengheng4@huawei.com/
RFC-v4: https://lore.kernel.org/all/20250104101224.873926-1-zengheng4@huawei.com/
RFC-v3: https://lore.kernel.org/all/20241207092136.2488426-1-zengheng4@huawei.com/
RFC-v2: https://lore.kernel.org/all/20241119135104.595630-1-zengheng4@huawei.com/
RFC-v1: https://lore.kernel.org/all/20241114135037.918470-1-zengheng4@huawei.com/

---

Zeng Heng (10):
  fs/resctrl: Fix MPAM Partid parsing errors by preserving CDP state
    during umount
  arm_mpam: Add intPARTID and reqPARTID support for Narrow-PARTID
    feature
  arm_mpam: Disable reqPARTID expansion when Narrow-PARTID is
    unavailable
  arm_mpam: Refactor rmid to reqPARTID/PMG mapping
  arm_mpam: Propagate control group config to sub-monitoring groups
  arm_mpam: Add boot parameter to limit mpam_intpartid_max
  fs/resctrl: Add rmid_entry state helpers
  arm_mpam: Implement dynamic reqPARTID allocation for monitoring groups
  fs/resctrl: Wire up rmid expansion and reclaim functions
  arm_mpam: Add mpam_sync_config() for dynamic rmid expansion

 arch/arm64/include/asm/mpam.h   |  12 +-
 arch/x86/include/asm/resctrl.h  |   7 +
 drivers/resctrl/mpam_devices.c  | 103 ++++++++---
 drivers/resctrl/mpam_internal.h |   6 +-
 drivers/resctrl/mpam_resctrl.c  | 294 ++++++++++++++++++++++++++++----
 fs/resctrl/monitor.c            |  50 +++++-
 fs/resctrl/rdtgroup.c           |  24 ++-
 include/linux/arm_mpam.h        |  17 ++
 include/linux/resctrl.h         |  21 +++
 9 files changed, 469 insertions(+), 65 deletions(-)

--
2.25.1
Re: [PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature
Posted by Zeng Heng 1 month, 1 week ago

On 2026/4/13 16:53, Zeng Heng wrote:
> Background
> ==========
> 
> On x86, the resctrl allows creating up to num_rmids monitoring groups
> under parent control group. However, ARM64 MPAM is currently limited by
> the PMG (Performance Monitoring Group) count, which is typically much
> smaller than the theoretical RMID limit. This creates a significant
> scalability gap: users expecting fine-grained per-process or per-thread
> monitoring quickly exhaust the PMG space, even when plenty of reqPARTIDs
> remain available.
> 
> The Narrow-PARTID feature, defined in the ARM MPAM architecture,
> addresses this by associating reqPARTIDs with intPARTIDs through a
> programmable many-to-one mapping. This allows the kernel to present more
> logical monitoring contexts.
> 
> Design Overview
> ===============
> 
> The implementation extends the RMID encoding to carry reqPARTID
> information:
> 
>    RMID = reqPARTID * NUM_PMG + PMG
> 
> In this patchset, a monitoring group is uniquely identified by the
> combination of reqPARTID and PMG. The closid is represented by intPARTID,
> which is exactly the original PARTID.
> 
> For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
> driver exposes the full reqPARTID range directly. For heterogeneous
> systems where some MSCs lack Narrow-PARTID support, the driver utilizes
> PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring
> capability. The sole exception is when any type of MSCs lack Narrow-PARTID
> support, their percentage-based control mechanism prevents the use of
> PARTIDs as reqPARTIDs.
> 
> Capability Improvements
> =======================
> 
> --------------------------------------------------------------------------
> The maximum        |  Sub-monitoring groups            | System-wide
> number of          |  under a control group            | monitoring groups
> --------------------------------------------------------------------------
> Without reqPARTID  |  PMG                              | intPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID          |                                   |
> static allocation  | (reqPARTID // intPARTID) * PMG    | reqPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID          |                                   |
> dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
> --------------------------------------------------------------------------
> 
> Note: The number of intPARTIDs can be capped via the boot parameter
> mpam.intpartid_max. Under MPAM, reqPARTID count is always greater than
> or equal to intPARTID count.
> 

Thank you for your review. Please find the test report below for this
patch set.

Test Environment:
- Kernel: linux/master (commit: 6596a02b207886e9e00bb0161c7fd59fea53c081)
- Architecture: arm64
- Platform: Kunpeng 950
- MPAM Features: 256 reqPARTIDs, 32 intPARTIDs, 4 PMGs

Functional Test Example:

~~~
   # dmesg | grep -i pmg
   [   11.528919] MPAM enabled with 256 reqPARTIDs, 16 intPARTIDs and 4 PMGs
   # cat /proc/cmdline
   <...> mpam.intpartid_max=15
   # cat /sys/module/mpam/parameters/intpartid_max
   15

   # mount -t resctrl resctrl /sys/fs/resctrl/ -o debug
   # cd /sys/fs/resctrl/
   # mkdir p1
   # cd p1/mon_groups/
   # mkdir m{1..963}
   # ls m963
   cpus  cpus_list  mon_data  mon_hw_id  tasks
~~~
The control group p1 plus its 963 monitoring groups give 964 resource
groups ((256 - 15) * 4), which matches the expected result.


MPAM KUnit Test Log:

~~~
[   11.515338] KTAP version 1
[   11.515340] 1..2
[   11.515398]     KTAP version 1
[   11.515400]     # Subtest: mpam_devices_test_suite
[   11.515402]     # module: mpam
[   11.515405]     1..3
[   11.515497]     ok 1 test_mpam_reset_msc_bitmap
[   11.515568] mpam:__props_mismatch: __props_mismatch took the min cmax_wd
[   11.515579] mpam:__props_mismatch: cleared cpor_part
[   11.515581] mpam:__props_mismatch: cleared mbw_part
[   11.515592] mpam:__props_mismatch: took the min bwa_wd
[   11.515594] mpam:__props_mismatch: __props_mismatch took the min cmax_wd
[   11.515613] mpam:__props_mismatch: __props_mismatch took the min cmax_wd
[   11.515658]     ok 2 test_mpam_enable_merge_features
[   11.515722]     ok 3 test__props_mismatch
[   11.515725] # mpam_devices_test_suite: pass:3 fail:0 skip:0 total:3
[   11.515726] # Totals: pass:3 fail:0 skip:0 total:3
[   11.515728] ok 1 mpam_devices_test_suite
[   11.515730]     KTAP version 1
[   11.515732]     # Subtest: mpam_resctrl_test_suite
[   11.515734]     # module: mpam
[   11.515735]     1..6
[   11.515792]     ok 1 test_get_mba_granularity
[   11.515795]         KTAP version 1
[   11.515797]         # Subtest: test_mbw_max_to_percent
[   11.515850]         ok 1 pc=1, width=8, value=0x01
[   11.515905]         ok 2 pc=1, width=12, value=0x027
[   11.515959]         ok 3 pc=1, width=16, value=0x028e
[   11.516011]         ok 4 pc=25, width=8, value=0x3f
[   11.516069]         ok 5 pc=25, width=12, value=0x3ff
[   11.516127]         ok 6 pc=25, width=16, value=0x3fff
[   11.516182]         ok 7 pc=33, width=8, value=0x53
[   11.516236]         ok 8 pc=33, width=12, value=0x546
[   11.516290]         ok 9 pc=33, width=16, value=0x5479
[   11.516345]         ok 10 pc=35, width=8, value=0x58
[   11.516401]         ok 11 pc=35, width=12, value=0x598
[   11.516456]         ok 12 pc=35, width=16, value=0x5998
[   11.516511]         ok 13 pc=45, width=8, value=0x72
[   11.516565]         ok 14 pc=45, width=12, value=0x732
[   11.516617]         ok 15 pc=45, width=16, value=0x7332
[   11.516673]         ok 16 pc=50, width=8, value=0x7f
[   11.516726]         ok 17 pc=50, width=12, value=0x7ff
[   11.516781]         ok 18 pc=50, width=16, value=0x7fff
[   11.516835]         ok 19 pc=52, width=8, value=0x84
[   11.516889]         ok 20 pc=52, width=12, value=0x850
[   11.516943]         ok 21 pc=52, width=16, value=0x851d
[   11.516997]         ok 22 pc=55, width=8, value=0x8b
[   11.517073]         ok 23 pc=55, width=12, value=0x8cb
[   11.517139]         ok 24 pc=55, width=16, value=0x8ccb
[   11.517196]         ok 25 pc=58, width=8, value=0x93
[   11.517254]         ok 26 pc=58, width=12, value=0x946
[   11.517309]         ok 27 pc=58, width=16, value=0x9479
[   11.517364]         ok 28 pc=75, width=8, value=0xbf
[   11.517418]         ok 29 pc=75, width=12, value=0xbff
[   11.517474]         ok 30 pc=75, width=16, value=0xbfff
[   11.517529]         ok 31 pc=80, width=8, value=0xcb
[   11.517583]         ok 32 pc=80, width=12, value=0xccb
[   11.517638]         ok 33 pc=80, width=16, value=0xcccb
[   11.517693]         ok 34 pc=88, width=8, value=0xe0
[   11.517750]         ok 35 pc=88, width=12, value=0xe13
[   11.517806]         ok 36 pc=88, width=16, value=0xe146
[   11.517861]         ok 37 pc=95, width=8, value=0xf2
[   11.517918]         ok 38 pc=95, width=12, value=0xf32
[   11.517972]         ok 39 pc=95, width=16, value=0xf332
[   11.518027]         ok 40 pc=100, width=8, value=0xff
[   11.518084]         ok 41 pc=100, width=12, value=0xfff
[   11.518141]         ok 42 pc=100, width=16, value=0xffff
[   11.518144]     # test_mbw_max_to_percent: pass:42 fail:0 skip:0 total:42
[   11.518146]     ok 2 test_mbw_max_to_percent
[   11.518149]         KTAP version 1
[   11.518151]         # Subtest: test_percent_to_mbw_max
[   11.518204]         ok 1 pc=1, width=8, value=0x01
[   11.518259]         ok 2 pc=1, width=12, value=0x027
[   11.518321]         ok 3 pc=1, width=16, value=0x028e
[   11.518380]         ok 4 pc=25, width=8, value=0x3f
[   11.518434]         ok 5 pc=25, width=12, value=0x3ff
[   11.518492]         ok 6 pc=25, width=16, value=0x3fff
[   11.518546]         ok 7 pc=33, width=8, value=0x53
[   11.518603]         ok 8 pc=33, width=12, value=0x546
[   11.518662]         ok 9 pc=33, width=16, value=0x5479
[   11.518718]         ok 10 pc=35, width=8, value=0x58
[   11.518776]         ok 11 pc=35, width=12, value=0x598
[   11.518832]         ok 12 pc=35, width=16, value=0x5998
[   11.518887]         ok 13 pc=45, width=8, value=0x72
[   11.518943]         ok 14 pc=45, width=12, value=0x732
[   11.519000]         ok 15 pc=45, width=16, value=0x7332
[   11.519055]         ok 16 pc=50, width=8, value=0x7f
[   11.519110]         ok 17 pc=50, width=12, value=0x7ff
[   11.519166]         ok 18 pc=50, width=16, value=0x7fff
[   11.519219]         ok 19 pc=52, width=8, value=0x84
[   11.519276]         ok 20 pc=52, width=12, value=0x850
[   11.519331]         ok 21 pc=52, width=16, value=0x851d
[   11.519387]         ok 22 pc=55, width=8, value=0x8b
[   11.519444]         ok 23 pc=55, width=12, value=0x8cb
[   11.519504]         ok 24 pc=55, width=16, value=0x8ccb
[   11.519561]         ok 25 pc=58, width=8, value=0x93
[   11.519616]         ok 26 pc=58, width=12, value=0x946
[   11.519676]         ok 27 pc=58, width=16, value=0x9479
[   11.519730]         ok 28 pc=75, width=8, value=0xbf
[   11.519785]         ok 29 pc=75, width=12, value=0xbff
[   11.519840]         ok 30 pc=75, width=16, value=0xbfff
[   11.519893]         ok 31 pc=80, width=8, value=0xcb
[   11.519946]         ok 32 pc=80, width=12, value=0xccb
[   11.519999]         ok 33 pc=80, width=16, value=0xcccb
[   11.520054]         ok 34 pc=88, width=8, value=0xe0
[   11.520106]         ok 35 pc=88, width=12, value=0xe13
[   11.520161]         ok 36 pc=88, width=16, value=0xe146
[   11.520217]         ok 37 pc=95, width=8, value=0xf2
[   11.520270]         ok 38 pc=95, width=12, value=0xf32
[   11.520323]         ok 39 pc=95, width=16, value=0xf332
[   11.520376]         ok 40 pc=100, width=8, value=0xff
[   11.520428]         ok 41 pc=100, width=12, value=0xfff
[   11.520481]         ok 42 pc=100, width=16, value=0xffff
[   11.520483]     # test_percent_to_mbw_max: pass:42 fail:0 skip:0 total:42
[   11.520485]     ok 3 test_percent_to_mbw_max
[   11.520487]         KTAP version 1
[   11.520488]         # Subtest: test_mbw_max_to_percent_limits
[   11.520540]         ok 1 wd=1
[   11.520594]         ok 2 wd=2
[   11.520647]         ok 3 wd=3
[   11.520700]         ok 4 wd=4
[   11.520752]         ok 5 wd=5
[   11.520807]         ok 6 wd=6
[   11.520858]         ok 7 wd=7
[   11.520909]         ok 8 wd=8
[   11.520962]         ok 9 wd=9
[   11.521017]         ok 10 wd=10
[   11.521078]         ok 11 wd=11
[   11.521138]         ok 12 wd=12
[   11.521193]         ok 13 wd=13
[   11.521245]         ok 14 wd=14
[   11.521296]         ok 15 wd=15
[   11.521346]         ok 16 wd=16
[   11.521348]     # test_mbw_max_to_percent_limits: pass:16 fail:0 skip:0 total:16
[   11.521350]     ok 4 test_mbw_max_to_percent_limits
[   11.521375]     # test_percent_to_max_rounding: Round-up rate: 43% (18/42)
[   11.521409]     ok 5 test_percent_to_max_rounding
[   11.521411]         KTAP version 1
[   11.521413]         # Subtest: test_percent_max_roundtrip_stability
[   11.521465]         ok 1 wd=1
[   11.521515]         ok 2 wd=2
[   11.521565]         ok 3 wd=3
[   11.521614]         ok 4 wd=4
[   11.521667]         ok 5 wd=5
[   11.521719]         ok 6 wd=6
[   11.521770]         ok 7 wd=7
[   11.521820]         ok 8 wd=8
[   11.521871]         ok 9 wd=9
[   11.521922]         ok 10 wd=10
[   11.521973]         ok 11 wd=11
[   11.522022]         ok 12 wd=12
[   11.522075]         ok 13 wd=13
[   11.522126]         ok 14 wd=14
[   11.522178]         ok 15 wd=15
[   11.522231]         ok 16 wd=16
[   11.522233]     # test_percent_max_roundtrip_stability: pass:16 fail:0 skip:0 total:16
[   11.522235]     ok 6 test_percent_max_roundtrip_stability
[   11.522237] # mpam_resctrl_test_suite: pass:6 fail:0 skip:0 total:6
[   11.522238] # Totals: pass:118 fail:0 skip:0 total:118
[   11.522240] ok 2 mpam_resctrl_test_suite
~~~
All 118 MPAM KUnit tests passed.

Tested-by: Zeng Heng <zengheng4@huawei.com>


Best regards,
Zeng Heng
Re: [PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature
Posted by Zeng Heng 1 month, 3 weeks ago
Hi Shaopeng,

> Hello Zeng Heng,
> 
> Could you tell me which branch this patch series based on?
> 
> Best regards,
> Shaopeng TAN


As indicated in the patch series tags, this patch set applies to the
linux-next repository, specifically the master branch at:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next.git

Please keep me on the recipient list for follow-up replies if you would
like timely feedback. I was accidentally dropped from the list in a
previous thread (see
https://lore.kernel.org/all/TY4PR01MB16930EB1ACB3A3356A92169BC8B232@TY4PR01MB16930.jpnprd01.prod.outlook.com/).


Kind regards,
Zeng Heng
Re: [PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature
Posted by James Morse 3 weeks, 6 days ago
Hi Zeng,

(beware this is the first version I've seen - arm have been silently deleting your mail,
 it looks like a problem with DKIM signatures)

On 13/04/2026 09:53, Zeng Heng wrote:
> Background
> ==========
> 
> On x86, the resctrl allows creating up to num_rmids monitoring groups
> under parent control group. However, ARM64 MPAM is currently limited by
> the PMG (Performance Monitoring Group) count, which is typically much
> smaller than the theoretical RMID limit.

The MPAM PMG limit is 255. Is that not enough?

I think the real problem is the CHI interconnect protocol is forcing people
to only have 1 bit of PMG - regardless of what the architecture says. This
isn't an MPAM problem as such - it's an implementation issue.

(but we can try and work around it)


> This creates a significant
> scalability gap: users expecting fine-grained per-process or per-thread
> monitoring quickly exhaust the PMG space, even when plenty of reqPARTIDs
> remain available.

This is more about MPAM's philosophical stance that PMG extends PARTID, whereas
on x86 RMID is an independent number.

Please don't muddle these - it results in muddled patches!
If we want to try and attack both with narrowing, we should do them separately.


> The Narrow-PARTID feature, defined in the ARM MPAM architecture,
> addresses this by associating reqPARTIDs with intPARTIDs through a
> programmable many-to-one mapping. This allows the kernel to present more
> logical monitoring contexts.

I'd put this as "can be abused to avoid this problem"! We still have a problem with
controls that don't alias and need to be removed from MSC that don't support narrowing.
This isn't what the feature was designed for - but it is a really cool trick, it works
for some real platforms, and solves a problem seen in user-space.

However - throughout this series you seem to be discarding all the control-group support
for a monitoring-only setup that allocates intPARTID for everything. This might work for
your use-case on your platform, but it doesn't generalise to platforms without narrowing
or where multiple control-groups are needed.


> Design Overview
> ===============
> 
> The implementation extends the RMID encoding to carry reqPARTID
> information:
> 
>   RMID = reqPARTID * NUM_PMG + PMG
> 
> In this patchset, a monitoring group is uniquely identified by the
> combination of reqPARTID and PMG. The closid is represented by intPARTID,
> which is exactly the original PARTID.

The way I think of this is 'RMID' bits being spilled into PARTID. This
means each control group has a set of PARTID. For MSC using narrowing,
CLOSID would be the intPARTID value. But as you note, we need to support
mismatches:


> For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
> driver exposes the full reqPARTID range directly. For heterogeneous
> systems where some MSCs lack Narrow-PARTID support, the driver utilizes
> PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring
> capability. The sole exception is when any type of MSCs lack Narrow-PARTID
> support, their percentage-based control mechanism prevents the use of
> PARTIDs as reqPARTIDs.

It'd be good to have some discussion about what the interface between the
mpam_devices code and any other user (like resctrl) should be.

As a hypothetical system to think about:
  64 PARTID at the L3, which support CPOR and CCAP
  64 PARTID and narrowing to 16 at the SLC, which supports CPOR
  64 PARTID and narrowing to 32 at the memory-controller, which support MBWU_MAX

I think whether using intPARTID is a benefit needs to be user-space policy.
You've likely got a platform where that choice is obvious - but it is a
trade-off as you lose the non-aliasing controls. In the example above, using
narrowing on this system means losing the CCAP controls on L3 as they don't alias [*].
Where it's a policy, it's likely to be one policy for resctrl, and another for any other
user.
We can get the resctrl glue code to turn it on unconditionally if there is no trade off,
I think that means: no non-aliasing controls in any class that doesn't support narrowing
- including 'unknown'. (we couldn't add them to resctrl in the future if you already chose
to enable this).

As for the interface with mpam_devices:
I think this means the resctrl glue code needs to be able to discover which
classes support intPARTID, and how many controls they actually have. From there
it can apply policy to determine whether it's better to support fewer features
in resctrl to get more RMID. (the alternative is always to ignore the MSC with
narrowing - narrowing lets hardware lie about the features it supports).

Currently the resctrl glue code has to program a configuration for two PARTID
when CDP is being hidden on the MB resource. This is ugly and fragile. I'd like
to explore generalising it as this narrowing stuff will also need to apply a
configuration to a set of PARTID when that MSC doesn't support narrowing.
In the example above, we'd need to discard the CCAP controls and write the same
CPOR bitmap to each PARTID that is mapped together by narrowing.


I think this means the resctrl glue code will need to be able to write a configuration
to controls using the full partid_max range as it does today. But also be able to set
the narrowing mapping on classes that support it.
For the monitors, the resctrl glue code will need to allocate and configure a set of
monitors, and read and sum them. This will be regardless of whether narrowing is
supported.

I think this means allocating a table of CLOSID to PARTID(s). the intPARTID would
always match the CLOSID. Monitors and non-narrowing MSC would need to walk the list.
I'm hoping we can make CDP a subset of this problem.
Some clever arithmetic may save allocating memory for a table - but if we change resctrl
to do this dynamically, the numbers become arbitrary forcing it to be a table.
It might also be possible to support moving monitor-groups between control groups with
the table driven approach. (see what you think on how complex it ends up ...)
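
A rough sketch of such a static table, under assumptions that are not in
the series: all function names are hypothetical, and the grouping rule
(CLOSID c owns PARTIDs c, c + num_closid, c + 2 * num_closid, ...) is
just one possible layout in which the first PARTID of each group equals
the CLOSID:

```python
from typing import Callable, Dict, List


def build_closid_table(num_closid: int, num_partid: int) -> Dict[int, List[int]]:
    """Static CLOSID -> PARTID list; the first entry equals the CLOSID."""
    if num_closid <= 0 or num_partid < num_closid:
        raise ValueError("need 0 < num_closid <= num_partid")
    return {c: list(range(c, num_partid, num_closid))
            for c in range(num_closid)}


def narrowing_map(table: Dict[int, List[int]]) -> Dict[int, int]:
    """reqPARTID -> intPARTID mapping, written once at setup time."""
    return {p: c for c, partids in table.items() for p in partids}


def config_writes(table: Dict[int, List[int]], closid: int, cfg: int,
                  narrowing: bool) -> Dict[int, int]:
    """PARTID -> value writes needed to configure one MSC class."""
    if narrowing:
        # The remapping table funnels every reqPARTID to the intPARTID.
        return {closid: cfg}
    # No narrowing: walk the list and repeat the same (aliasing) config.
    return {p: cfg for p in table[closid]}


def read_monitors(table: Dict[int, List[int]], closid: int,
                  read_one: Callable[[int], int]) -> int:
    """Sum one counter over every PARTID in the control group."""
    return sum(read_one(p) for p in table[closid])
```

With the numbers from the example system (64 PARTID, 16 CLOSID), CLOSID 3
would own PARTIDs 3, 19, 35 and 51. An MSC without narrowing gets the
same CPOR bitmap written four times, while one with narrowing gets a
single write plus the mapping.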

I'd like to keep that grouping static for now, the table needs creating at setup time,
(+/- CDP), to avoid problems like you've found with CDP. This means the intpartid mappings
can be written once at setup time.

I'd like to avoid exposing user ABI to control this until we get it working, then we can
talk about whether to try making the grouping dynamically managed by resctrl. (there were
some proposals in that area - but I can't find them on lore).
If there are platforms where it's certainly not a trade-off, we can enable it
unconditionally - but I'm wary of this being "what we care about now", requiring user-abi
to enable features that were detectable.
e.g. we ignore an unknown MSC, and add a resctrl schema for it later - only we can't
expose it if we were using narrowing. Now it's a trade-off.


> Capability Improvements
> =======================
> 
> --------------------------------------------------------------------------
> The maximum        |  Sub-monitoring groups            | System-wide
> number of          |  under a control group            | monitoring groups
> --------------------------------------------------------------------------
> Without reqPARTID  |  PMG                              | intPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID          |                                   |
> static allocation  | (reqPARTID // intPARTID) * PMG    | reqPARTID * PMG
> --------------------------------------------------------------------------
> reqPARTID          |                                   |
> dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
> --------------------------------------------------------------------------
> 
> Note: The number of intPARTIDs can be capped via the boot parameter
> mpam.intpartid_max. Under MPAM, reqPARTID count is always greater than
> or equal to intPARTID count.
> 
> Series Structure
> ================
> 
> Patch 1: Fix pre-existing out-of-range PARTID issue between mount sessions.
> Patches 2-6: Implement static reqPARTID allocation.
> Patches 7-10: Implement dynamic reqPARTID allocation.

I've had a hard time following this series. You dive in with invasive changes, then
unbreak things in later patches.

Please add the needed infrastructure in mpam_devices.c first. This should be free of
resctrl-isms, and 'only' needs reviewing against the architecture.

Then add the resctrl glue code stuff. That needs to comply with what resctrl expects.

I think the cleanest way to think about this is to break the mapping between CLOSID and
PARTID. We're effectively moving bits of RMID out of PMG into PARTID. Adding helpers
to explicitly do this early in those patches will make your changes clearer.
Please avoid spraying the narrowing terms for things everywhere.


Thanks,

James


[*] It's terminology from discussing this with Dave, just in case a summary is needed:
  aliasing controls are like CPOR where two different PARTID with the same bitmap
  compete for the same resource. If you give them each the same 50% of the portions,
  they can't exceed that together.
  non-aliasing controls are like CCAP where two different PARTID with the same fraction
  compete for different resources. If you give them each 50% of the capacity, it adds
  up to 100%. You can't represent 'the same' 50% using these controls.

  Narrowing papers over this problem with its remapping table, which gives you a 'same'
  property. For MSC that have controls of that shape - and where more monitors are
  desired - we'd have to drop the controls.

  I think "more monitors are desired" is going to need to be user-space policy. But
  we can come back to how to do that later.
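
  A toy model of the difference, as a sketch only: the function names are
  hypothetical and real hardware behaviour is more involved than this
  arithmetic suggests.

```python
from typing import Iterable


def cpor_combined_share(bitmaps: Iterable[int], total_portions: int) -> float:
    """Aliasing: the PARTIDs together reach only the union of their portions."""
    if total_portions <= 0:
        raise ValueError("total_portions must be positive")
    union = 0
    for bitmap in bitmaps:
        union |= bitmap
    union &= (1 << total_portions) - 1  # ignore bits beyond the cache
    return bin(union).count("1") / total_portions


def ccap_combined_share(fractions: Iterable[float]) -> float:
    """Non-aliasing: each PARTID gets its own allowance, so the limits add up."""
    return min(1.0, sum(fractions))
```

  Two PARTID with the same bitmap 0x0f on an 8-portion cache are confined
  to 50% between them, whereas two PARTID each given a 50% capacity limit
  can fill 100% together.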
Re: [PATCH v8 next 00/10] arm_mpam: Introduce Narrow-PARTID feature
Posted by Zeng Heng 3 weeks, 1 day ago
Hi James,

On 2026/5/15 1:06, James Morse wrote:
> Hi Zeng,
> 
> (beware this is the first version I've seen - arm have been silently deleting your mail,
>   it looks like a problem with DKIM signatures)
> 

Thanks for letting me know. I will try sending community mails using
huaweicloud email to avoid DKIM signature issues.

Hope it works.

> On 13/04/2026 09:53, Zeng Heng wrote:
>> Background
>> ==========
>>
>> On x86, the resctrl allows creating up to num_rmids monitoring groups
>> under parent control group. However, ARM64 MPAM is currently limited by
>> the PMG (Performance Monitoring Group) count, which is typically much
>> smaller than the theoretical RMID limit.
> 
> The MPAM PMG limit is 255. Is that not enough?
> 
> I think the real problem is the CHI interconnect protocol is forcing people
> to only have 1 bit of PMG - regardless of what the architecture says. This
> isn't an MPAM problem as such - its an implementation issue.
> 
> (but we can try and work around it)
> 

Yes, the architecture theoretically allows PMG to be up to 8 bits wide,
but many platforms I've worked with (not just Kunpeng) implement far
fewer bits in practice.

> 
>> This creates a significant
>> scalability gap: users expecting fine-grained per-process or per-thread
>> monitoring quickly exhaust the PMG space, even when plenty of reqPARTIDs
>> remain available.
> 
> This is more about MPAM's philosophical stance that PMG extents PARTID, whereas
> on x86 RMID is an independent number.
> 

No value judgment here. ARM seeks to expand the number of monitoring
groups by combining PARTID and PMG within limited bit-width constraints,
which inherently introduces coupling between the two.

> Please don't muddle these - it results in muddled patches!
> If we want to try and attack both with narrowing, we should do them separately.
> 
> 
>> The Narrow-PARTID feature, defined in the ARM MPAM architecture,
>> addresses this by associating reqPARTIDs with intPARTIDs through a
>> programmable many-to-one mapping. This allows the kernel to present more
>> logical monitoring contexts.
> 
> I'd put this as "can be abused to avoid this problem"! We still have a problem with
> controls that don't alias and need to be removed from MSC that don't support narrowing.
> This isn't what the feature was designed for - but it is a really cool trick, it works
> for some real platforms, and solves a problem seen in user-space.
> 
> However - throughout this series you seem to be discarding all the control-group support
> for a monitoring-only setup that allocates intPARTID for everything. This might work for
> your use-case on your platform, but it doesn't generalise to platforms without narrowing
> or where multiple control-groups are needed.
> 

Currently, if any MSC has non-aliasing controls but does not support
Narrow-PARTID, this solution disables itself rather than hiding the
non-aliasing controls (Patch 3:
https://lore.kernel.org/all/20260413085405.1166412-4-zengheng4@huawei.com/).

This does limit where the solution can be enabled on systems that have
MSCs without narrowing.

However, the series does not discard control groups: it tries to keep as
many intPARTIDs (i.e., control groups) as before. In principle, I hope
that on systems where Narrow-PARTID was not previously enabled, this
patch set can create as many monitoring groups as possible without
changing any other functionality.

It also lets users cap the number of intPARTIDs via the
mpam.intpartid_max boot parameter (Patch 6:
https://lore.kernel.org/all/20260413085405.1166412-7-zengheng4@huawei.com/).


> 
>> Design Overview
>> ===============
>>
>> The implementation extends the RMID encoding to carry reqPARTID
>> information:
>>
>>    RMID = reqPARTID * NUM_PMG + PMG
>>
>> In this patchset, a monitoring group is uniquely identified by the
>> combination of reqPARTID and PMG. The closid is represented by intPARTID,
>> which is exactly the original PARTID.
> 
> The way I think of this is 'RMID' bits being spilled into PARTID. This
> means each control group has a set of PARTID. For MSC using narrowing,
> CLOSID would be the intPARTID value. But as you note, we need to support
> mismatches:
> 
> 

Yes.
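
To make the encoding concrete, here is a minimal sketch of the RMID
packing from the cover letter and its inverse. It is Python purely for
illustration; the function names are hypothetical and are not the helpers
used in the patches.

```python
def encode_rmid(reqpartid, pmg, num_pmg):
    """Pack (reqPARTID, PMG) as RMID = reqPARTID * NUM_PMG + PMG."""
    if not 0 <= pmg < num_pmg:
        raise ValueError("pmg must be in [0, num_pmg)")
    return reqpartid * num_pmg + pmg


def decode_rmid(rmid, num_pmg):
    """Recover (reqPARTID, PMG) from an RMID built by encode_rmid()."""
    return divmod(rmid, num_pmg)
```

With a 1-bit PMG (num_pmg = 2), reqPARTID 5 and PMG 1 pack into RMID 11,
and decoding 11 gives (5, 1) back.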

>> For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
>> driver exposes the full reqPARTID range directly. For heterogeneous
>> systems where some MSCs lack Narrow-PARTID support, the driver utilizes
>> PARTIDs beyond the intPARTID range as reqPARTIDs to expand monitoring
>> capability. The sole exception is when any type of MSCs lack Narrow-PARTID
>> support, their percentage-based control mechanism prevents the use of
>> PARTIDs as reqPARTIDs.
> 
> It'd be good to have some discussion about what the interface between the
> mpam_devices code and any other user (like resctrl) should be.
> 
> As a hypothetical system to think about:
>    64 PARTID at the L3, which supports CPOR and CCAP
>    64 PARTID and narrowing to 16 at the SLC, which supports CPOR
>    64 PARTID and narrowing to 32 at the memory-controller, which supports MBWU_MAX
> 

By the way, in this case the L3 does not support Narrow-PARTID (NP) and
has CCAP, so the PARTID mapping extension (PME) is not enabled by default.

If we exclude the L3 CCAP, the solution would support 16 control groups
and (64 * PMG) monitoring groups.
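
A rough sketch of how I arrive at those numbers, in Python for
illustration only. The dictionary layout and the function name are
hypothetical. I assume that the usable reqPARTID count is the smallest
PARTID count across classes, and that the control-group count is the
smallest intPARTID count among the classes that narrow.

```python
# James's hypothetical system, with the L3 CCAP control excluded.
CLASSES = [
    {"name": "L3", "partids": 64, "narrow_to": None},  # no narrowing
    {"name": "SLC", "partids": 64, "narrow_to": 16},
    {"name": "MC", "partids": 64, "narrow_to": 32},
]


def group_counts(classes, num_pmg):
    """Return (control groups, system-wide monitoring groups)."""
    req_partids = min(c["partids"] for c in classes)
    # A class without narrowing is limited only by its PARTID count.
    int_partids = min(c["narrow_to"] or c["partids"] for c in classes)
    return int_partids, req_partids * num_pmg
```

For a 1-bit PMG, group_counts(CLASSES, 2) gives 16 control groups and
128 monitoring groups.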

> I think whether using intPARTID is a benefit needs to be user-space policy.
> You've likely got a platform where that choice is obvious - but it is a
> trade-off as you lose the non-aliasing controls. In the example above, using
> narrowing on this system means losing the CCAP controls on L3 as they don't alias [*].
> Where it's a policy, it's likely to be one policy for resctrl, and another for any other
> user.
> We can get the resctrl glue code to turn it on unconditionally if there is no trade off,
> I think that means: no non-aliasing controls in any class that doesn't support narrowing
> - including 'unknown'. (we couldn't add them to resctrl in the future if you already chose
> to enable this).
> 

Currently, after MPAM initialization, the PARTID mapping extension (PME)
is enabled by default unless there is an MSC that both lacks NP support
and has non-aliasing controls. In that configuration no control is lost,
so enabling it has no downside. Going forward, we may consider adding a
`force_reqpartid` option to forcibly enable the feature and disable
non-aliasing controls.
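
The default rule can be sketched as a predicate. This is Python for
illustration only; the names are hypothetical, and treating CCAP as the
only non-aliasing control is an assumption taken from the example in this
thread.

```python
# Assumption: CCAP is the only non-aliasing control considered here.
NON_ALIASING_CONTROLS = {"CCAP"}


def pme_enabled_by_default(classes):
    """PME stays off if any class lacks narrowing and has a non-aliasing control."""
    return not any(
        not c["narrowing"] and (c["controls"] & NON_ALIASING_CONTROLS)
        for c in classes
    )


EXAMPLE = [
    {"name": "L3", "narrowing": False, "controls": {"CPOR", "CCAP"}},
    {"name": "SLC", "narrowing": True, "controls": {"CPOR"}},
    {"name": "MC", "narrowing": True, "controls": {"MBWU_MAX"}},
]
```

For the hypothetical system above the predicate is false because of the
L3 CCAP. It becomes true once CCAP is removed from the L3 entry.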

> As for the interface with mpam_devices:
> I think this means the resctrl glue code needs to be able to discover which
> classes support intPARTID, and how many controls they actually have. From there
> it can apply policy to determine whether it's better to support fewer features
> in resctrl to get more RMID. (the alternative is always to ignore the MSC with
> narrowing - narrowing lets hardware lie about the features it supports).
> 
> Currently the resctrl glue code has to program a configuration for two PARTID
> when CDP is being hidden on the MB resource. This is ugly and fragile. I'd like
> to explore generalising it as this narrowing stuff will also need to apply a
> configuration to a set of PARTID when that MSC doesn't support narrowing.
> In the example above, we'd need to discard the CCAP controls and write the same
> CPOR bitmap to each PARTID that is mapped together by narrowing.
> 

One option is to use PME to extend CDP support: L3DATA and L3CODE
would still be controlled separately, while MB control would be
consolidated via narrow mapping onto a single intPARTID.

Of course, this requires that the MB supports narrowing.

> 
> I think this means the resctrl glue code will need to be able to write a configuration
> to controls using the full partid_max range as it does today. But also be able to set
> the narrowing mapping on classes that support it.
> For the monitors, the resctrl glue code will need to allocate and configure a set of
> monitors, and read and sum them. This will be regardless of whether narrowing is
> supported.
>
> I think this means allocating a table of CLOSID to PARTID(s). the intPARTID would
> always match the CLOSID. Monitors and non-narrowing MSC would need to walk the list.
> I'm hoping we can make CDP a subset of this problem.
> Some clever arithmetic may save allocating memory for a table - but if we change resctrl
> to do this dynamically, the numbers become arbitrary forcing it to be a table.
> It might also be possible to support moving monitor-groups between control groups with
> the table driven approach. (see what you think on how complex it ends up ...)
> 

In the current patch series, static allocation employs a
straightforward intPARTID-to-reqPARTID translation, while dynamic
management tracks the mappings via the `reqpartid_map` table.
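
For comparison, the limits from the "Capability Improvements" table in
the cover letter can be written out as below. This is Python for
illustration only, and the function names are hypothetical. Here `req`
and `intp` are the reqPARTID and intPARTID counts.

```python
def sub_groups_baseline(num_pmg):
    """Monitoring groups per control group without reqPARTID."""
    return num_pmg


def sub_groups_static(req, intp, num_pmg):
    """Static allocation: reqPARTIDs are split evenly across control groups."""
    return (req // intp) * num_pmg


def sub_groups_dynamic(req, intp, num_pmg):
    """Dynamic allocation: one control group may take every spare reqPARTID."""
    return (req - intp + 1) * num_pmg


def system_groups(req, num_pmg):
    """System-wide monitoring groups with reqPARTID (static or dynamic)."""
    return req * num_pmg
```

With 64 reqPARTIDs, 16 intPARTIDs and a 1-bit PMG, a control group can
hold 2 monitoring groups today, 8 with static allocation and up to 98
with dynamic allocation. The system-wide limit is 128 in both reqPARTID
modes.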

> I'd like to keep that grouping static for now, the table needs creating at setup time,
> (+/- CDP), to avoid problems like you've found with CDP. This means the intpartid mappings
> can be written once at setup time.
> 
> I'd like to avoid exposing user ABI to control this until we get it working, then we can
> talk about whether to try making the grouping dynamically managed by resctrl. (there were
> some proposals in that area - but I can't find them on lore).
> If there are platforms where it's certainly not a trade-off, we can enable it
> unconditionally - but I'm wary of this being "what we care about now", requiring user-abi
> to enable features that were detectable.
> e.g. we ignore an unknown MSC, and add a resctrl schema for it later - only we can't
> expose it if we were using narrowing. Now it's a trade-off.
> 
> 
>> Capability Improvements
>> =======================
>>
>> --------------------------------------------------------------------------
>> The maximum        |  Sub-monitoring groups            | System-wide
>> number of          |  under a control group            | monitoring groups
>> --------------------------------------------------------------------------
>> Without reqPARTID  |  PMG                              | intPARTID * PMG
>> --------------------------------------------------------------------------
>> reqPARTID          |                                   |
>> static allocation  | (reqPARTID // intPARTID) * PMG    | reqPARTID * PMG
>> --------------------------------------------------------------------------
>> reqPARTID          |                                   |
>> dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
>> --------------------------------------------------------------------------
>>
>> Note: The number of intPARTIDs can be capped via the boot parameter
>> mpam.intpartid_max. Under MPAM, reqPARTID count is always greater than
>> or equal to intPARTID count.
>>
>> Series Structure
>> ================
>>
>> Patch 1: Fix pre-existing out-of-range PARTID issue between mount sessions.
>> Patches 2-6: Implement static reqPARTID allocation.
>> Patches 7-10: Implement dynamic reqPARTID allocation.
> 
> I've had a hard time following this series. You dive in with invasive changes, then
> unbreak things in later patches.
> 
> Please add the needed infrastructure in mpam_devices.c first. This should be free of
> resctrl-isms, and 'only' needs reviewing against the architecture.
> 
> Then add the resctrl glue code stuff. That needs to comply with what resctrl expects.
> 
> I think the cleanest way to think about this is to break the mapping between CLOSID and
> PARTID. We're effectively moving bits of RMID out of PMG into PARTID. Adding helpers
> to explicitly do this early in those patches will make your changes clearer.
> Please avoid spraying the narrowing terms for things everywhere.
> 
> 

Sure, I'll reorder the series to introduce the core infrastructure in
mpam_devices.c first. Should I drop the dynamic allocation part from
this series for now?

> 
> 
> [*] It's terminology from discussing this with Dave, just in case a summary is needed:
>    aliasing controls are like CPOR where two different PARTID with the same bitmap
>    compete for the same resource. If you give them each the same 50% of the portions,
>    they can't exceed that together.
>    non-aliasing controls are like CCAP where two different PARTID with the same fraction
>    compete for different resources. If you give them each 50% of the capacity, it adds
>    up to 100%. You can't represent 'the same' 50% using these controls.
> 
>    Narrowing papers over this problem with its remapping table, which gives you a 'same'
>    property. For MSC that have controls of that shape - and where more monitors are
>    desired - we'd have to drop the controls.
> 
>    I think "more monitors are desired" is going to need to be user-space policy. But
>    we can come back to how to do that later.
> 
> 

I'm not sure whether anyone has formalized this terminology before, but
I fully agree with the terms "aliasing controls" and "non-aliasing
controls". They're instantly intuitive for software developers.
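
A toy model of the distinction, as I understand it from the footnote.
This is Python for illustration only and is not meant to describe real
MSC behaviour; the function names are hypothetical.

```python
def cpor_combined_share(bitmaps, num_portions):
    """Aliasing: PARTIDs with overlapping portion bitmaps share those portions."""
    combined = 0
    for bitmap in bitmaps:
        combined |= bitmap
    return bin(combined).count("1") / num_portions


def ccap_combined_share(fractions):
    """Non-aliasing: each PARTID gets its own fraction, so the fractions add up."""
    return min(1.0, sum(fractions))
```

Two PARTIDs that both use bitmap 0x0F out of 8 portions are limited to
50% together, whereas two PARTIDs with a 50% capacity limit each can
reach 100% together.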


Best regards,
Zeng Heng