The patch set applies to the mpam/snapshot/v6.12-rc1 branch of the
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
repository.
This patch set is fully compatible with x86 RDT functionality.
The narrow-partid feature in MPAM allows more efficient use of PARTIDs
by enabling a many-to-one mapping of reqPARTIDs (request PARTIDs) to
intPARTIDs (internal PARTIDs). This mapping reduces the number of unique
internal PARTIDs needed, which allows more tasks or processes to be
monitored and managed with the available resources.
An intPARTID (internal PARTID) is an internal identifier used by the
hardware to represent a specific resource partition. It is a low-level
identifier that the hardware uses to track and manage resource allocation
and monitoring.
A reqPARTID (request PARTID) is the identifier that software provides when
requesting resources from the memory system. It indicates the desired
partition for resource monitoring. By using reqPARTIDs, software can
monitor specific resources, or subdivide existing partitions into
finer-grained partitions that serve as monitoring partitions.
Under the new RMID allocation strategy, the driver checks whether any
reqPARTID belonging to the given intPARTID still has a free RMID.
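A minimal user-space sketch of that search follows. It assumes the static assignment reqPARTID = intPARTID + n * k described in the illustration that follows, and a simple "in use" flag per (reqPARTID, PMG) pair. The sizes and the functions alloc_monitor() and free_monitor() are hypothetical and for illustration only. This is not the code in the series.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical geometry for illustration only: NUM_INTPARTID is n,
 * REQ_PER_INTPARTID is m, NUM_PMG is the number of PMG values that
 * each reqPARTID can use.
 */
#define NUM_INTPARTID     4
#define REQ_PER_INTPARTID 2
#define NUM_PMG           2
#define NUM_REQPARTID     (NUM_INTPARTID * REQ_PER_INTPARTID)

/* in_use[r][p] is true when the (reqPARTID r, PMG p) pair is taken. */
static bool in_use[NUM_REQPARTID][NUM_PMG];

/*
 * Look for a free (reqPARTID, PMG) pair among the reqPARTIDs statically
 * assigned to intpartid (reqPARTID = intpartid + n * k). On success,
 * mark the pair as used, store it in *reqpartid and *pmg, and return 0.
 * Return -1 when every pair belonging to intpartid is taken.
 */
int alloc_monitor(unsigned int intpartid, unsigned int *reqpartid,
                  unsigned int *pmg)
{
    assert(intpartid < NUM_INTPARTID);

    for (unsigned int k = 0; k < REQ_PER_INTPARTID; k++) {
        unsigned int r = intpartid + NUM_INTPARTID * k;

        for (unsigned int p = 0; p < NUM_PMG; p++) {
            if (!in_use[r][p]) {
                in_use[r][p] = true;
                *reqpartid = r;
                *pmg = p;
                return 0;
            }
        }
    }
    return -1;
}

/* Release a pair previously returned by alloc_monitor(). */
void free_monitor(unsigned int reqpartid, unsigned int pmg)
{
    assert(reqpartid < NUM_REQPARTID && pmg < NUM_PMG);
    in_use[reqpartid][pmg] = false;
}
```

With n = 4 and two PMGs per reqPARTID, intPARTID 1 owns reqPARTIDs 1 and 5. It therefore has four monitor IDs before the search fails, compared with two if only PMGs were used.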
The MPAM driver statically assigns every reqPARTID to an intPARTID,
as illustrated below:
m - Indicates the number of reqPARTIDs per intPARTID
n - Indicates the total number of intPARTIDs
(m * n) - Represents the total number of reqPARTIDs
intPARTID_1 = 0
├── reqPARTID_1_1 = 0
├── reqPARTID_1_2 = 0 + n
├── ...
└── reqPARTID_1_m = 0 + n * (m - 1)
intPARTID_2 = 1
├── reqPARTID_2_1 = 1
├── reqPARTID_2_2 = 1 + n
├── ...
└── reqPARTID_2_m = 1 + n * (m - 1)
...
intPARTID_n = (n - 1)
Each intPARTID has m reqPARTIDs, which are used to expand the number of
monitoring groups under its control group. As a result, the number of
monitoring groups is no longer limited by the range of the MPAM PMG, which
improves the scalability of the system's monitoring capabilities.
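The static assignment above reduces to modular arithmetic. The sketch below assumes raw 0-based values: the diagram's intPARTID_1 is value 0, and reqPARTID_i_1 corresponds to k = 0. The struct and function names are hypothetical.

```c
#include <assert.h>

/* n: total number of intPARTIDs; m: reqPARTIDs per intPARTID. */
struct partid_map {
    unsigned int n;
    unsigned int m;
};

/* The k-th (0-based) reqPARTID of an intPARTID is intpartid + n * k. */
unsigned int reqpartid_of(const struct partid_map *map,
                          unsigned int intpartid, unsigned int k)
{
    assert(intpartid < map->n && k < map->m);
    return intpartid + map->n * k;
}

/* Inverse mapping: a reqPARTID belongs to intPARTID (reqpartid mod n). */
unsigned int intpartid_of(const struct partid_map *map, unsigned int reqpartid)
{
    assert(reqpartid < map->n * map->m);
    return reqpartid % map->n;
}

/* Position of a reqPARTID within its intPARTID's group: reqpartid / n. */
unsigned int req_index_of(const struct partid_map *map, unsigned int reqpartid)
{
    assert(reqpartid < map->n * map->m);
    return reqpartid / map->n;
}
```

Because the stride is n, every reqPARTID value in [0, n * m) belongs to exactly one intPARTID. The first n reqPARTIDs are numerically equal to their intPARTIDs.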
---
Changes since v1:
- Rebased this patch set onto the latest MPAM driver in the v6.12-rc1 branch.
---
Dave Martin (1):
arm_mpam: Set INTERNAL as needed when setting MSC controls
Zeng Heng (5):
arm_mpam: Introduce the definitions of intPARTID and reqPARTID
arm_mpam: Create reqPARTIDs resource bitmap
arm_mpam: Enhance the rmid allocation strategy
arm_mpam: Call resctrl_sync_config() when allocate new reqPARTID
fs/resctrl: Add the helper to check if the task exists in the target
group
arch/x86/kernel/cpu/resctrl/core.c | 20 +++
drivers/platform/arm64/mpam/mpam_devices.c | 80 +++++++++--
drivers/platform/arm64/mpam/mpam_internal.h | 6 +
drivers/platform/arm64/mpam/mpam_resctrl.c | 145 +++++++++++++++++++-
fs/resctrl/internal.h | 4 -
fs/resctrl/monitor.c | 16 ++-
fs/resctrl/pseudo_lock.c | 7 +-
fs/resctrl/rdtgroup.c | 84 ++++++++----
include/linux/resctrl.h | 30 ++++
9 files changed, 342 insertions(+), 50 deletions(-)
--
2.25.1
Hi,

On Tue, Nov 19, 2024 at 09:50:58PM +0800, Zeng Heng wrote:
> The patch set is applied for mpam/snapshot/v6.12-rc1 branch of
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git
> repository.

[...]

> Each intPARTID has m reqPARTIDs, which are used to expand the number of
> monitoring groups under the control group. Therefore, the number of
> monitoring groups is no longer limited by the range of MPAM PMG, which
> enhances the extensibility of the system's monitoring capabilities.

The idea of mapping multiple reqPARTIDs to each resctrl control group
looks like it can work, but I think that there are some issues that
need to be considered:

1) There may be a mixture of MSCs in the system, some of which support
PARTID Narrowing and some of which do not. Affected MSCs will not be
able to regulate resource consumption for a single resctrl control
group as a single unit, if multiple reqPARTIDs are used.

This matters when an MSC that does not support PARTID Narrowing also
has resource controls that are not of the "partition bitmap" type.

(Consider a resctrl control partition that throttles the partition to
30% of memory bandwidth. How can the same behaviour be achieved if the
partition is split arbitrarily across multiple reqPARTIDs?)

Because the MPAM driver needs to be as general as possible, it may be
hard to make the "right" decision about whether to group reqPARTIDs to
provide more monitoring groups, because different use cases may have
different requirements (e.g., number of control groups versus number of
monitoring groups, and which types of control are useful).

2) The resctrl core code uses CLOSIDs and RMIDs to identify control
groups and monitoring groups. If a particular driver wants to
translate these into other values (reqPARTID, intPARTID, PMG) then it
can do so, but this mapping logic should be encapsulated in the driver.
This should be better for maintainability, since the details of the
remapping will be arch-specific -- and in general not all arches are
going to require it. With this in mind, I think that changes in the
resctrl core code would be minimal (perhaps no changes at all).
3) How should the amount of reqPARTID grouping (your "n" parameter) be
determined, in general?

As with (1), the right answer may depend on the use case as well as on
the hardware.

From my investigations into this, I feel that some configuration
parameters will probably be needed, at least at boot time.

4) If the mapping between reqPARTIDs and (CLOSID,RMID) pairs is static,
is it necessary to track which reqPARTIDs are in use? Would it be
simpler to treat all n reqPARTIDs as permanently assigned to the
corresponding CLOSID?

If reqPARTID usage is not tracked, then every control change on MSCs
that do not support PARTID Narrowing would need to be replicated across
all reqPARTIDs corresponding to the affected resctrl control partition.
But control changes are a relatively rare event, so this approach feels
acceptable as a way of keeping the driver complexity down. It partly
depends on how large the "n" parameter can become.

(Since PARTID Narrowing allows any arbitrary set of reqPARTIDs to be
mapped to a given intPARTID, it might be possible to allocate reqPARTIDs
completely dynamically. But this probably would require a change to the
resctrl core interface. It is also not clear to me whether the
"num_closids" and "num_rmids" values advertised to userspace can be
satisfied. For now, static allocation seems the most straightforward
way to get better monitoring, but perhaps it could be enhanced later
on.)

[...]

Cheers
---Dave
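Points 2 and 4 can be combined in a small sketch. In it, resctrl only ever sees (closid, rmid), the driver privately decodes these into (PARTID, PMG), and a control change for a CLOSID is simply written to every PARTID statically assigned to it, without tracking which ones are in use. The encoding rmid = k * NUM_PMG + pmg, the sizes, and all names are hypothetical. They are not taken from the MPAM driver or from the experimental branch mentioned later in the thread. M_PER_CLOSID is the grouping factor Dave calls "n" (m in the cover letter). The control shown is a bitmap-type control because, as point 1 explains, replicating a bandwidth limit would not preserve group-wide behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry: control groups, reqPARTIDs per group, PMGs. */
#define N_CLOSID     4
#define M_PER_CLOSID 2
#define NUM_PMG      2
#define NUM_PARTID   (N_CLOSID * M_PER_CLOSID)

/* Per-PARTID cache portion bitmap on an MSC without PARTID Narrowing. */
static uint32_t cpbm[NUM_PARTID];

/* RMIDs advertised per CLOSID: every (PARTID, PMG) pair of the group. */
uint32_t rmids_per_closid(void)
{
    return M_PER_CLOSID * NUM_PMG;
}

/* Driver-private translation; the resctrl core never sees PARTIDs. */
void closid_rmid_to_partid_pmg(uint32_t closid, uint32_t rmid,
                               uint32_t *partid, uint32_t *pmg)
{
    uint32_t k = rmid / NUM_PMG;    /* which PARTID of this CLOSID */

    assert(closid < N_CLOSID && k < M_PER_CLOSID);
    *partid = closid + N_CLOSID * k;
    *pmg = rmid % NUM_PMG;
}

/* No usage tracking: replicate the control to all of the group's PARTIDs. */
void set_cpbm(uint32_t closid, uint32_t bitmap)
{
    assert(closid < N_CLOSID);
    for (uint32_t k = 0; k < M_PER_CLOSID; k++)
        cpbm[closid + N_CLOSID * k] = bitmap;
}

uint32_t get_cpbm(uint32_t partid)
{
    assert(partid < NUM_PARTID);
    return cpbm[partid];
}
```

The cost of skipping usage tracking is M_PER_CLOSID register writes per control change. As point 4 notes, that is probably acceptable because control changes are rare.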
Hi,

Thanks for the comments!

On 2024/11/19 23:31, Dave Martin wrote:
> 1) There may be a mixture of MSCs in the system, some of which support
> PARTID Narrowing and some of which do not. [...]

1. The solution in this patch set is designed with mixed-MSC scenarios
in mind.

Regarding the definition of the quantity 'n', here is a detailed
explanation:

n - Indicates the total number of intPARTIDs
l - Represents the total number of reqPARTIDs
m - Indicates the number of reqPARTIDs per intPARTID

The values of n/l/m are derived from the following formulas:

n = min(intPARTID-np, PARTID-nnp)
l = min(reqPARTID-np, PARTID-nnp)
m = l // n

reqPARTID-np -- The number of reqPARTIDs supported by MSCs that support
                narrow-partid.
intPARTID-np -- The number of intPARTIDs supported by MSCs that support
                narrow-partid.
PARTID-nnp   -- The number of PARTIDs supported by MSCs that do not
                support narrow-partid.

The software needs to ensure that 'm' is an integer, meaning that the
number of supported reqPARTIDs is an integer multiple of 'n'.
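The formulas above translate directly into C. In this sketch, the struct and function names are hypothetical. The integer-multiple requirement is treated as an error case rather than silently rounding m down.

```c
#include <assert.h>
#include <stddef.h>

/* Geometry derived from the formulas; names are hypothetical. */
struct narrow_geom {
    unsigned int n;    /* total number of intPARTIDs */
    unsigned int l;    /* total number of reqPARTIDs */
    unsigned int m;    /* reqPARTIDs per intPARTID */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/*
 * intpartid_np, reqpartid_np: counts supported by the MSCs with
 * narrow-partid; partid_nnp: PARTID count of the MSCs without it.
 * Returns 0 on success, or -1 when l is not a non-zero integer
 * multiple of n (so m would not be a positive integer).
 */
int narrow_geom_compute(unsigned int intpartid_np, unsigned int reqpartid_np,
                        unsigned int partid_nnp, struct narrow_geom *g)
{
    assert(g != NULL);

    g->n = min_u(intpartid_np, partid_nnp);
    g->l = min_u(reqpartid_np, partid_nnp);
    if (g->n == 0 || g->l < g->n || g->l % g->n != 0)
        return -1;
    g->m = g->l / g->n;
    return 0;
}
```

With the L3/mata figures used in the example in this message (32 intPARTIDs, 256 reqPARTIDs, 256 PARTIDs), this yields n = 32, l = 256 and m = 8.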
To illustrate how n, l, and m are determined, assume a platform with:

L3   - Supports the narrow PARTID feature, with 32 intPARTIDs and
       256 reqPARTIDs.
mata - Does not support the narrow PARTID feature, and supports a range
       of 256 PARTIDs.

Then,

n = min(intPARTID-l3, PARTID-mata) = min(32, 256) = 32
l = min(reqPARTID-l3, PARTID-mata) = min(256, 256) = 256
m = 256 / 32 = 8

The mapping between each group's closid and the respective MSCs'
intPARTID/reqPARTID/PARTID is illustrated below:

P - partition group
M - monitoring group

Group: Closid                 MSCs with narrow-partid   MSCs without narrow-partid
P1   : 0                      intPARTID_1               PARTID_1
M1_1 : 0                      ├── reqPARTID_1_1         ├── PARTID_1_1
M1_2 : 0+n                    ├── reqPARTID_1_2         ├── PARTID_1_2
...                           ├── ...                   ├── ...
M1_m : 0+n*(m-1)              └── reqPARTID_1_m         └── PARTID_1_m

P2   : 1                      intPARTID_2               PARTID_2
M2_1 : 1                      ├── reqPARTID_2_1         ├── PARTID_2_1
M2_2 : 1+n                    ├── reqPARTID_2_2         ├── PARTID_2_2
...                           ├── ...                   ├── ...
M2_m : 1+n*(m-1)              └── reqPARTID_2_m         └── PARTID_2_m

Pn   : (n-1)                  intPARTID_n               PARTID_n
Mn_1 : (n-1)                  ├── reqPARTID_n_1         ├── PARTID_n_1
Mn_2 : (n-1)+n                ├── reqPARTID_n_2         ├── PARTID_n_2
...                           ├── ...                   ├── ...
Mn_m : (n-1)+n*(m-1) = n*m-1  └── reqPARTID_n_m         └── PARTID_n_m

The advantages of this approach are:

1. There is no need to modify or disrupt the existing resctrl layer
   interface, and each control group keeps the same resource control
   functionality;
2. MSCs that support narrow-partid (with both intPARTID and reqPARTID)
   and MSCs that do not (with only PARTID) can share the same PARTID
   space;
3. While preserving point (1), the number of control groups can be
   maximized, because users can always choose to make a control group
   act as a sub-monitoring group under another control group.

> 2) The resctrl core code uses CLOSIDs and RMIDs to identify control
> groups and monitoring groups. If a particular driver wants to
> translate these into other values (reqPARTID, intPARTID, PMG) then it
> can do so, but this mapping logic should be encapsulated in the driver.
> [...]

Yes, keeping the interface of the resctrl core code unchanged is, in
essence, the primary constraint on the current MPAM code. We try our
best to keep all resctrl interfaces and to preserve the existing
functionality of x86 RDT.

The only thing that falls short of ideal (forgive me) is that the series
introduces one new function, resctrl_arch_alloc_rmid(), into the resctrl
code. (resctrl_arch_free_rmid() will be optimized away in the next
version, and there will be no other new functions.)

resctrl_arch_alloc_rmid() is the result of several restructuring
iterations and is one of the most critical points in the patch series.

> 3) How should the amount of reqPARTID grouping (your "n" parameter) be
> determined, in general?
> [...]

As mentioned earlier,

Total number of intPARTIDs: n = min(intPARTID-np, PARTID-nnp)
Total number of reqPARTIDs: l = min(reqPARTID-np, PARTID-nnp)

We maximize the number of control groups because users can always
choose to make a control group act as a sub-monitoring group at any
time.

> 4) If the mapping between reqPARTIDs and (CLOSID,RMID) pairs is static,
> is it necessary to track which reqPARTIDs are in use? Would it be
> simpler to treat all n reqPARTIDs as permanently assigned to the
> corresponding CLOSID?
>
> If reqPARTID usage is not tracked, then every control change on MSCs
> that do not support PARTID Narrowing would need to be replicated across
> all reqPARTIDs corresponding to the affected resctrl control partition.
> But control changes are a relatively rare event, so this approach feels
> acceptable as a way of keeping the driver complexity down. It partly
> depends on how large the "n" parameter can become.

Yes, totally agree. I will try to remove the reqPARTID bitmap and
resctrl_arch_free_rmid(). As you mentioned, this will simplify the code
logic and reduce changes to the resctrl layer code.

I originally kept this resource tracking to reduce the number of IPIs,
and I will prioritize this optimization for the next version. (In fact,
the initial version of the patch set allocated reqPARTIDs dynamically,
and some of the original ideas inevitably survived the restructuring.)

Best regards,
Zeng Heng
Hi,

On Sat, Nov 23, 2024 at 05:34:08PM +0800, Zeng Heng wrote:
> Hi,
>
> Thanks for comments!
>
> On 2024/11/19 23:31, Dave Martin wrote:
> > 1) There may be a mixture of MSCs in the system, some of which support
> > PARTID Narrowing and some of which do not. [...]
>
> 1. The patch set solution is designed considering mixed MSC scenarios.
>
> Regarding the definition of the quantity 'n', here is a detailed
> explanation:
>
> [...]
> Group: Closid                 MSCs with narrow-partid   MSCs without narrow-partid
> P1   : 0                      intPARTID_1               PARTID_1
> M1_1 : 0                      ├── reqPARTID_1_1         ├── PARTID_1_1
> [...]
> Mn_m : (n-1)+n*(m-1) = n*m-1  └── reqPARTID_n_m         └── PARTID_n_m

Thanks for this illustration.

> The advantages of doing this are:
>
> 1. There is no need to modify or disrupt the existing resctrl layer
> interface, ensuring that each control group has same resource
> control functionality;

I don't think this is guaranteed.

If there is some MSC that does not have PARTID Narrowing support, and
this MSC has a memory bandwidth control that the MPAM driver exposes
through resctrl, then there is no way to configure that MSC so that it
exhibits the behaviour that the resctrl user expects.
For a concrete example:

Suppose that n=8, and the user asks for P1 to be given 30% of system
memory bandwidth.

On the affected MSC, P1 maps to eight PARTIDs, each with its own memory
bandwidth regulation.

If the work that happens to be in M1_1 dominates P1's bandwidth
requirement, then PARTID_1_1 needs to be given 30% of total memory
bandwidth.

If the work in P1 is evenly spread across M1_1, M1_2 ... M1_m, then
they would each need to be given 30% / 8 = 3.75% of total memory
bandwidth so that the total allocated bandwidth is 30%.

But we don't know how memory bandwidth consumption is distributed among
M1_1, M1_2 etc., so there is no way to program the memory bandwidth
regulation on that MSC that guarantees the expected result of P1
receiving 30% of the total available bandwidth.

This means that on some hardware, a choice needs to be made: should the
MPAM driver hide from resctrl any controls that have this problem, or
should it disable the use of PARTID Narrowing for providing additional
monitoring groups?

My concern is that the correct choice is likely to be
use-case-dependent.

Do you have a view on this?

> 2. MSCs that support narrow-partid (including intPARTID and reqPARTID)
> and MSCs that do not support (only including PARTID) can share the
> same PARTID space;

This seems like it may be problematic on some hardware, as I tried to
explain above for point 1.

Note though, if the non-Narrowing MSCs only have bitmap-type controls,
then sharing the PARTID space is harmless. This comes about because
these controls explicitly allow contention: cache way 0, for example, is
contended between all the work that is allowed by MPAM to use this
cache way. Breaking up the work arbitrarily under different PARTIDs
makes no difference in this case: the amount of work allocated to that
cache way, and the amount of contention, is still the same.

> 3. On the premise of ensuring the (1) point, the number of control
> groups can be maximized, because users can always choose to make a
> control group act as a sub-monitoring group under another control
> group;

What do you mean by "control group" here?

resctrl's group hierarchy is strict: work is distributed across one or
more control groups at the top level, and the work in each control
group is further distributed across one or more monitoring groups
within that control group.

There is no way to repurpose a resctrl control group as a monitoring
group under some other control group.

Or were you referring to something else here?

> > 2) The resctrl core code uses CLOSIDs and RMIDs to identify control
> > groups and monitoring groups. [...]
> Yes, maintaining the interface of the resctrl core code unchanged is,
> in essence, the (first) important constraint of the current MPAM code.
> [...]
> The only thing that falls short of being ideal (forgive me), is that
> it introduces the sole new function resctrl_arch_alloc_rmid() into the
> resctrl code [...]
I was concerned about the changes in patch 6, for example, where the new
function task_belongs_to_ctrl_group() now has to look at more
information than just rdtgroup->closid in order to determine which
control group a task belongs to. This is precisely what
resctrl_arch_match_closid() is supposed to do, using just the closid.

This suggests that the meaning of "closid" in the core code has been
changed: if closid is the control group identifier, then each control
group should have exactly one closid value.

For comparison, you may want to take a look at the top 3 commits of
this experimental branch:

https://git.gitlab.arm.com/linux-arm/linux-dm/-/commits/mpam/partid-pmg-remap/v0.2/head/?ref_type=heads

which attempts to do all the mapping within the MPAM driver instead.
Note, the approach is a bit over-complicated and I decided that a
simpler approach is needed. But it may help to illustrate what I mean
about keeping all the remapping out of the resctrl core code.

> > 3) How should the amount of reqPARTID grouping (your "n" parameter) be
> > determined, in general?
> > [...]
> As mentioned earlier,
> Total number of intPARTIDs: n = min(intPARTID-np, PARTID-nnp)
> Total number of reqPARTIDs: l = min(reqPARTID-np, PARTID-nnp)
>
> We maximize the number of control groups because users can always
> choose to make a control group act as a sub-monitoring group any time.

I'm still not sure what you mean here; see my response on point 3.

(I might be misunderstanding something here, but if you can give an
illustration then that may help.)

> > 4) If the mapping between reqPARTIDs and (CLOSID,RMID) pairs is static,
> > is it necessary to track which reqPARTIDs are in use? [...]
> Yes, totally agree. I will try to remove the reqPARTID bitmap and
> the resctrl_arch_free_rmid(). [...]

OK; fair enough.

This kind of feature could always be re-added later on if it proves to
be important for performance in real use-cases, but it is probably best
to keep things as simple as possible initially.

Cheers
---Dave
On 2024/11/25 23:39, Dave Martin wrote:
>> The advantages of doing this are:
>>
>> 1. There is no need to modify or disrupt the existing resctrl layer
>> interface, ensuring that each control group has same resource
>> control functionality;
>
> I don't think this is guaranteed.
>
> [...]
>
> This means that on some hardware, a choice needs to be made: should the
> MPAM driver hide from resctrl any controls that have this problem, or
> should it disable the use of PARTID Narrowing for providing additional
> monitoring groups?
>
> My concern is that the correct choice is likely to be
> use-case-dependent.
>
> Do you have a view on this?

I understand your meaning and concerns, and this is indeed a problem.

From a software perspective, I think the use cases should be limited.
For scenarios where mata does not support narrow-partid, I tend to
favor disabling the narrow-partid feature in the driver.

From a hardware perspective, MSCs such as L2/L3 are designed with area
constraints in mind and choose to implement the narrow-partid feature.
MATA, on the other hand, is located on a different die and does not
face similar constraints, so it often does not implement narrow-partid.
This makes the problem a rather thorny one.

>> 2. MSCs that support narrow-partid (including intPARTID and reqPARTID)
>> and MSCs that do not support (only including PARTID) can share the
>> same PARTID space;
>
> This seems like it may be problematic on some hardware, as I tried to
> explain above for point 1.
>
> Note though, if the non-Narrowing MSCs only have bitmap-type controls,
> then sharing the PARTID space is harmless. [...]

Completely agree. MSCs without the narrow-partid feature are compatible
with the shared PARTID space scheme if they only have bitmap-type
controls.

>> 3. On the premise of ensuring the (1) point, the number of control
>> groups can be maximized, because users can always choose to make a
>> control group act as a sub-monitoring group under another control
>> group;
>
> What do you mean by "control group" here?
>
> resctrl's group hierarchy is strict: work is distributed across one or
> more control groups at the top level, and the work in each control
> group is further distributed across one or more monitoring groups
> within that control group.
>
> There is no way to repurpose a resctrl control group as a monitoring
> group under some other control group.
>
> Or were you referring to something else here?

Apologies for the miscommunication. What I meant was to use the extra
PARTIDs of MSCs that do not support the narrow-partid feature to expand
the number of sub-monitoring groups.

>>> 2) The resctrl core code uses CLOSIDs and RMIDs to identify control
>>> groups and monitoring groups. [...]
>
>> Yes, maintaining the interface of the resctrl core code unchanged is,
>> in essence, the (first) important constraint of the current MPAM code.
>> [...]
>
> I was concerned about the changes in patch 6, for example, where the new
> function task_belongs_to_ctrl_group() now has to look at more
> information than just rdtgroup->closid in order to determine which
> control group a task belongs to. This is precisely what
> resctrl_arch_match_closid() is supposed to do, using just the closid.
>
> This suggests that the meaning of "closid" in the core code has been
> changed: if closid is the control group identifier, then each control
> group should have exactly one closid value.
>
> For comparison, you may want to take a look at the top 3 commits of
> this experimental branch:
>
> https://git.gitlab.arm.com/linux-arm/linux-dm/-/commits/mpam/partid-pmg-remap/v0.2/head/?ref_type=heads
>
> which attempts to do all the mapping within the MPAM driver instead.
> [...]

I understand your suggestion. I will consider refactoring the mapping
between closid/rmid and partid/reqpartid/intpartid/pmg.

In fact, I had already prepared a simplified version of v2 as v3, but in
light of your suggestions, I have decided to rework the solution. At
present, I'm not sure whether I can completely isolate the mapping
within the MPAM driver layer. If my reworked version goes smoothly, I
will reply ASAP.

>>> 4) If the mapping between reqPARTIDs and (CLOSID,RMID) pairs is static,
>>> is it necessary to track which reqPARTIDs are in use? [...]
>> Yes, totally agree. I will try to remove the reqPARTID bitmap and
>> the resctrl_arch_free_rmid(). As mentioned, this will simplify the code
>> logic and reduce changes to the resctrl layer code.
>>
>> [...]
>
> OK; fair enough.
>
> This kind of feature could always be re-added later on if it proves to
> be important for performance in real use-cases, but it is probably best
> to keep things as simple as possible initially.

Many thanks as always for your prompt reply and insightful suggestions.

Best Regards,
Zeng Heng
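The condition the thread converges on can be written as a small predicate. Dave notes that sharing the PARTID space is harmless when the non-Narrowing MSCs only have bitmap-type controls, and Zeng favors disabling narrow-partid otherwise. The sketch below encodes that condition. The enum, the struct, and can_share_partid_space() are hypothetical and are not part of the MPAM driver. A real driver would also need the alternative Dave raises, which is hiding the offending controls from resctrl instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical control classification. */
enum ctrl_type {
    CTRL_BITMAP,     /* e.g. cache portion bitmap: contention is explicit */
    CTRL_BANDWIDTH,  /* e.g. memory bandwidth limit: a per-PARTID quantity */
};

/* Hypothetical per-MSC description. */
struct msc_desc {
    bool has_narrowing;
    size_t nr_ctrls;
    const enum ctrl_type *ctrls;
};

/*
 * Return true when splitting each control group across several PARTIDs
 * is safe on the MSCs that lack PARTID Narrowing, i.e. when every such
 * MSC only exposes bitmap-type controls.
 */
bool can_share_partid_space(const struct msc_desc *mscs, size_t nr)
{
    assert(mscs != NULL || nr == 0);

    for (size_t i = 0; i < nr; i++) {
        if (mscs[i].has_narrowing)
            continue;    /* reqPARTIDs fold into one intPARTID here */
        for (size_t j = 0; j < mscs[i].nr_ctrls; j++)
            if (mscs[i].ctrls[j] != CTRL_BITMAP)
                return false;
    }
    return true;
}
```

On the L3/mata platform from this thread, the predicate is true only if mata exposes no bandwidth-type controls.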