[PATCH] arm64/mpam: Support partial-core boot for MPAM

Zeng Heng posted 1 patch 1 month ago
There is a newer version of this series
Some MPAM MSCs (like the L2 MSC) share the same power domain as their
associated CPUs. Therefore, in scenarios where only some of the cores
power up, the MSCs belonging to the unpowered cores need not, and must
not, be accessed; otherwise a bus-access fault would occur.

In such partial-core boot scenarios, the MSCs corresponding to offline
CPUs should be skipped. If an MSC's accessibility mask doesn't contain
any online CPU, that MSC remains uninitialized.

During initialization of class->props, skip any MSC that is not powered
up, so that the class->props member is not derived from uninitialized
vmsc->props in mpam_enable_init_class_features() and
mpam_enable_merge_vmsc_features().

Signed-off-by: Zeng Heng <zengheng4@huawei.com>
---
 drivers/resctrl/mpam_devices.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 0b5b158e1aaf..488ad2e40f66 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -2134,10 +2134,12 @@ static void mpam_enable_init_class_features(struct mpam_class *class)
 	struct mpam_vmsc *vmsc;
 	struct mpam_component *comp;
 
-	comp = list_first_entry(&class->components,
-				struct mpam_component, class_list);
-	vmsc = list_first_entry(&comp->vmsc,
-				struct mpam_vmsc, comp_list);
+	list_for_each_entry(comp, &class->components, class_list) {
+		list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+			if (vmsc->msc->probed)
+				break;
+		}
+	}
 
 	class->props = vmsc->props;
 }
@@ -2149,6 +2151,8 @@ static void mpam_enable_merge_vmsc_features(struct mpam_component *comp)
 	struct mpam_class *class = comp->class;
 
 	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		if (!vmsc->msc->probed)
+			continue;
 		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
 			__vmsc_props_mismatch(vmsc, ris);
 			class->nrdy_usec = max(class->nrdy_usec,
@@ -2620,6 +2624,7 @@ void mpam_disable(struct work_struct *ignored)
  */
 void mpam_enable(struct work_struct *work)
 {
+	cpumask_t mask;
 	static atomic_t once;
 	struct mpam_msc *msc;
 	bool all_devices_probed = true;
@@ -2629,8 +2634,11 @@ void mpam_enable(struct work_struct *work)
 	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
 				 srcu_read_lock_held(&mpam_srcu)) {
 		mutex_lock(&msc->probe_lock);
-		if (!msc->probed)
-			all_devices_probed = false;
+		if (!msc->probed) {
+			cpumask_and(&mask, &msc->accessibility, cpu_online_mask);
+			if (!cpumask_empty(&mask))
+				all_devices_probed = false;
+		}
 		mutex_unlock(&msc->probe_lock);
 
 		if (!all_devices_probed)
-- 
2.25.1
Re: [PATCH] arm64/mpam: Support partial-core boot for MPAM
Posted by Ben Horgan 1 week, 2 days ago
Hi Zeng,

I think I've just managed to whitelist your email address. So, all being
well I'll get your emails in my inbox.

On 1/7/26 03:13, Zeng Heng wrote:
> Some MPAM MSCs (like L2 MSC) shares the same power domain with its
> associated CPUs. Therefore, in scenarios where only partial cores power
> up, the MSCs belonging to the un-powered cores don't need and should not
> be accessed, otherwise bus-access fault would occur.

The MPAM driver intentionally waits until all MSCs have been
discovered before allowing MPAM to be used, so that it can check the
properties of all the MSCs and determine the configuration with full
knowledge. Once a CPU affine with each MSC has been enabled, MPAM
will be enabled and usable.

Suppose we didn't access all MSCs in an asymmetric configuration.
E.g. if different L2s had different lengths of cache portion bitmaps
and MPAM was enabled with only the CPUs sharing the same L2 online,
then the driver wouldn't know and we'd end up with a bad configuration,
which would become a problem when the other CPUs are eventually turned
on.

Hence, I think we should retain the restriction that MPAM is only
enabled once all MSCs are probed. Is this a particularly onerous
restriction for you?

> 
> In such non-full core boot scenarios, the MSCs corresponding to offline
> CPUs should skip. If the MSC's accessibility mask doesn't contain any
> online CPU, this MSC remains uninitialized.
> 
> During initialization of class->props, skip any MSC that is not powered
> up, so that ensure the class->props member unaffected from uninitialized
> vmsc->props in mpam_enable_init_class_features() and
> mpam_enable_merge_vmsc_features().
> 
> Signed-off-by: Zeng Heng <zengheng4@huawei.com>
> ---
>  drivers/resctrl/mpam_devices.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 0b5b158e1aaf..488ad2e40f66 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -2134,10 +2134,12 @@ static void mpam_enable_init_class_features(struct mpam_class *class)
>  	struct mpam_vmsc *vmsc;
>  	struct mpam_component *comp;
>  
> -	comp = list_first_entry(&class->components,
> -				struct mpam_component, class_list);
> -	vmsc = list_first_entry(&comp->vmsc,
> -				struct mpam_vmsc, comp_list);
> +	list_for_each_entry(comp, &class->components, class_list) {
> +		list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +			if (vmsc->msc->probed)
> +				break;
> +		}
> +	}
>  
>  	class->props = vmsc->props;
>  }
> @@ -2149,6 +2151,8 @@ static void mpam_enable_merge_vmsc_features(struct mpam_component *comp)
>  	struct mpam_class *class = comp->class;
>  
>  	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		if (!vmsc->msc->probed)
> +			continue;
>  		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
>  			__vmsc_props_mismatch(vmsc, ris);
>  			class->nrdy_usec = max(class->nrdy_usec,
> @@ -2620,6 +2624,7 @@ void mpam_disable(struct work_struct *ignored)
>   */
>  void mpam_enable(struct work_struct *work)
>  {
> +	cpumask_t mask;
>  	static atomic_t once;
>  	struct mpam_msc *msc;
>  	bool all_devices_probed = true;
> @@ -2629,8 +2634,11 @@ void mpam_enable(struct work_struct *work)
>  	list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
>  				 srcu_read_lock_held(&mpam_srcu)) {
>  		mutex_lock(&msc->probe_lock);
> -		if (!msc->probed)
> -			all_devices_probed = false;
> +		if (!msc->probed) {
> +			cpumask_and(&mask, &msc->accessibility, cpu_online_mask);
> +			if (!cpumask_empty(&mask))
> +				all_devices_probed = false;
> +		}
>  		mutex_unlock(&msc->probe_lock);
>  
>  		if (!all_devices_probed)

Thanks,

Ben
Re: [PATCH] arm64/mpam: Support partial-core boot for MPAM
Posted by Zeng Heng 5 days, 20 hours ago

On 2026/1/29 18:11, Ben Horgan wrote:
> Hi Zeng,
> 
> I think I've just managed to whitelist your email address. So, all being
> well I'll get your emails in my inbox.
> 
> On 1/7/26 03:13, Zeng Heng wrote:
>> Some MPAM MSCs (like L2 MSC) shares the same power domain with its
>> associated CPUs. Therefore, in scenarios where only partial cores power
>> up, the MSCs belonging to the un-powered cores don't need and should not
>> be accessed, otherwise bus-access fault would occur.
> 
> The MPAM driver intentionally to waits until all MSCs have been
> discovered before allowing MPAM to be used so that it can check the
> properties of all the MSC and determine the configuration based on full
> knowledge. Once a CPU affine with each MSC has been enabled then MPAM
> will be enabled and usable.
> 
> Suppose we weren't to access all MSCs in an asymmetric configuration.
> E.g. if different L2 had different lengths of cache portion bit maps and
> MPAM was enabled with only the CPUs with the same L2 then the driver
> wouldn't know and we'd end up with a bad configuration which would
> become a problem when the other CPUs are eventually turned on.
> 
> Hence, I think we should retain the restriction that MPAM is only
> enabled once all MSC are probed. Is this a particularly onerous
> resctriction for you?
> 

I have no objection to the restriction that "MPAM is only enabled once
all MSC are probed." This constraint ensures the driver has complete
knowledge of all Memory System Components before establishing the
configuration.


However, this patch is specifically designed to address CPU core
isolation scenarios (such as adding the 'isolcpus=xx' kernel
command-line parameter).

The patch allows the MPAM driver to complete the initialization of the
online MSCs even when the system boots with certain cores isolated or
disabled. It decouples MPAM initialization from the requirement that
all CPUs be online during the probing phase.

CPU core isolation is indeed a common production scenario. It requires
the kernel to bring up its functionality even in the presence of faulty
cores (which cannot be recovered through a cold boot), ensuring system
reliability and availability on multi-core processors where a single
core has failed.

Without this patch, MPAM fails to initialize under CPU core isolation
scenarios. Apologies for not mentioning this in the patch: the
functionality can be verified by adding 'maxcpus=1' to the boot
parameters.

Please let me know if you have any further questions or concerns.


Best Regards,
Zeng Heng
Re: [PATCH] arm64/mpam: Support partial-core boot for MPAM
Posted by Zeng Heng 5 days, 19 hours ago

On 2026/2/2 16:41, Zeng Heng wrote:
> 
> 
> On 2026/1/29 18:11, Ben Horgan wrote:
>> Hi Zeng,
>>
>> I think I've just managed to whitelist your email address. So, all being
>> well I'll get your emails in my inbox.
>>
>> On 1/7/26 03:13, Zeng Heng wrote:
>>> Some MPAM MSCs (like L2 MSC) shares the same power domain with its
>>> associated CPUs. Therefore, in scenarios where only partial cores power
>>> up, the MSCs belonging to the un-powered cores don't need and should not
>>> be accessed, otherwise bus-access fault would occur.
>>
>> The MPAM driver intentionally to waits until all MSCs have been
>> discovered before allowing MPAM to be used so that it can check the
>> properties of all the MSC and determine the configuration based on full
>> knowledge. Once a CPU affine with each MSC has been enabled then MPAM
>> will be enabled and usable.
>>
>> Suppose we weren't to access all MSCs in an asymmetric configuration.
>> E.g. if different L2 had different lengths of cache portion bit maps and
>> MPAM was enabled with only the CPUs with the same L2 then the driver
>> wouldn't know and we'd end up with a bad configuration which would
>> become a problem when the other CPUs are eventually turned on.
>>
>> Hence, I think we should retain the restriction that MPAM is only
>> enabled once all MSC are probed. Is this a particularly onerous
>> resctriction for you?
>>
> 
> I have no objection to the restriction that "MPAM is only enabled once
> all MSC are probed." This constraint ensures the driver has complete
> knowledge of all Memory System Components before establishing the
> configuration.
> 
> 
> However, this patch is specifically designed to address CPU core
> isolation scenarios (Such as adding the 'isolcpus=xx' kernel command
> line parameter).
> 
> The patch allows the MPAM driver to successfully complete the
> initialization of online MSCs even when the system is booted with
> certain cores isolated or disabled. The patch ensures that MPAM
> initialization is decoupled from the requirement that all CPUs must be
> online during the probing phase.
> 
> CPU core isolation is indeed a common production scenario. This
> functionality requires the kernel to enable functionalities in the
> presence of faulty cores (which cannot be recovered through cold boot).
> This ensures system reliability and availability on multi-core
> processors where single-core faults.
> 
> Without this patch would prevent MPAM from initialization under CPU core
> isolation scenarios. Apologies for not mentioning in the patch: we can
> verify the functionality by adding 'maxcpus=1' to the boot parameters.
> 
> Please let me know if you have any further questions or concerns.
> 
> 
> Best Regards,
> Zeng Heng
> 
> 

My platform consists of 12 clusters, each containing 16 CPU cores. Under 
normal boot conditions, the schemata is as follows:

  # mount -t resctrl l resctrl /sys/fs/resctrl/
  # cat /sys/fs/resctrl/schemata
  L3:1=1ffff;26=1ffff;51=1ffff;76=1ffff;101=1ffff;126=1ffff;151=1ffff;
     176=1ffff;201=1ffff;226=1ffff;251=1ffff;276=1ffff

Adding 'maxcpus=1' to the boot parameters:
Without this patch, MPAM initialization fails.
With the patch, MPAM initialization succeeds, the number of clusters
matches expectations, and the schemata is as follows:

   # mount -t resctrl l resctrl /sys/fs/resctrl/
   # cat schemata
   L3:1=1ffff


Thanks,
Zeng Heng
Re: [PATCH] arm64/mpam: Support partial-core boot for MPAM
Posted by Ben Horgan 5 days, 17 hours ago
Hi Zeng,

On 2/2/26 09:16, Zeng Heng wrote:
> 
> 
> On 2026/2/2 16:41, Zeng Heng wrote:
>>
>>
>> On 2026/1/29 18:11, Ben Horgan wrote:
>>> Hi Zeng,
>>>
>>> I think I've just managed to whitelist your email address. So, all being
>>> well I'll get your emails in my inbox.
>>>
>>> On 1/7/26 03:13, Zeng Heng wrote:
>>>> Some MPAM MSCs (like L2 MSC) shares the same power domain with its
>>>> associated CPUs. Therefore, in scenarios where only partial cores power
>>>> up, the MSCs belonging to the un-powered cores don't need and should
>>>> not
>>>> be accessed, otherwise bus-access fault would occur.
>>>
>>> The MPAM driver intentionally to waits until all MSCs have been
>>> discovered before allowing MPAM to be used so that it can check the
>>> properties of all the MSC and determine the configuration based on full
>>> knowledge. Once a CPU affine with each MSC has been enabled then MPAM
>>> will be enabled and usable.
>>>
>>> Suppose we weren't to access all MSCs in an asymmetric configuration.
>>> E.g. if different L2 had different lengths of cache portion bit maps and
>>> MPAM was enabled with only the CPUs with the same L2 then the driver
>>> wouldn't know and we'd end up with a bad configuration which would
>>> become a problem when the other CPUs are eventually turned on.
>>>
>>> Hence, I think we should retain the restriction that MPAM is only
>>> enabled once all MSC are probed. Is this a particularly onerous
>>> resctriction for you?
>>>
>>
>> I have no objection to the restriction that "MPAM is only enabled once
>> all MSC are probed." This constraint ensures the driver has complete
>> knowledge of all Memory System Components before establishing the
>> configuration.
>>
>>
>> However, this patch is specifically designed to address CPU core
>> isolation scenarios (Such as adding the 'isolcpus=xx' kernel command
>> line parameter).

In the isolation scenario, are you enabling MPAM for some CPUs and
using those CPUs without taking into account the parameters of the
associated MSCs?

>>
>> The patch allows the MPAM driver to successfully complete the
>> initialization of online MSCs even when the system is booted with
>> certain cores isolated or disabled. The patch ensures that MPAM
>> initialization is decoupled from the requirement that all CPUs must be
>> online during the probing phase.
>>
>> CPU core isolation is indeed a common production scenario. This
>> functionality requires the kernel to enable functionalities in the
>> presence of faulty cores (which cannot be recovered through cold boot).
>> This ensures system reliability and availability on multi-core
>> processors where single-core faults.
>>
>> Without this patch would prevent MPAM from initialization under CPU core
>> isolation scenarios. Apologies for not mentioning in the patch: we can
>> verify the functionality by adding 'maxcpus=1' to the boot parameters.

For 'maxcpus=1' I think the correct behaviour is to not enable MPAM,
as the other CPUs can then be turned on afterwards, e.g. by
echo 1 > /sys/devices/system/cpu/cpuX/online

For faulty cores how would you ensure they are never turned on?

>>
>> Please let me know if you have any further questions or concerns.
>>
>>
>> Best Regards,
>> Zeng Heng
>>
>>
> 
> My platform consists of 12 clusters, each containing 16 CPU cores. Under
> normal boot conditions, the schemata is as follows:

Thank you for sharing this information.

> 
>  # mount -t resctrl l resctrl /sys/fs/resctrl/
>  # cat /sys/fs/resctrl/schemata
>  L3:1=1ffff;26=1ffff;51=1ffff;76=1ffff;101=1ffff;126=1ffff;151=1ffff;
>     176=1ffff;201=1ffff;226=1ffff;251=1ffff;276=1ffff
> 
> Adding 'maxcpus=1' to the boot parameters:
> Without this patch, MPAM initialization fails.
> With the patch, MPAM initialization succeeds, the number of clusters
> matches expectations, and the schemata is as follows:
> 
>   # mount -t resctrl l resctrl /sys/fs/resctrl/
>   # cat schemata
>   L3:1=1ffff
> 
> 
> Thanks,
> Zeng Heng
> 


Thanks,

Ben

Re: [PATCH] arm64/mpam: Support partial-core boot for MPAM
Posted by Zeng Heng 5 days, 16 hours ago
Hi Ben,

On 2026/2/2 19:34, Ben Horgan wrote:
> Hi Zeng,
> 
> On 2/2/26 09:16, Zeng Heng wrote:
>>
>>
>> On 2026/2/2 16:41, Zeng Heng wrote:
>>>
>>>
>>> On 2026/1/29 18:11, Ben Horgan wrote:
>>>> Hi Zeng,
>>>>
>>>> I think I've just managed to whitelist your email address. So, all being
>>>> well I'll get your emails in my inbox.
>>>>
>>>> On 1/7/26 03:13, Zeng Heng wrote:
>>>>> Some MPAM MSCs (like L2 MSC) shares the same power domain with its
>>>>> associated CPUs. Therefore, in scenarios where only partial cores power
>>>>> up, the MSCs belonging to the un-powered cores don't need and should
>>>>> not
>>>>> be accessed, otherwise bus-access fault would occur.
>>>>
>>>> The MPAM driver intentionally to waits until all MSCs have been
>>>> discovered before allowing MPAM to be used so that it can check the
>>>> properties of all the MSC and determine the configuration based on full
>>>> knowledge. Once a CPU affine with each MSC has been enabled then MPAM
>>>> will be enabled and usable.
>>>>
>>>> Suppose we weren't to access all MSCs in an asymmetric configuration.
>>>> E.g. if different L2 had different lengths of cache portion bit maps and
>>>> MPAM was enabled with only the CPUs with the same L2 then the driver
>>>> wouldn't know and we'd end up with a bad configuration which would
>>>> become a problem when the other CPUs are eventually turned on.
>>>>
>>>> Hence, I think we should retain the restriction that MPAM is only
>>>> enabled once all MSC are probed. Is this a particularly onerous
>>>> resctriction for you?
>>>>
>>>
>>> I have no objection to the restriction that "MPAM is only enabled once
>>> all MSC are probed." This constraint ensures the driver has complete
>>> knowledge of all Memory System Components before establishing the
>>> configuration.
>>>
>>>
>>> However, this patch is specifically designed to address CPU core
>>> isolation scenarios (Such as adding the 'isolcpus=xx' kernel command
>>> line parameter).
> 
> In the isolation scenario are you for some cpus, enabling MPAM, using
> those cpus but not taking into account the parameters of the associated MSC?

In the CPU core isolation scenario, the CPU affinity information of
each MSC must be reported. In fact, the ACPI MPAM table already defines
a mechanism for reporting the affinity information of each MSC
instance.

The "Hardware ID of linked device" and "Instance ID of linked device"
fields specify, respectively, the container and the container ID to
which the MSC belongs, from which the MSC's affinity information is
obtained.

The kernel is responsible for parsing this information and determining
which MSCs should be initialized based on the currently online CPUs.

> 
>>>
>>> The patch allows the MPAM driver to successfully complete the
>>> initialization of online MSCs even when the system is booted with
>>> certain cores isolated or disabled. The patch ensures that MPAM
>>> initialization is decoupled from the requirement that all CPUs must be
>>> online during the probing phase.
>>>
>>> CPU core isolation is indeed a common production scenario. This
>>> functionality requires the kernel to enable functionalities in the
>>> presence of faulty cores (which cannot be recovered through cold boot).
>>> This ensures system reliability and availability on multi-core
>>> processors where single-core faults.
>>>
>>> Without this patch would prevent MPAM from initialization under CPU core
>>> isolation scenarios. Apologies for not mentioning in the patch: we can
>>> verify the functionality by adding 'maxcpus=1' to the boot parameters.
> 
> For 'maxcpus=1' I think the correct behaviour is to not enable MPAM as
> the other CPUs can then be turned on afterwards. E.g by
> echo 1 > /sys/devices/system/cpu/cpuX/online
> 
> For faulty cores how would you ensure they are never turned on?
> 

'maxcpus=1' is merely an extreme simulated scenario. In production
environments, detected faulty cores have already been disabled by the
BIOS firmware and cannot be brought online again.


Thanks,
Zeng Heng
Re: [PATCH] arm64/mpam: Support partial-core boot for MPAM
Posted by Zeng Heng 4 days, 19 hours ago

On 2026/2/2 20:46, Zeng Heng wrote:
> Hi Ben,
> 
> On 2026/2/2 19:34, Ben Horgan wrote:
>> Hi Zeng,
>>
>> On 2/2/26 09:16, Zeng Heng wrote:
>>>
>>>
>>> On 2026/2/2 16:41, Zeng Heng wrote:
>>>>
>>>>
>>>> On 2026/1/29 18:11, Ben Horgan wrote:
>>>>> Hi Zeng,
>>>>>
>>>>> I think I've just managed to whitelist your email address. So, all 
>>>>> being
>>>>> well I'll get your emails in my inbox.
>>>>>
>>>>> On 1/7/26 03:13, Zeng Heng wrote:
>>>>>> Some MPAM MSCs (like L2 MSC) shares the same power domain with its
>>>>>> associated CPUs. Therefore, in scenarios where only partial cores 
>>>>>> power
>>>>>> up, the MSCs belonging to the un-powered cores don't need and should
>>>>>> not
>>>>>> be accessed, otherwise bus-access fault would occur.
>>>>>
>>>>> The MPAM driver intentionally to waits until all MSCs have been
>>>>> discovered before allowing MPAM to be used so that it can check the
>>>>> properties of all the MSC and determine the configuration based on 
>>>>> full
>>>>> knowledge. Once a CPU affine with each MSC has been enabled then MPAM
>>>>> will be enabled and usable.
>>>>>
>>>>> Suppose we weren't to access all MSCs in an asymmetric configuration.
>>>>> E.g. if different L2 had different lengths of cache portion bit 
>>>>> maps and
>>>>> MPAM was enabled with only the CPUs with the same L2 then the driver
>>>>> wouldn't know and we'd end up with a bad configuration which would
>>>>> become a problem when the other CPUs are eventually turned on.
>>>>>
>>>>> Hence, I think we should retain the restriction that MPAM is only
>>>>> enabled once all MSC are probed. Is this a particularly onerous
>>>>> resctriction for you?
>>>>>
>>>>
>>>> I have no objection to the restriction that "MPAM is only enabled once
>>>> all MSC are probed." This constraint ensures the driver has complete
>>>> knowledge of all Memory System Components before establishing the
>>>> configuration.
>>>>
>>>>
>>>> However, this patch is specifically designed to address CPU core
>>>> isolation scenarios (Such as adding the 'isolcpus=xx' kernel command
>>>> line parameter).
>>
>> In the isolation scenario are you for some cpus, enabling MPAM, using
>> those cpus but not taking into account the parameters of the 
>> associated MSC?
> 
> In the CPU core isolation scenario, the CPU affinity information of MSC
> must be reported. In fact, ACPI MPAM table has already designed a
> mechanism for reporting the affinity information of each MSC instance.
> 
> Through the "Hardware ID of linked device" and "Instance ID of linked
> device" fields, the container and container ID to which the MSC belongs
> are specified respectively, thereby obtaining the MSC affinity
> information.
> 
> The kernel is responsible for parsing this information and determines
> which MSCs should be initialized based on the currently online CPUs.
> 
>>
>>>>
>>>> The patch allows the MPAM driver to successfully complete the
>>>> initialization of online MSCs even when the system is booted with
>>>> certain cores isolated or disabled. The patch ensures that MPAM
>>>> initialization is decoupled from the requirement that all CPUs must be
>>>> online during the probing phase.
>>>>
>>>> CPU core isolation is indeed a common production scenario. This
>>>> functionality requires the kernel to enable functionalities in the
>>>> presence of faulty cores (which cannot be recovered through cold boot).
>>>> This ensures system reliability and availability on multi-core
>>>> processors where single-core faults.
>>>>
>>>> Without this patch would prevent MPAM from initialization under CPU 
>>>> core
>>>> isolation scenarios. Apologies for not mentioning in the patch: we can
>>>> verify the functionality by adding 'maxcpus=1' to the boot parameters.
>>
>> For 'maxcpus=1' I think the correct behaviour is to not enable MPAM as
>> the other CPUs can then be turned on afterwards. E.g by
>> echo 1 > /sys/devices/system/cpu/cpuX/online
>>
>> For faulty cores how would you ensure they are never turned on?
>>
> 
> The maxcpus=1 is merely an extreme simulation scenario. In production
> environments, detected faulty cores have already been disabled by the
> BIOS firmware and cannot be brought online again.
> 
> 

Even if the faulty cores or offline CPUs are turned on later, the patch
does not affect the automatic recovery and bring-up of the MPAM MSCs.

With 'maxcpus=1' added to the boot parameters, testing with the patch
applied gives the following:

  # mount -t resctrl l resctrl /sys/fs/resctrl/
  # cat /sys/fs/resctrl/schemata
  L2:4=ff
  L3:1=1ffff

  # echo 1 > /sys/devices/system/cpu/cpu2/online
  # cat /sys/fs/resctrl/schemata
  L2:4=ff;7=ff
  L3:1=1ffff

  # echo 1 > /sys/devices/system/cpu/cpu16/online
  # cat /sys/fs/resctrl/schemata
  L2:4=ff;7=ff;29=ff
  L3:1=1ffff;26=1ffff

  # echo 0 > /sys/devices/system/cpu/cpu16/online
  # cat /sys/fs/resctrl/schemata
  L2:4=ff;7=ff
  L3:1=1ffff

  # echo 0 > /sys/devices/system/cpu/cpu2/online
  # cat /sys/fs/resctrl/schemata
  L2:4=ff
  L3:1=1ffff



Best Regards,
Zeng Heng