[PATCH v2 2/3] selftests/resctrl: Fix a division by zero error on Hygon

Xiaochen Shen posted 3 patches 2 weeks ago
There is a newer version of this series
Posted by Xiaochen Shen 2 weeks ago
Commit

  a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")

introduced the snc_nodes_per_l3_cache() function to detect the Intel
Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs
sharing LLC with CPU0. The function was designed to return:
  (1) >1: SNC mode is enabled.
  (2)  1: SNC mode is not enabled or not supported.

However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually
less than #CPUs in node0. This results in snc_nodes_per_l3_cache()
returning 0 (calculated as cache_cpus / node_cpus).

This leads to a division by zero error in get_cache_size():
  *cache_size /= snc_nodes_per_l3_cache();

As a result, the resctrl selftest fails with:
  "Floating point exception (core dumped)"

Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC
mode is not supported on the platform.

Fixes: a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 tools/testing/selftests/resctrl/resctrlfs.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
index 195f04c4d158..2b075e7334bf 100644
--- a/tools/testing/selftests/resctrl/resctrlfs.c
+++ b/tools/testing/selftests/resctrl/resctrlfs.c
@@ -243,6 +243,16 @@ int snc_nodes_per_l3_cache(void)
 		}
 		snc_mode = cache_cpus / node_cpus;
 
+		/*
+		 * On certain Hygon platforms:
+		 * cache_cpus < node_cpus, the calculated snc_mode is 0.
+		 *
+		 * Set snc_mode = 1 to indicate that SNC mode is not
+		 * supported on the platform.
+		 */
+		if (!snc_mode)
+			snc_mode = 1;
+
 		if (snc_mode > 1)
 			ksft_print_msg("SNC-%d mode discovered.\n", snc_mode);
 	}
-- 
2.47.3
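
The failure mode described in the commit message comes down to C integer
division truncating toward zero. The sketch below isolates that arithmetic
so it can be checked without a Hygon machine. It is a minimal illustration,
not the selftest code: snc_mode_from_counts() and effective_cache_size() are
hypothetical helpers that take the CPU counts as plain integers instead of
reading them from sysfs, and the CPU counts used in the comments are made up.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of resctrlfs.c: mirrors the arithmetic
 * of snc_nodes_per_l3_cache() on plain integer inputs.
 */
static int snc_mode_from_counts(int cache_cpus, int node_cpus)
{
	int snc_mode;

	/* Assumption of this sketch: the caller passes a non-empty node. */
	assert(node_cpus > 0);

	/* Integer division truncates toward zero, e.g. 8 / 16 == 0. */
	snc_mode = cache_cpus / node_cpus;

	/* Clamp to 1 so the result is always a safe divisor. */
	if (!snc_mode)
		snc_mode = 1;

	return snc_mode;
}

/*
 * Hypothetical stand-in for the division in get_cache_size():
 *   *cache_size /= snc_nodes_per_l3_cache();
 */
static unsigned long effective_cache_size(unsigned long cache_size,
					  int cache_cpus, int node_cpus)
{
	return cache_size / snc_mode_from_counts(cache_cpus, node_cpus);
}
```

With 8 CPUs sharing the LLC and 16 CPUs in the node, the unclamped quotient
would be 0 and the later division would raise SIGFPE. With the clamp, the
helper returns 1 and the cache size is left unchanged.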
Re: [PATCH v2 2/3] selftests/resctrl: Fix a division by zero error on Hygon
Posted by Fenghua Yu 1 week, 6 days ago
Hi, Xiaochen,

On 12/5/25 01:25, Xiaochen Shen wrote:
> Commit
> 
>    a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
> 
> introduced the snc_nodes_per_l3_cache() function to detect the Intel
> Sub-NUMA Clustering (SNC) feature by comparing #CPUs in node0 with #CPUs
> sharing LLC with CPU0. The function was designed to return:
>    (1) >1: SNC mode is enabled.
>    (2)  1: SNC mode is not enabled or not supported.
> 
> However, on certain Hygon CPUs, #CPUs sharing LLC with CPU0 is actually
> less than #CPUs in node0. This results in snc_nodes_per_l3_cache()
> returning 0 (calculated as cache_cpus / node_cpus).
> 
> This leads to a division by zero error in get_cache_size():
>    *cache_size /= snc_nodes_per_l3_cache();
> 
> As a result, the resctrl selftest fails with:
>    "Floating point exception (core dumped)"
> 
> Fix the issue by ensuring snc_nodes_per_l3_cache() returns 1 when SNC
> mode is not supported on the platform.
> 
> Fixes: a1cd99e700ec ("selftests/resctrl: Adjust effective L3 cache size with SNC enabled")
> Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net>
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
>   tools/testing/selftests/resctrl/resctrlfs.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/tools/testing/selftests/resctrl/resctrlfs.c b/tools/testing/selftests/resctrl/resctrlfs.c
> index 195f04c4d158..2b075e7334bf 100644
> --- a/tools/testing/selftests/resctrl/resctrlfs.c
> +++ b/tools/testing/selftests/resctrl/resctrlfs.c
> @@ -243,6 +243,16 @@ int snc_nodes_per_l3_cache(void)
>   		}
>   		snc_mode = cache_cpus / node_cpus;
>   
> +		/*
> +		 * On certain Hygon platforms:

Nit: this situation could happen on platforms other than Hygon. Maybe
it's better to have a more generic comment here?
		 * On some platforms (e.g. Hygon),

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

> +		 * cache_cpus < node_cpus, the calculated snc_mode is 0.
> +		 *
> +		 * Set snc_mode = 1 to indicate that SNC mode is not
> +		 * supported on the platform.
> +		 */
> +		if (!snc_mode)
> +			snc_mode = 1;
> +
>   		if (snc_mode > 1)
>   			ksft_print_msg("SNC-%d mode discovered.\n", snc_mode);
>   	}
Thanks.
-Fenghua
Re: [PATCH v2 2/3] selftests/resctrl: Fix a division by zero error on Hygon
Posted by Xiaochen Shen 1 week, 4 days ago
Hi Fenghua,

On 12/6/2025 2:53 AM, Fenghua Yu wrote:
>> @@ -243,6 +243,16 @@ int snc_nodes_per_l3_cache(void)
>>           }
>>           snc_mode = cache_cpus / node_cpus;
>>   +        /*
>> +         * On certain Hygon platforms:
> 
> Nit: this situation could happen on platforms other than Hygon. Maybe it's better to have a more generic comment here?
>          * On some platforms (e.g. Hygon),
> 

I will update the comment as you suggested. Thank you!


> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thank you!

> 
>> +         * cache_cpus < node_cpus, the calculated snc_mode is 0.
>> +         *
>> +         * Set snc_mode = 1 to indicate that SNC mode is not
>> +         * supported on the platform.
>> +         */
>> +        if (!snc_mode)
>> +            snc_mode = 1;
>> +
>>           if (snc_mode > 1)
>>               ksft_print_msg("SNC-%d mode discovered.\n", snc_mode);
>>       }
> Thanks.
> -Fenghua


Best regards,
Xiaochen Shen