[v3] DT: Enable sharing resources for SMT threads

[PATCH v3 7/7] of: of_cpu_phandle_to_id to support SMT threads

Posted by Alireza Sanaee 9 months ago

Enhance the API to support SMT threads, this will allow sharing
resources among multiple SMT threads.

Enabled the sharing of resources, such as L1 Cache and clocks, between
SMT threads. It introduces a fix that uses thread IDs to match each CPU
thread in the register array within the cpu-node. This ensures that the
cpu-map or any driver relying on this API is fine even when SMT threads
share resources.

Additionally, I have tested this for CPU based on the discussions in
[1], I adopted the new cpu-map layout, where the first parameter is a
phandle and the second is the local thread index, as shown below:

    core0 {
      thread0 {
        cpu = <&cpu0 0>;
      };
      thread1 {
        cpu = <&cpu0 1>;
      };
    };

Also, there are devices such as below that are a bit different.

    arm_dsu@0 {
      compatible = "arm,dsu";
      cpus = <&cpu0 &cpu1 &cpu2 &cpu3>;
    }

In these cases, we can also point to a CPU thread as well like the
following:

    arm_dsu@0 {
      compatible = "arm,dsu";
        cpus = <&cpu0 5 &cpu1 9 &cpu2 1 &cpu3 0>;
    }

It should be possible to know how many arguments a phandle might
require, and this information is encoded in another variable in the dt
called #cpu-cells in cpu node.

Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>

[1] https://lore.kernel.org/devicetree-spec/CAL_JsqK1yqRLD9B+G7UUp=D8K++mXHq0Rmv=1i6DL_jXyZwXAw@mail.gmail.com/
---
 drivers/of/cpu.c | 41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/drivers/of/cpu.c b/drivers/of/cpu.c
index fba17994fc20..cf54ef47f029 100644
--- a/drivers/of/cpu.c
+++ b/drivers/of/cpu.c
@@ -189,16 +189,41 @@ int of_cpu_phandle_to_id(const struct device_node *node,
 			 struct device_node **cpu_np,
 			 uint8_t cpu_idx)
 {
+	bool found = false;
+	int cpu, ret = -1, i, j;
+	uint32_t local_thread, thread_index;
+	struct device_node *np;
+	struct of_phandle_args args;
+	static const char * const phandle_names[] = { "cpus", "cpu" };
+	static const char * const cpu_cells[] = { "#cpu-cells", NULL };
+
 	if (!node)
-		return -1;
+		return ret;
 
-	*cpu_np = of_parse_phandle(node, "cpu", 0);
-	if (!*cpu_np)
-		*cpu_np = of_parse_phandle(node, "cpus", cpu_idx);
-			if (!*cpu_np)
-				return -ENODEV;
+	for (i = 0; i < ARRAY_SIZE(phandle_names); i++) {
+		for (j = 0; j < ARRAY_SIZE(cpu_cells); j++) {
+			ret = of_parse_phandle_with_args(node, phandle_names[i],
+							 cpu_cells[j], cpu_idx,
+							 &args);
+				if (ret >= 0)
+					goto success;
+		}
+	}
 
-	return of_cpu_node_to_id(*cpu_np);
+	if (ret < 0)
+		return ret;
+success:
+	*cpu_np = args.np;
+	thread_index = args.args_count == 1 ? args.args[0] : 0;
+	for_each_possible_cpu(cpu) {
+		np = of_get_cpu_node(cpu, &local_thread);
+		found = (*cpu_np == np) && (local_thread == thread_index);
+		of_node_put(np);
+		if (found)
+			return cpu;
+	}
+
+	return -ENODEV;
 }
 EXPORT_SYMBOL(of_cpu_phandle_to_id);
 
@@ -206,7 +231,7 @@ EXPORT_SYMBOL(of_cpu_phandle_to_id);
  * of_get_cpu_state_node - Get CPU's idle state node at the given index
  *
  * @cpu_node: The device node for the CPU
- * @index: The index in the list of the idle states
+g* @index: The index in the list of the idle states
  *
  * Two generic methods can be used to describe a CPU's idle states, either via
  * a flattened description through the "cpu-idle-states" binding or via the
-- 
2.34.1

Re: [PATCH v3 7/7] of: of_cpu_phandle_to_id to support SMT threads

Posted by Rob Herring 8 months, 1 week ago

On Mon, May 12, 2025 at 09:07:15AM +0100, Alireza Sanaee wrote:
> Enhance the API to support SMT threads, this will allow sharing
> resources among multiple SMT threads.
> 
> Enabled the sharing of resources, such as L1 Cache and clocks, between
> SMT threads. It introduces a fix that uses thread IDs to match each CPU
> thread in the register array within the cpu-node. This ensures that the
> cpu-map or any driver relying on this API is fine even when SMT threads
> share resources.
> 
> Additionally, I have tested this for CPU based on the discussions in
> [1], I adopted the new cpu-map layout, where the first parameter is a
> phandle and the second is the local thread index, as shown below:
> 
>     core0 {
>       thread0 {
>         cpu = <&cpu0 0>;
>       };
>       thread1 {
>         cpu = <&cpu0 1>;
>       };

I think the thread nodes should be omitted in this case.

>     };
> 
> Also, there are devices such as below that are a bit different.
> 
>     arm_dsu@0 {
>       compatible = "arm,dsu";
>       cpus = <&cpu0 &cpu1 &cpu2 &cpu3>;
>     }
> 
> In these cases, we can also point to a CPU thread as well like the
> following:
> 
>     arm_dsu@0 {
>       compatible = "arm,dsu";
>         cpus = <&cpu0 5 &cpu1 9 &cpu2 1 &cpu3 0>;

The purpose of 'cpus' properties is to define CPU affinity. I don't 
think the affinity could ever be different for threads in a core.

And cpu1 having 10 threads is nonsense.

Most cases of 'cpus' (and 'affinity') lookups and then callers of 
of_cpu_node_to_id() ultimately just want to set a cpumask. So we should 
provide that rather than opencoding the same loop everywhere.

>     }
> 
> It should be possible to know how many arguments a phandle might
> require, and this information is encoded in another variable in the dt
> called #cpu-cells in cpu node.
> 
> Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
> 
> [1] https://lore.kernel.org/devicetree-spec/CAL_JsqK1yqRLD9B+G7UUp=D8K++mXHq0Rmv=1i6DL_jXyZwXAw@mail.gmail.com/
> ---
>  drivers/of/cpu.c | 41 +++++++++++++++++++++++++++++++++--------
>  1 file changed, 33 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/of/cpu.c b/drivers/of/cpu.c
> index fba17994fc20..cf54ef47f029 100644
> --- a/drivers/of/cpu.c
> +++ b/drivers/of/cpu.c
> @@ -189,16 +189,41 @@ int of_cpu_phandle_to_id(const struct device_node *node,
>  			 struct device_node **cpu_np,
>  			 uint8_t cpu_idx)
>  {
> +	bool found = false;
> +	int cpu, ret = -1, i, j;
> +	uint32_t local_thread, thread_index;
> +	struct device_node *np;
> +	struct of_phandle_args args;
> +	static const char * const phandle_names[] = { "cpus", "cpu" };
> +	static const char * const cpu_cells[] = { "#cpu-cells", NULL };
> +
>  	if (!node)
> -		return -1;
> +		return ret;
>  
> -	*cpu_np = of_parse_phandle(node, "cpu", 0);
> -	if (!*cpu_np)
> -		*cpu_np = of_parse_phandle(node, "cpus", cpu_idx);
> -			if (!*cpu_np)
> -				return -ENODEV;
> +	for (i = 0; i < ARRAY_SIZE(phandle_names); i++) {
> +		for (j = 0; j < ARRAY_SIZE(cpu_cells); j++) {
> +			ret = of_parse_phandle_with_args(node, phandle_names[i],
> +							 cpu_cells[j], cpu_idx,
> +							 &args);
> +				if (ret >= 0)
> +					goto success;
> +		}
> +	}
>  
> -	return of_cpu_node_to_id(*cpu_np);
> +	if (ret < 0)
> +		return ret;
> +success:
> +	*cpu_np = args.np;
> +	thread_index = args.args_count == 1 ? args.args[0] : 0;
> +	for_each_possible_cpu(cpu) {
> +		np = of_get_cpu_node(cpu, &local_thread);
> +		found = (*cpu_np == np) && (local_thread == thread_index);
> +		of_node_put(np);
> +		if (found)
> +			return cpu;
> +	}
> +
> +	return -ENODEV;
>  }
>  EXPORT_SYMBOL(of_cpu_phandle_to_id);
>  
> @@ -206,7 +231,7 @@ EXPORT_SYMBOL(of_cpu_phandle_to_id);
>   * of_get_cpu_state_node - Get CPU's idle state node at the given index
>   *
>   * @cpu_node: The device node for the CPU
> - * @index: The index in the list of the idle states
> +g* @index: The index in the list of the idle states

Oops!

>   *
>   * Two generic methods can be used to describe a CPU's idle states, either via
>   * a flattened description through the "cpu-idle-states" binding or via the
> -- 
> 2.34.1
>