Improve the efficiency of calculating the total number of scheduling
domains by using the updated uf_union function, which now returns a
boolean to indicate if a merge occurred. Previously, an additional loop
was needed to count root nodes for distinct groups. With this change,
each successful merge reduces the domain count (ndoms) directly,
eliminating the need for the final loop and enhancing performance.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
Note: Tested with test_cpuset_prs.sh
Side note: I know this optimization provides limited efficiency
improvements in this case, but since the union-find code is in the
library and other users may need group counting in the future, and
the required code change is minimal, I think it's still worthwhile.
kernel/cgroup/cpuset.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a4dd285cdf39..5e9301550d43 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -817,6 +817,8 @@ static int generate_sched_domains(cpumask_var_t **domains,
if (root_load_balance && (csn == 1))
goto single_root_domain;
+ ndoms = csn;
+
for (i = 0; i < csn; i++)
uf_node_init(&csa[i]->node);
@@ -829,17 +831,11 @@ static int generate_sched_domains(cpumask_var_t **domains,
* partition root cpusets.
*/
WARN_ON_ONCE(cgrpv2);
- uf_union(&csa[i]->node, &csa[j]->node);
+ ndoms -= uf_union(&csa[i]->node, &csa[j]->node);
}
}
}
- /* Count the total number of domains */
- for (i = 0; i < csn; i++) {
- if (uf_find(&csa[i]->node) == &csa[i]->node)
- ndoms++;
- }
-
/*
* Now we know how many domains to create.
* Convert <csn, csa> to <ndoms, doms> and populate cpu masks.
--
2.34.1
On 10/7/24 11:28 AM, Kuan-Wei Chiu wrote: > Improve the efficiency of calculating the total number of scheduling > domains by using the updated uf_union function, which now returns a > boolean to indicate if a merge occurred. Previously, an additional loop > was needed to count root nodes for distinct groups. With this change, > each successful merge reduces the domain count (ndoms) directly, > eliminating the need for the final loop and enhancing performance. > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > --- > Note: Tested with test_cpuset_prs.sh > > Side note: I know this optimization provides limited efficiency > improvements in this case, but since the union-find code is in the > library and other users may need group counting in the future, and > the required code change is minimal, I think it's still worthwhile. > > kernel/cgroup/cpuset.c | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c > index a4dd285cdf39..5e9301550d43 100644 > --- a/kernel/cgroup/cpuset.c > +++ b/kernel/cgroup/cpuset.c > @@ -817,6 +817,8 @@ static int generate_sched_domains(cpumask_var_t **domains, > if (root_load_balance && (csn == 1)) > goto single_root_domain; > > + ndoms = csn; > + > for (i = 0; i < csn; i++) > uf_node_init(&csa[i]->node); > > @@ -829,17 +831,11 @@ static int generate_sched_domains(cpumask_var_t **domains, > * partition root cpusets. > */ > WARN_ON_ONCE(cgrpv2); > - uf_union(&csa[i]->node, &csa[j]->node); > + ndoms -= uf_union(&csa[i]->node, &csa[j]->node); You are taking the implicit assumption that a boolean true is casted to int 1. That is the usual practice, but it is not part of the C standard itself though it is for C++. I will be more comfortable with the "if (cond) ndoms++" form. It will also be more clear. Cheers, Longman
On Tue, Oct 08, 2024 at 10:02:23AM -0400, Waiman Long wrote: > On 10/7/24 11:28 AM, Kuan-Wei Chiu wrote: > > Improve the efficiency of calculating the total number of scheduling > > domains by using the updated uf_union function, which now returns a > > boolean to indicate if a merge occurred. Previously, an additional loop > > was needed to count root nodes for distinct groups. With this change, > > each successful merge reduces the domain count (ndoms) directly, > > eliminating the need for the final loop and enhancing performance. > > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> > > --- > > Note: Tested with test_cpuset_prs.sh > > > > Side note: I know this optimization provides limited efficiency > > improvements in this case, but since the union-find code is in the > > library and other users may need group counting in the future, and > > the required code change is minimal, I think it's still worthwhile. > > > > kernel/cgroup/cpuset.c | 10 +++------- > > 1 file changed, 3 insertions(+), 7 deletions(-) > > > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c > > index a4dd285cdf39..5e9301550d43 100644 > > --- a/kernel/cgroup/cpuset.c > > +++ b/kernel/cgroup/cpuset.c > > @@ -817,6 +817,8 @@ static int generate_sched_domains(cpumask_var_t **domains, > > if (root_load_balance && (csn == 1)) > > goto single_root_domain; > > + ndoms = csn; > > + > > for (i = 0; i < csn; i++) > > uf_node_init(&csa[i]->node); > > @@ -829,17 +831,11 @@ static int generate_sched_domains(cpumask_var_t **domains, > > * partition root cpusets. > > */ > > WARN_ON_ONCE(cgrpv2); > > - uf_union(&csa[i]->node, &csa[j]->node); > > + ndoms -= uf_union(&csa[i]->node, &csa[j]->node); > > You are taking the implicit assumption that a boolean true is casted to int > 1. That is the usual practice, but it is not part of the C standard itself > though it is for C++. I will be more comfortable with the "if (cond) > ndoms++" form. It will also be more clear. > Thanks for your feedback. I appreciate your point regarding the implicit casting of boolean values. And changing the code to: if (uf_union(&csa[i]->node, &csa[j]->node)) --ndoms; would also enhance clarity and readability. Would you like me to resend v3? I'm asking because I suspect Tejun may want to see more user cases before considering such optimizations. Regards, Kuan-Wei
On 10/8/24 12:45 PM, Kuan-Wei Chiu wrote: > On Tue, Oct 08, 2024 at 10:02:23AM -0400, Waiman Long wrote: >> On 10/7/24 11:28 AM, Kuan-Wei Chiu wrote: >>> Improve the efficiency of calculating the total number of scheduling >>> domains by using the updated uf_union function, which now returns a >>> boolean to indicate if a merge occurred. Previously, an additional loop >>> was needed to count root nodes for distinct groups. With this change, >>> each successful merge reduces the domain count (ndoms) directly, >>> eliminating the need for the final loop and enhancing performance. >>> >>> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> >>> --- >>> Note: Tested with test_cpuset_prs.sh >>> >>> Side note: I know this optimization provides limited efficiency >>> improvements in this case, but since the union-find code is in the >>> library and other users may need group counting in the future, and >>> the required code change is minimal, I think it's still worthwhile. >>> >>> kernel/cgroup/cpuset.c | 10 +++------- >>> 1 file changed, 3 insertions(+), 7 deletions(-) >>> >>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c >>> index a4dd285cdf39..5e9301550d43 100644 >>> --- a/kernel/cgroup/cpuset.c >>> +++ b/kernel/cgroup/cpuset.c >>> @@ -817,6 +817,8 @@ static int generate_sched_domains(cpumask_var_t **domains, >>> if (root_load_balance && (csn == 1)) >>> goto single_root_domain; >>> + ndoms = csn; >>> + >>> for (i = 0; i < csn; i++) >>> uf_node_init(&csa[i]->node); >>> @@ -829,17 +831,11 @@ static int generate_sched_domains(cpumask_var_t **domains, >>> * partition root cpusets. >>> */ >>> WARN_ON_ONCE(cgrpv2); >>> - uf_union(&csa[i]->node, &csa[j]->node); >>> + ndoms -= uf_union(&csa[i]->node, &csa[j]->node); >> You are taking the implicit assumption that a boolean true is casted to int >> 1. That is the usual practice, but it is not part of the C standard itself >> though it is for C++. I will be more comfortable with the "if (cond) >> ndoms++" form. It will also be more clear. >> > Thanks for your feedback. I appreciate your point regarding the implicit > casting of boolean values. And changing the code to: > > if (uf_union(&csa[i]->node, &csa[j]->node)) > --ndoms; > > would also enhance clarity and readability. > > Would you like me to resend v3? I'm asking because I suspect Tejun may > want to see more user cases before considering such optimizations. I agreed with Tejun that union-find performance is not that important for the cpuset use case which is also the only use case at the moment. I will support a v3 if you can find another use case where performance is more important. Cheers, Longman
© 2016 - 2024 Red Hat, Inc.