kernel/sched/topology.c | 73 +++++++++++++++++++++++++++-------------- 1 file changed, 48 insertions(+), 25 deletions(-)
toplogy_span_sane() has an O(N^2) algorithm that takes an inordinate
amount of time on systems with a large number of cpus.
The first patch in this series replaces the algorithm used with a O(N)
method that should exactly duplicate the previous code's results.
The second patch simplifies the first, taking a similar amount of time
to run, but potentially has different results than previous code under
situations believed to not truly exist, like a CPU not being included
in its own span.
Version 1:
* Original patch
Version 2:
* Adopted simplifications from K Prateek Nayak,and fixed use of
num_possible_cpus().
Version 3:
* Undid the simplifications from version 2 when noticed that results
could differ from original code; kept num_possible_cpus() fix.
Version 4:
* Turned the patch into a series of 2, the second re-introduces the
simplifications, and includes further simplification suggested by
Valentin Schneider in the discussion for Version 2.
Steve Wahl (2):
sched/topology: improve topology_span_sane speed
sched/topology: Refinement to topology_span_sane speedup
kernel/sched/topology.c | 73 +++++++++++++++++++++++++++--------------
1 file changed, 48 insertions(+), 25 deletions(-)
--
2.26.2
Hi Steve, On 04/03/25 21:38, Steve Wahl wrote: > toplogy_span_sane() has an O(N^2) algorithm that takes an inordinate > amount of time on systems with a large number of cpus. > > The first patch in this series replaces the algorithm used with a O(N) > method that should exactly duplicate the previous code's results. > > The second patch simplifies the first, taking a similar amount of time > to run, but potentially has different results than previous code under > situations believed to not truly exist, like a CPU not being included > in its own span. I have reviewed the proposed approach for the topology sanity check and it looks good to me. I have also tested the patch on a Power10 system with 12 cores (96 CPUs). The average CPU hotplug latency decreased by around 10%. Therefore, Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Tested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com> Thanks, Madadi Vineeth Reddy > > Version 1: > * Original patch > > Version 2: > > * Adopted simplifications from K Prateek Nayak,and fixed use of > num_possible_cpus(). > > Version 3: > > * Undid the simplifications from version 2 when noticed that results > could differ from original code; kept num_possible_cpus() fix. > > Version 4: > > * Turned the patch into a series of 2, the second re-introduces the > simplifications, and includes further simplification suggested by > Valentin Schneider in the discussion for Version 2. > > Steve Wahl (2): > sched/topology: improve topology_span_sane speed > sched/topology: Refinement to topology_span_sane speedup > > kernel/sched/topology.c | 73 +++++++++++++++++++++++++++-------------- > 1 file changed, 48 insertions(+), 25 deletions(-) >
On 04/03/25 10:08, Steve Wahl wrote:
> toplogy_span_sane() has an O(N^2) algorithm that takes an inordinate
> amount of time on systems with a large number of cpus.
>
> The first patch in this series replaces the algorithm used with a O(N)
> method that should exactly duplicate the previous code's results.
>
> The second patch simplifies the first, taking a similar amount of time
> to run, but potentially has different results than previous code under
> situations believed to not truly exist, like a CPU not being included
> in its own span.
>
Had to hack up arch_topology.c some more to replicate the setup described
in
ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap")
but eventually go there, and it was correctly caught by topology_span_sane().
Thanks!
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Tested-by: Valentin Schneider <vschneid@redhat.com>
Hello Steve, On 3/4/2025 9:38 PM, Steve Wahl wrote: > toplogy_span_sane() has an O(N^2) algorithm that takes an inordinate > amount of time on systems with a large number of cpus. > > The first patch in this series replaces the algorithm used with a O(N) > method that should exactly duplicate the previous code's results. > > The second patch simplifies the first, taking a similar amount of time > to run, but potentially has different results than previous code under > situations believed to not truly exist, like a CPU not being included > in its own span. I've tested Patch 1 individually and the whole series as is on top of tip:sched/core and I haven't run into any issues with the optimization on my 3rd Generation EPYC system. Please feel free to include: Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> -- Thanks and Regards, Prateek > > Version 1: > * Original patch > > Version 2: > > * Adopted simplifications from K Prateek Nayak,and fixed use of > num_possible_cpus(). > > Version 3: > > * Undid the simplifications from version 2 when noticed that results > could differ from original code; kept num_possible_cpus() fix. > > Version 4: > > * Turned the patch into a series of 2, the second re-introduces the > simplifications, and includes further simplification suggested by > Valentin Schneider in the discussion for Version 2. > > Steve Wahl (2): > sched/topology: improve topology_span_sane speed > sched/topology: Refinement to topology_span_sane speedup > > kernel/sched/topology.c | 73 +++++++++++++++++++++++++++-------------- > 1 file changed, 48 insertions(+), 25 deletions(-) >
© 2016 - 2026 Red Hat, Inc.