The function may call cpumask_equal() with tl->mask(cpu) == tl->mask(i),
even when cpu != i. In such cases, cpumask_equal() would always return
true, and we can proceed to the next iteration immediately.

Valentin Schneider comments on it:

    PKG can potentially hit that condition, and so can any
    sched_domain_mask_f that relies on the node masks...

    I'm thinking ideally we should have checks in place to
    ensure all node_to_cpumask_map[] masks are disjoint,
    then we could entirely skip the levels that use these
    masks in topology_span_sane(), but there's unfortunately
    no nice way to flag them... Also there would be cases
    where there's no real difference between PKG and NODE
    other than NODE is still based on a per-cpu cpumask and
    PKG isn't, so I don't see a nicer way to go about this.

v1: https://lore.kernel.org/lkml/ZrJk00cmVaUIAr4G@yury-ThinkPad/T/
v2: https://lkml.org/lkml/2024/8/7/1299
v3:
 - add topology_cpumask_equal() helper in #3;
 - re-use 'cpu' as an iterator in the for_each_cpu() loop;
 - add proper versioning for all patches.

Yury Norov (3):
  sched/topology: pre-compute topology_span_sane() loop params
  sched/topology: optimize topology_span_sane()
  sched/topology: reorganize topology_span_sane() checking order

 kernel/sched/topology.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

-- 
2.43.0
On Mon, Sep 02, 2024 at 11:36:04AM -0700, Yury Norov wrote:
> The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
> even when cpu != i. In such case, cpumask_equal() would always return
> true, and we can proceed to the next iteration immediately.
> 
> Valentin Schneider shares on it:
> 
>     PKG can potentially hit that condition, and so can any
>     sched_domain_mask_f that relies on the node masks...
> 
>     I'm thinking ideally we should have checks in place to
>     ensure all node_to_cpumask_map[] masks are disjoint,
>     then we could entirely skip the levels that use these
>     masks in topology_span_sane(), but there's unfortunately
>     no nice way to flag them... Also there would be cases
>     where there's no real difference between PKG and NODE
>     other than NODE is still based on a per-cpu cpumask and
>     PKG isn't, so I don't see a nicer way to go about this.
> 
> v1: https://lore.kernel.org/lkml/ZrJk00cmVaUIAr4G@yury-ThinkPad/T/
> v2: https://lkml.org/lkml/2024/8/7/1299
> v3:
>  - add topology_cpumask_equal() helper in #3;
>  - re-use 'cpu' as an iterator int the for_each_cpu() loop;
>  - add proper versioning for all patches.
> 
> Yury Norov (3):
>   sched/topology: pre-compute topology_span_sane() loop params
>   sched/topology: optimize topology_span_sane()
>   sched/topology: reorganize topology_span_sane() checking order

Why are we doing this? Subject says optimize, but I see no performance
numbers?
On Wed, Nov 06, 2024 at 07:06:13PM +0100, Peter Zijlstra wrote:
> On Mon, Sep 02, 2024 at 11:36:04AM -0700, Yury Norov wrote:
> > The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
> > even when cpu != i. In such case, cpumask_equal() would always return
> > true, and we can proceed to the next iteration immediately.
> > 
> > [...]
> 
> Why are we doing this? Subject says optimize, but I see no performance
> numbers?

Well, whenever cpumask_equal() is passed mask1 == mask2, the new approach
is O(1) where the old one is O(N), and that is exactly the PKG case. This
should have a measurable impact on many-CPU setups (much more than
BITS_PER_LONG CPUs). I have no such machine, so I can't give you perf
numbers.

I can drop the 'optimization' wording if you find it confusing without
numbers.

Thanks,
Yury
Ping?

On Mon, Sep 02, 2024 at 11:36:04AM -0700, Yury Norov wrote:
> The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
> even when cpu != i. In such case, cpumask_equal() would always return
> true, and we can proceed to the next iteration immediately.
> 
> [...]
Ping again?

On Sat, Sep 14, 2024 at 09:54:43AM -0700, Yury Norov wrote:
> Ping?
> 
> On Mon, Sep 02, 2024 at 11:36:04AM -0700, Yury Norov wrote:
> > The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
> > even when cpu != i. In such case, cpumask_equal() would always return
> > true, and we can proceed to the next iteration immediately.
> > 
> > [...]
Last reminder. If you guys don't care, I don't care either.

On Mon, Sep 30, 2024 at 12:15:02PM -0700, Yury Norov wrote:
> Ping again?
> 
> On Sat, Sep 14, 2024 at 09:54:43AM -0700, Yury Norov wrote:
> > Ping?
> > 
> > On Mon, Sep 02, 2024 at 11:36:04AM -0700, Yury Norov wrote:
> > > The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
> > > even when cpu != i. In such case, cpumask_equal() would always return
> > > true, and we can proceed to the next iteration immediately.
> > > 
> > > [...]
On 06/11/2024 at 17:28, Yury Norov wrote:
> Last reminder. If you guys don't care, I don't care either.

Hi Yury,

I'm the only one in the To: field, but I'm not a maintainer.

Maybe, try to move people in the Cc: field into the To: field, to
increase visibility?

CJ

> On Mon, Sep 30, 2024 at 12:15:02PM -0700, Yury Norov wrote:
>> Ping again?
>> 
>> [...]
On Wed, Nov 06, 2024 at 06:58:23PM +0100, Christophe JAILLET wrote:
> On 06/11/2024 at 17:28, Yury Norov wrote:
> > Last reminder. If you guys don't care, I don't care either.
> 
> Hi Yury,
> 
> I'm the only one in the To: field, but I'm not a maintainer.
> 
> Maybe, try to move people in the Cc: field into the To: field, to
> increase visibility?

Sure, moving everyone into the To.