From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Anna-Maria Behnsen, Sehee Jeong, Thomas Gleixner
Subject: [PATCH 1/6] timers/migration: Fix another hotplug activation race
Date: Thu, 23 Apr 2026 18:53:49 +0200
Message-ID: <20260423165354.95152-2-frederic@kernel.org>
In-Reply-To: <20260423165354.95152-1-frederic@kernel.org>

The hotplug control CPU is assumed to be active in the hierarchy, but
that doesn't imply that the root is active. If the current CPU is not the
one that activated the current hierarchy, and the CPU performing this
duty is still halfway through the tree, the root may still be observed
inactive. This can break the activation of a new root, as in the
following scenario:

1) Initially, the whole system has 64 CPUs and only CPU 63 is awake.

                  [GRP1:0]
                   active
                   /  |  \
                /     |     \
        [GRP0:0]    [...]    [GRP0:7]
          idle      idle      active
         /  |  \                |
    CPU 0  CPU 1  ...         CPU 63
    idle   idle               active

2) CPU 63 goes idle _but_, due to a #VMEXIT, it hasn't yet reached the
   [GRP1:0]->parent dereference (which would be NULL and stop the walk)
   in __walk_groups_from().

                  [GRP1:0]
                    idle
                   /  |  \
                /     |     \
        [GRP0:0]    [...]    [GRP0:7]
          idle      idle       idle
         /  |  \                |
    CPU 0  CPU 1  ...         CPU 63
    idle   idle                idle

3) CPU 1 wakes up and activates GRP0:0, but hasn't yet managed to
   propagate up to GRP1:0 due to yet another #VMEXIT.

                  [GRP1:0]
                    idle
                   /  |  \
                /     |     \
        [GRP0:0]    [...]    [GRP0:7]
         active     idle       idle
         /  |  \                |
    CPU 0  CPU 1  ...         CPU 63
    idle   active              idle

4) CPU 0 wakes up and doesn't need to walk above GRP0:0, as that is
   CPU 1's role.

                  [GRP1:0]
                    idle
                   /  |  \
                /     |     \
        [GRP0:0]    [...]    [GRP0:7]
         active     idle       idle
         /  |  \                |
    CPU 0  CPU 1  ...         CPU 63
    active active              idle

5) CPU 0 boots CPU 64 and creates a new root for it.

                             [GRP2:0]
                               idle
                             /      \
                          /            \
                  [GRP1:0]               [GRP1:1]
                    idle                   idle
                   /  |  \                   |
                /     |     \                |
        [GRP0:0]    [...]    [GRP0:7]    [GRP0:8]
         active     idle       idle        idle
         /  |  \                |            |
    CPU 0  CPU 1  ...         CPU 63      CPU 64
    active active              idle       offline

6) CPU 0 activates the new root, but note that GRP1:0 is still idle,
   waiting for CPU 1 to resume from #VMEXIT and activate it.

                             [GRP2:0]
                              active
                             /      \
                          /            \
                  [GRP1:0]               [GRP1:1]
                    idle                   idle
                   /  |  \                   |
                /     |     \                |
        [GRP0:0]    [...]    [GRP0:7]    [GRP0:8]
         active     idle       idle        idle
         /  |  \                |            |
    CPU 0  CPU 1  ...         CPU 63      CPU 64
    active active              idle       offline

7) CPU 63 resumes after #VMEXIT and sees the new parent of GRP1:0. It
   therefore propagates the stale inactive state of GRP1:0 up to GRP2:0.

                             [GRP2:0]
                               idle
                             /      \
                          /            \
                  [GRP1:0]               [GRP1:1]
                    idle                   idle
                   /  |  \                   |
                /     |     \                |
        [GRP0:0]    [...]    [GRP0:7]    [GRP0:8]
         active     idle       idle        idle
         /  |  \                |            |
    CPU 0  CPU 1  ...         CPU 63      CPU 64
    active active              idle       offline

8) CPU 1 resumes after #VMEXIT and finally activates GRP1:0, but it
   doesn't observe the new parent link of GRP1:0 because nothing enforced
   that ordering. GRP2:0 is therefore spuriously left idle.

                             [GRP2:0]
                               idle
                             /      \
                          /            \
                  [GRP1:0]               [GRP1:1]
                   active                  idle
                   /  |  \                   |
                /     |     \                |
        [GRP0:0]    [...]    [GRP0:7]    [GRP0:8]
         active     idle       idle        idle
         /  |  \                |            |
    CPU 0  CPU 1  ...         CPU 63      CPU 64
    active active              idle       offline

Such races are highly theoretical, and the problem would resolve itself
once the old root becomes idle again. Still, it leaves some discomfort.

Fix it by performing a fully ordered atomic read of the old root state
before propagating the active state up to the new root.
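The fix can be modelled outside the kernel with C11 atomics. What follows is a minimal userspace sketch under stated assumptions, not the kernel implementation. `struct group`, `ACTIVE_BIT`, `hotplug_connect()` and `lagging_activate()` are hypothetical names. The kernel's fully ordered atomic_fetch_or() is approximated by a `memory_order_seq_cst` read-modify-write, and the cmpxchg() loop in tmigr_active_up() is simplified to a plain fetch-or.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: bit 0 of a group's state means "active". */
#define ACTIVE_BIT 0x1u

struct group {
	_Atomic unsigned int state;   /* stands in for ->migr_state */
	struct group *_Atomic parent; /* link to the new root, published by hotplug */
};

/*
 * Fully ordered read: OR-ing 0 leaves the value unchanged, but as a seq_cst
 * read-modify-write it reads the latest state (acquire) and orders the
 * stores issued before it (release).
 */
static unsigned int read_state_fully_ordered(struct group *g)
{
	return atomic_fetch_or_explicit(&g->state, 0u, memory_order_seq_cst);
}

/* Hotplug side: publish the link, then propagate only if the old root is active. */
static bool hotplug_connect(struct group *old_root, struct group *new_root)
{
	atomic_store_explicit(&old_root->parent, new_root, memory_order_relaxed);
	if (read_state_fully_ordered(old_root) & ACTIVE_BIT) {
		atomic_fetch_or_explicit(&new_root->state, ACTIVE_BIT,
					 memory_order_seq_cst);
		return true;
	}
	/* Not active yet: the CPU still activating old_root will see ->parent. */
	return false;
}

/* Lagging side: activate the group, then look for a parent to propagate to. */
static struct group *lagging_activate(struct group *g)
{
	atomic_fetch_or_explicit(&g->state, ACTIVE_BIT, memory_order_seq_cst);
	return atomic_load_explicit(&g->parent, memory_order_relaxed);
}
```

Both sides perform a read-modify-write on the same `state` word, so one of them comes later in that word's modification order. If the hotplug side comes later, it reads the active bit and propagates the activation itself. If the lagging side comes later, its RMW synchronizes with the hotplug RMW and therefore sees the published `parent` link. Either way, the new root is not left spuriously idle.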
The fully ordered read has a two-way ordering effect:

* Acquire + release of the latest old root state: if the hotplug control
  CPU is not the one that woke up the old root, make sure to acquire its
  active state and propagate it upwards through the ordered chain of
  activation (the acquire pairs with the cmpxchg() in tmigr_active_up(),
  and subsequent releases will pair with atomic_read_acquire() and
  smp_mb__after_atomic() in tmigr_inactive_up()).

* Release: if the hotplug control CPU is not the one that must wake up
  the old root, and the CPU responsible for that is lagging behind,
  publish the links from the old root to the new parents. The lagging
  CPU will then propagate the active state itself.

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker
---
 kernel/time/timer_migration.c | 40 +++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 155eeaea4113..1d0d3a4058d5 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -1860,19 +1860,37 @@ static int tmigr_setup_groups(unsigned int cpu, unsigned int node,
 		 * child to the new parents. So tmigr_active_up() activates the
 		 * new parents while walking up from the old root to the new.
 		 *
-		 * * It is ensured that @start is active, as this setup path is
-		 *   executed in hotplug prepare callback. This is executed by an
-		 *   already connected and !idle CPU. Even if all other CPUs go idle,
-		 *   the CPU executing the setup will be responsible up to current top
-		 *   level group. And the next time it goes inactive, it will release
-		 *   the new childmask and parent to subsequent walkers through this
-		 *   @child. Therefore propagate active state unconditionally.
+		 * * It is ensured that @start is active, (or on the way to be activated
+		 *   by another CPU that woke up before the current one) as this setup path
+		 *   is executed in hotplug prepare callback. This is executed by an already
+		 *   connected and !idle CPU in the hierarchy.
+		 *
+		 * * The below RmW atomic operation ensures that:
+		 *
+		 *   1) If the old root has been completely activated, the latest state is
+		 *      acquired (the below implicit acquire pairs with the implicit release
+		 *      from cmpxchg() in tmigr_active_up()).
+		 *
+		 *   2) If the old root is still on the way to be activated, the lagging behind
+		 *      CPU performing the activation will acquire the links up to the new root.
+		 *      (The below implicit release pairs with the implicit acquire from cmpxchg()
+		 *      in tmigr_active_up()).
+		 *
+		 *   3) Every subsequent CPU below the old root will acquire the new links while
+		 *      walking through the old root (The below implicit release pairs with the
+		 *      implicit acquire from cmpxchg() in either tmigr_active_up()) or
+		 *      tmigr_inactive_up().
 		 */
-		state.state = atomic_read(&start->migr_state);
-		WARN_ON_ONCE(!state.active);
+		state.state = atomic_fetch_or(0, &start->migr_state);
 		WARN_ON_ONCE(!start->parent);
-		data.childmask = start->groupmask;
-		__walk_groups_from(tmigr_active_up, &data, start, start->parent);
+		/*
+		 * If the state of the old root is inactive, another CPU is on its way to activate
+		 * it and propagate to the new root.
+		 */
+		if (state.active) {
+			data.childmask = start->groupmask;
+			__walk_groups_from(tmigr_active_up, &data, start, start->parent);
+		}
 	}
 
 	/* Root update */
-- 
2.53.0
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Anna-Maria Behnsen, Sehee Jeong, Thomas Gleixner
Subject: [PATCH 2/6] timers/migration: Abstract out hierarchy to prepare for CPU capacity awareness
Date: Thu, 23 Apr 2026 18:53:50 +0200
Message-ID: <20260423165354.95152-3-frederic@kernel.org>
In-Reply-To: <20260423165354.95152-1-frederic@kernel.org>

In preparation for separating CPUs of different capacities into
distinct hierarchies, create a hierarchy structure that the group setup
relies upon.

Signed-off-by: Frederic Weisbecker
---
 kernel/time/timer_migration.c | 100 ++++++++++++++++++++++-------------
 kernel/time/timer_migration.h |  10 ++++
 2 files changed, 72 insertions(+), 38 deletions(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 1d0d3a4058d5..52e97b880b1c 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -102,7 +102,7 @@
  * active CPU/group information atomic_try_cmpxchg() is used instead and only
  * the per CPU tmigr_cpu->lock is held.
  *
- * During the setup of groups tmigr_level_list is required. It is protected by
+ * During the setup of groups, hier->level_list is required. It is protected by
  * @tmigr_mutex.
  *
  * When @timer_base->lock as well as tmigr related locks are required, the lock
@@ -416,13 +416,12 @@
  */
 
 static DEFINE_MUTEX(tmigr_mutex);
-static struct list_head *tmigr_level_list __read_mostly;
+
+static struct tmigr_hierarchy *hierarchy;
 
 static unsigned int tmigr_hierarchy_levels __read_mostly;
 static unsigned int tmigr_crossnode_level __read_mostly;
 
-static struct tmigr_group *tmigr_root;
-
 static DEFINE_PER_CPU(struct tmigr_cpu, tmigr_cpu);
 
 /*
@@ -1653,14 +1652,15 @@ static void tmigr_init_group(struct tmigr_group *group, unsigned int lvl,
 	group->groupevt.ignore = true;
 }
 
-static struct tmigr_group *tmigr_get_group(int node, unsigned int lvl)
+static struct tmigr_group *tmigr_get_group(struct tmigr_hierarchy *hier,
+					   int node, unsigned int lvl)
 {
 	struct tmigr_group *tmp, *group = NULL;
 
 	lockdep_assert_held(&tmigr_mutex);
 
 	/* Try to attach to an existing group first */
-	list_for_each_entry(tmp, &tmigr_level_list[lvl], list) {
+	list_for_each_entry(tmp, &hier->level_list[lvl], list) {
 		/*
 		 * If @lvl is below the cross NUMA node level, check whether
 		 * this group belongs to the same NUMA node.
@@ -1694,14 +1694,15 @@ static struct tmigr_group *tmigr_get_group(int node, unsigned int lvl)
 	tmigr_init_group(group, lvl, node);
 
 	/* Setup successful. Add it to the hierarchy */
-	list_add(&group->list, &tmigr_level_list[lvl]);
+	list_add(&group->list, &hier->level_list[lvl]);
 	trace_tmigr_group_set(group);
 	return group;
 }
 
-static bool tmigr_init_root(struct tmigr_group *group, bool activate)
+static bool tmigr_init_root(struct tmigr_hierarchy *hier,
+			    struct tmigr_group *group, bool activate)
 {
-	if (!group->parent && group != tmigr_root) {
+	if (!group->parent && group != hier->root) {
 		/*
 		 * This is the new top-level, prepare its groupmask in advance
 		 * to avoid accidents where yet another new top-level is
@@ -1717,11 +1718,12 @@ static bool tmigr_init_root(struct tmigr_group *group, bool activate)
 
 }
 
-static void tmigr_connect_child_parent(struct tmigr_group *child,
+static void tmigr_connect_child_parent(struct tmigr_hierarchy *hier,
+				       struct tmigr_group *child,
 				       struct tmigr_group *parent,
 				       bool activate)
 {
-	if (tmigr_init_root(parent, activate)) {
+	if (tmigr_init_root(hier, parent, activate)) {
 		/*
 		 * The previous top level had prepared its groupmask already,
 		 * simply account it in advance as the first child. If some groups
If some groups @@ -1757,10 +1759,10 @@ static void tmigr_connect_child_parent(struct tmigr= _group *child, trace_tmigr_connect_child_parent(child); } =20 -static int tmigr_setup_groups(unsigned int cpu, unsigned int node, - struct tmigr_group *start, bool activate) +static int tmigr_setup_groups(struct tmigr_hierarchy *hier, unsigned int c= pu, + unsigned int node, struct tmigr_group *start, bool activate) { - struct tmigr_group *group, *child, **stack; + struct tmigr_group *root =3D hier->root, *group, *child, **stack; int i, top =3D 0, err =3D 0, start_lvl =3D 0; bool root_mismatch =3D false; =20 @@ -1773,11 +1775,11 @@ static int tmigr_setup_groups(unsigned int cpu, uns= igned int node, start_lvl =3D start->level + 1; } =20 - if (tmigr_root) - root_mismatch =3D tmigr_root->numa_node !=3D node; + if (root) + root_mismatch =3D root->numa_node !=3D node; =20 for (i =3D start_lvl; i < tmigr_hierarchy_levels; i++) { - group =3D tmigr_get_group(node, i); + group =3D tmigr_get_group(hier, node, i); if (IS_ERR(group)) { err =3D PTR_ERR(group); i--; @@ -1799,7 +1801,7 @@ static int tmigr_setup_groups(unsigned int cpu, unsig= ned int node, if (group->parent) break; if ((!root_mismatch || i >=3D tmigr_crossnode_level) && - list_is_singular(&tmigr_level_list[i])) + list_is_singular(&hier->level_list[i])) break; } =20 @@ -1827,7 +1829,7 @@ static int tmigr_setup_groups(unsigned int cpu, unsig= ned int node, tmc->tmgroup =3D group; tmc->groupmask =3D BIT(group->num_children++); =20 - tmigr_init_root(group, activate); + tmigr_init_root(hier, group, activate); =20 trace_tmigr_connect_cpu_parent(tmc); =20 @@ -1835,7 +1837,7 @@ static int tmigr_setup_groups(unsigned int cpu, unsig= ned int node, continue; } else { child =3D stack[i - 1]; - tmigr_connect_child_parent(child, group, activate); + tmigr_connect_child_parent(hier, child, group, activate); } } =20 @@ -1894,15 +1896,15 @@ static int tmigr_setup_groups(unsigned int cpu, uns= igned int node, } =20 /* Root update */ - if 
(list_is_singular(&tmigr_level_list[top])) { - group =3D list_first_entry(&tmigr_level_list[top], + if (list_is_singular(&hier->level_list[top])) { + group =3D list_first_entry(&hier->level_list[top], typeof(*group), list); WARN_ON_ONCE(group->parent); - if (tmigr_root) { + if (root) { /* Old root should be the same or below */ - WARN_ON_ONCE(tmigr_root->level > top); + WARN_ON_ONCE(root->level > top); } - tmigr_root =3D group; + hier->root =3D group; } out: kfree(stack); @@ -1910,18 +1912,48 @@ static int tmigr_setup_groups(unsigned int cpu, uns= igned int node, return err; } =20 +static struct tmigr_hierarchy *tmigr_get_hierarchy(void) +{ + if (hierarchy) + return hierarchy; + + hierarchy =3D kzalloc(sizeof(*hierarchy), GFP_KERNEL); + if (!hierarchy) + return ERR_PTR(-ENOMEM); + + hierarchy->level_list =3D kzalloc_objs(struct list_head, + tmigr_hierarchy_levels); + if (!hierarchy->level_list) { + kfree(hierarchy); + hierarchy =3D NULL; + return ERR_PTR(-ENOMEM); + } + + for (int i =3D 0; i < tmigr_hierarchy_levels; i++) + INIT_LIST_HEAD(&hierarchy->level_list[i]); + + return hierarchy; +} + static int tmigr_add_cpu(unsigned int cpu) { - struct tmigr_group *old_root =3D tmigr_root; + struct tmigr_hierarchy *hier; + struct tmigr_group *old_root; int node =3D cpu_to_node(cpu); int ret; =20 guard(mutex)(&tmigr_mutex); =20 - ret =3D tmigr_setup_groups(cpu, node, NULL, false); + hier =3D tmigr_get_hierarchy(); + if (IS_ERR(hier)) + return PTR_ERR(hier); + + old_root =3D hier->root; + + ret =3D tmigr_setup_groups(hier, cpu, node, NULL, false); =20 /* Root has changed? Connect the old one to the new */ - if (ret >=3D 0 && old_root && old_root !=3D tmigr_root) { + if (ret >=3D 0 && old_root && old_root !=3D hier->root) { /* * The target CPU must never do the prepare work, except * on early boot when the boot CPU is the target. Otherwise @@ -1935,7 +1967,7 @@ static int tmigr_add_cpu(unsigned int cpu) * otherwise the old root may not be active as expected. 
 		 */
 		WARN_ON_ONCE(!per_cpu_ptr(&tmigr_cpu, raw_smp_processor_id())->available);
-		ret = tmigr_setup_groups(-1, old_root->numa_node, old_root, true);
+		ret = tmigr_setup_groups(hier, -1, old_root->numa_node, old_root, true);
 	}
 
 	return ret;
@@ -1970,7 +2002,7 @@ static int tmigr_cpu_prepare(unsigned int cpu)
 
 static int __init tmigr_init(void)
 {
-	unsigned int cpulvl, nodelvl, cpus_per_node, i;
+	unsigned int cpulvl, nodelvl, cpus_per_node;
 	unsigned int nnodes = num_possible_nodes();
 	unsigned int ncpus = num_possible_cpus();
 	int ret = -ENOMEM;
@@ -2017,14 +2049,6 @@ static int __init tmigr_init(void)
 	 */
 	tmigr_crossnode_level = cpulvl;
 
-	tmigr_level_list = kzalloc_objs(struct list_head,
-					tmigr_hierarchy_levels);
-	if (!tmigr_level_list)
-		goto err;
-
-	for (i = 0; i < tmigr_hierarchy_levels; i++)
-		INIT_LIST_HEAD(&tmigr_level_list[i]);
-
 	pr_info("Timer migration: %d hierarchy levels; %d children per group;"
 		" %d crossnode level\n",
 		tmigr_hierarchy_levels, TMIGR_CHILDREN_PER_GROUP,
diff --git a/kernel/time/timer_migration.h b/kernel/time/timer_migration.h
index 70879cde6fdd..77df422e5f9a 100644
--- a/kernel/time/timer_migration.h
+++ b/kernel/time/timer_migration.h
@@ -5,6 +5,16 @@
 /* Per group capacity. Must be a power of 2! */
 #define TMIGR_CHILDREN_PER_GROUP 8
 
+/**
+ * struct tmigr_hierarchy - a hierarchy associated to a given CPU capacity.
+ * @level_list: Per level lists of tmigr groups
+ * @root: The current root of the hierarchy
+ */
+struct tmigr_hierarchy {
+	struct list_head *level_list;
+	struct tmigr_group *root;
+};
+
 /**
  * struct tmigr_event - a timer event associated to a CPU
  * @nextevt: The node to enqueue an event in the parent group queue
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Anna-Maria Behnsen, Sehee Jeong, Thomas Gleixner
Subject: [PATCH 3/6] timers/migration: Track CPUs in a hierarchy
Date: Thu, 23 Apr 2026 18:53:51 +0200
Message-ID: <20260423165354.95152-4-frederic@kernel.org>
In-Reply-To: <20260423165354.95152-1-frederic@kernel.org>

When a new root is created, the old root is connected to it and
propagates upward its own state, which is assumed to be active because
the hotplug control CPU is itself active and part of the old root.

However, with per-capacity hierarchies this assumption no longer holds,
because the hotplug control CPU calling the timer migration prepare
callback may not belong to the same hierarchy as the booting CPU.

To solve this, track the available CPUs per hierarchy so that the root
connection can be offloaded to safe CPUs.
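The bookkeeping rule here is that a CPU is recorded in its hierarchy only after its group setup succeeded. The sketch below illustrates that rule in userspace C, under stated assumptions. A 64-bit word stands in for struct cpumask, which assumes at most 64 CPUs. `struct hierarchy`, `hierarchy_record_cpu()` and `hierarchy_has_cpu()` are hypothetical names, and the setup result is passed in rather than computed by a real tmigr_setup_groups().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: at most 64 CPUs, one bit per CPU. */
struct hierarchy {
	uint64_t cpumask;	/* CPUs that successfully joined this hierarchy */
};

/*
 * Record @cpu once its group setup returned @setup_ret. A failed setup
 * (negative return) must not make the CPU look like a safe target later.
 */
static int hierarchy_record_cpu(struct hierarchy *h, unsigned int cpu, int setup_ret)
{
	if (setup_ret >= 0)
		h->cpumask |= UINT64_C(1) << cpu;
	return setup_ret;
}

/* Membership test, analogous in spirit to cpumask_test_cpu(). */
static bool hierarchy_has_cpu(const struct hierarchy *h, unsigned int cpu)
{
	return (h->cpumask >> cpu) & 1u;
}
```

Recording only on success keeps the mask usable as a list of CPUs to which the old-root connection can be delegated.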
Signed-off-by: Frederic Weisbecker
---
 kernel/time/timer_migration.c | 25 +++++++++++++++++++------
 kernel/time/timer_migration.h |  2 ++
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 52e97b880b1c..d0de9f64528e 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -1921,18 +1921,25 @@ static struct tmigr_hierarchy *tmigr_get_hierarchy(void)
 	if (!hierarchy)
 		return ERR_PTR(-ENOMEM);
 
+	hierarchy->cpumask = kzalloc(cpumask_size(), GFP_KERNEL);
+	if (!hierarchy->cpumask)
+		goto err;
+
 	hierarchy->level_list = kzalloc_objs(struct list_head,
 					     tmigr_hierarchy_levels);
-	if (!hierarchy->level_list) {
-		kfree(hierarchy);
-		hierarchy = NULL;
-		return ERR_PTR(-ENOMEM);
-	}
+	if (!hierarchy->level_list)
+		goto err;
 
 	for (int i = 0; i < tmigr_hierarchy_levels; i++)
 		INIT_LIST_HEAD(&hierarchy->level_list[i]);
 
 	return hierarchy;
+err:
+	kfree(hierarchy->cpumask);
+	kfree(hierarchy);
+	hierarchy = NULL;
+
+	return ERR_PTR(-ENOMEM);
 }
 
 static int tmigr_add_cpu(unsigned int cpu)
@@ -1952,8 +1959,11 @@ static int tmigr_add_cpu(unsigned int cpu)
 
 	ret = tmigr_setup_groups(hier, cpu, node, NULL, false);
 
+	if (ret < 0)
+		return ret;
+
 	/* Root has changed? Connect the old one to the new */
-	if (ret >= 0 && old_root && old_root != hier->root) {
+	if (old_root && old_root != hier->root) {
 		/*
 		 * The target CPU must never do the prepare work, except
 		 * on early boot when the boot CPU is the target. Otherwise
@@ -1970,6 +1980,9 @@ static int tmigr_add_cpu(unsigned int cpu)
 		ret = tmigr_setup_groups(hier, -1, old_root->numa_node, old_root, true);
 	}
 
+	if (ret >= 0)
+		cpumask_set_cpu(cpu, hier->cpumask);
+
 	return ret;
 }
 
diff --git a/kernel/time/timer_migration.h b/kernel/time/timer_migration.h
index 77df422e5f9a..0cfbb8d799a6 100644
--- a/kernel/time/timer_migration.h
+++ b/kernel/time/timer_migration.h
@@ -8,10 +8,12 @@
 /**
  * struct tmigr_hierarchy - a hierarchy associated to a given CPU capacity.
  * @level_list: Per level lists of tmigr groups
+ * @cpumask: CPUs belonging to this hierarchy
  * @root: The current root of the hierarchy
  */
 struct tmigr_hierarchy {
 	struct list_head *level_list;
+	struct cpumask *cpumask;
 	struct tmigr_group *root;
 };
 
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Anna-Maria Behnsen, Sehee Jeong, Thomas Gleixner
Subject: [PATCH 4/6] timers/migration: Split per-capacity hierarchies
Date: Thu, 23 Apr 2026 18:53:52 +0200
Message-ID: <20260423165354.95152-5-frederic@kernel.org>
In-Reply-To: <20260423165354.95152-1-frederic@kernel.org>

Systems with heterogeneous CPU capacities, such as big.LITTLE, have
reported power issues since the introduction of the new timer migration
code. Timers migrate from small-capacity CPUs to big ones, degrading
their target residency and thus overall power consumption.

Solve this by splitting hierarchies per CPU capacity. For example, on a
big.LITTLE machine, split the single hierarchy in two: one for
big-capacity CPUs and another for small-capacity CPUs.
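The per-capacity split can be modelled roughly in userspace C. The model looks up, or lazily creates, one hierarchy per capacity value, then decides how to connect an old root based on which CPUs belong to the booting CPU's hierarchy. This is a sketch under assumptions, not the kernel code:

* Capacities are plain integers; the patch derives them from arch_scale_cpu_capacity().
* A 64-bit word stands in for struct cpumask.
* A singly linked list replaces the kernel list_head.
* Every identifier below is hypothetical.

The three outcomes of `choose_connect()` are meant to mirror the three branches the patch adds to tmigr_add_cpu().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: at most 64 CPUs, one bit per CPU in a uint64_t. */
#define NR_CPUS_MODEL 64u

struct hierarchy {
	unsigned int capacity;   /* CPU capacity served by this hierarchy */
	uint64_t cpumask;        /* CPUs that joined this hierarchy */
	struct hierarchy *next;  /* singly linked list of hierarchies */
};

static struct hierarchy *hierarchy_list;

/* Return the hierarchy for @capacity, allocating and appending it on first use. */
static struct hierarchy *get_hierarchy(unsigned int capacity)
{
	struct hierarchy **pp;

	for (pp = &hierarchy_list; *pp; pp = &(*pp)->next) {
		if ((*pp)->capacity == capacity)
			return *pp;
	}

	struct hierarchy *h = calloc(1, sizeof(*h));
	if (!h)
		return NULL;
	h->capacity = capacity;
	*pp = h;	/* append at the tail */
	return h;
}

enum connect_mode {
	CONNECT_LOCAL_ACTIVATE,	/* this CPU is in @h: link and propagate here */
	CONNECT_REMOTE_WORK,	/* run the connection on *target instead */
	CONNECT_LINK_ONLY,	/* no available CPU in @h: link, don't activate */
};

static enum connect_mode choose_connect(const struct hierarchy *h,
					unsigned int this_cpu,
					uint64_t available,
					unsigned int *target)
{
	if ((h->cpumask >> this_cpu) & 1u)
		return CONNECT_LOCAL_ACTIVATE;

	uint64_t candidates = h->cpumask & available;
	for (unsigned int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if ((candidates >> cpu) & 1u) {
			*target = cpu;	/* lowest such CPU, like cpumask_first_and() */
			return CONNECT_REMOTE_WORK;
		}
	}
	return CONNECT_LINK_ONLY;
}
```

Consider a hypothetical 8-CPU system with little CPUs 0-3 and big CPUs 4-7. Booting a big CPU from CPU 1 delegates the connection to the first available big CPU. The model falls back to linking without activation only when no big CPU is available.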
With split hierarchies, global timers only migrate across CPUs of the
same capacity.

For simplicity, split hierarchies keep the same number of possible
levels as a single hierarchy would, even though the CPUs are distributed
across multiple hierarchies. This could be a problem on NUMA systems
with heterogeneous CPU capacities (if such systems exist at all), where
useless intermediate nodes may be created. Solving this properly would
require knowing at boot how many capacities are available and how many
CPUs belong to each of them.

Reported-by: Sehee Jeong
Suggested-by: Thomas Gleixner
Signed-off-by: Frederic Weisbecker
---
 kernel/time/timer_migration.c | 125 ++++++++++++++++++++++++++---------
 kernel/time/timer_migration.h |   7 ++
 2 files changed, 101 insertions(+), 31 deletions(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index d0de9f64528e..0a8c893353a2 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -417,7 +417,7 @@
 
 static DEFINE_MUTEX(tmigr_mutex);
 
-static struct tmigr_hierarchy *hierarchy;
+static LIST_HEAD(tmigr_hierarchy_list);
 
 static unsigned int tmigr_hierarchy_levels __read_mostly;
 static unsigned int tmigr_crossnode_level __read_mostly;
@@ -1893,6 +1893,12 @@ static int tmigr_setup_groups(struct tmigr_hierarchy *hier, unsigned int cpu,
 			data.childmask = start->groupmask;
 			__walk_groups_from(tmigr_active_up, &data, start, start->parent);
 		}
+	} else if (start) {
+		union tmigr_state state;
+
+		/* Remote activation assumes the whole target's hierarchy is inactive */
+		state.state = atomic_read(&start->migr_state);
+		WARN_ON_ONCE(state.active);
 	}
 
 	/* Root update */
@@ -1912,36 +1918,80 @@ static int tmigr_setup_groups(struct tmigr_hierarchy *hier, unsigned int cpu,
 	return err;
 }
 
-static struct tmigr_hierarchy *tmigr_get_hierarchy(void)
+static struct tmigr_hierarchy *tmigr_get_hierarchy(unsigned int capacity)
 {
-	if (hierarchy)
-		return hierarchy;
+	struct tmigr_hierarchy *hier = NULL, *iter;
 
-	hierarchy = kzalloc(sizeof(*hierarchy), GFP_KERNEL);
-	if (!hierarchy)
+	list_for_each_entry(iter, &tmigr_hierarchy_list, node) {
+		if (iter->capacity == capacity)
+			hier = iter;
+	}
+
+	if (hier)
+		return hier;
+
+	hier = kzalloc(sizeof(*hier), GFP_KERNEL);
+	if (!hier)
 		return ERR_PTR(-ENOMEM);
 
-	hierarchy->cpumask = kzalloc(cpumask_size(), GFP_KERNEL);
-	if (!hierarchy->cpumask)
+	hier->cpumask = kzalloc(cpumask_size(), GFP_KERNEL);
+	if (!hier->cpumask)
 		goto err;
 
-	hierarchy->level_list = kzalloc_objs(struct list_head,
-					     tmigr_hierarchy_levels);
-	if (!hierarchy->level_list)
+	hier->level_list = kzalloc_objs(struct list_head,
+					tmigr_hierarchy_levels);
+	if (!hier->level_list)
 		goto err;
 
 	for (int i = 0; i < tmigr_hierarchy_levels; i++)
-		INIT_LIST_HEAD(&hierarchy->level_list[i]);
+		INIT_LIST_HEAD(&hier->level_list[i]);
 
-	return hierarchy;
+	hier->capacity = capacity;
+	list_add_tail(&hier->node, &tmigr_hierarchy_list);
+
+	return hier;
 err:
-	kfree(hierarchy->cpumask);
-	kfree(hierarchy);
-	hierarchy = NULL;
+	kfree(hier->cpumask);
+	kfree(hier);
 
 	return ERR_PTR(-ENOMEM);
 }
 
+static int tmigr_connect_old_root(struct tmigr_hierarchy *hier, int cpu,
+				  struct tmigr_group *old_root, bool activate)
+{
+	/*
+	 * The target CPU must never do the prepare work, except
+	 * on early boot when the boot CPU is the target. Otherwise
+	 * it may spuriously activate the old top level group inside
+	 * the new one (nevertheless whether old top level group is
+	 * active or not) and/or release an uninitialized childmask.
+	 */
+	WARN_ON_ONCE(cpu == smp_processor_id());
+	if (activate) {
+		/*
+		 * The current CPU is expected to be online in the hierarchy,
+		 * otherwise the old root may not be active as expected.
+ */ + WARN_ON_ONCE(!__this_cpu_read(tmigr_cpu.available)); + } + + return tmigr_setup_groups(hier, -1, old_root->numa_node, old_root, activa= te); +} + +static long connect_old_root_work(void *arg) +{ + struct tmigr_group *old_root =3D arg; + struct tmigr_hierarchy *hier; + int cpu =3D smp_processor_id(); + + hier =3D tmigr_get_hierarchy(arch_scale_cpu_capacity(cpu)); + if (IS_ERR(hier)) + return PTR_ERR(hier); + + return tmigr_connect_old_root(hier, cpu, old_root, true); +} + static int tmigr_add_cpu(unsigned int cpu) { struct tmigr_hierarchy *hier; @@ -1951,7 +2001,7 @@ static int tmigr_add_cpu(unsigned int cpu) =20 guard(mutex)(&tmigr_mutex); =20 - hier =3D tmigr_get_hierarchy(); + hier =3D tmigr_get_hierarchy(arch_scale_cpu_capacity(cpu)); if (IS_ERR(hier)) return PTR_ERR(hier); =20 @@ -1964,20 +2014,33 @@ static int tmigr_add_cpu(unsigned int cpu) =20 /* Root has changed? Connect the old one to the new */ if (old_root && old_root !=3D hier->root) { - /* - * The target CPU must never do the prepare work, except - * on early boot when the boot CPU is the target. Otherwise - * it may spuriously activate the old top level group inside - * the new one (nevertheless whether old top level group is - * active or not) and/or release an uninitialized childmask. - */ - WARN_ON_ONCE(cpu =3D=3D raw_smp_processor_id()); - /* - * The (likely) current CPU is expected to be online in the hierarchy, - * otherwise the old root may not be active as expected. - */ - WARN_ON_ONCE(!per_cpu_ptr(&tmigr_cpu, raw_smp_processor_id())->available= ); - ret =3D tmigr_setup_groups(hier, -1, old_root->numa_node, old_root, true= ); + guard(migrate)(); + + if (cpumask_test_cpu(smp_processor_id(), hier->cpumask)) { + /* + * If the target belong to the same hierarchy, the old root is expected + * to be active. Link and propagate to the new root. 
+ */ + ret =3D tmigr_connect_old_root(hier, cpu, old_root, true); + } else { + int target =3D cpumask_first_and(hier->cpumask, tmigr_available_cpumask= ); + + if (target < nr_cpu_ids) { + /* + * If the target doesn't belong to the same hierarchy as the current + * CPU, activate from a relevant one to make sure the old root is + * active. + */ + ret =3D work_on_cpu(target, connect_old_root_work, old_root); + } else { + /* + * No other available CPUs in the remote hierarchy. Link the + * old root remotely but don't propagate activation since the + * old root is not expected to be active. + */ + ret =3D tmigr_connect_old_root(hier, cpu, old_root, false); + } + } } =20 if (ret >=3D 0) diff --git a/kernel/time/timer_migration.h b/kernel/time/timer_migration.h index 0cfbb8d799a6..291bfb6adfc3 100644 --- a/kernel/time/timer_migration.h +++ b/kernel/time/timer_migration.h @@ -7,14 +7,21 @@ =20 /** * struct tmigr_hierarchy - a hierarchy associated to a given CPU capacity. + * Homogeneous systems have only one hierarchy. + * Heterogenous have one hierarchy per CPU capaci= ty. 
  * @level_list: Per level lists of tmigr groups
  * @cpumask: CPUs belonging to this hierarchy
  * @root: The current root of the hierarchy
+ * @capacity: CPU capacity associated to this hierarchy
+ * @node: Node in the global hierarchy list
  */
 struct tmigr_hierarchy {
 	struct list_head *level_list;
 	struct cpumask *cpumask;
 	struct tmigr_group *root;
+	unsigned long capacity;
+	struct list_head node;
+
 };
 
 /**
-- 
2.53.0

From nobody Wed Jun 17 07:14:19 2026
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Anna-Maria Behnsen, Sehee Jeong, Thomas Gleixner
Subject: [PATCH 5/6] timers/migration: Handle capacity in connect tracepoints
Date: Thu, 23 Apr 2026 18:53:53 +0200
Message-ID: <20260423165354.95152-6-frederic@kernel.org>
In-Reply-To: <20260423165354.95152-1-frederic@kernel.org>
References: <20260423165354.95152-1-frederic@kernel.org>

This lets tracers know which hierarchy a CPU belongs to.
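As an illustration of what the extra field gives a trace consumer, here is a minimal Python sketch (not part of this patch) that groups CPUs by hierarchy capacity. It assumes each event line contains the text produced by the new tmigr_connect_cpu_parent TP_printk() format in the diff below; the helper name `cpus_by_capacity` and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Field layout assumed from the new TP_printk() format of
# tmigr_connect_cpu_parent: "cpu=%d groupmask=%0x parent=%p lvl=%d
# numa=%d capacity=%d num_children=%d".
CPU_PARENT_RE = re.compile(
    r"tmigr_connect_cpu_parent: cpu=(\d+) groupmask=([0-9a-fA-F]+) "
    r"parent=(\w+) lvl=(\d+) numa=(-?\d+) capacity=(\d+) num_children=(\d+)"
)


def cpus_by_capacity(lines):
    """Map each capacity to the sorted list of CPUs reported with it."""
    result = defaultdict(set)
    for line in lines:
        m = CPU_PARENT_RE.search(line)
        if m is None:
            continue  # not a tmigr_connect_cpu_parent event
        cpu, capacity = int(m.group(1)), int(m.group(6))
        result[capacity].add(cpu)
    return {cap: sorted(cpus) for cap, cpus in sorted(result.items())}


# Hypothetical trace lines, for illustration only.
sample = [
    "tmigr_connect_cpu_parent: cpu=0 groupmask=1 parent=00000000aa lvl=0 numa=0 capacity=512 num_children=1",
    "tmigr_connect_cpu_parent: cpu=4 groupmask=1 parent=00000000bb lvl=0 numa=0 capacity=1024 num_children=1",
    "tmigr_connect_cpu_parent: cpu=1 groupmask=2 parent=00000000aa lvl=0 numa=0 capacity=512 num_children=2",
]
print(cpus_by_capacity(sample))  # {512: [0, 1], 1024: [4]}
```

Because the capacity is now part of every connect event, no other source of topology information is needed to tell the hierarchies apart.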
Signed-off-by: Frederic Weisbecker
---
 include/trace/events/timer_migration.h | 24 ++++++++++++++----------
 kernel/time/timer_migration.c          |  4 ++--
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/include/trace/events/timer_migration.h b/include/trace/events/timer_migration.h
index 61171b13c687..0b135e9301b1 100644
--- a/include/trace/events/timer_migration.h
+++ b/include/trace/events/timer_migration.h
@@ -33,15 +33,16 @@ TRACE_EVENT(tmigr_group_set,
 
 TRACE_EVENT(tmigr_connect_child_parent,
 
-	TP_PROTO(struct tmigr_group *child),
+	TP_PROTO(struct tmigr_hierarchy *hier, struct tmigr_group *child),
 
-	TP_ARGS(child),
+	TP_ARGS(hier, child),
 
 	TP_STRUCT__entry(
 		__field( void *, child )
 		__field( void *, parent )
 		__field( unsigned int, lvl )
 		__field( unsigned int, numa_node )
+		__field( unsigned int, capacity )
 		__field( unsigned int, num_children )
 		__field( u32, groupmask )
 	),
@@ -51,26 +52,28 @@ TRACE_EVENT(tmigr_connect_child_parent,
 		__entry->parent = child->parent;
 		__entry->lvl = child->parent->level;
 		__entry->numa_node = child->parent->numa_node;
+		__entry->capacity = hier->capacity;
 		__entry->num_children = child->parent->num_children;
 		__entry->groupmask = child->groupmask;
 	),
 
-	TP_printk("group=%p groupmask=%0x parent=%p lvl=%d numa=%d num_children=%d",
-		  __entry->child, __entry->groupmask, __entry->parent,
-		  __entry->lvl, __entry->numa_node, __entry->num_children)
+	TP_printk("group=%p groupmask=%0x parent=%p lvl=%d numa=%d capacity=%d num_children=%d",
+		  __entry->child, __entry->groupmask, __entry->parent, __entry->lvl,
+		  __entry->numa_node, __entry->capacity, __entry->num_children)
 );
 
 TRACE_EVENT(tmigr_connect_cpu_parent,
 
-	TP_PROTO(struct tmigr_cpu *tmc),
+	TP_PROTO(struct tmigr_hierarchy *hier, struct tmigr_cpu *tmc),
 
-	TP_ARGS(tmc),
+	TP_ARGS(hier, tmc),
 
 	TP_STRUCT__entry(
 		__field( void *, parent )
 		__field( unsigned int, cpu )
 		__field( unsigned int, lvl )
 		__field( unsigned int, numa_node )
+		__field( unsigned int, capacity )
 		__field( unsigned int, num_children )
 		__field( u32, groupmask )
 	),
@@ -80,13 +83,14 @@ TRACE_EVENT(tmigr_connect_cpu_parent,
 		__entry->cpu = tmc->cpuevt.cpu;
 		__entry->lvl = tmc->tmgroup->level;
 		__entry->numa_node = tmc->tmgroup->numa_node;
+		__entry->capacity = hier->capacity;
 		__entry->num_children = tmc->tmgroup->num_children;
 		__entry->groupmask = tmc->groupmask;
 	),
 
-	TP_printk("cpu=%d groupmask=%0x parent=%p lvl=%d numa=%d num_children=%d",
-		  __entry->cpu, __entry->groupmask, __entry->parent,
-		  __entry->lvl, __entry->numa_node, __entry->num_children)
+	TP_printk("cpu=%d groupmask=%0x parent=%p lvl=%d numa=%d capacity=%d num_children=%d",
+		  __entry->cpu, __entry->groupmask, __entry->parent, __entry->lvl,
+		  __entry->numa_node, __entry->capacity, __entry->num_children)
 );
 
 DECLARE_EVENT_CLASS(tmigr_group_and_cpu,
diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 0a8c893353a2..ec3ff80f795c 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -1756,7 +1756,7 @@ static void tmigr_connect_child_parent(struct tmigr_hierarchy *hier,
 	 */
 	smp_store_release(&child->parent, parent);
 
-	trace_tmigr_connect_child_parent(child);
+	trace_tmigr_connect_child_parent(hier, child);
 }
 
 static int tmigr_setup_groups(struct tmigr_hierarchy *hier, unsigned int cpu,
@@ -1831,7 +1831,7 @@ static int tmigr_setup_groups(struct tmigr_hierarchy *hier, unsigned int cpu,
 
 			tmigr_init_root(hier, group, activate);
 
-			trace_tmigr_connect_cpu_parent(tmc);
+			trace_tmigr_connect_cpu_parent(hier, tmc);
 
 			/* There are no children that need to be connected */
 			continue;
-- 
2.53.0

From nobody Wed Jun 17 07:14:19 2026
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Anna-Maria Behnsen, Sehee Jeong, Thomas Gleixner
Subject: [PATCH 6/6] scripts/timers: Add timer_migration_tree.py
Date: Thu, 23 Apr 2026 18:53:54 +0200
Message-ID: <20260423165354.95152-7-frederic@kernel.org>
In-Reply-To: <20260423165354.95152-1-frederic@kernel.org>
References: <20260423165354.95152-1-frederic@kernel.org>

Introduce a script that provides a simple ASCII representation of the
timer migration tree, built from boot-time trace events.

First boot with:

	trace_event=tmigr_connect_cpu_parent,tmigr_connect_child_parent

Then parse the result with:

	scripts/timer_migration_tree.py < /sys/kernel/tracing/trace

On a system with 8 CPUs, this produces the following output:

Tree for capacity 1024

                                  /-0, node 0, lvl:-1
                                 |
                                 |--1, node 0, lvl:-1
                                 |
                                 |--2, node 0, lvl:-1
                                 |
                                 |--3, node 0, lvl:-1
-- /00000000dcebac8b, node 0, lvl:0
                                 |--4, node 0, lvl:-1
                                 |
                                 |--5, node 0, lvl:-1
                                 |
                                 |--6, node 0, lvl:-1
                                 |
                                  \-7, node 0, lvl:-1

Signed-off-by: Frederic Weisbecker
---
 scripts/timer_migration_tree.py | 122 ++++++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)
 create mode 100755 scripts/timer_migration_tree.py

diff --git a/scripts/timer_migration_tree.py b/scripts/timer_migration_tree.py
new file mode 100755
index 000000000000..faac9de854bd
--- /dev/null
+++ b/scripts/timer_migration_tree.py
@@ -0,0 +1,122 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Draw the timer migration tree.
+
+1) Boot with trace_event=tmigr_connect_cpu_parent,tmigr_connect_child_parent
+2) ./timer_migration_tree.py < /sys/kernel/tracing/trace
+"""
+
+import re, sys
+from ete3 import Tree
+
+class Node:
+    def __init__(self, group):
+        self.group = group
+        self.children = []
+        self.parent = None
+        self.num_children = 0
+        self.groupmask = 0
+        self.lvl = -1
+
+    def set_groupmask(self, groupmask):
+        self.groupmask = groupmask
+
+    def set_parent(self, parent):
+        self.parent = parent
+
+    def add_child(self, child):
+        self.children.append(child)
+
+    def set_lvl(self, lvl):
+        self.lvl = lvl
+
+    def set_numa(self, numa):
+        self.numa = numa
+
+    def set_num_children(self, num_children):
+        self.num_children = num_children
+
+    def __repr__(self):
+        if self.parent:
+            parent_grp = self.parent.group
+        else:
+            parent_grp = "-"
+        return "Group: %s mask: %s parent: %s lvl: %d numa: %d num_children: %d" % (self.group, self.groupmask, parent_grp, self.lvl, self.numa, self.num_children)
+
+hierarchies = { }
+
+def get_hierarchy(capacity):
+    if capacity not in hierarchies:
+        hierarchies[capacity] = {}
+    return hierarchies[capacity]
+
+def get_node(capacity, group):
+    hier = get_hierarchy(capacity)
+    if group in hier:
+        return hier[group]
+    else:
+        n = Node(group)
+        hier[group] = n
+        return n
+
+def tmigr_connect_cpu_parent(ts, line):
+    s = re.search("tmigr_connect_cpu_parent: cpu=([0-9]+) groupmask=([0-9a-zA-Z]+) parent=([0-9a-zA-Z]+) lvl=([0-9]+) numa=([-]?[0-9]+) capacity=([-]?[0-9]+) num_children=([0-9]+)", line)
+    if s is None:
+        return False
+    (cpu, groupmask, parent, lvl, numa, capacity, num_children) = (int(s.group(1)), s.group(2), s.group(3), int(s.group(4)), int(s.group(5)), int(s.group(6)), int(s.group(7)))
+    n = get_node(capacity, cpu)
+    p = get_node(capacity, parent)
+    n.set_parent(p)
+    n.set_groupmask(groupmask)
+    n.set_lvl(-1)
+    p.set_lvl(lvl)
+    p.set_numa(numa)
+    n.set_numa(numa)
+    p.set_num_children(num_children)
+    p.add_child(n)
+
+def tmigr_connect_child_parent(ts, line):
+    s = re.search("tmigr_connect_child_parent: group=([0-9a-zA-Z]+) groupmask=([0-9a-zA-Z]+) parent=([0-9a-zA-Z]+) lvl=([0-9]+) numa=([-]?[0-9]+) capacity=([-]?[0-9]+) num_children=([0-9]+)", line)
+    if s is None:
+        return False
+    (group, groupmask, parent, lvl, numa, capacity, num_children) = (s.group(1), s.group(2), s.group(3), int(s.group(4)), int(s.group(5)), int(s.group(6)), int(s.group(7)))
+    n = get_node(capacity, group)
+    p = get_node(capacity, parent)
+    n.set_parent(p)
+    n.set_groupmask(groupmask)
+    p.set_lvl(lvl)
+    p.set_numa(numa)
+    p.set_num_children(num_children)
+    p.add_child(n)
+
+def populate(enode, node):
+    enode = enode.add_child(name = node.group)
+    enode.add_feature("groupmask", "m:%s" % node.groupmask)
+    enode.add_feature("lvl", "lvl:%d" % node.lvl)
+    enode.add_feature("numa", "node %d" % node.numa)
+    enode.add_feature("num_children", "c=%d" % node.num_children)
+    for child in node.children:
+        populate(enode, child)
+
+if __name__ == "__main__":
+    for line in sys.stdin:
+        s = re.search("([0-9]+[.][0-9]{6}): (.+?)$", line, re.S)
+        if s is not None:
+            if tmigr_connect_cpu_parent(float(s.group(1)), s.group(2)):
+                continue
+            if tmigr_connect_child_parent(float(s.group(1)), s.group(2)):
+                continue
+
+    for cap in hierarchies:
+        h = hierarchies[cap]
+        print("Tree for capacity %d" % cap)
+        for k in h:
+            n = h[k]
+            while n.parent is not None:
+                n = n.parent
+            root = Tree()
+            populate(root, n)
+            print(root.get_ascii(show_internal=True, attributes=["name", "numa", "lvl"]))
+            break
-- 
2.53.0
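The script above depends on the third-party ete3 package only for drawing. As a hedged alternative sketch (not part of the series), the per-capacity trees can be rebuilt from (capacity, child, parent) links, such as those parsed from the tmigr_connect_* events, and printed with plain indentation. The function names `build_trees` and `render`, and the node labels in the sample links, are hypothetical.

```python
from collections import defaultdict


def build_trees(links):
    """Group (capacity, child, parent) links into one tree per capacity.

    Returns {capacity: (roots, children)}, where `children` maps each node
    to its children in the order the links were seen.
    """
    children = defaultdict(lambda: defaultdict(list))
    parents = defaultdict(dict)
    for capacity, child, parent in links:
        children[capacity][parent].append(child)
        parents[capacity][child] = parent
    trees = {}
    for capacity, kids in children.items():
        # A root appears as a parent but never as a child.
        roots = [node for node in kids if node not in parents[capacity]]
        trees[capacity] = (roots, kids)
    return trees


def render(node, kids, depth=0):
    """Return the subtree rooted at `node` as indented text lines."""
    lines = ["  " * depth + str(node)]
    for child in kids.get(node, []):
        lines.extend(render(child, kids, depth + 1))
    return lines


# Hypothetical links: CPUs are ints, groups are string labels.
links = [
    (1024, 0, "grp0:0"),
    (1024, 1, "grp0:0"),
    (1024, 2, "grp0:1"),
    (1024, "grp0:0", "grp1:0"),
    (1024, "grp0:1", "grp1:0"),
    (512, 3, "grp0:2"),
]
for capacity, (roots, kids) in sorted(build_trees(links).items()):
    print("Tree for capacity %d" % capacity)
    for root in roots:
        print("\n".join(render(root, kids)))
```

Keeping a separate parent map per capacity mirrors the script's `hierarchies` dictionary, so CPUs of different capacities never end up in the same tree.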