From: Frederic Weisbecker <frederic@kernel.org>
To: Thomas Gleixner
Cc: LKML, Frederic Weisbecker, anna-maria@linutronix.de
Subject: [PATCH 1/4] timers/migration: Fix another race between hotplug and idle entry/exit
Date: Wed, 15 Jan 2025 00:15:04 +0100
Message-ID: <20250114231507.21672-2-frederic@kernel.org>
In-Reply-To: <20250114231507.21672-1-frederic@kernel.org>
References: <20250114231507.21672-1-frederic@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit:

	10a0e6f3d3db ("timers/migration: Move hierarchy setup into cpuhotplug prepare callback")

has fixed a race between idle exit and CPU hotplug up leading to a wrong
"0" value migrator assigned to the top level. However there is still a
situation that remains unhandled:

                         [GRP0:0]
                     migrator = TMIGR_NONE
                     active   = NONE
                     groupmask = 0
                     /   \      \
                    0     1     2..7
                  idle   idle   idle

0) The system is fully idle.

                         [GRP0:0]
                     migrator = CPU 0
                     active   = CPU 0
                     groupmask = 0
                     /   \      \
                    0     1     2..7
                 active  idle   idle

1) CPU 0 is activating. It has done the cmpxchg on the top's ->migr_state
but it hasn't yet returned to __walk_groups().

                         [GRP0:0]
                     migrator = CPU 0
                     active   = CPU 0, CPU 1
                     groupmask = 0
                     /   \      \
                    0     1     2..7
                 active active  idle

2) CPU 1 is activating. CPU 0 stays the migrator (still stuck in
__walk_groups(), delayed by a #VMEXIT for example).

                              [GRP1:0]
                          migrator = TMIGR_NONE
                          active   = NONE
                          groupmask = 0
                          /                  \
             [GRP0:0]                          [GRP0:1]
         migrator = CPU 0                  migrator = TMIGR_NONE
         active   = CPU 0, CPU 1           active   = NONE
         groupmask = 2                     groupmask = 1
         /   \      \                      /
        0     1     2..7                  8
     active active  idle               !online

3) CPU 8 is preparing to boot.
CPUHP_TMIGR_PREPARE is being run by CPU 1, which has created the
GRP0:1 and the new top GRP1:0, connected to GRP0:1 and GRP0:0. The
groupmask of GRP0:0 is now 2. CPU 1 hasn't yet propagated its
activation up to GRP1:0.

                              [GRP1:0]
                          migrator = 0 (!!!)
                          active   = NONE
                          groupmask = 0
                          /                  \
             [GRP0:0]                          [GRP0:1]
         migrator = CPU 0                  migrator = TMIGR_NONE
         active   = CPU 0, CPU 1           active   = NONE
         groupmask = 2                     groupmask = 1
         /   \      \                      /
        0     1     2..7                  8
     active active  idle               !online

4) CPU 0 finally resumed after its #VMEXIT. It's in __walk_groups(),
returning from tmigr_cpu_active(). The new top GRP1:0 is visible and
fetched but the freshly updated groupmask of GRP0:0 may not be visible
due to lack of ordering! As a result tmigr_active_up() is called on
GRP0:0 with a child's groupmask of "0". This buggy "0" groupmask then
becomes the migrator for GRP1:0 forever. As a result, timers on a fully
idle system get ignored.

One possible fix would be to define TMIGR_NONE as "0" so that such a race
would have no effect. And after all TMIGR_NONE doesn't need to be anything
else. However this would leave an uncomfortable state machine where gears
happen not to break by chance but are vulnerable to future modifications.

Keep TMIGR_NONE as is instead and pre-initialize to "1" the groupmask of
any newly created top level. This groupmask is guaranteed to be visible
upon fetching the corresponding group for the first time:

_ By the upcoming CPU, thanks to CPU hotplug synchronization between the
  control CPU (BP) and the booting one (AP).

_ By the control CPU, since the groupmask and parent pointers are
  initialized locally.

_ By all CPUs belonging to the same group as the control CPU, because they
  must wait for it to ever become idle before needing to walk to the new
  top. The cmpxchg() on ->migr_state then makes sure its groupmask is
  visible.

With this pre-initialization, it is guaranteed that if a future top level
is linked to an old one, it is walked through with a valid groupmask.
Fixes: 10a0e6f3d3db ("timers/migration: Move hierarchy setup into cpuhotplug prepare callback")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/timer_migration.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 8d57f7686bb0..c8a8ea2e5b98 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -1487,6 +1487,21 @@ static void tmigr_init_group(struct tmigr_group *group, unsigned int lvl,
 	s.seq = 0;
 	atomic_set(&group->migr_state, s.state);
 
+	/*
+	 * If this is a new top-level, prepare its groupmask in advance.
+	 * This avoids accidents where yet another new top-level is
+	 * created in the future and made visible before the current groupmask.
+	 */
+	if (list_empty(&tmigr_level_list[lvl])) {
+		group->groupmask = BIT(0);
+		/*
+		 * The previous top level has prepared its groupmask already,
+		 * simply account it as the first child.
+		 */
+		if (lvl > 0)
+			group->num_children = 1;
+	}
+
 	timerqueue_init_head(&group->events);
 	timerqueue_init(&group->groupevt.nextevt);
 	group->groupevt.nextevt.expires = KTIME_MAX;
@@ -1550,8 +1565,20 @@ static void tmigr_connect_child_parent(struct tmigr_group *child,
 	raw_spin_lock_irq(&child->lock);
 	raw_spin_lock_nested(&parent->lock, SINGLE_DEPTH_NESTING);
 
+	if (activate) {
+		/*
+		 * @child is the old top and @parent the new one. In this
+		 * case groupmask is pre-initialized and @child already
+		 * accounted, along with its new sibling corresponding to the
+		 * CPU going up.
+		 */
+		WARN_ON_ONCE(child->groupmask != BIT(0) || parent->num_children != 2);
+	} else {
+		/* Adding @child for the CPU going up to @parent. */
+		child->groupmask = BIT(parent->num_children++);
+	}
+
 	child->parent = parent;
-	child->groupmask = BIT(parent->num_children++);
 
 	raw_spin_unlock(&parent->lock);
 	raw_spin_unlock_irq(&child->lock);
-- 
2.46.0