From nobody Tue Oct 7 14:04:47 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1FB1C2749F1; Wed, 9 Jul 2025 10:39:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752057588; cv=none; b=B9SAxb2tQQObDogUUVN9ihhiS67oab1pmUU5B/xquiEiCgT4YK5RGUPCLZ6Rk6hL2pRBSGlobxI8OUSTJIfC92JzlIXZ2u5zXSNVUQ2C2f60X8Y9n5WbGvhwTAaljeDLne0XI+6kHDtNye6JB3HIuLYCk+08ja5B3oDUUDvoRfc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752057588; c=relaxed/simple; bh=a+ygBf+amrdcbD9NF+H30vF3/3M/W25SaCHncKnVNdA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=T4g7z7+fnoH5eueRDBXchiwS8hvPPLLqfPOzJfnJNIoyAXoZsUGjuu7DQ13Xu+M5+RPnhtI678FlnQgJbWvez0jd6sQ6xV27vKBappl+du1CMulwGIzbFm4fohP9bZ5f5iRcf3q9j1O8GV5xzQmGzUfoo82/hNpiViQZiu9/6M4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=mcS7eFmU; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="mcS7eFmU" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 501DAC4CEEF; Wed, 9 Jul 2025 10:39:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1752057586; bh=a+ygBf+amrdcbD9NF+H30vF3/3M/W25SaCHncKnVNdA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=mcS7eFmUkHnt7K8ZnLLXPttOZqaqny04nQYBrKEvHa5jc+Cix2fHEXUlnvx7ePq0N nnSlTlxBuDeS5ch8jQ+MZtuxnb/Q7pbzKNw84+IJhD+h3T0HrxoJgxIbZWgKj1ubLD Xx6cJTY6fjJwTUmmSHV++2fhGooi4yB4pzXFzj6rIMOccLw2hG/q4FpzH2ZmsFNhDo d6YuD7VBtNyRBmEPfvuDAEdtf9ZOLTRx7zhaOZ9hwI10s+iK4eQIS3VX5NswKpFLHD 
E9KKvvcWON+AQ7rUYhzaEdftMSS+Q+ACocX4LD8geLwX4WFRWQf4TrhwBoy8QC37Pf NyHIOMfTdsipw== From: neeraj.upadhyay@kernel.org To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, paulmck@kernel.org, joelagnelf@nvidia.com, frederic@kernel.org, boqun.feng@gmail.com, urezki@gmail.com, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com, qiang.zhang1211@gmail.com, neeraj.iitr10@gmail.com, neeraj.upadhyay@amd.com, "Neeraj Upadhyay (AMD)" Subject: [PATCH rcu 1/5] rcu/exp: Protect against early QS report Date: Wed, 9 Jul 2025 16:09:05 +0530 Message-Id: <20250709103909.15498-2-neeraj.upadhyay@kernel.org> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250709103909.15498-1-neeraj.upadhyay@kernel.org> References: <20250709103909.15498-1-neeraj.upadhyay@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Frederic Weisbecker When a grace period is started, the ->expmask of each node is set up from sync_exp_reset_tree(). Later on, each leaf node also initializes its ->exp_tasks pointer. This means that the initialization of the quiescent state of a node and the initialization of its blocking tasks happen with a gap in between during which the node is unlocked. It happens to be fine because nothing is expected to report an exp quiescent state within this gap, since no IPI has been issued yet and every rdp's ->cpu_no_qs.b.exp should be false. However, if it were to happen by accident, the quiescent state could be reported and propagated while ignoring tasks that blocked _before_ the start of the grace period. Prevent such trouble from happening in the future by initializing both the mask of quiescent states to report and the head of the blocked tasks within the same node-locked block.
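As a rough userspace illustration of the paragraph above (not kernel code), the sketch below sets the quiescent-state mask and the blocked-task pointer inside a single critical section. All toy_* names are hypothetical stand-ins: toy_node loosely mirrors struct rcu_node, the boolean "locked" field only models the node lock, and toy_exp_done() assumes that a node counts as done only once both ->expmask and ->exp_tasks are clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reader task queued on a leaf node. */
struct toy_task {
	struct toy_task *next;
};

/* Hypothetical, heavily simplified stand-in for struct rcu_node. */
struct toy_node {
	bool locked;			/* models the rcu_node lock */
	bool is_leaf;
	unsigned long expmaskinit;	/* CPUs that must report a QS */
	unsigned long expmask;		/* CPUs that still owe a QS */
	struct toy_task *blkd_tasks;	/* readers preempted on this node */
	struct toy_task *exp_tasks;	/* first reader blocking the exp GP */
};

static void toy_lock(struct toy_node *n)
{
	assert(!n->locked);
	n->locked = true;
}

static void toy_unlock(struct toy_node *n)
{
	assert(n->locked);
	n->locked = false;
}

/* Initialize both pieces of expedited state in one locked block. */
static void toy_exp_reset_node(struct toy_node *n)
{
	toy_lock(n);
	n->expmask = n->expmaskinit;
	if (n->is_leaf && n->blkd_tasks)
		n->exp_tasks = n->blkd_tasks;
	toy_unlock(n);
}

/* A node is done only when no CPU and no blocked task is pending. */
static bool toy_exp_done(const struct toy_node *n)
{
	return n->expmask == 0 && n->exp_tasks == NULL;
}
```

With both fields set before the lock is dropped, a report that clears the last ->expmask bit still finds ->exp_tasks set, so toy_exp_done() cannot return true while a task that blocked earlier is still queued.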
If a task blocks within an RCU read-side critical section before sync_exp_reset_tree() is called and is then unblocked between sync_exp_reset_tree() and __sync_rcu_exp_select_node_cpus(), the QS won't be reported because no RCU exp IPI had been issued to request it through the setting of srdp->cpu_no_qs.b.exp. Reviewed-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Signed-off-by: Joel Fernandes Signed-off-by: Neeraj Upadhyay (AMD) --- kernel/rcu/tree_exp.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index c36c7d5575ca..2fa7aa9155bd 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -141,6 +141,13 @@ static void __maybe_unused sync_exp_reset_tree(void) raw_spin_lock_irqsave_rcu_node(rnp, flags); WARN_ON_ONCE(rnp->expmask); WRITE_ONCE(rnp->expmask, rnp->expmaskinit); + /* + * Need to wait for any blocked tasks as well. Note that + * additional blocking tasks will also block the expedited GP + * until such time as the ->expmask bits are cleared. + */ + if (rcu_is_leaf_node(rnp) && rcu_preempt_has_tasks(rnp)) + WRITE_ONCE(rnp->exp_tasks, rnp->blkd_tasks.next); raw_spin_unlock_irqrestore_rcu_node(rnp, flags); } } @@ -393,13 +400,6 @@ static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp) } mask_ofl_ipi = rnp->expmask & ~mask_ofl_test; - /* - * Need to wait for any blocked tasks as well. Note that - * additional blocking tasks will also block the expedited GP - * until such time as the ->expmask bits are cleared. - */ - if (rcu_preempt_has_tasks(rnp)) - WRITE_ONCE(rnp->exp_tasks, rnp->blkd_tasks.next); raw_spin_unlock_irqrestore_rcu_node(rnp, flags); /* IPI the remaining CPUs for expedited quiescent state.
*/ -- 2.40.1 From nobody Tue Oct 7 14:04:47 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2676228DEE8; Wed, 9 Jul 2025 10:39:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752057595; cv=none; b=Cpqqkj4OyRxdrvr7kqxJAjGUabxzqRgjU5Cq6KNV9yKftMfucYMYn5JSWuu5YMVP+ClvEqxZ0wS1wRuOZk3WDaIge+9Qc4aa9Aacky18aN6P4IDekQ99qp5qqSYwkPIVhgptOAhDheedfncR21nUoy/dm1LQHZU5dJd3nA/Yv9U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752057595; c=relaxed/simple; bh=PxHHLkx1VHK181EbcOOUVDwD6SaEG/JAFcZiYpxdAyg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=C4jNEU3Ot7n8r4G2iw/oOSzrjjJLZwxGYt3lET4o3g72V2U807DBoqMV72WLFMNGZXYivAHetuQ0I3UNGAsGj0FnF38bC6SHLSCg9sXUs7cxQTB9mCC4Lq1tjfvJunQqjF6fCp8SECd48z4FSLL60qeGqJWBa4VW4rnT9qA+dzA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=t70636v+; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="t70636v+" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B43BDC4CEF4; Wed, 9 Jul 2025 10:39:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1752057594; bh=PxHHLkx1VHK181EbcOOUVDwD6SaEG/JAFcZiYpxdAyg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=t70636v+9hIiUkqS6/SLOSQXsvAYAHtjiwUsqbp7bpABuNOJ+4nLW4TN4NRMa9Jmd Tj+eDcdSMSOEOTo1y5IDmWiPGMQhO15u1NTmXayVSLdKK4Xfu/zAggVh1AovgEGUkp KVbTdv8LCSI78FwF8Yd50j6V5Z6h2qT0nAe033e8bAjSRFOovARA3r9uTb4DDHkxCW
0M5UfBmLEPkzWQ9yTEupigqYalj2GmWbiUHPDkrdc82SEEZHjpKesGHNpaZPHPNpMF hoscYxetbBottenls7WG97EjGZ27QoFwzgg4UgR886icfMrGZxIjavuFZQ7kXtwDar CyE4xEgX13reQ== From: neeraj.upadhyay@kernel.org To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, paulmck@kernel.org, joelagnelf@nvidia.com, frederic@kernel.org, boqun.feng@gmail.com, urezki@gmail.com, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com, qiang.zhang1211@gmail.com, neeraj.iitr10@gmail.com, neeraj.upadhyay@amd.com, "Neeraj Upadhyay (AMD)" Subject: [PATCH rcu 2/5] rcu/exp: Remove confusing needless full barrier on task unblock Date: Wed, 9 Jul 2025 16:09:06 +0530 Message-Id: <20250709103909.15498-3-neeraj.upadhyay@kernel.org> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250709103909.15498-1-neeraj.upadhyay@kernel.org> References: <20250709103909.15498-1-neeraj.upadhyay@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Frederic Weisbecker A full memory barrier in the RCU-PREEMPT task unblock path is advertised as ordering the context switch (or rather the accesses prior to rcu_read_unlock()) with the expedited grace period fastpath. However, the grace period cannot complete without the rnp calling into rcu_report_exp_rnp() with the node locked. This reports the quiescent state in a fully ordered fashion against the updater's accesses thanks to: 1) The READ-SIDE smp_mb__after_unlock_lock() barrier across node locking while propagating the QS up to the root. 2) The UPDATE-SIDE smp_mb__after_unlock_lock() barrier while holding the root rnp to wait/check for the GP completion. 3) The (perhaps redundant given steps 1) and 2)) smp_mb() in rcu_seq_end() before the grace period completes. This makes the explicit barrier in this place superfluous. Therefore, remove it, as it is confusing. Acked-by: Paul E.
McKenney Signed-off-by: Frederic Weisbecker Signed-off-by: Joel Fernandes Signed-off-by: Neeraj Upadhyay (AMD) --- kernel/rcu/tree_plugin.h | 1 - 1 file changed, 1 deletion(-) diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 0b0f56f6abc8..0532a13cb75e 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -534,7 +534,6 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags) WARN_ON_ONCE(rnp->completedqs == rnp->gp_seq && (!empty_norm || rnp->qsmask)); empty_exp = sync_rcu_exp_done(rnp); - smp_mb(); /* ensure expedited fastpath sees end of RCU c-s. */ np = rcu_next_node_entry(t, rnp); list_del_init(&t->rcu_node_entry); t->rcu_blocked_node = NULL; -- 2.40.1 From nobody Tue Oct 7 14:04:47 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB12028D8D5; Wed, 9 Jul 2025 10:40:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752057601; cv=none; b=k/SitUqwDJek22m9QPTkEIBvObgFe05zI6Gbzkae+8MqKMLHKnGWQREI1ZVB3S7iMVkOT08dZPH0Civ9OZFj3YH0nKTnkO4PrHOurALZ+8cw3AexVfmW+P74HQ/e0rFVOuAD2+BKxZvQ1vuHiepm+YfOvgVGBPnc6XFbyPsDbYQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752057601; c=relaxed/simple; bh=d0jNauGDQgc7MQrmLvQ3Mcsdm9lf6ZQmo/JtMesLTDA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=qzZj/AiJfvyhx+vcm2uwMYF+ephixhmv752Dub+dcQBBRnsH7JEFhg7PFX44EK9gS5FNUgy027VyIMf2v5b5T/MFEWFpO+l/5L0IuEEWlg2rGdm+qCauzI0sZC3SBT3wnUcBJZfCoJ6ei8E927mJ7kTUN+tliZKxjOwox9poGF0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=bKJTddOw;
arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="bKJTddOw" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C0656C4CEF4; Wed, 9 Jul 2025 10:39:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1752057601; bh=d0jNauGDQgc7MQrmLvQ3Mcsdm9lf6ZQmo/JtMesLTDA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=bKJTddOwwjfnAUWDCR9ajJk9gz84/1RgTjYn80d3gMlI/LuwMa2OvtOY86Rwe2+I/ SCB+LvlZ+pxzur8alaWG4ZZB4dtLfjDZLmFQXBeIAIFlrAMLrGas8nsNAp130U/RPD Z65SiKVsbT91kuRSnZ5t+6vLFu8tZ+ui+M9bZnTNvhKqg5jVytPM6fToFPAB5zhkfa fb8xFiMyhX89tK15gYkMF4ylr3eDNKoRqAe3ROCHdU+ky9EEr5pon/MbJAZM2IGK0a tf/dYwdNf44+JHll4tMZMUpdV4W/yDzc0yi8VuEBoRY8+bteEFSRHTQZnF1UrqfHhd TjbPF+ZD0Xp4w== From: neeraj.upadhyay@kernel.org To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, paulmck@kernel.org, joelagnelf@nvidia.com, frederic@kernel.org, boqun.feng@gmail.com, urezki@gmail.com, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com, qiang.zhang1211@gmail.com, neeraj.iitr10@gmail.com, neeraj.upadhyay@amd.com, "Neeraj Upadhyay (AMD)" Subject: [PATCH rcu 3/5] rcu/exp: Remove needless CPU up quiescent state report Date: Wed, 9 Jul 2025 16:09:07 +0530 Message-Id: <20250709103909.15498-4-neeraj.upadhyay@kernel.org> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250709103909.15498-1-neeraj.upadhyay@kernel.org> References: <20250709103909.15498-1-neeraj.upadhyay@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Frederic Weisbecker A CPU coming online checks for an ongoing grace period and reports a quiescent state accordingly if needed. 
This special treatment, which shortcuts the expedited IPI, originated as an optimization in the following commit: 338b0f760e84 (rcu: Better hotplug handling for synchronize_sched_expedited()) The point is to avoid an IPI while waiting for a CPU to become online or failing to become offline. However, this is pointless and even error-prone for several reasons: * If the CPU has been seen offline in the first round scanning offline and idle CPUs, no IPI is even tried and the quiescent state is reported on behalf of the CPU. * This means that if the IPI fails, the CPU just became offline. So it's unlikely to become online right away, unless the CPU hotplug operation failed and rolled back, which is a rare event that can wait a jiffy for a new IPI to be issued. * But then the "optimization" for a failing CPU hotplug down operation only applies to !PREEMPT_RCU. * This forcibly reports a quiescent state even if ->cpu_no_qs.b.exp is not set. As a result, it can race with remote QS reports on the same rdp. Fortunately, it happens to be OK, but an accident is waiting to happen. For all those reasons, remove this optimization, which does not look worth keeping around. Reported-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Reviewed-by: Paul E.
McKenney Signed-off-by: Joel Fernandes Signed-off-by: Neeraj Upadhyay (AMD) --- kernel/rcu/tree.c | 2 -- kernel/rcu/tree_exp.h | 45 ++----------------------------------------- 2 files changed, 2 insertions(+), 45 deletions(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 14d4499c6fc3..0bda23fec690 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -160,7 +160,6 @@ static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp, unsigned long gps, unsigned long flags); static void invoke_rcu_core(void); static void rcu_report_exp_rdp(struct rcu_data *rdp); -static void sync_sched_exp_online_cleanup(int cpu); static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp); static bool rcu_rdp_is_offloaded(struct rcu_data *rdp); static bool rcu_rdp_cpu_online(struct rcu_data *rdp); @@ -4268,7 +4267,6 @@ int rcutree_online_cpu(unsigned int cpu) raw_spin_unlock_irqrestore_rcu_node(rnp, flags); if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE) return 0; /* Too early in boot for scheduler work. */ - sync_sched_exp_online_cleanup(cpu); // Stop-machine done, so allow nohz_full to disable tick. tick_dep_clear(TICK_DEP_BIT_RCU); diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 2fa7aa9155bd..6058a734090c 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -751,12 +751,8 @@ static void rcu_exp_handler(void *unused) struct task_struct *t = current; /* - * First, is there no need for a quiescent state from this CPU, - * or is this CPU already looking for a quiescent state for the - * current grace period? If either is the case, just leave. - * However, this should not happen due to the preemptible - * sync_sched_exp_online_cleanup() implementation being a no-op, - * so warn if this does happen. + * WARN if the CPU is unexpectedly already looking for a + * QS or has already reported one.
*/ ASSERT_EXCLUSIVE_WRITER_SCOPED(rdp->cpu_no_qs.b.exp); if (WARN_ON_ONCE(!(READ_ONCE(rnp->expmask) & rdp->grpmask) || @@ -803,11 +799,6 @@ static void rcu_exp_handler(void *unused) WARN_ON_ONCE(1); } -/* PREEMPTION=y, so no PREEMPTION=n expedited grace period to clean up after. */ -static void sync_sched_exp_online_cleanup(int cpu) -{ -} - /* * Scan the current list of tasks blocked within RCU read-side critical * sections, printing out the tid of each that is blocking the current @@ -885,38 +876,6 @@ static void rcu_exp_handler(void *unused) rcu_exp_need_qs(); } -/* Send IPI for expedited cleanup if needed at end of CPU-hotplug operation. */ -static void sync_sched_exp_online_cleanup(int cpu) -{ - unsigned long flags; - int my_cpu; - struct rcu_data *rdp; - int ret; - struct rcu_node *rnp; - - rdp = per_cpu_ptr(&rcu_data, cpu); - rnp = rdp->mynode; - my_cpu = get_cpu(); - /* Quiescent state either not needed or already requested, leave. */ - if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) || - READ_ONCE(rdp->cpu_no_qs.b.exp)) { - put_cpu(); - return; - } - /* Quiescent state needed on current CPU, so set it up locally. */ - if (my_cpu == cpu) { - local_irq_save(flags); - rcu_exp_need_qs(); - local_irq_restore(flags); - put_cpu(); - return; - } - /* Quiescent state needed on some other CPU, send IPI.
*/ - ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0); - put_cpu(); - WARN_ON_ONCE(ret); -} - /* * Because preemptible RCU does not exist, we never have to check for * tasks blocked within RCU read-side critical sections that are -- 2.40.1 From nobody Tue Oct 7 14:04:47 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1DC3E28D8D6; Wed, 9 Jul 2025 10:40:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752057608; cv=none; b=LahkubHF/pY8o6N/NmC4Y9i/96gyl/rmwOtM18xxJKnCqPAVUx27RihVZGCCwwL9QYZQ+ZhB+wKcTnTqmCoSXyzCH4tcADZ32Y2F1okD0dY9Rbv4zR3HhrtJNJu+TPS6dAcWQIhsScidX5aszzOLtYViTSOY8ZGf/XjQZQy03hw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752057608; c=relaxed/simple; bh=Vx5swSkJc6lENCnzFmEdVW2ZusvksdSbXWKBGrAa3HA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=D3ikV1JCvfHpKuhdUGlIPRtV25kZwZyiZe7vI6vQKQQILyBuN3vyyKO71s3Aig8heoFhaEqAjNkIZ4R9Makrjpc2kT9//RZDE/d2dJcBlGylsd4etkvaFFmMksOoiHSGmoaXttajWCOWUa87Nn7J6+Qt+NrSENs0n82y+hwWd8k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=c/nw7HwW; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="c/nw7HwW" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1F68BC4CEEF; Wed, 9 Jul 2025 10:40:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1752057607; bh=Vx5swSkJc6lENCnzFmEdVW2ZusvksdSbXWKBGrAa3HA=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
b=c/nw7HwWoVJrRSvzPBWsvVZEoxDFcSbVtBxT7qHlYOxIb3dgaBlj0KgE5KTaG05E8 wFrBKy+QSf9S86EpaVGoe+ZMCV5lKvvGgWKYy0VFJPU2DGPnD4gTInoMW9Y8g6IpPm O+i9w65WNtcVc8cfYxVwWtw7w2EiH+7q8DwiUxf8DnoJKyTKsrx3cpdEjqTjHbRV5C xusTg5k7u5jiZTI2ffkjLjHq/nsFXjmJUXJ8u426lJbGFG+hCHqdXMTb8FIKqTrgYN MuOAcXaT1Qlz9C5v1xva38yw4ERA6jpuxwr9GOQEBpB228OI4uPPaLDjj730JdvqTJ YbG3ScL/ng/Lw== From: neeraj.upadhyay@kernel.org To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, paulmck@kernel.org, joelagnelf@nvidia.com, frederic@kernel.org, boqun.feng@gmail.com, urezki@gmail.com, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com, qiang.zhang1211@gmail.com, neeraj.iitr10@gmail.com, neeraj.upadhyay@amd.com, "Neeraj Upadhyay (AMD)" Subject: [PATCH rcu 4/5] rcu/exp: Warn on QS requested on dying CPU Date: Wed, 9 Jul 2025 16:09:08 +0530 Message-Id: <20250709103909.15498-5-neeraj.upadhyay@kernel.org> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250709103909.15498-1-neeraj.upadhyay@kernel.org> References: <20250709103909.15498-1-neeraj.upadhyay@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Frederic Weisbecker It is not possible to send an IPI to a dying CPU that has passed the CPUHP_TEARDOWN_CPU stage. Remaining unhandled IPIs are handled later at the CPUHP_AP_SMPCFD_DYING stage by stop machine. This is the last opportunity for the RCU exp handler to request an expedited quiescent state. And the upcoming final context switch between stop machine and idle must have reported the requested quiescent state. Therefore, it should not be possible to observe a pending requested expedited quiescent state when RCU finally stops watching the outgoing CPU. Once IPIs aren't possible anymore, the QS for the target CPU will be reported on its behalf by the RCU exp kworker. Provide an assertion to verify those expectations.
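The expectation described above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: the toy_* names are hypothetical, exp_qs_pending stands in for rdp->cpu_no_qs.b.exp, and the three functions stand in for the last rcu_exp_handler() run, the final context switch from stop machine to idle, and the check made when RCU stops watching the outgoing CPU.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU struct rcu_data. */
struct toy_rdp {
	bool exp_qs_pending;	/* models rdp->cpu_no_qs.b.exp */
	int qs_reported;	/* number of expedited QSs reported */
};

/* Models the last possible rcu_exp_handler() run: request a QS. */
static void toy_exp_handler(struct toy_rdp *rdp)
{
	rdp->exp_qs_pending = true;
}

/* Models a context switch: report the QS if one was requested. */
static void toy_context_switch(struct toy_rdp *rdp)
{
	if (rdp->exp_qs_pending) {
		rdp->exp_qs_pending = false;
		rdp->qs_reported++;
	}
}

/* Models the new assertion: true means the warning stays silent. */
static bool toy_report_cpu_dead_ok(const struct toy_rdp *rdp)
{
	return !rdp->exp_qs_pending;
}
```

In this model the check can only fail if a request is made and no context switch follows it, which is the situation the commit message argues cannot occur.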
Reviewed-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Signed-off-by: Joel Fernandes Signed-off-by: Neeraj Upadhyay (AMD) --- kernel/rcu/tree.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 0bda23fec690..00c182b3f978 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4356,6 +4356,12 @@ void rcutree_report_cpu_dead(void) * may introduce a new READ-side while it is actually off the QS masks. */ lockdep_assert_irqs_disabled(); + /* + * CPUHP_AP_SMPCFD_DYING was the last call for rcu_exp_handler() execution. + * The requested QS must have been reported on the last context switch + * from stop machine to idle. + */ + WARN_ON_ONCE(rdp->cpu_no_qs.b.exp); // Do any dangling deferred wakeups. do_nocb_deferred_wakeup(rdp); -- 2.40.1 From nobody Tue Oct 7 14:04:47 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 01ED828D8D6; Wed, 9 Jul 2025 10:40:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752057614; cv=none; b=sVcNG+VxDpUXlrAnzcVp3sZzEb+283Wbg1VDKnrOQepUSN1bRdTb9G/qB6X+4LRTz8zodhDrES9+qAVKxrUwu916mKMF9NhMrh838HAizVagjY4Nnm7Rg8Cq/+0AMx3oJrZ+SGkSaKL1VXe4WO0SDYYDsc4W1c1cWE39QkmuO68= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1752057614; c=relaxed/simple; bh=C0U4dlo4faHf+FVuL2gY/gpjRyDf6ZQoyRlsTiHuD7o=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=RUo0RT0w3WDlJkOTjhsPRHCyj0FyjMBcHa1tqsrEObdVubP0u/O0YYC1KOoQpEcT+zE+bjS6QdNX1sSBFAdEU1JPHuswa2xaELhzcZQU3w28P6lSYKxmyxCGhPps35gmYOb9TNn5j7BkHINoATz0VPaiczaynfBuBbBYkN7+2Yw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit
key) header.d=kernel.org header.i=@kernel.org header.b=orse9NAH; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="orse9NAH" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 626EFC4CEEF; Wed, 9 Jul 2025 10:40:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1752057613; bh=C0U4dlo4faHf+FVuL2gY/gpjRyDf6ZQoyRlsTiHuD7o=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=orse9NAH1bs2QKfRJs5XM7Q5p1aWBq1YhoYwHYRl2cQVnsfepgJsDFy3kgA7GpmoT Jo4vC6HT1coJ740sk7NEGJ1/YPW+l2pkUMnrOnvEYRIpAs18CKZttxks5JY/hjolmo jf0G9IC/3EsIY9DeeyrrRl5+jCHP3v1bS+gIM+juD3m7ouzTsl9zF3cmhND5d1m3Fg Px6D/T+yX6FOJB3eefi9/scy0kFSIeOV6e1bTZkLU5r4156LUjQ1m9Btk4AbGJeWqW dpvwfmuKxJqDzxFXRxXqPJWYe0QxOJAaK1Ezh+/r09T1beW9JMjKI4p/dQfX4Q1ZAW q36RjJoQz0+aA== From: neeraj.upadhyay@kernel.org To: rcu@vger.kernel.org Cc: linux-kernel@vger.kernel.org, paulmck@kernel.org, joelagnelf@nvidia.com, frederic@kernel.org, boqun.feng@gmail.com, urezki@gmail.com, rostedt@goodmis.org, mathieu.desnoyers@efficios.com, jiangshanlai@gmail.com, qiang.zhang1211@gmail.com, neeraj.iitr10@gmail.com, neeraj.upadhyay@amd.com, "Neeraj Upadhyay (AMD)" Subject: [PATCH rcu 5/5] rcu/exp: Warn on CPU lagging for too long within hotplug IPI's blindspot Date: Wed, 9 Jul 2025 16:09:09 +0530 Message-Id: <20250709103909.15498-6-neeraj.upadhyay@kernel.org> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20250709103909.15498-1-neeraj.upadhyay@kernel.org> References: <20250709103909.15498-1-neeraj.upadhyay@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Frederic Weisbecker A CPU undergoing a hotplug operation can make the RCU exp kworker lag if: * The dying CPU is running after CPUHP_TEARDOWN_CPU but before
rcutree_report_cpu_dead(). It is too late to send an IPI but RCU is still watching the CPU. Therefore the exp kworker can only wait for the target to reach rcutree_report_cpu_dead(). * The booting CPU is running after rcutree_report_cpu_starting() but before set_cpu_online(). RCU is watching the CPU but it is too early to be able to send an IPI. Therefore the exp kworker can only wait until it observes the CPU as officially online. Such a lag is expected to be very short. However, #VMEXIT and other hazards can get in the way. Report long delays; 50 jiffies is already considered a high threshold. Reported-by: Paul E. McKenney Reviewed-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Signed-off-by: Joel Fernandes [neeraj: Change max retries to 50 jiffies] Signed-off-by: Neeraj Upadhyay (AMD) --- kernel/rcu/tree_exp.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 6058a734090c..076ad61e42f4 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -406,8 +406,18 @@ static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp) for_each_leaf_node_cpu_mask(rnp, cpu, mask_ofl_ipi) { struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu); unsigned long mask = rdp->grpmask; + int nr_retries = 0; retry_ipi: + /* + * In case of retrying, CPU either is lagging: + * + * - between CPUHP_TEARDOWN_CPU and rcutree_report_cpu_dead() + * or: + * - between rcutree_report_cpu_starting() and set_cpu_online() + */ + WARN_ON_ONCE(nr_retries++ > 50); + if (rcu_watching_snap_stopped_since(rdp, rdp->exp_watching_snap)) { mask_ofl_test |= mask; continue; -- 2.40.1
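The retry accounting added by the last patch can be sketched in userspace as below. This is an illustrative model only: toy_warn_on_once(), toy_ipi_with_retries() and TOY_MAX_RETRIES are hypothetical names, the failed_attempts parameter replaces the real IPI result, and the wait between attempts is omitted.

```c
#include <stdbool.h>
#include <stdio.h>

#define TOY_MAX_RETRIES 50	/* threshold taken from the patch */

static bool toy_warned;		/* set once the warning has fired */

/* Rough model of WARN_ON_ONCE(): complain only the first time. */
static bool toy_warn_on_once(bool cond)
{
	if (cond && !toy_warned) {
		toy_warned = true;
		fprintf(stderr, "toy: CPU lagging in hotplug blindspot\n");
	}
	return cond;
}

/*
 * Model of the retry_ipi loop. The IPI is assumed to fail
 * failed_attempts times before it is delivered. Returns the
 * number of passes through the loop.
 */
static int toy_ipi_with_retries(int failed_attempts)
{
	int nr_retries = 0;

	for (;;) {
		/* Same shape as the check added after the retry_ipi label. */
		toy_warn_on_once(nr_retries++ > TOY_MAX_RETRIES);
		if (nr_retries > failed_attempts)
			return nr_retries;	/* this pass delivered the IPI */
	}
}
```

Because the counter is post-incremented inside the condition, the warning in this model first fires on the 52nd pass, that is, after 51 failed attempts.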