From nobody Tue Oct 7 13:07:49 2025
From: neeraj.upadhyay@kernel.org
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, paulmck@kernel.org, joelagnelf@nvidia.com,
 frederic@kernel.org, boqun.feng@gmail.com, urezki@gmail.com,
 rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
 jiangshanlai@gmail.com, qiang.zhang1211@gmail.com,
 neeraj.iitr10@gmail.com, neeraj.upadhyay@amd.com,
 "Neeraj Upadhyay (AMD)"
Subject: [PATCH rcu 1/5] rcu: Robustify rcu_is_cpu_rrupt_from_idle()
Date: Wed, 9 Jul 2025 16:11:14 +0530
Message-Id: <20250709104118.15532-2-neeraj.upadhyay@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20250709104118.15532-1-neeraj.upadhyay@kernel.org>
References: <20250709104118.15532-1-neeraj.upadhyay@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Frederic Weisbecker

RCU relies on the context tracking nesting counter to determine whether
it is running in an extended quiescent state.

However, the context tracking nesting counter is not completely
synchronized with the actual context tracking state:

* The nesting counter is set to 1 or incremented further _after_ the
  actual state is set to RCU watching.

* The nesting counter is set to 0 or decremented further _before_ the
  actual state is set to RCU not watching.

Therefore it is safe to assume that if ct_nesting() > 0, RCU is
watching. But if ct_nesting() <= 0, RCU is not watching except for
tiny windows.

This hasn't been a problem so far because rcu_is_cpu_rrupt_from_idle()
has only been called from interrupts. However, the code is confusing
and abuses the role of the context tracking nesting counter while there
are more accurate indicators available.

Clarify and robustify accordingly.
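
The intended decision order can be sketched in user space as follows.
This is only a rough model under stated assumptions, not kernel code:
model_rrupt_from_idle() and its two parameters are hypothetical
stand-ins for ct_nmi_nesting() and rcu_is_watching_curr_cpu(), and the
lockdep warnings are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model, not kernel code.
 * nmi_nesting stands in for ct_nmi_nesting(), rcu_watching for
 * rcu_is_watching_curr_cpu(). Lockdep warnings are omitted.
 */
bool model_rrupt_from_idle(long nmi_nesting, bool rcu_watching)
{
	/* Non-idle interrupt or nested idle interrupt */
	if (nmi_nesting > 1)
		return false;

	/* First-level interrupt over a section where RCU was not watching */
	if (nmi_nesting == 1)
		return true;

	/* Not in an interrupt: consult the actual watching state */
	if (nmi_nesting == 0)
		return !rcu_watching;

	/* Negative value: counter underflow, where the kernel would warn */
	return false;
}

/* Usage example: each assertion exercises one branch above. */
void model_rrupt_from_idle_demo(void)
{
	assert(!model_rrupt_from_idle(2, true));
	assert(model_rrupt_from_idle(1, false));
	assert(model_rrupt_from_idle(0, false));
	assert(!model_rrupt_from_idle(0, true));
	assert(!model_rrupt_from_idle(-1, false));
}
```

The point of the ordering is that only the "not in an interrupt" case
depends on the watching state, so the nesting counter is no longer used
as a proxy for it.
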

Signed-off-by: Frederic Weisbecker
Signed-off-by: Joel Fernandes
Signed-off-by: Neeraj Upadhyay (AMD)
---
 kernel/rcu/tree.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 14d4499c6fc3..f83bbb408895 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -377,7 +377,7 @@ EXPORT_SYMBOL_GPL(rcu_momentary_eqs);
  */
 static int rcu_is_cpu_rrupt_from_idle(void)
 {
-	long nesting;
+	long nmi_nesting = ct_nmi_nesting();
 
 	/*
 	 * Usually called from the tick; but also used from smp_function_call()
@@ -389,21 +389,28 @@ static int rcu_is_cpu_rrupt_from_idle(void)
 	/* Check for counter underflows */
 	RCU_LOCKDEP_WARN(ct_nesting() < 0,
 			 "RCU nesting counter underflow!");
-	RCU_LOCKDEP_WARN(ct_nmi_nesting() <= 0,
-			 "RCU nmi_nesting counter underflow/zero!");
 
-	/* Are we at first interrupt nesting level? */
-	nesting = ct_nmi_nesting();
-	if (nesting > 1)
+	/* Non-idle interrupt or nested idle interrupt */
+	if (nmi_nesting > 1)
 		return false;
 
 	/*
-	 * If we're not in an interrupt, we must be in the idle task!
+	 * Non nested idle interrupt (interrupting section where RCU
+	 * wasn't watching).
 	 */
-	WARN_ON_ONCE(!nesting && !is_idle_task(current));
+	if (nmi_nesting == 1)
+		return true;
+
+	/* Not in an interrupt */
+	if (!nmi_nesting) {
+		RCU_LOCKDEP_WARN(!in_task() || !is_idle_task(current),
+				 "RCU nmi_nesting counter not in idle task!");
+		return !rcu_is_watching_curr_cpu();
+	}
 
-	/* Does CPU appear to be idle from an RCU standpoint? */
-	return ct_nesting() == 0;
+	RCU_LOCKDEP_WARN(1, "RCU nmi_nesting counter underflow/zero!");
+
+	return false;
 }
 
 #define DEFAULT_RCU_BLIMIT (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 1000 : 10)
-- 
2.40.1

From nobody Tue Oct 7 13:07:49 2025
From: neeraj.upadhyay@kernel.org
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, paulmck@kernel.org, joelagnelf@nvidia.com,
 frederic@kernel.org, boqun.feng@gmail.com, urezki@gmail.com,
 rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
 jiangshanlai@gmail.com, qiang.zhang1211@gmail.com,
 neeraj.iitr10@gmail.com, neeraj.upadhyay@amd.com,
 "Neeraj Upadhyay (AMD)"
Subject: [PATCH rcu 2/5] rcu: Protect ->defer_qs_iw_pending from data race
Date: Wed, 9 Jul 2025 16:11:15 +0530
Message-Id: <20250709104118.15532-3-neeraj.upadhyay@kernel.org>
In-Reply-To: <20250709104118.15532-1-neeraj.upadhyay@kernel.org>
References: <20250709104118.15532-1-neeraj.upadhyay@kernel.org>

From: "Paul E. McKenney"

On kernels built with CONFIG_IRQ_WORK=y, when rcu_read_unlock() is
invoked within an interrupts-disabled region of code [1], it will invoke
rcu_read_unlock_special(), which uses an irq-work handler to force the
system to notice when the RCU read-side critical section actually ends.
That end won't happen until interrupts are enabled at the soonest.

In some kernels, such as those booted with rcutree.use_softirq=y, the
irq-work handler is used unconditionally.

The per-CPU rcu_data structure's ->defer_qs_iw_pending field is
updated by the irq-work handler and is both read and updated by
rcu_read_unlock_special().
This resulted in the following KCSAN splat:

------------------------------------------------------------------------

BUG: KCSAN: data-race in rcu_preempt_deferred_qs_handler / rcu_read_unlock_special

read to 0xffff96b95f42d8d8 of 1 bytes by task 90 on cpu 8:
 rcu_read_unlock_special+0x175/0x260
 __rcu_read_unlock+0x92/0xa0
 rt_spin_unlock+0x9b/0xc0
 __local_bh_enable+0x10d/0x170
 __local_bh_enable_ip+0xfb/0x150
 rcu_do_batch+0x595/0xc40
 rcu_cpu_kthread+0x4e9/0x830
 smpboot_thread_fn+0x24d/0x3b0
 kthread+0x3bd/0x410
 ret_from_fork+0x35/0x40
 ret_from_fork_asm+0x1a/0x30

write to 0xffff96b95f42d8d8 of 1 bytes by task 88 on cpu 8:
 rcu_preempt_deferred_qs_handler+0x1e/0x30
 irq_work_single+0xaf/0x160
 run_irq_workd+0x91/0xc0
 smpboot_thread_fn+0x24d/0x3b0
 kthread+0x3bd/0x410
 ret_from_fork+0x35/0x40
 ret_from_fork_asm+0x1a/0x30

no locks held by irq_work/8/88.
irq event stamp: 200272
hardirqs last  enabled at (200272): [] finish_task_switch+0x131/0x320
hardirqs last disabled at (200271): [] __schedule+0x129/0xd70
softirqs last  enabled at (0): [] copy_process+0x4df/0x1cc0
softirqs last disabled at (0): [<0000000000000000>] 0x0

------------------------------------------------------------------------

The problem is that irq-work handlers run with interrupts enabled, which
means that rcu_preempt_deferred_qs_handler() could be interrupted,
and that interrupt handler might contain an RCU read-side critical
section, which might invoke rcu_read_unlock_special(). In the strict
KCSAN mode of operation used by RCU, this constitutes a data race on
the ->defer_qs_iw_pending field.

This commit therefore disables interrupts across the portion of
rcu_preempt_deferred_qs_handler() that updates the
->defer_qs_iw_pending field. This suffices because this handler is not
a fast path.

Reviewed-by: Frederic Weisbecker
Signed-off-by: Paul E. McKenney
Signed-off-by: Neeraj Upadhyay (AMD)
---
 kernel/rcu/tree_plugin.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 0b0f56f6abc8..a91b2322a0cd 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -624,10 +624,13 @@ notrace void rcu_preempt_deferred_qs(struct task_struct *t)
  */
 static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
 {
+	unsigned long flags;
 	struct rcu_data *rdp;
 
 	rdp = container_of(iwp, struct rcu_data, defer_qs_iw);
+	local_irq_save(flags);
 	rdp->defer_qs_iw_pending = false;
+	local_irq_restore(flags);
 }
 
 /*
-- 
2.40.1

From nobody Tue Oct 7 13:07:49 2025
From: neeraj.upadhyay@kernel.org
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, paulmck@kernel.org, joelagnelf@nvidia.com,
 frederic@kernel.org, boqun.feng@gmail.com, urezki@gmail.com,
 rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
 jiangshanlai@gmail.com, qiang.zhang1211@gmail.com,
 neeraj.iitr10@gmail.com, neeraj.upadhyay@amd.com,
 "Neeraj Upadhyay (AMD)"
Subject: [PATCH rcu 3/5] rcu: Enable rcu_normal_wake_from_gp on small systems
Date: Wed, 9 Jul 2025 16:11:16 +0530
Message-Id: <20250709104118.15532-4-neeraj.upadhyay@kernel.org>
In-Reply-To: <20250709104118.15532-1-neeraj.upadhyay@kernel.org>
References: <20250709104118.15532-1-neeraj.upadhyay@kernel.org>

From: "Uladzislau Rezki (Sony)"

Automatically enable the rcu_normal_wake_from_gp parameter on
systems with a small number of CPUs. The activation threshold
is set to 16 CPUs.

This helps reduce the latency of the normal synchronize_rcu() API
by waking up GP-waiters earlier and decoupling synchronize_rcu()
callers from regular callback handling.
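
The defaulting rule described above can be sketched as a small
user-space model. This is an illustration under stated assumptions, not
the kernel code: resolve_wake_from_gp() and uses_wake_from_gp() are
hypothetical helper names, and possible_cpus stands in for
num_possible_cpus(). The -1/0/1 encoding and the threshold of 16 are
taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define WAKE_FROM_GP_CPU_THRESHOLD 16

/*
 * Hypothetical model of the init-time defaulting.
 * param: -1 = not set by the user, 0 = explicitly disabled,
 *         1 = explicitly enabled.
 */
int resolve_wake_from_gp(int param, unsigned int possible_cpus)
{
	/* Only an unset parameter is auto-enabled, and only on small systems */
	if (param < 0 && possible_cpus <= WAKE_FROM_GP_CPU_THRESHOLD)
		return 1;
	return param;
}

/* Models the "< 1" test: both 0 and a leftover -1 mean "not enabled" */
bool uses_wake_from_gp(int resolved)
{
	return resolved >= 1;
}

/* Usage example covering the small, large, and explicit cases. */
void wake_from_gp_demo(void)
{
	assert(uses_wake_from_gp(resolve_wake_from_gp(-1, 16)));
	assert(!uses_wake_from_gp(resolve_wake_from_gp(-1, 64)));
	assert(!uses_wake_from_gp(resolve_wake_from_gp(0, 4)));
	assert(uses_wake_from_gp(resolve_wake_from_gp(1, 64)));
}
```

Using -1 as the "unset" value is what lets an explicit 0 on the command
line survive on a small system.
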

A benchmark running 64 parallel jobs (on a system with 64 CPUs), each
invoking synchronize_rcu(), demonstrates a notable latency reduction
with the setting enabled.

Latency distribution (microseconds):

Default:
 0      - 9999   : 1
 10000  - 19999  : 4
 20000  - 29999  : 399
 30000  - 39999  : 3197
 40000  - 49999  : 10428
 50000  - 59999  : 17363
 60000  - 69999  : 15529
 70000  - 79999  : 9287
 80000  - 89999  : 4249
 90000  - 99999  : 1915
 100000 - 109999 : 922
 110000 - 119999 : 390
 120000 - 129999 : 187
 ...

With rcu_normal_wake_from_gp enabled:
 0      - 9999   : 1
 10000  - 19999  : 234
 20000  - 29999  : 6678
 30000  - 39999  : 33463
 40000  - 49999  : 20669
 50000  - 59999  : 2766
 60000  - 69999  : 183
 ...

Reviewed-by: Joel Fernandes
Signed-off-by: Uladzislau Rezki (Sony)
Signed-off-by: Neeraj Upadhyay (AMD)
Reviewed-by: Frederic Weisbecker
---
 kernel/rcu/tree.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f83bbb408895..8c22db759978 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1632,8 +1632,10 @@ static void rcu_sr_put_wait_head(struct llist_node *node)
 	atomic_set_release(&sr_wn->inuse, 0);
 }
 
-/* Disabled by default. */
-static int rcu_normal_wake_from_gp;
+/* Enable rcu_normal_wake_from_gp automatically on small systems. */
+#define WAKE_FROM_GP_CPU_THRESHOLD 16
+
+static int rcu_normal_wake_from_gp = -1;
 module_param(rcu_normal_wake_from_gp, int, 0644);
 static struct workqueue_struct *sync_wq;
 
@@ -3250,7 +3252,7 @@ static void synchronize_rcu_normal(void)
 
 	trace_rcu_sr_normal(rcu_state.name, &rs.head, TPS("request"));
 
-	if (!READ_ONCE(rcu_normal_wake_from_gp)) {
+	if (READ_ONCE(rcu_normal_wake_from_gp) < 1) {
 		wait_rcu_gp(call_rcu_hurry);
 		goto trace_complete_out;
 	}
@@ -4854,6 +4856,12 @@ void __init rcu_init(void)
 	sync_wq = alloc_workqueue("sync_wq", WQ_MEM_RECLAIM, 0);
 	WARN_ON(!sync_wq);
 
+	/* Respect if explicitly disabled via a boot parameter. */
+	if (rcu_normal_wake_from_gp < 0) {
+		if (num_possible_cpus() <= WAKE_FROM_GP_CPU_THRESHOLD)
+			rcu_normal_wake_from_gp = 1;
+	}
+
 	/* Fill in default value for rcutree.qovld boot parameter. */
 	/* -After- the rcu_node ->lock fields are initialized! */
 	if (qovld < 0)
-- 
2.40.1

From nobody Tue Oct 7 13:07:49 2025
From: neeraj.upadhyay@kernel.org
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, paulmck@kernel.org, joelagnelf@nvidia.com,
 frederic@kernel.org, boqun.feng@gmail.com, urezki@gmail.com,
 rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
 jiangshanlai@gmail.com, qiang.zhang1211@gmail.com,
 neeraj.iitr10@gmail.com, neeraj.upadhyay@amd.com,
 "Neeraj Upadhyay (AMD)"
Subject: [PATCH rcu 4/5] Documentation/kernel-parameters: Update rcu_normal_wake_from_gp doc
Date: Wed, 9 Jul 2025 16:11:17 +0530
Message-Id: <20250709104118.15532-5-neeraj.upadhyay@kernel.org>
In-Reply-To: <20250709104118.15532-1-neeraj.upadhyay@kernel.org>
References: <20250709104118.15532-1-neeraj.upadhyay@kernel.org>

From: "Uladzislau Rezki (Sony)"

Update the documentation for the rcu_normal_wake_from_gp parameter.

Reviewed-by: Joel Fernandes
Signed-off-by: Uladzislau Rezki (Sony)
Signed-off-by: Neeraj Upadhyay (AMD)
Reviewed-by: Frederic Weisbecker
---
 Documentation/admin-guide/kernel-parameters.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f1f2c0874da9..f7e4bee2b823 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5485,7 +5485,8 @@
 			echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
 			or pass a boot parameter "rcutree.rcu_normal_wake_from_gp=1"
 
-			Default is 0.
+			Default is 1 if num_possible_cpus() <= 16 and it is not explicitly
+			disabled by the boot parameter passing 0.
 
 	rcuscale.gp_async= [KNL]
 			Measure performance of asynchronous
-- 
2.40.1

From nobody Tue Oct 7 13:07:49 2025
From: neeraj.upadhyay@kernel.org
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, paulmck@kernel.org, joelagnelf@nvidia.com,
 frederic@kernel.org, boqun.feng@gmail.com, urezki@gmail.com,
 rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
 jiangshanlai@gmail.com, qiang.zhang1211@gmail.com,
 neeraj.iitr10@gmail.com, neeraj.upadhyay@amd.com,
 "Neeraj Upadhyay (AMD)", Xiongfeng Wang, Qi Xi
Subject: [PATCH rcu 5/5] rcu: Fix rcu_read_unlock() deadloop due to IRQ work
Date: Wed, 9 Jul 2025 16:11:18 +0530
Message-Id: <20250709104118.15532-6-neeraj.upadhyay@kernel.org>
In-Reply-To: <20250709104118.15532-1-neeraj.upadhyay@kernel.org>
References: <20250709104118.15532-1-neeraj.upadhyay@kernel.org>

From: Joel Fernandes

If rcu_read_unlock_special() is invoked during irq_exit(), we can lock
up if an IPI is issued. This is because the IPI itself triggers the
irq_exit() path, causing a recursive lockup.

This is precisely what Xiongfeng found when invoking a BPF program on
the trace_tick_stop() tracepoint, as shown in the trace below. Fix by
managing the irq_work state correctly.

irq_exit()
  __irq_exit_rcu()
    /* in_hardirq() returns false after this */
    preempt_count_sub(HARDIRQ_OFFSET)
    tick_irq_exit()
      tick_nohz_irq_exit()
        tick_nohz_stop_sched_tick()
          trace_tick_stop()  /* a bpf prog is hooked on this trace point */
            __bpf_trace_tick_stop()
              bpf_trace_run2()
                rcu_read_unlock_special()
                  /* will send an IPI to itself */
                  irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);

A simple reproducer can also be obtained by doing the following in
tick_irq_exit(). It will hang on boot without the patch:

  static inline void tick_irq_exit(void)
  {
 +	rcu_read_lock();
 +	WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
 +	rcu_read_unlock();
 +

Reported-by: Xiongfeng Wang
Closes: https://lore.kernel.org/all/9acd5f9f-6732-7701-6880-4b51190aa070@huawei.com/
Tested-by: Qi Xi
Signed-off-by: Joel Fernandes
Reviewed-by: "Paul E. McKenney"
Signed-off-by: Neeraj Upadhyay (AMD)
---
 kernel/rcu/tree.h        | 11 ++++++++++-
 kernel/rcu/tree_plugin.h | 23 +++++++++++++++++++----
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 3830c19cf2f6..f8f612269e6e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -174,6 +174,15 @@ struct rcu_snap_record {
 	unsigned long	jiffies;	/* Track jiffies value */
 };
 
+/*
+ * The IRQ work (defer_qs_iw) is used by RCU to get scheduler's attention.
+ * It can be in one of the following states:
+ * - DEFER_QS_IDLE: An IRQ work was never scheduled.
+ * - DEFER_QS_PENDING: An IRQ work was scheduled but never run.
+ */
+#define DEFER_QS_IDLE		0
+#define DEFER_QS_PENDING	1
+
 /* Per-CPU data for read-copy update. */
 struct rcu_data {
 	/* 1) quiescent-state and grace-period handling : */
@@ -192,7 +201,7 @@ struct rcu_data {
 					/*  during and after the last grace */
 					/* period it is aware of. */
 	struct irq_work defer_qs_iw;	/* Obtain later scheduler attention. */
-	bool defer_qs_iw_pending;	/* Scheduler attention pending? */
+	int defer_qs_iw_pending;	/* Scheduler attention pending? */
 	struct work_struct strict_work;	/* Schedule readers for strict GPs. */
 
 	/* 2) batch handling */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index a91b2322a0cd..aec584812574 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -486,13 +486,16 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 	struct rcu_node *rnp;
 	union rcu_special special;
 
+	rdp = this_cpu_ptr(&rcu_data);
+	if (rdp->defer_qs_iw_pending == DEFER_QS_PENDING)
+		rdp->defer_qs_iw_pending = DEFER_QS_IDLE;
+
 	/*
 	 * If RCU core is waiting for this CPU to exit its critical section,
 	 * report the fact that it has exited.  Because irqs are disabled,
 	 * t->rcu_read_unlock_special cannot change.
 	 */
 	special = t->rcu_read_unlock_special;
-	rdp = this_cpu_ptr(&rcu_data);
 	if (!special.s && !rdp->cpu_no_qs.b.exp) {
 		local_irq_restore(flags);
 		return;
@@ -629,7 +632,18 @@ static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
 
 	rdp = container_of(iwp, struct rcu_data, defer_qs_iw);
 	local_irq_save(flags);
-	rdp->defer_qs_iw_pending = false;
+
+	/*
+	 * Requeue the IRQ work on next unlock in following situation:
+	 * 1. rcu_read_unlock() queues IRQ work (state -> DEFER_QS_PENDING)
+	 * 2. CPU enters new rcu_read_lock()
+	 * 3. IRQ work runs but cannot report QS due to rcu_preempt_depth() > 0
+	 * 4. rcu_read_unlock() does not re-queue work (state still PENDING)
+	 * 5. Deferred QS reporting does not happen.
+	 */
+	if (rcu_preempt_depth() > 0)
+		WRITE_ONCE(rdp->defer_qs_iw_pending, DEFER_QS_IDLE);
+
 	local_irq_restore(flags);
 }
 
@@ -676,7 +690,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
 			set_tsk_need_resched(current);
 			set_preempt_need_resched();
 			if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
-			    expboost && !rdp->defer_qs_iw_pending && cpu_online(rdp->cpu)) {
+			    expboost && rdp->defer_qs_iw_pending != DEFER_QS_PENDING &&
+			    cpu_online(rdp->cpu)) {
 				// Get scheduler to re-evaluate and call hooks.
 				// If !IRQ_WORK, FQS scan will eventually IPI.
 				if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
@@ -686,7 +701,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
 				else
 					init_irq_work(&rdp->defer_qs_iw,
 						      rcu_preempt_deferred_qs_handler);
-				rdp->defer_qs_iw_pending = DEFER_QS_PENDING;
+				rdp->defer_qs_iw_pending = DEFER_QS_PENDING;
 				irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
 			}
 		}
-- 
2.40.1
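
The IDLE/PENDING transitions in patch 5/5 can be followed with a small
user-space model. It is a sketch under stated assumptions rather than
the kernel implementation: struct model_rdp and the model_*() functions
are hypothetical, the queued counter stands in for
irq_work_queue_on(), and preempt_depth stands in for
rcu_preempt_depth(). Only the two state names and the transition rules
come from the diff.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEFER_QS_IDLE = 0, DEFER_QS_PENDING = 1 };

/* Hypothetical stand-in for the relevant part of struct rcu_data */
struct model_rdp {
	int defer_qs_iw_pending;
	int queued;		/* counts modelled irq_work_queue_on() calls */
};

/* Unlock path: queue the IRQ work only if none is already pending */
bool model_unlock_special(struct model_rdp *rdp)
{
	if (rdp->defer_qs_iw_pending == DEFER_QS_PENDING)
		return false;
	rdp->defer_qs_iw_pending = DEFER_QS_PENDING;
	rdp->queued++;
	return true;
}

/* IRQ-work handler: go back to IDLE only if still inside a reader */
void model_handler(struct model_rdp *rdp, int preempt_depth)
{
	if (preempt_depth > 0)
		rdp->defer_qs_iw_pending = DEFER_QS_IDLE;
}

/* Deferred quiescent-state reporting clears a pending request */
void model_deferred_qs(struct model_rdp *rdp)
{
	if (rdp->defer_qs_iw_pending == DEFER_QS_PENDING)
		rdp->defer_qs_iw_pending = DEFER_QS_IDLE;
}

/* Usage example: an unlock during the handler's irq_exit() does not requeue. */
void model_defer_qs_demo(void)
{
	struct model_rdp rdp = { DEFER_QS_IDLE, 0 };

	assert(model_unlock_special(&rdp));	/* first unlock queues work */
	model_handler(&rdp, 0);			/* handler, no reader active */
	assert(!model_unlock_special(&rdp));	/* still PENDING: no requeue */
	model_deferred_qs(&rdp);		/* QS reported, back to IDLE */
	assert(model_unlock_special(&rdp));	/* may queue again */
	assert(rdp.queued == 2);
}
```

In this model the handler no longer resets the state unconditionally,
which is what keeps a second unlock on the interrupt-exit path from
queueing the work again.
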