From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: LKML, Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng, Zqiang,
    Uladzislau Rezki, Joel Fernandes
Subject: [PATCH 1/3] rcu/nocb: Add/del rdp to iterate from rcuog itself
Date: Tue, 19 Apr 2022 14:23:18 +0200
Message-Id: <20220419122320.2060902-2-frederic@kernel.org>
In-Reply-To: <20220419122320.2060902-1-frederic@kernel.org>
References: <20220419122320.2060902-1-frederic@kernel.org>

NOCB rdps are part of a group whose list is iterated by the corresponding
rdp leader. This list is traversed under RCU because an rdp can be added
or deleted concurrently. Upon addition, a new iteration of the list after
a synchronization point (a pair of LOCK/UNLOCK of ->nocb_gp_lock) is
forced in order to make sure that:

1) we didn't miss a new element added in the middle of an iteration

2) we didn't ignore a whole subset of the list due to an element being
   quickly deleted and then re-added

3) we are protected against other potential surprises...

Although this layout is expected to be safe, it doesn't help anybody
sleep well.

Instead, simplify the nocb state toggling by moving the list modification
from the nocb (de-)offloading workqueue to the rcuog kthreads. Whenever
the rdp leader is expected to (re-)set the SEGCBLIST_KTHREAD_GP flag of a
target rdp, the latter is queued so that the leader handles the flag flip
along with adding or deleting the target rdp to/from the list to iterate.
This way the list modification and iteration happen from the same kthread
and those operations can't race with each other. As a bonus, the flags for
each rdp no longer need to be checked locklessly before each iteration,
which is one less opportunity to produce nightmares. (An illustrative
sketch of this handoff pattern is appended after the diff below.)

Signed-off-by: Frederic Weisbecker
Cc: Neeraj Upadhyay
Cc: Boqun Feng
Cc: Uladzislau Rezki
Cc: Joel Fernandes
Cc: Zqiang
---
 kernel/rcu/tree.h      |   1 +
 kernel/rcu/tree_nocb.h | 138 +++++++++++++++++++++--------------------
 2 files changed, 71 insertions(+), 68 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 5634e76106c4..f12178eb8380 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -235,6 +235,7 @@ struct rcu_data {
 					 * if rdp_gp. */
 	struct list_head nocb_entry_rdp; /* rcu_data node in wakeup chain. */
+	struct rcu_data *nocb_toggling_rdp; /* rdp queued for (de-)offloading */
 
 	/* The following fields are used by CB kthread, hence new cacheline. */
 	struct rcu_data *nocb_gp_rdp ____cacheline_internodealigned_in_smp;
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 46694e13398a..dac74952e1d1 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -546,52 +546,43 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
 	}
 }
 
-/*
- * Check if we ignore this rdp.
- *
- * We check that without holding the nocb lock but
- * we make sure not to miss a freshly offloaded rdp
- * with the current ordering:
- *
- *  rdp_offload_toggle()        nocb_gp_enabled_cb()
- * -------------------------   ----------------------------
- *    WRITE flags                 LOCK nocb_gp_lock
- *    LOCK nocb_gp_lock           READ/WRITE nocb_gp_sleep
- *    READ/WRITE nocb_gp_sleep    UNLOCK nocb_gp_lock
- *    UNLOCK nocb_gp_lock         READ flags
- */
-static inline bool nocb_gp_enabled_cb(struct rcu_data *rdp)
-{
-	u8 flags = SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_GP;
-
-	return rcu_segcblist_test_flags(&rdp->cblist, flags);
-}
-
-static inline bool nocb_gp_update_state_deoffloading(struct rcu_data *rdp,
-						     bool *needwake_state)
+static int nocb_gp_toggle_rdp(struct rcu_data *rdp,
+			      bool *wake_state)
 {
 	struct rcu_segcblist *cblist = &rdp->cblist;
+	unsigned long flags;
+	int ret;
 
-	if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) {
-		if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
-			rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP);
-			if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
-				*needwake_state = true;
-		}
-		return false;
+	rcu_nocb_lock_irqsave(rdp, flags);
+	if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED) &&
+	    !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
+		/*
+		 * Offloading. Set our flag and notify the offload worker.
+		 * We will handle this rdp until it ever gets de-offloaded.
+		 */
+		rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP);
+		if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
+			*wake_state = true;
+		ret = 1;
+	} else if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED) &&
+		   rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
+		/*
+		 * De-offloading. Clear our flag and notify the de-offload worker.
+		 * We will ignore this rdp until it ever gets re-offloaded.
+		 */
+		rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
+		if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
+			*wake_state = true;
+		ret = 0;
+	} else {
+		WARN_ON_ONCE(1);
+		ret = -1;
 	}
 
-	/*
-	 * De-offloading. Clear our flag and notify the de-offload worker.
-	 * We will ignore this rdp until it ever gets re-offloaded.
-	 */
-	WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
-	rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
-	if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
-		*needwake_state = true;
-	return true;
-}
+	rcu_nocb_unlock_irqrestore(rdp, flags);
 
+	return ret;
+}
 
 /*
  * No-CBs GP kthreads come here to wait for additional callbacks to show up
@@ -609,7 +600,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	bool needwait_gp = false; // This prevents actual uninitialized use.
 	bool needwake;
 	bool needwake_gp;
-	struct rcu_data *rdp;
+	struct rcu_data *rdp, *rdp_toggling = NULL;
 	struct rcu_node *rnp;
 	unsigned long wait_gp_seq = 0; // Suppress "use uninitialized" warning.
 	bool wasempty = false;
@@ -634,19 +625,10 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	 * is added to the list, so the skipped-over rcu_data structures
 	 * won't be ignored for long.
 	 */
-	list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp, 1) {
-		bool needwake_state = false;
-
-		if (!nocb_gp_enabled_cb(rdp))
-			continue;
+	list_for_each_entry(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp) {
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
 		rcu_nocb_lock_irqsave(rdp, flags);
-		if (nocb_gp_update_state_deoffloading(rdp, &needwake_state)) {
-			rcu_nocb_unlock_irqrestore(rdp, flags);
-			if (needwake_state)
-				swake_up_one(&rdp->nocb_state_wq);
-			continue;
-		}
+		lockdep_assert_held(&rdp->nocb_lock);
 		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 		if (bypass_ncbs &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
@@ -656,8 +638,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
 			rcu_nocb_unlock_irqrestore(rdp, flags);
-			if (needwake_state)
-				swake_up_one(&rdp->nocb_state_wq);
 			continue; /* No callbacks here, try next. */
 		}
 		if (bypass_ncbs) {
@@ -705,8 +685,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		}
 		if (needwake_gp)
 			rcu_gp_kthread_wake();
-		if (needwake_state)
-			swake_up_one(&rdp->nocb_state_wq);
 	}
 
 	my_rdp->nocb_gp_bypass = bypass;
@@ -739,15 +717,49 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 				!READ_ONCE(my_rdp->nocb_gp_sleep));
 		trace_rcu_this_gp(rnp, my_rdp, wait_gp_seq, TPS("EndWait"));
 	}
+
 	if (!rcu_nocb_poll) {
 		raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
+		// (De-)queue an rdp to/from the group if its nocb state is changing
+		rdp_toggling = my_rdp->nocb_toggling_rdp;
+		if (rdp_toggling)
+			my_rdp->nocb_toggling_rdp = NULL;
+
 		if (my_rdp->nocb_defer_wakeup > RCU_NOCB_WAKE_NOT) {
 			WRITE_ONCE(my_rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
 			del_timer(&my_rdp->nocb_timer);
 		}
 		WRITE_ONCE(my_rdp->nocb_gp_sleep, true);
 		raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
+	} else {
+		rdp_toggling = READ_ONCE(my_rdp->nocb_toggling_rdp);
+		if (rdp_toggling) {
+			/*
+			 * Paranoid locking to make sure nocb_toggling_rdp is well
+			 * reset *before* we (re)set SEGCBLIST_KTHREAD_GP or we could
+			 * race with another round of nocb toggling for this rdp.
+			 * Nocb locking should prevent that already but we stick
+			 * to paranoia, especially in this rare path.
+			 */
+			raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
+			my_rdp->nocb_toggling_rdp = NULL;
+			raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);
+		}
 	}
+
+	if (rdp_toggling) {
+		bool wake_state = false;
+		int ret;
+
+		ret = nocb_gp_toggle_rdp(rdp_toggling, &wake_state);
+		if (ret == 1)
+			list_add_tail(&rdp_toggling->nocb_entry_rdp, &my_rdp->nocb_head_rdp);
+		else if (ret == 0)
+			list_del(&rdp_toggling->nocb_entry_rdp);
+		if (wake_state)
+			swake_up_one(&rdp_toggling->nocb_state_wq);
+	}
+
 	my_rdp->nocb_gp_seq = -1;
 	WARN_ON(signal_pending(current));
 }
@@ -966,6 +978,8 @@ static int rdp_offload_toggle(struct rcu_data *rdp,
 		swake_up_one(&rdp->nocb_cb_wq);
 
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
+	// Queue this rdp for add/del to/from the list to iterate on rcuog
+	WRITE_ONCE(rdp_gp->nocb_toggling_rdp, rdp);
 	if (rdp_gp->nocb_gp_sleep) {
 		rdp_gp->nocb_gp_sleep = false;
 		wake_gp = true;
@@ -1013,8 +1027,6 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	swait_event_exclusive(rdp->nocb_state_wq,
 			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
 							SEGCBLIST_KTHREAD_GP));
-	/* Stop nocb_gp_wait() from iterating over this structure. */
-	list_del_rcu(&rdp->nocb_entry_rdp);
 	/*
 	 * Lock one last time to acquire latest callback updates from kthreads
 	 * so we can later handle callbacks locally without locking.
@@ -1079,16 +1091,6 @@ static long rcu_nocb_rdp_offload(void *arg)
 
 	pr_info("Offloading %d\n", rdp->cpu);
 
-	/*
-	 * Cause future nocb_gp_wait() invocations to iterate over
-	 * structure, resetting ->nocb_gp_sleep and waking up the related
-	 * "rcuog". Since nocb_gp_wait() in turn locks ->nocb_gp_lock
-	 * before setting ->nocb_gp_sleep again, we are guaranteed to
-	 * iterate this newly added structure before "rcuog" goes to
-	 * sleep again.
-	 */
-	list_add_tail_rcu(&rdp->nocb_entry_rdp, &rdp->nocb_gp_rdp->nocb_head_rdp);
-
 	/*
 	 * Can't use rcu_nocb_lock_irqsave() before SEGCBLIST_LOCKING
 	 * is set.
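
To make the new scheme concrete, here is a minimal user-space sketch of
the handoff pattern this patch adopts (referenced from the changelog
above): a requester publishes one node under a lock, and the worker, as
the only thread that ever touches the list, performs both the state flip
and the list add/del. All names below (node, toggling, request_toggle(),
worker_pass()) are hypothetical stand-ins rather than kernel APIs, and a
pthread mutex stands in for ->nocb_gp_lock.

/* Illustrative sketch only, not kernel code. */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
	bool offloaded;			/* stands in for SEGCBLIST_KTHREAD_GP */
};

static struct node *head;		/* list iterated by the worker only */
static struct node *toggling;		/* stands in for ->nocb_toggling_rdp */
static pthread_mutex_t gp_lock = PTHREAD_MUTEX_INITIALIZER;

/* Requester side, as in rdp_offload_toggle(): queue the node. */
static void request_toggle(struct node *n)
{
	pthread_mutex_lock(&gp_lock);
	toggling = n;			/* worker picks this up at its next pass */
	pthread_mutex_unlock(&gp_lock);
	/* the real code also wakes the rcuog kthread here */
}

/* Worker side: one pass of a nocb_gp_wait()-like loop. */
static void worker_pass(void)
{
	struct node *n, **p;

	for (n = head; n; n = n->next)
		;			/* iterate: nobody else modifies the list */

	pthread_mutex_lock(&gp_lock);
	n = toggling;			/* dequeue at the synchronization point */
	toggling = NULL;
	pthread_mutex_unlock(&gp_lock);

	if (!n)
		return;
	n->offloaded = !n->offloaded;	/* the flag flip, plus... */
	if (n->offloaded) {		/* ...the matching list add... */
		n->next = head;
		head = n;
	} else {			/* ...or the matching list del */
		for (p = &head; *p; p = &(*p)->next) {
			if (*p == n) {
				*p = n->next;
				break;
			}
		}
	}
}

int main(void)
{
	struct node n = { .next = NULL, .offloaded = false };

	request_toggle(&n);	/* e.g. from the offloading workqueue */
	worker_pass();		/* the "rcuog" flips the flag and links the node */
	return !(n.offloaded && head == &n);
}

Because the flip and the list update happen on the same thread, no RCU
traversal or lockless flag check is needed in the iteration itself,
which is the point of the patch.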
-- 
2.25.1

From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: LKML, Zqiang, Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng,
    Uladzislau Rezki, Joel Fernandes
Subject: [PATCH 2/3] rcu/nocb: Invert rcu_state.barrier_mutex VS hotplug lock locking order
Date: Tue, 19 Apr 2022 14:23:19 +0200
Message-Id: <20220419122320.2060902-3-frederic@kernel.org>
In-Reply-To: <20220419122320.2060902-1-frederic@kernel.org>
References: <20220419122320.2060902-1-frederic@kernel.org>

From: Zqiang

In case of failure to spawn either the rcuog or the rcuo[p] kthread for
a given rdp, rcu_nocb_rdp_deoffload() needs to be called with both the
hotplug lock and the barrier_mutex held. However, the CPUs write lock is
already held when rcutree_prepare_cpu() is called, so it is not possible
to call rcu_nocb_rdp_deoffload() from there while taking only the
barrier_mutex: doing so would create a lock inversion against
rcu_nocb_cpu_deoffload(), which acquires both locks in the opposite
order.

Solve this by simply inverting the locking order inside
rcu_nocb_cpu_[de]offload(). This is also a prerequisite for toggling
NOCB states from cpusets. (A minimal sketch of the two lock orderings
follows below.)
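
To make the inversion concrete, here is a minimal sketch of the two
orderings. This is illustrative only: plain pthread mutexes stand in for
cpus_read_lock() (really a percpu rwsem taken for read) and for
rcu_state.barrier_mutex, and the function names are hypothetical.

/* Illustrative ABBA sketch, not kernel code. */
#include <pthread.h>

static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t barrier_mutex = PTHREAD_MUTEX_INITIALIZER;

static void cpu_deoffload_before_patch(void)
{
	pthread_mutex_lock(&barrier_mutex);	/* A then B */
	pthread_mutex_lock(&hotplug_lock);
	/* ... work_on_cpu(..., rcu_nocb_rdp_deoffload, ...) ... */
	pthread_mutex_unlock(&hotplug_lock);
	pthread_mutex_unlock(&barrier_mutex);
}

static void prepare_cpu_failure_path(void)
{
	pthread_mutex_lock(&hotplug_lock);	/* B is already held here... */
	pthread_mutex_lock(&barrier_mutex);	/* ...then A: inverted vs. above */
	/* ... rcu_nocb_rdp_deoffload() after a kthread spawn failure ... */
	pthread_mutex_unlock(&barrier_mutex);
	pthread_mutex_unlock(&hotplug_lock);
}

static void cpu_deoffload_after_patch(void)
{
	pthread_mutex_lock(&hotplug_lock);	/* B then A: same order as the */
	pthread_mutex_lock(&barrier_mutex);	/* failure path, so no inversion */
	/* ... */
	pthread_mutex_unlock(&barrier_mutex);
	pthread_mutex_unlock(&hotplug_lock);
}

int main(void)
{
	cpu_deoffload_after_patch();
	prepare_cpu_failure_path();
	return 0;
}

Run concurrently, cpu_deoffload_before_patch() and
prepare_cpu_failure_path() can each hold one lock while waiting for the
other. Making both paths take the hotplug lock first removes the cycle.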
Signed-off-by: Zqiang
Cc: Neeraj Upadhyay
Cc: Boqun Feng
Cc: Uladzislau Rezki
Cc: Joel Fernandes
Signed-off-by: Frederic Weisbecker
---
 kernel/rcu/tree_nocb.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index dac74952e1d1..f2f2cab6285a 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1055,8 +1055,8 @@ int rcu_nocb_cpu_deoffload(int cpu)
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	int ret = 0;
 
-	mutex_lock(&rcu_state.barrier_mutex);
 	cpus_read_lock();
+	mutex_lock(&rcu_state.barrier_mutex);
 	if (rcu_rdp_is_offloaded(rdp)) {
 		if (cpu_online(cpu)) {
 			ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
@@ -1067,8 +1067,8 @@ int rcu_nocb_cpu_deoffload(int cpu)
 			ret = -EINVAL;
 		}
 	}
-	cpus_read_unlock();
 	mutex_unlock(&rcu_state.barrier_mutex);
+	cpus_read_unlock();
 
 	return ret;
 }
@@ -1134,8 +1134,8 @@ int rcu_nocb_cpu_offload(int cpu)
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	int ret = 0;
 
-	mutex_lock(&rcu_state.barrier_mutex);
 	cpus_read_lock();
+	mutex_lock(&rcu_state.barrier_mutex);
 	if (!rcu_rdp_is_offloaded(rdp)) {
 		if (cpu_online(cpu)) {
 			ret = work_on_cpu(cpu, rcu_nocb_rdp_offload, rdp);
@@ -1146,8 +1146,8 @@ int rcu_nocb_cpu_offload(int cpu)
 			ret = -EINVAL;
 		}
 	}
-	cpus_read_unlock();
 	mutex_unlock(&rcu_state.barrier_mutex);
+	cpus_read_unlock();
 
 	return ret;
 }
-- 
2.25.1

From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: LKML, Zqiang, Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng,
    Uladzislau Rezki, Joel Fernandes
Subject: [PATCH 3/3] rcu/nocb: Fix NOCB kthreads spawn failure with rcu_nocb_rdp_deoffload() direct call
Date: Tue, 19 Apr 2022 14:23:20 +0200
Message-Id: <20220419122320.2060902-4-frederic@kernel.org>
In-Reply-To: <20220419122320.2060902-1-frederic@kernel.org>
References: <20220419122320.2060902-1-frederic@kernel.org>

From: Zqiang

If spawning the rcuog or rcuo[p] kthreads fails, the offloaded rdp must
be explicitly de-offloaded; otherwise the target rdp is still considered
offloaded even though nothing actually handles its callbacks. (A sketch
of the resulting cleanup shape is appended after the patch.)

Signed-off-by: Zqiang
Cc: Neeraj Upadhyay
Cc: Boqun Feng
Cc: Uladzislau Rezki
Cc: Joel Fernandes
Signed-off-by: Frederic Weisbecker
---
 kernel/rcu/tree_nocb.h | 80 +++++++++++++++++++++++++++++++++---------
 1 file changed, 64 insertions(+), 16 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index f2f2cab6285a..4cf9a29bba79 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -986,10 +986,7 @@ static int rdp_offload_toggle(struct rcu_data *rdp,
 	}
 	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
 
-	if (wake_gp)
-		wake_up_process(rdp_gp->nocb_gp_kthread);
-
-	return 0;
+	return wake_gp;
 }
 
 static long rcu_nocb_rdp_deoffload(void *arg)
@@ -997,9 +994,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	struct rcu_data *rdp = arg;
 	struct rcu_segcblist *cblist = &rdp->cblist;
 	unsigned long flags;
-	int ret;
+	int wake_gp;
+	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
 
-	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
+	/*
+	 * rcu_nocb_rdp_deoffload() may be called directly if
+	 * rcuog/o[p] spawn failed, because at this time the rdp->cpu
+	 * is not online yet.
+	 */
+	WARN_ON_ONCE((rdp->cpu != raw_smp_processor_id()) && cpu_online(rdp->cpu));
 
 	pr_info("De-offloading %d\n", rdp->cpu);
 
@@ -1023,10 +1026,41 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	 */
 	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
 	invoke_rcu_core();
-	ret = rdp_offload_toggle(rdp, false, flags);
-	swait_event_exclusive(rdp->nocb_state_wq,
-			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
-							SEGCBLIST_KTHREAD_GP));
+	wake_gp = rdp_offload_toggle(rdp, false, flags);
+
+	mutex_lock(&rdp_gp->nocb_gp_kthread_mutex);
+	if (rdp_gp->nocb_gp_kthread) {
+		if (wake_gp)
+			wake_up_process(rdp_gp->nocb_gp_kthread);
+
+		/*
+		 * If the rcuo[p] kthread spawn failed, directly remove
+		 * SEGCBLIST_KTHREAD_CB. Just wait for SEGCBLIST_KTHREAD_GP
+		 * to be cleared by rcuog.
+		 */
+		if (!rdp->nocb_cb_kthread) {
+			rcu_nocb_lock_irqsave(rdp, flags);
+			rcu_segcblist_clear_flags(&rdp->cblist, SEGCBLIST_KTHREAD_CB);
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+		}
+
+		swait_event_exclusive(rdp->nocb_state_wq,
+				      !rcu_segcblist_test_flags(cblist,
+					      SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP));
+	} else {
+		/*
+		 * No kthread to clear the flags for us or to remove the rdp from
+		 * the nocb list to iterate. Do it here instead. Locking doesn't
+		 * look strictly necessary but we stick to paranoia in this rare path.
+		 */
+		rcu_nocb_lock_irqsave(rdp, flags);
+		rcu_segcblist_clear_flags(&rdp->cblist,
+					  SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP);
+		rcu_nocb_unlock_irqrestore(rdp, flags);
+
+		list_del(&rdp->nocb_entry_rdp);
+	}
+	mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
+
 	/*
 	 * Lock one last time to acquire latest callback updates from kthreads
 	 * so we can later handle callbacks locally without locking.
@@ -1047,7 +1081,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
 	WARN_ON_ONCE(rcu_cblist_n_cbs(&rdp->nocb_bypass));
 
 
-	return ret;
+	return 0;
 }
 
 int rcu_nocb_cpu_deoffload(int cpu)
@@ -1079,7 +1113,8 @@ static long rcu_nocb_rdp_offload(void *arg)
 	struct rcu_data *rdp = arg;
 	struct rcu_segcblist *cblist = &rdp->cblist;
 	unsigned long flags;
-	int ret;
+	int wake_gp;
+	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
 
 	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
 	/*
@@ -1089,6 +1124,9 @@ static long rcu_nocb_rdp_offload(void *arg)
 	if (!rdp->nocb_gp_rdp)
 		return -EINVAL;
 
+	if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
+		return -EINVAL;
+
 	pr_info("Offloading %d\n", rdp->cpu);
 
 	/*
@@ -1113,7 +1151,9 @@ static long rcu_nocb_rdp_offload(void *arg)
 	 *      WRITE flags               READ callbacks
 	 *      rcu_nocb_unlock()         rcu_nocb_unlock()
 	 */
-	ret = rdp_offload_toggle(rdp, true, flags);
+	wake_gp = rdp_offload_toggle(rdp, true, flags);
+	if (wake_gp)
+		wake_up_process(rdp_gp->nocb_gp_kthread);
 	swait_event_exclusive(rdp->nocb_state_wq,
 			      rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB) &&
 			      rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
@@ -1126,7 +1166,7 @@ static long rcu_nocb_rdp_offload(void *arg)
 	rcu_segcblist_clear_flags(cblist, SEGCBLIST_RCU_CORE);
 	rcu_nocb_unlock_irqrestore(rdp, flags);
 
-	return ret;
+	return 0;
 }
 
 int rcu_nocb_cpu_offload(int cpu)
@@ -1248,7 +1288,7 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
 			"rcuog/%d", rdp_gp->cpu);
 	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo GP kthread, OOM is now expected behavior\n", __func__)) {
 		mutex_unlock(&rdp_gp->nocb_gp_kthread_mutex);
-		return;
+		goto end;
 	}
 	WRITE_ONCE(rdp_gp->nocb_gp_kthread, t);
 	if (kthread_prio)
@@ -1260,12 +1300,20 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
 	t = kthread_run(rcu_nocb_cb_kthread, rdp,
 			"rcuo%c/%d", rcu_state.abbr, cpu);
 	if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
-		return;
+		goto end;
 
 	if (kthread_prio)
 		sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
 	WRITE_ONCE(rdp->nocb_cb_kthread, t);
 	WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
+	return;
+end:
+	mutex_lock(&rcu_state.barrier_mutex);
+	if (rcu_rdp_is_offloaded(rdp)) {
+		rcu_nocb_rdp_deoffload(rdp);
+		cpumask_clear_cpu(cpu, rcu_nocb_mask);
+	}
+	mutex_unlock(&rcu_state.barrier_mutex);
 }
 
 /* How many CB CPU IDs per GP kthread?  Default of -1 for sqrt(nr_cpu_ids). */
-- 
2.25.1
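
As a closing illustration (referenced from the changelog above), the
error-handling shape this patch gives rcu_spawn_cpu_nocb_kthread() can be
sketched as follows. The helper names are hypothetical; only the
"goto end" cleanup structure mirrors the patch.

/* Hypothetical sketch of the cleanup shape added by this patch. */
#include <stdbool.h>
#include <stdio.h>

static bool spawn_gp_kthread(int cpu) { (void)cpu; return true; }  /* stand-in */
static bool spawn_cb_kthread(int cpu) { (void)cpu; return true; }  /* stand-in */
static void deoffload_cpu(int cpu)    { printf("De-offloading %d\n", cpu); }

static void spawn_nocb_kthreads(int cpu)
{
	if (!spawn_gp_kthread(cpu))
		goto end;		/* rcuog spawn failed */
	if (!spawn_cb_kthread(cpu))
		goto end;		/* rcuo[p] spawn failed */
	return;				/* success: rdp stays offloaded */
end:
	/*
	 * Don't leave the rdp marked offloaded with nothing servicing its
	 * callbacks: the patch de-offloads it under rcu_state.barrier_mutex
	 * and clears its bit in rcu_nocb_mask.
	 */
	deoffload_cpu(cpu);
}

int main(void)
{
	spawn_nocb_kthreads(0);
	return 0;
}

This is also why patch 2's lock-order inversion matters: the failure path
runs with the hotplug lock already held, and patch 1's rcuog-side list
handling lets the kthread-less case simply clear the flags and unlink the
rdp directly.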