From: "Paul E. McKenney" <paulmck@kernel.org>
To: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com, rostedt@goodmis.org,
	"Paul E. McKenney", Catalin Marinas, Will Deacon, Mark Rutland,
	Mathieu Desnoyers, Sebastian Andrzej Siewior,
	linux-arm-kernel@lists.infradead.org, bpf@vger.kernel.org
Subject: [PATCH 17/19] srcu: Optimize SRCU-fast-updown for arm64
Date: Sun, 2 Nov 2025 13:44:34 -0800
Message-Id: <20251102214436.3905633-17-paulmck@kernel.org>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <082fb8ba-91b8-448e-a472-195eb7b282fd@paulmck-laptop>
References: <082fb8ba-91b8-448e-a472-195eb7b282fd@paulmck-laptop>

Some arm64 platforms have slow per-CPU atomic operations, for example,
the Neoverse V2.  This commit therefore moves SRCU-fast-updown from
per-CPU atomic operations to interrupt-disabled
non-read-modify-write-atomic atomic_read()/atomic_set() operations.
This works because SRCU-fast-updown is not invoked from NMI handlers,
which means that srcu_read_lock_fast_updown() and
srcu_read_unlock_fast_updown() can exclude themselves and each other
simply by disabling interrupts.

This reduces the overhead of calls to srcu_read_lock_fast_updown()
and srcu_read_unlock_fast_updown() from about 100ns to about 12ns on
an ARM Neoverse V2.  Although this is not excellent compared to about
2ns on x86, it sure beats 100ns.
This command was used to measure the overhead:

tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus
	--duration 5 --configs NOPREEMPT --kconfig "CONFIG_NR_CPUS=64
	CONFIG_TASKS_TRACE_RCU=y" --bootargs "refscale.loops=100000
	refscale.guest_os_delay=5 refscale.nreaders=64 refscale.holdoff=30
	torture.disable_onoff_at_boot refscale.scale_type=srcu-fast-updown
	refscale.verbose_batched=8 torture.verbose_sleep_frequency=8
	torture.verbose_sleep_duration=8 refscale.nruns=100" --trust-make

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Mark Rutland
Cc: Mathieu Desnoyers
Cc: Steven Rostedt
Cc: Sebastian Andrzej Siewior
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: <bpf@vger.kernel.org>
---
 include/linux/srcutree.h | 56 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 51 insertions(+), 5 deletions(-)

diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index d6f978b50472..70560dc4636c 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -253,6 +253,34 @@ static inline struct srcu_ctr __percpu *__srcu_ctr_to_ptr(struct srcu_struct *ss
 	return &ssp->sda->srcu_ctrs[idx];
 }
 
+/*
+ * Non-atomic manipulation of SRCU lock counters.
+ */
+static inline struct srcu_ctr __percpu notrace *__srcu_read_lock_fast_na(struct srcu_struct *ssp)
+{
+	atomic_long_t *scnp;
+	struct srcu_ctr __percpu *scp;
+
+	lockdep_assert_preemption_disabled();
+	scp = READ_ONCE(ssp->srcu_ctrp);
+	scnp = raw_cpu_ptr(&scp->srcu_locks);
+	atomic_long_set(scnp, atomic_long_read(scnp) + 1);
+	return scp;
+}
+
+/*
+ * Non-atomic manipulation of SRCU unlock counters.
+ */
+static inline void notrace
+__srcu_read_unlock_fast_na(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
+{
+	atomic_long_t *scnp;
+
+	lockdep_assert_preemption_disabled();
+	scnp = raw_cpu_ptr(&scp->srcu_unlocks);
+	atomic_long_set(scnp, atomic_long_read(scnp) + 1);
+}
+
 /*
  * Counts the new reader in the appropriate per-CPU element of the
  * srcu_struct.  Returns a pointer that must be passed to the matching
@@ -327,12 +355,23 @@ __srcu_read_unlock_fast(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
 static inline struct srcu_ctr __percpu notrace
 *__srcu_read_lock_fast_updown(struct srcu_struct *ssp)
 {
-	struct srcu_ctr __percpu *scp = READ_ONCE(ssp->srcu_ctrp);
+	struct srcu_ctr __percpu *scp;
 
-	if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
+	if (IS_ENABLED(CONFIG_ARM64) && IS_ENABLED(CONFIG_ARM64_USE_LSE_PERCPU_ATOMICS)) {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		scp = __srcu_read_lock_fast_na(ssp);
+		local_irq_restore(flags); /* Avoids leaking the critical section. */
+		return scp;
+	}
+
+	scp = READ_ONCE(ssp->srcu_ctrp);
+	if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE)) {
 		this_cpu_inc(scp->srcu_locks.counter); // Y, and implicit RCU reader.
-	else
+	} else {
 		atomic_long_inc(raw_cpu_ptr(&scp->srcu_locks)); // Y, and implicit RCU reader.
+	}
 	barrier();  /* Avoid leaking the critical section. */
 	return scp;
 }
@@ -350,10 +389,17 @@ static inline void notrace
 __srcu_read_unlock_fast_updown(struct srcu_struct *ssp, struct srcu_ctr __percpu *scp)
 {
 	barrier();  /* Avoid leaking the critical section. */
-	if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE))
+	if (IS_ENABLED(CONFIG_ARM64)) {
+		unsigned long flags;
+
+		local_irq_save(flags);
+		__srcu_read_unlock_fast_na(ssp, scp);
+		local_irq_restore(flags);
+	} else if (!IS_ENABLED(CONFIG_NEED_SRCU_NMI_SAFE)) {
 		this_cpu_inc(scp->srcu_unlocks.counter); // Z, and implicit RCU reader.
-	else
+	} else {
 		atomic_long_inc(raw_cpu_ptr(&scp->srcu_unlocks)); // Z, and implicit RCU reader.
+	}
 }
 
 void __srcu_check_read_flavor(struct srcu_struct *ssp, int read_flavor);
-- 
2.40.1