Documentation/atomic_t.txt | 14 +- arch/arm64/Kconfig | 3 + arch/arm64/include/asm/barrier.h | 23 ++++ arch/arm64/include/asm/cmpxchg.h | 62 +++++++-- arch/arm64/include/asm/delay-const.h | 28 ++++ arch/arm64/include/asm/rqspinlock.h | 85 ------------ arch/arm64/lib/delay.c | 17 +-- drivers/clocksource/arm_arch_timer.c | 2 + drivers/cpuidle/poll_state.c | 21 +-- drivers/soc/qcom/rpmh-rsc.c | 8 +- include/asm-generic/barrier.h | 97 ++++++++++++++ include/linux/atomic.h | 10 ++ include/linux/atomic/atomic-long.h | 18 ++- include/linux/sched/idle.h | 29 +++++ kernel/bpf/rqspinlock.c | 77 +++++++---- lib/Kconfig.debug | 10 ++ lib/tests/Makefile | 1 + lib/tests/barrier-timeout-test.c | 185 +++++++++++++++++++++++++++ scripts/atomic/gen-atomic-long.sh | 16 ++- 19 files changed, 528 insertions(+), 178 deletions(-) create mode 100644 arch/arm64/include/asm/delay-const.h create mode 100644 lib/tests/barrier-timeout-test.c
Hi,
Main change in this version:
- addressed some review comments from sashiko (see commit notes)
- The one notable change is to the implementation of
smp_cond_load_acquire_timeout() where there was a missed
control dependency in the timeout case.
All the others are minor.
- fixed a low probability race in the kunit test added in v11.
- added a bunch of kunit tests validating the implementation's
use of the clock.
Andrew, if the changes look okay, could we take this in your mm-nomm
tree as before?
The core kernel often uses smp_cond_load_{relaxed,acquire}() to spin
on condition variables with architectural primitives used to avoid
hammering the relevant cachelines.
(This primitive can vary greatly across architectures: on x86 it's a
cpu_relax() to slow down the pipeline. On arm64, this is a __cmpwait()
which waits for a cacheline to change state in a time limited fashion.)
Regardless of architectural details, typical smp_cond_load*() usage
does not allow for termination until the condition change occurs.
Beyond the core kernel, there are cases where it is useful to additionally
terminate on a timeout. Two cases:
- cpuidle poll_idle(): wait for need-resched until the cpuidle polling
duration expires.
- rqspinlock: nested qspinlock acquisition that terminates on timeout
or deadlock.
Accordingly add two interfaces (with their generic and arm64 specific
implementations):
smp_cond_load_relaxed_timeout(ptr, cond_expr, time_expr, timeout)
smp_cond_load_acquire_timeout(ptr, cond_expr, time_expr, timeout)
Also add tif_need_resched_relaxed_wait() which wraps the polling
pattern and its scheduler specific details in poll_idle().
In addition add atomic_cond_read_*_timeout(),
atomic64_cond_read_*_timeout(), and atomic_long wrappers.
Structurally, both the smp_cond_load_*_timeout() interfaces are similar
to smp_cond_load*(), with the addition of a rate-limited time-check.
Usage
==
These interfaces drop straight-forwardly into the rqspinlock logic
since qspinlock already uses smp_cond_load*(), and the time-check
extension can now be used for timeout and deadlock handling.
Using tif_need_resched_relaxed_wait() in poll_idle() removes any
architectural details allowing arm64 to straight-forwardly support
that path.
(However, for efficiency reasons cpuidle/poll_state.c continues to
depend on ARCH_HAS_CPU_RELAX since that is defined on architectures
with an optimized architectural primitive.)
Performance
==
Apart from simplifications due to this change, supporting polling in
cpuidle on arm64 helps improve wakeup latency (needs a few cpuidle/acpi
patches):
# perf stat -r 5 --cpu 4,5 -e task-clock,cycles,instructions,sched:sched_wake_idle_without_ipi \
perf bench sched pipe -l 1000000 -c 4
# No haltpoll (and, no TIF_POLLING_NRFLAG):
Performance counter stats for 'CPU(s) 4,5' (5 runs):
25,229.57 msec task-clock # 2.000 CPUs utilized ( +- 7.75% )
45,821,250,284 cycles # 1.816 GHz ( +- 10.07% )
26,557,496,665 instructions # 0.58 insn per cycle ( +- 0.21% )
0 sched:sched_wake_idle_without_ipi # 0.000 /sec
12.615 +- 0.977 seconds time elapsed ( +- 7.75% )
# Haltpoll:
Performance counter stats for 'CPU(s) 4,5' (5 runs):
15,131.58 msec task-clock # 2.000 CPUs utilized ( +- 10.00% )
34,158,188,839 cycles # 2.257 GHz ( +- 6.91% )
20,824,950,916 instructions # 0.61 insn per cycle ( +- 0.09% )
1,983,822 sched:sched_wake_idle_without_ipi # 131.105 K/sec ( +- 0.78% )
7.566 +- 0.756 seconds time elapsed ( +- 10.00% )
We get improved latency because we don't switch in and out of a
deeper sleep state or from the hypervisor. This also causes us to
execute ~20% fewer instructions.
Haris Okanovic also saw improvement in real workloads due to the
cpuidle changes: "observed 4-6% improvements in memcahed, cassandra,
mysql, and postgresql under certain loads. Other applications likely
benefit too." [12]
Changelog:
v11 [13] (as listed above):
- addressed some review comments from sashiko (see commit notes)
- The one notable change is to the implementation of
smp_cond_load_acquire_timeout() where there was a missed
control dependency in the timeout case.
All the others are minor.
- fixed a low probability race in the kunit test added in v11.
- added a bunch of kunit tests validating the implementation's
use of the clock.
v10 [10]:
- add a comment mentioning that smp_cond_load_relaxed_timeout() might
be using architectural primitives that don't support MMIO.
(David Laight, Catalin Marinas)
- added a kunit test for smp_cond_load_relaxed_timeout() (Andrew
Morton.)
v9 [9]:
- s/@cond/@cond_expr/ (Randy Dunlap)
- Clarify that SMP_TIMEOUT_POLL_COUNT is only around memory
addresses. (David Laight)
- Add the missing config ARCH_HAS_CPU_RELAX in arch/arm64/Kconfig.
(Catalin Marinas).
- Switch to arch_counter_get_cntvct_stable() (via __delay_cycles())
in the cmpwait path instead of using arch_timer_read_counter().
(Catalin Marinas)
v8 [0]:
- Defer evaluation of @time_expr_ns to when we hit the slowpath.
(comment from Alexei Starovoitov).
- Mention that cpu_poll_relax() is better than raw CPU polling
only where ARCH_HAS_CPU_RELAX is defined.
- also define ARCH_HAS_CPU_RELAX for arm64.
(Came out of a discussion with Will Deacon.)
- Split out WFET and WFE handling. I was doing both of these
in a common handler.
(From Will Deacon and in an earlier revision by Catalin Marinas.)
- Add mentions of atomic_cond_read_{relaxed,acquire}(),
atomic_cond_read_{relaxed,acquire}_timeout() in
Documentation/atomic_t.txt.
- Use the BIT() macro to do the checking in tif_bitset_relaxed_wait().
- Cleanup unnecessary assignments, casts etc in poll_idle().
(From Rafael Wysocki.)
- Fixup warnings from kernel build robot
v7 [1]:
- change the interface to separately provide the timeout. This is
useful for supporting WFET and similar primitives which can do
timed waiting (suggested by Arnd Bergmann).
- Adapting rqspinlock code to this changed interface also
necessitated allowing time_expr to fail.
- rqspinlock changes to adapt to the new smp_cond_load_acquire_timeout().
- add WFET support (suggested by Arnd Bergmann).
- add support for atomic-long wrappers.
- add a new scheduler interface tif_need_resched_relaxed_wait() which
encapsulates the polling logic used by poll_idle().
- interface suggested by (Rafael J. Wysocki).
v6 [2]:
- fixup missing timeout parameters in atomic64_cond_read_*_timeout()
- remove a race between setting of TIF_NEED_RESCHED and the call to
smp_cond_load_relaxed_timeout(). This would mean that dev->poll_time_limit
would be set even if we hadn't spent any time waiting.
(The original check compared against local_clock(), which would have been
fine, but I was instead using a cheaper check against _TIF_NEED_RESCHED.)
(Both from meta-CI bot)
v5 [3]:
- use cpu_poll_relax() instead of cpu_relax().
- instead of defining an arm64 specific
smp_cond_load_relaxed_timeout(), just define the appropriate
cpu_poll_relax().
- re-read the target pointer when we exit due to the time-check.
- s/SMP_TIMEOUT_SPIN_COUNT/SMP_TIMEOUT_POLL_COUNT/
(Suggested by Will Deacon)
- add atomic_cond_read_*_timeout() and atomic64_cond_read_*_timeout()
interfaces.
- rqspinlock: use atomic_cond_read_acquire_timeout().
- cpuidle: use smp_cond_load_relaxed_tiemout() for polling.
(Suggested by Catalin Marinas)
- rqspinlock: define SMP_TIMEOUT_POLL_COUNT to be 16k for non arm64
v4 [4]:
- naming change 's/timewait/timeout/'
- resilient spinlocks: get rid of res_smp_cond_load_acquire_waiting()
and fixup use of RES_CHECK_TIMEOUT().
(Both suggested by Catalin Marinas)
v3 [5]:
- further interface simplifications (suggested by Catalin Marinas)
v2 [6]:
- simplified the interface (suggested by Catalin Marinas)
- get rid of wait_policy, and a multitude of constants
- adds a slack parameter
This helped remove a fair amount of duplicated code duplication and in
hindsight unnecessary constants.
v1 [7]:
- add wait_policy (coarse and fine)
- derive spin-count etc at runtime instead of using arbitrary
constants.
Haris Okanovic tested v4 of this series with poll_idle()/haltpoll patches. [8]
Comments appreciated!
Thanks
Ankur
[0] https://lore.kernel.org/lkml/20251215044919.460086-1-ankur.a.arora@oracle.com/
[1] https://lore.kernel.org/lkml/20251028053136.692462-1-ankur.a.arora@oracle.com/
[2] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
[3] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
[4] https://lore.kernel.org/lkml/20250829080735.3598416-1-ankur.a.arora@oracle.com/
[5] https://lore.kernel.org/lkml/20250627044805.945491-1-ankur.a.arora@oracle.com/
[6] https://lore.kernel.org/lkml/20250502085223.1316925-1-ankur.a.arora@oracle.com/
[7] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com/
[8] https://lore.kernel.org/lkml/2cecbf7fb23ee83a4ce027e1be3f46f97efd585c.camel@amazon.com/
[9] https://lore.kernel.org/lkml/20260209023153.2661784-1-ankur.a.arora@oracle.com/
[10] https://lore.kernel.org/lkml/20260316013651.3225328-1-ankur.a.arora@oracle.com/
[11] https://lore.kernel.org/lkml/20230809134837.GM212435@hirez.programming.kicks-ass.net/
[12] https://lore.kernel.org/lkml/c6f3c8d3f1f2e89a9dc7ae22482973b5a51b08cb.camel@amazon.com/
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: bpf@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pm@vger.kernel.org
Ankur Arora (15):
asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
arm64: barrier: Support smp_cond_load_relaxed_timeout()
arm64/delay: move some constants out to a separate header
arm64: support WFET in smp_cond_load_relaxed_timeout()
arm64: rqspinlock: Remove private copy of
smp_cond_load_acquire_timewait()
asm-generic: barrier: Add smp_cond_load_acquire_timeout()
atomic: Add atomic_cond_read_*_timeout()
locking/atomic: scripts: build atomic_long_cond_read_*_timeout()
bpf/rqspinlock: switch check_timeout() to a clock interface
bpf/rqspinlock: Use smp_cond_load_acquire_timeout()
sched: add need-resched timed wait interface
cpuidle/poll_state: Wait for need-resched via
tif_need_resched_relaxed_wait()
arm64/delay: enable testing smp_cond_load_relaxed_timeout()
barrier: add tests for smp_cond_load_*_timeout()
barrier: add clock tests for smp_cond_load_relaxed_timeout()
Documentation/atomic_t.txt | 14 +-
arch/arm64/Kconfig | 3 +
arch/arm64/include/asm/barrier.h | 23 ++++
arch/arm64/include/asm/cmpxchg.h | 62 +++++++--
arch/arm64/include/asm/delay-const.h | 28 ++++
arch/arm64/include/asm/rqspinlock.h | 85 ------------
arch/arm64/lib/delay.c | 17 +--
drivers/clocksource/arm_arch_timer.c | 2 +
drivers/cpuidle/poll_state.c | 21 +--
drivers/soc/qcom/rpmh-rsc.c | 8 +-
include/asm-generic/barrier.h | 97 ++++++++++++++
include/linux/atomic.h | 10 ++
include/linux/atomic/atomic-long.h | 18 ++-
include/linux/sched/idle.h | 29 +++++
kernel/bpf/rqspinlock.c | 77 +++++++----
lib/Kconfig.debug | 10 ++
lib/tests/Makefile | 1 +
lib/tests/barrier-timeout-test.c | 185 +++++++++++++++++++++++++++
scripts/atomic/gen-atomic-long.sh | 16 ++-
19 files changed, 528 insertions(+), 178 deletions(-)
create mode 100644 arch/arm64/include/asm/delay-const.h
create mode 100644 lib/tests/barrier-timeout-test.c
--
2.31.1
Summarizing all of the bot reviews (sashiko/bpf-bot):
Most of the comments are same as v11. Let me outline the ones I think
are notable:
- edge cases around (timeout is -1, S64_MAX, U64_MAX).
I've noted in the first patch how these cases are probably best
addressed at review time instead of complicating the implementation
like in https://lore.kernel.org/lkml/874iklm1uy.fsf@oracle.com/
- as a side-effect of enabling ARCH_HAS_CPU_RELAX, acpi_processor_setup_cstate()
enables a NOP poll_idle() unintentionally (patch-2). I've described
it in more detail in my reply to that patch.
Will fix this.
- potentially missed control dependency in the timeout case of
smp_cond_load_acquire_timeout(). Probably need a better fix for
this than I have.
Need more thinktime as the bots would say. Will address this one
and the one below in reply to patches 6, 7.
- possibly torn reads with atomic64_cond_read_*_timeout() on 32-bit
architectures.
Ankur
Ankur Arora <ankur.a.arora@oracle.com> writes:
> Hi,
>
> Main change in this version:
>
> - addressed some review comments from sashiko (see commit notes)
> - The one notable change is to the implementation of
> smp_cond_load_acquire_timeout() where there was a missed
> control dependency in the timeout case.
> All the others are minor.
> - fixed a low probability race in the kunit test added in v11.
> - added a bunch of kunit tests validating the implementation's
> use of the clock.
>
> Andrew, if the changes look okay, could we take this in your mm-nomm
> tree as before?
>
> The core kernel often uses smp_cond_load_{relaxed,acquire}() to spin
> on condition variables with architectural primitives used to avoid
> hammering the relevant cachelines.
>
> (This primitive can vary greatly across architectures: on x86 it's a
> cpu_relax() to slow down the pipeline. On arm64, this is a __cmpwait()
> which waits for a cacheline to change state in a time limited fashion.)
>
> Regardless of architectural details, typical smp_cond_load*() usage
> does not allow for termination until the condition change occurs.
>
> Beyond the core kernel, there are cases where it is useful to additionally
> terminate on a timeout. Two cases:
>
> - cpuidle poll_idle(): wait for need-resched until the cpuidle polling
> duration expires.
>
> - rqspinlock: nested qspinlock acquisition that terminates on timeout
> or deadlock.
>
> Accordingly add two interfaces (with their generic and arm64 specific
> implementations):
>
> smp_cond_load_relaxed_timeout(ptr, cond_expr, time_expr, timeout)
> smp_cond_load_acquire_timeout(ptr, cond_expr, time_expr, timeout)
>
> Also add tif_need_resched_relaxed_wait() which wraps the polling
> pattern and its scheduler specific details in poll_idle().
> In addition add atomic_cond_read_*_timeout(),
> atomic64_cond_read_*_timeout(), and atomic_long wrappers.
>
> Structurally, both the smp_cond_load_*_timeout() interfaces are similar
> to smp_cond_load*(), with the addition of a rate-limited time-check.
>
> Usage
> ==
>
> These interfaces drop straight-forwardly into the rqspinlock logic
> since qspinlock already uses smp_cond_load*(), and the time-check
> extension can now be used for timeout and deadlock handling.
>
> Using tif_need_resched_relaxed_wait() in poll_idle() removes any
> architectural details allowing arm64 to straight-forwardly support
> that path.
> (However, for efficiency reasons cpuidle/poll_state.c continues to
> depend on ARCH_HAS_CPU_RELAX since that is defined on architectures
> with an optimized architectural primitive.)
>
>
> Performance
> ==
>
> Apart from simplifications due to this change, supporting polling in
> cpuidle on arm64 helps improve wakeup latency (needs a few cpuidle/acpi
> patches):
>
>
> # perf stat -r 5 --cpu 4,5 -e task-clock,cycles,instructions,sched:sched_wake_idle_without_ipi \
> perf bench sched pipe -l 1000000 -c 4
>
> # No haltpoll (and, no TIF_POLLING_NRFLAG):
>
> Performance counter stats for 'CPU(s) 4,5' (5 runs):
>
> 25,229.57 msec task-clock # 2.000 CPUs utilized ( +- 7.75% )
> 45,821,250,284 cycles # 1.816 GHz ( +- 10.07% )
> 26,557,496,665 instructions # 0.58 insn per cycle ( +- 0.21% )
> 0 sched:sched_wake_idle_without_ipi # 0.000 /sec
>
> 12.615 +- 0.977 seconds time elapsed ( +- 7.75% )
>
>
> # Haltpoll:
>
> Performance counter stats for 'CPU(s) 4,5' (5 runs):
>
> 15,131.58 msec task-clock # 2.000 CPUs utilized ( +- 10.00% )
> 34,158,188,839 cycles # 2.257 GHz ( +- 6.91% )
> 20,824,950,916 instructions # 0.61 insn per cycle ( +- 0.09% )
> 1,983,822 sched:sched_wake_idle_without_ipi # 131.105 K/sec ( +- 0.78% )
>
> 7.566 +- 0.756 seconds time elapsed ( +- 10.00% )
>
> We get improved latency because we don't switch in and out of a
> deeper sleep state or from the hypervisor. This also causes us to
> execute ~20% fewer instructions.
>
>
> Haris Okanovic also saw improvement in real workloads due to the
> cpuidle changes: "observed 4-6% improvements in memcahed, cassandra,
> mysql, and postgresql under certain loads. Other applications likely
> benefit too." [12]
>
>
> Changelog:
> v11 [13] (as listed above):
> - addressed some review comments from sashiko (see commit notes)
> - The one notable change is to the implementation of
> smp_cond_load_acquire_timeout() where there was a missed
> control dependency in the timeout case.
> All the others are minor.
> - fixed a low probability race in the kunit test added in v11.
> - added a bunch of kunit tests validating the implementation's
> use of the clock.
>
> v10 [10]:
> - add a comment mentioning that smp_cond_load_relaxed_timeout() might
> be using architectural primitives that don't support MMIO.
> (David Laight, Catalin Marinas)
> - added a kunit test for smp_cond_load_relaxed_timeout() (Andrew
> Morton.)
>
> v9 [9]:
> - s/@cond/@cond_expr/ (Randy Dunlap)
> - Clarify that SMP_TIMEOUT_POLL_COUNT is only around memory
> addresses. (David Laight)
> - Add the missing config ARCH_HAS_CPU_RELAX in arch/arm64/Kconfig.
> (Catalin Marinas).
> - Switch to arch_counter_get_cntvct_stable() (via __delay_cycles())
> in the cmpwait path instead of using arch_timer_read_counter().
> (Catalin Marinas)
>
> v8 [0]:
> - Defer evaluation of @time_expr_ns to when we hit the slowpath.
> (comment from Alexei Starovoitov).
>
> - Mention that cpu_poll_relax() is better than raw CPU polling
> only where ARCH_HAS_CPU_RELAX is defined.
> - also define ARCH_HAS_CPU_RELAX for arm64.
> (Came out of a discussion with Will Deacon.)
>
> - Split out WFET and WFE handling. I was doing both of these
> in a common handler.
> (From Will Deacon and in an earlier revision by Catalin Marinas.)
>
> - Add mentions of atomic_cond_read_{relaxed,acquire}(),
> atomic_cond_read_{relaxed,acquire}_timeout() in
> Documentation/atomic_t.txt.
>
> - Use the BIT() macro to do the checking in tif_bitset_relaxed_wait().
>
> - Cleanup unnecessary assignments, casts etc in poll_idle().
> (From Rafael Wysocki.)
>
> - Fixup warnings from kernel build robot
>
>
> v7 [1]:
> - change the interface to separately provide the timeout. This is
> useful for supporting WFET and similar primitives which can do
> timed waiting (suggested by Arnd Bergmann).
>
> - Adapting rqspinlock code to this changed interface also
> necessitated allowing time_expr to fail.
> - rqspinlock changes to adapt to the new smp_cond_load_acquire_timeout().
>
> - add WFET support (suggested by Arnd Bergmann).
> - add support for atomic-long wrappers.
> - add a new scheduler interface tif_need_resched_relaxed_wait() which
> encapsulates the polling logic used by poll_idle().
> - interface suggested by (Rafael J. Wysocki).
>
>
> v6 [2]:
> - fixup missing timeout parameters in atomic64_cond_read_*_timeout()
> - remove a race between setting of TIF_NEED_RESCHED and the call to
> smp_cond_load_relaxed_timeout(). This would mean that dev->poll_time_limit
> would be set even if we hadn't spent any time waiting.
> (The original check compared against local_clock(), which would have been
> fine, but I was instead using a cheaper check against _TIF_NEED_RESCHED.)
> (Both from meta-CI bot)
>
>
> v5 [3]:
> - use cpu_poll_relax() instead of cpu_relax().
> - instead of defining an arm64 specific
> smp_cond_load_relaxed_timeout(), just define the appropriate
> cpu_poll_relax().
> - re-read the target pointer when we exit due to the time-check.
> - s/SMP_TIMEOUT_SPIN_COUNT/SMP_TIMEOUT_POLL_COUNT/
> (Suggested by Will Deacon)
>
> - add atomic_cond_read_*_timeout() and atomic64_cond_read_*_timeout()
> interfaces.
> - rqspinlock: use atomic_cond_read_acquire_timeout().
> - cpuidle: use smp_cond_load_relaxed_tiemout() for polling.
> (Suggested by Catalin Marinas)
>
> - rqspinlock: define SMP_TIMEOUT_POLL_COUNT to be 16k for non arm64
>
>
> v4 [4]:
> - naming change 's/timewait/timeout/'
> - resilient spinlocks: get rid of res_smp_cond_load_acquire_waiting()
> and fixup use of RES_CHECK_TIMEOUT().
> (Both suggested by Catalin Marinas)
>
> v3 [5]:
> - further interface simplifications (suggested by Catalin Marinas)
>
> v2 [6]:
> - simplified the interface (suggested by Catalin Marinas)
> - get rid of wait_policy, and a multitude of constants
> - adds a slack parameter
> This helped remove a fair amount of duplicated code duplication and in
> hindsight unnecessary constants.
>
> v1 [7]:
> - add wait_policy (coarse and fine)
> - derive spin-count etc at runtime instead of using arbitrary
> constants.
>
> Haris Okanovic tested v4 of this series with poll_idle()/haltpoll patches. [8]
>
> Comments appreciated!
>
> Thanks
> Ankur
>
> [0] https://lore.kernel.org/lkml/20251215044919.460086-1-ankur.a.arora@oracle.com/
> [1] https://lore.kernel.org/lkml/20251028053136.692462-1-ankur.a.arora@oracle.com/
> [2] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
> [3] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
> [4] https://lore.kernel.org/lkml/20250829080735.3598416-1-ankur.a.arora@oracle.com/
> [5] https://lore.kernel.org/lkml/20250627044805.945491-1-ankur.a.arora@oracle.com/
> [6] https://lore.kernel.org/lkml/20250502085223.1316925-1-ankur.a.arora@oracle.com/
> [7] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com/
> [8] https://lore.kernel.org/lkml/2cecbf7fb23ee83a4ce027e1be3f46f97efd585c.camel@amazon.com/
> [9] https://lore.kernel.org/lkml/20260209023153.2661784-1-ankur.a.arora@oracle.com/
> [10] https://lore.kernel.org/lkml/20260316013651.3225328-1-ankur.a.arora@oracle.com/
> [11] https://lore.kernel.org/lkml/20230809134837.GM212435@hirez.programming.kicks-ass.net/
> [12] https://lore.kernel.org/lkml/c6f3c8d3f1f2e89a9dc7ae22482973b5a51b08cb.camel@amazon.com/
>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Will Deacon <will@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: bpf@vger.kernel.org
> Cc: linux-arch@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-pm@vger.kernel.org
>
> Ankur Arora (15):
> asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
> arm64: barrier: Support smp_cond_load_relaxed_timeout()
> arm64/delay: move some constants out to a separate header
> arm64: support WFET in smp_cond_load_relaxed_timeout()
> arm64: rqspinlock: Remove private copy of
> smp_cond_load_acquire_timewait()
> asm-generic: barrier: Add smp_cond_load_acquire_timeout()
> atomic: Add atomic_cond_read_*_timeout()
> locking/atomic: scripts: build atomic_long_cond_read_*_timeout()
> bpf/rqspinlock: switch check_timeout() to a clock interface
> bpf/rqspinlock: Use smp_cond_load_acquire_timeout()
> sched: add need-resched timed wait interface
> cpuidle/poll_state: Wait for need-resched via
> tif_need_resched_relaxed_wait()
> arm64/delay: enable testing smp_cond_load_relaxed_timeout()
> barrier: add tests for smp_cond_load_*_timeout()
> barrier: add clock tests for smp_cond_load_relaxed_timeout()
>
> Documentation/atomic_t.txt | 14 +-
> arch/arm64/Kconfig | 3 +
> arch/arm64/include/asm/barrier.h | 23 ++++
> arch/arm64/include/asm/cmpxchg.h | 62 +++++++--
> arch/arm64/include/asm/delay-const.h | 28 ++++
> arch/arm64/include/asm/rqspinlock.h | 85 ------------
> arch/arm64/lib/delay.c | 17 +--
> drivers/clocksource/arm_arch_timer.c | 2 +
> drivers/cpuidle/poll_state.c | 21 +--
> drivers/soc/qcom/rpmh-rsc.c | 8 +-
> include/asm-generic/barrier.h | 97 ++++++++++++++
> include/linux/atomic.h | 10 ++
> include/linux/atomic/atomic-long.h | 18 ++-
> include/linux/sched/idle.h | 29 +++++
> kernel/bpf/rqspinlock.c | 77 +++++++----
> lib/Kconfig.debug | 10 ++
> lib/tests/Makefile | 1 +
> lib/tests/barrier-timeout-test.c | 185 +++++++++++++++++++++++++++
> scripts/atomic/gen-atomic-long.sh | 16 ++-
> 19 files changed, 528 insertions(+), 178 deletions(-)
> create mode 100644 arch/arm64/include/asm/delay-const.h
> create mode 100644 lib/tests/barrier-timeout-test.c
--
ankur
© 2016 - 2026 Red Hat, Inc.