[PATCH v8 00/12] barrier: Add smp_cond_load_{relaxed,acquire}_timeout()

Ankur Arora posted 12 patches 2 days, 23 hours ago
This series adds timed variants of the smp_cond_load() primitives:
smp_cond_load_relaxed_timeout() and smp_cond_load_acquire_timeout().

As the names suggest, the new interfaces are meant for contexts where
you want to wait on a condition variable for a finite duration. This is
easy enough to do with a loop around cpu_relax(). There are, however,
architectures (e.g. arm64) that allow waiting on a cacheline instead.

So, these interfaces handle a mixture of spin/wait with a
smp_cond_load() thrown in. The interfaces are:

   smp_cond_load_relaxed_timeout(ptr, cond_expr, time_expr, timeout)
   smp_cond_load_acquire_timeout(ptr, cond_expr, time_expr, timeout)

The parameters time_expr and timeout determine when to bail out.
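
For readers who want the shape of the interface without reading the
patches, here is a rough userspace sketch using C11 atomics. It is not
the code in this series: the function form (the kernel interfaces are
macros taking cond_expr and time_expr as expressions), the names cond_fn,
clock_fn, tick_clock(), is_nonzero() and POLL_COUNT, and the reading of
timeout as a duration measured from the first clock read are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: number of polls between two clock reads. */
#define POLL_COUNT 200

typedef bool (*cond_fn)(uint32_t val);	/* stands in for cond_expr */
typedef uint64_t (*clock_fn)(void);	/* stands in for time_expr */

/* Hypothetical stand-in clock: advances by one tick on every read. */
static uint64_t tick_clock(void)
{
	static uint64_t ticks;

	return ++ticks;
}

/* Example predicate: the wait ends once the value is non-zero. */
static bool is_nonzero(uint32_t val)
{
	return val != 0;
}

/*
 * Poll *ptr until cond() holds or `timeout` ticks have elapsed since the
 * first clock read (an assumed meaning of the timeout parameter).
 * Returns the last value read; the caller re-checks the condition to
 * tell success from timeout.
 */
static uint32_t cond_load_relaxed_timeout(_Atomic uint32_t *ptr, cond_fn cond,
					  clock_fn time_fn, uint64_t timeout)
{
	uint64_t start = 0, now;
	bool started = false;
	unsigned int n = 0;
	uint32_t val;

	assert(ptr && cond && time_fn);

	for (;;) {
		val = atomic_load_explicit(ptr, memory_order_relaxed);
		if (cond(val))
			break;
		/* A kernel loop would relax the CPU or wait on the cacheline here. */
		if (++n < POLL_COUNT)
			continue;
		n = 0;
		now = time_fn();
		if (!started) {
			start = now;
			started = true;
		}
		if (now - start >= timeout) {
			/* Re-read the target when exiting due to the time check. */
			val = atomic_load_explicit(ptr, memory_order_relaxed);
			break;
		}
	}
	return val;
}

/* One plausible acquire variant: relaxed wait, then an acquire fence. */
static uint32_t cond_load_acquire_timeout(_Atomic uint32_t *ptr, cond_fn cond,
					  clock_fn time_fn, uint64_t timeout)
{
	uint32_t val = cond_load_relaxed_timeout(ptr, cond, time_fn, timeout);

	atomic_thread_fence(memory_order_acquire);
	return val;
}
```

The clock is only consulted once every POLL_COUNT polls, so the common
case of a quickly satisfied condition never pays for a time read.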

Also add tif_need_resched_relaxed_wait() which wraps the pattern used
in poll_idle() and abstracts out details of the interface and those
of the scheduler.
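
To make the poll_idle() pattern concrete, here is a small userspace
sketch of a "wait for need-resched with a time limit" helper and a caller
that records whether the limit was hit. The real signature of
tif_need_resched_relaxed_wait() is not shown in this cover letter, so
need_resched_wait(), NEED_RESCHED_BIT, struct poll_dev, tick_clock() and
poll_idle_sketch() below are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NEED_RESCHED_BIT 0x1u	/* hypothetical flag bit */

/* Hypothetical stand-in for the cpuidle device state. */
struct poll_dev {
	bool poll_time_limit;
};

/* Hypothetical stand-in clock: advances by one tick on every read. */
static uint64_t tick_clock(void)
{
	static uint64_t ticks;

	return ++ticks;
}

static bool resched_set(_Atomic uint32_t *flags)
{
	return atomic_load_explicit(flags, memory_order_relaxed) &
	       NEED_RESCHED_BIT;
}

/*
 * Wait until NEED_RESCHED_BIT is set or `limit` ticks have passed.
 * Returns true if the bit was observed set, false on timeout.
 */
static bool need_resched_wait(_Atomic uint32_t *flags, uint64_t limit)
{
	uint64_t start = tick_clock();

	for (;;) {
		if (resched_set(flags))
			return true;
		if (tick_clock() - start >= limit)
			return resched_set(flags);	/* final re-read */
	}
}

/* The caller only records a time-limit hit when the wait timed out. */
static void poll_idle_sketch(struct poll_dev *dev, _Atomic uint32_t *flags,
			     uint64_t limit)
{
	assert(dev && flags);

	dev->poll_time_limit = !need_resched_wait(flags, limit);
}
```

Deriving poll_time_limit from the helper's return value, rather than
from a separate flag check afterwards, keeps the decision tied to what
the wait itself observed.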

In addition, add atomic_cond_read_*_timeout(), atomic64_cond_read_*_timeout(),
and atomic_long wrappers for the interfaces.

Finally, update poll_idle() and resilient queued spinlocks to use them.

Changelog:

  v7 [0]:
   - change the interface to separately provide the timeout. This is
     useful for supporting WFET and similar primitives which can do
     timed waiting (suggested by Arnd Bergmann).

   - Adapting rqspinlock code to this changed interface also
     necessitated allowing time_expr to fail.
   - rqspinlock changes to adapt to the new smp_cond_load_acquire_timeout().

   - add WFET support (suggested by Arnd Bergmann).
   - add support for atomic-long wrappers.
   - add a new scheduler interface tif_need_resched_relaxed_wait() which
     encapsulates the polling logic used by poll_idle().
     (interface suggested by Rafael J. Wysocki).

  v6 [1]:
   - fixup missing timeout parameters in atomic64_cond_read_*_timeout()
   - remove a race between setting of TIF_NEED_RESCHED and the call to
     smp_cond_load_relaxed_timeout(). This would mean that dev->poll_time_limit
     would be set even if we hadn't spent any time waiting.
     (The original check compared against local_clock(), which would have been
     fine, but I was instead using a cheaper check against _TIF_NEED_RESCHED.)
   (Both from meta-CI bot)

  v5 [2]:
   - use cpu_poll_relax() instead of cpu_relax().
   - instead of defining an arm64 specific
     smp_cond_load_relaxed_timeout(), just define the appropriate
     cpu_poll_relax().
   - re-read the target pointer when we exit due to the time-check.
   - s/SMP_TIMEOUT_SPIN_COUNT/SMP_TIMEOUT_POLL_COUNT/
   (Suggested by Will Deacon)

   - add atomic_cond_read_*_timeout() and atomic64_cond_read_*_timeout()
     interfaces.
   - rqspinlock: use atomic_cond_read_acquire_timeout().
   - cpuidle: use smp_cond_load_relaxed_timeout() for polling.
   (Suggested by Catalin Marinas)

   - rqspinlock: define SMP_TIMEOUT_POLL_COUNT to be 16k for non-arm64

  v4 [3]:
    - naming change 's/timewait/timeout/'
    - resilient spinlocks: get rid of res_smp_cond_load_acquire_waiting()
      and fixup use of RES_CHECK_TIMEOUT().
    (Both suggested by Catalin Marinas)

  v3 [4]:
    - further interface simplifications (suggested by Catalin Marinas)

  v2 [5]:
    - simplified the interface (suggested by Catalin Marinas)
       - get rid of wait_policy, and a multitude of constants
       - adds a slack parameter
      This helped remove a fair amount of duplicated code and, in
      hindsight, unnecessary constants.

  v1 [6]:
     - add wait_policy (coarse and fine)
     - derive spin-count etc at runtime instead of using arbitrary
       constants.

Haris Okanovic tested v4 of this series with poll_idle()/haltpoll patches. [7]

Comments appreciated!

Thanks
Ankur

 [0] https://lore.kernel.org/lkml/20251028053136.692462-1-ankur.a.arora@oracle.com/
 [1] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
 [2] https://lore.kernel.org/lkml/20250911034655.3916002-1-ankur.a.arora@oracle.com/
 [3] https://lore.kernel.org/lkml/20250829080735.3598416-1-ankur.a.arora@oracle.com/
 [4] https://lore.kernel.org/lkml/20250627044805.945491-1-ankur.a.arora@oracle.com/
 [5] https://lore.kernel.org/lkml/20250502085223.1316925-1-ankur.a.arora@oracle.com/
 [6] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@oracle.com/
 [7] https://lore.kernel.org/lkml/2cecbf7fb23ee83a4ce027e1be3f46f97efd585c.camel@amazon.com/

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: bpf@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pm@vger.kernel.org

Ankur Arora (12):
  asm-generic: barrier: Add smp_cond_load_relaxed_timeout()
  arm64: barrier: Support smp_cond_load_relaxed_timeout()
  arm64/delay: move some constants out to a separate header
  arm64: support WFET in smp_cond_relaxed_timeout()
  arm64: rqspinlock: Remove private copy of
    smp_cond_load_acquire_timewait()
  asm-generic: barrier: Add smp_cond_load_acquire_timeout()
  atomic: Add atomic_cond_read_*_timeout()
  locking/atomic: scripts: build atomic_long_cond_read_*_timeout()
  bpf/rqspinlock: switch check_timeout() to a clock interface
  bpf/rqspinlock: Use smp_cond_load_acquire_timeout()
  sched: add need-resched timed wait interface
  cpuidle/poll_state: Wait for need-resched via
    tif_need_resched_relaxed_wait()

 arch/arm64/include/asm/barrier.h     | 23 ++++++++
 arch/arm64/include/asm/cmpxchg.h     | 72 ++++++++++++++++-------
 arch/arm64/include/asm/delay-const.h | 25 ++++++++
 arch/arm64/include/asm/rqspinlock.h  | 85 ----------------------------
 arch/arm64/lib/delay.c               | 13 +----
 drivers/cpuidle/poll_state.c         | 27 ++-------
 drivers/soc/qcom/rpmh-rsc.c          |  9 +--
 include/asm-generic/barrier.h        | 84 +++++++++++++++++++++++++++
 include/linux/atomic.h               | 10 ++++
 include/linux/atomic/atomic-long.h   | 18 +++---
 include/linux/sched/idle.h           | 29 ++++++++++
 kernel/bpf/rqspinlock.c              | 72 ++++++++++++++---------
 scripts/atomic/gen-atomic-long.sh    | 16 ++++--
 13 files changed, 295 insertions(+), 188 deletions(-)
 create mode 100644 arch/arm64/include/asm/delay-const.h

-- 
2.31.1