This patchset adds support for polling in idle via poll_idle() on
arm64.
There are two main changes in this version:
1. rework the series to take Catalin Marinas' comments on the semantics
of smp_cond_load_relaxed() (and how earlier versions of this
series were abusing them) into account.
This also allows dropping the somewhat strained connection
between haltpoll and the event stream.
2. earlier versions of this series were adding support for poll_idle()
but only using it in the haltpoll driver. Add Lifeng's patch to
broaden it out by also polling in acpi-idle.
The benefit of polling in idle is to reduce the cost of remote wakeups.
When polling is enabled, these can be done just by setting the
need-resched bit, instead of sending an IPI and incurring the cost of
handling the interrupt on the receiver side. When running in a VM it
also saves the cost of WFE trapping (when enabled).
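To illustrate the idea (a hedged userspace analogue only -- the names
needs_resched, idle_poller and wake_without_ipi are invented for the
sketch, and threads stand in for CPUs): because the idle side polls a
flag, the remote wakeup is just a plain store, with no IPI (here: no
signal) delivered.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Illustrative userspace analogue (all names invented): the "idle CPU"
 * polls a need-resched-style flag, so a remote wakeup is just a store
 * to that flag -- no IPI (here: no signal) needs to be delivered. */
static _Atomic int needs_resched;

static void *idle_poller(void *arg)
{
	(void)arg;
	/* poll_idle()-style wait: spin until the wakeup flag is set */
	while (!atomic_load_explicit(&needs_resched, memory_order_relaxed))
		;	/* cpu_relax()/WFE would go here in the kernel */
	return NULL;
}

/* Returns 1 once the poller has been woken purely by the flag store. */
static int wake_without_ipi(void)
{
	pthread_t poller;

	atomic_store(&needs_resched, 0);
	if (pthread_create(&poller, NULL, idle_poller, NULL))
		return 0;
	/* The remote wakeup: one relaxed store, nothing else */
	atomic_store_explicit(&needs_resched, 1, memory_order_relaxed);
	pthread_join(poller, NULL);
	return 1;
}
```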
Comparing sched-pipe performance on a guest VM:
# perf stat -r 5 --cpu 4,5 -e task-clock,cycles,instructions,sched:sched_wake_idle_without_ipi \
perf bench sched pipe -l 1000000 -c 4
# no polling in idle
Performance counter stats for 'CPU(s) 4,5' (5 runs):
25,229.57 msec task-clock # 2.000 CPUs utilized ( +- 7.75% )
45,821,250,284 cycles # 1.816 GHz ( +- 10.07% )
26,557,496,665 instructions # 0.58 insn per cycle ( +- 0.21% )
0 sched:sched_wake_idle_without_ipi # 0.000 /sec
12.615 +- 0.977 seconds time elapsed ( +- 7.75% )
# polling in idle (with haltpoll):
Performance counter stats for 'CPU(s) 4,5' (5 runs):
15,131.58 msec task-clock # 2.000 CPUs utilized ( +- 10.00% )
34,158,188,839 cycles # 2.257 GHz ( +- 6.91% )
20,824,950,916 instructions # 0.61 insn per cycle ( +- 0.09% )
1,983,822 sched:sched_wake_idle_without_ipi # 131.105 K/sec ( +- 0.78% )
7.566 +- 0.756 seconds time elapsed ( +- 10.00% )
Tomohiro Misono and Haris Okanovic also report similar latency
improvements on Grace and Graviton systems (for v7) [1] [2].
Lifeng also reports improved context switch latency on a bare-metal
machine with acpi-idle [3].
The series is in four parts:
- patches 1-4,
"asm-generic: add barrier smp_cond_load_relaxed_timeout()"
"cpuidle/poll_state: poll via smp_cond_load_relaxed_timeout()"
"cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL"
"Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig"
add smp_cond_load_relaxed_timeout() and switch poll_idle() to
using it. Also, do some munging of related kconfig options.
- patches 5-7,
"arm64: barrier: add support for smp_cond_relaxed_timeout()"
"arm64: define TIF_POLLING_NRFLAG"
"arm64: add support for polling in idle"
add support for the new barrier, the polling flag and enable
poll_idle() support.
- patches 8, 9-13,
"ACPI: processor_idle: Support polling state for LPI"
"cpuidle-haltpoll: define arch_haltpoll_want()"
"governors/haltpoll: drop kvm_para_available() check"
"cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL"
"arm64: idle: export arch_cpu_idle"
"arm64: support cpuidle-haltpoll"
add support for polling via acpi-idle, and cpuidle-haltpoll.
- patches 14, 15,
"arm64/delay: move some constants out to a separate header"
"arm64: support WFET in smp_cond_relaxed_timeout()"
are RFC patches to enable WFET support.
Changelog:
v9:
- reworked the series to address a comment from Catalin Marinas
about how v8 was abusing semantics of smp_cond_load_relaxed().
- add poll_idle() support in acpi-idle (Lifeng Zheng)
- dropped some earlier "Tested-by", "Reviewed-by" due to the
above rework.
v8: No logic changes. Largely a respin of v7, with changes
noted below:
- move selection of ARCH_HAS_OPTIMIZED_POLL on arm64 to its
own patch.
(patch-9 "arm64: select ARCH_HAS_OPTIMIZED_POLL")
- address comments simplifying arm64 support (Will Deacon)
(patch-11 "arm64: support cpuidle-haltpoll")
v7: No significant logic changes. Mostly a respin of v6.
- minor cleanup in poll_idle() (Christoph Lameter)
- fixes conflicts due to code movement in arch/arm64/kernel/cpuidle.c
(Tomohiro Misono)
v6:
- reordered the patches to keep poll_idle() and ARCH_HAS_OPTIMIZED_POLL
changes together (comment from Christoph Lameter)
- fleshes out the commit messages a bit more (comments from Christoph
Lameter, Sudeep Holla)
- also rework selection of cpuidle-haltpoll. Now selected based
on the architectural selection of ARCH_CPUIDLE_HALTPOLL.
- moved back to arch_haltpoll_want() (comment from Joao Martins)
Also, arch_haltpoll_want() now takes the force parameter and is
now responsible for the complete selection (or not) of haltpoll.
- fixes the build breakage on i386
- fixes the cpuidle-haltpoll module breakage on arm64 (comment from
Tomohiro Misono, Haris Okanovic)
v5:
- rework the poll_idle() loop around smp_cond_load_relaxed() (review
comment from Tomohiro Misono.)
- also rework selection of cpuidle-haltpoll. Now selected based
on the architectural selection of ARCH_CPUIDLE_HALTPOLL.
- arch_haltpoll_supported() (renamed from arch_haltpoll_want()) on
arm64 now depends on the event-stream being enabled.
- limit POLL_IDLE_RELAX_COUNT on arm64 (review comment from Haris Okanovic)
- ARCH_HAS_CPU_RELAX is now renamed to ARCH_HAS_OPTIMIZED_POLL.
v4 changes from v3:
- change 7/8 per Rafael's input: drop the parens and use ret for the final check
- add 8/8 which renames the guard for building poll_state
v3 changes from v2:
- fix 1/7 per Petr Mladek - remove ARCH_HAS_CPU_RELAX from arch/x86/Kconfig
- add Acked-by from Rafael Wysocki on 2/7
v2 changes from v1:
- added patch 7 where we replace cpu_relax() with smp_cond_load_relaxed() per PeterZ
(this improves by 50% at least the CPU cycles consumed in the tests above:
10,716,881,137 now vs 14,503,014,257 before)
- removed the ifdef from patch 1 per RafaelW
Please review.
[1] https://lore.kernel.org/lkml/TY3PR01MB111481E9B0AF263ACC8EA5D4AE5BA2@TY3PR01MB11148.jpnprd01.prod.outlook.com/
[2] https://lore.kernel.org/lkml/104d0ec31cb45477e27273e089402d4205ee4042.camel@amazon.com/
[3] https://lore.kernel.org/lkml/f8a1f85b-c4bf-4c38-81bf-728f72a4f2fe@huawei.com/
Ankur Arora (10):
asm-generic: add barrier smp_cond_load_relaxed_timeout()
cpuidle/poll_state: poll via smp_cond_load_relaxed_timeout()
cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL
arm64: barrier: add support for smp_cond_relaxed_timeout()
arm64: add support for polling in idle
cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL
arm64: idle: export arch_cpu_idle
arm64: support cpuidle-haltpoll
arm64/delay: move some constants out to a separate header
arm64: support WFET in smp_cond_relaxed_timeout()
Joao Martins (4):
Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig
arm64: define TIF_POLLING_NRFLAG
cpuidle-haltpoll: define arch_haltpoll_want()
governors/haltpoll: drop kvm_para_available() check
Lifeng Zheng (1):
ACPI: processor_idle: Support polling state for LPI
arch/Kconfig | 3 ++
arch/arm64/Kconfig | 7 +++
arch/arm64/include/asm/barrier.h | 62 ++++++++++++++++++++++-
arch/arm64/include/asm/cmpxchg.h | 26 ++++++----
arch/arm64/include/asm/cpuidle_haltpoll.h | 20 ++++++++
arch/arm64/include/asm/delay-const.h | 25 +++++++++
arch/arm64/include/asm/thread_info.h | 2 +
arch/arm64/kernel/idle.c | 1 +
arch/arm64/lib/delay.c | 13 ++---
arch/x86/Kconfig | 5 +-
arch/x86/include/asm/cpuidle_haltpoll.h | 1 +
arch/x86/kernel/kvm.c | 13 +++++
drivers/acpi/processor_idle.c | 43 +++++++++++++---
drivers/cpuidle/Kconfig | 5 +-
drivers/cpuidle/Makefile | 2 +-
drivers/cpuidle/cpuidle-haltpoll.c | 12 +----
drivers/cpuidle/governors/haltpoll.c | 6 +--
drivers/cpuidle/poll_state.c | 27 +++-------
drivers/idle/Kconfig | 1 +
include/asm-generic/barrier.h | 42 +++++++++++++++
include/linux/cpuidle.h | 2 +-
include/linux/cpuidle_haltpoll.h | 5 ++
22 files changed, 252 insertions(+), 71 deletions(-)
create mode 100644 arch/arm64/include/asm/cpuidle_haltpoll.h
create mode 100644 arch/arm64/include/asm/delay-const.h
--
2.43.5
Just a ping on this to see if there are any further comments.

Thanks
Ankur

Ankur Arora <ankur.a.arora@oracle.com> writes:

> This patchset adds support for polling in idle via poll_idle() on
> arm64.

[ ... ]
Ankur Arora <ankur.a.arora@oracle.com> writes:
> This patchset adds support for polling in idle via poll_idle() on
> arm64.
>
> There are two main changes in this version:
>
> 1. rework the series to take Catalin Marinas' comments on the semantics
> of smp_cond_load_relaxed() (and how earlier versions of this
> series were abusing them) into account.
There was a recent series adding resilient spinlocks which might also have
use for an smp_cond_load_{acquire,release}_timeout() interface.
(https://lore.kernel.org/lkml/20250107192202.GA36003@noisy.programming.kicks-ass.net/)
So, unless anybody has any objection I'm planning to split out this
series into two parts:
- adding smp_cond_load_*_timeout() (with an arm64 implementation)
- arm64 support for poll_idle() and haltpoll
Ankur
[ ... ]
--
ankur
On Mon, Jan 20, 2025 at 01:13:25PM -0800, Ankur Arora wrote:
>
> Ankur Arora <ankur.a.arora@oracle.com> writes:
>
> > This patchset adds support for polling in idle via poll_idle() on
> > arm64.
> >
> > There are two main changes in this version:
> >
> > 1. rework the series to take Catalin Marinas' comments on the semantics
> > of smp_cond_load_relaxed() (and how earlier versions of this
> > series were abusing them) into account.
>
> There was a recent series adding resilient spinlocks which might also have
> use for an smp_cond_load_{acquire,release}_timeout() interface.
> (https://lore.kernel.org/lkml/20250107192202.GA36003@noisy.programming.kicks-ass.net/)
Urgh, that reminds me that I need to go look at that...
Will