ARCH_HAS_CPU_RELAX is a bit of a misnomer since all architectures
define cpu_relax(). Not all, however, have a performant version, with
some only implementing it as a compiler barrier.
In contexts that this config option is used, it is expected to provide
an architectural primitive that can be used as part of a polling
mechanism -- one that would be cheaper than spinning in a tight loop.
Advertise the availability of such a primitive by renaming to
ARCH_HAS_OPTIMIZED_POLL. And, while at it, explicitly condition
cpuidle-haltpoll and intel-idle, both of which depend on a polling
state, on it.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
arch/x86/Kconfig | 2 +-
drivers/acpi/processor_idle.c | 4 ++--
drivers/cpuidle/Kconfig | 2 +-
drivers/cpuidle/Makefile | 2 +-
drivers/idle/Kconfig | 1 +
include/linux/cpuidle.h | 2 +-
6 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4474bf32d0a4..b238c874875a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -368,7 +368,7 @@ config ARCH_MAY_HAVE_PC_FDC
config GENERIC_CALIBRATE_DELAY
def_bool y
-config ARCH_HAS_CPU_RELAX
+config ARCH_HAS_OPTIMIZED_POLL
def_bool y
config ARCH_HIBERNATION_POSSIBLE
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index bd6a7857ce05..ccef38410950 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -36,7 +36,7 @@
#include <asm/cpu.h>
#endif
-#define ACPI_IDLE_STATE_START (IS_ENABLED(CONFIG_ARCH_HAS_CPU_RELAX) ? 1 : 0)
+#define ACPI_IDLE_STATE_START (IS_ENABLED(CONFIG_ARCH_HAS_OPTIMIZED_POLL) ? 1 : 0)
static unsigned int max_cstate __read_mostly = ACPI_PROCESSOR_MAX_POWER;
module_param(max_cstate, uint, 0400);
@@ -787,7 +787,7 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
if (max_cstate == 0)
max_cstate = 1;
- if (IS_ENABLED(CONFIG_ARCH_HAS_CPU_RELAX)) {
+ if (IS_ENABLED(CONFIG_ARCH_HAS_OPTIMIZED_POLL)) {
cpuidle_poll_state_init(drv);
count = 1;
} else {
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index cac5997dca50..75f6e176bbc8 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -73,7 +73,7 @@ endmenu
config HALTPOLL_CPUIDLE
tristate "Halt poll cpuidle driver"
- depends on X86 && KVM_GUEST
+ depends on X86 && KVM_GUEST && ARCH_HAS_OPTIMIZED_POLL
select CPU_IDLE_GOV_HALTPOLL
default y
help
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index d103342b7cfc..f29dfd1525b0 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -7,7 +7,7 @@ obj-y += cpuidle.o driver.o governor.o sysfs.o governors/
obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o
obj-$(CONFIG_DT_IDLE_GENPD) += dt_idle_genpd.o
-obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o
+obj-$(CONFIG_ARCH_HAS_OPTIMIZED_POLL) += poll_state.o
obj-$(CONFIG_HALTPOLL_CPUIDLE) += cpuidle-haltpoll.o
##################################################################################
diff --git a/drivers/idle/Kconfig b/drivers/idle/Kconfig
index 6707d2539fc4..6f9b1d48fede 100644
--- a/drivers/idle/Kconfig
+++ b/drivers/idle/Kconfig
@@ -4,6 +4,7 @@ config INTEL_IDLE
depends on CPU_IDLE
depends on X86
depends on CPU_SUP_INTEL
+ depends on ARCH_HAS_OPTIMIZED_POLL
help
Enable intel_idle, a cpuidle driver that includes knowledge of
native Intel hardware idle features. The acpi_idle driver
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 3183aeb7f5b4..7e7e58a17b07 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -275,7 +275,7 @@ static inline void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev,
}
#endif
-#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_HAS_CPU_RELAX)
+#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_HAS_OPTIMIZED_POLL)
void cpuidle_poll_state_init(struct cpuidle_driver *drv);
#else
static inline void cpuidle_poll_state_init(struct cpuidle_driver *drv) {}
--
2.39.3
On Tue, 30 Apr 2024, Ankur Arora wrote: > ARCH_HAS_CPU_RELAX is a bit of a misnomer since all architectures > define cpu_relax(). Not all, however, have a performant version, with > some only implementing it as a compiler barrier. > > In contexts that this config option is used, it is expected to provide > an architectural primitive that can be used as part of a polling > mechanism -- one that would be cheaper than spinning in a tight loop. The intend of cpu_relax() is not a polling mechanism. Initial AFAICT it was introduced on x86 as the REP NOP instruction. Aka as PAUSE. And it was part of a spin loop. So there was no connection to polling anything. The intend was to make the processor aware that we are in a spin loop. Various processors have different actions that they take upon encountering such a cpu relax operation. The polling (WFE/WFI) available on ARM (and potentially other platforms) is a different mechanism that is actually intended to reduce the power requirement of the processor until a certain condition is met and that check is done in hardware. These are not the same and I think we need both config options. The issues that you have with WFET later in the patchset arise from not making this distinction. The polling (waiting for an event) could be implemented for a processor not supporting that in hardware by using a loop that checks for the condition and then does a cpu_relax(). With that you could f.e. support the existing cpu_relax() and also have some form of cpu_poll() interface.
Christoph Lameter (Ampere) <cl@gentwo.org> writes: > On Tue, 30 Apr 2024, Ankur Arora wrote: > >> ARCH_HAS_CPU_RELAX is a bit of a misnomer since all architectures >> define cpu_relax(). Not all, however, have a performant version, with >> some only implementing it as a compiler barrier. >> >> In contexts that this config option is used, it is expected to provide >> an architectural primitive that can be used as part of a polling >> mechanism -- one that would be cheaper than spinning in a tight loop. > > The intend of cpu_relax() is not a polling mechanism. Initial AFAICT it was > introduced on x86 as the REP NOP instruction. Aka as PAUSE. And it was part of a > spin loop. So there was no connection to polling anything. Agreed, cpu_relax() is just a mechanism to tell the pipeline that we are in a spin-loop. > The intend was to make the processor aware that we are in a spin loop. Various > processors have different actions that they take upon encountering such a cpu > relax operation. Sure, though most processors don't have a nice mechanism to do that. x86 clearly has the REP; NOP thing. arm64 only has a YIELD which from my measurements is basically a NOP when executed on a system without hardware threads. And that's why only x86 defines ARCH_HAS_CPU_RELAX. > The polling (WFE/WFI) available on ARM (and potentially other platforms) is a > different mechanism that is actually intended to reduce the power requirement of > the processor until a certain condition is met and that check is done in > hardware. Sure. Which almost exactly fits the bill for the poll-idle loop -- except for the timeout part. > These are not the same and I think we need both config options. My main concern is that poll_idle() conflates polling in idle with ARCH_HAS_CPU_RELAX, when they aren't really related. So, poll_idle(), and its users should depend on ARCH_HAS_OPTIMIZED_POLL which, if defined by some architecture, means that poll_idle() would be better than a spin-wait loop. Beyond that I'm okay to keep ARCH_HAS_CPU_RELAX around. That said, do you see a use for ARCH_HAS_CPU_RELAX? The only current user is the poll-idle path. > The issues that you have with WFET later in the patchset arise from not making > this distinction. Did you mean the issue with WFE? I'm not using WFET in this patchset at all. With WFE, sure there's a problem in that you depend on an interrupt or the event-stream to get out of the wait. And, so sometimes you would overshoot the target poll timeout. > The polling (waiting for an event) could be implemented for a processor not > supporting that in hardware by using a loop that checks for the condition and > then does a cpu_relax(). Yeah. That's exactly what patch-6 does. smp_cond_load_relaxed() uses cpu_relax() internally in its spin-loop variant (non arm64). On arm64, this would use LDXR; WFE. Or are you suggesting implementing the arm64 loop via cpu_relax() (and thus YIELD?) Ankur > With that you could f.e. support the existing cpu_relax() and also have some > form of cpu_poll() interface.
© 2016 - 2025 Red Hat, Inc.