It is considered good practice to call cpu_relax() in busy loops, see
Documentation/process/volatile-considered-harmful.rst. This can not
only lower CPU power consumption or yield to a hyperthreaded twin
processor, but also allows an architecture to mitigate hardware issues
(e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the
architecture-specific cpu_relax() implementation.
In addition, cpu_relax() is also a compiler barrier. It is not
immediately obvious that the @op argument "function" will result in an
actual function call (e.g. in case of inlining).
Where a function call is a C sequence point, this is lost on inlining.
Therefore, with agressive enough optimization it might be possible for
the compiler to hoist the:
(val) = op(args);
"load" out of the loop because it doesn't see the value changing. The
addition of cpu_relax() would inhibit this.
As the iopoll helpers lack calls to cpu_relax(), people are sometimes
reluctant to use them, and may fall back to open-coded polling loops
(including cpu_relax() calls) instead.
Fix this by adding calls to cpu_relax() to the iopoll helpers:
- For the non-atomic case, it is sufficient to call cpu_relax() in
case of a zero sleep-between-reads value, as a call to
usleep_range() is a safe barrier otherwise. However, it doesn't
hurt to add the call regardless, for simplicity, and for similarity
with the atomic case below.
- For the atomic case, cpu_relax() must be called regardless of the
sleep-between-reads value, as there is no guarantee all
architecture-specific implementations of udelay() handle this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
v2:
- Add Acked-by,
- Add compiler barrier and inlining explanation (thanks, Peter!).
---
include/linux/iopoll.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
index 2c8860e406bd8cae..0417360a6db9b0d6 100644
--- a/include/linux/iopoll.h
+++ b/include/linux/iopoll.h
@@ -53,6 +53,7 @@
} \
if (__sleep_us) \
usleep_range((__sleep_us >> 2) + 1, __sleep_us); \
+ cpu_relax(); \
} \
(cond) ? 0 : -ETIMEDOUT; \
})
@@ -95,6 +96,7 @@
} \
if (__delay_us) \
udelay(__delay_us); \
+ cpu_relax(); \
} \
(cond) ? 0 : -ETIMEDOUT; \
})
--
2.34.1
On Wed, 10 May 2023 at 15:23, Geert Uytterhoeven <geert+renesas@glider.be> wrote: > > It is considered good practice to call cpu_relax() in busy loops, see > Documentation/process/volatile-considered-harmful.rst. This can not > only lower CPU power consumption or yield to a hyperthreaded twin > processor, but also allows an architecture to mitigate hardware issues > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the > architecture-specific cpu_relax() implementation. > > In addition, cpu_relax() is also a compiler barrier. It is not > immediately obvious that the @op argument "function" will result in an > actual function call (e.g. in case of inlining). > > Where a function call is a C sequence point, this is lost on inlining. > Therefore, with agressive enough optimization it might be possible for > the compiler to hoist the: > > (val) = op(args); > > "load" out of the loop because it doesn't see the value changing. The > addition of cpu_relax() would inhibit this. > > As the iopoll helpers lack calls to cpu_relax(), people are sometimes > reluctant to use them, and may fall back to open-coded polling loops > (including cpu_relax() calls) instead. > > Fix this by adding calls to cpu_relax() to the iopoll helpers: > - For the non-atomic case, it is sufficient to call cpu_relax() in > case of a zero sleep-between-reads value, as a call to > usleep_range() is a safe barrier otherwise. However, it doesn't > hurt to add the call regardless, for simplicity, and for similarity > with the atomic case below. > - For the atomic case, cpu_relax() must be called regardless of the > sleep-between-reads value, as there is no guarantee all > architecture-specific implementations of udelay() handle this. > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> > Acked-by: Arnd Bergmann <arnd@arndb.de> Makes sense to me! Feel free to add: Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Kind regards Uffe > --- > v2: > - Add Acked-by, > - Add compiler barrier and inlining explanation (thanks, Peter!). > --- > include/linux/iopoll.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h > index 2c8860e406bd8cae..0417360a6db9b0d6 100644 > --- a/include/linux/iopoll.h > +++ b/include/linux/iopoll.h > @@ -53,6 +53,7 @@ > } \ > if (__sleep_us) \ > usleep_range((__sleep_us >> 2) + 1, __sleep_us); \ > + cpu_relax(); \ > } \ > (cond) ? 0 : -ETIMEDOUT; \ > }) > @@ -95,6 +96,7 @@ > } \ > if (__delay_us) \ > udelay(__delay_us); \ > + cpu_relax(); \ > } \ > (cond) ? 0 : -ETIMEDOUT; \ > }) > -- > 2.34.1 >
* Geert Uytterhoeven <geert+renesas@glider.be> [230510 13:23]: > It is considered good practice to call cpu_relax() in busy loops, see > Documentation/process/volatile-considered-harmful.rst. This can not > only lower CPU power consumption or yield to a hyperthreaded twin > processor, but also allows an architecture to mitigate hardware issues > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the > architecture-specific cpu_relax() implementation. > > In addition, cpu_relax() is also a compiler barrier. It is not > immediately obvious that the @op argument "function" will result in an > actual function call (e.g. in case of inlining). > > Where a function call is a C sequence point, this is lost on inlining. > Therefore, with agressive enough optimization it might be possible for > the compiler to hoist the: > > (val) = op(args); > > "load" out of the loop because it doesn't see the value changing. The > addition of cpu_relax() would inhibit this. > > As the iopoll helpers lack calls to cpu_relax(), people are sometimes > reluctant to use them, and may fall back to open-coded polling loops > (including cpu_relax() calls) instead. > > Fix this by adding calls to cpu_relax() to the iopoll helpers: > - For the non-atomic case, it is sufficient to call cpu_relax() in > case of a zero sleep-between-reads value, as a call to > usleep_range() is a safe barrier otherwise. However, it doesn't > hurt to add the call regardless, for simplicity, and for similarity > with the atomic case below. > - For the atomic case, cpu_relax() must be called regardless of the > sleep-between-reads value, as there is no guarantee all > architecture-specific implementations of udelay() handle this. Reviewed-by: Tony Lindgren <tony@atomide.com>
From: Tony Lindgren > Sent: 11 May 2023 07:49 > > * Geert Uytterhoeven <geert+renesas@glider.be> [230510 13:23]: > > It is considered good practice to call cpu_relax() in busy loops, see > > Documentation/process/volatile-considered-harmful.rst. This can not > > only lower CPU power consumption or yield to a hyperthreaded twin > > processor, but also allows an architecture to mitigate hardware issues > > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the > > architecture-specific cpu_relax() implementation. Don't you also need to call cond_resched() (at least some times). Otherwise the process can't be pre-empted and a RT process that last ran on that cpu will never be scheduled. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Hi David,
On Thu, May 11, 2023 at 12:49 PM David Laight <David.Laight@aculab.com> wrote:
> > * Geert Uytterhoeven <geert+renesas@glider.be> [230510 13:23]:
> > > It is considered good practice to call cpu_relax() in busy loops, see
> > > Documentation/process/volatile-considered-harmful.rst. This can not
> > > only lower CPU power consumption or yield to a hyperthreaded twin
> > > processor, but also allows an architecture to mitigate hardware issues
> > > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the
> > > architecture-specific cpu_relax() implementation.
>
> Don't you also need to call cond_resched() (at least some times).
> Otherwise the process can't be pre-empted and a RT process
> that last ran on that cpu will never be scheduled.
According to [1], cond_resched() must be called at least once per few
tens of milliseconds.
read_poll_timeout() uses usleep_range(), which calls schedule_hrtimeout_range().
read_poll_timeout_atomic() should not be used with multi-ms timeouts anyway.
So I guess we're OK?
[1] https://elixir.bootlin.com/linux/latest/source/Documentation/RCU/Design/Requirements/Requirements.rst#L2348
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
From: Geert Uytterhoeven > Sent: 23 May 2023 08:30 > > Hi David, > > On Thu, May 11, 2023 at 12:49 PM David Laight <David.Laight@aculab.com> wrote: > > > * Geert Uytterhoeven <geert+renesas@glider.be> [230510 13:23]: > > > > It is considered good practice to call cpu_relax() in busy loops, see > > > > Documentation/process/volatile-considered-harmful.rst. This can not > > > > only lower CPU power consumption or yield to a hyperthreaded twin > > > > processor, but also allows an architecture to mitigate hardware issues > > > > (e.g. ARM Erratum 754327 for Cortex-A9 prior to r2p0) in the > > > > architecture-specific cpu_relax() implementation. > > > > Don't you also need to call cond_resched() (at least some times). > > Otherwise the process can't be pre-empted and a RT process > > that last ran on that cpu will never be scheduled. > > According to [1], cond_resched() must be called at least once per few > tens of milliseconds. Hmmm.... tens of milliseconds is really much too long for RT threads. A limit nearer 1ms would be barely acceptable. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
© 2016 - 2026 Red Hat, Inc.