include/linux/eventpoll.h | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
Saves two function calls, and one stac/clac pair.
stac/clac is rather expensive on older cpus like Zen 2.
A synthetic network stress test gives a ~1.5% increase of pps
on AMD Zen 2.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kuniyuki Iwashima <kuniyu@google.com>
---
Prior attempts by Kuniyuki, in case some background is needed:
v2 : https://lkml.org/lkml/2025/10/28/1553
v1: https://lore.kernel.org/lkml/20251023000535.2897002-1-kuniyu@google.com/
Thanks !
---
include/linux/eventpoll.h | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index ccb478eb174bc2e4454cee59f805f10cfd5bb5fb..ea9ca0e4172aab13b8e576258e1d457a913dcebf 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -82,11 +82,14 @@ static inline struct epoll_event __user *
epoll_put_uevent(__poll_t revents, __u64 data,
struct epoll_event __user *uevent)
{
- if (__put_user(revents, &uevent->events) ||
- __put_user(data, &uevent->data))
- return NULL;
-
+ scoped_user_write_access_size(uevent, sizeof(*uevent), efault) {
+ unsafe_put_user(revents, &uevent->events, efault);
+ unsafe_put_user(data, &uevent->data, efault);
+ }
return uevent+1;
+
+efault:
+ return NULL;
}
#endif
base-commit: 4ae12d8bd9a830799db335ee661d6cbc6597f838
prerequisite-patch-id: f6002c357582927a383603a22e69bc0d7a5b9528
--
2.53.0.473.g4a7958ca14-goog
Le 07/03/2026 à 21:07, Eric Dumazet a écrit :
> Saves two function calls, and one stac/clac pair.
>
> stac/clac is rather expensive on older cpus like Zen 2.
>
> A synthetic network stress test gives a ~1.5% increase of pps
> on AMD Zen 2.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Kuniyuki Iwashima <kuniyu@google.com>
> ---
> Prior attempts by Kuniyuki, in case some background is needed:
>
> v2 : https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.org%2Flkml%2F2025%2F10%2F28%2F1553&data=05%7C02%7Cchristophe.leroy%40csgroup.eu%7C85a6664e663a4b224ce208de7c852734%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C639085108483835748%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=tYX5SAFAkM%2B4b2ooJnd8A%2FCO3hp8Evn1AmB0R5TFI4M%3D&reserved=0
> v1: https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20251023000535.2897002-1-kuniyu%40google.com%2F&data=05%7C02%7Cchristophe.leroy%40csgroup.eu%7C85a6664e663a4b224ce208de7c852734%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C639085108483859728%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=wfPZz01ZLu%2B4HABR941GaOE6PnsLlsAnm%2F0xVEsjM8g%3D&reserved=0
>
> Thanks !
> ---
> include/linux/eventpoll.h | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
> index ccb478eb174bc2e4454cee59f805f10cfd5bb5fb..ea9ca0e4172aab13b8e576258e1d457a913dcebf 100644
> --- a/include/linux/eventpoll.h
> +++ b/include/linux/eventpoll.h
> @@ -82,11 +82,14 @@ static inline struct epoll_event __user *
> epoll_put_uevent(__poll_t revents, __u64 data,
> struct epoll_event __user *uevent)
> {
> - if (__put_user(revents, &uevent->events) ||
> - __put_user(data, &uevent->data))
> - return NULL;
> -
> + scoped_user_write_access_size(uevent, sizeof(*uevent), efault) {
You can use scoped_user_write_access() instead which is defined as
#define scoped_user_write_access(udst, elbl) \
scoped_user_write_access_size(udst, sizeof(*(udst)), elbl)
> + unsafe_put_user(revents, &uevent->events, efault);
> + unsafe_put_user(data, &uevent->data, efault);
> + }
> return uevent+1;
> +
> +efault:
> + return NULL;
> }
> #endif
>
>
> base-commit: 4ae12d8bd9a830799db335ee661d6cbc6597f838
> prerequisite-patch-id: f6002c357582927a383603a22e69bc0d7a5b9528
On Sat, 7 Mar 2026 at 12:07, Eric Dumazet <edumazet@google.com> wrote:
>
> Saves two function calls, and one stac/clac pair.
Bah. I'll just apply this directly, because the source code looks
prettier, it generates better code, and it avoids the locally
unchecked user address, so it's arguably "more correct".
Yes, we checked the address range in ep_check_params, so the
previously unchecked user address wasn't buggy, but it's a pattern
we've actively tried to move away from.
Local checks avoid the worry about where the check has been done, and
the ep_check_params() check could now be removed...
Except we can't, because we have that magical ARM case due to the
horrific layout of this struct that then causes issues for the ARM
oabi situation.
So that ARM version of this same function would have to be fixed to
use this model too before we could actually remove the
ep_check_params() check.
Oh well. This makes one little piece of the kernel better, even if
other horrors lurk everywhere...
Linus
On Sat, 7 Mar 2026 at 15:33, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Oh well. This makes one little piece of the kernel better, even if
> other horrors lurk everywhere...
Side note: if you want to improve code generation just a *tiny* bit
more, you should make that epoll_put_uevent() return a success value
instead ot the updated pointer value.
Because right now the caller does this:
events = epoll_put_uevent(revents, epi->event.data, events);
if (!events) {
and that "if (!events)" test triggers even for the success case,
because the compiler doesn't know whether the "uevent+1" might
overflow (and we really *do* want to make sure the compiler doesn't
remove overflow checks even if it causes these kinds of issues).
If epoll_put_uevent() just returns a success/fail marker (or 0/-EFAULT
or whatever), we wouldn't have that issue.
HOWEVER - the same magical ARM case would need to be dealt with - the
real size of the user space notion of "struct epoll_event" isn't
actually known to generic code, because it ends up depending on
architecture-specific packing issues.
So you'd have to pass in 'events' by reference and epoll_put_uevent()
would still update it.
Probably not worth it - it would mainly get rid of a perfectly
predicted conditional branch, and might possibly help a tiny bit with
register liveness analysis. But it's unlikely to have any really
visible effects - it's not like the clac/stac overhead.
But since I looked at the generated code, I thought I'd mention it.
Linus
On Sun, Mar 8, 2026 at 12:45 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sat, 7 Mar 2026 at 15:33, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Oh well. This makes one little piece of the kernel better, even if
> > other horrors lurk everywhere...
>
> Side note: if you want to improve code generation just a *tiny* bit
> more, you should make that epoll_put_uevent() return a success value
> instead ot the updated pointer value.
>
> Because right now the caller does this:
>
> events = epoll_put_uevent(revents, epi->event.data, events);
> if (!events) {
>
> and that "if (!events)" test triggers even for the success case,
> because the compiler doesn't know whether the "uevent+1" might
> overflow (and we really *do* want to make sure the compiler doesn't
> remove overflow checks even if it causes these kinds of issues).
>
> If epoll_put_uevent() just returns a success/fail marker (or 0/-EFAULT
> or whatever), we wouldn't have that issue.
>
> HOWEVER - the same magical ARM case would need to be dealt with - the
> real size of the user space notion of "struct epoll_event" isn't
> actually known to generic code, because it ends up depending on
> architecture-specific packing issues.
>
> So you'd have to pass in 'events' by reference and epoll_put_uevent()
> would still update it.
>
> Probably not worth it - it would mainly get rid of a perfectly
> predicted conditional branch, and might possibly help a tiny bit with
> register liveness analysis. But it's unlikely to have any really
> visible effects - it's not like the clac/stac overhead.
>
> But since I looked at the generated code, I thought I'd mention it.
>
> Linus
Thanks Linus
BTW, not really a bug, but we could remove these extra semicolons
added in af4e9ef3d784 ("uaccess: Fix scoped_user_read_access() for
'pointer to const'")
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 809e4f7dfdbd4d4426ef38457ae108f119cb1bee..cdce14d864d6b71ea87396aee35acd6c9f188ae0
100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -654,15 +654,15 @@ static inline void user_access_restore(unsigned
long flags) { }
static __always_inline void __scoped_user_read_access_end(const void *p)
{
user_read_access_end();
-};
+}
static __always_inline void __scoped_user_write_access_end(const void *p)
{
user_write_access_end();
-};
+}
static __always_inline void __scoped_user_rw_access_end(const void *p)
{
user_access_end();
-};
+}
/**
* __scoped_user_access_begin - Start a scoped user access
On Sun, 8 Mar 2026 06:19:13 +0100
Eric Dumazet <edumazet@google.com> wrote:
> On Sun, Mar 8, 2026 at 12:45 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Sat, 7 Mar 2026 at 15:33, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > Oh well. This makes one little piece of the kernel better, even if
> > > other horrors lurk everywhere...
> >
> > Side note: if you want to improve code generation just a *tiny* bit
> > more, you should make that epoll_put_uevent() return a success value
> > instead ot the updated pointer value.
> >
> > Because right now the caller does this:
> >
> > events = epoll_put_uevent(revents, epi->event.data, events);
> > if (!events) {
> >
> > and that "if (!events)" test triggers even for the success case,
> > because the compiler doesn't know whether the "uevent+1" might
> > overflow (and we really *do* want to make sure the compiler doesn't
> > remove overflow checks even if it causes these kinds of issues).
> >
> > If epoll_put_uevent() just returns a success/fail marker (or 0/-EFAULT
> > or whatever), we wouldn't have that issue.
> >
> > HOWEVER - the same magical ARM case would need to be dealt with - the
> > real size of the user space notion of "struct epoll_event" isn't
> > actually known to generic code, because it ends up depending on
> > architecture-specific packing issues.
> >
> > So you'd have to pass in 'events' by reference and epoll_put_uevent()
> > would still update it.
> >
> > Probably not worth it - it would mainly get rid of a perfectly
> > predicted conditional branch, and might possibly help a tiny bit with
> > register liveness analysis. But it's unlikely to have any really
> > visible effects - it's not like the clac/stac overhead.
> >
> > But since I looked at the generated code, I thought I'd mention it.
> >
> > Linus
>
> Thanks Linus
>
> BTW, not really a bug, but we could remove these extra semicolons
> added in af4e9ef3d784 ("uaccess: Fix scoped_user_read_access() for
> 'pointer to const'")
>
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index 809e4f7dfdbd4d4426ef38457ae108f119cb1bee..cdce14d864d6b71ea87396aee35acd6c9f188ae0
> 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -654,15 +654,15 @@ static inline void user_access_restore(unsigned
> long flags) { }
> static __always_inline void __scoped_user_read_access_end(const void *p)
> {
> user_read_access_end();
> -};
> +}
> static __always_inline void __scoped_user_write_access_end(const void *p)
> {
> user_write_access_end();
> -};
> +}
> static __always_inline void __scoped_user_rw_access_end(const void *p)
> {
> user_access_end();
> -};
> +}
I was clearly writing those too quickly...
David
>
> /**
> * __scoped_user_access_begin - Start a scoped user access
>
© 2016 - 2026 Red Hat, Inc.