After commit 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in
the future"), the following program would immediately enter a busy
loop in the kernel:
```
#include <sys/epoll.h>
#include <time.h>

int main() {
    int e = epoll_create1(0);
    struct epoll_event event = {.events = EPOLLIN};
    epoll_ctl(e, EPOLL_CTL_ADD, 0, &event);
    const struct timespec timeout = {.tv_nsec = 1};
    epoll_pwait2(e, &event, 1, &timeout, 0);
}
```
This happens because the given (non-zero) timeout of 1 nanosecond
usually expires before ep_poll() is entered, so ep_schedule_timeout()
returns false. `timed_out` is then never set because the line that
sets it is skipped, and the wait loop never terminates. This quickly
turns into a soft lockup, RCU stalls and deadlocks, inflicting severe
headaches on the whole system.
When the timeout has expired, we don't need to schedule a hrtimer, but
we should set the `timed_out` variable. Therefore, I suggest moving
the ep_schedule_timeout() check into the `timed_out` expression
instead of skipping it.
Fixes: 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in the future")
Cc: Joe Damato <jdamato@fastly.com>
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
fs/eventpoll.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4bc264b854c4..d4dbffdedd08 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2111,9 +2111,10 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 
 		write_unlock_irq(&ep->lock);
 
-		if (!eavail && ep_schedule_timeout(to))
-			timed_out = !schedule_hrtimeout_range(to, slack,
-							      HRTIMER_MODE_ABS);
+		if (!eavail)
+			timed_out = !ep_schedule_timeout(to) ||
+				!schedule_hrtimeout_range(to, slack,
+							  HRTIMER_MODE_ABS);
 		__set_current_state(TASK_RUNNING);
 
 		/*
--
2.47.2
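
For readers who want to see the control-flow change in isolation, here is a minimal user-space sketch of the hunk above. It is a model, not kernel code: deadline_in_future() and sleep_until_deadline() are hypothetical stand-ins for ep_schedule_timeout() and schedule_hrtimeout_range(), the timestamps are plain integers, and the loop is capped by a made-up max_iters parameter so that the unpatched variant terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for ep_schedule_timeout(): true while the
 * absolute deadline still lies in the future. */
static bool deadline_in_future(long long deadline_ns, long long now_ns)
{
	return deadline_ns > now_ns;
}

/* Hypothetical stand-in for schedule_hrtimeout_range(): pretend the task
 * slept until the timer expired, which the kernel reports as 0. */
static int sleep_until_deadline(void)
{
	return 0;
}

/* Before the patch: the assignment is skipped for an expired deadline. */
static bool timed_out_before(bool eavail, long long deadline_ns,
			     long long now_ns)
{
	bool timed_out = false;

	if (!eavail && deadline_in_future(deadline_ns, now_ns))
		timed_out = !sleep_until_deadline();
	return timed_out;
}

/* After the patch: an expired deadline counts as a timeout, and the
 * short-circuit || still avoids sleeping in that case. */
static bool timed_out_after(bool eavail, long long deadline_ns,
			    long long now_ns)
{
	bool timed_out = false;

	if (!eavail)
		timed_out = !deadline_in_future(deadline_ns, now_ns) ||
			    !sleep_until_deadline();
	return timed_out;
}

/* Simplified wait loop with no events available; returns the number of
 * iterations executed, capped at max_iters (an artificial safety net). */
static int wait_loop_iterations(bool patched, long long deadline_ns,
				long long now_ns, int max_iters)
{
	bool timed_out = false;
	int iters = 0;

	assert(max_iters > 0);
	while (!timed_out && iters < max_iters) {
		iters++;
		timed_out = patched
			? timed_out_after(false, deadline_ns, now_ns)
			: timed_out_before(false, deadline_ns, now_ns);
	}
	return iters;
}
```

With an already-expired deadline (deadline_ns = 1, now_ns = 2), wait_loop_iterations(false, 1, 2, 1000) runs into the cap of 1000, while wait_loop_iterations(true, 1, 2, 1000) returns after a single iteration. For a deadline that is still in the future, both variants behave the same in this model.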
On Tue, 29 Apr 2025 20:58:27 +0200, Max Kellermann wrote:
> After commit 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in
> the future"), the following program would immediately enter a busy
> loop in the kernel:
>
> ```
> int main() {
>     int e = epoll_create1(0);
>     struct epoll_event event = {.events = EPOLLIN};
>     epoll_ctl(e, EPOLL_CTL_ADD, 0, &event);
>     const struct timespec timeout = {.tv_nsec = 1};
>     epoll_pwait2(e, &event, 1, &timeout, 0);
> }
> ```
>
> [...]
I've taken this version but also credited Joe in the commit message,
with a note that I added that information.
---
Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes
[1/1] fs/eventpoll: fix endless busy loop after timeout has expired
https://git.kernel.org/vfs/vfs/c/d9ec73301099
On Tue 29-04-25 20:58:27, Max Kellermann wrote:
> After commit 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in
> the future"), the following program would immediately enter a busy
> loop in the kernel:
>
> ```
> int main() {
>     int e = epoll_create1(0);
>     struct epoll_event event = {.events = EPOLLIN};
>     epoll_ctl(e, EPOLL_CTL_ADD, 0, &event);
>     const struct timespec timeout = {.tv_nsec = 1};
>     epoll_pwait2(e, &event, 1, &timeout, 0);
> }
> ```
>
> This happens because the given (non-zero) timeout of 1 nanosecond
> usually expires before ep_poll() is entered, so ep_schedule_timeout()
> returns false. `timed_out` is then never set because the line that
> sets it is skipped, and the wait loop never terminates. This quickly
> turns into a soft lockup, RCU stalls and deadlocks, inflicting severe
> headaches on the whole system.
>
> When the timeout has expired, we don't need to schedule a hrtimer, but
> we should set the `timed_out` variable. Therefore, I suggest moving
> the ep_schedule_timeout() check into the `timed_out` expression
> instead of skipping it.
>
> Fixes: 0a65bc27bd64 ("eventpoll: Set epoll timeout if it's in the future")
> Cc: Joe Damato <jdamato@fastly.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
I agree this makes the logic somewhat more obvious than Joe's fix so feel
free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Thanks!
Honza
> ---
> fs/eventpoll.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 4bc264b854c4..d4dbffdedd08 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -2111,9 +2111,10 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
>
>  		write_unlock_irq(&ep->lock);
>
> -		if (!eavail && ep_schedule_timeout(to))
> -			timed_out = !schedule_hrtimeout_range(to, slack,
> -							      HRTIMER_MODE_ABS);
> +		if (!eavail)
> +			timed_out = !ep_schedule_timeout(to) ||
> +				!schedule_hrtimeout_range(to, slack,
> +							  HRTIMER_MODE_ABS);
>  		__set_current_state(TASK_RUNNING);
>
> /*
> --
> 2.47.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR