[v1] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER

[PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER

Posted by Sasha Levin 4 days, 11 hours ago

From: Maoyi Xie <maoyixie.tju@gmail.com>

[ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]

io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
timespec from the caller via ext_arg->ts. It arms an ABS mode
hrtimer in __io_cqring_wait_schedule(). The conversion path in
io_uring/wait.c parses ext_arg->ts inline rather than going
through io_parse_user_time(). It therefore does not pick up the
time namespace conversion added by the previous patch.

Apply timens_ktime_to_host() to the parsed time on the
IORING_ENTER_ABS_TIMER branch. This mirrors the IORING_TIMEOUT_ABS
fix in io_parse_user_time(). Use ctx->clockid as the clock id.
ctx->clockid is set either at ring creation or via
IORING_REGISTER_CLOCK.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces. It is also a no-op for callers in the initial time
namespace. The fast path is unchanged.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, call io_uring_enter with min_complete=1,
IORING_ENTER_ABS_TIMER, and ts = now + 1s. The call returns
-ETIME after <1ms instead of after the expected ~1s.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Link: https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Phase Walkthrough

### Phase 1: Commit Message Forensics
Record: Subsystem `io_uring/wait`; action verb `honour`; intent is to
make `IORING_ENTER_ABS_TIMER` interpret caller absolute times in the
caller’s time namespace.

Record: Tags present:
`Suggested-by: Pavel Begunkov`, `Suggested-by: Jens Axboe`, author
`Signed-off-by: Maoyi Xie`, `Link:
https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg`,
maintainer `Signed-off-by: Jens Axboe`. No `Fixes:`, `Reported-by:`,
`Tested-by:`, `Reviewed-by`, `Acked-by`, or `Cc: stable`.

Record: The commit describes a real userspace-visible bug:
`io_uring_enter()` with `IORING_ENTER_ABS_TIMER` parses `ext_arg->ts`
directly, then arms an absolute hrtimer without converting from the
caller’s time namespace to host time. The supplied reproducer in
`unshare --user --time` with a `-10s` monotonic offset returns `-ETIME`
in under 1 ms instead of about 1 second.

Record: This is not hidden cleanup. It is a direct correctness fix for
absolute timeout interpretation in time namespaces.

### Phase 2: Diff Analysis
Record: One file changed, `io_uring/wait.c`, 5 insertions and 1
deletion. Function modified: `io_cqring_wait()`. Scope: single-file
surgical fix.

Record: Before, `ext_arg->ts` was converted with
`timespec64_to_ktime()`. If `IORING_ENTER_ABS_TIMER` was unset, the code
added `start_time`; if set, it used the raw caller value as a host
absolute deadline. After, the absolute branch calls
`timens_ktime_to_host(ctx->clockid, iowq.timeout)`, while the relative
branch remains unchanged.

Record: Bug category is logic/correctness in time namespace handling.
The broken mechanism is that a namespaced absolute
`CLOCK_MONOTONIC`/`CLOCK_BOOTTIME` timestamp was fed to a host hrtimer
as if it were already in host time.
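
The mechanism can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: times are plain nanosecond integers, `MODEL_ABS_TIMER` is a hypothetical stand-in for `IORING_ENTER_ABS_TIMER` (not its real value), `model_to_host()` reduces `timens_ktime_to_host()` to a single subtraction, and the host clock value of 100 s is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
/* Hypothetical stand-in for IORING_ENTER_ABS_TIMER; not the real flag value. */
#define MODEL_ABS_TIMER 0x1u

/* Simplified stand-in for timens_ktime_to_host(): namespace time minus the
 * namespace's clock offset gives host time. */
int64_t model_to_host(int64_t ns_time, int64_t offset)
{
	return ns_time - offset;
}

/* Deadline as described for the pre-patch code: absolute values pass
 * through unconverted. */
int64_t deadline_before(int64_t ts, unsigned int flags, int64_t start_time)
{
	if (!(flags & MODEL_ABS_TIMER))
		return ts + start_time;
	return ts;
}

/* Deadline as described for the patched code: absolute values are
 * converted to host time, relative values are handled as before. */
int64_t deadline_after(int64_t ts, unsigned int flags, int64_t start_time,
		       int64_t offset)
{
	if (flags & MODEL_ABS_TIMER)
		return model_to_host(ts, offset);
	return ts + start_time;
}

/* Worked example with the reproducer's numbers; returns 0 if all hold. */
int worked_example(void)
{
	int64_t host_now = 100 * NSEC_PER_SEC; /* arbitrary host monotonic time */
	int64_t offset = -10 * NSEC_PER_SEC;   /* namespace monotonic offset */
	int64_t ns_now = host_now + offset;    /* what the caller reads: 90 s */
	int64_t ts = ns_now + NSEC_PER_SEC;    /* caller asks for now + 1 s */

	/* Before: the deadline lies 9 s in the host's past. */
	assert(deadline_before(ts, MODEL_ABS_TIMER, host_now) - host_now ==
	       -9 * NSEC_PER_SEC);
	/* After: the deadline lies 1 s in the host's future. */
	assert(deadline_after(ts, MODEL_ABS_TIMER, host_now, offset) - host_now ==
	       NSEC_PER_SEC);
	/* Relative waits produce the same deadline in both versions. */
	assert(deadline_before(NSEC_PER_SEC, 0, host_now) ==
	       deadline_after(NSEC_PER_SEC, 0, host_now, offset));
	return 0;
}
```

In this model the unconverted deadline is already 9 s stale when the timer is armed, which is consistent with the reported sub-millisecond `-ETIME`. A positive offset would push the deadline later than requested by the same amount.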

Record: Fix quality is strong: minimal, local, uses existing kernel
helper, and no new API. Regression risk is very low because
`timens_ktime_to_host()` is verified as a no-op for the initial time
namespace, for unsupported clocks, and when `CONFIG_TIME_NS` is
disabled.
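
The no-op cases can be sketched as a userspace model of the helper's decision logic. This is an illustration under assumptions, not the kernel implementation: `struct model_offsets`, `model_timens_to_host()`, and the `in_initial_ns` parameter are hypothetical names, times are nanosecond integers, and any range clamping the real helper may perform is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-namespace clock offsets, in nanoseconds. */
struct model_offsets {
	int64_t monotonic;
	int64_t boottime;
};

/*
 * Model of the conversion's three outcomes:
 *  - caller in the initial time namespace: value returned unchanged
 *  - clock not affected by time namespaces: value returned unchanged
 *  - CLOCK_MONOTONIC / CLOCK_BOOTTIME in a non-initial namespace:
 *    the matching offset is subtracted to obtain host time
 */
int64_t model_timens_to_host(int in_initial_ns, clockid_t clockid,
			     int64_t tim, const struct model_offsets *off)
{
	int64_t offset;

	assert(off);

	if (in_initial_ns)
		return tim;

	switch (clockid) {
	case CLOCK_MONOTONIC:
		offset = off->monotonic;
		break;
	case CLOCK_BOOTTIME:
		offset = off->boottime;
		break;
	default:
		/* e.g. CLOCK_REALTIME: no namespace offset applies */
		return tim;
	}

	return tim - offset;
}
```

Only the last outcome changes the value, so in this model a caller in the initial namespace, or a ring using a clock without a namespace offset, gets the same deadline as before the patch.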

### Phase 3: Git History Investigation
Record: `git blame` on the changed wait lines points to `0105b0562a5e`
(`io_uring: split out CQ waiting code into wait.c`) for the current file
location. The same logic predates the split; `2b8e976b9842` (`io_uring:
user registered clockid for wait timeouts`) shows this absolute-wait
path using `ctx->clockid` and is contained by `v6.12-rc1`.

Record: No `Fixes:` tag is present, so there was no tagged introducing
commit to follow. I inspected the companion parent commit instead:
`9cc6bac1bebf` fixes the same time-namespace issue for
`IORING_TIMEOUT_ABS`.

Record: Recent related history shows this is patch 2/2 after
`9cc6bac1bebf`. The candidate’s parent is exactly `9cc6bac1bebf`, but
this wait fix compiles independently as long as `timens_ktime_to_host()`
and `ctx->clockid` exist.

Record: Author history in `io_uring` before this commit only showed the
companion timeout fix. Jens Axboe applied the patch, and Pavel/Jens were
suggested-by/review participants.

Record: Dependencies: affected stable trees need `ctx->clockid` and
`timens_ktime_to_host()`. I verified both exist in local
`for-greg/6.12-100`; the same `IORING_ENTER_ABS_TIMER` buggy line exists
in `6.12`, `6.18`, `6.19`, and `7.0` local stable branches, but not in
`5.10`, `5.15`, `6.1`, or `6.6`.

### Phase 4: Mailing List And External Research
Record: `b4 dig -c 45d2b37a37ab...` found the original submission at
`https://patch.msgid.link/20260504153755.1293932-3-maoyi.xie@ntu.edu.sg`.

Record: `b4 dig -a` found only v1 of the series. The thread shows Jens
applied both patches with commit IDs `9cc6bac1bebf` and `45d2b37a37ab`.

Record: `b4 dig -w` shows the right people/lists were included: Maoyi
Xie, Jens Axboe, Pavel Begunkov, `io-uring@vger.kernel.org`, and
`linux-kernel@vger.kernel.org`.

Record: Reviewer feedback was positive: Pavel wrote “both look good” and
requested a liburing test; Jens replied “+1” for the test and later
applied the series. No NAKs or objections found.

Record: No separate bug-report link exists beyond the patch
thread/reproducer. Stable-specific WebFetch was blocked by Anubis, and
local thread search found no stable nomination.

### Phase 5: Code Semantic Analysis
Record: Modified function: `io_cqring_wait()`.

Record: Callers: `io_uring_enter(2)` reaches `io_cqring_wait()` when
`IORING_ENTER_GETEVENTS` is set, after `io_get_ext_arg()` copies/parses
the userspace getevents argument. This is directly syscall-reachable.

Record: Key callees: `timespec64_to_ktime()`, `timens_ktime_to_host()`,
`ktime_add()`, `io_get_time()`, `io_cqring_schedule_timeout()`, and
hrtimer setup/start helpers.

Record: Call chain: userspace `io_uring_enter()` -> `io_get_ext_arg()`
-> `io_cqring_wait()` -> `io_cqring_wait_schedule()` ->
`__io_cqring_wait_schedule()` -> `io_cqring_schedule_timeout()` ->
absolute hrtimer. The buggy path is reachable from userspace with
`IORING_ENTER_GETEVENTS | IORING_ENTER_EXT_ARG |
IORING_ENTER_ABS_TIMER`.

Record: Similar patterns: the companion commit fixes
`io_parse_user_time()` for `IORING_TIMEOUT_ABS`; POSIX timers,
`clock_nanosleep`, alarm timers, and `timerfd` already use
`timens_ktime_to_host()` for absolute timers.

### Phase 6: Stable Tree Analysis
Record: Local stable-branch grep found the buggy
`IORING_ENTER_ABS_TIMER` code in `for-greg/6.12-100`,
`for-greg/6.18-100`, `for-greg/6.19-200`, and `for-greg/7.0-100`. It was
absent from `5.10`, `5.15`, `6.1`, and `6.6`.

Record: Backport difficulty: current `7.0.y` apply check succeeds
cleanly. `6.12`/`7.0` have `io_uring/wait.c`; `6.18`/`6.19` local
branches have the same logic in `io_uring/io_uring.c`, so those need a
path/context backport but not semantic rework.

Record: No related fix with this subject was found in the checked stable
candidate branches.

### Phase 7: Subsystem Context
Record: Subsystem is `io_uring`, a core async I/O syscall subsystem.
Criticality: IMPORTANT, not universal core MM/VFS, but directly
userspace-facing and widely used.

Record: Subsystem activity is high; recent `io_uring` history has many
fixes and feature changes. This specific change is small despite the
active subsystem.

### Phase 8: Impact And Risk
Record: Affected population: users of `io_uring_enter()` absolute CQ
wait timeouts inside non-initial time namespaces, especially container-
like environments. Branch-limited to stable trees that contain
`IORING_ENTER_ABS_TIMER`.

Record: Trigger: userspace can trigger via `io_uring_enter()` with
`IORING_ENTER_ABS_TIMER` and a timespec from a shifted time namespace.
The provided reproducer uses `unshare --user --time`; whether fully
unprivileged depends on system user-namespace policy.

Record: Failure mode: incorrect timeout behavior. With the reproduced
negative offset, the wait returns `-ETIME` immediately; with other
offsets, absolute waits can be delayed incorrectly. Severity: MEDIUM to
HIGH user-visible correctness bug, potential application timeout/hang
behavior, but not a kernel crash, memory corruption, or security fix.

Record: Benefit is moderate/high for affected containerized users
because it restores syscall semantics. Risk is very low: one local
conditional change plus an include, using established helper semantics.

### Phase 9: Final Synthesis
Record: Evidence for backporting: real reproduced bug, syscall-
reachable, affects stable branches with the feature, tiny patch,
maintainer-applied, positive reviewer feedback, matches established
time-namespace behavior elsewhere.

Record: Evidence against backporting: not a crash/security/data-
corruption fix; affects a narrower feature combination; no explicit
stable nomination; older stable trees do not contain the affected
feature.

Record: Unresolved: I did not run the reproducer locally. Lore WebFetch
was blocked by Anubis, but `b4` successfully fetched the thread. Exact
first upstream introduction of `IORING_ENTER_ABS_TIMER` was not cleanly
reconstructed from local blame alone, but affected stable branches were
directly verified by grep.

Stable rules:
1. Obviously correct and tested: yes by code inspection, reproducer, and
   positive review; no formal `Tested-by`.
2. Fixes a real bug: yes, reproduced wrong timeout result.
3. Important issue: yes for affected users, because absolute waits can
   return immediately or at the wrong time.
4. Small and contained: yes, 6-line single-function change.
5. No new features/APIs: yes.
6. Can apply to stable: yes for current `7.0.y`; minor path adjustment
   may be needed in some branches.

No automatic exception category applies.

## Verification
- [Phase 1] Parsed `git show` commit message and tags for
  `45d2b37a37ab98484693533496395c610a2cab96`.
- [Phase 2] Verified diff is one file, `io_uring/wait.c`, 5 insertions/1
  deletion in `io_cqring_wait()`.
- [Phase 3] Ran `git blame` on the changed lines; current file location
  comes from `0105b0562a5e`.
- [Phase 3] Inspected `2b8e976b9842`; verified `ctx->clockid`,
  `io_get_time(ctx)`, and selected-clock wait timeout support.
- [Phase 3] Inspected companion commit `9cc6bac1bebf`; verified same
  class of fix for `IORING_TIMEOUT_ABS`.
- [Phase 4] Ran `b4 dig`, `b4 dig -a`, `b4 dig -w`, and `b4 mbox`;
  verified v1-only series, correct recipients, positive feedback, and
  applied notice.
- [Phase 5] Read `io_uring_enter()` and `io_get_ext_arg()` call path;
  verified direct syscall reachability.
- [Phase 5] Verified `timens_ktime_to_host()` behavior in
  `include/linux/time_namespace.h` and `kernel/time/namespace.c`.
- [Phase 5] Verified similar established conversions in `kernel/time`
  and `fs/timerfd.c`.
- [Phase 6] Ran `git grep` on local stable branches; affected: `6.12`,
  `6.18`, `6.19`, `7.0`; unaffected: `5.10`, `5.15`, `6.1`, `6.6`.
- [Phase 6] Ran `git apply --check` for the candidate patch on current
  `7.0.y`; it applies cleanly.
- [Phase 8] Verified reproducer details from commit and mailing-list
  cover letter; did not execute it locally.

This should be backported to stable trees that contain
`IORING_ENTER_ABS_TIMER`, with the companion timeout patch strongly
recommended for complete io_uring absolute-timeout time-namespace
correctness.

**YES**

 io_uring/wait.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/io_uring/wait.c b/io_uring/wait.c
index 91df86ce0d18c..ec01e78a216d6 100644
--- a/io_uring/wait.c
+++ b/io_uring/wait.c
@@ -5,6 +5,7 @@
 #include <linux/kernel.h>
 #include <linux/sched/signal.h>
 #include <linux/io_uring.h>
+#include <linux/time_namespace.h>
 
 #include <trace/events/io_uring.h>
 
@@ -229,7 +230,10 @@ int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 
 	if (ext_arg->ts_set) {
 		iowq.timeout = timespec64_to_ktime(ext_arg->ts);
-		if (!(flags & IORING_ENTER_ABS_TIMER))
+		if (flags & IORING_ENTER_ABS_TIMER)
+			iowq.timeout = timens_ktime_to_host(ctx->clockid,
+							    iowq.timeout);
+		else
 			iowq.timeout = ktime_add(iowq.timeout, start_time);
 	}
 
-- 
2.53.0

Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER

Posted by Jens Axboe 4 days, 11 hours ago

On 5/20/26 5:18 AM, Sasha Levin wrote:
> From: Maoyi Xie <maoyixie.tju@gmail.com>
> 
> [ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]
> 
> io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
> timespec from the caller via ext_arg->ts. It arms an ABS mode
> hrtimer in __io_cqring_wait_schedule(). The conversion path in
> io_uring/wait.c parses ext_arg->ts inline rather than going
> through io_parse_user_time(). It therefore does not pick up the
> time namespace conversion added by the previous patch.

Once again - If you auto-pick this one, please also do the other one in
the series, 9cc6bac1bebf8310d2950d1411a91479e86d69a1. Makes no sense to
do just one of them.

-- 
Jens Axboe

Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER

Posted by Jens Axboe 1 day, 8 hours ago

On 5/20/26 5:40 AM, Jens Axboe wrote:
> On 5/20/26 5:18 AM, Sasha Levin wrote:
>> From: Maoyi Xie <maoyixie.tju@gmail.com>
>>
>> [ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]
>>
>> io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
>> timespec from the caller via ext_arg->ts. It arms an ABS mode
>> hrtimer in __io_cqring_wait_schedule(). The conversion path in
>> io_uring/wait.c parses ext_arg->ts inline rather than going
>> through io_parse_user_time(). It therefore does not pick up the
>> time namespace conversion added by the previous patch.
> 
> Once again - If you auto-pick this one, please also do the other one in
> the series, 9cc6bac1bebf8310d2950d1411a91479e86d69a1. Makes no sense to
> do just one of them.

And once again, no reply. What is going on with stable these days?

-- 
Jens Axboe

Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER

Posted by Sasha Levin 1 day, 7 hours ago

On Sat, May 23, 2026 at 08:23:13AM -0600, Jens Axboe wrote:
>On 5/20/26 5:40 AM, Jens Axboe wrote:
>> On 5/20/26 5:18 AM, Sasha Levin wrote:
>>> From: Maoyi Xie <maoyixie.tju@gmail.com>
>>>
>>> [ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]
>>>
>>> io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
>>> timespec from the caller via ext_arg->ts. It arms an ABS mode
>>> hrtimer in __io_cqring_wait_schedule(). The conversion path in
>>> io_uring/wait.c parses ext_arg->ts inline rather than going
>>> through io_parse_user_time(). It therefore does not pick up the
>>> time namespace conversion added by the previous patch.
>>
>> Once again - If you auto-pick this one, please also do the other one in
>> the series, 9cc6bac1bebf8310d2950d1411a91479e86d69a1. Makes no sense to
>> do just one of them.
>
>And once again, no reply. What is going on with stable these days?

Jens, as I've mentioned in the previous mail, I handle the AUTOSEL mails weeks
after I originally sent them out for reviews.

The volume of mails and patches makes it really difficult to give prompt
answers here. I have no idea if 9cc6bac1bebf8310d2950d1411a91479e86d69a1
applies cleanly, whether I need to ask for a backport, or whether I should just
drop 45d2b37a37ab9848 until I sit down and get to this batch of AUTOSEL
commits.

If this process doesn't work well for you, I'm happy to skip all
non-stable-tagged commits for io_uring. This is supposed to be only a best
effort attempt to catch commits that slipped through the cracks.

-- 
Thanks,
Sasha

Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER

Posted by Jens Axboe 1 day, 7 hours ago

On 5/23/26 8:45 AM, Sasha Levin wrote:
> On Sat, May 23, 2026 at 08:23:13AM -0600, Jens Axboe wrote:
>> On 5/20/26 5:40 AM, Jens Axboe wrote:
>>> On 5/20/26 5:18 AM, Sasha Levin wrote:
>>>> From: Maoyi Xie <maoyixie.tju@gmail.com>
>>>>
>>>> [ Upstream commit 45d2b37a37ab98484693533496395c610a2cab96 ]
>>>>
>>>> io_uring_enter() with IORING_ENTER_ABS_TIMER takes an absolute
>>>> timespec from the caller via ext_arg->ts. It arms an ABS mode
>>>> hrtimer in __io_cqring_wait_schedule(). The conversion path in
>>>> io_uring/wait.c parses ext_arg->ts inline rather than going
>>>> through io_parse_user_time(). It therefore does not pick up the
>>>> time namespace conversion added by the previous patch.
>>>
>>> Once again - If you auto-pick this one, please also do the other one in
>>> the series, 9cc6bac1bebf8310d2950d1411a91479e86d69a1. Makes no sense to
>>> do just one of them.
>>
>> And once again, no reply. What is going on with stable these days?
> 
> Jens, as I've mentioned in the previous mail, I handle the AUTOSEL
> mails weeks after I originally sent them out for reviews.

And you think that's working fine? I would suggest that's a terrible
process. How are maintainers supposed to deal with that? Patches x and y
are autoselected and an email is sent out. Maintainers react to that,
either saying "no don't pick X" or "if you pick Y, please also do Z".
The expectation would then be a reply that says "ok, doing that" or
whatever might be appropriate there. Instead, it's just silence. And now
I have to follow-up MULTIPLE times to ensure the right thing is being
done. We're about 2 weeks into this particular incident, and
hilariously, I still have no idea what the state is on your end. Did it
get dropped? Did the other one I asked for get picked up? Nobody knows!

At least Greg actually promptly replies for the non-autosel stuff he
does. Which is the ONLY thing that makes Fixes tags and CC stable
actually work. The AUTOSEL stuff, it does not. When it happens to pick
the right patches, yeah all is good. But when there's a problem, the
process is terrible, as evidenced by this particular patch.

> The volume of mails and patches makes it really difficult to give
> prompt answers here. I have no idea if
> 9cc6bac1bebf8310d2950d1411a91479e86d69a1 applies cleanly, whether I
> need to ask for a backport, or whether I should just drop
> 45d2b37a37ab9848 until I sit down and get to this batch of AUTOSEL
> commits.

If you can't handle basic replies when running AUTOSEL, then I don't
think you should have that process in the first place.

> If this process doesn't work well for you, I'm happy to skip all
> non-stable-tagged commits for io_uring. This is supposed to be only a
> best effort attempt to catch commits that slipped through the cracks.

Please don't do AUTOSEL for any patches for any subsystem that I am a
maintainer or co-maintainer of. Until this part of the stable tree
process can be improved, it's a net negative.

-- 
Jens Axboe

Re: [PATCH AUTOSEL 7.0] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER

Posted by Sasha Levin 1 day, 7 hours ago

On Sat, May 23, 2026 at 08:55:43AM -0600, Jens Axboe wrote:
>On 5/23/26 8:45 AM, Sasha Levin wrote:
>> The volume of mails and patches makes it really difficult to give
>> prompt answers here. I have no idea if
>> 9cc6bac1bebf8310d2950d1411a91479e86d69a1 applies cleanly, whether I
>> need to ask for a backport, or whether I should just drop
>> 45d2b37a37ab9848 until I sit down and get to this batch of AUTOSEL
>> commits.
>
>If you can't handle basic replies when running AUTOSEL, then I don't
>think you should have that process in the first place.

You know, you're probably right. I'll just take a break from AUTOSEL for now.

-- 
Thanks,
Sasha