[PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts

Posted by Maoyi Xie 1 month, 1 week ago
This series addresses two io_uring code paths that arm an ABS
hrtimer from a timestamp supplied by the caller. Both paths skip
the conversion from the submitter's time namespace view to host
view via timens_ktime_to_host(). The clock is CLOCK_MONOTONIC by
default, or optionally CLOCK_BOOTTIME.

All four other ABS timer interfaces already do this conversion:
timer_settime(TIMER_ABSTIME), clock_nanosleep(TIMER_ABSTIME),
alarm_timer_nsleep(TIMER_ABSTIME), and
timerfd_settime(TFD_TIMER_ABSTIME).

Patch 1/2 (io_uring/timeout) covers IORING_OP_TIMEOUT and
IORING_OP_LINK_TIMEOUT via io_parse_user_time(). It is essentially
the draft Pavel posted on the original thread. I rebased it on
io_uring-7.1 and verified it end to end.

Patch 2/2 (io_uring/wait) covers the IORING_ENTER_ABS_TIMER path
in io_uring_enter(). That path parses ext_arg->ts inline rather
than going through io_parse_user_time(). Patch 1/2 therefore does
not cover it.

Per Pavel and Jens's discussion on the original thread, each of the
two sites calls timens_ktime_to_host() directly rather than going
through a shared helper. Patch 1/2 also factors a flags-only
io_flags_to_clock() out of the existing io_timeout_get_clock(), so
io_parse_user_time() can resolve the clock without a
struct io_timeout_data.
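
As a rough illustration of what "flags-only" means here, the sketch
below maps timeout flags to a clock id without touching any request
state. It is a userspace model, not the patch itself: the SK_* macros
and sketch_flags_to_clock() are hypothetical names, and the bit values
are copied from the io_uring UAPI header as an assumption so the
snippet builds without kernel headers.

```c
#define _GNU_SOURCE
#include <time.h>

/* Assumed to mirror IORING_TIMEOUT_BOOTTIME / IORING_TIMEOUT_REALTIME. */
#define SK_TIMEOUT_BOOTTIME   (1U << 2)
#define SK_TIMEOUT_REALTIME   (1U << 3)
#define SK_TIMEOUT_CLOCK_MASK (SK_TIMEOUT_BOOTTIME | SK_TIMEOUT_REALTIME)

/*
 * Resolve the clock from the flags alone. Only the two clock bits are
 * inspected, so unrelated bits (such as the ABS bit) do not matter.
 * Rejecting both clock bits at once is left to the caller; this sketch
 * simply falls back to CLOCK_MONOTONIC in that case.
 */
static clockid_t sketch_flags_to_clock(unsigned int flags)
{
	switch (flags & SK_TIMEOUT_CLOCK_MASK) {
	case SK_TIMEOUT_BOOTTIME:
		return CLOCK_BOOTTIME;
	case SK_TIMEOUT_REALTIME:
		return CLOCK_REALTIME;
	default:
		return CLOCK_MONOTONIC;
	}
}
```

A parser that only has the SQE flags in hand can call such a helper
to learn which clock's namespace offset applies before it arms the
timer.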

SQPOLL is covered automatically. The SQPOLL kernel thread is
created via create_io_thread() with CLONE_THREAD and no CLONE_NEW*
flag, so copy_namespaces() shares the submitter's nsproxy by
reference. timens_ktime_to_host() looks up the namespace through
"current" and therefore sees the submitter's time_ns when called
from the SQPOLL thread. PoCs for both paths confirm this.

Reproducers (run inside unshare --user --time with a -10s
monotonic offset):

  IORING_TIMEOUT_ABS (patch 1/2):
    vanilla 7.1-rc:  elapsed = 1 ms  (bug, fires immediately)
    patched:         elapsed = 1000 ms (offset honoured)

  IORING_ENTER_ABS_TIMER (patch 2/2):
    vanilla 7.1-rc:  elapsed = 1 ms  (bug)
    patched:         elapsed = 999 ms (offset honoured)
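
To make these numbers concrete, here is a small userspace model of
the conversion. It is not the kernel code: it assumes the namespace
clock equals the host clock plus the offset, so an absolute deadline
is mapped back by subtracting the offset. model_ns_to_host() and its
clamping rules are illustrative choices.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Map an absolute deadline from the caller's namespace view to the
 * host view: host_deadline = ns_deadline - offset.
 */
static int64_t model_ns_to_host(int64_t ns_deadline, int64_t offset)
{
	/* Below the offset means before host time zero: already expired. */
	if (ns_deadline < offset)
		return 0;
	/* Saturate instead of overflowing when the offset is negative. */
	if (offset < 0 && ns_deadline > INT64_MAX + offset)
		return INT64_MAX;
	return ns_deadline - offset;
}

/* Signed distance in milliseconds from host_now to a host-view deadline. */
static int64_t model_ms_from_now(int64_t host_deadline, int64_t host_now)
{
	return (host_deadline - host_now) / (NSEC_PER_SEC / 1000);
}
```

Take a host clock of 100 s and a -10 s offset. The caller reads 90 s
and asks for 91 s. Used unconverted, 91 s is 9 s in the host's past,
so the timer fires at once. Converted, 91 s - (-10 s) = 101 s, which
is one second ahead of the host clock.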

Maoyi Xie (2):
  io_uring/timeout: honour caller's time namespace for
    IORING_TIMEOUT_ABS
  io_uring/wait: honour caller's time namespace for
    IORING_ENTER_ABS_TIMER

 io_uring/timeout.c | 35 ++++++++++++++++++++++-------------
 io_uring/wait.c    |  6 +++++-
 2 files changed, 27 insertions(+), 14 deletions(-)


base-commit: 04fe9aeb4f3c0999e6715385664c677469dfd8f4
-- 
2.34.1
Re: [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts
Posted by Jens Axboe 1 month, 1 week ago
On Mon, 04 May 2026 23:37:53 +0800, Maoyi Xie wrote:
> This series addresses two io_uring code paths that arm an ABS
> hrtimer from a timestamp supplied by the caller. Both paths skip
> the conversion from the submitter's time namespace view to host
> view via timens_ktime_to_host(). The clock is CLOCK_MONOTONIC by
> default, or optionally CLOCK_BOOTTIME.
> 
> All four other ABS timer interfaces already do this conversion:
> timer_settime(TIMER_ABSTIME), clock_nanosleep(TIMER_ABSTIME),
> alarm_timer_nsleep(TIMER_ABSTIME), and
> timerfd_settime(TFD_TIMER_ABSTIME).
> 
> [...]

Applied, thanks!

[1/2] io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
      commit: 9cc6bac1bebf8310d2950d1411a91479e86d69a1
[2/2] io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
      commit: 45d2b37a37ab98484693533496395c610a2cab96

Best regards,
-- 
Jens Axboe
Re: [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts
Posted by Pavel Begunkov 1 month, 1 week ago
On 5/4/26 16:37, Maoyi Xie wrote:
> This series addresses two io_uring code paths that arm an ABS
> hrtimer from a timestamp supplied by the caller. Both paths skip
> the conversion from the submitter's time namespace view to host
> view via timens_ktime_to_host(). The clock is CLOCK_MONOTONIC by
> default, or optionally CLOCK_BOOTTIME.
> 
> [...]

At a quick glance, both look good. I think you had an isolated
reproducer, are you sending it as a liburing test? Would be
greatly appreciated.

-- 
Pavel Begunkov
Re: [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts
Posted by Jens Axboe 1 month, 1 week ago
On 5/6/26 3:05 AM, Pavel Begunkov wrote:
> On 5/4/26 16:37, Maoyi Xie wrote:
>> This series addresses two io_uring code paths that arm an ABS
>> hrtimer from a timestamp supplied by the caller. Both paths skip
>> the conversion from the submitter's time namespace view to host
>> view via timens_ktime_to_host(). The clock is CLOCK_MONOTONIC by
>> default, or optionally CLOCK_BOOTTIME.
>>
>> [...]
> 
> At a quick glance, both look good. I think you had an isolated
> reproducer, are you sending it as a liburing test? Would be
> greatly appreciated.

+1 Yes please, test case for liburing would be great!

-- 
Jens Axboe
Re: [PATCH 0/2] io_uring: honour submitter's time namespace for ABS timeouts
Posted by Maoyi Xie 1 month, 1 week ago
Hi Pavel,

Thanks for the look. We will turn the reproducers into a
liburing test and send it shortly.

The current shape is two minimal C programs. Each forks into
a fresh user namespace plus time namespace with a -10s
monotonic offset. The child submits either IORING_OP_TIMEOUT
or io_uring_enter() with IORING_ENTER_ABS_TIMER and a deadline
of now + 1s. The test asserts that the call returns after
roughly 1000 ms rather than in under 1 ms.

We will reshape that into a single liburing test that
exercises both paths. The test will gate the unshare on
CLONE_NEWUSER | CLONE_NEWTIME availability so it skips
gracefully on kernels without time namespace support. It
will use the standard t_* helpers.
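
A minimal sketch of the namespace setup and skip logic follows. It is
not the liburing test and contains no io_uring submission. It assumes
the offset is set by writing "monotonic -10 0" to
/proc/self/timens_offsets before the first child is forked, as
described in time_namespaces(7). The SKETCH_* codes and function
names are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080
#endif

#define SKETCH_OK   0	/* namespace created, child clock is shifted */
#define SKETCH_SKIP 1	/* no time namespace support or no permission */
#define SKETCH_FAIL 2	/* setup worked but the result is wrong */

static long long mono_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
}

/* Runs in a throwaway process so the caller's namespaces stay intact. */
static int sketch_helper(void)
{
	static const char off[] = "monotonic -10 0\n";
	long long before;
	int fd, status;
	pid_t pid;

	if (unshare(CLONE_NEWUSER | CLONE_NEWTIME) < 0)
		return SKETCH_SKIP;
	/* Offsets can only be set before a task enters the namespace. */
	fd = open("/proc/self/timens_offsets", O_WRONLY);
	if (fd < 0)
		return SKETCH_SKIP;
	if (write(fd, off, sizeof(off) - 1) != (ssize_t)(sizeof(off) - 1)) {
		close(fd);
		return SKETCH_SKIP;
	}
	close(fd);

	before = mono_ms();	/* this task is still in the old namespace */
	pid = fork();		/* the child lands in the new one */
	if (pid < 0)
		return SKETCH_FAIL;
	if (pid == 0) {
		long long delta = before - mono_ms();

		/* The child's clock should read about 10 s behind. */
		_exit(delta > 9000 && delta < 11000 ? SKETCH_OK : SKETCH_FAIL);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return SKETCH_FAIL;
	return WEXITSTATUS(status);
}

static int sketch_timens_probe(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return SKETCH_FAIL;
	if (pid == 0)
		_exit(sketch_helper());
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return SKETCH_FAIL;
	return WEXITSTATUS(status);
}
```

A test built on this would treat SKETCH_SKIP as a skipped run rather
than a failure, and would submit the absolute timeout from the inner
child, where the shifted clock is visible.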

Maoyi
Nanyang Technological University
https://maoyixie.com/