From: Jann Horn
To: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, Oleg Nesterov
Subject: [PATCH] time: Prevent union confusion from unexpected restart_syscall()
Date: Thu, 5 Jan 2023 14:44:03 +0100
Message-Id: <20230105134403.754986-1-jannh@google.com>

The nanosleep syscalls use the restart_block mechanism, with a quirk:
The `type` and `rmtp`/`compat_rmtp` fields are set up unconditionally on
syscall entry, while the rest of the restart_block is only set up in the
unlikely case that the syscall is actually interrupted by a signal (or
pseudo-signal) that doesn't have a signal handler.

If the restart_block was set up by a previous syscall (futex(...,
FUTEX_WAIT, ...) or poll()) and hasn't been invalidated somehow since then,
this will clobber some of the union fields used by
futex_wait_restart()/do_restart_poll().

If userspace afterwards wrongly calls the restart_syscall syscall,
futex_wait_restart()/do_restart_poll() will read struct fields that have
been clobbered.

This doesn't actually lead to anything particularly interesting because
none of the union fields contain trusted kernel data, and
futex(..., FUTEX_WAIT, ...) and poll() aren't syscalls where it makes much
sense to apply seccomp filters to their arguments.

So the current consequences are just of the "if userspace does bad stuff,
it can damage itself, and that's not a problem" flavor.

But still, it seems like a hazard for future developers, so invalidate the
restart_block when partly setting it up in the nanosleep syscalls.

Signed-off-by: Jann Horn
---
Reproducer: demonstrates nanosleep() clobbering the upper half of
current->restart_block.poll.ufds (with TT_NATIVE==1) and
current->restart_block.poll.nfds (with 42):

user@vm:~/restart_syscall$ cat restart_syscall_union_confusion.c
#define _GNU_SOURCE
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>

#define SYSCHK(x) ({          \
  typeof(x) __res = (x);      \
  if (__res == (typeof(x))-1) \
    err(1, "SYSCHK(" #x ")"); \
  __res;                      \
})

int main(void) {
  int child = SYSCHK(fork());
  if (child == 0) {
    // map the pollfd array where the clobbered ufds pointer will end up:
    // upper half TT_NATIVE (1), lower half left over from poll(NULL, ...)
    struct pollfd *pollfds = SYSCHK(mmap((void*)0x100000000, 0x1000,
        PROT_READ|PROT_WRITE,
        MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED_NOREPLACE, -1, 0));
    int dev_null_fd = SYSCHK(open("/dev/null", O_WRONLY));
    for (int i=0; i<100; i++)
      pollfds[i] = (struct pollfd) { .fd = dev_null_fd, .events = POLLOUT };

    // the parent's SIGSTOP interrupts this poll() and sets up the
    // restart_block for do_restart_poll()
    errno = 0;
    int res = poll(NULL, 0, 2000);
    printf("poll = %d (%m)\n", res);

    // this writes current->restart_block.nanosleep.{type,rmtp}
    struct timespec ts_sleep = { .tv_nsec = 1000 };
    SYSCHK(nanosleep(&ts_sleep, (void*)42UL));

    errno = 0;
    int ret = syscall(__NR_restart_syscall);
    if (ret == -1 && errno == EINTR) {
      printf("restart_syscall() returned EINTR, probably do_no_restart_syscall\n");
    } else {
      printf("restart_syscall() = %d (%m)\n", ret);
    }
    for (int i=0; i<50; i++)
      printf("pollfds[%d].revents = 0x%x\n", i, pollfds[i].revents);
    exit(0);
  } else {
    // parent
    sleep(1);
    printf("sending SIGSTOP\n");
    kill(child, SIGSTOP);
    sleep(1);
    printf("sending SIGCONT\n");
    kill(child, SIGCONT);
    printf("waiting for child...\n");
    int status;
    SYSCHK(waitpid(child, &status, 0));
  }
}
user@vm:~/restart_syscall$ gcc -o restart_syscall_union_confusion restart_syscall_union_confusion.c
user@vm:~/restart_syscall$ ./restart_syscall_union_confusion
sending SIGSTOP
sending SIGCONT
waiting for child...
poll = 0 (Success)
restart_syscall() = 42 (Success)
pollfds[0].revents = 0x4
pollfds[1].revents = 0x4
pollfds[2].revents = 0x4
pollfds[3].revents = 0x4
pollfds[4].revents = 0x4
pollfds[5].revents = 0x4
pollfds[6].revents = 0x4
pollfds[7].revents = 0x4
pollfds[8].revents = 0x4
pollfds[9].revents = 0x4
pollfds[10].revents = 0x4
pollfds[11].revents = 0x4
pollfds[12].revents = 0x4
pollfds[13].revents = 0x4
pollfds[14].revents = 0x4
pollfds[15].revents = 0x4
pollfds[16].revents = 0x4
pollfds[17].revents = 0x4
pollfds[18].revents = 0x4
pollfds[19].revents = 0x4
pollfds[20].revents = 0x4
pollfds[21].revents = 0x4
pollfds[22].revents = 0x4
pollfds[23].revents = 0x4
pollfds[24].revents = 0x4
pollfds[25].revents = 0x4
pollfds[26].revents = 0x4
pollfds[27].revents = 0x4
pollfds[28].revents = 0x4
pollfds[29].revents = 0x4
pollfds[30].revents = 0x4
pollfds[31].revents = 0x4
pollfds[32].revents = 0x4
pollfds[33].revents = 0x4
pollfds[34].revents = 0x4
pollfds[35].revents = 0x4
pollfds[36].revents = 0x4
pollfds[37].revents = 0x4
pollfds[38].revents = 0x4
pollfds[39].revents = 0x4
pollfds[40].revents = 0x4
pollfds[41].revents = 0x4
pollfds[42].revents = 0x0
pollfds[43].revents = 0x0
pollfds[44].revents = 0x0
pollfds[45].revents = 0x0
pollfds[46].revents = 0x0
pollfds[47].revents = 0x0
pollfds[48].revents = 0x0
pollfds[49].revents = 0x0

 kernel/time/hrtimer.c      | 2 ++
 kernel/time/posix-stubs.c  | 2 ++
 kernel/time/posix-timers.c | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 3ae661ab6260..e4f0e3b0c4f4 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -2126,6 +2126,7 @@ SYSCALL_DEFINE2(nanosleep, struct __kernel_timespec __user *, rqtp,
 	if (!timespec64_valid(&tu))
 		return -EINVAL;
 
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
 	current->restart_block.nanosleep.rmtp = rmtp;
 	return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL,
@@ -2147,6 +2148,7 @@ SYSCALL_DEFINE2(nanosleep_time32, struct old_timespec32 __user *, rqtp,
 	if (!timespec64_valid(&tu))
 		return -EINVAL;
 
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
 	current->restart_block.nanosleep.compat_rmtp = rmtp;
 	return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL,
diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index 90ea5f373e50..828aeecbd1e8 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -147,6 +147,7 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
 		return -EINVAL;
 	if (flags & TIMER_ABSTIME)
 		rmtp = NULL;
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
 	current->restart_block.nanosleep.rmtp = rmtp;
 	texp = timespec64_to_ktime(t);
@@ -240,6 +241,7 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,
 		return -EINVAL;
 	if (flags & TIMER_ABSTIME)
 		rmtp = NULL;
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
 	current->restart_block.nanosleep.compat_rmtp = rmtp;
 	texp = timespec64_to_ktime(t);
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 5dead89308b7..0c8a87a11b39 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1270,6 +1270,7 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
 		return -EINVAL;
 	if (flags & TIMER_ABSTIME)
 		rmtp = NULL;
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
 	current->restart_block.nanosleep.rmtp = rmtp;
 
@@ -1297,6 +1298,7 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,
 		return -EINVAL;
 	if (flags & TIMER_ABSTIME)
 		rmtp = NULL;
+	current->restart_block.fn = do_no_restart_syscall;
 	current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
 	current->restart_block.nanosleep.compat_rmtp = rmtp;
 
base-commit: 41c03ba9beea760bd2d2ac9250b09a2e192da2dc
-- 
2.39.0.314.g84b9a713c41-goog