From: Sargun Dhillon
To: Kees Cook, LKML, Linux Containers
Cc: Sargun Dhillon, Rodrigo Campos, Christian Brauner, Giuseppe Scrivano,
    Will Drewry, Andy Lutomirski, Tycho Andersen, Alban Crequy
Subject: [PATCH v4 1/3] seccomp: Add wait_killable semantic to seccomp user notifier
Date: Tue, 3 May 2022 01:09:56 -0700
Message-Id: <20220503080958.20220-2-sargun@sargun.me>
In-Reply-To: <20220503080958.20220-1-sargun@sargun.me>

This introduces a per-filter flag (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
that makes the notifying process transition to wait killable semantics
once its notification has been received by the supervisor. Although wait
killable is not a set of semantics formally exposed to userspace, the
concept is well known (see TASK_KILLABLE). If the notifying process is
signaled before the notification is received by the userspace agent, the
signal is handled as normal.

One quirk of this implementation is that the notifying process only
switches to TASK_KILLABLE when it is woken up by either an addfd or a
signal. This avoids an unnecessary wakeup of the notifying task.

The reasons for switching into wait_killable only after userspace
receives the notification are:

* Avoiding unnecessary work - Workloads often perform work that they may
  abort (racing requests come to mind). This allows syscalls to be
  aborted safely before the notification is received by the supervisor,
  so the supervisor does not end up doing work that the workload no
  longer wants completed.
* Avoiding side effects - We don't want the syscall to be interruptible
  once the supervisor starts doing work, because it may not be trivial
  to reverse the operation. For example, unmounting a file system may
  take a long time, and it is hard to roll back or to treat as reentrant.
* Avoiding breaking runtimes - Various runtimes do not GC while a thread
  is in the middle of a syscall (or running native code that
  subsequently calls a syscall). If many notifications are blocked and
  not picked up by the supervisor, this can get the application into a
  bad state.

Signed-off-by: Sargun Dhillon
---
 .../userspace-api/seccomp_filter.rst          | 10 +++++
 include/linux/seccomp.h                       |  3 +-
 include/uapi/linux/seccomp.h                  |  2 +
 kernel/seccomp.c                              | 42 ++++++++++++++++++-
 4 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
index 539e9d4a4860..d1e2b9193f09 100644
--- a/Documentation/userspace-api/seccomp_filter.rst
+++ b/Documentation/userspace-api/seccomp_filter.rst
@@ -271,6 +271,16 @@ notifying process it will be replaced. The supervisor can also add an FD, and
 respond atomically by using the ``SECCOMP_ADDFD_FLAG_SEND`` flag and the return
 value will be the injected file descriptor number.
 
+The notifying process can be interrupted by a signal, resulting in the
+notification being aborted. This can be problematic when trying to take
+actions on behalf of the notifying process that are long-running and
+typically retryable (mounting a filesystem). Alternatively, at filter
+installation time, the ``SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV`` flag can
+be set. This flag makes it such that when a user notification is received by
+the supervisor, the notifying process will ignore non-fatal signals until the
+response is sent. Signals that are sent prior to the notification being
+received by userspace are handled normally.
+
 It is worth noting that ``struct seccomp_data`` contains the values of register
 arguments to the syscall, but does not contain pointers to memory. The task's
 memory is accessible to suitably privileged traces via ``ptrace()`` or
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 0c564e5d40ff..d31d76be4982 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -8,7 +8,8 @@
 					 SECCOMP_FILTER_FLAG_LOG | \
 					 SECCOMP_FILTER_FLAG_SPEC_ALLOW | \
 					 SECCOMP_FILTER_FLAG_NEW_LISTENER | \
-					 SECCOMP_FILTER_FLAG_TSYNC_ESRCH)
+					 SECCOMP_FILTER_FLAG_TSYNC_ESRCH | \
+					 SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
 
 /* sizeof() the first published struct seccomp_notif_addfd */
 #define SECCOMP_NOTIFY_ADDFD_SIZE_VER0 24
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 78074254ab98..0fdc6ef02b94 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -23,6 +23,8 @@
 #define SECCOMP_FILTER_FLAG_SPEC_ALLOW		(1UL << 2)
 #define SECCOMP_FILTER_FLAG_NEW_LISTENER	(1UL << 3)
 #define SECCOMP_FILTER_FLAG_TSYNC_ESRCH		(1UL << 4)
+/* Received notifications wait in killable state (only respond to fatal signals) */
+#define SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV	(1UL << 5)
 
 /*
  * All BPF programs must return a 32-bit value.
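From userspace, the new rule means a supervisor must request SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV together with SECCOMP_FILTER_FLAG_NEW_LISTENER. The following is a minimal sketch of a pre-check a runtime might run before calling seccomp(2). validate_filter_flags() and SKETCH_KNOWN_FLAGS are hypothetical names, and the flag values are copied from the uapi header as of this series; the TSYNC/NEW_LISTENER/TSYNC_ESRCH rule and any later kernel checks are deliberately not reproduced.

```c
#include <assert.h>
#include <errno.h>

/* Values from include/uapi/linux/seccomp.h as of this series. */
#ifndef SECCOMP_FILTER_FLAG_TSYNC
#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0)
#endif
#ifndef SECCOMP_FILTER_FLAG_LOG
#define SECCOMP_FILTER_FLAG_LOG (1UL << 1)
#endif
#ifndef SECCOMP_FILTER_FLAG_SPEC_ALLOW
#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2)
#endif
#ifndef SECCOMP_FILTER_FLAG_NEW_LISTENER
#define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3)
#endif
#ifndef SECCOMP_FILTER_FLAG_TSYNC_ESRCH
#define SECCOMP_FILTER_FLAG_TSYNC_ESRCH (1UL << 4)
#endif
#ifndef SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV
#define SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV (1UL << 5)
#endif

/* Hypothetical userspace copy of the kernel's SECCOMP_FILTER_FLAG_MASK. */
#define SKETCH_KNOWN_FLAGS (SECCOMP_FILTER_FLAG_TSYNC | \
			    SECCOMP_FILTER_FLAG_LOG | \
			    SECCOMP_FILTER_FLAG_SPEC_ALLOW | \
			    SECCOMP_FILTER_FLAG_NEW_LISTENER | \
			    SECCOMP_FILTER_FLAG_TSYNC_ESRCH | \
			    SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)

/*
 * validate_filter_flags() - hypothetical pre-check before seccomp(2).
 * Returns 0 if the flags pass the two rules modelled here, -EINVAL
 * otherwise.
 */
static int validate_filter_flags(unsigned long flags)
{
	/* Unknown bits are rejected outright (older kernels do the same). */
	if (flags & ~SKETCH_KNOWN_FLAGS)
		return -EINVAL;

	/* A killable wait only makes sense if a listener will receive. */
	if ((flags & SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) &&
	    !(flags & SECCOMP_FILTER_FLAG_NEW_LISTENER))
		return -EINVAL;

	return 0;
}
```

A caller would then pass SECCOMP_FILTER_FLAG_NEW_LISTENER | SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV to seccomp(SECCOMP_SET_MODE_FILTER, ...). On a kernel without this series, the kernel itself returns -EINVAL because bit 5 falls outside its flag mask.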
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 2cb3bcd90eb3..8b416356bf43 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -201,6 +201,8 @@ static inline void seccomp_cache_prepare(struct seccomp_filter *sfilter)
  *	   the filter can be freed.
  * @cache: cache of arch/syscall mappings to actions
  * @log: true if all actions except for SECCOMP_RET_ALLOW should be logged
+ * @wait_killable_recv: Put notifying process in killable state once the
+ *			notification is received by the userspace listener.
  * @prev: points to a previously installed, or inherited, filter
  * @prog: the BPF program to evaluate
  * @notif: the struct that holds all notification related information
@@ -221,6 +223,7 @@ struct seccomp_filter {
 	refcount_t refs;
 	refcount_t users;
 	bool log;
+	bool wait_killable_recv;
 	struct action_cache cache;
 	struct seccomp_filter *prev;
 	struct bpf_prog *prog;
@@ -894,6 +897,10 @@ static long seccomp_attach_filter(unsigned int flags,
 	if (flags & SECCOMP_FILTER_FLAG_LOG)
 		filter->log = true;
 
+	/* Set wait killable flag, if present. */
+	if (flags & SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
+		filter->wait_killable_recv = true;
+
 	/*
 	 * If there is an existing filter, make it the prev and don't drop its
 	 * task reference.
@@ -1081,6 +1088,12 @@ static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd, struct seccomp_kn
 	complete(&addfd->completion);
 }
 
+static bool should_sleep_killable(struct seccomp_filter *match,
+				  struct seccomp_knotif *n)
+{
+	return match->wait_killable_recv && n->state == SECCOMP_NOTIFY_SENT;
+}
+
 static int seccomp_do_user_notification(int this_syscall,
 					struct seccomp_filter *match,
 					const struct seccomp_data *sd)
@@ -1111,11 +1124,25 @@ static int seccomp_do_user_notification(int this_syscall,
 	 * This is where we wait for a reply from userspace.
 	 */
 	do {
+		bool wait_killable = should_sleep_killable(match, &n);
+
 		mutex_unlock(&match->notify_lock);
-		err = wait_for_completion_interruptible(&n.ready);
+		if (wait_killable)
+			err = wait_for_completion_killable(&n.ready);
+		else
+			err = wait_for_completion_interruptible(&n.ready);
 		mutex_lock(&match->notify_lock);
-		if (err != 0)
+
+		if (err != 0) {
+			/*
+			 * Check to see if the notification got picked up and
+			 * whether we should switch to wait killable.
+			 */
+			if (!wait_killable && should_sleep_killable(match, &n))
+				continue;
+
 			goto interrupted;
+		}
 
 		addfd = list_first_entry_or_null(&n.addfd,
 						 struct seccomp_kaddfd, list);
@@ -1485,6 +1512,9 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
 		mutex_lock(&filter->notify_lock);
 		knotif = find_notification(filter, unotif.id);
 		if (knotif) {
+			/* Reset the process to make sure it's not stuck */
+			if (should_sleep_killable(filter, knotif))
+				complete(&knotif->ready);
 			knotif->state = SECCOMP_NOTIFY_INIT;
 			up(&filter->notif->request);
 		}
@@ -1830,6 +1860,14 @@ static long seccomp_set_mode_filter(unsigned int flags,
 	    ((flags & SECCOMP_FILTER_FLAG_TSYNC_ESRCH) == 0))
 		return -EINVAL;
 
+	/*
+	 * The SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV flag doesn't make sense
+	 * without the SECCOMP_FILTER_FLAG_NEW_LISTENER flag.
+	 */
+	if ((flags & SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) &&
+	    ((flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) == 0))
+		return -EINVAL;
+
 	/* Prepare the new filter before holding any locks. */
 	prepared = seccomp_prepare_user_filter(filter);
 	if (IS_ERR(prepared))
-- 
2.25.1
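The switch to a killable wait in seccomp_do_user_notification() is lazy, which makes the loop easy to misread. The following is a minimal userspace model of which signals end the wait. The names (notify_state, model_on_signal() and so on) are hypothetical; the model ignores addfd handling, locking and the sleep itself, and it collapses the extra loop iteration that a fatal signal can cause into a single "abort" result.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's notification states. */
enum notify_state { NOTIFY_INIT, NOTIFY_SENT, NOTIFY_REPLIED };

/* What a delivered signal does to a task waiting for its reply. */
enum signal_outcome { OUTCOME_ABORT, OUTCOME_KEEP_WAITING };

/* Mirrors should_sleep_killable(): flag set and notification received. */
static bool model_should_sleep_killable(bool wait_killable_recv,
					enum notify_state state)
{
	return wait_killable_recv && state == NOTIFY_SENT;
}

/*
 * Models a signal arriving while the notifying task waits.
 * state_at_sleep: notification state when the task went to sleep.
 * state_now:      state re-read after the wakeup; the supervisor may
 *                 have received the notification in the meantime.
 */
static enum signal_outcome model_on_signal(bool wait_killable_recv,
					   enum notify_state state_at_sleep,
					   enum notify_state state_now,
					   bool fatal)
{
	bool killable = model_should_sleep_killable(wait_killable_recv,
						    state_at_sleep);

	/*
	 * Fatal signals always end the wait. An interruptible sleeper may
	 * first loop once into a killable sleep, which returns at once.
	 */
	if (fatal)
		return OUTCOME_ABORT;

	/* A killable sleep is not woken by non-fatal signals at all. */
	if (killable)
		return OUTCOME_KEEP_WAITING;

	/*
	 * An interruptible sleep was interrupted: if the supervisor picked
	 * the notification up meanwhile, loop again in killable mode.
	 */
	if (model_should_sleep_killable(wait_killable_recv, state_now))
		return OUTCOME_KEEP_WAITING;

	return OUTCOME_ABORT;
}
```

The third case in the checks (asleep in INIT, SENT by the time the signal arrives) is the quirk described in the commit message. Receiving the notification does not wake the task, so the task only moves to a killable sleep after some other wakeup.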
From: Sargun Dhillon
To: Kees Cook, LKML, Linux Containers
Cc: Sargun Dhillon, Rodrigo Campos, Christian Brauner, Giuseppe Scrivano,
    Will Drewry, Andy Lutomirski, Tycho Andersen, Alban Crequy,
    linux-kselftest@vger.kernel.org
Subject: [PATCH v4 2/3] selftests/seccomp: Refactor get_proc_stat to split out file reading code
Date: Tue, 3 May 2022 01:09:57 -0700
Message-Id: <20220503080958.20220-3-sargun@sargun.me>
In-Reply-To: <20220503080958.20220-1-sargun@sargun.me>

This splits up the get_proc_stat function so that its file-reading logic
becomes a generic helper, get_nth(), that reads the nth field from any
file, instead of replicating that logic in multiple places.

Signed-off-by: Sargun Dhillon
Cc: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 54 +++++++++++++------
 1 file changed, 38 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index ab340c4759a3..4fb5eda89223 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -4231,32 +4231,54 @@ TEST(user_notification_addfd_rlimit)
 	close(memfd);
 }
 
-static char get_proc_stat(int pid)
+/*
+ * get_nth - Get the nth, space separated entry in a file.
+ *
+ * Returns the length of the read field.
+ * Fails the test if the field is zero-length.
+ */
+static ssize_t get_nth(struct __test_metadata *_metadata, const char *path,
+			const unsigned int position, char **entry)
 {
-	char proc_path[100] = {0};
 	char *line = NULL;
-	size_t len = 0;
+	unsigned int i;
 	ssize_t nread;
-	char status;
+	size_t len = 0;
 	FILE *f;
-	int i;
 
-	snprintf(proc_path, sizeof(proc_path), "/proc/%d/stat", pid);
-	f = fopen(proc_path, "r");
-	if (f == NULL)
-		ksft_exit_fail_msg("%s - Could not open %s\n",
-				   strerror(errno), proc_path);
+	f = fopen(path, "r");
+	ASSERT_NE(f, NULL) {
+		TH_LOG("Could not open %s: %s", path, strerror(errno));
+	}
 
-	for (i = 0; i < 3; i++) {
+	for (i = 0; i < position; i++) {
 		nread = getdelim(&line, &len, ' ', f);
-		if (nread <= 0)
-			ksft_exit_fail_msg("Failed to read status: %s\n",
-					   strerror(errno));
+		ASSERT_GE(nread, 0) {
+			TH_LOG("Failed to read entry %u in file %s", i, path);
+		}
 	}
+	fclose(f);
+
+	ASSERT_GT(nread, 0) {
+		TH_LOG("Entry in file %s had zero length", path);
+	}
+
+	*entry = line;
+	return nread - 1;
+}
+
+/* For a given PID, get the task state (D, R, etc...) */
+static char get_proc_stat(struct __test_metadata *_metadata, pid_t pid)
+{
+	char proc_path[100] = {0};
+	char status;
+	char *line;
+
+	snprintf(proc_path, sizeof(proc_path), "/proc/%d/stat", pid);
+	ASSERT_EQ(get_nth(_metadata, proc_path, 3, &line), 1);
 
 	status = *line;
 	free(line);
-	fclose(f);
 
 	return status;
 }
@@ -4317,7 +4339,7 @@ TEST(user_notification_fifo)
 	/* This spins until all of the children are sleeping */
 restart_wait:
 	for (i = 0; i < ARRAY_SIZE(pids); i++) {
-		if (get_proc_stat(pids[i]) != 'S') {
+		if (get_proc_stat(_metadata, pids[i]) != 'S') {
 			nanosleep(&delay, NULL);
 			goto restart_wait;
 		}
-- 
2.25.1
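The get_nth() helper is built on getdelim(3) with a space delimiter. A minimal standalone sketch of the same idea, outside the kselftest harness, follows. read_nth_field() is a hypothetical name. Unlike the selftest helper, it reports failure through its return value, frees the buffer on error, and strips a trailing delimiter or newline itself instead of assuming one is present, so the last field of a line is handled too.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * read_nth_field() - hypothetical variant of get_nth().
 * Stores the 1-based 'position'-th space-delimited field of 'f' in a
 * malloc'ed, NUL-terminated buffer at *entry and returns its length
 * without the trailing delimiter. Returns -1 and sets *entry to NULL
 * if the field does not exist.
 */
static ssize_t read_nth_field(FILE *f, unsigned int position, char **entry)
{
	char *line = NULL;
	size_t cap = 0;
	ssize_t nread = -1;
	unsigned int i;

	*entry = NULL;
	if (position == 0)
		return -1;

	for (i = 0; i < position; i++) {
		/* getdelim() returns -1 on EOF or error, else >= 1 byte. */
		nread = getdelim(&line, &cap, ' ', f);
		if (nread <= 0) {
			free(line);
			return -1;
		}
	}

	/* Drop a trailing delimiter or newline, if any. */
	while (nread > 0 && (line[nread - 1] == ' ' || line[nread - 1] == '\n'))
		line[--nread] = '\0';

	*entry = line;
	return nread;
}
```

Like the selftest helper, this naive split miscounts fields if the command name in /proc/<pid>/stat contains a space, because that name is printed verbatim inside parentheses. More robust parsers locate the last ')' before splitting the remaining fields.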
From: Sargun Dhillon
To: Kees Cook, LKML, Linux Containers
Cc: Sargun Dhillon, Rodrigo Campos, Christian Brauner, Giuseppe Scrivano,
    Will Drewry, Andy Lutomirski, Tycho Andersen, Alban Crequy
Subject: [PATCH v4 3/3] selftests/seccomp: Add test for wait killable notifier
Date: Tue, 3 May 2022 01:09:58 -0700
Message-Id: <20220503080958.20220-4-sargun@sargun.me>
In-Reply-To: <20220503080958.20220-1-sargun@sargun.me>

This verifies that a filter set up with the wait killable feature obeys
its semantics: non-fatal signals are ignored once the notification has
been received by the supervisor.

Cases tested:
 * Non-fatal signal prior to receive
 * Non-fatal signal after receive
 * Fatal signal after receive

The normal signal handling is tested in user_notification_signal. That
behaviour remains unchanged.

On an unsupported kernel, these tests bail out immediately, as they rely
on a new seccomp flag.

Signed-off-by: Sargun Dhillon
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 228 ++++++++++++++++++
 1 file changed, 228 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 4fb5eda89223..931dd1b6d385 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -59,6 +59,8 @@
 #define SKIP(s, ...)	XFAIL(s, ##__VA_ARGS__)
 #endif
 
+#define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
+
 #ifndef PR_SET_PTRACER
 # define PR_SET_PTRACER 0x59616d61
 #endif
@@ -268,6 +270,10 @@ struct seccomp_notif_addfd_big {
 #define SECCOMP_FILTER_FLAG_TSYNC_ESRCH (1UL << 4)
 #endif
 
+#ifndef SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV
+#define SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV (1UL << 5)
+#endif
+
 #ifndef seccomp
 int seccomp(unsigned int op, unsigned int flags, void *args)
 {
@@ -4362,6 +4368,228 @@ TEST(user_notification_fifo)
 	}
 }
 
+/* get_proc_syscall - Get the syscall in progress for a given pid
+ *
+ * Returns the current syscall number for a given process
+ * Returns -1 if not in syscall (running or blocked)
+ */
+static long get_proc_syscall(struct __test_metadata *_metadata, int pid)
+{
+	char proc_path[100] = {0};
+	long ret = -1;
+	ssize_t nread;
+	char *line;
+
+	snprintf(proc_path, sizeof(proc_path), "/proc/%d/syscall", pid);
+	nread = get_nth(_metadata, proc_path, 1, &line);
+	ASSERT_GT(nread, 0);
+
+	if (strncmp("running", line, MIN(7, nread)))
+		ret = strtol(line, NULL, 10);
+
+	free(line);
+	return ret;
+}
+
+/* Ensure non-fatal signals prior to receive are unmodified */
+TEST(user_notification_wait_killable_pre_notification)
+{
+	struct sigaction new_action = {
+		.sa_handler = signal_handler,
+	};
+	int listener, status, sk_pair[2];
+	pid_t pid;
+	long ret;
+	char c;
+	/* 100 ms */
+	struct timespec delay = { .tv_nsec = 100000000 };
+
+	ASSERT_EQ(sigemptyset(&new_action.sa_mask), 0);
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret)
+	{
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+
+	ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
+
+	listener = user_notif_syscall(
+		__NR_getppid, SECCOMP_FILTER_FLAG_NEW_LISTENER |
+				      SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV);
+	ASSERT_GE(listener, 0);
+
+	/*
+	 * Check that SIGUSR1 interrupts the syscall prior to receiving the
+	 * notification. SIGUSR1 is wired up to a custom signal handler;
+	 * make sure it gets called.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		close(sk_pair[0]);
+		handled = sk_pair[1];
+
+		/* Set up the non-fatal sigaction without SA_RESTART */
+		if (sigaction(SIGUSR1, &new_action, NULL)) {
+			perror("sigaction");
+			exit(1);
+		}
+
+		ret = syscall(__NR_getppid);
+		/* Make sure we got a return from a signal interruption */
+		exit(ret != -1 || errno != EINTR);
+	}
+
+	/*
+	 * Make sure we've gotten to the seccomp user notification wait
+	 * from getppid prior to sending any signals
+	 */
+	while (get_proc_syscall(_metadata, pid) != __NR_getppid ||
+	       get_proc_stat(_metadata, pid) != 'S')
+		nanosleep(&delay, NULL);
+
+	/* Send a non-fatal signal */
+	EXPECT_EQ(kill(pid, SIGUSR1), 0);
+
+	/* Wait for the process to exit (its exit status checks for EINTR) */
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	EXPECT_EQ(read(sk_pair[0], &c, 1), 1);
+}
+
+/* Ensure non-fatal signals after receive are blocked */
+TEST(user_notification_wait_killable)
+{
+	struct sigaction new_action = {
+		.sa_handler = signal_handler,
+	};
+	struct seccomp_notif_resp resp = {};
+	struct seccomp_notif req = {};
+	int listener, status, sk_pair[2];
+	pid_t pid;
+	long ret;
+	char c;
+	/* 100 ms */
+	struct timespec delay = { .tv_nsec = 100000000 };
+
+	ASSERT_EQ(sigemptyset(&new_action.sa_mask), 0);
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret)
+	{
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+
+	ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
+
+	listener = user_notif_syscall(
+		__NR_getppid, SECCOMP_FILTER_FLAG_NEW_LISTENER |
+				      SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		close(sk_pair[0]);
+		handled = sk_pair[1];
+
+		/* Set up the sigaction without SA_RESTART */
+		if (sigaction(SIGUSR1, &new_action, NULL)) {
+			perror("sigaction");
+			exit(1);
+		}
+
+		/* Make sure that the syscall is completed (no EINTR) */
+		ret = syscall(__NR_getppid);
+		exit(ret != USER_NOTIF_MAGIC);
+	}
+
+	/*
+	 * Receive the notification, so that the notifying process switches
+	 * to a killable (TASK_KILLABLE) wait on its next wakeup.
+	 */
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+	/* Send a non-fatal signal */
+	EXPECT_EQ(kill(pid, SIGUSR1), 0);
+
+	/*
+	 * Make sure the task moves to TASK_KILLABLE by waiting for the
+	 * D (disk sleep) state after receiving the non-fatal signal.
+	 */
+	while (get_proc_stat(_metadata, pid) != 'D')
+		nanosleep(&delay, NULL);
+
+	resp.id = req.id;
+	resp.val = USER_NOTIF_MAGIC;
+	/* Make sure the notification is found and able to be replied to */
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	/*
+	 * Make sure that the signal handler does get called once we're back in
+	 * userspace.
+	 */
+	EXPECT_EQ(read(sk_pair[0], &c, 1), 1);
+	/* Wait for the process to exit (its exit status checks for USER_NOTIF_MAGIC) */
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
+/* Ensure fatal signals after receive are not blocked */
+TEST(user_notification_wait_killable_fatal)
+{
+	struct seccomp_notif req = {};
+	int listener, status;
+	pid_t pid;
+	long ret;
+	/* 100 ms */
+	struct timespec delay = { .tv_nsec = 100000000 };
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret)
+	{
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+
+	listener = user_notif_syscall(
+		__NR_getppid, SECCOMP_FILTER_FLAG_NEW_LISTENER |
+				      SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		/* This should never complete as it should get a SIGTERM */
+		syscall(__NR_getppid);
+		exit(1);
+	}
+
+	while (get_proc_stat(_metadata, pid) != 'S')
+		nanosleep(&delay, NULL);
+
+	/*
+	 * Receive the notification, so that the notifying process switches
+	 * to a killable (TASK_KILLABLE) wait on its next wakeup.
+	 */
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+	/* Kill the process with a fatal signal */
+	EXPECT_EQ(kill(pid, SIGTERM), 0);
+
+	/*
+	 * Wait for the process to exit, and make sure the process terminated
+	 * due to the SIGTERM signal.
+	 */
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFSIGNALED(status));
+	EXPECT_EQ(SIGTERM, WTERMSIG(status));
+}
+
 /*
  * TODO:
  * - expand NNP testing
-- 
2.25.1
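The pre-notification test must wait until the child is actually blocked inside getppid() before it sends a signal. Here is a minimal sketch of how to interpret the first field of /proc/<pid>/syscall. It assumes the format produced by current kernels: the literal string "running" for a running task, otherwise the syscall number printed in decimal, where -1 means the task is blocked outside a system call. Because the number is decimal, parsing it with base 16 would turn 110 into 0x110. parse_proc_syscall_field() and waiting_in_syscall() are hypothetical names, and the caller is assumed to have already read the field, for example with a helper like get_nth().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Interpret the first field of /proc/<pid>/syscall (assumed format).
 * Returns the syscall number in progress, or -1 if none is.
 */
static long parse_proc_syscall_field(const char *field)
{
	char *end;
	long nr;

	/* A running task has no well-defined syscall to report. */
	if (strncmp(field, "running", strlen("running")) == 0)
		return -1;

	/* Decimal, not hex; "-1" already means "not in a syscall". */
	nr = strtol(field, &end, 10);
	if (end == field)
		return -1; /* not a number at all */
	return nr;
}

/*
 * True only when the task is both inside syscall 'nr' and sleeping
 * ('S' in /proc/<pid>/stat), i.e. parked in an interruptible wait.
 */
static int waiting_in_syscall(const char *syscall_field, char stat_state,
			      long nr)
{
	return parse_proc_syscall_field(syscall_field) == nr &&
	       stat_state == 'S';
}
```

For example, with 110 being __NR_getppid on x86-64, a child parked in the user notification wait would report a first field of "110" and a state of 'S'. Both conditions have to hold before the signal is sent, so a polling loop should keep sleeping while either one fails.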