From nobody Fri Jun 12 12:45:17 2026
From: Cong Wang
To: Kees Cook, linux-kernel@vger.kernel.org
Cc: Andy Lutomirski, Will Drewry, Christian Brauner, Cong Wang
Subject: [RFC PATCH v2 1/3] seccomp: add SECCOMP_IOCTL_NOTIF_INJECT for race-free unotify
Date: Thu, 14 May 2026 21:27:36 -0700
Message-ID: <20260515042738.1723296-2-xiyou.wangcong@gmail.com>
In-Reply-To: <20260515042738.1723296-1-xiyou.wangcong@gmail.com>
References: <20260515042738.1723296-1-xiyou.wangcong@gmail.com>

From: Cong Wang

seccomp_unotify(2) leaves a documented TOCTOU window for unprivileged
supervisors: a sibling thread or CLONE_VM peer can mutate pointer-arg
buffers between the supervisor's process_vm_readv() and the kernel's
re-read on SECCOMP_USER_NOTIF_FLAG_CONTINUE. ptrace() and /proc/pid/mem
are not available to unprivileged supervisors, so today there is no
race-free path for content-aware policy on CONTINUE.

Reshape the v1 PIN_ARGS proposal as a PTRACE_SYSCALL-style redirect.
The supervisor describes a substitute syscall via a single new ioctl,
SECCOMP_IOCTL_NOTIF_INJECT. The struct mirrors
ptrace_syscall_info.entry (nr + args[6]) and adds a kernel-input buffer
plus an args_in_buf_mask bitmap; pointer-shaped args are encoded as
byte offsets into that buffer.
SECCOMP_IOCTL_NOTIF_SEND with SECCOMP_USER_NOTIF_FLAG_INJECTED consumes
the redirect: the trapped task wakes, dispatches into the matching
kernel-mode syscall helper, and the helper's return value becomes the
trapped syscall's return value. The trapped task's user mm is never
re-read for the substituted syscall, so peer mutations after attach
have no effect.

The v1 injectable-syscall whitelist is openat (filp_open + fd_install),
bind (sockfd_lookup + kernel_bind) and write (kernel_write). The
substitute nr must match the trapped syscall's number, preventing a
malicious supervisor from converting "task tried to bind()" into
"kernel does an openat() on the task's behalf". Total cumulative buffer
size is capped at a hardcoded 1 MiB; allocations are GFP_KERNEL_ACCOUNT
so the trapped task's memcg pays the cost.

This is intentionally a strict subset of what ptrace can already do via
PTRACE_POKEDATA + PTRACE_SETREGSET. It does not add a kernel
capability. It provides a listener-fd-gated, syscall-whitelisted,
narrower interface to that capability suitable for unprivileged
seccomp_unotify supervisors, where ptrace's privilege model
(CAP_SYS_PTRACE) and per-syscall overhead (signal-stop cycle plus O(N)
peer-thread coordination per syscall) are not viable.
SECCOMP_IOCTL_NOTIF_ADDFD set the precedent for this kind of narrow
listener-fd interface to a ptrace-overlapping capability.

Lifecycle: the inject record attaches to the knotif on the INJECT ioctl
and is consumed at NOTIF_SEND with FLAG_INJECTED. It is freed on
listener close, task exit, supervisor changing its mind (CONTINUE
without INJECTED, or plain deny), or SIGKILL of the trapped task.

The whole feature lives in a new kernel/seccomp_inject.c plus a small
dispatcher in kernel/seccomp.c. fs/, mm/, net/ and lib/ are unmodified.
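
For illustration only, the argument encoding described above can be
sketched from the supervisor's side. This is a minimal sketch, not part
of the patch: struct notif_inject_sketch is a local mirror of the
proposed struct seccomp_notif_inject (released UAPI headers do not
carry it), and build_openat_inject() is a hypothetical helper name. It
assumes the path is stored at offset 0 of the supervisor's buffer and
that only args[1] is offset-encoded.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>       /* AT_FDCWD */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h> /* __NR_openat */

/*
 * Local mirror of the struct seccomp_notif_inject proposed in this patch.
 * The name is hypothetical; the field layout follows the UAPI hunk.
 */
struct notif_inject_sketch {
	uint64_t id;
	uint64_t nr;
	uint64_t args[6];
	uint64_t buf;
	uint32_t buf_size;
	uint32_t args_in_buf_mask;
};

static_assert(sizeof(struct notif_inject_sketch) == 80,
	      "id, nr, args[6], buf as u64 plus two u32 fields");

/*
 * Hypothetical helper: describe openat(AT_FDCWD, path, flags, mode) with
 * the path stored at offset 0 of scratch. Returns 0 on success, or -1 if
 * the path (including its NUL) does not fit in cap bytes.
 */
static int build_openat_inject(struct notif_inject_sketch *inj, uint64_t id,
			       char *scratch, size_t cap,
			       const char *path, int flags, unsigned int mode)
{
	size_t len = strlen(path) + 1;	/* keep the terminating NUL */

	if (len > cap)
		return -1;
	memcpy(scratch, path, len);

	memset(inj, 0, sizeof(*inj));
	inj->id = id;			/* id from SECCOMP_IOCTL_NOTIF_RECV */
	inj->nr = __NR_openat;		/* must equal the trapped syscall's nr */
	inj->args[0] = (uint64_t)(int64_t)AT_FDCWD;
	inj->args[1] = 0;		/* byte offset of the path in scratch */
	inj->args[2] = (uint64_t)flags;
	inj->args[3] = mode;
	inj->buf = (uint64_t)(uintptr_t)scratch;
	inj->buf_size = (uint32_t)len;
	inj->args_in_buf_mask = 1U << 1;	/* only args[1] is an offset */
	return 0;
}
```

A supervisor would then hand the filled struct to the proposed
SECCOMP_IOCTL_NOTIF_INJECT ioctl on the listener fd and reply through
SECCOMP_IOCTL_NOTIF_SEND with SECCOMP_USER_NOTIF_FLAG_INJECTED, leaving
error and val at zero as the send path requires.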
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Cong Wang
---
 MAINTAINERS                  |   2 +
 include/uapi/linux/seccomp.h |  65 ++++++++
 kernel/Makefile              |   1 +
 kernel/seccomp.c             | 121 ++++++++++++++-
 kernel/seccomp_inject.c      | 281 +++++++++++++++++++++++++++++++++++
 kernel/seccomp_inject.h      |  65 ++++++++
 6 files changed, 533 insertions(+), 2 deletions(-)
 create mode 100644 kernel/seccomp_inject.c
 create mode 100644 kernel/seccomp_inject.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6aa3fe2ee1bb..120f913b58b1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24097,6 +24097,8 @@ F:	Documentation/userspace-api/seccomp_filter.rst
 F:	include/linux/seccomp.h
 F:	include/uapi/linux/seccomp.h
 F:	kernel/seccomp.c
+F:	kernel/seccomp_inject.c
+F:	kernel/seccomp_inject.h
 F:	tools/testing/selftests/kselftest_harness.h
 F:	tools/testing/selftests/kselftest_harness/
 F:	tools/testing/selftests/seccomp/*
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index dbfc9b37fcae..fa51790f51cf 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -108,6 +108,18 @@ struct seccomp_notif {
  */
 #define SECCOMP_USER_NOTIF_FLAG_CONTINUE (1UL << 0)
 
+/*
+ * SECCOMP_USER_NOTIF_FLAG_INJECTED - consume a syscall-redirect
+ * description previously attached via SECCOMP_IOCTL_NOTIF_INJECT for
+ * this notification. The trapped task wakes, dispatches into the
+ * matching kernel-mode syscall helper using the supervisor-provided
+ * (in-kernel) buffer for any pointer-shaped argument, and the helper's
+ * return value becomes the trapped syscall's return value.
+ *
+ * Mutually exclusive with SECCOMP_USER_NOTIF_FLAG_CONTINUE.
+ */
+#define SECCOMP_USER_NOTIF_FLAG_INJECTED (1UL << 1)
+
 struct seccomp_notif_resp {
 	__u64 id;
 	__s64 val;
@@ -137,6 +149,52 @@ struct seccomp_notif_addfd {
 	__u32 newfd_flags;
 };
 
+/**
+ * struct seccomp_notif_inject - describe a kernel-validated syscall
+ * to perform on behalf of a trapped task.
+ *
+ * The supervisor attaches one of these to a pending notification via
+ * SECCOMP_IOCTL_NOTIF_INJECT, then commits with SECCOMP_IOCTL_NOTIF_SEND
+ * setting SECCOMP_USER_NOTIF_FLAG_INJECTED. The kernel substitutes the
+ * trapped task's syscall with @nr/@args[6] and dispatches into a
+ * kernel-mode helper for the syscall, with any pointer argument backed
+ * by an offset into @buf (a kernel-side copy of the supervisor's
+ * bytes), so the trapped task's user mm is not re-read.
+ *
+ * @id: notification id from SECCOMP_IOCTL_NOTIF_RECV.
+ * @nr: substitute syscall number (matches ptrace_syscall_info.entry.nr).
+ * @args: substitute syscall arguments. For each i with the i'th bit
+ *        set in @args_in_buf_mask, args[i] is interpreted as a byte
+ *        offset into @buf rather than as the raw syscall argument
+ *        value; the kernel materializes the corresponding pointer
+ *        from its in-kernel copy of @buf before dispatch. Other args
+ *        pass through as scalars (matches
+ *        ptrace_syscall_info.entry.args).
+ * @buf: __user pointer to kernel-input bytes. The kernel copies
+ *       @buf_size bytes from this buffer at SECCOMP_IOCTL_NOTIF_INJECT
+ *       time and acts on its kernel-side copy thereafter.
+ * @buf_size: bytes available at @buf, capped at
+ *            SECCOMP_NOTIF_INJECT_MAX_BYTES.
+ * @args_in_buf_mask: bitmask. Bit i set means args[i] is a byte
+ *                    offset into @buf rather than a raw argument value.
+ */
+struct seccomp_notif_inject {
+	__u64 id;
+	__u64 nr;
+	__u64 args[6];
+	__u64 buf;
+	__u32 buf_size;
+	__u32 args_in_buf_mask;
+};
+
+/*
+ * Hard cap on the cumulative bytes a single SECCOMP_IOCTL_NOTIF_INJECT
+ * request may copy. Defensive bound only - typical injects are a few
+ * KiB (one PATH_MAX path, an argv block, etc.). Hardcoded rather than
+ * a sysctl: there is no legitimate reason to tune this at runtime.
+ */
+#define SECCOMP_NOTIF_INJECT_MAX_BYTES (1UL << 20) /* 1 MiB */
+
 #define SECCOMP_IOC_MAGIC '!'
 #define SECCOMP_IO(nr) _IO(SECCOMP_IOC_MAGIC, nr)
 #define SECCOMP_IOR(nr, type) _IOR(SECCOMP_IOC_MAGIC, nr, type)
@@ -154,4 +212,11 @@ struct seccomp_notif_addfd {
 
 #define SECCOMP_IOCTL_NOTIF_SET_FLAGS SECCOMP_IOW(4, __u64)
 
+/* Attach a kernel-validated syscall redirect to a pending notification.
+ * Consumed by SECCOMP_IOCTL_NOTIF_SEND with
+ * SECCOMP_USER_NOTIF_FLAG_INJECTED.
+ */
+#define SECCOMP_IOCTL_NOTIF_INJECT SECCOMP_IOW(5, \
+					struct seccomp_notif_inject)
+
 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/Makefile b/kernel/Makefile
index 6785982013dc..fa4c129384c7 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
 obj-$(CONFIG_HARDLOCKUP_DETECTOR_BUDDY) += watchdog_buddy.o
 obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_perf.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
+obj-$(CONFIG_SECCOMP_FILTER) += seccomp_inject.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 066909393c38..a891e3e13eef 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -44,6 +44,8 @@
 #include
 #include
 
+#include "seccomp_inject.h"
+
 /*
  * When SECCOMP_IOCTL_NOTIF_ID_VALID was first introduced, it had the
  * wrong direction flag in the ioctl number. This is the broken one,
@@ -97,6 +99,15 @@ struct seccomp_knotif {
 
 	/* outstanding addfd requests */
 	struct list_head addfd;
+
+	/*
+	 * Outstanding SECCOMP_IOCTL_NOTIF_INJECT record, attached by
+	 * the supervisor under filter->notify_lock. Consumed by the
+	 * trapped task on SECCOMP_USER_NOTIF_FLAG_INJECTED, freed in
+	 * the same path. Also freed if the knotif is dropped without
+	 * being injected.
+	 */
+	struct seccomp_inject_record *inject;
 };
 
 /**
@@ -1248,8 +1259,41 @@ static int seccomp_do_user_notification(int this_syscall,
 	mutex_unlock(&match->notify_lock);
 
 	/* Userspace requests to continue the syscall. */
-	if (flags & SECCOMP_USER_NOTIF_FLAG_CONTINUE)
+	if (flags & SECCOMP_USER_NOTIF_FLAG_CONTINUE) {
+		/*
+		 * Discard any inject the supervisor attached then changed
+		 * its mind about.
+		 */
+		seccomp_inject_record_free(n.inject);
 		return 0;
+	}
+
+	/*
+	 * Userspace requested kernel-side syscall injection. Run the
+	 * helper in the trapped task's context, free the record, and
+	 * deliver the helper's result as the syscall return value.
+	 */
+	if (flags & SECCOMP_USER_NOTIF_FLAG_INJECTED) {
+		long inject_ret = -EINVAL;
+
+		if (n.inject) {
+			inject_ret = seccomp_inject_dispatch(n.inject);
+			seccomp_inject_record_free(n.inject);
+		}
+		if (inject_ret < 0)
+			syscall_set_return_value(current, current_pt_regs(),
+						 (int)inject_ret, 0);
+		else
+			syscall_set_return_value(current, current_pt_regs(),
+						 0, inject_ret);
+		return -1;
+	}
+
+	/*
+	 * Interrupted, listener gone, or normal deny/allow: free any
+	 * inject the supervisor attached but never consumed.
+	 */
+	seccomp_inject_record_free(n.inject);
 
 	syscall_set_return_value(current, current_pt_regs(),
 				 err, ret);
@@ -1632,13 +1676,22 @@ static long seccomp_notify_send(struct seccomp_filter *filter,
 	if (copy_from_user(&resp, buf, sizeof(resp)))
 		return -EFAULT;
 
-	if (resp.flags & ~SECCOMP_USER_NOTIF_FLAG_CONTINUE)
+	if (resp.flags & ~(SECCOMP_USER_NOTIF_FLAG_CONTINUE |
+			   SECCOMP_USER_NOTIF_FLAG_INJECTED))
+		return -EINVAL;
+
+	if ((resp.flags & SECCOMP_USER_NOTIF_FLAG_CONTINUE) &&
+	    (resp.flags & SECCOMP_USER_NOTIF_FLAG_INJECTED))
 		return -EINVAL;
 
 	if ((resp.flags & SECCOMP_USER_NOTIF_FLAG_CONTINUE) &&
 	    (resp.error || resp.val))
 		return -EINVAL;
 
+	if ((resp.flags & SECCOMP_USER_NOTIF_FLAG_INJECTED) &&
+	    (resp.error || resp.val))
+		return -EINVAL;
+
 	ret = mutex_lock_interruptible(&filter->notify_lock);
 	if (ret < 0)
 		return ret;
@@ -1655,6 +1708,12 @@ static long seccomp_notify_send(struct seccomp_filter *filter,
 		goto out;
 	}
 
+	/* INJECTED requires a prior SECCOMP_IOCTL_NOTIF_INJECT for this id. */
+	if ((resp.flags & SECCOMP_USER_NOTIF_FLAG_INJECTED) && !knotif->inject) {
+		ret = -EINVAL;
+		goto out;
+	}
+
 	ret = 0;
 	knotif->state = SECCOMP_NOTIFY_REPLIED;
 	knotif->error = resp.error;
@@ -1823,6 +1882,62 @@ static long seccomp_notify_addfd(struct seccomp_filter *filter,
 	return ret;
 }
 
+static long seccomp_notify_inject(struct seccomp_filter *filter,
+				  struct seccomp_notif_inject __user *uinj)
+{
+	struct seccomp_notif_inject inj;
+	struct seccomp_inject_record *rec = NULL;
+	struct seccomp_knotif *knotif;
+	long ret;
+
+	if (copy_from_user(&inj, uinj, sizeof(inj)))
+		return -EFAULT;
+
+	ret = seccomp_inject_record_build(&inj, &rec);
+	if (ret)
+		return ret;
+
+	ret = mutex_lock_interruptible(&filter->notify_lock);
+	if (ret < 0)
+		goto err_free;
+
+	knotif = find_notification(filter, inj.id);
+	if (!knotif) {
+		ret = -ENOENT;
+		goto err_unlock;
+	}
+
+	if (knotif->state != SECCOMP_NOTIFY_SENT) {
+		ret = -EINPROGRESS;
+		goto err_unlock;
+	}
+
+	/*
+	 * The supervisor cannot redirect a trapped syscall into an
+	 * unrelated syscall (e.g. inject openat into a task trapped on
+	 * bind), which would be a confused-deputy.
+	 */
+	if (inj.nr != (u64)knotif->data->nr) {
+		ret = -ESRCH;
+		goto err_unlock;
+	}
+
+	if (knotif->inject) {
+		ret = -EEXIST;
+		goto err_unlock;
+	}
+
+	knotif->inject = rec;
+	mutex_unlock(&filter->notify_lock);
+	return 0;
+
+err_unlock:
+	mutex_unlock(&filter->notify_lock);
+err_free:
+	seccomp_inject_record_free(rec);
+	return ret;
+}
+
 static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
 				 unsigned long arg)
 {
@@ -1840,6 +1955,8 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
 		return seccomp_notify_id_valid(filter, buf);
 	case SECCOMP_IOCTL_NOTIF_SET_FLAGS:
 		return seccomp_notify_set_flags(filter, arg);
+	case SECCOMP_IOCTL_NOTIF_INJECT:
+		return seccomp_notify_inject(filter, buf);
 	}
 
 	/* Extensible Argument ioctls */
diff --git a/kernel/seccomp_inject.c b/kernel/seccomp_inject.c
new file mode 100644
index 000000000000..2c2be3232f15
--- /dev/null
+++ b/kernel/seccomp_inject.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * SECCOMP_IOCTL_NOTIF_INJECT: kernel-mode syscall execution on
+ * behalf of an unprivileged seccomp_unotify supervisor.
+ *
+ * The supervisor describes a syscall (nr + args[6]) and a kernel-input
+ * buffer; pointer-shaped args are encoded as byte offsets into that
+ * buffer. On SECCOMP_USER_NOTIF_FLAG_INJECTED, the trapped task wakes
+ * inside seccomp_do_user_notification(), dispatches into the matching
+ * kernel-mode syscall helper, and the helper's return value becomes
+ * the trapped syscall's return value. The trapped task's user mm is
+ * never re-read for the substituted syscall, closing the documented
+ * TOCTOU race against CLONE_VM peers.
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "seccomp_inject.h"
+
+/* Per-syscall injector. Runs in the trapped task's context. */
+typedef long (*seccomp_injector_fn)(const struct seccomp_inject_record *rec);
+
+static long inject_openat(const struct seccomp_inject_record *rec);
+static long inject_bind(const struct seccomp_inject_record *rec);
+static long inject_write(const struct seccomp_inject_record *rec);
+
+/*
+ * The injectable-syscall whitelist. Keep this small and explicit:
+ * each entry pins a kernel-mode helper to a specific syscall number.
+ * A NULL injector for an unlisted nr means "supervisor cannot inject
+ * this syscall" and seccomp_inject_record_build() rejects it with
+ * -EOPNOTSUPP.
+ */
+static seccomp_injector_fn seccomp_injector_for(u64 nr)
+{
+	switch (nr) {
+#ifdef __NR_openat
+	case __NR_openat:
+		return inject_openat;
+#endif
+#ifdef __NR_bind
+	case __NR_bind:
+		return inject_bind;
+#endif
+#ifdef __NR_write
+	case __NR_write:
+		return inject_write;
+#endif
+	default:
+		return NULL;
+	}
+}
+
+/*
+ * Treat args[i] as a byte offset into rec->buf and return a kernel
+ * pointer at that offset, with @needed bytes required to be accessible
+ * (i.e. offset + needed <= buf_size). NULL on out-of-bounds.
+ */
+static const void *inject_buf_at(const struct seccomp_inject_record *rec,
+				 unsigned int i, size_t needed)
+{
+	u64 off;
+
+	if (i >= 6)
+		return NULL;
+	if (!(rec->args_in_buf_mask & (1U << i)))
+		return NULL;
+	off = rec->args[i];
+	if (off > rec->buf_size)
+		return NULL;
+	if (rec->buf_size - off < needed)
+		return NULL;
+	return (const u8 *)rec->buf + off;
+}
+
+/* NUL-terminated string at args[i] within rec->buf, or NULL on bound miss. */
+static const char *inject_buf_cstr(const struct seccomp_inject_record *rec,
+				   unsigned int i)
+{
+	const char *s;
+	u64 off;
+	size_t avail;
+
+	if (i >= 6)
+		return NULL;
+	if (!(rec->args_in_buf_mask & (1U << i)))
+		return NULL;
+	off = rec->args[i];
+	if (off >= rec->buf_size)
+		return NULL;
+	s = (const char *)rec->buf + off;
+	avail = rec->buf_size - off;
+	if (memchr(s, '\0', avail) == NULL)
+		return NULL;
+	return s;
+}
+
+static long inject_openat(const struct seccomp_inject_record *rec)
+{
+	int dfd = (int)rec->args[0];
+	int flags = (int)rec->args[2];
+	umode_t mode = (umode_t)rec->args[3];
+	const char *path;
+	struct file *f;
+	int fd;
+
+	/* v1: dfd must be AT_FDCWD. Other dfd values land in v2. */
+	if (dfd != AT_FDCWD)
+		return -EOPNOTSUPP;
+
+	path = inject_buf_cstr(rec, 1);
+	if (!path)
+		return -EINVAL;
+
+	f = filp_open(path, flags, mode);
+	if (IS_ERR(f))
+		return PTR_ERR(f);
+
+	fd = get_unused_fd_flags(flags);
+	if (fd < 0) {
+		filp_close(f, NULL);
+		return fd;
+	}
+	fd_install(fd, f);
+	return fd;
+}
+
+static long inject_bind(const struct seccomp_inject_record *rec)
+{
+	int sockfd = (int)rec->args[0];
+	int addrlen = (int)rec->args[2];
+	struct sockaddr_storage addr;
+	const void *src;
+	struct socket *sock;
+	long ret;
+	int err;
+
+	if (addrlen < 0 || addrlen > sizeof(addr))
+		return -EINVAL;
+
+	src = inject_buf_at(rec, 1, addrlen);
+	if (!src)
+		return -EINVAL;
+	memcpy(&addr, src, addrlen);
+
+	sock = sockfd_lookup(sockfd, &err);
+	if (!sock)
+		return err;
+
+	ret = kernel_bind(sock, (struct sockaddr_unsized *)&addr, addrlen);
+	sockfd_put(sock);
+	return ret;
+}
+
+static long inject_write(const struct seccomp_inject_record *rec)
+{
+	int fd = (int)rec->args[0];
+	size_t count = (size_t)rec->args[2];
+	const void *src;
+	ssize_t ret;
+	struct fd f;
+	loff_t pos;
+
+	if (count > MAX_RW_COUNT)
+		count = MAX_RW_COUNT;
+
+	src = inject_buf_at(rec, 1, count);
+	if (!src)
+		return -EINVAL;
+
+	f = fdget_pos(fd);
+	if (!fd_file(f))
+		return -EBADF;
+
+	pos = fd_file(f)->f_pos;
+	ret = kernel_write(fd_file(f), src, count, &pos);
+	if (ret > 0)
+		fd_file(f)->f_pos = pos;
+	fdput_pos(f);
+	return ret;
+}
+
+void seccomp_inject_record_free(struct seccomp_inject_record *rec)
+{
+	if (!rec)
+		return;
+	kvfree(rec->buf);
+	kfree(rec);
+}
+
+/*
+ * Validate @uinj fields (syscall_nr in whitelist, args_in_buf_mask
+ * within bounds, offsets in bounds, buf_size capped), copy in the
+ * supervisor's buf, and return a record the caller owns. Caller is
+ * responsible for seccomp_inject_record_free() on success path if
+ * the record is not subsequently attached to a knotif.
+ */
+long seccomp_inject_record_build(const struct seccomp_notif_inject *uinj,
+				 struct seccomp_inject_record **out)
+{
+	struct seccomp_inject_record *rec;
+	void __user *user_buf;
+	long ret;
+	unsigned int i;
+
+	*out = NULL;
+
+	if (!seccomp_injector_for(uinj->nr))
+		return -EOPNOTSUPP;
+
+	/* args_in_buf_mask must reference args[0..5] only. */
+	if (uinj->args_in_buf_mask & ~((1U << 6) - 1))
+		return -EINVAL;
+
+	if (uinj->buf_size > SECCOMP_NOTIF_INJECT_MAX_BYTES)
+		return -E2BIG;
+
+	user_buf = (void __user *)(uintptr_t)uinj->buf;
+	if (uinj->buf_size && !user_buf)
+		return -EINVAL;
+
+	/* Bounds-check each in-buf arg before any allocation. */
+	for (i = 0; i < 6; i++) {
+		if (!(uinj->args_in_buf_mask & (1U << i)))
+			continue;
+		if (uinj->args[i] > uinj->buf_size)
+			return -EINVAL;
+	}
+
+	rec = kzalloc(sizeof(*rec), GFP_KERNEL_ACCOUNT);
+	if (!rec)
+		return -ENOMEM;
+
+	rec->nr = uinj->nr;
+	memcpy(rec->args, uinj->args, sizeof(rec->args));
+	rec->args_in_buf_mask = uinj->args_in_buf_mask;
+	rec->buf_size = uinj->buf_size;
+
+	if (uinj->buf_size) {
+		rec->buf = kvmalloc(uinj->buf_size, GFP_KERNEL_ACCOUNT);
+		if (!rec->buf) {
+			ret = -ENOMEM;
+			goto err;
+		}
+		if (copy_from_user(rec->buf, user_buf, uinj->buf_size)) {
+			ret = -EFAULT;
+			goto err;
+		}
+	}
+
+	*out = rec;
+	return 0;
+
+err:
+	seccomp_inject_record_free(rec);
+	return ret;
+}
+
+/*
+ * Top-level dispatch. Runs in the trapped task's context (current is
+ * the trapped task). Returns the kernel helper's result.
+ */
+long seccomp_inject_dispatch(const struct seccomp_inject_record *rec)
+{
+	seccomp_injector_fn injector = seccomp_injector_for(rec->nr);
+
+	if (!injector)
+		return -EOPNOTSUPP;
+	return injector(rec);
+}
diff --git a/kernel/seccomp_inject.h b/kernel/seccomp_inject.h
new file mode 100644
index 000000000000..b85e4399381c
--- /dev/null
+++ b/kernel/seccomp_inject.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Internal interfaces for SECCOMP_IOCTL_NOTIF_INJECT.
+ *
+ * The inject record allocation, validation, per-syscall injectors and
+ * the dispatch entrypoint live in kernel/seccomp_inject.c to keep
+ * kernel/seccomp.c focused on the notify state machine.
+ */
+#ifndef _KERNEL_SECCOMP_INJECT_H
+#define _KERNEL_SECCOMP_INJECT_H
+
+#include
+#include
+
+struct seccomp_knotif;
+struct seccomp_notif_inject;
+
+/**
+ * struct seccomp_inject_record - kernel-side per-knotif inject state.
+ * @nr: substitute syscall number, validated against the injectable
+ *      whitelist.
+ * @args: substitute syscall arguments. For each i with the i'th bit
+ *        set in @args_in_buf_mask, args[i] is a byte offset into
+ *        @buf rather than the raw argument value.
+ * @args_in_buf_mask: bitmask of pointer-shaped args backed by @buf.
+ * @buf_size: bytes valid in @buf.
+ * @buf: kernel-owned copy of the supervisor-supplied bytes; allocated
+ *       at attach time, freed at consumption (or knotif teardown).
+ *
+ * Allocated by SECCOMP_IOCTL_NOTIF_INJECT, attached to the knotif under
+ * filter->notify_lock, consumed by the trapped task on
+ * SECCOMP_USER_NOTIF_FLAG_INJECTED, freed in the same path. Also freed
+ * if the knotif is dropped without being injected (listener close,
+ * task exit, supervisor changes its mind).
+ */
+struct seccomp_inject_record {
+	u64 nr;
+	u64 args[6];
+	u32 args_in_buf_mask;
+	u32 buf_size;
+	void *buf;
+};
+
+#ifdef CONFIG_SECCOMP_FILTER
+
+/* Allocate, validate, and copy in @uinj. Caller takes ownership of *out. */
+long seccomp_inject_record_build(const struct seccomp_notif_inject *uinj,
+				 struct seccomp_inject_record **out);
+
+/* Free a record built by seccomp_inject_record_build(). */
+void seccomp_inject_record_free(struct seccomp_inject_record *rec);
+
+/* Dispatch a built record into the matching kernel-mode syscall helper.
+ * Runs in the trapped task's context (current is the trapped task).
+ * Returns the helper's result, which becomes the syscall return value.
+ */
+long seccomp_inject_dispatch(const struct seccomp_inject_record *rec);
+
+#else /* !CONFIG_SECCOMP_FILTER */
+
+static inline void seccomp_inject_record_free(struct seccomp_inject_record *rec) { }
+
+#endif /* CONFIG_SECCOMP_FILTER */
+
+#endif /* _KERNEL_SECCOMP_INJECT_H */
-- 
2.43.0

From nobody Fri Jun 12 12:45:17 2026
From: Cong Wang
To: Kees Cook, linux-kernel@vger.kernel.org
Cc: Andy Lutomirski, Will Drewry, Christian Brauner, Cong Wang
Subject: [RFC PATCH v2 2/3] selftests/seccomp: add seccomp_notif_inject coverage
Date: Thu, 14 May 2026 21:27:37 -0700
Message-ID: <20260515042738.1723296-3-xiyou.wangcong@gmail.com>
In-Reply-To: <20260515042738.1723296-1-xiyou.wangcong@gmail.com>
References: <20260515042738.1723296-1-xiyou.wangcong@gmail.com>

From: Cong Wang

Add a standalone selftest binary for SECCOMP_IOCTL_NOTIF_INJECT covering
positive end-to-end injection on each of the v1 syscalls (openat, bind,
write) and the negative paths (unsupported syscall, trapped/inject
syscall mismatch, double inject, CONTINUE+INJECTED conflict, INJECTED
without prior attach).
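
As a reading aid, the negative paths listed above can be lined up with
the error codes returned by the kernel side in patch 1/3. This is only
a sketch: the struct, the case labels and expected_errno_for() are
illustrative names, and the selftest itself may check these paths
differently. Note that the mismatch case only reaches -ESRCH when the
substitute nr is itself on the whitelist; otherwise the build step
rejects it first with -EOPNOTSUPP.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table entry: which ioctl fails, and with which errno. */
struct inject_negative_case {
	const char *name;
	const char *ioctl_name;
	int expected_errno;
};

/* Errno values read off the kernel/seccomp.c and seccomp_inject.c hunks. */
static const struct inject_negative_case inject_negative_cases[] = {
	{ "unsupported syscall", "SECCOMP_IOCTL_NOTIF_INJECT", EOPNOTSUPP },
	{ "trapped/inject syscall mismatch", "SECCOMP_IOCTL_NOTIF_INJECT", ESRCH },
	{ "double inject", "SECCOMP_IOCTL_NOTIF_INJECT", EEXIST },
	{ "CONTINUE+INJECTED conflict", "SECCOMP_IOCTL_NOTIF_SEND", EINVAL },
	{ "INJECTED without prior attach", "SECCOMP_IOCTL_NOTIF_SEND", EINVAL },
};

/* Look up the expected errno for a named case; 0 if the name is unknown. */
static int expected_errno_for(const char *name)
{
	size_t i;
	size_t n = sizeof(inject_negative_cases) / sizeof(inject_negative_cases[0]);

	for (i = 0; i < n; i++) {
		if (strcmp(inject_negative_cases[i].name, name) == 0)
			return inject_negative_cases[i].expected_errno;
	}
	return 0;
}
```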
The end-to-end tests fork a child that issues a filtered syscall with
one set of arguments and verify that, after the supervisor attaches a
substitute via SECCOMP_IOCTL_NOTIF_INJECT and replies with
SECCOMP_USER_NOTIF_FLAG_INJECTED, the kernel acts on the
supervisor-supplied bytes rather than the child's user mm.

Lives in its own file rather than seccomp_bpf.c since the feature is
unrelated to the BPF filter machinery.

Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Cong Wang
---
 tools/testing/selftests/seccomp/.gitignore    |   1 +
 tools/testing/selftests/seccomp/Makefile      |   2 +-
 .../selftests/seccomp/seccomp_notif_inject.c  | 434 ++++++++++++++++++
 3 files changed, 436 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/seccomp/seccomp_notif_inject.c

diff --git a/tools/testing/selftests/seccomp/.gitignore b/tools/testing/selftests/seccomp/.gitignore
index dec678577f9c..096f77a0b136 100644
--- a/tools/testing/selftests/seccomp/.gitignore
+++ b/tools/testing/selftests/seccomp/.gitignore
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 seccomp_bpf
 seccomp_benchmark
+seccomp_notif_inject
diff --git a/tools/testing/selftests/seccomp/Makefile b/tools/testing/selftests/seccomp/Makefile
index 584fba487037..8a4ca92f1af3 100644
--- a/tools/testing/selftests/seccomp/Makefile
+++ b/tools/testing/selftests/seccomp/Makefile
@@ -3,5 +3,5 @@ CFLAGS += -Wl,-no-as-needed -Wall $(KHDR_INCLUDES)
 LDFLAGS += -lpthread
 LDLIBS += -lcap
 
-TEST_GEN_PROGS := seccomp_bpf seccomp_benchmark
+TEST_GEN_PROGS := seccomp_bpf seccomp_benchmark seccomp_notif_inject
 include ../lib.mk
diff --git a/tools/testing/selftests/seccomp/seccomp_notif_inject.c b/tools/testing/selftests/seccomp/seccomp_notif_inject.c
new file mode 100644
index 000000000000..3d5bb97b8df9
--- /dev/null
+++ b/tools/testing/selftests/seccomp/seccomp_notif_inject.c
@@ -0,0 +1,434 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Selftest for SECCOMP_IOCTL_NOTIF_INJECT.
+ *
+ * Exercises end-to-end syscall injection via the listener-fd
+ * supervisor pattern: the child issues a filtered syscall, the
+ * supervisor describes a substitute syscall in a kernel buffer, the
+ * kernel runs the substitute on the child's behalf using kernel-mode
+ * helpers (filp_open/kernel_bind/kernel_write), and the result lands
+ * as the child's syscall return value. The trapped task's user mm is
+ * never re-read for the substituted syscall.
+ */
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../kselftest_harness.h"
+
+#ifndef __NR_seccomp
+#define __NR_seccomp 317
+#endif
+
+static int seccomp_install(int nr)
+{
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
+			 offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
+		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF),
+		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)ARRAY_SIZE(filter),
+		.filter = filter,
+	};
+
+	return syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
+		       SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
+}
+
+/* ----------------------------------------------------------------
+ * openat injection.
+ * ----------------------------------------------------------------
+ */
+TEST(notif_inject_openat)
+{
+	char tmp_real[] = "/tmp/seccomp-inject-XXXXXX";
+	int real_fd, listener, status;
+	pid_t pid;
+
+	real_fd = mkstemp(tmp_real);
+	ASSERT_GE(real_fd, 0);
+	ASSERT_EQ(write(real_fd, "real-data", 9), 9);
+	ASSERT_EQ(close(real_fd), 0);
+
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+	listener = seccomp_install(__NR_openat);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		char readback[16] = {0};
+		int fd;
+
+		fd = openat(AT_FDCWD, "/this/path/does/not/exist", O_RDONLY);
+		if (fd < 0)
+			_exit(10);
+		if (read(fd, readback, sizeof(readback) - 1) <= 0)
+			_exit(11);
+		_exit(memcmp(readback, "real-data", 9) == 0 ? 0 : 12);
+	}
+
+	struct seccomp_notif req = {0};
+	struct seccomp_notif_resp resp = {0};
+	struct seccomp_notif_inject inj = {0};
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+	EXPECT_EQ(req.data.nr, __NR_openat);
+
+	inj.id = req.id;
+	inj.nr = __NR_openat;
+	inj.args[0] = AT_FDCWD;
+	inj.args[1] = 0;
+	inj.args[2] = O_RDONLY;
+	inj.args[3] = 0;
+	inj.buf = (uintptr_t)tmp_real;
+	inj.buf_size = strlen(tmp_real) + 1;
+	inj.args_in_buf_mask = 1U << 1;
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_INJECT, &inj), 0) {
+		TH_LOG("INJECT failed: %s", strerror(errno));
+	}
+
+	resp.id = req.id;
+	resp.flags = SECCOMP_USER_NOTIF_FLAG_INJECTED;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(WEXITSTATUS(status), 0) {
+		TH_LOG("child exit status: %d", WEXITSTATUS(status));
+	}
+
+	unlink(tmp_real);
+	close(listener);
+}
+
+/* ----------------------------------------------------------------
+ * write injection.
+ * Child issues write(fd, "agent-data", 10); supervisor injects
+ * write(fd, "kernel-bytes", 12); verify file content matches the
+ * kernel-injected bytes.
+ * ----------------------------------------------------------------
+ */
+TEST(notif_inject_write)
+{
+	char path[] = "/tmp/seccomp-inject-write-XXXXXX";
+	static const char inject_bytes[] = "kernel-bytes";
+	int file_fd, listener, status;
+	char file_content[32];
+	pid_t pid;
+
+	file_fd = mkstemp(path);
+	ASSERT_GE(file_fd, 0);
+
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+	listener = seccomp_install(__NR_write);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		ssize_t n;
+
+		n = write(file_fd, "agent-data", 10);
+		_exit(n > 0 ? 0 : 10);
+	}
+
+	struct seccomp_notif req = {0};
+	struct seccomp_notif_resp resp = {0};
+	struct seccomp_notif_inject inj = {0};
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+	EXPECT_EQ(req.data.nr, __NR_write);
+
+	inj.id = req.id;
+	inj.nr = __NR_write;
+	inj.args[0] = req.data.args[0]; /* fd, pass-through */
+	inj.args[1] = 0;
+	inj.args[2] = strlen(inject_bytes);
+	inj.buf = (uintptr_t)inject_bytes;
+	inj.buf_size = strlen(inject_bytes);
+	inj.args_in_buf_mask = 1U << 1;
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_INJECT, &inj), 0);
+
+	resp.id = req.id;
+	resp.flags = SECCOMP_USER_NOTIF_FLAG_INJECTED;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(WEXITSTATUS(status), 0);
+
+	memset(file_content, 0, sizeof(file_content));
+	ASSERT_EQ(lseek(file_fd, 0, SEEK_SET), 0);
+	ASSERT_EQ(read(file_fd, file_content, sizeof(file_content) - 1),
+		  (ssize_t)strlen(inject_bytes));
+	EXPECT_EQ(memcmp(file_content, inject_bytes, strlen(inject_bytes)), 0) {
+		TH_LOG("file content: '%s'", file_content);
+	}
+
+	close(file_fd);
+	unlink(path);
+	close(listener);
+}
+
+/* ----------------------------------------------------------------
+ * bind injection.
+ * ----------------------------------------------------------------
+ */
+TEST(notif_inject_bind)
+{
+	struct sockaddr_un real_addr = { .sun_family = AF_UNIX };
+	char real_path[] = "/tmp/seccomp-inject-bind-XXXXXX";
+	int listener, status;
+	pid_t pid;
+
+	mktemp(real_path);
+	strcpy(real_addr.sun_path, real_path);
+	unlink(real_addr.sun_path);
+
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+	listener = seccomp_install(__NR_bind);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		struct sockaddr_un fake = { .sun_family = AF_UNIX };
+		int s;
+
+		strcpy(fake.sun_path, "/tmp/seccomp-inject-bind-fake");
+		s = socket(AF_UNIX, SOCK_STREAM, 0);
+		if (s < 0)
+			_exit(10);
+		if (bind(s, (struct sockaddr *)&fake, sizeof(fake)) < 0)
+			_exit(11);
+		_exit(0);
+	}
+
+	struct seccomp_notif req = {0};
+	struct seccomp_notif_resp resp = {0};
+	struct seccomp_notif_inject inj = {0};
+	struct stat st;
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+	EXPECT_EQ(req.data.nr, __NR_bind);
+
+	inj.id = req.id;
+	inj.nr = __NR_bind;
+	inj.args[0] = req.data.args[0]; /* sockfd, pass-through */
+	inj.args[1] = 0;
+	inj.args[2] = sizeof(real_addr);
+	inj.buf = (uintptr_t)&real_addr;
+	inj.buf_size = sizeof(real_addr);
+	inj.args_in_buf_mask = 1U << 1;
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_INJECT, &inj), 0);
+
+	resp.id = req.id;
+	resp.flags = SECCOMP_USER_NOTIF_FLAG_INJECTED;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_TRUE(WIFEXITED(status));
+	EXPECT_EQ(WEXITSTATUS(status), 0);
+
+	/*
+	 * The kernel-injected path should exist; the agent's intended
+	 * path should not.
+	 */
+	EXPECT_EQ(stat(real_path, &st), 0);
+	EXPECT_EQ(stat("/tmp/seccomp-inject-bind-fake", &st), -1);
+
+	unlink(real_path);
+	close(listener);
+}
+
+/* ----------------------------------------------------------------
+ * Negative paths.
+ * ----------------------------------------------------------------
+ */
+TEST(notif_inject_unsupported_syscall)
+{
+	int listener, status;
+	pid_t pid;
+
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+	listener = seccomp_install(__NR_close);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		close(99);
+		_exit(0);
+	}
+
+	struct seccomp_notif req = {0};
+	struct seccomp_notif_resp resp = {0};
+	struct seccomp_notif_inject inj = {0};
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	/* close() is not in the injectable whitelist. */
+	inj.id = req.id;
+	inj.nr = __NR_close;
+	inj.args[0] = 99;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_INJECT, &inj), -1);
+	EXPECT_EQ(errno, EOPNOTSUPP);
+
+	/* Cleanly deny so the child can exit. */
+	resp.id = req.id;
+	resp.error = -EBADF;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	close(listener);
+}
+
+TEST(notif_inject_syscall_mismatch)
+{
+	int listener, status;
+	pid_t pid;
+
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+	listener = seccomp_install(__NR_openat);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		openat(AT_FDCWD, "/nonexistent", O_RDONLY);
+		_exit(0);
+	}
+
+	struct seccomp_notif req = {0};
+	struct seccomp_notif_resp resp = {0};
+	struct seccomp_notif_inject inj = {0};
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	/* Trapped on openat, but supervisor tries to inject bind. */
+	inj.id = req.id;
+	inj.nr = __NR_bind;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_INJECT, &inj), -1);
+	EXPECT_EQ(errno, ESRCH);
+
+	resp.id = req.id;
+	resp.error = -EACCES;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	close(listener);
+}
+
+TEST(notif_inject_double)
+{
+	char tmp_real[] = "/tmp/seccomp-inject-d-XXXXXX";
+	int real_fd, listener, status;
+	pid_t pid;
+
+	real_fd = mkstemp(tmp_real);
+	ASSERT_GE(real_fd, 0);
+	close(real_fd);
+
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+	listener = seccomp_install(__NR_openat);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		openat(AT_FDCWD, "/nonexistent", O_RDONLY);
+		_exit(0);
+	}
+
+	struct seccomp_notif req = {0};
+	struct seccomp_notif_resp resp = {0};
+	struct seccomp_notif_inject inj = {0};
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	inj.id = req.id;
+	inj.nr = __NR_openat;
+	inj.args[0] = AT_FDCWD;
+	inj.args[1] = 0;
+	inj.args[2] = O_RDONLY;
+	inj.buf = (uintptr_t)tmp_real;
+	inj.buf_size = strlen(tmp_real) + 1;
+	inj.args_in_buf_mask = 1U << 1;
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_INJECT, &inj), 0);
+
+	/* Second INJECT for the same id is rejected. */
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_INJECT, &inj), -1);
+	EXPECT_EQ(errno, EEXIST);
+
+	resp.id = req.id;
+	resp.flags = SECCOMP_USER_NOTIF_FLAG_INJECTED;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+
+	unlink(tmp_real);
+	close(listener);
+}
+
+TEST(notif_inject_continue_pinned_conflict)
+{
+	int listener, status;
+	pid_t pid;
+
+	ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+	listener = seccomp_install(__NR_openat);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		openat(AT_FDCWD, "/nonexistent", O_RDONLY);
+		_exit(0);
+	}
+
+	struct seccomp_notif req = {0};
+	struct seccomp_notif_resp resp = {0};
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	/* CONTINUE | INJECTED is rejected. */
+	resp.id = req.id;
+	resp.flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE |
+		     SECCOMP_USER_NOTIF_FLAG_INJECTED;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), -1);
+	EXPECT_EQ(errno, EINVAL);
+
+	/* INJECTED without prior INJECT ioctl is rejected too. */
+	resp.flags = SECCOMP_USER_NOTIF_FLAG_INJECTED;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), -1);
+	EXPECT_EQ(errno, EINVAL);
+
+	/* Cleanly deny so the child exits. */
+	resp.flags = 0;
+	resp.error = -ENOENT;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	close(listener);
+}
+
+TEST_HARNESS_MAIN
-- 
2.43.0

From nobody Fri Jun 12 12:45:17 2026
From: Cong Wang
To: Kees Cook, linux-kernel@vger.kernel.org
Cc: Andy Lutomirski, Will Drewry, Christian Brauner, Cong Wang
Subject: [RFC PATCH v2 3/3] Documentation: seccomp: document SECCOMP_IOCTL_NOTIF_INJECT
Date: Thu, 14 May 2026 21:27:38 -0700
Message-ID: <20260515042738.1723296-4-xiyou.wangcong@gmail.com>
In-Reply-To: <20260515042738.1723296-1-xiyou.wangcong@gmail.com>
References: <20260515042738.1723296-1-xiyou.wangcong@gmail.com>

From: Cong Wang

Add a "Syscall Injection" subsection to the user-notification chapter
covering the motivation (closing the documented TOCTOU window for
unprivileged supervisors), the substitute-syscall flow via
SECCOMP_IOCTL_NOTIF_INJECT and SECCOMP_USER_NOTIF_FLAG_INJECTED, the
ptrace-shaped struct layout, the kernel-buffer-backed pointer
arguments, the listener-fd capability model, and the relationship to
ptrace's existing register/memory manipulation.
Assisted-by: Claude:claude-opus-4.6
Signed-off-by: Cong Wang
---
 .../userspace-api/seccomp_filter.rst          | 42 +++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
index cff0fa7f3175..9057505b2b92 100644
--- a/Documentation/userspace-api/seccomp_filter.rst
+++ b/Documentation/userspace-api/seccomp_filter.rst
@@ -289,6 +289,48 @@ above in this document: all arguments being read from the tracee's memory
 should be read into the tracer's memory before any policy decisions are made.
 This allows for an atomic decision on syscall arguments.
 
+Syscall Injection
+-----------------
+
+For unprivileged supervisors, ``ptrace()`` and ``/proc/pid/mem`` are not
+available, and reading the tracee's memory via ``process_vm_readv()``
+remains racy: a sibling thread or ``CLONE_VM`` peer can mutate pointer-arg
+buffers between the supervisor's read and the kernel's re-read on
+``SECCOMP_USER_NOTIF_FLAG_CONTINUE``. ``SECCOMP_IOCTL_NOTIF_INJECT``
+closes that race by letting the supervisor describe a substitute syscall
+(``nr`` plus ``args[6]``, mirroring ``ptrace_syscall_info.entry``) whose
+pointer arguments are backed by a kernel-side copy of supervisor-supplied
+bytes rather than the tracee's user mm.
+
+The supervisor receives a notification as today, then issues
+``ioctl(SECCOMP_IOCTL_NOTIF_INJECT, &inj)`` with a
+``struct seccomp_notif_inject`` describing the substitute. Each
+pointer-shaped argument is encoded as a byte offset into ``inj.buf`` (a
+user buffer the kernel copies in at attach time); the ``args_in_buf_mask``
+field flags which ``args[i]`` are offsets versus raw scalar values.
+The substitute is consumed by ``ioctl(SECCOMP_IOCTL_NOTIF_SEND)`` with
+``SECCOMP_USER_NOTIF_FLAG_INJECTED``: the trapped task wakes, dispatches
+into the matching kernel-mode helper (``filp_open`` for ``openat``,
+``kernel_bind`` for ``bind``, ``kernel_write`` for ``write``), and the
+helper's return value becomes the trapped syscall's return value.
+
+The trapped task's user mm is never re-read for the substituted syscall,
+so peer mutations after ``SECCOMP_IOCTL_NOTIF_INJECT`` returns have no
+effect.
+
+Injection is gated by listener-fd possession (the same capability model
+as the rest of the user-notification interface) and by an explicit
+kernel-side whitelist of injectable syscalls. The substitute ``nr`` must
+match the trapped syscall's number, preventing a malicious supervisor
+from converting "task tried to bind()" into "kernel does an openat() on
+the task's behalf".
+
+This is intentionally a strict subset of ``PTRACE_SYSCALL`` +
+``PTRACE_POKEDATA`` + ``PTRACE_SETREGSET``: the same kernel capability
+(running a syscall in the trapped task's context with kernel-validated
+args), exposed to unprivileged listener-fd-holding supervisors with a
+narrowed surface and no need for ``CAP_SYS_PTRACE``.
+
 Sysctls
 =======
 
-- 
2.43.0
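
For readers following the offset encoding described in the Syscall
Injection text (pointer-shaped args[i] become byte offsets into a
supervisor-supplied buffer, flagged by args_in_buf_mask), here is a
minimal userspace-only sketch of that bookkeeping. It is not the proposed
uAPI and issues no ioctl: struct inject_stage, stage_ptr_arg(),
resolve_arg(), stage_openat_example() and inject_stage_selftest() are
hypothetical names, and the 256-byte buffer and the bounds check are
assumptions about how a consumer might validate offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INJECT_NARGS 6

/* Hypothetical staging area; NOT the proposed struct seccomp_notif_inject. */
struct inject_stage {
	uint64_t args[INJECT_NARGS];	/* scalar value, or offset into buf */
	uint32_t args_in_buf_mask;	/* bit i set: args[i] is an offset */
	uint32_t buf_used;		/* bytes of buf filled so far */
	unsigned char buf[256];		/* assumed size, illustration only */
};

/* Copy len bytes into buf and make args[idx] the offset of that copy. */
static int stage_ptr_arg(struct inject_stage *st, unsigned int idx,
			 const void *data, size_t len)
{
	if (idx >= INJECT_NARGS || len > sizeof(st->buf) - st->buf_used)
		return -1;
	memcpy(st->buf + st->buf_used, data, len);
	st->args[idx] = st->buf_used;
	st->args_in_buf_mask |= 1U << idx;
	st->buf_used += (uint32_t)len;
	return 0;
}

/* Return the staged bytes for args[idx]; NULL for scalars or bad offsets. */
static const void *resolve_arg(const struct inject_stage *st, unsigned int idx)
{
	if (idx >= INJECT_NARGS || !(st->args_in_buf_mask & (1U << idx)))
		return NULL;
	if (st->args[idx] >= st->buf_used)
		return NULL;
	return st->buf + st->args[idx];
}

/* Shaped like the openat selftest: only args[1] (the path) is buffer-backed. */
static int stage_openat_example(struct inject_stage *st, const char *path)
{
	memset(st, 0, sizeof(*st));
	return stage_ptr_arg(st, 1, path, strlen(path) + 1);
}

/* Sanity check: the first staged argument lands at offset 0 with bit 1 set. */
static void inject_stage_selftest(void)
{
	struct inject_stage st;
	int ret = stage_openat_example(&st, "/tmp/example");

	assert(ret == 0);
	assert(st.args[1] == 0 && st.args_in_buf_mask == (1U << 1));
	assert(strcmp((const char *)resolve_arg(&st, 1), "/tmp/example") == 0);
	(void)ret;	/* silence unused warnings when built with NDEBUG */
}
```

With a single buffer-backed argument this reduces to what the selftests
do by hand (args[1] = 0, args_in_buf_mask = 1U << 1); the helper only
matters once more than one pointer argument shares the buffer.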