If we have more than one listener in the tree and lower listener
wants us to continue syscall (SECCOMP_USER_NOTIF_FLAG_CONTINUE)
we must consult with upper listeners first, otherwise it is a
clear seccomp restrictions bypass scenario.
Cc: linux-kernel@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: Kees Cook <kees@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Stéphane Graber <stgraber@stgraber.org>
Reviewed-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
kernel/seccomp.c | 33 +++++++++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index ded3f6a6430b..262390451ff1 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -448,8 +448,21 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 
 		if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
 			ret = cur_ret;
+			/*
+			 * No matter what we had before in matches->filters[],
+			 * we need to overwrite it, because current action is more
+			 * restrictive than any previous one.
+			 */
 			matches->n = 1;
 			matches->filters[0] = f;
+		} else if ((ACTION_ONLY(cur_ret) == ACTION_ONLY(ret)) &&
+			   ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) {
+			/*
+			 * For multiple SECCOMP_RET_USER_NOTIF results, we need to
+			 * track all filters that resulted in the same action, because
+			 * we might need to notify a few of them to get a final decision.
+			 */
+			matches->filters[matches->n++] = f;
 		}
 	}
 	return ret;
@@ -1362,8 +1375,24 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 		return 0;
 
 	case SECCOMP_RET_USER_NOTIF:
-		if (seccomp_do_user_notification(match, &sd))
-			goto skip;
+		for (unsigned char i = 0; i < matches.n; i++) {
+			match = matches.filters[i];
+			/*
+			 * If userspace wants us to skip this syscall, do so.
+			 * But if userspace wants to continue syscall, we
+			 * must consult with the upper-level filters listeners
+			 * and act accordingly.
+			 *
+			 * Note, that if there are multiple filters returned
+			 * SECCOMP_RET_USER_NOTIF, and final result is
+			 * SECCOMP_RET_USER_NOTIF too, then seccomp_run_filters()
+			 * has populated matches.filters[] array with all of them
+			 * in order from the lowest-level (closest to a
+			 * current->seccomp.filter) to the highest-level.
+			 */
+			if (seccomp_do_user_notification(match, &sd))
+				goto skip;
+		}
 
 		return 0;
--
2.43.0
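
For readers following the thread, the consultation order described in the
commit message can be sketched as a small userspace model. This is not
kernel code: listener_reply, listener_model and consult_listeners() are
hypothetical names made up for illustration, and each listener's answer is
reduced to "continue" or "handled". The only behavior taken from the patch
is the loop shape: walk the matches from the lowest-level filter upward and
skip the syscall as soon as one listener does not ask to continue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a listener's answer to one notification. */
enum listener_reply {
	LISTENER_CONTINUE,	/* models SECCOMP_USER_NOTIF_FLAG_CONTINUE */
	LISTENER_HANDLED	/* listener supplied its own result: skip the syscall */
};

/* Hypothetical model of a filter that returned SECCOMP_RET_USER_NOTIF. */
struct listener_model {
	enum listener_reply reply;	/* what this listener will answer */
	bool asked;			/* set once the listener was notified */
};

/*
 * Models the loop added to __seccomp_filter(): walk the matches from the
 * lowest-level filter to the highest-level one. Returns true when the
 * syscall must be skipped, false when every listener said "continue".
 */
static bool consult_listeners(struct listener_model **matches, size_t n)
{
	assert(matches != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		matches[i]->asked = true;
		if (matches[i]->reply != LISTENER_CONTINUE)
			return true;	/* corresponds to "goto skip" */
	}
	return false;			/* corresponds to "return 0" */
}
```

In this model, a lower listener answering "continue" no longer ends the
decision: the upper listener is still asked and can veto the syscall.
Conversely, once any listener handles the syscall itself, the listeners
above it are not consulted.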
On Tue, Dec 2, 2025 at 12:52 PM Alexander Mikhalitsyn
<aleksandr.mikhalitsyn@canonical.com> wrote:
>
> If we have more than one listener in the tree and lower listener
> wants us to continue syscall (SECCOMP_USER_NOTIF_FLAG_CONTINUE)
> we must consult with upper listeners first, otherwise it is a
> clear seccomp restrictions bypass scenario.
>
> Cc: linux-kernel@vger.kernel.org
> Cc: bpf@vger.kernel.org
> Cc: Kees Cook <kees@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Will Drewry <wad@chromium.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Aleksa Sarai <cyphar@cyphar.com>
> Cc: Tycho Andersen <tycho@tycho.pizza>
> Cc: Andrei Vagin <avagin@gmail.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Stéphane Graber <stgraber@stgraber.org>
> Reviewed-by: Tycho Andersen (AMD) <tycho@kernel.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
> kernel/seccomp.c | 33 +++++++++++++++++++++++++++++++--
> 1 file changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index ded3f6a6430b..262390451ff1 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -448,8 +448,21 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>
> if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
> ret = cur_ret;
> + /*
> + * No matter what we had before in matches->filters[],
> + * we need to overwrite it, because current action is more
> + * restrictive than any previous one.
> + */
> matches->n = 1;
> matches->filters[0] = f;
> + } else if ((ACTION_ONLY(cur_ret) == ACTION_ONLY(ret)) &&
> + ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) {
My bad. We also have to check f->notif in there like that:
} else if ((ACTION_ONLY(cur_ret) == ACTION_ONLY(ret)) &&
- ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) {
+ (ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) &&
+ f->notif) {
/*
After Kees's comment, I have an idea for how to potentially get rid of the
static matches->filters array. I'll try to rework this.
> + /*
> + * For multiple SECCOMP_RET_USER_NOTIF results, we need to
> + * track all filters that resulted in the same action, because
> + * we might need to notify a few of them to get a final decision.
> + */
> + matches->filters[matches->n++] = f;
> }
> }
> return ret;
> @@ -1362,8 +1375,24 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
> return 0;
>
> case SECCOMP_RET_USER_NOTIF:
> - if (seccomp_do_user_notification(match, &sd))
> - goto skip;
> + for (unsigned char i = 0; i < matches.n; i++) {
> + match = matches.filters[i];
> + /*
> + * If userspace wants us to skip this syscall, do so.
> + * But if userspace wants to continue syscall, we
> + * must consult with the upper-level filters listeners
> + * and act accordingly.
> + *
> + * Note, that if there are multiple filters returned
> + * SECCOMP_RET_USER_NOTIF, and final result is
> + * SECCOMP_RET_USER_NOTIF too, then seccomp_run_filters()
> + * has populated matches.filters[] array with all of them
> + * in order from the lowest-level (closest to a
> + * current->seccomp.filter) to the highest-level.
> + */
> + if (seccomp_do_user_notification(match, &sd))
> + goto skip;
> + }
>
> return 0;
>
> --
> 2.43.0
>
On Wed, Dec 03, 2025 at 04:29:49PM +0100, Aleksandr Mikhalitsyn wrote:
> On Tue, Dec 2, 2025 at 12:52 PM Alexander Mikhalitsyn
> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >
> > If we have more than one listener in the tree and lower listener
> > wants us to continue syscall (SECCOMP_USER_NOTIF_FLAG_CONTINUE)
> > we must consult with upper listeners first, otherwise it is a
> > clear seccomp restrictions bypass scenario.
> >
> > Cc: linux-kernel@vger.kernel.org
> > Cc: bpf@vger.kernel.org
> > Cc: Kees Cook <kees@kernel.org>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Will Drewry <wad@chromium.org>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Shuah Khan <shuah@kernel.org>
> > Cc: Aleksa Sarai <cyphar@cyphar.com>
> > Cc: Tycho Andersen <tycho@tycho.pizza>
> > Cc: Andrei Vagin <avagin@gmail.com>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Stéphane Graber <stgraber@stgraber.org>
> > Reviewed-by: Tycho Andersen (AMD) <tycho@kernel.org>
> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > ---
> > kernel/seccomp.c | 33 +++++++++++++++++++++++++++++++--
> > 1 file changed, 31 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index ded3f6a6430b..262390451ff1 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -448,8 +448,21 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> >
> > if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
> > ret = cur_ret;
> > + /*
> > + * No matter what we had before in matches->filters[],
> > + * we need to overwrite it, because current action is more
> > + * restrictive than any previous one.
> > + */
> > matches->n = 1;
> > matches->filters[0] = f;
> > + } else if ((ACTION_ONLY(cur_ret) == ACTION_ONLY(ret)) &&
> > + ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) {
>
> My bad. We also have to check f->notif in there like that:
For my own education: why is that? Shouldn't
seccomp_do_user_notification() be smart enough to catch this case (and
indeed, there is a TOCTOU if you do it here?)?
Thanks,
Tycho
On Thu, Dec 4, 2025 at 4:18 PM Tycho Andersen <tycho@kernel.org> wrote:
>
> On Wed, Dec 03, 2025 at 04:29:49PM +0100, Aleksandr Mikhalitsyn wrote:
> > On Tue, Dec 2, 2025 at 12:52 PM Alexander Mikhalitsyn
> > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > >
> > > If we have more than one listener in the tree and lower listener
> > > wants us to continue syscall (SECCOMP_USER_NOTIF_FLAG_CONTINUE)
> > > we must consult with upper listeners first, otherwise it is a
> > > clear seccomp restrictions bypass scenario.
> > >
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: bpf@vger.kernel.org
> > > Cc: Kees Cook <kees@kernel.org>
> > > Cc: Andy Lutomirski <luto@amacapital.net>
> > > Cc: Will Drewry <wad@chromium.org>
> > > Cc: Jonathan Corbet <corbet@lwn.net>
> > > Cc: Shuah Khan <shuah@kernel.org>
> > > Cc: Aleksa Sarai <cyphar@cyphar.com>
> > > Cc: Tycho Andersen <tycho@tycho.pizza>
> > > Cc: Andrei Vagin <avagin@gmail.com>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Stéphane Graber <stgraber@stgraber.org>
> > > Reviewed-by: Tycho Andersen (AMD) <tycho@kernel.org>
> > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > > ---
> > > kernel/seccomp.c | 33 +++++++++++++++++++++++++++++++--
> > > 1 file changed, 31 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > > index ded3f6a6430b..262390451ff1 100644
> > > --- a/kernel/seccomp.c
> > > +++ b/kernel/seccomp.c
> > > @@ -448,8 +448,21 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> > >
> > > if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
> > > ret = cur_ret;
> > > + /*
> > > + * No matter what we had before in matches->filters[],
> > > + * we need to overwrite it, because current action is more
> > > + * restrictive than any previous one.
> > > + */
> > > matches->n = 1;
> > > matches->filters[0] = f;
> > > + } else if ((ACTION_ONLY(cur_ret) == ACTION_ONLY(ret)) &&
> > > + ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) {
> >
> > My bad. We also have to check f->notif in there like that:
>
Hi Tycho,

Sorry for the delayed reply.
> For my own education: why is that? Shouldn't
> seccomp_do_user_notification() be smart enough to catch this case (and
> indeed, there is a TOCTOU if you do it here?)?
seccomp_do_user_notification() is smart enough to handle the case where
the listener file descriptor has been closed. The tricky part is that a
seccomp filter program can (legally) return SECCOMP_RET_USER_NOTIF even
when there was never a listener at all.

Nothing prevents you from loading a program like:

    BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF)

with

    seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog) // << no SECCOMP_FILTER_FLAG_NEW_LISTENER flag

so we can easily do an OOB write to the matches->filters[] array, because
our limit of 8 elements only applies to filters installed with the
SECCOMP_FILTER_FLAG_NEW_LISTENER flag.
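
A rough userspace model of why the f->notif check matters, in case it
helps. Everything here is a sketch under assumptions: tracked_filter,
match_set, track_match() and MODEL_MAX_LISTENERS are hypothetical names,
the capacity of 8 is taken from the remark above, and the model only covers
the "else if" branch (the first match still goes through the other branch
in the real patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_LISTENERS 8	/* assumed capacity, from the "8 elements" remark */

/* Hypothetical model of a filter that returned SECCOMP_RET_USER_NOTIF. */
struct tracked_filter {
	bool has_listener;	/* models f->notif != NULL */
};

struct match_set {
	size_t n;
	const struct tracked_filter *filters[MODEL_MAX_LISTENERS];
};

/*
 * Models the corrected "else if" branch: only filters that actually have a
 * listener are appended. Returns true if f was recorded.
 */
static bool track_match(struct match_set *m, const struct tracked_filter *f)
{
	if (!f->has_listener)
		return false;	/* the f->notif check: nothing to notify */

	/* Holds only under the assumed cap on filters with listeners. */
	assert(m->n < MODEL_MAX_LISTENERS);
	m->filters[m->n++] = f;
	return true;
}
```

With the check in place, any number of listener-less filters returning
SECCOMP_RET_USER_NOTIF leaves the array untouched, so the number of entries
is bounded by the number of filters that really have a listener. Without
it, each such filter would consume a slot and the ninth one would write
past the end.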
Actually, I decided to get rid of this array in the next version of the patchset.
Hope to virtually meet you at LPC soon! ;)
Kind regards,
Alex
>
> Thanks,
>
> Tycho