[PATCH v2 4/6] seccomp: handle multiple listeners case

Alexander Mikhalitsyn posted 6 patches 2 months, 1 week ago
There is a newer version of this series
Posted by Alexander Mikhalitsyn 2 months, 1 week ago
If there is more than one listener in the tree and a lower listener
asks us to continue the syscall (SECCOMP_USER_NOTIF_FLAG_CONTINUE),
we must consult the upper listeners first; otherwise this is a
clear bypass of seccomp restrictions.

Cc: linux-kernel@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: Kees Cook <kees@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Stéphane Graber <stgraber@stgraber.org>
Reviewed-by: Tycho Andersen (AMD) <tycho@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 kernel/seccomp.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index ded3f6a6430b..262390451ff1 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -448,8 +448,21 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 
 		if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
 			ret = cur_ret;
+			/*
+			 * No matter what we had before in matches->filters[],
+			 * we need to overwrite it, because current action is more
+			 * restrictive than any previous one.
+			 */
 			matches->n = 1;
 			matches->filters[0] = f;
+		} else if ((ACTION_ONLY(cur_ret) == ACTION_ONLY(ret)) &&
+			    ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) {
+			/*
+			 * For multiple SECCOMP_RET_USER_NOTIF results, we need to
+			 * track all filters that resulted in the same action, because
+			 * we might need to notify a few of them to get a final decision.
+			 */
+			matches->filters[matches->n++] = f;
 		}
 	}
 	return ret;
@@ -1362,8 +1375,24 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
 		return 0;
 
 	case SECCOMP_RET_USER_NOTIF:
-		if (seccomp_do_user_notification(match, &sd))
-			goto skip;
+		for (unsigned char i = 0; i < matches.n; i++) {
+			match = matches.filters[i];
+			/*
+			 * If userspace wants us to skip this syscall, do so.
+			 * But if userspace wants to continue syscall, we
+			 * must consult with the upper-level filters listeners
+			 * and act accordingly.
+			 *
+			 * Note, that if there are multiple filters returned
+			 * SECCOMP_RET_USER_NOTIF, and final result is
+			 * SECCOMP_RET_USER_NOTIF too, then seccomp_run_filters()
+			 * has populated matches.filters[] array with all of them
+			 * in order from the lowest-level (closest to a
+			 * current->seccomp.filter) to the highest-level.
+			 */
+			if (seccomp_do_user_notification(match, &sd))
+				goto skip;
+		}
 
 		return 0;
 
-- 
2.43.0
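
The rule this patch enforces (a syscall may continue only if every listener, from the lowest level to the highest, agrees) can be sketched as a small userspace model. This is an illustration only, not kernel code: `enum verdict`, `listener_fn`, `consult_listeners()` and the two sample listeners are hypothetical names made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum verdict { VERDICT_CONTINUE, VERDICT_SKIP };

/* One listener's answer to a notification (stand-in for a supervisor). */
typedef enum verdict (*listener_fn)(int syscall_nr);

/*
 * Ask listeners in order, lowest level first. The first listener that
 * does not answer CONTINUE ends the walk and the syscall is skipped.
 * Returns true only when every listener let the syscall continue.
 */
static bool consult_listeners(const listener_fn *listeners, size_t n,
			      int syscall_nr)
{
	for (size_t i = 0; i < n; i++) {
		if (listeners[i](syscall_nr) == VERDICT_SKIP)
			return false;
	}
	return true;
}

/* Sample lower-level listener: lets everything continue. */
static enum verdict always_continue(int nr)
{
	(void)nr;
	return VERDICT_CONTINUE;
}

/* Sample upper-level listener: refuses one (arbitrary) syscall number. */
static enum verdict deny_syscall_42(int nr)
{
	return nr == 42 ? VERDICT_SKIP : VERDICT_CONTINUE;
}
```

With the chain `{ always_continue, deny_syscall_42 }`, syscall 42 is skipped even though the lower listener asked to continue it; consulting only the first listener would let it through, which is the bypass described in the commit message.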

Re: [PATCH v2 4/6] seccomp: handle multiple listeners case
Posted by Aleksandr Mikhalitsyn 2 months, 1 week ago
On Tue, Dec 2, 2025 at 12:52 PM Alexander Mikhalitsyn
<aleksandr.mikhalitsyn@canonical.com> wrote:
>
> If there is more than one listener in the tree and a lower listener
> asks us to continue the syscall (SECCOMP_USER_NOTIF_FLAG_CONTINUE),
> we must consult the upper listeners first; otherwise this is a
> clear bypass of seccomp restrictions.
>
> Cc: linux-kernel@vger.kernel.org
> Cc: bpf@vger.kernel.org
> Cc: Kees Cook <kees@kernel.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Will Drewry <wad@chromium.org>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Aleksa Sarai <cyphar@cyphar.com>
> Cc: Tycho Andersen <tycho@tycho.pizza>
> Cc: Andrei Vagin <avagin@gmail.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Stéphane Graber <stgraber@stgraber.org>
> Reviewed-by: Tycho Andersen (AMD) <tycho@kernel.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
>  kernel/seccomp.c | 33 +++++++++++++++++++++++++++++++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index ded3f6a6430b..262390451ff1 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -448,8 +448,21 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
>
>                 if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
>                         ret = cur_ret;
> +                       /*
> +                        * No matter what we had before in matches->filters[],
> +                        * we need to overwrite it, because current action is more
> +                        * restrictive than any previous one.
> +                        */
>                         matches->n = 1;
>                         matches->filters[0] = f;
> +               } else if ((ACTION_ONLY(cur_ret) == ACTION_ONLY(ret)) &&
> +                           ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) {

My bad. We also have to check f->notif there, like this:

                } else if ((ACTION_ONLY(cur_ret) == ACTION_ONLY(ret)) &&
-                           ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) {
+                          (ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) &&
+                          f->notif) {
                        /*

After Kees's comment, I have an idea for how to potentially get rid of
the static matches->filters array. I'll try to rework this.

> +                       /*
> +                        * For multiple SECCOMP_RET_USER_NOTIF results, we need to
> +                        * track all filters that resulted in the same action, because
> +                        * we might need to notify a few of them to get a final decision.
> +                        */
> +                       matches->filters[matches->n++] = f;
>                 }
>         }
>         return ret;
> @@ -1362,8 +1375,24 @@ static int __seccomp_filter(int this_syscall, const bool recheck_after_trace)
>                 return 0;
>
>         case SECCOMP_RET_USER_NOTIF:
> -               if (seccomp_do_user_notification(match, &sd))
> -                       goto skip;
> +               for (unsigned char i = 0; i < matches.n; i++) {
> +                       match = matches.filters[i];
> +                       /*
> +                        * If userspace wants us to skip this syscall, do so.
> +                        * But if userspace wants to continue syscall, we
> +                        * must consult with the upper-level filters listeners
> +                        * and act accordingly.
> +                        *
> +                        * Note, that if there are multiple filters returned
> +                        * SECCOMP_RET_USER_NOTIF, and final result is
> +                        * SECCOMP_RET_USER_NOTIF too, then seccomp_run_filters()
> +                        * has populated matches.filters[] array with all of them
> +                        * in order from the lowest-level (closest to a
> +                        * current->seccomp.filter) to the highest-level.
> +                        */
> +                       if (seccomp_do_user_notification(match, &sd))
> +                               goto skip;
> +               }
>
>                 return 0;
>
> --
> 2.43.0
>
Re: [PATCH v2 4/6] seccomp: handle multiple listeners case
Posted by Tycho Andersen 2 months, 1 week ago
On Wed, Dec 03, 2025 at 04:29:49PM +0100, Aleksandr Mikhalitsyn wrote:
> On Tue, Dec 2, 2025 at 12:52 PM Alexander Mikhalitsyn
> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >
> > If there is more than one listener in the tree and a lower listener
> > asks us to continue the syscall (SECCOMP_USER_NOTIF_FLAG_CONTINUE),
> > we must consult the upper listeners first; otherwise this is a
> > clear bypass of seccomp restrictions.
> >
> > Cc: linux-kernel@vger.kernel.org
> > Cc: bpf@vger.kernel.org
> > Cc: Kees Cook <kees@kernel.org>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Will Drewry <wad@chromium.org>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Shuah Khan <shuah@kernel.org>
> > Cc: Aleksa Sarai <cyphar@cyphar.com>
> > Cc: Tycho Andersen <tycho@tycho.pizza>
> > Cc: Andrei Vagin <avagin@gmail.com>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Stéphane Graber <stgraber@stgraber.org>
> > Reviewed-by: Tycho Andersen (AMD) <tycho@kernel.org>
> > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > ---
> >  kernel/seccomp.c | 33 +++++++++++++++++++++++++++++++--
> >  1 file changed, 31 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index ded3f6a6430b..262390451ff1 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -448,8 +448,21 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> >
> >                 if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
> >                         ret = cur_ret;
> > +                       /*
> > +                        * No matter what we had before in matches->filters[],
> > +                        * we need to overwrite it, because current action is more
> > +                        * restrictive than any previous one.
> > +                        */
> >                         matches->n = 1;
> >                         matches->filters[0] = f;
> > +               } else if ((ACTION_ONLY(cur_ret) == ACTION_ONLY(ret)) &&
> > +                           ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) {
> 
> My bad. We also have to check f->notif there, like this:

For my own education: why is that? Shouldn't
seccomp_do_user_notification() be smart enough to catch this case (and
indeed, there is a TOCTOU if you do it here?)?

Thanks,

Tycho
Re: [PATCH v2 4/6] seccomp: handle multiple listeners case
Posted by Aleksandr Mikhalitsyn 2 months ago
On Thu, Dec 4, 2025 at 4:18 PM Tycho Andersen <tycho@kernel.org> wrote:
>
> On Wed, Dec 03, 2025 at 04:29:49PM +0100, Aleksandr Mikhalitsyn wrote:
> > On Tue, Dec 2, 2025 at 12:52 PM Alexander Mikhalitsyn
> > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > >
> > > If there is more than one listener in the tree and a lower listener
> > > asks us to continue the syscall (SECCOMP_USER_NOTIF_FLAG_CONTINUE),
> > > we must consult the upper listeners first; otherwise this is a
> > > clear bypass of seccomp restrictions.
> > >
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: bpf@vger.kernel.org
> > > Cc: Kees Cook <kees@kernel.org>
> > > Cc: Andy Lutomirski <luto@amacapital.net>
> > > Cc: Will Drewry <wad@chromium.org>
> > > Cc: Jonathan Corbet <corbet@lwn.net>
> > > Cc: Shuah Khan <shuah@kernel.org>
> > > Cc: Aleksa Sarai <cyphar@cyphar.com>
> > > Cc: Tycho Andersen <tycho@tycho.pizza>
> > > Cc: Andrei Vagin <avagin@gmail.com>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Stéphane Graber <stgraber@stgraber.org>
> > > Reviewed-by: Tycho Andersen (AMD) <tycho@kernel.org>
> > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > > ---
> > >  kernel/seccomp.c | 33 +++++++++++++++++++++++++++++++--
> > >  1 file changed, 31 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > > index ded3f6a6430b..262390451ff1 100644
> > > --- a/kernel/seccomp.c
> > > +++ b/kernel/seccomp.c
> > > @@ -448,8 +448,21 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> > >
> > >                 if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
> > >                         ret = cur_ret;
> > > +                       /*
> > > +                        * No matter what we had before in matches->filters[],
> > > +                        * we need to overwrite it, because current action is more
> > > +                        * restrictive than any previous one.
> > > +                        */
> > >                         matches->n = 1;
> > >                         matches->filters[0] = f;
> > > +               } else if ((ACTION_ONLY(cur_ret) == ACTION_ONLY(ret)) &&
> > > +                           ACTION_ONLY(cur_ret) == SECCOMP_RET_USER_NOTIF) {
> >
> > My bad. We also have to check f->notif there, like this:
>

Hi Tycho,

Sorry for the delayed reply.

> For my own education: why is that? Shouldn't
> seccomp_do_user_notification() be smart enough to catch this case (and
> indeed, there is a TOCTOU if you do it here?)?

seccomp_do_user_notification() is smart enough to handle the case where
a listener file descriptor was closed. The tricky part is that
SECCOMP_RET_USER_NOTIF can (legally) be returned by a seccomp filter
program even when there was never a listener at all.

Nothing prevents you from loading a program like:
    BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_USER_NOTIF)
with
    seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog) // << no SECCOMP_FILTER_FLAG_NEW_LISTENER flag

So, without the f->notif check, we can easily do an OOB write to the
matches->filters[] array, because our 8-element limit only applies to
filters installed with the SECCOMP_FILTER_FLAG_NEW_LISTENER flag.
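
To make the overflow concrete, here is a rough userspace model of the collection step. It is a sketch, not the kernel code: `struct model_filter`, `struct model_matches` and `collect_match()` are hypothetical stand-ins, `has_listener` models `f->notif != NULL`, and the explicit capacity check is an extra guard added for the sketch rather than something proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LISTENERS 8	/* models the 8-element limit discussed above */

/* Hypothetical stand-in for a seccomp filter in the chain. */
struct model_filter {
	bool returns_user_notif;	/* program returned SECCOMP_RET_USER_NOTIF */
	bool has_listener;		/* models f->notif != NULL */
};

/* Hypothetical stand-in for the fixed-size matches array. */
struct model_matches {
	size_t n;
	const struct model_filter *filters[MAX_LISTENERS];
};

/*
 * Record a filter only if it returned USER_NOTIF and has a listener.
 * Returns false instead of writing past the end of the array.
 */
static bool collect_match(struct model_matches *m,
			  const struct model_filter *f)
{
	if (!f->returns_user_notif || !f->has_listener)
		return true;	/* nothing to record */
	if (m->n >= MAX_LISTENERS)
		return false;	/* assumed guard: refuse to overflow */
	m->filters[m->n++] = f;
	return true;
}
```

Any number of listener-less filters can be stacked and none of them consume a slot. Dropping the `has_listener` test (and the guard) is what lets an unbounded chain of such filters run past the eighth element.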

Actually, I decided to get rid of this array in the next version of the patchset.

Hope to virtually meet you at LPC soon! ;)

Kind regards,
Alex

>
> Thanks,
>
> Tycho