[RFC PATCH v2 0/1] seal system mappings

jeffxu@chromium.org posted 1 patch 1 month, 1 week ago
There is a newer version of this series
.../admin-guide/kernel-parameters.txt         | 10 ++++
arch/x86/entry/vsyscall/vsyscall_64.c         |  9 +++-
fs/exec.c                                     | 53 +++++++++++++++++++
include/linux/fs.h                            |  1 +
kernel/events/uprobes.c                       |  2 +-
mm/mmap.c                                     |  1 +
security/Kconfig                              | 26 +++++++++
7 files changed, 99 insertions(+), 3 deletions(-)
[RFC PATCH v2 0/1] seal system mappings
Posted by jeffxu@chromium.org 1 month, 1 week ago
From: Jeff Xu <jeffxu@chromium.org>

Seal vdso, vvar, sigpage, uprobes and vsyscall.

Those mappings are readonly or executable only, sealing can protect
them from ever changing during the life time of the process. For
complete descriptions of memory sealing, please see mseal.rst [1].

System mappings such as vdso, vvar, and sigpage (for arm) are
generated by the kernel during program initialization. These mappings
are designated as non-writable, and sealing them will prevent them
from ever becoming writeable.

Unlike the aforementioned mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime [2], thus sealable.

The vdso, vvar, sigpage, and uprobe mappings all invoke the
_install_special_mapping() function. As no other mappings utilize this
function, it is logical to incorporate sealing logic within
_install_special_mapping(). This approach avoids the necessity of
modifying code across various architecture-specific implementations.

The vsyscall mapping, which has its own initialization function, is
sealed in the XONLY case, it seems to be the most common and secure
case of using vsyscall.

It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the mapping of vdso, vvar, and sigpage during restore
operations. Consequently, this feature cannot be universally enabled
across all systems. To address this, a kernel configuration option has
been introduced to enable or disable this functionality. Note, uprobe
is always sealed and not controlled by this kernel configuration.

I tested CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS,
which doesn’t use CHECKPOINT_RESTORE.

[1] Documentation/userspace-api/mseal.rst
[2] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/

History:
V2:
  Seal uprobe always (Oleg Nesterov)
  Update comments and description (Randy Dunlap, Liam R.Howlett, Oleg Nesterov)
  Rebase to linux_main

V1:
https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/

Jeff Xu (1):
  exec: seal system mappings

 .../admin-guide/kernel-parameters.txt         | 10 ++++
 arch/x86/entry/vsyscall/vsyscall_64.c         |  9 +++-
 fs/exec.c                                     | 53 +++++++++++++++++++
 include/linux/fs.h                            |  1 +
 kernel/events/uprobes.c                       |  2 +-
 mm/mmap.c                                     |  1 +
 security/Kconfig                              | 26 +++++++++
 7 files changed, 99 insertions(+), 3 deletions(-)

-- 
2.47.0.rc1.288.g06298d1525-goog

Re: [RFC PATCH v2 0/1] seal system mappings
Posted by Liam R. Howlett 1 month, 1 week ago
* jeffxu@chromium.org <jeffxu@chromium.org> [241014 17:50]:
> From: Jeff Xu <jeffxu@chromium.org>
> 
> Seal vdso, vvar, sigpage, uprobes and vsyscall.
> 
> Those mappings are readonly or executable only, sealing can protect
> them from ever changing during the life time of the process. For
> complete descriptions of memory sealing, please see mseal.rst [1].
> 
> System mappings such as vdso, vvar, and sigpage (for arm) are
> generated by the kernel during program initialization. These mappings
> are designated as non-writable, and sealing them will prevent them
> from ever becoming writeable.
                              ^ or ever removed.

This is a pretty big deal.  Platforms are trying to make it so that vdso
is the fast path, but if they are removed then things stop using them
and it's not a problem.  This description is robbing them of the
information they need to know that, and it's not in your change log
either.

I had said before that you need to be clear about the inability to
remove the mappings that are sealed, as the description above heavily
implies that it is only stopping them from becoming writeable.

> 
> Unlike the aforementioned mappings, the uprobe mapping is not
> established during program startup. However, its lifetime is the same
> as the process's lifetime [2], thus sealable.
> 
> The vdso, vvar, sigpage, and uprobe mappings all invoke the
> _install_special_mapping() function. As no other mappings utilize this
> function, it is logical to incorporate sealing logic within
> _install_special_mapping(). This approach avoids the necessity of
> modifying code across various architecture-specific implementations.
> 
> The vsyscall mapping, which has its own initialization function, is
> sealed in the XONLY case, it seems to be the most common and secure
> case of using vsyscall.
> 
> It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> alter the mapping of vdso, vvar, and sigpage during restore
> operations. Consequently, this feature cannot be universally enabled
> across all systems. To address this, a kernel configuration option has
> been introduced to enable or disable this functionality. Note, uprobe
> is always sealed and not controlled by this kernel configuration.
> 
> I tested CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS,
> which doesn’t use CHECKPOINT_RESTORE.
> 
> [1] Documentation/userspace-api/mseal.rst
> [2] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/
> 
> History:
> V2:
>   Seal uprobe always (Oleg Nesterov)
>   Update comments and description (Randy Dunlap, Liam R.Howlett, Oleg Nesterov)

The only update to the comment I see is the pointer to mseal.rst for a
complete description?

>   Rebase to linux_main
> 
> V1:
> https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/
> 
> Jeff Xu (1):
>   exec: seal system mappings
> 
>  .../admin-guide/kernel-parameters.txt         | 10 ++++
>  arch/x86/entry/vsyscall/vsyscall_64.c         |  9 +++-
>  fs/exec.c                                     | 53 +++++++++++++++++++
>  include/linux/fs.h                            |  1 +
>  kernel/events/uprobes.c                       |  2 +-
>  mm/mmap.c                                     |  1 +
>  security/Kconfig                              | 26 +++++++++
>  7 files changed, 99 insertions(+), 3 deletions(-)
> 
> -- 
> 2.47.0.rc1.288.g06298d1525-goog
> 
Re: [RFC PATCH v2 0/1] seal system mappings
Posted by Jeff Xu 1 month, 1 week ago
On Wed, Oct 16, 2024 at 4:18 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * jeffxu@chromium.org <jeffxu@chromium.org> [241014 17:50]:
> > From: Jeff Xu <jeffxu@chromium.org>
> >
> > Seal vdso, vvar, sigpage, uprobes and vsyscall.
> >
> > Those mappings are readonly or executable only, sealing can protect
> > them from ever changing during the life time of the process. For
> > complete descriptions of memory sealing, please see mseal.rst [1].
> >
> > System mappings such as vdso, vvar, and sigpage (for arm) are
> > generated by the kernel during program initialization. These mappings
> > are designated as non-writable, and sealing them will prevent them
> > from ever becoming writeable.
>                               ^ or ever removed.
>
> This is a pretty big deal.  Platforms are trying to make it so that vdso
> is the fast path, but if they are removed then things stop using them
> and it's not a problem.  This description is robbing them of the
> information they need to know that, and it's not in your change log
> either.
>
> I had said before that you need to be clear about the inability to
> remove the mappings that are sealed, as the description above heavily
> implies that it is only stopping them from becoming writeable.
>
The mseal.rst has the full description about memory sealing, I don't
see the point to repeat all the blocked operations in the commit
message here.

I don't know why you would think this heavily implies that only
stopping them from becoming writable, There is already reminder: **
For complete descriptions ** of memory sealing, please see mseal.rst

> >
> > Unlike the aforementioned mappings, the uprobe mapping is not
> > established during program startup. However, its lifetime is the same
> > as the process's lifetime [2], thus sealable.
> >
> > The vdso, vvar, sigpage, and uprobe mappings all invoke the
> > _install_special_mapping() function. As no other mappings utilize this
> > function, it is logical to incorporate sealing logic within
> > _install_special_mapping(). This approach avoids the necessity of
> > modifying code across various architecture-specific implementations.
> >
> > The vsyscall mapping, which has its own initialization function, is
> > sealed in the XONLY case, it seems to be the most common and secure
> > case of using vsyscall.
> >
> > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> > alter the mapping of vdso, vvar, and sigpage during restore
> > operations. Consequently, this feature cannot be universally enabled
> > across all systems. To address this, a kernel configuration option has
> > been introduced to enable or disable this functionality. Note, uprobe
> > is always sealed and not controlled by this kernel configuration.
> >
> > I tested CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS,
> > which doesn’t use CHECKPOINT_RESTORE.
> >
> > [1] Documentation/userspace-api/mseal.rst
> > [2] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/
> >
> > History:
> > V2:
> >   Seal uprobe always (Oleg Nesterov)
> >   Update comments and description (Randy Dunlap, Liam R.Howlett, Oleg Nesterov)
>
> The only update to the comment I see is the pointer to mseal.rst for a
> complete description?
>
> >   Rebase to linux_main
> >
> > V1:
> > https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/
> >
> > Jeff Xu (1):
> >   exec: seal system mappings
> >
> >  .../admin-guide/kernel-parameters.txt         | 10 ++++
> >  arch/x86/entry/vsyscall/vsyscall_64.c         |  9 +++-
> >  fs/exec.c                                     | 53 +++++++++++++++++++
> >  include/linux/fs.h                            |  1 +
> >  kernel/events/uprobes.c                       |  2 +-
> >  mm/mmap.c                                     |  1 +
> >  security/Kconfig                              | 26 +++++++++
> >  7 files changed, 99 insertions(+), 3 deletions(-)
> >
> > --
> > 2.47.0.rc1.288.g06298d1525-goog
> >
Re: [RFC PATCH v2 0/1] seal system mappings
Posted by Liam R. Howlett 1 month, 1 week ago
* Jeff Xu <jeffxu@chromium.org> [241016 20:59]:
> On Wed, Oct 16, 2024 at 4:18 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> > * jeffxu@chromium.org <jeffxu@chromium.org> [241014 17:50]:
> > > From: Jeff Xu <jeffxu@chromium.org>
> > >
> > > Seal vdso, vvar, sigpage, uprobes and vsyscall.
> > >
> > > Those mappings are readonly or executable only, sealing can protect
> > > them from ever changing during the life time of the process. For
> > > complete descriptions of memory sealing, please see mseal.rst [1].
> > >
> > > System mappings such as vdso, vvar, and sigpage (for arm) are
> > > generated by the kernel during program initialization. These mappings
> > > are designated as non-writable, and sealing them will prevent them
> > > from ever becoming writeable.
> >                               ^ or ever removed.
> >
> > This is a pretty big deal.  Platforms are trying to make it so that vdso
> > is the fast path, but if they are removed then things stop using them
> > and it's not a problem.  This description is robbing them of the
> > information they need to know that, and it's not in your change log
> > either.
> >
> > I had said before that you need to be clear about the inability to
> > remove the mappings that are sealed, as the description above heavily
> > implies that it is only stopping them from becoming writeable.
> >
> The mseal.rst has the full description about memory sealing, I don't
> see the point to repeat all the blocked operations in the commit
> message here.

That's not what I said or asked for.

It's three words to add.

> 
> I don't know why you would think this heavily implies that only
> stopping them from becoming writable, 

Because you left it out of the description that states what this does,
and it's three words that adds a whole lot of context.

I told you to add that detail because it is _way_ more important to
almost everyone that you are making mappings exist for the lifetime of
the process than the fact that not writeable mappings will still be not
writeable.

One points out the fundamental change to the mappings you list while the
other tells people how things won't change.  I get that not changing is
the point of the patch, but it's not the only thing it does.

> There is already reminder: **
> For complete descriptions ** of memory sealing, please see mseal.rst

Getting good code reviews is difficult.  There are very few people who
do a good job of the very few people who do it at all.  If you want
people to read your code, then you need to spoon feed it to them.

You have just put up a huge barrier to someone doing a code review at
all, let alone a good code review.  So if you want good code, you need
to provide good information with a low bar of entrance.  We can all
agree that good review creates good code (well, better code, at least).

What you have done here is limited the number of people who are willing
to even entertain the idea of looking at the following patch.  And of
the people who do read the patch, probably close to zero will read that
document.

It's (usually) nothing personal, it's just human nature and a function
of time.

Thanks,
Liam
Re: [RFC PATCH v2 0/1] seal system mappings
Posted by Jeff Xu 2 weeks, 1 day ago
Hi Liam

On Wed, Oct 16, 2024 at 7:03 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * Jeff Xu <jeffxu@chromium.org> [241016 20:59]:
> > On Wed, Oct 16, 2024 at 4:18 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> > >
> > > * jeffxu@chromium.org <jeffxu@chromium.org> [241014 17:50]:
> > > > From: Jeff Xu <jeffxu@chromium.org>
> > > >
> > > > Seal vdso, vvar, sigpage, uprobes and vsyscall.
> > > >
> > > > Those mappings are readonly or executable only, sealing can protect
> > > > them from ever changing during the life time of the process. For
> > > > complete descriptions of memory sealing, please see mseal.rst [1].
> > > >
I will mention unmap  in the above sentence.

> > > > System mappings such as vdso, vvar, and sigpage (for arm) are
> > > > generated by the kernel during program initialization. These mappings
> > > > are designated as non-writable, and sealing them will prevent them
> > > > from ever becoming writeable.
> > >                               ^ or ever removed.
> > >
This section is about the mappings (vdso, etc)  created during program
initialization vs later time as uprobe, I will revise  to make it
clearer to the reader.

Thanks
-Jeff