[RFC PATCH v1 0/1] seal system mappings

jeffxu@chromium.org posted 1 patch 1 month, 3 weeks ago
There is a newer version of this series
From: Jeff Xu <jeffxu@google.com>

Seal vdso, vvar, sigpage, uprobes and vsyscall.

Those mappings are read-only or execute-only; sealing protects them
from ever changing during the lifetime of the process.

System mappings such as vdso, vvar, and sigpage (for arm) are
generated by the kernel during program initialization. These mappings
are designated as non-writable, and sealing them will prevent them
from ever becoming writeable.

Unlike the aforementioned mappings, the uprobe mapping is not
established during program startup. However, its lifetime is the same
as the process's lifetime [1], thus sealable.

The vdso, vvar, sigpage, and uprobe mappings all invoke the
_install_special_mapping() function. As no other mappings utilize this
function, it is logical to incorporate sealing logic within
_install_special_mapping(). This approach avoids the necessity of
modifying code across various architecture-specific implementations.

The vsyscall mapping, which has its own initialization function, is
sealed in the XONLY case, which seems to be the most common and secure
way of using vsyscall.

It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the mapping of vdso, vvar, and sigpage during restore
operations. Consequently, this feature cannot be universally enabled
across all systems. To address this, a kernel configuration option has
been introduced to enable or disable this functionality. I tested
CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS, which doesn’t use
CHECKPOINT_RESTORE, to verify the sealing works.

[1] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/

Jeff Xu (1):
  exec: seal system mappings

 .../admin-guide/kernel-parameters.txt         |  9 ++++
 arch/x86/entry/vsyscall/vsyscall_64.c         |  9 +++-
 fs/exec.c                                     | 53 +++++++++++++++++++
 include/linux/fs.h                            |  1 +
 mm/mmap.c                                     |  1 +
 security/Kconfig                              | 26 +++++++++
 6 files changed, 97 insertions(+), 2 deletions(-)

-- 
2.47.0.rc0.187.ge670bccf7e-goog

Re: [RFC PATCH v1 0/1] seal system mappings
Posted by Liam R. Howlett 1 month, 2 weeks ago
* jeffxu@chromium.org <jeffxu@chromium.org> [241004 12:32]:
> From: Jeff Xu <jeffxu@google.com>
> 
> Seal vdso, vvar, sigpage, uprobes and vsyscall.
> 
> Those mappings are read-only or execute-only; sealing protects them
> from ever changing during the lifetime of the process.
> 
> System mappings such as vdso, vvar, and sigpage (for arm) are
> generated by the kernel during program initialization. These mappings
> are designated as non-writable, and sealing them will prevent them
> from ever becoming writeable.

But it also means they cannot be unmapped, right?

I'm not saying it's a thing people should do, but recent conversations
with the ppc people seem to indicate that people do 'things' to the vdso
such as removing it.

Won't this change mean they cannot do that, at least if mseal is enabled
on ppc64?  In which case we would have a different special mapping for
powerpc, or any other platform that wants to be able to unmap the vdso
(or vvar or whatever else?)

In fact, I came across people removing the vdso to catch callers to
those functions which they didn't want to allow.  In this case enabling
the security of mseal would not allow them to stop applications from
vdso calls.  Again, I'm not saying this is a good (or bad) idea, but it
is happening.
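
To make concrete which range is at stake, here is a minimal user-space
sketch of how a process can find its own vdso. It relies on
getauxval(AT_SYSINFO_EHDR), which glibc provides, and only checks for
the ELF magic at that address. The helper names vdso_base() and
looks_like_elf() are made up for illustration. The sketch does not
unmap anything; under the proposed sealing, a munmap() of this range
would presumably be refused.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>

/*
 * Hypothetical helper: return the address the kernel advertised for the
 * vdso ELF header, or NULL if the auxiliary vector has no such entry.
 */
const void *vdso_base(void)
{
    return (const void *)getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: check for the ELF magic bytes at the given address. */
int looks_like_elf(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}

#ifdef VDSO_DEMO_MAIN
int main(void)
{
    const void *base = vdso_base();

    if (base == NULL)
        printf("no vdso advertised in the auxiliary vector\n");
    else
        printf("vdso at %p, ELF magic %s\n", base,
               looks_like_elf(base) ? "present" : "missing");
    return 0;
}
#endif
```

Building with -DVDSO_DEMO_MAIN gives a small program that prints the
address; the same range shows up as [vdso] in /proc/self/maps.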

> 
> Unlike the aforementioned mappings, the uprobe mapping is not
> established during program startup. However, its lifetime is the same
> as the process's lifetime [1], thus sealable.
> 
> The vdso, vvar, sigpage, and uprobe mappings all invoke the
> _install_special_mapping() function. As no other mappings utilize this
> function, it is logical to incorporate sealing logic within
> _install_special_mapping(). This approach avoids the necessity of
> modifying code across various architecture-specific implementations.
> 
> The vsyscall mapping, which has its own initialization function, is
> sealed in the XONLY case, which seems to be the most common and secure
> way of using vsyscall.
> 
> It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> alter the mapping of vdso, vvar, and sigpage during restore
> operations. Consequently, this feature cannot be universally enabled
> across all systems. To address this, a kernel configuration option has
> been introduced to enable or disable this functionality. I tested
> CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS, which doesn’t use
> CHECKPOINT_RESTORE, to verify the sealing works.

I am hesitant to say that CRIU is the only user of moving the vdso, as
the ppc people wanted the ability for the fallback methods to still
function when the vdso was unmapped.

I am not sure we can change the user expected behaviour based on a
configuration option; users may be able to mmap/munmap but may not be
able to boot their own kernel, but maybe it's okay?

Re: [RFC PATCH v1 0/1] seal system mappings
Posted by Jeff Xu 1 month, 2 weeks ago
Hi Liam,

On Mon, Oct 7, 2024 at 7:19 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> I am hesitant to say that CRIU is the only user of moving the vdso, as
> the ppc people wanted the ability for the fallback methods to still
> function when the vdso was unmapped.
>
> I am not sure we can change the user expected behaviour based on a
> configuration option; users may be able to mmap/munmap but may not be
> able to boot their own kernel, but maybe it's okay?
>
The text doesn't say CRIU is the **only** feature that is not
compatible with this.

The default config is "CONFIG_SEAL_SYSTEM_MAPPINGS_NEVER", and a
distribution needs to opt in to this feature, such as ChromeOS,
Android, or other safe-by-default systems that don't allow unmapping
or remapping the vdso in production builds.

Thanks
-Jeff


Re: [RFC PATCH v1 0/1] seal system mappings
Posted by Liam R. Howlett 1 month, 2 weeks ago
* Jeff Xu <jeffxu@chromium.org> [241008 11:01]:
> The text doesn't say CRIU is the **only** feature that is not
> compatible with this.

Fair enough.

I read it that way since you pointed out breaking CRIU is the reason for
not enabling this by default, although it's probably the biggest reason
against doing this.

> 
> The default config is "CONFIG_SEAL_SYSTEM_MAPPINGS_NEVER", and a
> distribution needs to opt in to this feature, such as ChromeOS,
> Android, or other safe-by-default systems that don't allow unmapping
> or remapping the vdso in production builds.

Okay, but you never stated that they can't be unmapped or remapped in
this change; just that they will never become writeable.  It is worth
adding that detail in the description since it isn't entirely obvious
unless you know the workings of mseal.
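
For readers who don't know the workings of mseal, here is a minimal
user-space sketch of the point above: once a range is sealed,
mprotect(), mremap() and munmap() on it are all expected to fail with
EPERM, not only attempts to make it writable. It seals an ordinary
anonymous page rather than a system mapping. Assumptions: the names
seal_and_probe() and struct seal_result are made up for illustration,
the number 462 is only a fallback for headers that lack __NR_mseal, and
on a kernel without mseal (before 6.10, or 32-bit) the seal call simply
fails and the probes are skipped.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462 /* assumption: fallback for older headers */
#endif

/* Hypothetical result record: 0 means the call succeeded, else errno. */
struct seal_result {
    int seal_errno;
    int mprotect_errno;
    int mremap_errno;
    int munmap_errno;
};

/* Map one read-only page, seal it, then try to modify the mapping. */
struct seal_result seal_and_probe(void)
{
    struct seal_result r = {0};
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    assert(p != MAP_FAILED);

    /* mseal(addr, len, flags); flags must be 0. */
    if (syscall(__NR_mseal, p, len, 0UL) == -1) {
        r.seal_errno = errno; /* e.g. ENOSYS on kernels without mseal */
        munmap(p, len);
        return r;
    }

    /* Each of these is expected to be refused on a sealed range. */
    if (mprotect(p, len, PROT_READ | PROT_WRITE) == -1)
        r.mprotect_errno = errno;
    if (mremap(p, len, 2 * len, MREMAP_MAYMOVE) == MAP_FAILED)
        r.mremap_errno = errno;
    if (munmap(p, len) == -1)
        r.munmap_errno = errno;

    /* The sealed page stays mapped until the process exits. */
    return r;
}

#ifdef SEAL_DEMO_MAIN
int main(void)
{
    struct seal_result r = seal_and_probe();

    if (r.seal_errno != 0) {
        printf("mseal unavailable (errno %d)\n", r.seal_errno);
        return 0;
    }
    printf("mprotect errno=%d mremap errno=%d munmap errno=%d\n",
           r.mprotect_errno, r.mremap_errno, r.munmap_errno);
    return 0;
}
#endif
```

Building with -DSEAL_DEMO_MAIN gives a small program that prints the
three errno values.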

Re: [RFC PATCH v1 0/1] seal system mappings
Posted by Jeff Xu 1 month, 2 weeks ago
On Tue, Oct 8, 2024 at 5:42 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> Okay, but you never stated that they can't be unmapped or remapped in
> this change; just that they will never become writeable.  It is worth
> adding that detail in the description since it isn't entirely obvious
> unless you know the workings of mseal.
>
Thanks, I will improve this section by adding more details on memory
sealing, or maybe by pointing to the mseal.rst document.
