.../admin-guide/kernel-parameters.txt | 10 ++++ arch/x86/entry/vsyscall/vsyscall_64.c | 9 +++- fs/exec.c | 53 +++++++++++++++++++ include/linux/fs.h | 1 + kernel/events/uprobes.c | 2 +- mm/mmap.c | 1 + security/Kconfig | 26 +++++++++ 7 files changed, 99 insertions(+), 3 deletions(-)
From: Jeff Xu <jeffxu@chromium.org> Seal vdso, vvar, sigpage, uprobes and vsyscall. Those mappings are readonly or executable only, sealing can protect them from ever changing during the life time of the process. For complete descriptions of memory sealing, please see mseal.rst [1]. System mappings such as vdso, vvar, and sigpage (for arm) are generated by the kernel during program initialization. These mappings are designated as non-writable, and sealing them will prevent them from ever becoming writeable. Unlike the aforementioned mappings, the uprobe mapping is not established during program startup. However, its lifetime is the same as the process's lifetime [2], thus sealable. The vdso, vvar, sigpage, and uprobe mappings all invoke the _install_special_mapping() function. As no other mappings utilize this function, it is logical to incorporate sealing logic within _install_special_mapping(). This approach avoids the necessity of modifying code across various architecture-specific implementations. The vsyscall mapping, which has its own initialization function, is sealed in the XONLY case, it seems to be the most common and secure case of using vsyscall. It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may alter the mapping of vdso, vvar, and sigpage during restore operations. Consequently, this feature cannot be universally enabled across all systems. To address this, a kernel configuration option has been introduced to enable or disable this functionality. Note, uprobe is always sealed and not controlled by this kernel configuration. I tested CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS, which doesn’t use CHECKPOINT_RESTORE. [1] Documentation/userspace-api/mseal.rst [2] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ History: V2: Seal uprobe always (Oleg Nesterov) Update comments and description (Randy Dunlap, Liam R.Howlett, Oleg Nesterov) Rebase to linux_main V1: https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/ Jeff Xu (1): exec: seal system mappings .../admin-guide/kernel-parameters.txt | 10 ++++ arch/x86/entry/vsyscall/vsyscall_64.c | 9 +++- fs/exec.c | 53 +++++++++++++++++++ include/linux/fs.h | 1 + kernel/events/uprobes.c | 2 +- mm/mmap.c | 1 + security/Kconfig | 26 +++++++++ 7 files changed, 99 insertions(+), 3 deletions(-) -- 2.47.0.rc1.288.g06298d1525-goog
* jeffxu@chromium.org <jeffxu@chromium.org> [241014 17:50]: > From: Jeff Xu <jeffxu@chromium.org> > > Seal vdso, vvar, sigpage, uprobes and vsyscall. > > Those mappings are readonly or executable only, sealing can protect > them from ever changing during the life time of the process. For > complete descriptions of memory sealing, please see mseal.rst [1]. > > System mappings such as vdso, vvar, and sigpage (for arm) are > generated by the kernel during program initialization. These mappings > are designated as non-writable, and sealing them will prevent them > from ever becoming writeable. ^ or ever removed. This is a pretty big deal. Platforms are trying to make it so that vdso is the fast path, but if they are removed then things stop using them and it's not a problem. This description is robbing them of the information they need to know that, and it's not in your change log either. I had said before that you need to be clear about the inability to remove the mappings that are sealed, as the description above heavily implies that it is only stopping them from becoming writeable. > > Unlike the aforementioned mappings, the uprobe mapping is not > established during program startup. However, its lifetime is the same > as the process's lifetime [2], thus sealable. > > The vdso, vvar, sigpage, and uprobe mappings all invoke the > _install_special_mapping() function. As no other mappings utilize this > function, it is logical to incorporate sealing logic within > _install_special_mapping(). This approach avoids the necessity of > modifying code across various architecture-specific implementations. > > The vsyscall mapping, which has its own initialization function, is > sealed in the XONLY case, it seems to be the most common and secure > case of using vsyscall. > > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may > alter the mapping of vdso, vvar, and sigpage during restore > operations. Consequently, this feature cannot be universally enabled > across all systems. To address this, a kernel configuration option has > been introduced to enable or disable this functionality. Note, uprobe > is always sealed and not controlled by this kernel configuration. > > I tested CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS, > which doesn’t use CHECKPOINT_RESTORE. > > [1] Documentation/userspace-api/mseal.rst > [2] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ > > History: > V2: > Seal uprobe always (Oleg Nesterov) > Update comments and description (Randy Dunlap, Liam R.Howlett, Oleg Nesterov) The only update to the comment I see is the pointer to mseal.rst for a complete description? > Rebase to linux_main > > V1: > https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/ > > Jeff Xu (1): > exec: seal system mappings > > .../admin-guide/kernel-parameters.txt | 10 ++++ > arch/x86/entry/vsyscall/vsyscall_64.c | 9 +++- > fs/exec.c | 53 +++++++++++++++++++ > include/linux/fs.h | 1 + > kernel/events/uprobes.c | 2 +- > mm/mmap.c | 1 + > security/Kconfig | 26 +++++++++ > 7 files changed, 99 insertions(+), 3 deletions(-) > > -- > 2.47.0.rc1.288.g06298d1525-goog >
On Wed, Oct 16, 2024 at 4:18 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote: > > * jeffxu@chromium.org <jeffxu@chromium.org> [241014 17:50]: > > From: Jeff Xu <jeffxu@chromium.org> > > > > Seal vdso, vvar, sigpage, uprobes and vsyscall. > > > > Those mappings are readonly or executable only, sealing can protect > > them from ever changing during the life time of the process. For > > complete descriptions of memory sealing, please see mseal.rst [1]. > > > > System mappings such as vdso, vvar, and sigpage (for arm) are > > generated by the kernel during program initialization. These mappings > > are designated as non-writable, and sealing them will prevent them > > from ever becoming writeable. > ^ or ever removed. > > This is a pretty big deal. Platforms are trying to make it so that vdso > is the fast path, but if they are removed then things stop using them > and it's not a problem. This description is robbing them of the > information they need to know that, and it's not in your change log > either. > > I had said before that you need to be clear about the inability to > remove the mappings that are sealed, as the description above heavily > implies that it is only stopping them from becoming writeable. > The mseal.rst has the full description about memory sealing, I don't see the point to repeat all the blocked operations in the commit message here. I don't know why you would think this heavily implies that only stopping them from becoming writable, There is already reminder: ** For complete descriptions ** of memory sealing, please see mseal.rst > > > > Unlike the aforementioned mappings, the uprobe mapping is not > > established during program startup. However, its lifetime is the same > > as the process's lifetime [2], thus sealable. > > > > The vdso, vvar, sigpage, and uprobe mappings all invoke the > > _install_special_mapping() function. As no other mappings utilize this > > function, it is logical to incorporate sealing logic within > > _install_special_mapping(). This approach avoids the necessity of > > modifying code across various architecture-specific implementations. > > > > The vsyscall mapping, which has its own initialization function, is > > sealed in the XONLY case, it seems to be the most common and secure > > case of using vsyscall. > > > > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may > > alter the mapping of vdso, vvar, and sigpage during restore > > operations. Consequently, this feature cannot be universally enabled > > across all systems. To address this, a kernel configuration option has > > been introduced to enable or disable this functionality. Note, uprobe > > is always sealed and not controlled by this kernel configuration. > > > > I tested CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS, > > which doesn’t use CHECKPOINT_RESTORE. > > > > [1] Documentation/userspace-api/mseal.rst > > [2] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ > > > > History: > > V2: > > Seal uprobe always (Oleg Nesterov) > > Update comments and description (Randy Dunlap, Liam R.Howlett, Oleg Nesterov) > > The only update to the comment I see is the pointer to mseal.rst for a > complete description? > > > Rebase to linux_main > > > > V1: > > https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@google.com/ > > > > Jeff Xu (1): > > exec: seal system mappings > > > > .../admin-guide/kernel-parameters.txt | 10 ++++ > > arch/x86/entry/vsyscall/vsyscall_64.c | 9 +++- > > fs/exec.c | 53 +++++++++++++++++++ > > include/linux/fs.h | 1 + > > kernel/events/uprobes.c | 2 +- > > mm/mmap.c | 1 + > > security/Kconfig | 26 +++++++++ > > 7 files changed, 99 insertions(+), 3 deletions(-) > > > > -- > > 2.47.0.rc1.288.g06298d1525-goog > >
* Jeff Xu <jeffxu@chromium.org> [241016 20:59]: > On Wed, Oct 16, 2024 at 4:18 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote: > > > > * jeffxu@chromium.org <jeffxu@chromium.org> [241014 17:50]: > > > From: Jeff Xu <jeffxu@chromium.org> > > > > > > Seal vdso, vvar, sigpage, uprobes and vsyscall. > > > > > > Those mappings are readonly or executable only, sealing can protect > > > them from ever changing during the life time of the process. For > > > complete descriptions of memory sealing, please see mseal.rst [1]. > > > > > > System mappings such as vdso, vvar, and sigpage (for arm) are > > > generated by the kernel during program initialization. These mappings > > > are designated as non-writable, and sealing them will prevent them > > > from ever becoming writeable. > > ^ or ever removed. > > > > This is a pretty big deal. Platforms are trying to make it so that vdso > > is the fast path, but if they are removed then things stop using them > > and it's not a problem. This description is robbing them of the > > information they need to know that, and it's not in your change log > > either. > > > > I had said before that you need to be clear about the inability to > > remove the mappings that are sealed, as the description above heavily > > implies that it is only stopping them from becoming writeable. > > > The mseal.rst has the full description about memory sealing, I don't > see the point to repeat all the blocked operations in the commit > message here. That's not what I said or asked for. It's three words to add. > > I don't know why you would think this heavily implies that only > stopping them from becoming writable, Because you left it out of the description that states what this does, and it's three words that adds a whole lot of context. I told you to add that detail because it is _way_ more important to almost everyone that you are making mappings exist for the lifetime of the process than the fact that not writeable mappings will still be not writeable. One points out the fundamental change to the mappings you list while the other tells people how things won't change. I get that not changing is the point of the patch, but it's not the only thing it does. > There is already reminder: ** > For complete descriptions ** of memory sealing, please see mseal.rst Getting good code reviews is difficult. There are very few people who do a good job of the very few people who do it at all. If you want people to read your code, then you need to spoon feed it to them. You have just put up a huge barrier to someone doing a code review at all, let alone a good code review. So if you want good code, you need to provide good information with a low bar of entrance. We can all agree that good review creates good code (well, better code, at least). What you have done here is limited the number of people who are willing to even entertain the idea of looking at the following patch. And of the people who do read the patch, probably close to zero will read that document. It's (usually) nothing personal, it's just human nature and a function of time. Thanks, Liam
Hi Liam On Wed, Oct 16, 2024 at 7:03 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote: > > * Jeff Xu <jeffxu@chromium.org> [241016 20:59]: > > On Wed, Oct 16, 2024 at 4:18 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote: > > > > > > * jeffxu@chromium.org <jeffxu@chromium.org> [241014 17:50]: > > > > From: Jeff Xu <jeffxu@chromium.org> > > > > > > > > Seal vdso, vvar, sigpage, uprobes and vsyscall. > > > > > > > > Those mappings are readonly or executable only, sealing can protect > > > > them from ever changing during the life time of the process. For > > > > complete descriptions of memory sealing, please see mseal.rst [1]. > > > > I will mention unmap in the above sentence. > > > > System mappings such as vdso, vvar, and sigpage (for arm) are > > > > generated by the kernel during program initialization. These mappings > > > > are designated as non-writable, and sealing them will prevent them > > > > from ever becoming writeable. > > > ^ or ever removed. > > > This section is about the mappings (vdso, etc) created during program initialization vs later time as uprobe, I will revise to make it clearer to the reader. Thanks -Jeff
© 2016 - 2024 Red Hat, Inc.