From: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Add a SandBox Mode document under Documentation/security. Describe the
concept, usage and known limitations.
Signed-off-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
---
Documentation/security/index.rst | 1 +
Documentation/security/sandbox-mode.rst | 180 ++++++++++++++++++++++++
2 files changed, 181 insertions(+)
create mode 100644 Documentation/security/sandbox-mode.rst
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 59f8fc106cb0..680a0b8bf28b 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -14,6 +14,7 @@ Security Documentation
sak
SCTP
self-protection
+ sandbox-mode
siphash
tpm/index
digsig
diff --git a/Documentation/security/sandbox-mode.rst b/Documentation/security/sandbox-mode.rst
new file mode 100644
index 000000000000..4405b8858c4a
--- /dev/null
+++ b/Documentation/security/sandbox-mode.rst
@@ -0,0 +1,180 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+SandBox Mode
+============
+
+Introduction
+============
+
+The primary goal of SandBox Mode (SBM) is to reduce the impact of potential
+memory safety bugs in kernel code by decomposing the kernel. The SBM API
+makes it possible to run each component inside an isolated execution
+environment. In particular, memory areas used as input and/or output are
+isolated from the rest of the kernel and surrounded by guard pages. Without
+arch hooks, this common base provides *weak isolation*.
+
+On architectures which implement the necessary arch hooks, SandBox Mode
+leverages hardware paging facilities and CPU privilege levels to ensure that
+only these predefined memory areas can be accessed. With arch support, SBM
+can also recover from protection violations: it forcibly terminates the
+sandbox and returns an error code (e.g. ``-EFAULT``) to the caller, so
+execution can continue. Such an implementation provides *strong isolation*.
+
+A target function in a sandbox communicates with the rest of the kernel
+through a caller-defined interface, comprising read-only buffers (input),
+read-write buffers (output) and the return value. The caller can explicitly
+share other data with the sandbox, but doing so may reduce isolation strength.
+
+Protection of sensitive kernel data is currently out of scope. SandBox Mode
+is meant to run kernel code which would otherwise have full access to all
+system resources; SBM makes it possible to impose a scoped access control
+policy restricting which resources are available to the sandbox. That said,
+protection of sensitive data is foreseen as a future goal, which is why the
+API is designed to control not only memory writes but also memory reads.
+
+The expected use case for SandBox Mode is parsing data from untrusted
+sources, especially if the parsing cannot reasonably be done by a user mode
+helper. Keep in mind that a sandbox does not guarantee that the output data
+is correct. The result may be corrupted (e.g. by an exploited bug) and,
+where applicable, it should be sanitized before further use.
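+
+For example (purely illustrative; ``struct my_output``, its size limit and
+``my_output_valid()`` are hypothetical, not part of the SBM API), a caller
+might reject results which violate its own invariants even if the sandboxed
+parser reported success:
+
+.. code-block:: c
+
+   struct my_output {
+           unsigned int nr_entries;
+           unsigned int entries[64];
+   };
+
+   /* A compromised sandbox may report success yet still return garbage. */
+   static bool my_output_valid(const struct my_output *out)
+   {
+           return out->nr_entries <= ARRAY_SIZE(out->entries);
+   }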
+
+Using SandBox Mode
+==================
+
+SandBox Mode is an optional feature, enabled with ``CONFIG_SANDBOX_MODE``.
+However, the SBM API is always defined regardless of the kernel configuration.
+It will run the target function with the best available isolation, which is:
+
+* *strong isolation* if both ``CONFIG_SANDBOX_MODE`` and
+  ``CONFIG_HAVE_ARCH_SBM`` are set,
+* *weak isolation* if ``CONFIG_SANDBOX_MODE`` is set, but
+  ``CONFIG_HAVE_ARCH_SBM`` is unset,
+* *no isolation* if ``CONFIG_SANDBOX_MODE`` is unset.
+
+Code which cannot safely run with no isolation should depend on the relevant
+config option(s).
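+
+Alternatively (an illustrative sketch only, not part of the SBM API), such
+code may check the effective configuration at run time and refuse to proceed
+without isolation:
+
+.. code-block:: c
+
+   if (!IS_ENABLED(CONFIG_SANDBOX_MODE))
+           return -EOPNOTSUPP;     /* never parse untrusted data unprotected */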
+
+The API can be used like this:
+
+.. code-block:: c
+
+   #include <linux/sbm.h>
+
+   /* Function to be executed in a sandbox. */
+   static SBM_DEFINE_FUNC(my_func, const struct my_input *, in,
+                          struct my_output *, out)
+   {
+           /* Read from in, write to out. */
+           return 0;
+   }
+
+   int caller(...)
+   {
+           /* Declare a SBM instance. */
+           struct sbm sbm;
+           int err;
+
+           /* Initialize SBM instance. */
+           sbm_init(&sbm);
+
+           /* Execute my_func() using the SBM instance. */
+           err = sbm_call(&sbm, my_func,
+                          SBM_COPY_IN(&sbm, input, in_size),
+                          SBM_COPY_OUT(&sbm, output, out_size));
+
+           /* Clean up. */
+           sbm_destroy(&sbm);
+
+           return err;
+   }
+
+The return type of a sandbox mode function is always ``int``. The return
+value is zero on success and negative on error. This convention is necessary
+because the SBM helpers themselves return an error code (such as ``-ENOMEM``)
+if the call cannot be performed.
+
+If ``sbm_call()`` returns an error, you can use ``sbm_error()`` to decide
+whether the error was returned by the target function or whether sandbox mode
+was aborted (or failed to run entirely).
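+
+For illustration, error handling might look like the following sketch; it
+assumes that ``sbm_error()`` takes the ``struct sbm`` instance and returns
+the sandbox-level error code (zero if the sandbox itself ran to completion):
+
+.. code-block:: c
+
+   err = sbm_call(&sbm, my_func,
+                  SBM_COPY_IN(&sbm, input, in_size),
+                  SBM_COPY_OUT(&sbm, output, out_size));
+   if (err) {
+           if (sbm_error(&sbm))
+                   /* Sandbox aborted or could not be entered at all. */
+                   pr_warn("sandbox failure: %d\n", sbm_error(&sbm));
+           else
+                   /* Error code returned by my_func() itself. */
+                   pr_warn("my_func() failed: %d\n", err);
+   }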
+
+Public API
+----------
+
+.. kernel-doc:: include/linux/sbm.h
+   :identifiers: sbm sbm_init sbm_destroy sbm_exec sbm_error
+                 SBM_COPY_IN SBM_COPY_OUT SBM_COPY_INOUT
+                 SBM_DEFINE_CALL SBM_DEFINE_THUNK SBM_DEFINE_FUNC
+                 sbm_call
+
+Arch Hooks
+----------
+
+These hooks must be implemented by architectures which select
+``HAVE_ARCH_SBM``.
+
+.. kernel-doc:: include/linux/sbm.h
+   :identifiers: arch_sbm_init arch_sbm_destroy arch_sbm_exec
+                 arch_sbm_map_readonly arch_sbm_map_writable
+
+Current Limitations
+===================
+
+This section lists known limitations of the current SBM implementation which
+are planned to be removed in the future.
+
+Stack
+-----
+
+There is no generic kernel API to run a function on an alternate stack, so SBM
+runs on the normal kernel stack by default. The kernel already offers
+self-protection against stack overflows and underflows as well as against
+overwriting on-stack data outside the current frame, but violations are
+usually fatal.
+
+This limitation can be solved for specific targets. Arch hooks can set up a
+separate stack and recover from stack frame overruns.
+
+Inherent Limitations
+====================
+
+This section lists limitations which are inherent to the concept.
+
+Explicit Code
+-------------
+
+The main idea behind SandBox Mode is decomposition of one big program (the
+Linux kernel) into multiple smaller programs that can be sandboxed. There is
+no known way to automate this task for an existing code base written in C.
+
+Given the performance impact of running code in a sandbox, this limitation may
+be perceived as a benefit. It is expected that sandbox mode is introduced only
+knowingly and only where safety is more important than performance.
+
+Complex Data
+------------
+
+Although data structures are not serialized and deserialized between kernel
+mode and sandbox mode, all directly and indirectly referenced data structures
+must be explicitly mapped into the sandbox, which requires some manual effort.
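+
+As a purely illustrative sketch (the structures and ``parse_pkt()`` below are
+hypothetical), a structure which references a separate payload buffer is
+typically flattened by the caller, because a pointer stored inside it would
+be meaningless in the sandbox:
+
+.. code-block:: c
+
+   struct my_flat_input {
+           u32 len;
+           u8 payload[];           /* payload copied right after the header */
+   };
+
+   static SBM_DEFINE_FUNC(parse_pkt, const struct my_flat_input *, in,
+                          struct my_result *, out)
+   {
+           /* Parse in->payload[0 .. in->len); no outside pointers needed. */
+           return 0;
+   }
+
+The caller would then pass the whole flattened buffer as a single input,
+e.g. with ``SBM_COPY_IN(&sbm, flat, sizeof(*flat) + flat->len)``.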
+
+Copying of input/output buffers also incurs some runtime overhead. This
+overhead can be reduced by sharing data directly with the sandbox, but the
+resulting isolation is weaker, so it may or may not be acceptable, depending
+on the overall safety requirements.
+
+Page Granularity
+----------------
+
+Since paging is used to enforce memory safety, the page is the smallest unit
+of protection. Objects mapped into the sandbox must be aligned to a page
+boundary, and buffer overflows may go undetected if they stay within the
+same page.
+
+On the other hand, even though such writes are not detected, they do not
+corrupt kernel data, because only the output buffer is copied back to kernel
+mode, and the (corrupted) rest of the page is ignored.
+
+Transitions
+-----------
+
+Transitions between kernel mode and sandbox mode are synchronous. That is,
+whenever entering or leaving sandbox mode, the currently running CPU executes
+the instructions necessary to save/restore its kernel-mode state. The API is
+generic enough to allow asynchronous transitions, e.g. to pass data to another
+CPU which is already running in sandbox mode. However, for such transitions
+to be beneficial, a hypothetical implementation would require far-reaching
+changes in the kernel scheduler. This is (currently) out of scope.
--
2.34.1
On Wed, 14 Feb 2024 12:30:35 +0100 Petr Tesarik <petrtesarik@huaweicloud.com> wrote: > +Although data structures are not serialized and deserialized between kernel > +mode and sandbox mode, all directly and indirectly referenced data structures > +must be explicitly mapped into the sandbox, which requires some manual effort. Maybe I'm missing something here, but... The requirement that the sandboxed function only ever touch two linear blocks of memory (yes?) seems a tremendous limitation. I mean, how can the sandboxed function call kmalloc()? How can it call any useful kernel functions? They'll all touch memory which lies outside the sandbox areas? Perhaps a simple but real-world example would help clarify.
On Wed, Feb 14, 2024 at 05:30:53AM -0800, Andrew Morton wrote: > On Wed, 14 Feb 2024 12:30:35 +0100 Petr Tesarik <petrtesarik@huaweicloud.com> wrote: > > > +Although data structures are not serialized and deserialized between kernel > > +mode and sandbox mode, all directly and indirectly referenced data structures > > +must be explicitly mapped into the sandbox, which requires some manual effort. > > Maybe I'm missing something here, but... > > The requirement that the sandboxed function only ever touch two linear > blocks of memory (yes?) seems a tremendous limitation. I mean, how can > the sandboxed function call kmalloc()? How can it call any useful > kernel functions? They'll all touch memory which lies outside the > sandbox areas? > > Perhaps a simple but real-world example would help clarify. I agree, this looks like an "interesting" framework, but we don't add code to the kernel without a real, in-kernel user for it. Without such a thing, we can't even consider it for inclusion as we don't know how it will actually work and how any subsystem would use it. Petr, do you have an user for this today? thanks, greg k-h
On Wed, 14 Feb 2024 15:01:25 +0100 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Wed, Feb 14, 2024 at 05:30:53AM -0800, Andrew Morton wrote: > > On Wed, 14 Feb 2024 12:30:35 +0100 Petr Tesarik <petrtesarik@huaweicloud.com> wrote: > > > > > +Although data structures are not serialized and deserialized between kernel > > > +mode and sandbox mode, all directly and indirectly referenced data structures > > > +must be explicitly mapped into the sandbox, which requires some manual effort. > > > > Maybe I'm missing something here, but... > > > > The requirement that the sandboxed function only ever touch two linear > > blocks of memory (yes?) seems a tremendous limitation. I mean, how can > > the sandboxed function call kmalloc()? How can it call any useful > > kernel functions? They'll all touch memory which lies outside the > > sandbox areas? > > > > Perhaps a simple but real-world example would help clarify. > > I agree, this looks like an "interesting" framework, but we don't add > code to the kernel without a real, in-kernel user for it. > > Without such a thing, we can't even consider it for inclusion as we > don't know how it will actually work and how any subsystem would use it. > > Petr, do you have an user for this today? Hi Greg & Andrew, your observations is correct. In this form, the framework is quite limited, and exactly this objections was expected. You have even spotted one of the first enhancements I tested on top of this framework (dynamic memory allocation). The intended use case is code that processes untrusted data that is not properly sanitized, but where performance is not critical. Some examples include decompressing initramfs, loading a kernel module. Or decoding a boot logo; I think I've noticed a vulnerability in another project recently... ;-) Of course, even decompression needs dynamic memory. My plan is to extend the mechanism. Right now I'm mapping all of kernel text into the sandbox. Later, I'd like to decompose the text section too. The pages which contain sandboxed code should be mapped, but rest of the kernel should not. If the sandbox tries to call kmalloc(), vmalloc(), or schedule(), the attempt will generate a page fault. Sandbox page faults are already intercepted, so handle_sbm_call() can decide if the call should be allowed or not. If the sandbox policy says ALLOW, the page fault handler will perform the API call on behalf of the sandboxed code and return results, possibly with some post-call action, e.g. map some more pages to the address space. The fact that all communication with the rest of the kernel happens through CPU exceptions is the reason this mechanism is unsuitable for performance-critical applications. OK, so why didn't I send the whole thing? Decomposition of the kernel requires many more changes, e.g. in linker scripts. Some of them depend on this patch series. Before I go and clean up my code into something that can be submitted, I want to get feedback from guys like you, to know if the whole idea would be even considered, aka "Fail Fast". Petr T
On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote: > OK, so why didn't I send the whole thing? > > Decomposition of the kernel requires many more changes, e.g. in linker > scripts. Some of them depend on this patch series. Before I go and > clean up my code into something that can be submitted, I want to get > feedback from guys like you, to know if the whole idea would be even > considered, aka "Fail Fast". We can't honestly consider this portion without seeing how it would work, as we don't even see a working implementation that uses it to verify it at all. The joy of adding new frameworks is that you need a user before anyone can spend the time to review it, sorry. thanks, greg k-h
On Wed, 14 Feb 2024 16:11:05 +0100 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote: > > OK, so why didn't I send the whole thing? > > > > Decomposition of the kernel requires many more changes, e.g. in linker > > scripts. Some of them depend on this patch series. Before I go and > > clean up my code into something that can be submitted, I want to get > > feedback from guys like you, to know if the whole idea would be even > > considered, aka "Fail Fast". > > We can't honestly consider this portion without seeing how it would > work, as we don't even see a working implementation that uses it to > verify it at all. > > The joy of adding new frameworks is that you need a user before anyone > can spend the time to review it, sorry. Thank your for a quick assessment. Will it be sufficient if I send some code for illustration (with some quick&dirty hacks to bridge the gaps), or do you need clean and nice kernel code? Petr T
On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote: > On Wed, 14 Feb 2024 16:11:05 +0100 > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote: > > > OK, so why didn't I send the whole thing? > > > > > > Decomposition of the kernel requires many more changes, e.g. in linker > > > scripts. Some of them depend on this patch series. Before I go and > > > clean up my code into something that can be submitted, I want to get > > > feedback from guys like you, to know if the whole idea would be even > > > considered, aka "Fail Fast". > > > > We can't honestly consider this portion without seeing how it would > > work, as we don't even see a working implementation that uses it to > > verify it at all. > > > > The joy of adding new frameworks is that you need a user before anyone > > can spend the time to review it, sorry. > > Thank your for a quick assessment. Will it be sufficient if I send some > code for illustration (with some quick&dirty hacks to bridge the gaps), > or do you need clean and nice kernel code? Given that code is going to need a rewrite to make use of this anyways - why not just do the rewrite in Rust? Then you get memory safety, which seems to be what you're trying to achieve here. Or, you say this is for when performance isn't critical - why not a user mode helper?
On Wed, 14 Feb 2024 13:54:54 -0500 Kent Overstreet <kent.overstreet@linux.dev> wrote: > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote: > > On Wed, 14 Feb 2024 16:11:05 +0100 > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote: > > > > OK, so why didn't I send the whole thing? > > > > > > > > Decomposition of the kernel requires many more changes, e.g. in linker > > > > scripts. Some of them depend on this patch series. Before I go and > > > > clean up my code into something that can be submitted, I want to get > > > > feedback from guys like you, to know if the whole idea would be even > > > > considered, aka "Fail Fast". > > > > > > We can't honestly consider this portion without seeing how it would > > > work, as we don't even see a working implementation that uses it to > > > verify it at all. > > > > > > The joy of adding new frameworks is that you need a user before anyone > > > can spend the time to review it, sorry. > > > > Thank your for a quick assessment. Will it be sufficient if I send some > > code for illustration (with some quick&dirty hacks to bridge the gaps), > > or do you need clean and nice kernel code? > > Given that code is going to need a rewrite to make use of this anyways - > why not just do the rewrite in Rust? Thank you for this question! I concur that rewriting the whole kernel in Rust would be a better option. I see two differences: 1. amount of work 2. regressions Rewriting something in Rust means pretty much writing it from scratch. Doing that necessarily introduces regressions. Old code has been used. It has been tested. In many corner cases. Lots of bugs have been found, and they’ve been fixed. If you write code from scratch, you lose much of the accumulated knowledge. It may still pay off in the long run. More importantly, sandbox mode can be viewed as a tool that enforces decomposition of kernel code. This decomposition is the main benefit. It requires understanding the dependencies among different parts of the kernel (both code flow and data flow), and that will in turn promote better design. > Then you get memory safety, which seems to be what you're trying to > achieve here. > > Or, you say this is for when performance isn't critical - why not a user > mode helper? Processes in user mode are susceptible to all kinds of attacks you may want to avoid. Sandbox mode can be used in situations where user mode does not exist, e.g. to display a boot logo or to unpack initramfs. Sandbox mode is part of the kernel, hence signed, which may be relevant if the kernel is locked down, so you can use it e.g. to parse a key from the bootloader and put it on the trusted keyring. Regarding performance, sandbox overhead is somewhere between kernel mode and UMH. It is not suitable for time-critical code (like handling NIC interrupts), but it's still much faster than UMH. Petr T
On Wed, Feb 14, 2024 at 09:09:37PM +0100, Petr Tesařík wrote: > On Wed, 14 Feb 2024 13:54:54 -0500 > Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote: > > > On Wed, 14 Feb 2024 16:11:05 +0100 > > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > > > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote: > > > > > OK, so why didn't I send the whole thing? > > > > > > > > > > Decomposition of the kernel requires many more changes, e.g. in linker > > > > > scripts. Some of them depend on this patch series. Before I go and > > > > > clean up my code into something that can be submitted, I want to get > > > > > feedback from guys like you, to know if the whole idea would be even > > > > > considered, aka "Fail Fast". > > > > > > > > We can't honestly consider this portion without seeing how it would > > > > work, as we don't even see a working implementation that uses it to > > > > verify it at all. > > > > > > > > The joy of adding new frameworks is that you need a user before anyone > > > > can spend the time to review it, sorry. > > > > > > Thank your for a quick assessment. Will it be sufficient if I send some > > > code for illustration (with some quick&dirty hacks to bridge the gaps), > > > or do you need clean and nice kernel code? > > > > Given that code is going to need a rewrite to make use of this anyways - > > why not just do the rewrite in Rust? > > Thank you for this question! I concur that rewriting the whole kernel > in Rust would be a better option. I see two differences: > > 1. amount of work > 2. regressions > > Rewriting something in Rust means pretty much writing it from scratch. > Doing that necessarily introduces regressions. Old code has been used. > It has been tested. In many corner cases. Lots of bugs have been found, > and they’ve been fixed. If you write code from scratch, you lose much > of the accumulated knowledge. But it's work that already has some growing momentum behind it, especially in the area you cited - decompression algorithms. > More importantly, sandbox mode can be viewed as a tool that enforces > decomposition of kernel code. This decomposition is the main benefit. > It requires understanding the dependencies among different parts of the > kernel (both code flow and data flow), and that will in turn promote > better design. You see this as a tool for general purpose code...? Typical kernel code tends to be quite pointer heavy. > > Then you get memory safety, which seems to be what you're trying to > > achieve here. > > > > Or, you say this is for when performance isn't critical - why not a user > > mode helper? > > Processes in user mode are susceptible to all kinds of attacks you may > want to avoid. Sandbox mode can be used in situations where user mode > does not exist, e.g. to display a boot logo or to unpack initramfs. [citation needed] Running code in the kernel does not make it more secure from attack, and that's a pretty strange idea. One of the central jobs of the kernel is to provide isolation between different users. > Sandbox mode is part of the kernel, hence signed, which may be relevant > if the kernel is locked down, so you can use it e.g. to parse a key > from the bootloader and put it on the trusted keyring. > > Regarding performance, sandbox overhead is somewhere between kernel > mode and UMH. It is not suitable for time-critical code (like handling > NIC interrupts), but it's still much faster than UMH. 
yeah, this doesn't seem like a worthwhile direction to go in.
On Wed, 2024-02-14 at 15:19 -0500, Kent Overstreet wrote: > On Wed, Feb 14, 2024 at 09:09:37PM +0100, Petr Tesařík wrote: > > On Wed, 14 Feb 2024 13:54:54 -0500 > > Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote: > > > > On Wed, 14 Feb 2024 16:11:05 +0100 > > > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > > > > > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote: > > > > > > OK, so why didn't I send the whole thing? > > > > > > > > > > > > Decomposition of the kernel requires many more changes, e.g. in linker > > > > > > scripts. Some of them depend on this patch series. Before I go and > > > > > > clean up my code into something that can be submitted, I want to get > > > > > > feedback from guys like you, to know if the whole idea would be even > > > > > > considered, aka "Fail Fast". > > > > > > > > > > We can't honestly consider this portion without seeing how it would > > > > > work, as we don't even see a working implementation that uses it to > > > > > verify it at all. > > > > > > > > > > The joy of adding new frameworks is that you need a user before anyone > > > > > can spend the time to review it, sorry. > > > > > > > > Thank your for a quick assessment. Will it be sufficient if I send some > > > > code for illustration (with some quick&dirty hacks to bridge the gaps), > > > > or do you need clean and nice kernel code? > > > > > > Given that code is going to need a rewrite to make use of this anyways - > > > why not just do the rewrite in Rust? > > > > Thank you for this question! I concur that rewriting the whole kernel > > in Rust would be a better option. I see two differences: > > > > 1. amount of work > > 2. regressions > > > > Rewriting something in Rust means pretty much writing it from scratch. > > Doing that necessarily introduces regressions. Old code has been used. > > It has been tested. In many corner cases. Lots of bugs have been found, > > and they’ve been fixed. If you write code from scratch, you lose much > > of the accumulated knowledge. > > But it's work that already has some growing momentum behind it, > especially in the area you cited - decompression algorithms. > > > More importantly, sandbox mode can be viewed as a tool that enforces > > decomposition of kernel code. This decomposition is the main benefit. > > It requires understanding the dependencies among different parts of the > > kernel (both code flow and data flow), and that will in turn promote > > better design. > > You see this as a tool for general purpose code...? > > Typical kernel code tends to be quite pointer heavy. > > > > Then you get memory safety, which seems to be what you're trying to > > > achieve here. > > > > > > Or, you say this is for when performance isn't critical - why not a user > > > mode helper? > > > > Processes in user mode are susceptible to all kinds of attacks you may > > want to avoid. Sandbox mode can be used in situations where user mode > > does not exist, e.g. to display a boot logo or to unpack initramfs. > > [citation needed] > > Running code in the kernel does not make it more secure from attack, and > that's a pretty strange idea. One of the central jobs of the kernel is > to provide isolation between different users. + linux-security-module, keyrings It is not exactly about being more secure, but more privileged. 
I had a question in the past: https://lore.kernel.org/linux-integrity/eb31920bd00e2c921b0aa6ebed8745cb0130b0e1.camel@huaweicloud.com/ I basically need to parse PGP keys, to verify RPM package headers, extract the file digests from them, and use those digests as reference values for the kernel to decide whether or not files can be accessed depending on their integrity. It is very important that, in a locked-down system, even root is subject to integrity checks. So, the kernel has higher privileges than root. Can the kernel be trusted to provide strong enough process isolation, even against root, in a way that it can offload a workload to a user space process (PGP parsing), and yet maintain its privilege level (which would drop to root, if isolation cannot be guaranteed)? So, since we got no as an answer, we thought about something in the middle, we still run the code in the kernel, to keep the higher privilege, but at the same time we mitigate the risk of kernel memory corruption. Roberto
On Wed, 14 Feb 2024 15:19:04 -0500 Kent Overstreet <kent.overstreet@linux.dev> wrote: > On Wed, Feb 14, 2024 at 09:09:37PM +0100, Petr Tesařík wrote: > > On Wed, 14 Feb 2024 13:54:54 -0500 > > Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote: > > > > On Wed, 14 Feb 2024 16:11:05 +0100 > > > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > > > > > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote: > > > > > > OK, so why didn't I send the whole thing? > > > > > > > > > > > > Decomposition of the kernel requires many more changes, e.g. in linker > > > > > > scripts. Some of them depend on this patch series. Before I go and > > > > > > clean up my code into something that can be submitted, I want to get > > > > > > feedback from guys like you, to know if the whole idea would be even > > > > > > considered, aka "Fail Fast". > > > > > > > > > > We can't honestly consider this portion without seeing how it would > > > > > work, as we don't even see a working implementation that uses it to > > > > > verify it at all. > > > > > > > > > > The joy of adding new frameworks is that you need a user before anyone > > > > > can spend the time to review it, sorry. > > > > > > > > Thank your for a quick assessment. Will it be sufficient if I send some > > > > code for illustration (with some quick&dirty hacks to bridge the gaps), > > > > or do you need clean and nice kernel code? > > > > > > Given that code is going to need a rewrite to make use of this anyways - > > > why not just do the rewrite in Rust? > > > > Thank you for this question! I concur that rewriting the whole kernel > > in Rust would be a better option. I see two differences: > > > > 1. amount of work > > 2. regressions > > > > Rewriting something in Rust means pretty much writing it from scratch. > > Doing that necessarily introduces regressions. Old code has been used. > > It has been tested. In many corner cases. Lots of bugs have been found, > > and they’ve been fixed. If you write code from scratch, you lose much > > of the accumulated knowledge. > > But it's work that already has some growing momentum behind it, > especially in the area you cited - decompression algorithms. Fair enough, this is indeed going for a better solution. > > More importantly, sandbox mode can be viewed as a tool that enforces > > decomposition of kernel code. This decomposition is the main benefit. > > It requires understanding the dependencies among different parts of the > > kernel (both code flow and data flow), and that will in turn promote > > better design. > > You see this as a tool for general purpose code...? > > Typical kernel code tends to be quite pointer heavy. Yes. I believe this fact contributes to the difficulty of ensuring memory safety in the kernel. With so much code potentially depnding on any other kernel data structure, it does not help much that you protect it as a whole. A finer-grained protection would make more sense. > > > Then you get memory safety, which seems to be what you're trying to > > > achieve here. > > > > > > Or, you say this is for when performance isn't critical - why not a user > > > mode helper? > > > > Processes in user mode are susceptible to all kinds of attacks you may > > want to avoid. Sandbox mode can be used in situations where user mode > > does not exist, e.g. to display a boot logo or to unpack initramfs. 
> > [citation needed] I assume you mean citation for the kinds of attacks, not for the unavailability of user space before initramfs is unpacked. ;-) Here you go: On Wed, 2023-03-22 at 15:27 -0700, Alexei Starovoitov wrote: > On Wed, Mar 22, 2023 at 5:08 AM Roberto Sassu [...] > <roberto.sassu@huaweicloud.com> wrote: > > possible use case. The main goal is to move something that was running > > in the kernel to user space, with the same isolation guarantees as if > > the code was executed in the kernel. > > They are not the same guarantees. > UMD is exactly equivalent to root process running in user space. > Meaning it can be killed, ptraced, priority inverted, etc https://lore.kernel.org/lkml/CAADnVQJC0h7rtuntt0tqS5BbxWsmyWs3ZSbboZMmUKetMG2VhA@mail.gmail.com/ Petr T
On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote: > On Wed, 14 Feb 2024 16:11:05 +0100 > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote: > > > OK, so why didn't I send the whole thing? > > > > > > Decomposition of the kernel requires many more changes, e.g. in linker > > > scripts. Some of them depend on this patch series. Before I go and > > > clean up my code into something that can be submitted, I want to get > > > feedback from guys like you, to know if the whole idea would be even > > > considered, aka "Fail Fast". > > > > We can't honestly consider this portion without seeing how it would > > work, as we don't even see a working implementation that uses it to > > verify it at all. > > > > The joy of adding new frameworks is that you need a user before anyone > > can spend the time to review it, sorry. > > Thank your for a quick assessment. Will it be sufficient if I send some > code for illustration (with some quick&dirty hacks to bridge the gaps), > or do you need clean and nice kernel code? We need a real user in the kernel, otherwise why would we even consider it? Would you want to review a new subsystem that does nothing and has no real users? If not, why would you want us to? :) thanks, greg k-h
On Wed, 14 Feb 2024 19:48:52 +0100 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote: > > On Wed, 14 Feb 2024 16:11:05 +0100 > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote: > > > > OK, so why didn't I send the whole thing? > > > > > > > > Decomposition of the kernel requires many more changes, e.g. in linker > > > > scripts. Some of them depend on this patch series. Before I go and > > > > clean up my code into something that can be submitted, I want to get > > > > feedback from guys like you, to know if the whole idea would be even > > > > considered, aka "Fail Fast". > > > > > > We can't honestly consider this portion without seeing how it would > > > work, as we don't even see a working implementation that uses it to > > > verify it at all. > > > > > > The joy of adding new frameworks is that you need a user before anyone > > > can spend the time to review it, sorry. > > > > Thank your for a quick assessment. Will it be sufficient if I send some > > code for illustration (with some quick&dirty hacks to bridge the gaps), > > or do you need clean and nice kernel code? > > We need a real user in the kernel, otherwise why would we even consider > it? Would you want to review a new subsystem that does nothing and has > no real users? If not, why would you want us to? :) Greg, please enlighten me on the process. How is something like this supposed to get in? Subsystem maintainers will not review code that depends on core features not yet reviewed by the respective maintainers. If I add only the API and a stub implementation, then it brings no benefit and attempts to introduce the API will be dismissed. I would certainly do just that if I was a maintainer... I could try to pack everything (base infrastructure, arch implementations, API users) into one big patch with pretty much everybody on the Cc list, but how is that ever going to get reviewed? Should I just go and maintain an out-of-tree repo for a few years, hoping that it gets merged one day, like bcachefs? Is this the way? Petr T
On Wed, Feb 14, 2024 at 08:42:54PM +0100, Petr Tesařík wrote: > On Wed, 14 Feb 2024 19:48:52 +0100 > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote: > > > On Wed, 14 Feb 2024 16:11:05 +0100 > > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > > > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote: > > > > > OK, so why didn't I send the whole thing? > > > > > > > > > > Decomposition of the kernel requires many more changes, e.g. in linker > > > > > scripts. Some of them depend on this patch series. Before I go and > > > > > clean up my code into something that can be submitted, I want to get > > > > > feedback from guys like you, to know if the whole idea would be even > > > > > considered, aka "Fail Fast". > > > > > > > > We can't honestly consider this portion without seeing how it would > > > > work, as we don't even see a working implementation that uses it to > > > > verify it at all. > > > > > > > > The joy of adding new frameworks is that you need a user before anyone > > > > can spend the time to review it, sorry. > > > > > > Thank your for a quick assessment. Will it be sufficient if I send some > > > code for illustration (with some quick&dirty hacks to bridge the gaps), > > > or do you need clean and nice kernel code? > > > > We need a real user in the kernel, otherwise why would we even consider > > it? Would you want to review a new subsystem that does nothing and has > > no real users? If not, why would you want us to? :) > > Greg, please enlighten me on the process. How is something like this > supposed to get in? If you were in our shoes, what would you want to see in order to be able to properly review and judge if a new subsystem was ok to accept? > Subsystem maintainers will not review code that depends on core features > not yet reviewed by the respective maintainers. If I add only the API > and a stub implementation, then it brings no benefit and attempts to > introduce the API will be dismissed. I would certainly do just that if > I was a maintainer... Exactly, you need a real user. > I could try to pack everything (base infrastructure, arch > implementations, API users) into one big patch with pretty much > everybody on the Cc list, but how is that ever going to get reviewed? How are we supposed to know if any of this even works at all if you don't show that it actually works and is useful? Has any of that work even been done yet? I'm guessing it has (otherwise you wouldn't have posted this), but you are expecting us to just "trust us, stuff in the future is going to use this and need it" here. Again, we can not add new infrastructure for things that have no users, nor do you want us to. Ideally you will have at least 3 different users, as that seems to be the "magic number" that shows that the api/interface will actually work well, and is flexible enough. Just one user is great for proof-of-concept, but that usually isn't good enough to determine if it will work for others (and so it wouldn't need to be infrastructure at all, but rather just part of that one feature on its own.) > Should I just go and maintain an out-of-tree repo for a few years, > hoping that it gets merged one day, like bcachefs? Is this the way? No, show us how this is going to be used. Again, think about what you would want if you had to review this. thanks, greg k-h
On Thu, 15 Feb 2024 10:11:05 +0100 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > On Wed, Feb 14, 2024 at 08:42:54PM +0100, Petr Tesařík wrote: > > On Wed, 14 Feb 2024 19:48:52 +0100 > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > > > On Wed, Feb 14, 2024 at 05:31:12PM +0100, Petr Tesařík wrote: > > > > On Wed, 14 Feb 2024 16:11:05 +0100 > > > > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > > > > > > > > On Wed, Feb 14, 2024 at 03:55:24PM +0100, Petr Tesařík wrote: > > > > > > OK, so why didn't I send the whole thing? > > > > > > > > > > > > Decomposition of the kernel requires many more changes, e.g. in linker > > > > > > scripts. Some of them depend on this patch series. Before I go and > > > > > > clean up my code into something that can be submitted, I want to get > > > > > > feedback from guys like you, to know if the whole idea would be even > > > > > > considered, aka "Fail Fast". > > > > > > > > > > We can't honestly consider this portion without seeing how it would > > > > > work, as we don't even see a working implementation that uses it to > > > > > verify it at all. > > > > > > > > > > The joy of adding new frameworks is that you need a user before anyone > > > > > can spend the time to review it, sorry. > > > > > > > > Thank your for a quick assessment. Will it be sufficient if I send some > > > > code for illustration (with some quick&dirty hacks to bridge the gaps), > > > > or do you need clean and nice kernel code? > > > > > > We need a real user in the kernel, otherwise why would we even consider > > > it? Would you want to review a new subsystem that does nothing and has > > > no real users? If not, why would you want us to? :) > > > > Greg, please enlighten me on the process. How is something like this > > supposed to get in? > > If you were in our shoes, what would you want to see in order to be able > to properly review and judge if a new subsystem was ok to accept? > > > Subsystem maintainers will not review code that depends on core features > > not yet reviewed by the respective maintainers. If I add only the API > > and a stub implementation, then it brings no benefit and attempts to > > introduce the API will be dismissed. I would certainly do just that if > > I was a maintainer... > > Exactly, you need a real user. Er, what I was trying to say was rather: You need a real implementation of a core feature before a subsystem maintainer considers using it for their subsystem. But I get your point. I need *both*. > > I could try to pack everything (base infrastructure, arch > > implementations, API users) into one big patch with pretty much > > everybody on the Cc list, but how is that ever going to get reviewed? > > How are we supposed to know if any of this even works at all if you > don't show that it actually works and is useful? Has any of that work > even been done yet? I'm guessing it has (otherwise you wouldn't have > posted this), but you are expecting us to just "trust us, stuff in the > future is going to use this and need it" here. Understood. > Again, we can not add new infrastructure for things that have no users, > nor do you want us to. Ideally you will have at least 3 different > users, as that seems to be the "magic number" that shows that the > api/interface will actually work well, and is flexible enough. 
Just > one user is great for proof-of-concept, but that usually isn't good > enough to determine if it will work for others (and so it wouldn't need > to be infrastructure at all, but rather just part of that one feature on > its own.) > > > Should I just go and maintain an out-of-tree repo for a few years, > > hoping that it gets merged one day, like bcachefs? Is this the way? > > No, show us how this is going to be used. OK, working on it. > Again, think about what you would want if you had to review this. Review, or merge? For a review, I would want enough information to understand what it is *and* where it is going. As a matter of fact, hpa does not like the x86 implementation. For reasons that I do not fully understand (yet), but if the concept turns out to be impractical, then my submission will serve a purpose, as I can save myself (and anybody else with a similar idea) a lot of work by failing fast. Is this a valid way to get early feedback? Thanks, Petr T
On Thu, Feb 15, 2024 at 10:45:15AM +0100, Petr Tesařík wrote: > As a matter of fact, hpa does not like the x86 implementation. For > reasons that I do not fully understand (yet), but if the concept turns > out to be impractical, then my submission will serve a purpose, as I > can save myself (and anybody else with a similar idea) a lot of work by > failing fast. > > Is this a valid way to get early feedback? Asking for feedback is one thing (that's what RFC is for, right?), but submitting something and expecting us to review and accept it as-is like this, just doesn't work well. thanks, greg k-h