[RFC 0/5] parker: PARtitioned KERnel

Fam Zheng posted 5 patches 1 week, 1 day ago
arch/x86/Kbuild                     |    3 +
arch/x86/Kconfig                    |    2 +
arch/x86/include/asm/smp.h          |    1 +
arch/x86/kernel/apic/apic_flat_64.c |    3 +-
arch/x86/kernel/e820.c              |    2 +-
arch/x86/kernel/setup.c             |    4 +
arch/x86/kernel/smpboot.c           |    2 +-
arch/x86/parker/Kconfig             |    4 +
arch/x86/parker/Makefile            |    3 +
arch/x86/parker/Makefile-full       |    3 +
arch/x86/parker/internal.h          |   54 ++
arch/x86/parker/kernfs.c            | 1266 +++++++++++++++++++++++++++
arch/x86/parker/setup.c             |  423 +++++++++
arch/x86/parker/trampoline.S        |   55 ++
arch/x86/parker/trampoline.h        |   10 +
drivers/thermal/intel/therm_throt.c |    3 +
include/linux/parker-bkup.h         |   22 +
include/linux/parker.h              |   22 +
include/uapi/linux/magic.h          |    1 +
19 files changed, 1880 insertions(+), 3 deletions(-)
create mode 100644 arch/x86/parker/Kconfig
create mode 100644 arch/x86/parker/Makefile
create mode 100644 arch/x86/parker/Makefile-full
create mode 100644 arch/x86/parker/internal.h
create mode 100644 arch/x86/parker/kernfs.c
create mode 100644 arch/x86/parker/setup.c
create mode 100644 arch/x86/parker/trampoline.S
create mode 100644 arch/x86/parker/trampoline.h
create mode 100644 include/linux/parker-bkup.h
create mode 100644 include/linux/parker.h
From: Thom Hughes <thom.hughes@bytedance.com>

Hi all,

Parker is a proposed Linux feature that lets multiple Linux kernels run
simultaneously on a single machine, without traditional KVM virtualisation. This
is achieved by partitioning the CPU cores, memory and devices among
partitioning-aware Linux kernels.

=== Side note begin ===

Coincidentally it has some similarities with [1] but the design and
implementations are totally separate.

While there are still many open questions and pending work in this direction, we
hope to share the idea and collect your feedback!

=== Side note end ===

Each kernel instance can use the same image, but the initial kernel, or Boot
Kernel, controls hardware allocation and partitioning. All other kernels are
secondary kernels, or Application Kernels, and touch only their own assigned
CPUs, memory and I/O devices.

The primary use case in mind for parker is machines with high core counts,
where scalability concerns may arise. Once started, there is no communication
between kernel instances. In other words, they share nothing, which improves
scalability. Each kernel needs its own (PCIe) devices for I/O, such as NVMe
drives or NICs.
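
As an illustration of the share-nothing model, here is a minimal sketch of how a
partition plan could be checked for overlaps before it is handed to the Boot
Kernel. The Partition type, its fields and find_conflicts() are hypothetical
names made up for this example; they are not part of the parker patches.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Partition:
    """Hypothetical description of one kernel's resources (not a parker structure)."""
    name: str
    cpus: frozenset   # logical CPU numbers
    mem: tuple        # (start, end) physical address range, end exclusive
    pci: frozenset = field(default_factory=frozenset)  # PCI addresses as strings


def ranges_overlap(a, b):
    """Half-open ranges [start, end) overlap iff each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def find_conflicts(partitions):
    """Return human-readable conflicts; an empty list means nothing is shared."""
    conflicts = []
    for p, q in combinations(partitions, 2):
        shared_cpus = p.cpus & q.cpus
        if shared_cpus:
            conflicts.append(f"{p.name}/{q.name}: CPUs {sorted(shared_cpus)}")
        if ranges_overlap(p.mem, q.mem):
            conflicts.append(f"{p.name}/{q.name}: memory ranges overlap")
        shared_pci = p.pci & q.pci
        if shared_pci:
            conflicts.append(f"{p.name}/{q.name}: PCI {sorted(shared_pci)}")
    return conflicts


# Example plan: two kernels with disjoint CPUs, memory and devices.
boot = Partition("boot", frozenset(range(0, 8)),
                 (0x0, 0x2_0000_0000), frozenset({"0000:01:00.0"}))
app0 = Partition("app0", frozenset(range(8, 16)),
                 (0x2_0000_0000, 0x4_0000_0000), frozenset({"0000:3b:00.0"}))
print(find_conflicts([boot, app0]))  # prints []
```

A check like this only covers the resources named in the plan. It says nothing
about machine-wide state that cannot be partitioned this way.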

Another possible use case is for different kernel instances to have different
performance tunings, CONFIG_ options or FDO/PGO profiles, according to the
workload.

On the implementation side, parker exposes a kernfs directory interface, and
uses kexec to hot-load secondary kernel images into reserved memory regions.
Before creating a partition, the Boot Kernel offlines CPUs, reserves physical
memory (using CMA), unbinds PCI devices, and so on, then allocates those
resources to the Application Kernel so that it can use them safely.
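
The CPU and PCI parts of that preparation can be sketched from user space with
the standard sysfs files for CPU hotplug and driver unbinding. This is only an
illustration under stated assumptions: plan_handover() and apply_plan() are
hypothetical helpers, parker's own kernfs files are not modelled because their
layout is not described here, and the CMA memory reservation is left out since
it is not driven through these sysfs files.

```python
from pathlib import Path


def plan_handover(cpus, pci_devices, sysfs=Path("/sys")):
    """Return the (path, value) sysfs writes that release resources from this kernel."""
    writes = []
    for cpu in sorted(cpus):
        if cpu == 0:
            # Assumption for this sketch: CPU 0 always stays with the Boot Kernel.
            raise ValueError("refusing to offline CPU 0")
        # Writing "0" to a CPU's 'online' file takes that CPU offline.
        writes.append((sysfs / f"devices/system/cpu/cpu{cpu}/online", "0"))
    for bdf in sorted(pci_devices):
        # Writing a device address to its driver's 'unbind' file detaches the driver.
        writes.append((sysfs / f"bus/pci/devices/{bdf}/driver/unbind", bdf))
    return writes


def apply_plan(writes, dry_run=True):
    """Print each step; perform the writes only when dry_run is False (needs root)."""
    for path, value in writes:
        print(f"echo {value} > {path}")
        if not dry_run:
            path.write_text(value)


# Dry run: show what would be written for CPUs 8-9 and one example PCI device.
apply_plan(plan_handover({8, 9}, {"0000:3b:00.0"}))
```

Keeping the plan separate from its execution makes it possible to review the
exact writes before anything is taken away from the running kernel.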

In terms of fault isolation or security, all kernel instances share the same
domain, as there is no supervising mechanism. A kernel bug in any partition can
cause problems for the whole physical machine. This is a tradeoff in exchange
for low overhead and low complexity, but we hope that in the future we can take
advantage of some hardware mechanism to introduce some isolation.

Signed-off-by: Thom Hughes <thom.hughes@bytedance.com>
Signed-off-by: Fam Zheng <fam.zheng@bytedance.com>

[1] https://lore.kernel.org/lkml/20250918222607.186488-1-xiyou.wangcong@gmail.com/

Thom Hughes (5):
  x86/boot/e820: Fix memmap to parse with 1 argument
  x86/smpboot: Export wakeup_secondary_cpu_via_init
  x86/parker: Introduce parker kernfs interface
  x86/parker: Add parker initialisation code
  x86/apic: Make Parker instance use physical APIC

-- 
2.39.5
Re: [RFC 0/5] parker: PARtitioned KERnel
Posted by Dave Hansen 1 week ago
On 9/23/25 08:31, Fam Zheng wrote:
> In terms of fault isolation or security, all kernel instances share
> the same domain, as there is no supervising mechanism. A kernel bug
> in any partition can cause problems for the whole physical machine.
> This is a tradeoff for low-overhead / low-complexity, but hope in
> the future we can take advantage of some hardware mechanism to
> introduce some isolation.
I just don't think this approach is viable. The buck needs to stop
_somewhere_. You can't just have a bunch of different kernels, with
nothing in charge of the system as a whole.

Just think of bus locks. They affect the whole system. What if one
kernel turns off split lock detection? Or has a different rate limit
than the others? What if one kernel is a big fan of WBINVD? How about
when they use resctrl to partition an L3 cache? How about microcode updates?

I'd just guess that there are a few hundred problems like that. Maybe more.

I'm not saying this won't be useful for a handful of folks in a tightly
controlled environment. But I just don't think it has a place in
mainline where it needs to work for everyone.
Re: [External] Re: [RFC 0/5] parker: PARtitioned KERnel
Posted by Fam Zheng 1 week ago
On Wed, Sep 24, 2025 at 4:23 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 9/23/25 08:31, Fam Zheng wrote:
> > In terms of fault isolation or security, all kernel instances share
> > the same domain, as there is no supervising mechanism. A kernel bug
> > in any partition can cause problems for the whole physical machine.
> > This is a tradeoff for low-overhead / low-complexity, but hope in
> > the future we can take advantage of some hardware mechanism to
> > introduce some isolation.
> I just don't think this is approach is viable. The buck needs to stop
> _somewhere_. You can't just have a bunch of different kernels, with
> nothing in charge of the system as a whole.
>
> Just think of bus locks. They affect the whole system. What if one
> kernel turns off split lock detection? Or has a different rate limit
> than the others? What if one kernel is a big fan of WBINVD? How about
> when they use resctrl to partition an L3 cache? How about microcode updates?

The model and motivation here is not to split the domain and give
different shares to different sysadmins; it is intended for one kernel
to partition itself. I agree we shouldn't have different kernels here:
one old, one new, one Linux, one Windows... All partitions should run
a verified parker-aware kernel. Actually, it may be a good idea to
enforce the same build ID in kexec between the boot kernel and the
secondary ones.
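
To make the build ID idea concrete, here is a rough user-space sketch of the
comparison. It parses a raw ELF notes blob, such as the one the running kernel
exposes in /sys/kernel/notes, and extracts the GNU build ID. gnu_build_id() and
same_build() are hypothetical helpers; an in-kernel check in the kexec path
would look different, and extracting the notes from a candidate kernel image is
not shown.

```python
import struct

NT_GNU_BUILD_ID = 3  # ELF note type that carries the GNU build ID


def gnu_build_id(notes: bytes, byteorder: str = "<"):
    """Return the GNU build ID in a raw ELF notes blob as a hex string, or None."""
    offset = 0
    while offset + 12 <= len(notes):
        # Each note starts with three 32-bit words: name size, desc size, type.
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", notes, offset)
        offset += 12
        name = notes[offset:offset + namesz]
        offset += (namesz + 3) & ~3   # name is padded to a 4-byte boundary
        desc = notes[offset:offset + descsz]
        offset += (descsz + 3) & ~3   # descriptor is padded to a 4-byte boundary
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def same_build(notes_a: bytes, notes_b: bytes) -> bool:
    """True only when both blobs carry a build ID and the two IDs match."""
    a, b = gnu_build_id(notes_a), gnu_build_id(notes_b)
    return a is not None and a == b
```

On a running system the first blob could be read with
open("/sys/kernel/notes", "rb").read(); the default byte order assumes a
little-endian machine such as x86.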

Fam
Re: [RFC 0/5] parker: PARtitioned KERnel
Posted by H. Peter Anvin 1 week, 1 day ago
On 2025-09-23 08:31, Fam Zheng wrote:
> 
> Parker is a proposed feature in linux for multiple linux kernels to run
> simultaneously on single machine, without traditional kvm virtualisation. This
> is achieved by partitioning the CPU cores, memory and devices for
> partitioning-aware Linux kernel.
> 

This seems to be much better handled by a lightweight hypervisor. There is a
reason why ALL IBM mainframes have a low-level hard-partitioning hypervisor.

Typically that hypervisor will expose a static, very low level view of the
machine (e.g. no scheduling - VCPUs are mapped 1:1 to physical CPUs; no I/O
sharing or emulation, except possibly as needed to boot, and so on.)

Because the functionality of the hypervisor is so limited, the overhead is
minimal, but it CAN (but doesn't HAVE TO) provide memory and I/O isolation
between partitions.

	-hpa