[v2] KVM: PPC: Handle CPU compatibility mode for nested guests

[PATCH v2 0/5] KVM: PPC: Handle CPU compatibility mode for nested guests

Posted by Amit Machhiwal 1 month ago

On POWER systems, newer processor generations can operate in compatibility
modes corresponding to earlier generations (e.g., a Power11 system running
in Power10 compatibility mode). In such cases, the effective CPU level
exposed to guests differs from the physical processor generation.

This creates a problem for nested virtualization. When booting a nested KVM
guest (L2) inside a host KVM guest (L1) running in a compatibility mode,
userspace (e.g., QEMU) may derive the CPU model from the raw hardware PVR
and attempt to configure the nested guest accordingly. However, the L1
partition is constrained by the compatibility level negotiated with the
hypervisor (L0), and requests exceeding that level are rejected, leading to
guest boot failures such as:

  KVM-NESTEDv2: couldn't set guest wide elements

This series addresses the issue in two steps:

1. Detect and reject invalid compatibility requests early in KVM to avoid
   late failures.

2. Provide a mechanism for userspace to query the effective CPU
   compatibility modes supported by the host, so it can select an
   appropriate CPU model for nested guests.
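Step 1 can be illustrated with a minimal, standalone sketch. It is not the code from patch 1. The helper names compat_level() and check_arch_compat() are hypothetical. Only the logical PVR constants are taken from arch/powerpc/include/asm/reg.h. The idea is to order the architected levels and refuse any request above the level the host (L1) is effectively running at.

```c
#include <errno.h>
#include <stdint.h>

/* Logical (architected) PVR values, as defined in the kernel's
 * arch/powerpc/include/asm/reg.h. */
#define PVR_ARCH_206     0x0f000003u  /* POWER7  */
#define PVR_ARCH_207     0x0f000004u  /* POWER8  */
#define PVR_ARCH_300     0x0f000005u  /* POWER9  */
#define PVR_ARCH_31      0x0f000006u  /* POWER10 */
#define PVR_ARCH_31_P11  0x0f000007u  /* Power11 */

/* Hypothetical helper: map a logical PVR to an ordered level (0 = unknown).
 * A table is used instead of a numeric comparison because not every logical
 * PVR follows the 0x0f00000N pattern (e.g. PVR_ARCH_206p is 0x0f100003). */
static int compat_level(uint32_t logical_pvr)
{
	switch (logical_pvr) {
	case PVR_ARCH_206:    return 1;
	case PVR_ARCH_207:    return 2;
	case PVR_ARCH_300:    return 3;
	case PVR_ARCH_31:     return 4;
	case PVR_ARCH_31_P11: return 5;
	default:              return 0;
	}
}

/* Hypothetical check: accept a requested arch_compat only if it does not
 * exceed the effective compatibility level of the host (L1). */
static int check_arch_compat(uint32_t requested, uint32_t host_compat)
{
	int req = compat_level(requested);
	int host = compat_level(host_compat);

	if (req == 0 || host == 0)
		return -EINVAL;	/* unknown logical PVR */
	if (req > host)
		return -EINVAL;	/* above what L0 negotiated for this L1 */
	return 0;
}
```

In this sketch, a Power11 LPAR in Power10 mode (host_compat = PVR_ARCH_31) would reject a request for PVR_ARCH_31_P11 up front instead of failing later during guest setup.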

To achieve this, the series introduces a new KVM capability and ioctl
(KVM_CAP_PPC_COMPAT_CAPS / KVM_PPC_GET_COMPAT_CAPS) that expose the
compatibility modes supported by the host.

The implementation supports both:

  - PowerVM (nested API v2), where compatibility information is obtained
    via the H_GUEST_GET_CAPABILITIES hypercall.
  - PowerNV (nested API v1), where compatibility is derived from the device
    tree ("cpu-version") representing the effective processor compatibility
    level.
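The two host-side sources can be sketched as one helper that reduces either input to the highest usable logical PVR. This is an illustration, not the series' code. host_max_compat() is hypothetical, and the H_GUEST_CAPABILITIES_* bit values are assumed to match arch/powerpc/include/asm/hvcall.h, so verify them before relying on this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match H_GUEST_CAPABILITIES_* in arch/powerpc/include/asm/hvcall.h. */
#define H_GUEST_CAPABILITIES_P9_MODE   0x4000000000000000ull
#define H_GUEST_CAPABILITIES_P10_MODE  0x2000000000000000ull
#define H_GUEST_CAPABILITIES_P11_MODE  0x1000000000000000ull

/* Logical PVRs from arch/powerpc/include/asm/reg.h. */
#define PVR_ARCH_300     0x0f000005u
#define PVR_ARCH_31      0x0f000006u
#define PVR_ARCH_31_P11  0x0f000007u

/*
 * Hypothetical helper returning the highest logical PVR usable for L2, or 0.
 *  - nested API v2 (PowerVM): derived from the H_GUEST_GET_CAPABILITIES bitmap
 *  - nested API v1 (PowerNV): the L1's "cpu-version" value is the limit
 */
static uint32_t host_max_compat(bool api_v2, uint64_t hcall_caps,
				uint32_t cpu_version)
{
	if (!api_v2)
		return cpu_version;

	/* Check from the newest mode downwards and report the first match. */
	if (hcall_caps & H_GUEST_CAPABILITIES_P11_MODE)
		return PVR_ARCH_31_P11;
	if (hcall_caps & H_GUEST_CAPABILITIES_P10_MODE)
		return PVR_ARCH_31;
	if (hcall_caps & H_GUEST_CAPABILITIES_P9_MODE)
		return PVR_ARCH_300;
	return 0;	/* no known mode advertised */
}
```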

This allows userspace (e.g., QEMU) to select a CPU model consistent with
the host compatibility mode, avoiding mismatches and enabling successful
nested guest boot.
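On the userspace side, selecting a CPU model could look like the sketch below. Only the __u64 `flags` field of struct kvm_ppc_compat_caps is mentioned in this thread. The `compat_modes` bitmap, its bit numbering, the compat names and pick_compat() are hypothetical stand-ins, not the uapi added by patch 2. In a real program the structure would be filled by the KVM_PPC_GET_COMPAT_CAPS ioctl; here it is populated by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_ppc_compat_caps: only the __u64
 * `flags` field is mentioned in the thread; `compat_modes` is invented. */
struct compat_caps_sketch {
	uint64_t flags;
	uint64_t compat_modes;	/* hypothetical: bit i set => compat_models[i] usable */
};

/* Illustrative compat names, ordered from lowest to highest level. */
static const char *const compat_models[] = { "power9", "power10", "power11" };

/* Return the highest usable compat name at or below index `wanted`,
 * or NULL if the host advertises none of them. */
static const char *pick_compat(const struct compat_caps_sketch *caps,
			       size_t wanted)
{
	size_t n = sizeof(compat_models) / sizeof(compat_models[0]);
	size_t i;

	if (wanted >= n)
		wanted = n - 1;
	for (i = wanted + 1; i-- > 0;)	/* walk downwards from `wanted` */
		if (caps->compat_modes & (1ull << i))
			return compat_models[i];
	return NULL;
}
```

For instance, with compat_modes = 0x3 (power9 and power10 usable), a request derived from a raw Power11 PVR (index 2) falls back to "power10" instead of being passed through and rejected by L0.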

Changes in v2:
  - Squashed patches 2 and 3 from v1 (capability introduction and ioctl
    wiring) into a single patch for better logical grouping
  - Changed kvm_ppc_compat_caps.flags from __u32 to __u64 for consistency
    and future extensibility
  - Addressed other review comments
  - Improved commit messages with clearer explanations of the changes

Patch summary:
  [1/5] Validate arch_compat against host compatibility mode
  [2/5] Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
  [3/5] Implement capability retrieval for PowerVM (API v2)
  [4/5] Add PowerNV support (API v1)
  [5/5] Document the new ioctl

Tested on:
  - Power11 pSeries LPAR in Power10 compatibility mode (nested API v2)
  - Power10 PowerNV system (and QEMU TCG PowerNV 11) with nested
    virtualization (API v1), using various combinations of KVM L1/L2
    guests in the supported compatibility modes.

With this series, nested guests boot successfully in configurations where
they previously failed due to compatibility mismatches.

Related QEMU series:
  A corresponding QEMU series adds support for querying and using these
  compatibility capabilities when configuring nested KVM guests:
  https://lore.kernel.org/all/20260502140021.69712-1-amachhiw@linux.ibm.com/

v1: https://lore.kernel.org/linuxppc-dev/20260430054906.94431-1-amachhiw@linux.ibm.com/

Amit Machhiwal (5):
  KVM: PPC: Book3S HV: Validate arch_compat against host compatibility
    mode
  KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
  KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM
    on PowerVM
  KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM
    on PowerNV
  KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl

 Documentation/virt/kvm/api.rst      | 35 ++++++++++++++++
 arch/powerpc/include/asm/kvm_ppc.h  |  1 +
 arch/powerpc/include/uapi/asm/kvm.h |  6 +++
 arch/powerpc/kvm/book3s_hv.c        | 63 +++++++++++++++++++++++++++++
 arch/powerpc/kvm/powerpc.c          | 21 ++++++++++
 include/uapi/linux/kvm.h            |  4 ++
 6 files changed, 130 insertions(+)


base-commit: 1d5dcaa3bd65f2e8c9baa14a393d3a2dc5db7524
-- 
2.50.1 (Apple Git-155)

Re: [PATCH v2 0/5] KVM: PPC: Handle CPU compatibility mode for nested guests

Posted by Ritesh Harjani (IBM) 4 weeks, 1 day ago

Hi Amit,

Amit Machhiwal <amachhiw@linux.ibm.com> writes:

> On POWER systems, newer processor generations can operate in compatibility
> modes corresponding to earlier generations (e.g., a Power11 system running
> in Power10 compatibility mode). In such cases, the effective CPU level
> exposed to guests differs from the physical processor generation.
>
> This creates a problem for nested virtualization. When booting a nested KVM
> guest (L2) inside a host KVM guest (L1) running in a compatibility mode,
> userspace (e.g., QEMU) may derive the CPU model from the raw hardware PVR
> and attempt to configure the nested guest accordingly. However, the L1
> partition is constrained by the compatibility level negotiated with the
> hypervisor (L0), and requests exceeding that level are rejected, leading to
> guest boot failures such as:
>
>   KVM-NESTEDv2: couldn't set guest wide elements
>
> This series addresses the issue in two steps:
>
> 1. Detect and reject invalid compatibility requests early in KVM to avoid
>    late failures.
>
> 2. Provide a mechanism for userspace to query the effective CPU
>    compatibility modes supported by the host, so it can select an
>    appropriate CPU model for nested guests.
>

Do we really need a uapi change for this? Tools like QEMU can read the
host's device tree info, can't they?

> To achieve this, the series introduces a new KVM capability and ioctl
> (KVM_CAP_PPC_COMPAT_CAPS / KVM_PPC_GET_COMPAT_CAPS) that expose the
> compatibility modes supported by the host.
>
> The implementation supports both:
>
>   - PowerVM (nested API v2), where compatibility information is obtained
>     via the H_GUEST_GET_CAPABILITIES hypercall.
>   - PowerNV (nested API v1), where compatibility is derived from the device
>     tree ("cpu-version") representing the effective processor compatibility
>     level.

There you go: for PowerNV, if this info is provided in the device tree,
then QEMU could just as well read it directly, no?

... yup, kvmppc_read_int_dt() can do that, I guess.
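For comparison, the device-tree route amounts to reading one big-endian cell from a property file. The sketch below is similar in spirit to QEMU's kvmppc_read_int_dt() but is not its code. read_dt_int() is a hypothetical name, and the caller supplies the full property path (for example a cpu-version file under /proc/device-tree/cpus/).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read a device tree property holding one 32-bit or
 * 64-bit big-endian cell, e.g. a CPU node's "cpu-version". Returns 0 on
 * success and -1 on error. */
static int read_dt_int(const char *path, uint64_t *out)
{
	unsigned char buf[9];	/* one extra byte to detect oversized values */
	size_t len, i;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	len = fread(buf, 1, sizeof(buf), f);
	fclose(f);
	if (len != 4 && len != 8)
		return -1;	/* not a single 32- or 64-bit cell */

	*out = 0;
	for (i = 0; i < len; i++)	/* device tree data is big-endian */
		*out = (*out << 8) | buf[i];
	return 0;
}
```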

So my request is: can we check whether there is a possible alternative?
Maybe we already have a mechanism that QEMU could use to get this info?

btw - I haven't read the full patch series yet, but from the cover
letter, I feel we should at least explain in the cover letter why a
uapi change is really needed here and why the existing alternatives
can't work for us.

-ritesh


Re: [PATCH v2 0/5] KVM: PPC: Handle CPU compatibility mode for nested guests

Posted by Amit Machhiwal 4 weeks, 1 day ago

Hi Ritesh,

Thanks for taking a look at this series. Please find my comments inline below:

On 2026/05/14 08:49 AM, Ritesh Harjani wrote:
> 
> Hi Amit,
> 
> Amit Machhiwal <amachhiw@linux.ibm.com> writes:
> 
> > On POWER systems, newer processor generations can operate in compatibility
> > modes corresponding to earlier generations (e.g., a Power11 system running
> > in Power10 compatibility mode). In such cases, the effective CPU level
> > exposed to guests differs from the physical processor generation.
> >
> > This creates a problem for nested virtualization. When booting a nested KVM
> > guest (L2) inside a host KVM guest (L1) running in a compatibility mode,
> > userspace (e.g., QEMU) may derive the CPU model from the raw hardware PVR
> > and attempt to configure the nested guest accordingly. However, the L1
> > partition is constrained by the compatibility level negotiated with the
> > hypervisor (L0), and requests exceeding that level are rejected, leading to
> > guest boot failures such as:
> >
> >   KVM-NESTEDv2: couldn't set guest wide elements
> >
> > This series addresses the issue in two steps:
> >
> > 1. Detect and reject invalid compatibility requests early in KVM to avoid
> >    late failures.
> >
> > 2. Provide a mechanism for userspace to query the effective CPU
> >    compatibility modes supported by the host, so it can select an
> >    appropriate CPU model for nested guests.
> >
>
> Do we really need to add a uapi change for this? Tools like Qemu can
> read the device tree info of the host, isn't it?

While cpu-version is available at /proc/device-tree/cpus/<cpu#>/cpu-version both
on an L1 booted on PowerNV and on PowerVM LPARs, I believe the UAPI change is
still preferable for several reasons:

1. In the KVM on PowerVM case, we want to rely on the capabilities negotiated
   with pHYP (L0) rather than on a device tree property. Also, the cpu-version
   property only reflects the compat mode the host (L1) is currently booted in;
   it doesn't indicate which compat modes are supported for the nested guest
   (L2).

2. procfs dependency: Not all systems run with procfs enabled (CONFIG_PROC_FS is
   optional). For example, minimal configurations (e.g. Buildroot-based ones)
   might disable it. The KVM ioctl works regardless of procfs availability since
   it accesses kernel data structures directly.

3. Kernel validation: The kernel validates and normalizes the compatibility
   information. For example, patch 1 adds validation logic that rejects invalid
   compatibility requests early. The ioctl ensures userspace gets validated,
   consistent data.

4. Abstraction & stability: While /proc/device-tree works today, it's an
   implementation detail. The UAPI provides a stable interface that won't break
   if the underlying mechanism changes.

5. Semantic clarity: KVM_PPC_GET_COMPAT_CAPS directly answers "which
   compatibility modes can I use for KVM guests?", whereas parsing the device
   tree requires understanding the semantics of cpu-version.
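Points 3 and 5 can be made concrete: a raw device-tree value is not always directly comparable, because cpu-version may hold either a logical (architected) PVR or, when no compatibility mode is in effect, a raw processor PVR (an assumption that should be checked against the platform). A minimal normalization sketch follows. normalize_compat() is hypothetical; the PVR constants and the PVR_VER() macro mirror arch/powerpc/include/asm/reg.h.

```c
#include <stdint.h>

#define PVR_VER(pvr)     (((pvr) >> 16) & 0xFFFF)

/* Raw processor versions, from arch/powerpc/include/asm/reg.h. */
#define PVR_POWER9       0x004E
#define PVR_POWER10      0x0080
#define PVR_POWER11      0x0082

/* Logical (architected) PVRs, from the same header. */
#define PVR_ARCH_300     0x0f000005u
#define PVR_ARCH_31      0x0f000006u
#define PVR_ARCH_31_P11  0x0f000007u

/* Hypothetical: turn a cpu-version value into a logical PVR (0 if unknown). */
static uint32_t normalize_compat(uint32_t cpu_version)
{
	if ((cpu_version >> 24) == 0x0f)	/* already a logical PVR */
		return cpu_version;

	switch (PVR_VER(cpu_version)) {		/* raw PVR: native mode */
	case PVR_POWER9:  return PVR_ARCH_300;
	case PVR_POWER10: return PVR_ARCH_31;
	case PVR_POWER11: return PVR_ARCH_31_P11;
	default:          return 0;
	}
}
```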

>
> > To achieve this, the series introduces a new KVM capability and ioctl
> > (KVM_CAP_PPC_COMPAT_CAPS / KVM_PPC_GET_COMPAT_CAPS) that expose the
> > compatibility modes supported by the host.
> >
> > The implementation supports both:
> >
> >   - PowerVM (nested API v2), where compatibility information is obtained
> >     via the H_GUEST_GET_CAPABILITIES hypercall.
> >   - PowerNV (nested API v1), where compatibility is derived from the device
> >     tree ("cpu-version") representing the effective processor compatibility
> >     level.
> 
> See there you go, for PowerNV if this info is provided in the device
> tree, then Qemu could as well just read that info, no?
>
> ... yup, kvmppc_read_int_dt() can do that I guess.
> 
> So, my request is, can we look into this to see, if there is a possible
> alternative to this? maybe we already have a mechanism which Qemu could
> use to get this info already?

You're right that QEMU could read the device tree from procfs. We had discussed
this approach internally as well. However, we believe the UAPI approach offers
additional benefits and is more robust and future-proof, as outlined above.

> 
> btw - I haven't given a full read of the patch series, but reading the
> cover letter, I felt  we should atleast add this info to the cover
> letter on, why a uapi change is really needed here, why can't the
> existing alternatives work for us.

I have described above why this series takes the UAPI approach. Could you
please suggest what else should be added to the cover letter?

Thanks,
Amit


Re: [PATCH v2 0/5] KVM: PPC: Handle CPU compatibility mode for nested guests

Posted by Anushree Mathur 4 weeks ago


On 13/05/26 3:37 PM, Amit Machhiwal wrote:
> [...]

Hi Amit,
I tried booting a guest on a P11 LPAR booted in P10 compat mode, with your
patches and the QEMU patch series applied, and it has been working
perfectly fine.

Host lscpu:

lscpu
Architecture:                ppc64le
   Byte Order:                Little Endian
CPU(s):                      80
   On-line CPU(s) list:       0-79
Model name:                  POWER10 (architected), altivec supported


Guest lscpu:

lscpu
Architecture:                ppc64le
   Byte Order:                Little Endian
CPU(s):                      10
   On-line CPU(s) list:       0-9
Model name:                  POWER10 (architected), altivec supported

Feel free to add:

Tested-by: Anushree Mathur <anushree.mathur@linux.ibm.com>

Thank you!
Anushree Mathur