From: David Woodhouse <dwmw@amazon.co.uk>
Commit 7af0c2534f4c5 ("KVM: arm64: Normalize cache configuration")
fabricates CLIDR_EL1 and CCSIDR_EL1 values instead of using the real
hardware values. While this provides consistent values across
heterogeneous CPUs, it does cause visible changes in the CPU model
exposed to guests.
The commit claims that userspace can restore the original values, but
there is no way for userspace to obtain the real CLIDR_EL1 register
value — it is not fully reconstructible from sysfs, which lacks the
LoC, LoUU, and LoUIS fields.
Add a per-vcpu KVM_CAP_ARM_NATIVE_CACHE_CONFIG capability that reads
the real CLIDR_EL1 and all CCSIDR_EL1 values from the current physical
CPU and sets them on the vcpu.
This allows hypervisors to present the real hardware cache configuration
to guests, which is important for consistency of the environment across
kernel versions and for migration compatibility with hosts running
older kernels that exposed the real values.
Fixes: 7af0c2534f4c ("KVM: arm64: Normalize cache configuration")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
Documentation/virt/kvm/api.rst | 23 ++++++++
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/arm.c | 17 ++++++
arch/arm64/kvm/sys_regs.c | 26 ++++++++++
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/arm64/native_cache_config.c | 52 +++++++++++++++++++
7 files changed, 121 insertions(+)
create mode 100644 tools/testing/selftests/kvm/arm64/native_cache_config.c
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e3b3bd9edeec..ee47dc07ceac 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8930,6 +8930,29 @@ no-op.
``KVM_CHECK_EXTENSION`` returns the bitmask of exits that can be disabled.
+7.48 KVM_CAP_ARM_NATIVE_CACHE_CONFIG
+-------------------------------------
+
+:Architecture: arm64
+:Target: vcpu
+:Parameters: none
+:Returns: 0 on success, -ENOMEM on allocation failure, -EINVAL if
+ args[0] or flags are non-zero.
+
+This per-vcpu capability reads the real CLIDR_EL1 and CCSIDR_EL1 values
+from the physical CPU on which the ioctl is executed, and sets them on
+the vcpu. This replaces the fabricated cache configuration that KVM
+provides by default.
+
+The caller should ensure the vcpu thread is pinned to the desired
+physical CPU before invoking this capability, so that the correct cache
+topology is captured. On heterogeneous systems, different physical CPUs
+may have different cache configurations.
+
+After this capability is enabled, the vcpu's CLIDR_EL1 and CCSIDR_EL1
+values can still be overridden individually via ``KVM_SET_ONE_REG`` and
+the ``KVM_REG_ARM_DEMUX`` interface.
+
8. Other capabilities.
======================
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a1bb025c641f..c9713a472c47 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1296,6 +1296,7 @@ void kvm_sys_regs_create_debugfs(struct kvm *kvm);
void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
int __init kvm_sys_reg_table_init(void);
+int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu);
struct sys_reg_desc;
int __init populate_sysreg_config(const struct sys_reg_desc *sr,
unsigned int idx);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 326a99fea753..579583e8dc5c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -393,6 +393,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_DISABLE_EXITS:
r = KVM_ARM_DISABLE_VALID_EXITS;
break;
+ case KVM_CAP_ARM_NATIVE_CACHE_CONFIG:
+ case KVM_CAP_ENABLE_CAP:
+ r = 1;
+ break;
case KVM_CAP_SET_GUEST_DEBUG2:
return KVM_GUESTDBG_VALID_MASK;
case KVM_CAP_ARM_SET_DEVICE_ADDR:
@@ -1793,6 +1797,19 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = kvm_arch_vcpu_ioctl_vcpu_init(vcpu, &init);
break;
}
+ case KVM_ENABLE_CAP: {
+ struct kvm_enable_cap cap;
+
+ r = -EFAULT;
+ if (copy_from_user(&cap, argp, sizeof(cap)))
+ break;
+
+ r = -EINVAL;
+ if (cap.cap == KVM_CAP_ARM_NATIVE_CACHE_CONFIG &&
+ !cap.args[0] && !cap.flags)
+ r = kvm_vcpu_set_native_cache_config(vcpu);
+ break;
+ }
case KVM_SET_ONE_REG:
case KVM_GET_ONE_REG: {
struct kvm_one_reg reg;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1b4cacb6e918..c19d84e48f8b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -484,6 +484,32 @@ static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
return 0;
}
+int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu)
+{
+ u32 csselr;
+
+ if (!vcpu->arch.ccsidr) {
+ vcpu->arch.ccsidr = kmalloc_array(CSSELR_MAX, sizeof(u32),
+ GFP_KERNEL_ACCOUNT);
+ if (!vcpu->arch.ccsidr)
+ return -ENOMEM;
+ }
+
+ local_irq_disable();
+
+ __vcpu_assign_sys_reg(vcpu, CLIDR_EL1, read_sysreg(clidr_el1));
+
+ for (csselr = 0; csselr < CSSELR_MAX; csselr++) {
+ write_sysreg(csselr, csselr_el1);
+ isb();
+ vcpu->arch.ccsidr[csselr] = read_sysreg(ccsidr_el1);
+ }
+
+ local_irq_enable();
+
+ return 0;
+}
+
static bool access_rw(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 694cf699ed0a..2d8bbb4dd69b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -995,6 +995,7 @@ struct kvm_enable_cap {
#define KVM_CAP_S390_USER_OPEREXEC 246
#define KVM_CAP_S390_KEYOP 247
#define KVM_CAP_ARM_DISABLE_EXITS 248
+#define KVM_CAP_ARM_NATIVE_CACHE_CONFIG 249
struct kvm_irq_routing_irqchip {
__u32 irqchip;
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index ad9f9fa181a1..8e05cc8ec204 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -182,6 +182,7 @@ TEST_GEN_PROGS_arm64 += arm64/vgic_group_iidr
TEST_GEN_PROGS_arm64 += arm64/vgic_group_v2
TEST_GEN_PROGS_arm64 += arm64/vpmu_counter_access
TEST_GEN_PROGS_arm64 += arm64/no-vgic-v3
+TEST_GEN_PROGS_arm64 += arm64/native_cache_config
TEST_GEN_PROGS_arm64 += arm64/idreg-idst
TEST_GEN_PROGS_arm64 += arm64/kvm-uuid
TEST_GEN_PROGS_arm64 += access_tracking_perf_test
diff --git a/tools/testing/selftests/kvm/arm64/native_cache_config.c b/tools/testing/selftests/kvm/arm64/native_cache_config.c
new file mode 100644
index 000000000000..4afea32f2348
--- /dev/null
+++ b/tools/testing/selftests/kvm/arm64/native_cache_config.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * native_cache_config.c - Test KVM_CAP_ARM_NATIVE_CACHE_CONFIG
+ *
+ * Verify that enabling the capability populates the vcpu's CLIDR_EL1
+ * with the real hardware value instead of the fabricated default.
+ */
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define CLIDR_EL1_REG_ID ARM64_SYS_REG(3, 1, 0, 0, 1)
+
+int main(int argc, char *argv[])
+{
+ struct kvm_enable_cap cap = {
+ .cap = KVM_CAP_ARM_NATIVE_CACHE_CONFIG,
+ };
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ uint64_t clidr_before, clidr_after;
+ int r;
+
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_NATIVE_CACHE_CONFIG));
+
+ vm = vm_create_with_one_vcpu(&vcpu, NULL);
+
+ /* Read the fabricated CLIDR_EL1 */
+ clidr_before = vcpu_get_reg(vcpu, CLIDR_EL1_REG_ID);
+
+ /* Enable native cache config */
+ r = __vcpu_ioctl(vcpu, KVM_ENABLE_CAP, &cap);
+ TEST_ASSERT(!r, "KVM_ENABLE_CAP failed: %d (errno %d)", r, errno);
+
+ /* Read CLIDR_EL1 again */
+ clidr_after = vcpu_get_reg(vcpu, CLIDR_EL1_REG_ID);
+
+ pr_info("CLIDR_EL1 before: 0x%016lx\n", clidr_before);
+ pr_info("CLIDR_EL1 after: 0x%016lx\n", clidr_after);
+
+ TEST_ASSERT(clidr_after != 0,
+ "CLIDR_EL1 should not be zero after native config");
+
+ /* Invalid: non-zero args should fail */
+ cap.args[0] = 1;
+ r = __vcpu_ioctl(vcpu, KVM_ENABLE_CAP, &cap);
+ TEST_ASSERT(r == -1 && errno == EINVAL,
+ "Non-zero args should fail: got %d errno %d", r, errno);
+
+ kvm_vm_free(vm);
+ return 0;
+}
--
2.43.0
On Thu, 09 Apr 2026 16:29:06 +0100,
David Woodhouse <dwmw2@infradead.org> wrote:
>
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Commit 7af0c2534f4c5 ("KVM: arm64: Normalize cache configuration")
> fabricates CLIDR_EL1 and CCSIDR_EL1 values instead of using the real
> hardware values. While this provides consistent values across
> heterogeneous CPUs, it does cause visible changes in the CPU model
> exposed to guests.
>
> The commit claims that userspace can restore the original values, but
> there is no way for userspace to obtain the real CLIDR_EL1 register
> value — it is not fully reconstructible from sysfs, which lacks the
> LoC, LoUU, and LoUIS fields.
>
> Add a per-vcpu KVM_CAP_ARM_NATIVE_CACHE_CONFIG capability that reads
> the real CLIDR_EL1 and all CCSIDR_EL1 values from the current physical
> CPU and sets them on the vcpu.
>
> This allows hypervisors to present the real hardware cache configuration
> to guests, which is important for consistency of the environment across
> kernel versions and for migration compatibility with hosts running
> older kernels that exposed the real values.
>
> Fixes: 7af0c2534f4c ("KVM: arm64: Normalize cache configuration")
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> Documentation/virt/kvm/api.rst | 23 ++++++++
> arch/arm64/include/asm/kvm_host.h | 1 +
> arch/arm64/kvm/arm.c | 17 ++++++
> arch/arm64/kvm/sys_regs.c | 26 ++++++++++
> include/uapi/linux/kvm.h | 1 +
> tools/testing/selftests/kvm/Makefile.kvm | 1 +
> .../selftests/kvm/arm64/native_cache_config.c | 52 +++++++++++++++++++
> 7 files changed, 121 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/arm64/native_cache_config.c
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index e3b3bd9edeec..ee47dc07ceac 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8930,6 +8930,29 @@ no-op.
>
> ``KVM_CHECK_EXTENSION`` returns the bitmask of exits that can be disabled.
>
> +7.48 KVM_CAP_ARM_NATIVE_CACHE_CONFIG
> +-------------------------------------
> +
> +:Architecture: arm64
> +:Target: vcpu
> +:Parameters: none
> +:Returns: 0 on success, -ENOMEM on allocation failure, -EINVAL if
> + args[0] or flags are non-zero.
> +
> +This per-vcpu capability reads the real CLIDR_EL1 and CCSIDR_EL1 values
> +from the physical CPU on which the ioctl is executed, and sets them on
> +the vcpu. This replaces the fabricated cache configuration that KVM
> +provides by default.
> +
> +The caller should ensure the vcpu thread is pinned to the desired
> +physical CPU before invoking this capability, so that the correct cache
> +topology is captured. On heterogeneous systems, different physical CPUs
> +may have different cache configurations.
> +
> +After this capability is enabled, the vcpu's CLIDR_EL1 and CCSIDR_EL1
> +values can still be overridden individually via ``KVM_SET_ONE_REG`` and
> +the ``KVM_REG_ARM_DEMUX`` interface.
> +
> 8. Other capabilities.
> ======================
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index a1bb025c641f..c9713a472c47 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1296,6 +1296,7 @@ void kvm_sys_regs_create_debugfs(struct kvm *kvm);
> void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
>
> int __init kvm_sys_reg_table_init(void);
> +int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu);
> struct sys_reg_desc;
> int __init populate_sysreg_config(const struct sys_reg_desc *sr,
> unsigned int idx);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 326a99fea753..579583e8dc5c 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -393,6 +393,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_ARM_DISABLE_EXITS:
> r = KVM_ARM_DISABLE_VALID_EXITS;
> break;
> + case KVM_CAP_ARM_NATIVE_CACHE_CONFIG:
> + case KVM_CAP_ENABLE_CAP:
> + r = 1;
> + break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> return KVM_GUESTDBG_VALID_MASK;
> case KVM_CAP_ARM_SET_DEVICE_ADDR:
> @@ -1793,6 +1797,19 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> r = kvm_arch_vcpu_ioctl_vcpu_init(vcpu, &init);
> break;
> }
> + case KVM_ENABLE_CAP: {
> + struct kvm_enable_cap cap;
> +
> + r = -EFAULT;
> + if (copy_from_user(&cap, argp, sizeof(cap)))
> + break;
> +
> + r = -EINVAL;
> + if (cap.cap == KVM_CAP_ARM_NATIVE_CACHE_CONFIG &&
> + !cap.args[0] && !cap.flags)
> + r = kvm_vcpu_set_native_cache_config(vcpu);
> + break;
> + }
> case KVM_SET_ONE_REG:
> case KVM_GET_ONE_REG: {
> struct kvm_one_reg reg;
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 1b4cacb6e918..c19d84e48f8b 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -484,6 +484,32 @@ static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
> return 0;
> }
>
> +int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu)
> +{
> + u32 csselr;
> +
> + if (!vcpu->arch.ccsidr) {
> + vcpu->arch.ccsidr = kmalloc_array(CSSELR_MAX, sizeof(u32),
> + GFP_KERNEL_ACCOUNT);
> + if (!vcpu->arch.ccsidr)
> + return -ENOMEM;
> + }
Well, no.
The moment you decide to expose all of the host's crap, you really
need to put everything on the table. It means fully handling
FEAT_CCIDX, which we were careful not to expose anywhere because it is
a terrible idea.
So CCSIDR_EL1 becomes a 64bit value, is complemented with CCSIDR2_EL1,
and needs to be advertised as such through the idregs. CCSIDR2_EL1
must be exposed to userspace and made writable, but only if the
feature exists. You also need to conditionally undef CCSIDR2_EL1
depending on the VM configuration.
The "amusing" thing is that, before we introduced the cache hierarchy
sanitisation, we would happily report something completely senseless
on CCIDX hardware. It didn't matter, because nobody can make any use
of that information, apart from EL3 firmware.
But if you want this to be a reflection of the underlying HW, then so
be it.
> + for (csselr = 0; csselr < CSSELR_MAX; csselr++) {
> + write_sysreg(csselr, csselr_el1);
> + isb();
> + vcpu->arch.ccsidr[csselr] = read_sysreg(ccsidr_el1);
That's not how the selection register works. CLIDR_EL1 tells you what
each cache level is (Instructions, Data, Unified, Tags), and that must
be combined with the index (which doesn't start at bit 0).
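To illustrate the point about the selection register, here is a minimal userspace sketch (not the kernel's implementation) of how valid CSSELR_EL1 values can be derived from CLIDR_EL1. It assumes the architectural layout of CLIDR_EL1.Ctype<n> (3 bits per level) and CSSELR_EL1 (InD in bit 0, Level in bits [3:1]), and it deliberately ignores the TnD bit used for tag caches. The names `csselr_is_valid` and `enumerate_csselr` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CSSELR_MAX 14u /* 7 levels x {data/unified, instruction} */

/* CLIDR_EL1.Ctype<n> encodings; level 1 lives in bits [2:0]. */
enum {
	CTYPE_NONE = 0,    /* no cache at this level */
	CTYPE_INSN = 1,    /* instruction cache only */
	CTYPE_DATA = 2,    /* data cache only */
	CTYPE_SEPARATE = 3, /* separate instruction and data caches */
	CTYPE_UNIFIED = 4, /* unified cache */
};

/* Return true if 'csselr' selects a cache that 'clidr' says exists. */
static bool csselr_is_valid(uint64_t clidr, uint32_t csselr)
{
	uint32_t ind, level, ctype;

	if (csselr >= CSSELR_MAX)
		return false;

	ind = csselr & 1;    /* CSSELR_EL1.InD: 1 selects the instruction cache */
	level = csselr >> 1; /* CSSELR_EL1.Level: zero-based, starts at bit 1 */
	ctype = (clidr >> (level * 3)) & 7;

	switch (ctype) {
	case CTYPE_INSN:
		return ind == 1;
	case CTYPE_DATA:
	case CTYPE_UNIFIED:
		return ind == 0;
	case CTYPE_SEPARATE:
		return true;
	default: /* no cache, or a reserved encoding */
		return false;
	}
}

/* Fill 'out' with every valid selector value; return how many were found. */
static unsigned int enumerate_csselr(uint64_t clidr, uint32_t out[CSSELR_MAX])
{
	unsigned int n = 0;

	for (uint32_t csselr = 0; csselr < CSSELR_MAX; csselr++)
		if (csselr_is_valid(clidr, csselr))
			out[n++] = csselr;
	return n;
}
```

With a CLIDR_EL1 value describing separate L1 caches and unified L2 and L3 caches, only four selector values are meaningful; the remaining ones should not be written to CSSELR_EL1 and read back through CCSIDR_EL1.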
I also wonder how you reconcile not exposing MTE when the cache
hierarchy indicates support for tags. That clearly contradicts "report
what the HW has".
M.
--
Without deviation from the norm, progress is not possible.
On Thu, 2026-04-09 at 18:07 +0100, Marc Zyngier wrote:
> On Thu, 09 Apr 2026 16:29:06 +0100,
> David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > From: David Woodhouse <dwmw@amazon.co.uk>
> >
> > Commit 7af0c2534f4c5 ("KVM: arm64: Normalize cache configuration")
> > fabricates CLIDR_EL1 and CCSIDR_EL1 values instead of using the real
> > hardware values. While this provides consistent values across
> > heterogeneous CPUs, it does cause visible changes in the CPU model
> > exposed to guests.
> >
> > The commit claims that userspace can restore the original values, but
> > there is no way for userspace to obtain the real CLIDR_EL1 register
> > value — it is not fully reconstructible from sysfs, which lacks the
> > LoC, LoUU, and LoUIS fields.
> >
> > Add a per-vcpu KVM_CAP_ARM_NATIVE_CACHE_CONFIG capability that reads
> > the real CLIDR_EL1 and all CCSIDR_EL1 values from the current physical
> > CPU and sets them on the vcpu.
> >
> > This allows hypervisors to present the real hardware cache configuration
> > to guests, which is important for consistency of the environment across
> > kernel versions and for migration compatibility with hosts running
> > older kernels that exposed the real values.
> >
> > Fixes: 7af0c2534f4c ("KVM: arm64: Normalize cache configuration")
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> > ---
> > Documentation/virt/kvm/api.rst | 23 ++++++++
> > arch/arm64/include/asm/kvm_host.h | 1 +
> > arch/arm64/kvm/arm.c | 17 ++++++
> > arch/arm64/kvm/sys_regs.c | 26 ++++++++++
> > include/uapi/linux/kvm.h | 1 +
> > tools/testing/selftests/kvm/Makefile.kvm | 1 +
> > .../selftests/kvm/arm64/native_cache_config.c | 52 +++++++++++++++++++
> > 7 files changed, 121 insertions(+)
> > create mode 100644 tools/testing/selftests/kvm/arm64/native_cache_config.c
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index e3b3bd9edeec..ee47dc07ceac 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -8930,6 +8930,29 @@ no-op.
> >
> > ``KVM_CHECK_EXTENSION`` returns the bitmask of exits that can be disabled.
> >
> > +7.48 KVM_CAP_ARM_NATIVE_CACHE_CONFIG
> > +-------------------------------------
> > +
> > +:Architecture: arm64
> > +:Target: vcpu
> > +:Parameters: none
> > +:Returns: 0 on success, -ENOMEM on allocation failure, -EINVAL if
> > + args[0] or flags are non-zero.
> > +
> > +This per-vcpu capability reads the real CLIDR_EL1 and CCSIDR_EL1 values
> > +from the physical CPU on which the ioctl is executed, and sets them on
> > +the vcpu. This replaces the fabricated cache configuration that KVM
> > +provides by default.
> > +
> > +The caller should ensure the vcpu thread is pinned to the desired
> > +physical CPU before invoking this capability, so that the correct cache
> > +topology is captured. On heterogeneous systems, different physical CPUs
> > +may have different cache configurations.
> > +
> > +After this capability is enabled, the vcpu's CLIDR_EL1 and CCSIDR_EL1
> > +values can still be overridden individually via ``KVM_SET_ONE_REG`` and
> > +the ``KVM_REG_ARM_DEMUX`` interface.
> > +
> > 8. Other capabilities.
> > ======================
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index a1bb025c641f..c9713a472c47 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1296,6 +1296,7 @@ void kvm_sys_regs_create_debugfs(struct kvm *kvm);
> > void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
> >
> > int __init kvm_sys_reg_table_init(void);
> > +int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu);
> > struct sys_reg_desc;
> > int __init populate_sysreg_config(const struct sys_reg_desc *sr,
> > unsigned int idx);
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 326a99fea753..579583e8dc5c 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -393,6 +393,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > case KVM_CAP_ARM_DISABLE_EXITS:
> > r = KVM_ARM_DISABLE_VALID_EXITS;
> > break;
> > + case KVM_CAP_ARM_NATIVE_CACHE_CONFIG:
> > + case KVM_CAP_ENABLE_CAP:
> > + r = 1;
> > + break;
> > case KVM_CAP_SET_GUEST_DEBUG2:
> > return KVM_GUESTDBG_VALID_MASK;
> > case KVM_CAP_ARM_SET_DEVICE_ADDR:
> > @@ -1793,6 +1797,19 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> > r = kvm_arch_vcpu_ioctl_vcpu_init(vcpu, &init);
> > break;
> > }
> > + case KVM_ENABLE_CAP: {
> > + struct kvm_enable_cap cap;
> > +
> > + r = -EFAULT;
> > + if (copy_from_user(&cap, argp, sizeof(cap)))
> > + break;
> > +
> > + r = -EINVAL;
> > + if (cap.cap == KVM_CAP_ARM_NATIVE_CACHE_CONFIG &&
> > + !cap.args[0] && !cap.flags)
> > + r = kvm_vcpu_set_native_cache_config(vcpu);
> > + break;
> > + }
> > case KVM_SET_ONE_REG:
> > case KVM_GET_ONE_REG: {
> > struct kvm_one_reg reg;
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 1b4cacb6e918..c19d84e48f8b 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -484,6 +484,32 @@ static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
> > return 0;
> > }
> >
> > +int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu)
> > +{
> > + u32 csselr;
> > +
> > + if (!vcpu->arch.ccsidr) {
> > + vcpu->arch.ccsidr = kmalloc_array(CSSELR_MAX, sizeof(u32),
> > + GFP_KERNEL_ACCOUNT);
> > + if (!vcpu->arch.ccsidr)
> > + return -ENOMEM;
> > + }
>
> Well, no.
>
> The moment you decide to expose all of the host's crap, you really
> need to put everything on the table. It means fully handling
> FEAT_CCIDX, which we were careful not to expose anywhere because it is
> a terrible idea.
The intent here is not to "expose all of the host's crap", but to
maintain compatibility with what the kernel did before commit
7af0c2534f4c. No need to expose FEAT_CCIDX.
> > + for (csselr = 0; csselr < CSSELR_MAX; csselr++) {
> > + write_sysreg(csselr, csselr_el1);
> > + isb();
> > + vcpu->arch.ccsidr[csselr] = read_sysreg(ccsidr_el1);
>
> That's not how the selection register works. CLIDR_EL1 tells you what
> each cache level is (Instructions, Data, Unified, Tags), and that must
> be combined with the index (which doesn't start at bit 0).
Ack, thanks. I'll rework that based on the old is_valid_cache()
function.
> I also wonder how you reconcile not exposing MTE when the cache
> hierarchy indicates support for tags. That clearly contradicts "report
> what the HW has".
If that was an issue then it would already have been an issue before
commit 7af0c2534f4 (and in kernels with that commit reverted), hosting
millions of guests today.
This isn't about introducing *new* behaviour; it's about allowing the
existing established behaviour to be maintained so that we can have a
*managed* transition to the new model (for new launches) rather than an
unconditional uncontrolled change as the kernel gets upgraded.
On Thu, 09 Apr 2026 18:49:09 +0100,
David Woodhouse <dwmw2@infradead.org> wrote:
>
> On Thu, 2026-04-09 at 18:07 +0100, Marc Zyngier wrote:
> > On Thu, 09 Apr 2026 16:29:06 +0100,
> > David Woodhouse <dwmw2@infradead.org> wrote:
> > >
> > > From: David Woodhouse <dwmw@amazon.co.uk>
> > >
> > > Commit 7af0c2534f4c5 ("KVM: arm64: Normalize cache configuration")
> > > fabricates CLIDR_EL1 and CCSIDR_EL1 values instead of using the real
> > > hardware values. While this provides consistent values across
> > > heterogeneous CPUs, it does cause visible changes in the CPU model
> > > exposed to guests.
> > >
> > > The commit claims that userspace can restore the original values, but
> > > there is no way for userspace to obtain the real CLIDR_EL1 register
> > > value — it is not fully reconstructible from sysfs, which lacks the
> > > LoC, LoUU, and LoUIS fields.
> > >
> > > Add a per-vcpu KVM_CAP_ARM_NATIVE_CACHE_CONFIG capability that reads
> > > the real CLIDR_EL1 and all CCSIDR_EL1 values from the current physical
> > > CPU and sets them on the vcpu.
> > >
> > > This allows hypervisors to present the real hardware cache configuration
> > > to guests, which is important for consistency of the environment across
> > > kernel versions and for migration compatibility with hosts running
> > > older kernels that exposed the real values.
> > >
> > > Fixes: 7af0c2534f4c ("KVM: arm64: Normalize cache configuration")
> > > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> > > ---
> > > Documentation/virt/kvm/api.rst | 23 ++++++++
> > > arch/arm64/include/asm/kvm_host.h | 1 +
> > > arch/arm64/kvm/arm.c | 17 ++++++
> > > arch/arm64/kvm/sys_regs.c | 26 ++++++++++
> > > include/uapi/linux/kvm.h | 1 +
> > > tools/testing/selftests/kvm/Makefile.kvm | 1 +
> > > .../selftests/kvm/arm64/native_cache_config.c | 52 +++++++++++++++++++
> > > 7 files changed, 121 insertions(+)
> > > create mode 100644 tools/testing/selftests/kvm/arm64/native_cache_config.c
> > >
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index e3b3bd9edeec..ee47dc07ceac 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -8930,6 +8930,29 @@ no-op.
> > >
> > > ``KVM_CHECK_EXTENSION`` returns the bitmask of exits that can be disabled.
> > >
> > > +7.48 KVM_CAP_ARM_NATIVE_CACHE_CONFIG
> > > +-------------------------------------
> > > +
> > > +:Architecture: arm64
> > > +:Target: vcpu
> > > +:Parameters: none
> > > +:Returns: 0 on success, -ENOMEM on allocation failure, -EINVAL if
> > > + args[0] or flags are non-zero.
> > > +
> > > +This per-vcpu capability reads the real CLIDR_EL1 and CCSIDR_EL1 values
> > > +from the physical CPU on which the ioctl is executed, and sets them on
> > > +the vcpu. This replaces the fabricated cache configuration that KVM
> > > +provides by default.
> > > +
> > > +The caller should ensure the vcpu thread is pinned to the desired
> > > +physical CPU before invoking this capability, so that the correct cache
> > > +topology is captured. On heterogeneous systems, different physical CPUs
> > > +may have different cache configurations.
> > > +
> > > +After this capability is enabled, the vcpu's CLIDR_EL1 and CCSIDR_EL1
> > > +values can still be overridden individually via ``KVM_SET_ONE_REG`` and
> > > +the ``KVM_REG_ARM_DEMUX`` interface.
> > > +
> > > 8. Other capabilities.
> > > ======================
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index a1bb025c641f..c9713a472c47 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -1296,6 +1296,7 @@ void kvm_sys_regs_create_debugfs(struct kvm *kvm);
> > > void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
> > >
> > > int __init kvm_sys_reg_table_init(void);
> > > +int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu);
> > > struct sys_reg_desc;
> > > int __init populate_sysreg_config(const struct sys_reg_desc *sr,
> > > unsigned int idx);
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 326a99fea753..579583e8dc5c 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -393,6 +393,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > > case KVM_CAP_ARM_DISABLE_EXITS:
> > > r = KVM_ARM_DISABLE_VALID_EXITS;
> > > break;
> > > + case KVM_CAP_ARM_NATIVE_CACHE_CONFIG:
> > > + case KVM_CAP_ENABLE_CAP:
> > > + r = 1;
> > > + break;
> > > case KVM_CAP_SET_GUEST_DEBUG2:
> > > return KVM_GUESTDBG_VALID_MASK;
> > > case KVM_CAP_ARM_SET_DEVICE_ADDR:
> > > @@ -1793,6 +1797,19 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> > > r = kvm_arch_vcpu_ioctl_vcpu_init(vcpu, &init);
> > > break;
> > > }
> > > + case KVM_ENABLE_CAP: {
> > > + struct kvm_enable_cap cap;
> > > +
> > > + r = -EFAULT;
> > > + if (copy_from_user(&cap, argp, sizeof(cap)))
> > > + break;
> > > +
> > > + r = -EINVAL;
> > > + if (cap.cap == KVM_CAP_ARM_NATIVE_CACHE_CONFIG &&
> > > + !cap.args[0] && !cap.flags)
> > > + r = kvm_vcpu_set_native_cache_config(vcpu);
> > > + break;
> > > + }
> > > case KVM_SET_ONE_REG:
> > > case KVM_GET_ONE_REG: {
> > > struct kvm_one_reg reg;
> > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > > index 1b4cacb6e918..c19d84e48f8b 100644
> > > --- a/arch/arm64/kvm/sys_regs.c
> > > +++ b/arch/arm64/kvm/sys_regs.c
> > > @@ -484,6 +484,32 @@ static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
> > > return 0;
> > > }
> > >
> > > +int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu)
> > > +{
> > > + u32 csselr;
> > > +
> > > + if (!vcpu->arch.ccsidr) {
> > > + vcpu->arch.ccsidr = kmalloc_array(CSSELR_MAX, sizeof(u32),
> > > + GFP_KERNEL_ACCOUNT);
> > > + if (!vcpu->arch.ccsidr)
> > > + return -ENOMEM;
> > > + }
> >
> > Well, no.
> >
> > The moment you decide to expose all of the host's crap, you really
> > need to put everything on the table. It means fully handling
> > FEAT_CCIDX, which we were careful not to expose anywhere because it is
> > a terrible idea.
>
> The intent here is not to "expose all of the host's crap", but to
> maintain compatibility with what the kernel did before commit
> 7af0c2534f4c. No need to expose FEAT_CCIDX.
That's not optional. Without FEAT_CCIDX, the guest cannot interpret
the correct cache geometry.
>
> > > + for (csselr = 0; csselr < CSSELR_MAX; csselr++) {
> > > + write_sysreg(csselr, csselr_el1);
> > > + isb();
> > > + vcpu->arch.ccsidr[csselr] = read_sysreg(ccsidr_el1);
> >
> > That's not how the selection register works. CLIDR_EL1 tells you what
> > each cache level is (Instructions, Data, Unified, Tags), and that must
> > be combined with the index (which doesn't start at bit 0).
>
> Ack, thanks. I'll rework that based on the old is_valid_cache()
> function.
>
> > I also wonder how you reconcile not exposing MTE when the cache
> > hierarchy indicates support for tags. That clearly contradicts "report
> > what the HW has".
>
> If that was an issue then it would already have been an issue before
> commit 7af0c2534f4 (and in kernels with that commit reverted), hosting
> millions of guests today.
That only means you are doing a pretty bad job at supporting
guests. And yes, this is an issue for anything that expects to see
something meaningful in CCSIDR[]. The fact that none of your guests
hit that problem only means you're lacking coverage.
From what I can read, anything from Neoverse V1 is affected.
>
> This isn't about introducing *new* behaviour; it's about allowing the
> existing established behaviour to be maintained so that we can have a
> *managed* transition to the new model (for new launches) rather than an
> unconditional uncontrolled change as the kernel gets upgraded.
Then fully implement "show me the cache hierarchy", read it out, and
write it back with whatever level of brokenness you intend to inflict
on your guests.
But I'm not reintroducing this particular bug.
M.
--
Without deviation from the norm, progress is not possible.
On Thu, 2026-04-09 at 19:12 +0100, Marc Zyngier wrote:
>
> That only means you are doing a pretty bad job at supporting
> guests. And yes, this is an issue for anything that expects to see
> something meaningful in CCSIDR[]. The fact that none of your guests
> hit that problem only means you're lacking coverage.
I think we just have a different idea of what it means to do a good job
of supporting guests. For me, the most important thing is never to
randomly change stuff underneath a running guest, even if it's a "bug
fix".
Even fixing an egregious and obvious bug, like fixing the reported
serial port type from the legacy 16550 to ACPI_DBG2_16550_WITH_GAS
turned out to break FreeBSD, to pick just one recent example.
Guest operating systems will often work with the environment they
actually *had* already, regardless of the actual specifications or
common sense. We don't have the luxury of just changing things
underneath them and then saying "haha, well that should never have
worked so screw you".
Moving to the fixed model is obviously the plan, but it has to be done
carefully and starting with new launches and new instance types and a
rollback plan, not just randomly inflicted as an unexpected side-effect
of a kernel upgrade.
> From what I can read, anything from Neoverse V1 is affected.
>
> >
> > This isn't about introducing *new* behaviour; it's about allowing the
> > existing established behaviour to be maintained so that we can have a
> > *managed* transition to the new model (for new launches) rather than an
> > unconditional uncontrolled change as the kernel gets upgraded.
>
> Then fully implement "show me the cache hierarchy", read it out, and
> write it back with whatever level of brokenness you intend to inflict
> on your guests.
Again, *compatibility* is the point here, not brokenness.
> But I'm not reintroducing this particular bug.
Fair enough.
It's not like the x86 zoo; there are only a certain number of existing CPUs that I care about, that I need to enumerate. I can just hard-code the values for all of those in the VMM — the kernel does let me override the errantly-changed defaults, as the commit message said.
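As a rough sketch of what hard-coding such values in a VMM might involve, the following composes a CLIDR_EL1 value from its fields, including the LoUIS, LoC and LoUU fields that the commit message notes are missing from sysfs. It assumes the architectural bit positions (Ctype<n> in bits [20:0], LoUIS [23:21], LoC [26:24], LoUU [29:27]) and leaves ICB and the Ttype<n> fields at zero. The struct and function names are hypothetical, and the example values are illustrative rather than taken from any real CPU.

```c
#include <stdint.h>

/* Hypothetical per-CPU-model description of the CLIDR_EL1 fields. */
struct clidr_fields {
	uint8_t ctype[7]; /* Ctype1..Ctype7, 3 bits each */
	uint8_t louis;    /* Level of Unification Inner Shareable */
	uint8_t loc;      /* Level of Coherence */
	uint8_t louu;     /* Level of Unification Uniprocessor */
};

/* Pack the fields into a register value (ICB and Ttype<n> left as zero). */
static uint64_t clidr_pack(const struct clidr_fields *f)
{
	uint64_t v = 0;

	for (int i = 0; i < 7; i++)
		v |= (uint64_t)(f->ctype[i] & 7) << (3 * i);
	v |= (uint64_t)(f->louis & 7) << 21;
	v |= (uint64_t)(f->loc & 7) << 24;
	v |= (uint64_t)(f->louu & 7) << 27;
	return v;
}

/* Inverse of clidr_pack(): split a register value back into its fields. */
static struct clidr_fields clidr_unpack(uint64_t v)
{
	struct clidr_fields f;

	for (int i = 0; i < 7; i++)
		f.ctype[i] = (v >> (3 * i)) & 7;
	f.louis = (v >> 21) & 7;
	f.loc = (v >> 24) & 7;
	f.louu = (v >> 27) & 7;
	return f;
}
```

A VMM taking this route would keep one such table entry per supported CPU model and write the packed value to the vcpu through ``KVM_SET_ONE_REG``, with the per-cache CCSIDR_EL1 values supplied separately through the ``KVM_REG_ARM_DEMUX`` interface.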