There is one (multiplexed) CAP which can be used to create, populate and
then activate the realm.
Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
---
Changes since v9:
* Improvements to documentation.
* Bump the magic number for KVM_CAP_ARM_RME to avoid conflicts.
Changes since v8:
* Minor improvements to documentation following review.
* Bump the magic numbers to avoid conflicts.
Changes since v7:
* Add documentation of new ioctls
* Bump the magic numbers to avoid conflicts
Changes since v6:
* Rename some of the symbols to make their usage clearer and avoid
repetition.
Changes from v5:
* Actually expose the new VCPU capability (KVM_ARM_VCPU_REC) by bumping
KVM_VCPU_MAX_FEATURES - note this also exposes KVM_ARM_VCPU_HAS_EL2!
---
Documentation/virt/kvm/api.rst | 71 +++++++++++++++++++++++++++++++
arch/arm64/include/uapi/asm/kvm.h | 49 +++++++++++++++++++++
include/uapi/linux/kvm.h | 10 +++++
3 files changed, 130 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6aa40ee05a4a..69c0a9eba6c5 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -3549,6 +3549,11 @@ Possible features:
Depends on KVM_CAP_ARM_EL2_E2H0.
KVM_ARM_VCPU_HAS_EL2 must also be set.
+ - KVM_ARM_VCPU_REC: Allocate a REC (Realm Execution Context) for this
+ VCPU. This must be specified on all VCPUs created in a Realm VM.
+ Depends on KVM_CAP_ARM_RME.
+ Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_REC).
+
4.83 KVM_ARM_PREFERRED_TARGET
-----------------------------
@@ -5122,6 +5127,7 @@ Recognised values for feature:
===== ===========================================
arm64 KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE)
+ arm64 KVM_ARM_VCPU_REC (requires KVM_CAP_ARM_RME)
===== ===========================================
Finalizes the configuration of the specified vcpu feature.
@@ -6476,6 +6482,30 @@ the capability to be present.
`flags` must currently be zero.
+4.144 KVM_ARM_VCPU_RMM_PSCI_COMPLETE
+------------------------------------
+
+:Capability: KVM_CAP_ARM_RME
+:Architectures: arm64
+:Type: vcpu ioctl
+:Parameters: struct kvm_arm_rmm_psci_complete (in)
+:Returns: 0 if successful, < 0 on error
+
+::
+
+ struct kvm_arm_rmm_psci_complete {
+ __u64 target_mpidr;
+ __u32 psci_status;
+ __u32 padding[3];
+ };
+
+Where PSCI functions are handled by user space, the RMM needs to be informed of
+the target of the operation using `target_mpidr`, along with the status
+(`psci_status`). The RMM v1.0 specification defines two functions that require
+this call: PSCI_CPU_ON and PSCI_AFFINITY_INFO.
+
+If the kernel is handling PSCI then this is done automatically and the VMM
+doesn't need to call this ioctl.
.. _kvm_run:
@@ -8662,6 +8692,47 @@ This capability indicate to the userspace whether a PFNMAP memory region
can be safely mapped as cacheable. This relies on the presence of
force write back (FWB) feature support on the hardware.
+7.44 KVM_CAP_ARM_RME
+--------------------
+
+:Architectures: arm64
+:Target: VM
+:Parameters: args[0] provides an action, args[1] points to a structure in
+ memory for the action.
+:Returns: 0 on success, negative value on error
+
+Used to configure and set up the memory for a Realm. The available actions are:
+
+================================= =============================================
+ KVM_CAP_ARM_RME_CONFIG_REALM Takes struct arm_rme_config as args[1] and
+ configures realm parameters prior to it being
+ created.
+
+ Options are ARM_RME_CONFIG_RPV to set the
+ "Realm Personalization Value" and
+ ARM_RME_CONFIG_HASH_ALGO to set the hash
+ algorithm.
+
+ KVM_CAP_ARM_RME_CREATE_REALM Request the RMM to create the realm. The
+ realm's configuration parameters must be set
+ first.
+
+ KVM_CAP_ARM_RME_INIT_RIPAS_REALM Takes struct arm_rme_init_ripas as args[1]
+ and sets the RIPAS (Realm IPA State) to
+ RIPAS_RAM of a specified area of the realm's
+ IPA.
+
+ KVM_CAP_ARM_RME_POPULATE_REALM Takes struct arm_rme_populate_realm as
+ args[1] and populates a region of protected
+ address space by copying the data from the
+ shared alias.
+
+ KVM_CAP_ARM_RME_ACTIVATE_REALM Request the RMM to activate the realm. No
+ changes can be made to the Realm's populated
+ memory, IPA state, configuration parameters
+ or vCPU additions after this step.
+================================= =============================================
+
8. Other capabilities.
======================
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index ed5f3892674c..9b5d67ecbc5e 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
#define KVM_ARM_VCPU_PTRAUTH_GENERIC 6 /* VCPU uses generic authentication */
#define KVM_ARM_VCPU_HAS_EL2 7 /* Support nested virtualization */
#define KVM_ARM_VCPU_HAS_EL2_E2H0 8 /* Limit NV support to E2H RES0 */
+#define KVM_ARM_VCPU_REC 9 /* VCPU REC state as part of Realm */
struct kvm_vcpu_init {
__u32 target;
@@ -429,6 +430,54 @@ enum {
#define KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES 3
#define KVM_DEV_ARM_ITS_CTRL_RESET 4
+/* KVM_CAP_ARM_RME on VM fd */
+#define KVM_CAP_ARM_RME_CONFIG_REALM 0
+#define KVM_CAP_ARM_RME_CREATE_REALM 1
+#define KVM_CAP_ARM_RME_INIT_RIPAS_REALM 2
+#define KVM_CAP_ARM_RME_POPULATE_REALM 3
+#define KVM_CAP_ARM_RME_ACTIVATE_REALM 4
+
+/* List of configuration items accepted for KVM_CAP_ARM_RME_CONFIG_REALM */
+#define ARM_RME_CONFIG_RPV 0
+#define ARM_RME_CONFIG_HASH_ALGO 1
+
+#define ARM_RME_CONFIG_HASH_ALGO_SHA256 0
+#define ARM_RME_CONFIG_HASH_ALGO_SHA512 1
+
+#define ARM_RME_CONFIG_RPV_SIZE 64
+
+struct arm_rme_config {
+ __u32 cfg;
+ union {
+ /* cfg == ARM_RME_CONFIG_RPV */
+ struct {
+ __u8 rpv[ARM_RME_CONFIG_RPV_SIZE];
+ };
+
+ /* cfg == ARM_RME_CONFIG_HASH_ALGO */
+ struct {
+ __u32 hash_algo;
+ };
+
+ /* Fix the size of the union */
+ __u8 reserved[256];
+ };
+};
+
+#define KVM_ARM_RME_POPULATE_FLAGS_MEASURE (1 << 0)
+struct arm_rme_populate_realm {
+ __u64 base;
+ __u64 size;
+ __u32 flags;
+ __u32 reserved[3];
+};
+
+struct arm_rme_init_ripas {
+ __u64 base;
+ __u64 size;
+ __u64 reserved[2];
+};
+
/* Device Control API on vcpu fd */
#define KVM_ARM_VCPU_PMU_V3_CTRL 0
#define KVM_ARM_VCPU_PMU_V3_IRQ 0
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f0f0d49d2544..7dafb443368a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -962,6 +962,7 @@ struct kvm_enable_cap {
#define KVM_CAP_ARM_EL2_E2H0 241
#define KVM_CAP_RISCV_MP_STATE_RESET 242
#define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
+#define KVM_CAP_ARM_RME 244
struct kvm_irq_routing_irqchip {
__u32 irqchip;
@@ -1614,4 +1615,13 @@ struct kvm_pre_fault_memory {
__u64 padding[5];
};
+/* Available with KVM_CAP_ARM_RME, only for VMs with KVM_VM_TYPE_ARM_REALM */
+struct kvm_arm_rmm_psci_complete {
+ __u64 target_mpidr;
+ __u32 psci_status;
+ __u32 padding[3];
+};
+
+#define KVM_ARM_VCPU_RMM_PSCI_COMPLETE _IOW(KVMIO, 0xd6, struct kvm_arm_rmm_psci_complete)
+
#endif /* __LINUX_KVM_H */
--
2.43.0
On Wed, 20 Aug 2025 15:55:26 +0100,
Steven Price <steven.price@arm.com> wrote:
>
> There is one (multiplexed) CAP which can be used to create, populate and
> then activate the realm.
>
> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> ---
> Changes since v9:
> * Improvements to documentation.
> * Bump the magic number for KVM_CAP_ARM_RME to avoid conflicts.
> Changes since v8:
> * Minor improvements to documentation following review.
> * Bump the magic numbers to avoid conflicts.
> Changes since v7:
> * Add documentation of new ioctls
> * Bump the magic numbers to avoid conflicts
> Changes since v6:
> * Rename some of the symbols to make their usage clearer and avoid
> repetition.
> Changes from v5:
> * Actually expose the new VCPU capability (KVM_ARM_VCPU_REC) by bumping
> KVM_VCPU_MAX_FEATURES - note this also exposes KVM_ARM_VCPU_HAS_EL2!
> ---
> Documentation/virt/kvm/api.rst | 71 +++++++++++++++++++++++++++++++
> arch/arm64/include/uapi/asm/kvm.h | 49 +++++++++++++++++++++
> include/uapi/linux/kvm.h | 10 +++++
> 3 files changed, 130 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 6aa40ee05a4a..69c0a9eba6c5 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -3549,6 +3549,11 @@ Possible features:
> Depends on KVM_CAP_ARM_EL2_E2H0.
> KVM_ARM_VCPU_HAS_EL2 must also be set.
>
> + - KVM_ARM_VCPU_REC: Allocate a REC (Realm Execution Context) for this
> + VCPU. This must be specified on all VCPUs created in a Realm VM.
> + Depends on KVM_CAP_ARM_RME.
> + Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_REC).
> +
> 4.83 KVM_ARM_PREFERRED_TARGET
> -----------------------------
>
> @@ -5122,6 +5127,7 @@ Recognised values for feature:
>
> ===== ===========================================
> arm64 KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE)
> + arm64 KVM_ARM_VCPU_REC (requires KVM_CAP_ARM_RME)
> ===== ===========================================
>
> Finalizes the configuration of the specified vcpu feature.
> @@ -6476,6 +6482,30 @@ the capability to be present.
>
> `flags` must currently be zero.
>
> +4.144 KVM_ARM_VCPU_RMM_PSCI_COMPLETE
> +------------------------------------
> +
> +:Capability: KVM_CAP_ARM_RME
> +:Architectures: arm64
> +:Type: vcpu ioctl
> +:Parameters: struct kvm_arm_rmm_psci_complete (in)
> +:Returns: 0 if successful, < 0 on error
> +
> +::
> +
> + struct kvm_arm_rmm_psci_complete {
> + __u64 target_mpidr;
> + __u32 psci_status;
> + __u32 padding[3];
> + };
> +
> +Where PSCI functions are handled by user space, the RMM needs to be informed of
> +the target of the operation using `target_mpidr`, along with the status
> +(`psci_status`). The RMM v1.0 specification defines two functions that require
> +this call: PSCI_CPU_ON and PSCI_AFFINITY_INFO.
> +
> +If the kernel is handling PSCI then this is done automatically and the VMM
> +doesn't need to call this ioctl.
Why should userspace be involved in this? Why can't this be a
notification that the host delivers to the RMM when the vcpu is about
to run?
>
> .. _kvm_run:
>
> @@ -8662,6 +8692,47 @@ This capability indicate to the userspace whether a PFNMAP memory region
> can be safely mapped as cacheable. This relies on the presence of
> force write back (FWB) feature support on the hardware.
>
> +7.44 KVM_CAP_ARM_RME
> +--------------------
> +
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: args[0] provides an action, args[1] points to a structure in
> + memory for the action.
> +:Returns: 0 on success, negative value on error
> +
> +Used to configure and set up the memory for a Realm. The available actions are:
> +
> +================================= =============================================
> + KVM_CAP_ARM_RME_CONFIG_REALM Takes struct arm_rme_config as args[1] and
> + configures realm parameters prior to it being
> + created.
> +
> + Options are ARM_RME_CONFIG_RPV to set the
> + "Realm Personalization Value" and
> + ARM_RME_CONFIG_HASH_ALGO to set the hash
> + algorithm.
> +
> + KVM_CAP_ARM_RME_CREATE_REALM Request the RMM to create the realm. The
> + realm's configuration parameters must be set
> + first.
> +
> + KVM_CAP_ARM_RME_INIT_RIPAS_REALM Takes struct arm_rme_init_ripas as args[1]
> + and sets the RIPAS (Realm IPA State) to
> + RIPAS_RAM of a specified area of the realm's
> + IPA.
> +
> + KVM_CAP_ARM_RME_POPULATE_REALM Takes struct arm_rme_populate_realm as
> + args[1] and populates a region of protected
> + address space by copying the data from the
> + shared alias.
> +
> + KVM_CAP_ARM_RME_ACTIVATE_REALM Request the RMM to activate the realm. No
> + changes can be made to the Realm's populated
> + memory, IPA state, configuration parameters
> + or vCPU additions after this step.
> +================================= =============================================
> +
These are not capabilities, they are actions that the VMM may perform
on a VM. You don't configure a VM using capabilities. You use it to
buy into some behaviours, but that's all.
And then there is the semantic of this stuff. Why do I need something
like KVM_CAP_ARM_RME_CREATE_REALM when I can just pass this as part of
the VM type? Why do I need a new way to describe memory region when we
already have memslots for that exact purpose?
Overall, you are leaking the RMM interface into userspace, and that's
an absolute show-stopper. We have an API, it is not pretty, but it
exists. We don't need another one that will be just as broken. If the
RMM needs some impedance matching, that's the kernel's job.
M.
--
Without deviation from the norm, progress is not possible.
On 01/10/2025 13:28, Marc Zyngier wrote:
> On Wed, 20 Aug 2025 15:55:26 +0100,
> Steven Price <steven.price@arm.com> wrote:
>>
>> There is one (multiplexed) CAP which can be used to create, populate and
>> then activate the realm.
>>
>> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>> ---
>> Changes since v9:
>> * Improvements to documentation.
>> * Bump the magic number for KVM_CAP_ARM_RME to avoid conflicts.
>> Changes since v8:
>> * Minor improvements to documentation following review.
>> * Bump the magic numbers to avoid conflicts.
>> Changes since v7:
>> * Add documentation of new ioctls
>> * Bump the magic numbers to avoid conflicts
>> Changes since v6:
>> * Rename some of the symbols to make their usage clearer and avoid
>> repetition.
>> Changes from v5:
>> * Actually expose the new VCPU capability (KVM_ARM_VCPU_REC) by bumping
>> KVM_VCPU_MAX_FEATURES - note this also exposes KVM_ARM_VCPU_HAS_EL2!
>> ---
>> Documentation/virt/kvm/api.rst | 71 +++++++++++++++++++++++++++++++
>> arch/arm64/include/uapi/asm/kvm.h | 49 +++++++++++++++++++++
>> include/uapi/linux/kvm.h | 10 +++++
>> 3 files changed, 130 insertions(+)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 6aa40ee05a4a..69c0a9eba6c5 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -3549,6 +3549,11 @@ Possible features:
>> Depends on KVM_CAP_ARM_EL2_E2H0.
>> KVM_ARM_VCPU_HAS_EL2 must also be set.
>>
>> + - KVM_ARM_VCPU_REC: Allocate a REC (Realm Execution Context) for this
>> + VCPU. This must be specified on all VCPUs created in a Realm VM.
>> + Depends on KVM_CAP_ARM_RME.
>> + Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_REC).
>> +
>> 4.83 KVM_ARM_PREFERRED_TARGET
>> -----------------------------
>>
>> @@ -5122,6 +5127,7 @@ Recognised values for feature:
>>
>> ===== ===========================================
>> arm64 KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE)
>> + arm64 KVM_ARM_VCPU_REC (requires KVM_CAP_ARM_RME)
>> ===== ===========================================
>>
>> Finalizes the configuration of the specified vcpu feature.
>> @@ -6476,6 +6482,30 @@ the capability to be present.
>>
>> `flags` must currently be zero.
>>
>> +4.144 KVM_ARM_VCPU_RMM_PSCI_COMPLETE
>> +------------------------------------
>> +
>> +:Capability: KVM_CAP_ARM_RME
>> +:Architectures: arm64
>> +:Type: vcpu ioctl
>> +:Parameters: struct kvm_arm_rmm_psci_complete (in)
>> +:Returns: 0 if successful, < 0 on error
>> +
>> +::
>> +
>> + struct kvm_arm_rmm_psci_complete {
>> + __u64 target_mpidr;
>> + __u32 psci_status;
>> + __u32 padding[3];
>> + };
>> +
>> +Where PSCI functions are handled by user space, the RMM needs to be informed of
>> +the target of the operation using `target_mpidr`, along with the status
>> +(`psci_status`). The RMM v1.0 specification defines two functions that require
>> +this call: PSCI_CPU_ON and PSCI_AFFINITY_INFO.
>> +
>> +If the kernel is handling PSCI then this is done automatically and the VMM
>> +doesn't need to call this ioctl.
>
> Why should userspace be involved in this? Why can't this be a
> notification that the host delivers to the RMM when the vcpu is about
> to run?
This is only when PSCI is being handled by user space. If the kernel
(i.e. KVM) is handling PSCI then indeed there's no user space involvement.
I'm not sure how we could avoid this when PSCI is being implemented in
user space. Or am I missing something?
>>
>> .. _kvm_run:
>>
>> @@ -8662,6 +8692,47 @@ This capability indicate to the userspace whether a PFNMAP memory region
>> can be safely mapped as cacheable. This relies on the presence of
>> force write back (FWB) feature support on the hardware.
>>
>> +7.44 KVM_CAP_ARM_RME
>> +--------------------
>> +
>> +:Architectures: arm64
>> +:Target: VM
>> +:Parameters: args[0] provides an action, args[1] points to a structure in
>> + memory for the action.
>> +:Returns: 0 on success, negative value on error
>> +
>> +Used to configure and set up the memory for a Realm. The available actions are:
>> +
>> +================================= =============================================
>> + KVM_CAP_ARM_RME_CONFIG_REALM Takes struct arm_rme_config as args[1] and
>> + configures realm parameters prior to it being
>> + created.
>> +
>> + Options are ARM_RME_CONFIG_RPV to set the
>> + "Realm Personalization Value" and
>> + ARM_RME_CONFIG_HASH_ALGO to set the hash
>> + algorithm.
>> +
>> + KVM_CAP_ARM_RME_CREATE_REALM Request the RMM to create the realm. The
>> + realm's configuration parameters must be set
>> + first.
>> +
>> + KVM_CAP_ARM_RME_INIT_RIPAS_REALM Takes struct arm_rme_init_ripas as args[1]
>> + and sets the RIPAS (Realm IPA State) to
>> + RIPAS_RAM of a specified area of the realm's
>> + IPA.
>> +
>> + KVM_CAP_ARM_RME_POPULATE_REALM Takes struct arm_rme_populate_realm as
>> + args[1] and populates a region of protected
>> + address space by copying the data from the
>> + shared alias.
>> +
>> + KVM_CAP_ARM_RME_ACTIVATE_REALM Request the RMM to activate the realm. No
>> + changes can be made to the Realm's populated
>> + memory, IPA state, configuration parameters
>> + or vCPU additions after this step.
>> +================================= =============================================
>> +
>
> These are not capabilities, they are actions that the VMM may perform
> on a VM. You don't configure a VM using capabilities. You use it to
> buy into some behaviours, but that's all.
>
> And then there is the semantic of this stuff. Why do I need something
> like KVM_CAP_ARM_RME_CREATE_REALM when I can just pass this as part of
> the VM type? Why do I need a new way to describe memory region when we
> already have memslots for that exact purpose?
>
> Overall, you are leaking the RMM interface into userspace, and that's
> an absolute show-stopper. We have an API, it is not pretty, but it
> exists. We don't need another one that will be just as broken. If the
> RMM needs some impedance matching, that's the kernel's job.
So I'll admit the (ab)use of capabilities to set up the realm is a bit
of a hack. I initially did it this way to maintain some compatibility
with a prototype implementation, but it's taken until v10 for anyone to
express displeasure with the approach! It would of course be possible to
implement these as separate ioctls.
However, it seems we need to pin down the semantics before I do that
refactoring.
KVM_CAP_ARM_RME_CREATE_REALM
============================
The desire is that you can configure the aspects of the realm piecemeal
and then later trigger the SMC to the RMM. This avoids the need for a
big structure (which will grow with new features) containing all the
configuration. It also allows KVM_CAP_ARM_RME_CONFIG_REALM to validate
each configuration and provide immediate failure when a configuration
option is invalid.
Would you prefer ditching KVM_CAP_ARM_RME_CONFIG_REALM and passing a
structure of all the configuration options to
KVM_CAP_ARM_RME_CREATE_REALM? It would make identifying what part of an
invalid configuration is problematic harder. And we'd probably therefore
need to add a new discovery mechanism to find e.g. which hash algorithms
are supported.
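For illustration, a minimal sketch of the piecemeal flow described above, as a VMM might drive it through KVM_ENABLE_CAP. The constants and the config layout are mirrored from this patch and are not a merged ABI; the `sketch_` struct name and the `rme_*` helpers are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Values mirrored from the patch under review; not a merged ABI. */
#ifndef KVM_CAP_ARM_RME
#define KVM_CAP_ARM_RME 244
#endif
#ifndef KVM_CAP_ARM_RME_CONFIG_REALM
#define KVM_CAP_ARM_RME_CONFIG_REALM 0
#define KVM_CAP_ARM_RME_CREATE_REALM 1
#endif
#ifndef ARM_RME_CONFIG_RPV
#define ARM_RME_CONFIG_RPV 0
#define ARM_RME_CONFIG_HASH_ALGO 1
#define ARM_RME_CONFIG_HASH_ALGO_SHA256 0
#define ARM_RME_CONFIG_RPV_SIZE 64
#endif

/* Local mirror of struct arm_rme_config, renamed to avoid header clashes. */
struct sketch_rme_config {
	__u32 cfg;
	union {
		__u8 rpv[ARM_RME_CONFIG_RPV_SIZE];	/* cfg == ARM_RME_CONFIG_RPV */
		__u32 hash_algo;			/* cfg == ARM_RME_CONFIG_HASH_ALGO */
		__u8 reserved[256];			/* fixes the size of the union */
	};
};

/* Fill a KVM_ENABLE_CAP request: args[0] is the action, args[1] the pointer. */
static void rme_fill_cap(struct kvm_enable_cap *cap, __u64 action, const void *arg)
{
	assert(cap != NULL);
	memset(cap, 0, sizeof(*cap));
	cap->cap = KVM_CAP_ARM_RME;
	cap->args[0] = action;
	cap->args[1] = (__u64)(uintptr_t)arg;
}

/* Issue one multiplexed RME action on the VM fd. */
static int rme_action(int vm_fd, __u64 action, const void *arg)
{
	struct kvm_enable_cap cap;

	rme_fill_cap(&cap, action, arg);
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

/* Set one option at a time, then ask for the realm to be created. */
static int rme_configure_and_create(int vm_fd, const __u8 *rpv, size_t rpv_len)
{
	struct sketch_rme_config cfg;

	if (rpv_len > ARM_RME_CONFIG_RPV_SIZE)
		return -1;

	memset(&cfg, 0, sizeof(cfg));
	cfg.cfg = ARM_RME_CONFIG_HASH_ALGO;
	cfg.hash_algo = ARM_RME_CONFIG_HASH_ALGO_SHA256;
	if (rme_action(vm_fd, KVM_CAP_ARM_RME_CONFIG_REALM, &cfg) < 0)
		return -1;	/* an unsupported hash algorithm fails here */

	memset(&cfg, 0, sizeof(cfg));
	cfg.cfg = ARM_RME_CONFIG_RPV;
	memcpy(cfg.rpv, rpv, rpv_len);
	if (rme_action(vm_fd, KVM_CAP_ARM_RME_CONFIG_REALM, &cfg) < 0)
		return -1;	/* an invalid RPV fails here */

	return rme_action(vm_fd, KVM_CAP_ARM_RME_CREATE_REALM, NULL);
}
```

Each CONFIG_REALM call fails on its own, so the VMM knows which option was rejected before CREATE_REALM is ever attempted.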
KVM_CAP_ARM_RME_INIT_RIPAS_REALM
================================
This is a property that doesn't exist for a normal guest and as such the
VMM is going to have to do something extra to describe which areas of
RAM are protected and which are shared. I guess with the recent
guest_memfd changes it might be possible to pull that from the
guest_memfd instance(s). But note that it's entirely valid (from the CCA
perspective) to have both protected and shared regions that are not
backed at guest start. So setting the RIPAS to RAM (i.e. making an area
of memory protected) is independent of the guest_memfd backing status.
KVM_CAP_ARM_RME_POPULATE_REALM
==============================
This ties in with the above. This is effectively doing a data-preserving
conversion of shared to protected. We need some form of this to set the
initial data for the guest (boot loader/kernel image etc).
In this patch set this API is a little broken as it requires double
backing (both a valid VMA in the VMM and allocation in the guest_memfd).
My expectation was to replace this with a version that simply does an
in-place data-preserving conversion in guest_memfd. The only drawback is
that this requires a double memcpy() - KVM would need to copy the data
to a temporary buffer, and then the RMM would copy it back (after
dealing with memory encryption setup).
The alternative would be to provide a separate input buffer for
POPULATE_REALM. That could e.g. be a mmap()ed file, but that moves CCA
further away from a standard interface.
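As a rough sketch of the two memory actions above as this patch defines them: mark a range of IPA as RIPAS_RAM, then populate the same range. The struct layouts and constants are mirrored from the patch; the `sketch_` names and `rme_*` helpers are hypothetical, and doing INIT_RIPAS before POPULATE on the same range is an assumption of this example rather than something the documentation states.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Values mirrored from the patch under review; not a merged ABI. */
#ifndef KVM_CAP_ARM_RME
#define KVM_CAP_ARM_RME 244
#endif
#ifndef KVM_CAP_ARM_RME_INIT_RIPAS_REALM
#define KVM_CAP_ARM_RME_INIT_RIPAS_REALM 2
#define KVM_CAP_ARM_RME_POPULATE_REALM 3
#endif
#ifndef KVM_ARM_RME_POPULATE_FLAGS_MEASURE
#define KVM_ARM_RME_POPULATE_FLAGS_MEASURE (1 << 0)
#endif

/* Local mirrors of the patch's structs, renamed to avoid header clashes. */
struct sketch_rme_init_ripas {
	__u64 base;
	__u64 size;
	__u64 reserved[2];
};

struct sketch_rme_populate {
	__u64 base;
	__u64 size;
	__u32 flags;
	__u32 reserved[3];
};

/* Issue one multiplexed RME action on the VM fd. */
static int rme_action(int vm_fd, __u64 action, const void *arg)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_ARM_RME;
	cap.args[0] = action;
	cap.args[1] = (__u64)(uintptr_t)arg;
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

/* Build the populate argument; reserved fields are left zeroed. */
static struct sketch_rme_populate rme_make_populate(__u64 ipa, __u64 size, int measure)
{
	struct sketch_rme_populate pop = {
		.base = ipa,
		.size = size,
		.flags = measure ? KVM_ARM_RME_POPULATE_FLAGS_MEASURE : 0,
	};

	return pop;
}

/* Make [ipa, ipa + size) protected RAM, then copy the initial data in. */
static int rme_load_region(int vm_fd, __u64 ipa, __u64 size, int measure)
{
	struct sketch_rme_init_ripas ripas = { .base = ipa, .size = size };
	struct sketch_rme_populate pop = rme_make_populate(ipa, size, measure);

	if (rme_action(vm_fd, KVM_CAP_ARM_RME_INIT_RIPAS_REALM, &ripas) < 0)
		return -1;
	return rme_action(vm_fd, KVM_CAP_ARM_RME_POPULATE_REALM, &pop);
}
```

A range that should be protected but start out empty would only need the INIT_RIPAS call, which matches the point that RIPAS is independent of the backing status.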
KVM_CAP_ARM_RME_ACTIVATE_REALM
==============================
This is easiest to drop - we could just do this step when the VMM first
tries to start a VCPU. The main benefits are:
* It makes error reporting more obvious (you can tell the difference
between the RMM failing the activate and something else preventing a
VCPU entry).
* It makes the KVM code slightly clearer because the state transition
is triggered by an obvious action from the VMM, and there's less
potential for races.
I agree it's a shame that the VMM can't just use the standard KVM
interface for realm guest setup. But there are some fundamental
differences between a normal guest and a realm guest. So the VMM will
need to be enlightened. Feel free to comment on the above if you have
suggestions on how the above API can be improved.
Thanks,
Steve
On 01/10/2025 15:44, Steven Price wrote:
> On 01/10/2025 13:28, Marc Zyngier wrote:
>> On Wed, 20 Aug 2025 15:55:26 +0100,
>> Steven Price <steven.price@arm.com> wrote:
>>>
>>> There is one (multiplexed) CAP which can be used to create, populate and
>>> then activate the realm.
>>>
>>> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>> ---
>>> Changes since v9:
>>> * Improvements to documentation.
>>> * Bump the magic number for KVM_CAP_ARM_RME to avoid conflicts.
>>> Changes since v8:
>>> * Minor improvements to documentation following review.
>>> * Bump the magic numbers to avoid conflicts.
>>> Changes since v7:
>>> * Add documentation of new ioctls
>>> * Bump the magic numbers to avoid conflicts
>>> Changes since v6:
>>> * Rename some of the symbols to make their usage clearer and avoid
>>> repetition.
>>> Changes from v5:
>>> * Actually expose the new VCPU capability (KVM_ARM_VCPU_REC) by bumping
>>> KVM_VCPU_MAX_FEATURES - note this also exposes KVM_ARM_VCPU_HAS_EL2!
>>> ---
>>> Documentation/virt/kvm/api.rst | 71 +++++++++++++++++++++++++++++++
>>> arch/arm64/include/uapi/asm/kvm.h | 49 +++++++++++++++++++++
>>> include/uapi/linux/kvm.h | 10 +++++
>>> 3 files changed, 130 insertions(+)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>> index 6aa40ee05a4a..69c0a9eba6c5 100644
>>> --- a/Documentation/virt/kvm/api.rst
>>> +++ b/Documentation/virt/kvm/api.rst
>>> @@ -3549,6 +3549,11 @@ Possible features:
>>> Depends on KVM_CAP_ARM_EL2_E2H0.
>>> KVM_ARM_VCPU_HAS_EL2 must also be set.
>>>
>>> + - KVM_ARM_VCPU_REC: Allocate a REC (Realm Execution Context) for this
>>> + VCPU. This must be specified on all VCPUs created in a Realm VM.
>>> + Depends on KVM_CAP_ARM_RME.
>>> + Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_REC).
>>> +
>>> 4.83 KVM_ARM_PREFERRED_TARGET
>>> -----------------------------
>>>
>>> @@ -5122,6 +5127,7 @@ Recognised values for feature:
>>>
>>> ===== ===========================================
>>> arm64 KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE)
>>> + arm64 KVM_ARM_VCPU_REC (requires KVM_CAP_ARM_RME)
>>> ===== ===========================================
>>>
>>> Finalizes the configuration of the specified vcpu feature.
>>> @@ -6476,6 +6482,30 @@ the capability to be present.
>>>
>>> `flags` must currently be zero.
>>>
>>> +4.144 KVM_ARM_VCPU_RMM_PSCI_COMPLETE
>>> +------------------------------------
>>> +
>>> +:Capability: KVM_CAP_ARM_RME
>>> +:Architectures: arm64
>>> +:Type: vcpu ioctl
>>> +:Parameters: struct kvm_arm_rmm_psci_complete (in)
>>> +:Returns: 0 if successful, < 0 on error
>>> +
>>> +::
>>> +
>>> + struct kvm_arm_rmm_psci_complete {
>>> + __u64 target_mpidr;
>>> + __u32 psci_status;
>>> + __u32 padding[3];
>>> + };
>>> +
>>> +Where PSCI functions are handled by user space, the RMM needs to be informed of
>>> +the target of the operation using `target_mpidr`, along with the status
>>> +(`psci_status`). The RMM v1.0 specification defines two functions that require
>>> +this call: PSCI_CPU_ON and PSCI_AFFINITY_INFO.
>>> +
>>> +If the kernel is handling PSCI then this is done automatically and the VMM
>>> +doesn't need to call this ioctl.
>>
>> Why should userspace be involved in this? Why can't this be a
>> notification that the host delivers to the RMM when the vcpu is about
>> to run?
>
> This is only when PSCI is being handled by user space. If the kernel
> (i.e. KVM) is handling PSCI then indeed there's no user space involvement.
>
> I'm not sure how we could avoid this when PSCI is being implemented in
> user space. Or am I missing something?
I think there is a bit of a disconnect here.
The RMM doesn't track the RECs for a given vCPU. So, when it requires
the REC object for a given vCPU, the Host provides this via an
RMI_PSCI_COMPLETE call. This is used for PSCI_CPU_ON and
PSCI_AFFINITY_INFO today, where the RMM can do the bookkeeping
for the REC and emulate the PSCI. Now, the host does have control
over whether to ACCEPT or REJECT a request (for CPU_ON).
The RMM requires the PSCI_COMPLETE call before it can return the
PSCI_CPU_ON result to the caller and also before the target vCPU can
run. Thus, this cannot be delayed until the "new VCPU" is run.
Like Steven mentioned, this is only useful in the UABI if the VMM is
handling the PSCI. And this must be issued before the target vCPU
can be scheduled.
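To make the ordering concrete, a hedged sketch of what a VMM that handles PSCI_CPU_ON itself might do. The ioctl number and struct layout are mirrored from this patch; the `sketch_`/`SKETCH_` names and the `vmm_handle_cpu_on()` helper are hypothetical. Issuing the ioctl on the calling vCPU's fd and encoding the PSCI status as an unsigned 32-bit value are assumptions of this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Local mirror of struct kvm_arm_rmm_psci_complete from the patch. */
struct sketch_rmm_psci_complete {
	__u64 target_mpidr;
	__u32 psci_status;
	__u32 padding[3];
};

/* ioctl number as proposed in the patch; not a merged ABI. */
#define SKETCH_KVM_ARM_VCPU_RMM_PSCI_COMPLETE \
	_IOW(KVMIO, 0xd6, struct sketch_rmm_psci_complete)

/* PSCI return codes from the PSCI specification. */
#define SKETCH_PSCI_SUCCESS	((__u32)0)
#define SKETCH_PSCI_DENIED	((__u32)-3)

/*
 * Handle a PSCI_CPU_ON exit in the VMM.
 *
 * Returns  1 if the RMM was told SUCCESS and the target may now be started,
 *          0 if the VMM rejected the request and the RMM was told DENIED,
 *         -1 if the ioctl failed, in which case the target must not run.
 */
static int vmm_handle_cpu_on(int caller_vcpu_fd, __u64 target_mpidr, int accept)
{
	struct sketch_rmm_psci_complete arg;

	memset(&arg, 0, sizeof(arg));
	arg.target_mpidr = target_mpidr;
	arg.psci_status = accept ? SKETCH_PSCI_SUCCESS : SKETCH_PSCI_DENIED;

	/* Must complete before the caller resumes or the target is scheduled. */
	if (ioctl(caller_vcpu_fd, SKETCH_KVM_ARM_VCPU_RMM_PSCI_COMPLETE, &arg) < 0)
		return -1;

	return accept ? 1 : 0;
}
```

Only a return value of 1 would let the VMM go on to start the target vCPU's thread.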
Suzuki