From: Ashish Kalra <ashish.kalra@amd.com>
The new RMPOPT instruction sets bits in a per-CPU RMPOPT table, which
indicates whether specific 1GB physical memory regions contain SEV-SNP
guest memory.
Per-CPU RMPOPT tables support at most 2 TB of addressable memory for
RMP optimizations.
Initialize the per-CPU RMPOPT table base to the starting physical
address. This enables RMP optimization for up to 2 TB of system RAM on
all CPUs.
Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/include/asm/msr-index.h | 3 +++
arch/x86/virt/svm/sev.c | 37 ++++++++++++++++++++++++++++++++
2 files changed, 40 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index da5275d8eda6..8e7da03abd5b 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -753,6 +753,9 @@
#define MSR_AMD64_SEG_RMP_ENABLED_BIT 0
#define MSR_AMD64_SEG_RMP_ENABLED BIT_ULL(MSR_AMD64_SEG_RMP_ENABLED_BIT)
#define MSR_AMD64_RMP_SEGMENT_SHIFT(x) (((x) & GENMASK_ULL(13, 8)) >> 8)
+#define MSR_AMD64_RMPOPT_BASE 0xc0010139
+#define MSR_AMD64_RMPOPT_ENABLE_BIT 0
+#define MSR_AMD64_RMPOPT_ENABLE BIT_ULL(MSR_AMD64_RMPOPT_ENABLE_BIT)
#define MSR_SVSM_CAA 0xc001f000
diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
index a4f3a364fb65..405199c2f563 100644
--- a/arch/x86/virt/svm/sev.c
+++ b/arch/x86/virt/svm/sev.c
@@ -500,6 +500,41 @@ static bool __init setup_rmptable(void)
}
}
+static void __configure_rmpopt(void *val)
+{
+ u64 rmpopt_base = ((u64)val & PUD_MASK) | MSR_AMD64_RMPOPT_ENABLE;
+
+ wrmsrq(MSR_AMD64_RMPOPT_BASE, rmpopt_base);
+}
+
+static __init void configure_and_enable_rmpopt(void)
+{
+ phys_addr_t pa_start = ALIGN_DOWN(PFN_PHYS(min_low_pfn), PUD_SIZE);
+
+ if (!cpu_feature_enabled(X86_FEATURE_RMPOPT)) {
+ pr_debug("RMPOPT not supported on this platform\n");
+ return;
+ }
+
+ if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) {
+ pr_debug("RMPOPT optimizations not enabled as SNP support is not enabled\n");
+ return;
+ }
+
+ if (!(rmp_cfg & MSR_AMD64_SEG_RMP_ENABLED)) {
+ pr_info("RMPOPT optimizations not enabled, segmented RMP required\n");
+ return;
+ }
+
+ /*
+ * Per-CPU RMPOPT tables support at most 2 TB of addressable memory for RMP optimizations.
+ *
+ * Set per-core RMPOPT base to min_low_pfn to enable RMP optimization for
+ * up to 2TB of system RAM on all CPUs.
+ */
+ on_each_cpu_mask(cpu_online_mask, __configure_rmpopt, (void *)pa_start, true);
+}
+
/*
* Do the necessary preparations which are verified by the firmware as
* described in the SNP_INIT_EX firmware command description in the SNP
@@ -555,6 +590,8 @@ int __init snp_rmptable_init(void)
skip_enable:
cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/rmptable_init:online", __snp_enable, NULL);
+ configure_and_enable_rmpopt();
+
/*
* Setting crash_kexec_post_notifiers to 'true' to ensure that SNP panic
* notifier is invoked to do SNP IOMMU shutdown before kdump.
--
2.43.0
On 3/2/26 13:35, Ashish Kalra wrote:
> The new RMPOPT instruction sets bits in a per-CPU RMPOPT table, which
> indicates whether specific 1GB physical memory regions contain SEV-SNP
> guest memory.
Honestly, this is an implementation detail that we don't need to know
about in the kernel. It's also not even factually correct. The
instruction _might_ not set any bits, either because there is no SEV-SNP
memory or because it's being run in query mode.
The new RMPOPT instruction helps manage per-CPU RMP optimization
structures inside the CPU. It takes a 1GB-aligned physical
address and either returns the status of the optimizations or
tries to enable the optimizations.
> Per-CPU RMPOPT tables support at most 2 TB of addressable memory for
> RMP optimizations.
>
> Initialize the per-CPU RMPOPT table base to the starting physical
> address. This enables RMP optimization for up to 2 TB of system RAM on
> all CPUs.
The rest looks good.
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index da5275d8eda6..8e7da03abd5b 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -753,6 +753,9 @@
> #define MSR_AMD64_SEG_RMP_ENABLED_BIT 0
> #define MSR_AMD64_SEG_RMP_ENABLED BIT_ULL(MSR_AMD64_SEG_RMP_ENABLED_BIT)
> #define MSR_AMD64_RMP_SEGMENT_SHIFT(x) (((x) & GENMASK_ULL(13, 8)) >> 8)
> +#define MSR_AMD64_RMPOPT_BASE 0xc0010139
> +#define MSR_AMD64_RMPOPT_ENABLE_BIT 0
> +#define MSR_AMD64_RMPOPT_ENABLE BIT_ULL(MSR_AMD64_RMPOPT_ENABLE_BIT)
>
> #define MSR_SVSM_CAA 0xc001f000
>
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index a4f3a364fb65..405199c2f563 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c
> @@ -500,6 +500,41 @@ static bool __init setup_rmptable(void)
> }
> }
>
> +static void __configure_rmpopt(void *val)
> +{
> + u64 rmpopt_base = ((u64)val & PUD_MASK) | MSR_AMD64_RMPOPT_ENABLE;
> +
> + wrmsrq(MSR_AMD64_RMPOPT_BASE, rmpopt_base);
> +}
> +
> +static __init void configure_and_enable_rmpopt(void)
> +{
> + phys_addr_t pa_start = ALIGN_DOWN(PFN_PHYS(min_low_pfn), PUD_SIZE);
> +
> + if (!cpu_feature_enabled(X86_FEATURE_RMPOPT)) {
> + pr_debug("RMPOPT not supported on this platform\n");
> + return;
> + }
> +
> + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) {
> + pr_debug("RMPOPT optimizations not enabled as SNP support is not enabled\n");
> + return;
> + }
To be honest, I think those two are just plain noise ^^.
> + if (!(rmp_cfg & MSR_AMD64_SEG_RMP_ENABLED)) {
> + pr_info("RMPOPT optimizations not enabled, segmented RMP required\n");
> + return;
> + }
> +
> + /*
> + * Per-CPU RMPOPT tables support at most 2 TB of addressable memory for RMP optimizations.
> + *
> + * Set per-core RMPOPT base to min_low_pfn to enable RMP optimization for
> + * up to 2TB of system RAM on all CPUs.
> + */
Please at least be consistent with your comments. This is both over 80
columns *and* not even consistent in the two sentences.
> + on_each_cpu_mask(cpu_online_mask, __configure_rmpopt, (void *)pa_start, true);
> +}
What's wrong with:
	u64 rmpopt_base = pa_start | MSR_AMD64_RMPOPT_ENABLE;
	...
	for_each_online_cpu(cpu)
		wrmsrq_on_cpu(cpu, MSR_AMD64_RMPOPT_BASE, rmpopt_base);
Then there's at least no ugly casting.
Hello Dave,
On 3/2/2026 4:32 PM, Dave Hansen wrote:
>> +static __init void configure_and_enable_rmpopt(void)
>> +{
>> + phys_addr_t pa_start = ALIGN_DOWN(PFN_PHYS(min_low_pfn), PUD_SIZE);
>> +
>> + if (!cpu_feature_enabled(X86_FEATURE_RMPOPT)) {
>> + pr_debug("RMPOPT not supported on this platform\n");
>> + return;
>> + }
>> +
>> + if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP)) {
>> + pr_debug("RMPOPT optimizations not enabled as SNP support is not enabled\n");
>> + return;
>> + }
>
> To be honest, I think those two are just plain noise ^^.
They are basically pr_debug's, so they won't really cause noise in general.
>
>> + if (!(rmp_cfg & MSR_AMD64_SEG_RMP_ENABLED)) {
>> + pr_info("RMPOPT optimizations not enabled, segmented RMP required\n");
>> + return;
>> + }
>> +
>> + /*
>> + * Per-CPU RMPOPT tables support at most 2 TB of addressable memory for RMP optimizations.
>> + *
>> + * Set per-core RMPOPT base to min_low_pfn to enable RMP optimization for
>> + * up to 2TB of system RAM on all CPUs.
>> + */
>
> Please at least be consistent with your comments. This is both over 80
> columns *and* not even consistent in the two sentences.
Sure.
>
>> + on_each_cpu_mask(cpu_online_mask, __configure_rmpopt, (void *)pa_start, true);
>> +}
>
> What's wrong with:
>
> u64 rmpopt_base = pa_start | MSR_AMD64_RMPOPT_ENABLE;
> ...
> for_each_online_cpu(cpu)
> wrmsrq_on_cpu(cpu, MSR_AMD64_RMPOPT_BASE, rmpopt_base);
>
> Then there's at least no ugly casting.
>
RMPOPT_BASE MSRs don't need to be set serially; by using
cpu_online_mask and on_each_cpu_mask(), we can set up the MSRs on all
CPUs concurrently. Using for_each_online_cpu() would be slower than
on_each_cpu_mask() as it doesn't send the IPIs in parallel, right?
Thanks,
Ashish
On 3/2/26 14:55, Kalra, Ashish wrote:
>> What's wrong with:
>>
>> 	u64 rmpopt_base = pa_start | MSR_AMD64_RMPOPT_ENABLE;
>> 	...
>> 	for_each_online_cpu(cpu)
>> 		wrmsrq_on_cpu(cpu, MSR_AMD64_RMPOPT_BASE, rmpopt_base);
>>
>> Then there's at least no ugly casting.
>>
> RMOPT_BASE MSRs don't need to be set serially, therefore, by
> using the cpu_online_mask and on_each_cpu_mask(), we can setup the MSRs
> concurrently and in parallel. Using for_each_online_cpu() will be slower than
> doing on_each_cpu_mask() as it doesn't send IPIs in parallel, right.
If that's the case and you *need* performance, then please go add a
wrmsrq_on_cpumask() function to do things in parallel.
On 3/2/2026 5:00 PM, Dave Hansen wrote:
> On 3/2/26 14:55, Kalra, Ashish wrote:
>>> What's wrong with:
>>>
>>> 	u64 rmpopt_base = pa_start | MSR_AMD64_RMPOPT_ENABLE;
>>> 	...
>>> 	for_each_online_cpu(cpu)
>>> 		wrmsrq_on_cpu(cpu, MSR_AMD64_RMPOPT_BASE, rmpopt_base);
>>>
>>> Then there's at least no ugly casting.
>>>
>> RMOPT_BASE MSRs don't need to be set serially, therefore, by
>> using the cpu_online_mask and on_each_cpu_mask(), we can setup the MSRs
>> concurrently and in parallel. Using for_each_online_cpu() will be slower than
>> doing on_each_cpu_mask() as it doesn't send IPIs in parallel, right.
>
> If that's the case and you *need* performance, then please go add a
> wrmsrq_on_cpumask() function to do things in parallel.
Sure.
Thanks,
Ashish