[PATCH v6 1/2] kvm: sev: Add SEV-SNP guest request throttling

Dionna Glaze posted 2 patches 3 months, 1 week ago
Posted by Dionna Glaze 3 months, 1 week ago
The AMD-SP is a precious resource that doesn't have a scheduler other
than a mutex lock queue. To prevent customers from causing a DoS, add a
kernel module parameter that rate limits guest requests.

The default value does not impose any rate limiting.

Throttling vs scheduling:
Even though Linux kernel mutexes are scheduled fairly, the SEV command
mutex alone cannot balance the AMD-SP load in a way that favors host VM
launches, which need low boot latency, over unpredictable traffic from
guests in the form of guest requests.
Boot sequence commands and guest request commands all contend on
the same mutex, so boot latency is affected by increased guest request
contention.

A VM launch may see dozens of SNP_LAUNCH_UPDATE commands before
SNP_LAUNCH_FINISH, and boot times are a heavily protected metric at
hyperscalers.
To favor lower latency of VM launches over each VM's ability to request
attestations at a high rate, the guest requests need a secondary
scheduling mechanism.
It's not good practice to hold a lock and return to user space, so using
a secondary lock for VM launch sequences is not an appropriate solution.
For simplicity, merely set a rate limit for every VM's guest requests
and allow a system administrator to tune that rate limit to platform
needs.
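
For illustration only, the per-VM limit described above can be modeled in
userspace as a fixed-window limiter with an interval and a burst, loosely
following the interval/burst idea behind linux/ratelimit.h. This is a
sketch, not the kernel implementation: struct window_limiter and both
functions are hypothetical names, and time is passed in as an abstract
tick count so the behavior is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of a fixed-window rate limiter.
 * It is NOT the kernel's struct ratelimit_state.
 */
struct window_limiter {
	uint64_t interval;	/* window length in ticks; 0 disables limiting */
	uint32_t burst;		/* requests admitted per window */
	uint64_t begin;		/* tick at which the current window opened */
	uint32_t admitted;	/* requests admitted in the current window */
	bool started;		/* false until the first request opens a window */
};

void window_limiter_init(struct window_limiter *wl, uint64_t interval,
			 uint32_t burst)
{
	wl->interval = interval;
	wl->burst = burst;
	wl->begin = 0;
	wl->admitted = 0;
	wl->started = false;
}

/* Returns true if a request arriving at tick `now` may proceed. */
bool window_limiter_allow(struct window_limiter *wl, uint64_t now)
{
	/* An interval of 0 means "unlimited", like the default in this patch. */
	if (wl->interval == 0)
		return true;

	/* The model assumes a monotonic clock. */
	assert(!wl->started || now >= wl->begin);

	/* Open a new window on first use or once the old one has expired. */
	if (!wl->started || now - wl->begin >= wl->interval) {
		wl->begin = now;
		wl->admitted = 0;
		wl->started = true;
	}

	if (wl->admitted < wl->burst) {
		wl->admitted++;
		return true;
	}

	/* Over budget: the caller would report "busy" to the guest. */
	return false;
}
```

In this model a rejected request maps to the guest being told the VMM is
busy, so the guest is expected to retry later rather than queue on the
AMD-SP.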

Design decisions:
The throttle rate for a VM cannot be changed once it has been started.
The rate the VM gets is its level of service, so it should not be
degradable, for example by a mem_enc_ioctl.

Empirical investigation:
The test methodology turns up N-1 "antagonist" VMs with 2 vCPUs and
4GiB RAM that all request SEV-SNP attestations in a tight loop, then
measures the boot latency of the Nth VM. An effective quality of service
should keep the average boot latency at the level seen without any guest
request contention.

On a dedicated 256-core AMD Zen3 with 1TiB of RAM, continuous performance
testing shows that a boot latency of 220ms +- 50ms is typical with N in
{4, 16, 32, 64} when the request rate is set to 1/s.

Beyond N=64, a rate limit of 1 Hz no longer leaves enough AMD-SP time
for the final VM launch to succeed consistently under contention.
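
As a rough aid to reading these numbers, the worst-case guest request
load grows linearly with the number of antagonist VMs when every VM has
the same cap. The helper below is a hypothetical sketch of that
arithmetic only. It does not model how long the AMD-SP takes to service
each request, which is not stated here.

```c
#include <stdint.h>

/*
 * Upper bound on guest requests per second from n_vms - 1 antagonist
 * VMs, each capped at per_vm_rate requests per second. Hypothetical
 * helper for illustration; it is not part of the patch.
 */
uint64_t antagonist_request_bound(uint32_t n_vms, uint32_t per_vm_rate)
{
	if (n_vms == 0)
		return 0;
	return (uint64_t)(n_vms - 1) * per_vm_rate;
}
```

With a cap of 1/s, the tested values of N in {4, 16, 32, 64} bound the
antagonist load at 3, 15, 31 and 63 requests per second respectively.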

Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Peter Gonda <pgonda@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Sean Christopherson <seanjc@google.com>

Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
---
 arch/x86/kvm/svm/sev.c | 17 +++++++++++++++++
 arch/x86/kvm/svm/svm.h |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1aa0f07d3a63..e45f0cfae2bd 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -12,13 +12,16 @@
 #include <linux/kvm_host.h>
 #include <linux/kernel.h>
 #include <linux/highmem.h>
+#include <linux/limits.h>
 #include <linux/psp.h>
 #include <linux/psp-sev.h>
 #include <linux/pagemap.h>
 #include <linux/swap.h>
 #include <linux/misc_cgroup.h>
 #include <linux/processor.h>
+#include <linux/ratelimit.h>
 #include <linux/trace_events.h>
+#include <linux/units.h>
 #include <uapi/linux/sev-guest.h>
 
 #include <asm/pkru.h>
@@ -59,6 +62,10 @@ static bool sev_es_debug_swap_enabled = true;
 module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
 static u64 sev_supported_vmsa_features;
 
+/* set a per-VM rate limit for SEV-SNP guest requests on VM creation. 0 is unlimited. */
+static int sev_snp_request_ratelimit_khz = 0;
+module_param(sev_snp_request_ratelimit_khz, int, 0444);
+
 #define AP_RESET_HOLD_NONE		0
 #define AP_RESET_HOLD_NAE_EVENT		1
 #define AP_RESET_HOLD_MSR_PROTO		2
@@ -367,6 +374,7 @@ static int snp_guest_req_init(struct kvm *kvm)
 {
 	struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
 	struct page *req_page;
+	u64 throttle_interval;
 
 	req_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 	if (!req_page)
@@ -381,6 +389,9 @@ static int snp_guest_req_init(struct kvm *kvm)
 	sev->guest_req_buf = page_address(req_page);
 	mutex_init(&sev->guest_req_mutex);
 
+	throttle_interval = ((u64)sev_snp_request_ratelimit_khz * HZ) / HZ_PER_KHZ;
+	ratelimit_state_init(&sev->snp_guest_msg_rs, sev_snp_request_ratelimit_khz, 1);
+
 	return 0;
 }
 
@@ -4028,6 +4039,12 @@ static int snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_
 
 	mutex_lock(&sev->guest_req_mutex);
 
+	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
+		svm_vmgexit_no_action(svm, SNP_GUEST_ERR(SNP_GUEST_VMM_ERR_BUSY, 0));
+		ret = 1;
+		goto out_unlock;
+	}
+
 	if (kvm_read_guest(kvm, req_gpa, sev->guest_req_buf, PAGE_SIZE)) {
 		ret = -EIO;
 		goto out_unlock;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f16b068c4228..2643c940d054 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -18,6 +18,7 @@
 #include <linux/kvm_types.h>
 #include <linux/kvm_host.h>
 #include <linux/bits.h>
+#include <linux/ratelimit.h>
 
 #include <asm/svm.h>
 #include <asm/sev-common.h>
@@ -112,6 +113,8 @@ struct kvm_sev_info {
 	void *guest_req_buf;    /* Bounce buffer for SNP Guest Request input */
 	void *guest_resp_buf;   /* Bounce buffer for SNP Guest Request output */
 	struct mutex guest_req_mutex; /* Must acquire before using bounce buffers */
+
+	struct ratelimit_state snp_guest_msg_rs; /* Limit guest requests */
 };
 
 struct kvm_svm {
-- 
2.50.0.rc0.642.g800a2b2222-goog
Re: [PATCH v6 1/2] kvm: sev: Add SEV-SNP guest request throttling
Posted by kernel test robot 3 months, 1 week ago
Hi Dionna,

kernel test robot noticed the following build warnings:

[auto build test WARNING on mst-vhost/linux-next]
[also build test WARNING on linus/master v6.15 next-20250606]
[cannot apply to kvm/queue kvm/next kvm/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dionna-Glaze/kvm-sev-Add-SEV-SNP-guest-request-throttling/20250605-230536
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
patch link:    https://lore.kernel.org/r/20250605150236.3775954-2-dionnaglaze%40google.com
patch subject: [PATCH v6 1/2] kvm: sev: Add SEV-SNP guest request throttling
config: x86_64-rhel-9.4-rust (https://download.01.org/0day-ci/archive/20250606/202506061922.q7OljdiN-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
rustc: rustc 1.78.0 (9b00956e5 2024-04-29)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250606/202506061922.q7OljdiN-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202506061922.q7OljdiN-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> arch/x86/kvm/svm/sev.c:376:6: warning: variable 'throttle_interval' set but not used [-Wunused-but-set-variable]
     376 |         u64 throttle_interval;
         |             ^
   1 warning generated.


vim +/throttle_interval +376 arch/x86/kvm/svm/sev.c

   334	
   335	/*
   336	 * This sets up bounce buffers/firmware pages to handle SNP Guest Request
   337	 * messages (e.g. attestation requests). See "SNP Guest Request" in the GHCB
   338	 * 2.0 specification for more details.
   339	 *
   340	 * Technically, when an SNP Guest Request is issued, the guest will provide its
   341	 * own request/response pages, which could in theory be passed along directly
   342	 * to firmware rather than using bounce pages. However, these pages would need
   343	 * special care:
   344	 *
   345	 *   - Both pages are from shared guest memory, so they need to be protected
   346	 *     from migration/etc. occurring while firmware reads/writes to them. At a
   347	 *     minimum, this requires elevating the ref counts and potentially needing
   348	 *     an explicit pinning of the memory. This places additional restrictions
   349	 *     on what type of memory backends userspace can use for shared guest
   350	 *     memory since there is some reliance on using refcounted pages.
   351	 *
   352	 *   - The response page needs to be switched to Firmware-owned[1] state
   353	 *     before the firmware can write to it, which can lead to potential
   354	 *     host RMP #PFs if the guest is misbehaved and hands the host a
   355	 *     guest page that KVM might write to for other reasons (e.g. virtio
   356	 *     buffers/etc.).
   357	 *
   358	 * Both of these issues can be avoided completely by using separately-allocated
   359	 * bounce pages for both the request/response pages and passing those to
   360	 * firmware instead. So that's what is being set up here.
   361	 *
   362	 * Guest requests rely on message sequence numbers to ensure requests are
   363	 * issued to firmware in the order the guest issues them, so concurrent guest
   364	 * requests generally shouldn't happen. But a misbehaved guest could issue
   365	 * concurrent guest requests in theory, so a mutex is used to serialize
   366	 * access to the bounce buffers.
   367	 *
   368	 * [1] See the "Page States" section of the SEV-SNP Firmware ABI for more
   369	 *     details on Firmware-owned pages, along with "RMP and VMPL Access Checks"
   370	 *     in the APM for details on the related RMP restrictions.
   371	 */
   372	static int snp_guest_req_init(struct kvm *kvm)
   373	{
   374		struct kvm_sev_info *sev = to_kvm_sev_info(kvm);
   375		struct page *req_page;
 > 376		u64 throttle_interval;
   377	
   378		req_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
   379		if (!req_page)
   380			return -ENOMEM;
   381	
   382		sev->guest_resp_buf = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
   383		if (!sev->guest_resp_buf) {
   384			__free_page(req_page);
   385			return -EIO;
   386		}
   387	
   388		sev->guest_req_buf = page_address(req_page);
   389		mutex_init(&sev->guest_req_mutex);
   390	
   391		throttle_interval = ((u64)sev_snp_request_ratelimit_khz * HZ) / HZ_PER_KHZ;
   392		ratelimit_state_init(&sev->snp_guest_msg_rs, sev_snp_request_ratelimit_khz, 1);
   393	
   394		return 0;
   395	}
   396	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki