From: Fred Griffoul <griffoul@gmail.com>
To: kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, vkuznets@redhat.com,
	shuah@kernel.org, dwmw@amazon.co.uk, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Fred Griffoul
Subject: [PATCH v3 08/10] KVM: x86: Add nested context management
Date: Fri, 21 Nov 2025 11:11:11 +0000
Message-ID: <20251121111113.456628-9-griffoul@gmail.com>
In-Reply-To: <20251121111113.456628-1-griffoul@gmail.com>
References: <20251121111113.456628-1-griffoul@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Fred Griffoul

Add infrastructure to persist nested virtualization state when L2
vCPUs are switched on an L1 vCPU or migrated between L1 vCPUs.

The nested context table uses a hash table for fast lookup by nested
control block GPA (VMPTR for VMX, VMCB for SVM) and maintains an LRU
free list of contexts that are not currently attached to a vCPU.

kvm_nested_context_load() searches for a context indexed by the target
GPA. If none is found, it allocates a new context, up to a configured
maximum; at capacity, it instead recycles the least recently used
context from the free list. The oversubscription ratio is hardcoded to
support up to 8 L2 vCPUs per L1 vCPU.

kvm_nested_context_clear() detaches a context from its vCPU and moves
it to the free list, while keeping it in the hash table for potential
reuse.

This allows nested hypervisors to multiplex multiple L2 vCPUs on L1
vCPUs without losing cached nested state, significantly improving
performance for workloads with frequent L2 context switches.

This patch adds only the basic infrastructure. Subsequent patches add
the nested VMX and SVM specific support that populates and uses the
cached nested state.
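For illustration, a minimal sketch of the intended call pattern from a
vendor module; the example_* call sites below are hypothetical, and the
real VMX/SVM wiring lands in later patches of this series:

	/*
	 * Hypothetical call sites, for illustration only.  On VMPTRLD the
	 * vendor code attaches a cached context for the given control
	 * block GPA; on VMCLEAR it detaches the context, which parks it
	 * on the LRU free list for later reuse or recycling.
	 */
	static int example_vmptrld(struct kvm_vcpu *vcpu, gpa_t vmptr)
	{
		struct kvm_nested_context *ctx;

		ctx = kvm_nested_context_load(vcpu, vmptr);
		if (!ctx)
			return -ENOMEM;	/* fall back to an uncached path */

		/* ctx stays attached to @vcpu until the matching clear. */
		return 0;
	}

	static void example_vmclear(struct kvm_vcpu *vcpu, gpa_t vmptr)
	{
		kvm_nested_context_clear(vcpu, vmptr);
	}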
Signed-off-by: Fred Griffoul
---
 arch/x86/include/asm/kvm_host.h |  31 +++++
 arch/x86/include/uapi/asm/kvm.h |   2 +
 arch/x86/kvm/Makefile           |   2 +-
 arch/x86/kvm/nested.c           | 199 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c              |   5 +-
 5 files changed, 237 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kvm/nested.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4675e71b33a7..75f3cd82a073 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1379,6 +1379,28 @@ enum kvm_mmu_type {
 	KVM_NR_MMU_TYPES,
 };
 
+struct kvm_nested_context {
+	gpa_t gpa;
+	struct hlist_node hnode;
+	struct list_head lru_link;
+	struct kvm_vcpu *vcpu;
+};
+
+struct kvm_nested_context_table {
+	spinlock_t lock;
+	u32 count;
+	struct list_head lru_list;
+	DECLARE_HASHTABLE(hash, 8);
+};
+
+void kvm_nested_context_clear(struct kvm_vcpu *vcpu, gpa_t gpa);
+struct kvm_nested_context *kvm_nested_context_load(
+	struct kvm_vcpu *vcpu,
+	gpa_t gpa);
+
+int kvm_nested_context_table_init(struct kvm *kvm);
+void kvm_nested_context_table_destroy(struct kvm *kvm);
+
 struct kvm_arch {
 	unsigned long n_used_mmu_pages;
 	unsigned long n_requested_mmu_pages;
@@ -1618,6 +1640,9 @@ struct kvm_arch {
 	 * current VM.
 	 */
 	int cpu_dirty_log_size;
+
+	/* Cache for nested contexts */
+	struct kvm_nested_context_table *nested_context_table;
 };
 
 struct kvm_vm_stat {
@@ -1640,6 +1665,8 @@ struct kvm_vm_stat {
 	u64 nx_lpage_splits;
 	u64 max_mmu_page_hash_collisions;
 	u64 max_mmu_rmap_size;
+	u64 nested_context_recycle;
+	u64 nested_context_reuse;
 };
 
 struct kvm_vcpu_stat {
@@ -1967,6 +1994,10 @@ struct kvm_x86_nested_ops {
 			    uint16_t *vmcs_version);
 	uint16_t (*get_evmcs_version)(struct kvm_vcpu *vcpu);
 	void (*hv_inject_synthetic_vmexit_post_tlb_flush)(struct kvm_vcpu *vcpu);
+
+	struct kvm_nested_context *(*alloc_context)(struct kvm_vcpu *vcpu);
+	void (*free_context)(struct kvm_nested_context *ctx);
+	void (*reset_context)(struct kvm_nested_context *ctx);
 };
 
 struct kvm_x86_init_ops {
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d420c9c066d4..637ed9286f8e 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -1042,4 +1042,6 @@ struct kvm_tdx_init_mem_region {
 	__u64 nr_pages;
 };
 
+#define KVM_NESTED_OVERSUB_RATIO 8
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index c4b8950c7abe..2a5289cb5bd1 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -6,7 +6,7 @@ ccflags-$(CONFIG_KVM_WERROR) += -Werror
 include $(srctree)/virt/kvm/Makefile.kvm
 
 kvm-y			+= x86.o emulate.o irq.o lapic.o cpuid.o pmu.o mtrr.o \
-			   debugfs.o mmu/mmu.o mmu/page_track.o mmu/spte.o
+			   debugfs.o nested.o mmu/mmu.o mmu/page_track.o mmu/spte.o
 
 kvm-$(CONFIG_X86_64)	+= mmu/tdp_iter.o mmu/tdp_mmu.o
 kvm-$(CONFIG_KVM_IOAPIC)	+= i8259.o i8254.o ioapic.o
diff --git a/arch/x86/kvm/nested.c b/arch/x86/kvm/nested.c
new file mode 100644
index 000000000000..986820cb525f
--- /dev/null
+++ b/arch/x86/kvm/nested.c
@@ -0,0 +1,199 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kvm_host.h>
+
+static struct kvm_nested_context_table *kvm_nested_context_table_alloc(void)
+{
+	struct kvm_nested_context_table *table;
+
+	table = kzalloc(sizeof(*table), GFP_KERNEL_ACCOUNT);
+	if (!table)
+		return NULL;
+
+	spin_lock_init(&table->lock);
+	INIT_LIST_HEAD(&table->lru_list);
+	hash_init(table->hash);
+	return table;
+}
+
+static void kvm_nested_context_table_free(struct kvm_nested_context_table
+					  *table)
+{
+	kfree(table);
+}
+
+int kvm_nested_context_table_init(struct kvm *kvm)
+{
+	struct kvm_nested_context_table *table;
+
+	if (!kvm_x86_ops.nested_ops->alloc_context ||
+	    !kvm_x86_ops.nested_ops->free_context ||
+	    !kvm_x86_ops.nested_ops->reset_context)
+		return -EINVAL;
+
+	table = kvm_nested_context_table_alloc();
+	if (!table)
+		return -ENOMEM;
+
+	kvm->arch.nested_context_table = table;
+	return 0;
+}
+
+void kvm_nested_context_table_destroy(struct kvm *kvm)
+{
+	struct kvm_nested_context_table *table;
+	struct kvm_nested_context *ctx;
+	struct hlist_node *tmp;
+	int bkt;
+
+	table = kvm->arch.nested_context_table;
+	if (!table)
+		return;
+
+	hash_for_each_safe(table->hash, bkt, tmp, ctx, hnode) {
+		hash_del(&ctx->hnode);
+		kvm_x86_ops.nested_ops->free_context(ctx);
+	}
+
+	kvm_nested_context_table_free(table);
+}
+
+static unsigned int kvm_nested_context_max(struct kvm *kvm)
+{
+	return KVM_NESTED_OVERSUB_RATIO * atomic_read(&kvm->online_vcpus);
+}
+
+static struct kvm_nested_context *__kvm_nested_context_find(struct kvm_nested_context_table
+							     *table, gpa_t gpa)
+{
+	struct kvm_nested_context *ctx;
+
+	hash_for_each_possible(table->hash, ctx, hnode, gpa) {
+		if (ctx->gpa == gpa)
+			return ctx;
+	}
+
+	return NULL;
+}
+
+static struct kvm_nested_context *kvm_nested_context_find(struct
+							   kvm_nested_context_table
+							   *table,
+							   struct kvm_vcpu *vcpu,
+							   gpa_t gpa)
+{
+	struct kvm_nested_context *ctx;
+
+	ctx = __kvm_nested_context_find(table, gpa);
+	if (!ctx)
+		return NULL;
+
+	WARN_ON_ONCE(ctx->vcpu && ctx->vcpu != vcpu);
+
+	/* Remove from the LRU list if not attached to a vcpu */
+	if (!ctx->vcpu)
+		list_del(&ctx->lru_link);
+
+	return ctx;
+}
+
+static struct kvm_nested_context *kvm_nested_context_recycle(struct
+							      kvm_nested_context_table
+							      *table)
+{
+	struct kvm_nested_context *ctx;
+
+	if (list_empty(&table->lru_list))
+		return NULL;
+
+	ctx =
+	    list_first_entry(&table->lru_list, struct kvm_nested_context,
+			     lru_link);
+	list_del(&ctx->lru_link);
+	hash_del(&ctx->hnode);
+	return ctx;
+}
+
+static void kvm_nested_context_insert(struct kvm_nested_context_table *table,
+				      struct kvm_nested_context *ctx, gpa_t gpa)
+{
+	hash_add(table->hash, &ctx->hnode, gpa);
+	ctx->gpa = gpa;
+}
+
+struct kvm_nested_context *kvm_nested_context_load(struct kvm_vcpu *vcpu,
+						   gpa_t gpa)
+{
+	struct kvm_nested_context_table *table;
+	struct kvm_nested_context *ctx, *new_ctx = NULL;
+	struct kvm *vm = vcpu->kvm;
+	bool reset = false;
+
+	table = vcpu->kvm->arch.nested_context_table;
+	if (WARN_ON_ONCE(!table))
+		return NULL;
+retry:
+	spin_lock(&table->lock);
+	ctx = kvm_nested_context_find(table, vcpu, gpa);
+	if (!ctx) {
+		/* At capacity? Recycle the LRU context */
+		if (table->count >= kvm_nested_context_max(vcpu->kvm)) {
+			ctx = kvm_nested_context_recycle(table);
+			if (unlikely(!ctx))
+				goto finish;
+
+			kvm_nested_context_insert(table, ctx, gpa);
+			++vm->stat.nested_context_recycle;
+			reset = true;
+
+		} else if (new_ctx) {
+			++table->count;
+			ctx = new_ctx;
+			kvm_nested_context_insert(table, ctx, gpa);
+			new_ctx = NULL;
+
+		} else {
+			/* Allocate a new context without holding the lock */
+			spin_unlock(&table->lock);
+			new_ctx = kvm_x86_ops.nested_ops->alloc_context(vcpu);
+			if (unlikely(!new_ctx))
+				return NULL;
+
+			goto retry;
+		}
+	} else
+		++vm->stat.nested_context_reuse;
+
+	ctx->vcpu = vcpu;
+finish:
+	spin_unlock(&table->lock);
+
+	if (new_ctx)
+		kvm_x86_ops.nested_ops->free_context(new_ctx);
+
+	if (reset)
+		kvm_x86_ops.nested_ops->reset_context(ctx);
+
+	return ctx;
+}
+
+void kvm_nested_context_clear(struct kvm_vcpu *vcpu, gpa_t gpa)
+{
+	struct kvm_nested_context_table *table;
+	struct kvm_nested_context *ctx;
+
+	table = vcpu->kvm->arch.nested_context_table;
+	if (WARN_ON_ONCE(!table))
+		return;
+
+	spin_lock(&table->lock);
+	ctx = __kvm_nested_context_find(table, gpa);
+	if (ctx && ctx->vcpu) {
+		/*
+		 * Move to the LRU list but keep it in the hash table for
+		 * possible future reuse.
+		 */
+		list_add_tail(&ctx->lru_link, &table->lru_list);
+		ctx->vcpu = NULL;
+	}
+	spin_unlock(&table->lock);
+}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1a9c1171df49..db13b1921aff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -255,7 +255,9 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
 	STATS_DESC_ICOUNTER(VM, pages_1g),
 	STATS_DESC_ICOUNTER(VM, nx_lpage_splits),
 	STATS_DESC_PCOUNTER(VM, max_mmu_rmap_size),
-	STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions)
+	STATS_DESC_PCOUNTER(VM, max_mmu_page_hash_collisions),
+	STATS_DESC_COUNTER(VM, nested_context_recycle),
+	STATS_DESC_COUNTER(VM, nested_context_reuse)
 };
 
 const struct kvm_stats_header kvm_vm_stats_header = {
@@ -13311,6 +13313,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_page_track_cleanup(kvm);
 	kvm_xen_destroy_vm(kvm);
 	kvm_hv_destroy_vm(kvm);
+	kvm_nested_context_table_destroy(kvm);
 	kvm_x86_call(vm_destroy)(kvm);
 }
 
-- 
2.43.0