From nobody Mon Jun 8 08:35:32 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com
Subject: [PATCH v2 1/5] KVM: x86: remove nested_mmu from mmu_is_nested()
Date: Sat, 30 May 2026 12:55:41 -0400
Message-ID: <20260530165545.25599-2-pbonzini@redhat.com>
In-Reply-To: <20260530165545.25599-1-pbonzini@redhat.com>
References: <20260530165545.25599-1-pbonzini@redhat.com>

nested_mmu is always stored into vcpu->arch.walk_mmu at the same time as
guest_mmu is stored into vcpu->arch.mmu. But nested_mmu is not even a
proper MMU, it is only used for page walking; plus the fact that walk_mmu
has to be switched at all is just an implementation detail.

In the end what matters here is whether the guest is using nested page
tables; vmx/nested.c and svm/nested.c check it to see if they are in nEPT
or nNPT context respectively. So switch to checking root_mmu vs. guest_mmu,
which is a more cogent test.

Signed-off-by: Paolo Bonzini
Message-ID: <20260511150648.685374-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/x86.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 38a905fa86de..60ff064de12f 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -290,7 +290,7 @@ static inline bool x86_exception_has_error_code(unsigned int vector)
 
 static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
+	return vcpu->arch.mmu == &vcpu->arch.guest_mmu;
 }
 
 static inline bool is_pae(struct kvm_vcpu *vcpu)
-- 
2.52.0

From nobody Mon Jun 8 08:35:32 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com
Subject: [PATCH v2 2/5] KVM: nVMX: remove unnecessary code in prepare_vmcs02_rare
Date: Sat, 30 May 2026 12:55:42 -0400
Message-ID: <20260530165545.25599-3-pbonzini@redhat.com>
In-Reply-To: <20260530165545.25599-1-pbonzini@redhat.com>
References: <20260530165545.25599-1-pbonzini@redhat.com>

The early vmwrite of the PDPTRs in prepare_vmcs02_rare() is redundant,
because every write it does will be performed by prepare_vmcs02() if it is
actually needed. In any case where the emulator or the processor needs the
PDPTR, either is_pae_paging() is true on vmentry, or a write of CR0, CR4 or
EFER will cause a vmexit to L0. The next vmentry will refresh the PDPTRs
in the vmcs02 from vmcs12.

In fact, in the original version[1] of what ended up being commit
c7554efc8335 ("KVM: nVMX: Copy PDPTRs to/from vmcs12 only when necessary"),
the writes in what is now prepare_vmcs02_rare() were removed. When the
mega-collection of optimizations was posted[2], the removal of that code
got dropped as a rebase goof, so reinstate it.

[1] https://lore.kernel.org/all/20190507160640.4812-16-sean.j.christopherson@intel.com
[2] https://lore.kernel.org/all/1560445409-17363-31-git-send-email-pbonzini@redhat.com

Suggested-by: Sean Christopherson
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/vmx/nested.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c1be8ef882b8..58f91e7921e3 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2623,17 +2623,6 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 	vmcs_writel(GUEST_SYSENTER_ESP, vmcs12->guest_sysenter_esp);
 	vmcs_writel(GUEST_SYSENTER_EIP, vmcs12->guest_sysenter_eip);
 
-	/*
-	 * L1 may access the L2's PDPTR, so save them to construct
-	 * vmcs12
-	 */
-	if (enable_ept) {
-		vmcs_write64(GUEST_PDPTR0, vmcs12->guest_pdptr0);
-		vmcs_write64(GUEST_PDPTR1, vmcs12->guest_pdptr1);
-		vmcs_write64(GUEST_PDPTR2, vmcs12->guest_pdptr2);
-		vmcs_write64(GUEST_PDPTR3, vmcs12->guest_pdptr3);
-	}
-
 	if (kvm_mpx_supported() && vmx->vcpu.arch.nested_run_pending &&
 	    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
 		vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
-- 
2.52.0

From nobody Mon Jun 8 08:35:32 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com
Subject: [PATCH v2 3/5] KVM: nSVM: invalidate cached PDPTRs across nested NPT transitions
Date: Sat, 30 May 2026 12:55:43 -0400
Message-ID: <20260530165545.25599-4-pbonzini@redhat.com>
In-Reply-To: <20260530165545.25599-1-pbonzini@redhat.com>
References: <20260530165545.25599-1-pbonzini@redhat.com>

When L2 runs under nested NPT and uses PAE paging, KVM's cached PDPTRs in
mmu->pdptrs[] can hold stale or wrong values after nested transitions and
across migration restore, because both nested_svm_load_cr3() and
svm_get_nested_state_pages() only refresh PDPTRs on the !nested_npt path.

The user-visible bug is on migration restore of an L2 running with nested
NPT and 32-bit PAE paging, if userspace uses KVM_SET_SREGS rather than
KVM_SET_SREGS2. In that case, load_pdptrs() leaves VCPU_EXREG_PDPTR marked
as available, and kvm_pdptr_read() will use a stale translation that used
L1 GPAs instead of L2 nGPAs. svm_get_nested_state_pages() runs on first
KVM_RUN but skips the refresh because nested_npt_enabled() is true. The
CPU itself reads L2's PDPTRs correctly from memory via L1's NPT, but
KVM-side walking of guest PAE page tables uses the bogus cached values.

Unlike Intel's GUEST_PDPTR0..3 fields in the VMCS, SVM has no VMCB-cached
PDPTR state: the in-memory PDPTEs at the current CR3 are the only source
of truth, and svm_cache_reg(VCPU_EXREG_PDPTR) simply reloads them from
memory via load_pdptrs(). Clearing the avail bit (and the dirty bit,
because !avail/dirty is invalid) to force a reload when the PDPTRs are
needed fixes the bug.

Do the same for nested_svm_load_cr3()'s nested_npt branch, so that the
invariant "PDPTRs need reloading" is handled similarly for both immediate
and deferred loading.

Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/kvm_cache_regs.h |  8 ++++++++
 arch/x86/kvm/svm/nested.c     | 27 ++++++++++++++++++---------
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 2ae492ad6412..6bae5db5a54e 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -77,6 +77,14 @@ static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
 	return test_bit(reg, vcpu->arch.regs_dirty);
 }
 
+static inline void kvm_register_mark_for_reload(struct kvm_vcpu *vcpu,
+						enum kvm_reg reg)
+{
+	kvm_assert_register_caching_allowed(vcpu);
+	__clear_bit(reg, vcpu->arch.regs_avail);
+	__clear_bit(reg, vcpu->arch.regs_dirty);
+}
+
 static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
 					       enum kvm_reg reg)
 {
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 3d1fd1776e19..aa5a1d8ea136 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -680,9 +680,12 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
 	if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3)))
 		return -EINVAL;
 
-	if (reload_pdptrs && !nested_npt && is_pae_paging(vcpu) &&
-	    CC(!load_pdptrs(vcpu, cr3)))
-		return -EINVAL;
+	if (reload_pdptrs && is_pae_paging(vcpu)) {
+		if (nested_npt)
+			kvm_register_mark_for_reload(vcpu, VCPU_REG_PDPTR);
+		else if (CC(!load_pdptrs(vcpu, cr3)))
+			return -EINVAL;
+	}
 
 	vcpu->arch.cr3 = cr3;
 
@@ -2055,15 +2058,21 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 	if (WARN_ON(!is_guest_mode(vcpu)))
 		return true;
 
-	if (!vcpu->arch.pdptrs_from_userspace &&
-	    !nested_npt_enabled(to_svm(vcpu)) && is_pae_paging(vcpu))
+	if (is_pae_paging(vcpu)) {
 		/*
-		 * Reload the guest's PDPTRs since after a migration
-		 * the guest CR3 might be restored prior to setting the nested
-		 * state which can lead to a load of wrong PDPTRs.
+		 * After migration, CR3 may have been restored before
+		 * KVM_SET_NESTED_STATE, so the PDPTR load into mmu->pdptrs[]
+		 * may have treated CR3 as an L1 GPA. For nNPT, drop the
+		 * cache so the next access reloads them with the proper
+		 * nGPA translation. For !nNPT, reload eagerly unless userspace
+		 * already supplied authoritative PDPTRs via KVM_SET_SREGS2.
 		 */
-		if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
+		if (nested_npt_enabled(to_svm(vcpu)))
+			kvm_register_mark_for_reload(vcpu, VCPU_REG_PDPTR);
+		else if (!vcpu->arch.pdptrs_from_userspace &&
+			 CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
 			return false;
+	}
 
 	if (!nested_svm_merge_msrpm(vcpu)) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-- 
2.52.0

From nobody Mon Jun 8 08:35:32 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com
Subject: [PATCH v2 4/5] KVM: x86: check that kvm_handle_invpcid is only invoked with shadow paging
Date: Sat, 30 May 2026 12:55:44 -0400
Message-ID: <20260530165545.25599-5-pbonzini@redhat.com>
In-Reply-To: <20260530165545.25599-1-pbonzini@redhat.com>
References: <20260530165545.25599-1-pbonzini@redhat.com>

This is true for both Intel and AMD. On Intel, "enable INVPCID" is set
unconditionally if supported, but the vmexit is triggered by the "INVLPG
exiting" control which is disabled by enable_ept. On AMD, KVM can
intercept INVPCID if NPT is enabled but only in order to inject #UD in
the guest.

Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/x86.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc0924389398..1913efef6c39 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -14350,6 +14350,9 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 		return 1;
 	}
 
+	if (WARN_ON_ONCE(tdp_enabled))
+		return 0;
+
 	pcid_enabled = kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE);
 
 	switch (type) {
-- 
2.52.0

From nobody Mon Jun 8 08:35:32 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: seanjc@google.com
Subject: [PATCH v2 5/5] KVM: x86/mmu: move pdptrs out of the MMU
Date: Sat, 30 May 2026 12:55:45 -0400
Message-ID: <20260530165545.25599-6-pbonzini@redhat.com>
In-Reply-To: <20260530165545.25599-1-pbonzini@redhat.com>
References: <20260530165545.25599-1-pbonzini@redhat.com>

PDPTRs are part of the CPU state. A bit unconventionally, they are
reached via vcpu->arch.walk_mmu instead of being stored in vcpu->arch
directly. That is nice in principle---it would allow TDP shadow paging
to have its own PDPTRs---but it is not necessary, because EPT has no
PDPTRs and NPT does not cache them. Since kvm_pdptr_read() does not
otherwise need the MMU, drop the pdptrs from the MMU altogether.

There is however something to be careful about, in that PDPTRs are now
not stored separately in root_mmu and nested_mmu for L1 and L2 guests.
In practice this was already not an issue:

- for EPT the VMCS0x has to keep them up to date; and for the purpose
  of emulation they are always loaded from the VMCS on vmentry/vmexit,
  thanks to the clearing of dirty and available register bitmaps in
  vmx_switch_vmcs()

- for NPT, VCPU_EXREG_PDPTR is similarly cleared for nNPT, which does
  not cache the PDPTRs; while for non-nNPT the PDPTRs are loaded
  together with the load of CR3.

Note that page table PDPTRs are not affected, since they are stored in
pae_root.

Signed-off-by: Paolo Bonzini
---
 arch/x86/include/asm/kvm_host.h |  5 ++---
 arch/x86/kvm/kvm_cache_regs.h   |  4 ++--
 arch/x86/kvm/svm/svm.c          |  2 +-
 arch/x86/kvm/vmx/vmx.c          | 20 ++++++++------------
 arch/x86/kvm/x86.c              |  6 +++---
 5 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2a4be06177ff..02ac77b54c58 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -540,10 +540,7 @@ struct kvm_mmu {
 	 * the bits spte never used.
 	 */
 	struct rsvd_bits_validate shadow_zero_check;
-
 	struct rsvd_bits_validate guest_rsvd_check;
-
-	u64 pdptrs[4]; /* pae */
 };
 
 enum pmc_type {
@@ -902,6 +899,8 @@ struct kvm_vcpu_arch {
 	 */
 	struct kvm_mmu *walk_mmu;
 
+	u64 pdptrs[4]; /* pae */
+
 	struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
 	struct kvm_mmu_memory_cache mmu_shadow_page_cache;
 	struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 6bae5db5a54e..2a93e8c45c1a 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -192,12 +192,12 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
 	if (!kvm_register_is_available(vcpu, VCPU_REG_PDPTR))
 		kvm_x86_call(cache_reg)(vcpu, VCPU_REG_PDPTR);
 
-	return vcpu->arch.walk_mmu->pdptrs[index];
+	return vcpu->arch.pdptrs[index];
 }
 
 static inline void kvm_pdptr_write(struct kvm_vcpu *vcpu, int index, u64 value)
 {
-	vcpu->arch.walk_mmu->pdptrs[index] = value;
+	vcpu->arch.pdptrs[index] = value;
 }
 
 static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 84496bc0508d..58a87a1b2ff8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1526,7 +1526,7 @@ static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 	switch (reg) {
 	case VCPU_REG_PDPTR:
 		/*
-		 * When !npt_enabled, mmu->pdptrs[] is already available since
+		 * When !npt_enabled, vcpu->arch.pdptrs[] is already available since
 		 * it is always updated per SDM when moving to CRs.
 		 */
 		if (npt_enabled)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f13d56bc32d1..3570e83ef280 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3376,30 +3376,26 @@ void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu)
 
 void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu)
 {
-	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
-
 	if (!kvm_register_is_dirty(vcpu, VCPU_REG_PDPTR))
 		return;
 
 	if (is_pae_paging(vcpu)) {
-		vmcs_write64(GUEST_PDPTR0, mmu->pdptrs[0]);
-		vmcs_write64(GUEST_PDPTR1, mmu->pdptrs[1]);
-		vmcs_write64(GUEST_PDPTR2, mmu->pdptrs[2]);
-		vmcs_write64(GUEST_PDPTR3, mmu->pdptrs[3]);
+		vmcs_write64(GUEST_PDPTR0, vcpu->arch.pdptrs[0]);
+		vmcs_write64(GUEST_PDPTR1, vcpu->arch.pdptrs[1]);
+		vmcs_write64(GUEST_PDPTR2, vcpu->arch.pdptrs[2]);
+		vmcs_write64(GUEST_PDPTR3, vcpu->arch.pdptrs[3]);
 	}
 }
 
 void ept_save_pdptrs(struct kvm_vcpu *vcpu)
 {
-	struct kvm_mmu *mmu = vcpu->arch.walk_mmu;
-
 	if (WARN_ON_ONCE(!is_pae_paging(vcpu)))
 		return;
 
-	mmu->pdptrs[0] = vmcs_read64(GUEST_PDPTR0);
-	mmu->pdptrs[1] = vmcs_read64(GUEST_PDPTR1);
-	mmu->pdptrs[2] = vmcs_read64(GUEST_PDPTR2);
-	mmu->pdptrs[3] = vmcs_read64(GUEST_PDPTR3);
+	vcpu->arch.pdptrs[0] = vmcs_read64(GUEST_PDPTR0);
+	vcpu->arch.pdptrs[1] = vmcs_read64(GUEST_PDPTR1);
+	vcpu->arch.pdptrs[2] = vmcs_read64(GUEST_PDPTR2);
+	vcpu->arch.pdptrs[3] = vmcs_read64(GUEST_PDPTR3);
 
 	kvm_register_mark_available(vcpu, VCPU_REG_PDPTR);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1913efef6c39..8dea09b20162 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1071,7 +1071,7 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
 	gpa_t real_gpa;
 	int i;
 	int ret;
-	u64 pdpte[ARRAY_SIZE(mmu->pdptrs)];
+	u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
 
 	/*
 	 * If the MMU is nested, CR3 holds an L2 GPA and needs to be translated
@@ -1100,10 +1100,10 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
 	 * Marking VCPU_REG_PDPTR dirty doesn't work for !tdp_enabled.
 	 * Shadow page roots need to be reconstructed instead.
 	 */
-	if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
+	if (!tdp_enabled && memcmp(vcpu->arch.pdptrs, pdpte, sizeof(vcpu->arch.pdptrs)))
 		kvm_mmu_free_roots(vcpu->kvm, mmu, KVM_MMU_ROOT_CURRENT);
 
-	memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
+	memcpy(vcpu->arch.pdptrs, pdpte, sizeof(vcpu->arch.pdptrs));
 	kvm_register_mark_dirty(vcpu, VCPU_REG_PDPTR);
 	kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
 	vcpu->arch.pdptrs_from_userspace = false;
-- 
2.52.0