From nobody Mon Jun 29 12:39:09 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com, stable@vger.kernel.org
Subject: [PATCH 01/12] KVM: x86: host-initiated EFER.LME write affects the MMU
Date: Wed, 9 Feb 2022 12:00:09 -0500
Message-Id: <20220209170020.1775368-2-pbonzini@redhat.com>
In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com>
References: <20220209170020.1775368-1-pbonzini@redhat.com>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

While the guest runs, EFER.LME cannot change unless CR0.PG is clear, and
therefore EFER.NX is the only bit that can affect the MMU role. However,
set_efer accepts a host-initiated change to EFER.LME even with CR0.PG=1.
In that case, the MMU has to be reset.

Fixes: 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini
Reviewed-by: Sean Christopherson
---
 arch/x86/kvm/mmu.h | 1 +
 arch/x86/kvm/x86.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 51faa2c76ca5..a5a50cfeffff 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,6 +48,7 @@
 			       X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE)
 
 #define KVM_MMU_CR0_ROLE_BITS (X86_CR0_PG | X86_CR0_WP)
+#define KVM_MMU_EFER_ROLE_BITS (EFER_LME | EFER_NX)
 
 static __always_inline u64 rsvd_bits(int s, int e)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9a9006226501..5e1298aef9e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1647,7 +1647,7 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	}
 
 	/* Update reserved bits */
-	if ((efer ^ old_efer) & EFER_NX)
+	if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
 		kvm_mmu_reset_context(vcpu);
 
 	return 0;
-- 
2.31.1

From nobody Mon Jun 29 12:39:09 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com
Subject: [PATCH 02/12] KVM: MMU: move MMU role accessors to header
Date: Wed, 9 Feb 2022 12:00:10 -0500
Message-Id: <20220209170020.1775368-3-pbonzini@redhat.com>
In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com>
References: <20220209170020.1775368-1-pbonzini@redhat.com>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

We will use is_cr0_pg to check whether a page fault can be delivered.

Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu.h     | 21 +++++++++++++++++++++
 arch/x86/kvm/mmu/mmu.c | 21 ---------------------
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index a5a50cfeffff..b9d06a218b2c 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -65,6 +65,27 @@ static __always_inline u64 rsvd_bits(int s, int e)
 	return ((2ULL << (e - s)) - 1) << s;
 }
 
+/*
+ * The MMU itself (with a valid role) is the single source of truth for the
+ * MMU. Do not use the regs used to build the MMU/role, nor the vCPU. The
+ * regs don't account for dependencies, e.g. clearing CR4 bits if CR0.PG=1,
+ * and the vCPU may be incorrect/irrelevant.
+ */
+#define BUILD_MMU_ROLE_ACCESSOR(base_or_ext, reg, name)	\
+static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu)	\
+{	\
+	return !!(mmu->mmu_role. base_or_ext . reg##_##name);	\
+}
+BUILD_MMU_ROLE_ACCESSOR(ext, cr0, pg);
+BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
+BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pse);
+BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pae);
+BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smep);
+BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smap);
+BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pke);
+BUILD_MMU_ROLE_ACCESSOR(ext, cr4, la57);
+BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
+
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
 
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 296f8723f9ae..e0c0f0bc2e8b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -223,27 +223,6 @@ BUILD_MMU_ROLE_REGS_ACCESSOR(cr4, la57, X86_CR4_LA57);
 BUILD_MMU_ROLE_REGS_ACCESSOR(efer, nx, EFER_NX);
 BUILD_MMU_ROLE_REGS_ACCESSOR(efer, lma, EFER_LMA);
 
-/*
- * The MMU itself (with a valid role) is the single source of truth for the
- * MMU. Do not use the regs used to build the MMU/role, nor the vCPU. The
- * regs don't account for dependencies, e.g. clearing CR4 bits if CR0.PG=1,
- * and the vCPU may be incorrect/irrelevant.
- */
-#define BUILD_MMU_ROLE_ACCESSOR(base_or_ext, reg, name)	\
-static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu)	\
-{	\
-	return !!(mmu->mmu_role. base_or_ext . reg##_##name);	\
-}
-BUILD_MMU_ROLE_ACCESSOR(ext, cr0, pg);
-BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
-BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pse);
-BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pae);
-BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smep);
-BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smap);
-BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pke);
-BUILD_MMU_ROLE_ACCESSOR(ext, cr4, la57);
-BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
-
 static struct kvm_mmu_role_regs vcpu_to_role_regs(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu_role_regs regs = {
-- 
2.31.1

From nobody Mon Jun 29 12:39:09 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com
Subject: [PATCH 03/12] KVM: x86: do not deliver asynchronous page faults if CR0.PG=0
Date: Wed, 9 Feb 2022 12:00:11 -0500
Message-Id: <20220209170020.1775368-4-pbonzini@redhat.com>
In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com>
References: <20220209170020.1775368-1-pbonzini@redhat.com>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Enabling async page faults is nonsensical if paging is disabled, but
it is allowed because CR0.PG=0 does not clear the async page fault
MSR. Just ignore them and only use the artificial halt state,
similar to what happens in guest mode if async #PF vmexits are disabled.

Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/x86.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5e1298aef9e2..98aca0f2af12 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12272,7 +12272,9 @@ static inline bool apf_pageready_slot_free(struct kvm_vcpu *vcpu)
 
 static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 {
-	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
+	if (is_guest_mode(vcpu)
+	    ? !vcpu->arch.apf.delivery_as_pf_vmexit
+	    : !is_cr0_pg(vcpu->arch.mmu))
 		return false;
 
 	if (!kvm_pv_async_pf_enabled(vcpu) ||
-- 
2.31.1

From nobody Mon Jun 29 12:39:09 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com
Subject: [PATCH 04/12] KVM: MMU: WARN if PAE roots linger after kvm_mmu_unload
Date: Wed, 9 Feb 2022 12:00:12 -0500
Message-Id: <20220209170020.1775368-5-pbonzini@redhat.com>
In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com>
References: <20220209170020.1775368-1-pbonzini@redhat.com>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu/mmu.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e0c0f0bc2e8b..7b5765ced928 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5065,12 +5065,21 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+static void __kvm_mmu_unload(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
+{
+	int i;
+	kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);
+	WARN_ON(VALID_PAGE(mmu->root_hpa));
+	if (mmu->pae_root) {
+		for (i = 0; i < 4; ++i)
+			WARN_ON(IS_VALID_PAE_ROOT(mmu->pae_root[i]));
+	}
+}
+
 void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
-	kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL);
-	WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root_hpa));
-	kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL);
-	WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa));
+	__kvm_mmu_unload(vcpu, &vcpu->arch.root_mmu);
+	__kvm_mmu_unload(vcpu, &vcpu->arch.guest_mmu);
 }
 
 static bool need_remote_flush(u64 old, u64 new)
-- 
2.31.1

From nobody Mon Jun 29 12:39:09 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com
Subject: [PATCH 05/12] KVM: MMU: avoid NULL-pointer dereference on page freeing bugs
Date: Wed, 9 Feb 2022 12:00:13 -0500
Message-Id: <20220209170020.1775368-6-pbonzini@redhat.com>
In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com>
References: <20220209170020.1775368-1-pbonzini@redhat.com>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

If kvm_mmu_free_roots encounters a PAE page table where a 64-bit page
table is expected, the result is a NULL pointer dereference. Instead
just WARN and exit.

Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu/mmu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7b5765ced928..d0f2077bd798 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3201,6 +3201,8 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 		return;
 
 	sp = to_shadow_page(*root_hpa & PT64_BASE_ADDR_MASK);
+	if (WARN_ON(!sp))
+		return;
 
 	if (is_tdp_mmu_page(sp))
 		kvm_tdp_mmu_put_root(kvm, sp, false);
-- 
2.31.1

From nobody Mon Jun 29 12:39:09 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com
Subject: [PATCH 06/12] KVM: MMU: rename kvm_mmu_reload
Date: Wed, 9 Feb 2022 12:00:14 -0500
Message-Id: <20220209170020.1775368-7-pbonzini@redhat.com>
In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com>
References: <20220209170020.1775368-1-pbonzini@redhat.com>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

The name of kvm_mmu_reload is very confusing for two reasons:
first, KVM_REQ_MMU_RELOAD actually does not call it; second,
it only does anything if there is no valid root.

Rename it to kvm_mmu_ensure_valid_root, which matches the actual
behavior better.

Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu.h | 2 +-
 arch/x86/kvm/x86.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b9d06a218b2c..c9f1c2162ade 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -104,7 +104,7 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
 
-static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
+static inline int kvm_mmu_ensure_valid_root(struct kvm_vcpu *vcpu)
 {
 	if (likely(vcpu->arch.mmu->root_hpa != INVALID_PAGE))
 		return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 98aca0f2af12..2685fb62807e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9976,7 +9976,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	r = kvm_mmu_reload(vcpu);
+	r = kvm_mmu_ensure_valid_root(vcpu);
 	if (unlikely(r)) {
 		goto cancel_injection;
 	}
@@ -12164,7 +12164,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	      work->wakeup_all)
 		return;
 
-	r = kvm_mmu_reload(vcpu);
+	r = kvm_mmu_ensure_valid_root(vcpu);
 	if (unlikely(r))
 		return;
 
-- 
2.31.1

From nobody Mon Jun 29 12:39:09 2026
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com
Subject: [PATCH 07/12] KVM: x86: use struct kvm_mmu_root_info for mmu->root
Date: Wed, 9 Feb 2022 12:00:15 -0500
Message-Id: <20220209170020.1775368-8-pbonzini@redhat.com>
In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com>
References: <20220209170020.1775368-1-pbonzini@redhat.com>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

The root_hpa and root_pgd fields form essentially a struct kvm_mmu_root_info.
Use the struct to have more consistency between mmu->root and
mmu->prev_roots.

The patch is entirely search and replace except for cached_root_available,
which does not need a temporary struct kvm_mmu_root_info anymore.

Signed-off-by: Paolo Bonzini
---
 arch/x86/include/asm/kvm_host.h |  3 +-
 arch/x86/kvm/mmu.h              |  4 +-
 arch/x86/kvm/mmu/mmu.c          | 69 +++++++++++++++------------------
 arch/x86/kvm/mmu/mmu_audit.c    |  4 +-
 arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.h      |  2 +-
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/vmx/vmx.c          |  2 +-
 arch/x86/kvm/x86.c              |  2 +-
 10 files changed, 42 insertions(+), 50 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a0d2925b6651..6da9a460e584 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -432,8 +432,7 @@ struct kvm_mmu {
 	int (*sync_page)(struct kvm_vcpu *vcpu,
 			 struct kvm_mmu_page *sp);
 	void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
-	hpa_t root_hpa;
-	gpa_t root_pgd;
+	struct kvm_mmu_root_info root;
 	union kvm_mmu_role mmu_role;
 	u8 root_level;
 	u8 shadow_root_level;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index c9f1c2162ade..f896c438c8ee 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -106,7 +106,7 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);
 
 static inline int kvm_mmu_ensure_valid_root(struct kvm_vcpu *vcpu)
 {
-	if (likely(vcpu->arch.mmu->root_hpa != INVALID_PAGE))
+	if (likely(vcpu->arch.mmu->root.hpa != INVALID_PAGE))
 		return 0;
 
 	return kvm_mmu_load(vcpu);
@@ -128,7 +128,7 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
 
 static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 {
-	u64 root_hpa = vcpu->arch.mmu->root_hpa;
+	u64 root_hpa = vcpu->arch.mmu->root.hpa;
 
 	if (!VALID_PAGE(root_hpa))
 		return;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d0f2077bd798..3c3f597ea00d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2141,7 +2141,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
 		 * prev_root is currently only used for 64-bit hosts. So only
 		 * the active root_hpa is valid here.
 		 */
-		BUG_ON(root != vcpu->arch.mmu->root_hpa);
+		BUG_ON(root != vcpu->arch.mmu->root.hpa);
 
 		iterator->shadow_addr
 			= vcpu->arch.mmu->pae_root[(addr >> 30) & 3];
@@ -2155,7 +2155,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato
 static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
 			     struct kvm_vcpu *vcpu, u64 addr)
 {
-	shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root_hpa,
+	shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root.hpa,
 				    addr);
 }
 
@@ -3224,7 +3224,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >= BITS_PER_LONG);
 
 	/* Before acquiring the MMU lock, see if we need to do any real work. */
-	if (!(free_active_root && VALID_PAGE(mmu->root_hpa))) {
+	if (!(free_active_root && VALID_PAGE(mmu->root.hpa))) {
 		for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
 			if ((roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) &&
 			    VALID_PAGE(mmu->prev_roots[i].hpa))
@@ -3244,7 +3244,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	if (free_active_root) {
 		if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL &&
 		    (mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
-			mmu_free_root_page(kvm, &mmu->root_hpa, &invalid_list);
+			mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list);
 		} else if (mmu->pae_root) {
 			for (i = 0; i < 4; ++i) {
 				if (!IS_VALID_PAE_ROOT(mmu->pae_root[i]))
@@ -3255,8 +3255,8 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 				mmu->pae_root[i] = INVALID_PAE_ROOT;
 			}
 		}
-		mmu->root_hpa = INVALID_PAGE;
-		mmu->root_pgd = 0;
+		mmu->root.hpa = INVALID_PAGE;
+		mmu->root.pgd = 0;
 	}
 
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
@@ -3329,10 +3329,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 	if (is_tdp_mmu_enabled(vcpu->kvm)) {
 		root = kvm_tdp_mmu_get_vcpu_root_hpa(vcpu);
-		mmu->root_hpa = root;
+		mmu->root.hpa = root;
 	} else if (shadow_root_level >= PT64_ROOT_4LEVEL) {
 		root = mmu_alloc_root(vcpu, 0, 0, shadow_root_level, true);
-		mmu->root_hpa = root;
+		mmu->root.hpa = root;
 	} else if (shadow_root_level == PT32E_ROOT_LEVEL) {
 		if (WARN_ON_ONCE(!mmu->pae_root)) {
 			r = -EIO;
@@ -3347,15 +3347,15 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 			mmu->pae_root[i] = root | PT_PRESENT_MASK |
 					   shadow_me_mask;
 		}
-		mmu->root_hpa = __pa(mmu->pae_root);
+		mmu->root.hpa = __pa(mmu->pae_root);
 	} else {
 		WARN_ONCE(1, "Bad TDP root level = %d\n", shadow_root_level);
 		r = -EIO;
 		goto out_unlock;
 	}
 
-	/* root_pgd is ignored for direct MMUs. */
-	mmu->root_pgd = 0;
+	/* root.pgd is ignored for direct MMUs. */
+	mmu->root.pgd = 0;
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
 	return r;
@@ -3468,7 +3468,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	if (mmu->root_level >= PT64_ROOT_4LEVEL) {
 		root = mmu_alloc_root(vcpu, root_gfn, 0,
 				      mmu->shadow_root_level, false);
-		mmu->root_hpa = root;
+		mmu->root.hpa = root;
 		goto set_root_pgd;
 	}
 
@@ -3518,14 +3518,14 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 	}
 
 	if (mmu->shadow_root_level == PT64_ROOT_5LEVEL)
-		mmu->root_hpa = __pa(mmu->pml5_root);
+		mmu->root.hpa = __pa(mmu->pml5_root);
 	else if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
-		mmu->root_hpa = __pa(mmu->pml4_root);
+		mmu->root.hpa = __pa(mmu->pml4_root);
 	else
-		mmu->root_hpa = __pa(mmu->pae_root);
+		mmu->root.hpa = __pa(mmu->pae_root);
 
 set_root_pgd:
-	mmu->root_pgd = root_pgd;
+	mmu->root.pgd = root_pgd;
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
 
@@ -3638,13 +3638,13 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.mmu->direct_map)
 		return;
 
-	if (!VALID_PAGE(vcpu->arch.mmu->root_hpa))
+	if (!VALID_PAGE(vcpu->arch.mmu->root.hpa))
 		return;
 
 	vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
 
 	if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) {
-		hpa_t root = vcpu->arch.mmu->root_hpa;
+		hpa_t root = vcpu->arch.mmu->root.hpa;
 		sp = to_shadow_page(root);
 
 		if (!is_unsync_root(root))
@@ -3935,7 +3935,7 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
 				struct kvm_page_fault *fault, int mmu_seq)
 {
-	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root_hpa);
+	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
 
 	/* Special roots, e.g. pae_root, are not backed by shadow pages. */
 	if (sp && is_obsolete_sp(vcpu->kvm, sp))
@@ -4092,34 +4092,27 @@ static inline bool is_root_usable(struct kvm_mmu_root_info *root, gpa_t pgd,
 /*
  * Find out if a previously cached root matching the new pgd/role is available.
  * The current root is also inserted into the cache.
- * If a matching root was found, it is assigned to kvm_mmu->root_hpa and true is
+ * If a matching root was found, it is assigned to kvm_mmu->root.hpa and true is
  * returned.
- * Otherwise, the LRU root from the cache is assigned to kvm_mmu->root_hpa and
+ * Otherwise, the LRU root from the cache is assigned to kvm_mmu->root.hpa and
  * false is returned. This root should now be freed by the caller.
  */
 static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 				  union kvm_mmu_page_role new_role)
 {
 	uint i;
-	struct kvm_mmu_root_info root;
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 
-	root.pgd = mmu->root_pgd;
-	root.hpa = mmu->root_hpa;
-
-	if (is_root_usable(&root, new_pgd, new_role))
+	if (is_root_usable(&mmu->root, new_pgd, new_role))
 		return true;
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
-		swap(root, mmu->prev_roots[i]);
+		swap(mmu->root, mmu->prev_roots[i]);
 
-		if (is_root_usable(&root, new_pgd, new_role))
+		if (is_root_usable(&mmu->root, new_pgd, new_role))
 			break;
 	}
 
-	mmu->root_hpa = root.hpa;
-	mmu->root_pgd = root.pgd;
-
 	return i < KVM_MMU_NUM_PREV_ROOTS;
 }
 
@@ -4175,7 +4168,7 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd,
 	 */
 	if (!new_role.direct)
 		__clear_sp_write_flooding_count(
-				to_shadow_page(vcpu->arch.mmu->root_hpa));
+				to_shadow_page(vcpu->arch.mmu->root.hpa));
 }
 
 void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
@@ -5071,7 +5064,7 @@ static void __kvm_mmu_unload(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
 {
 	int i;
 	kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOTS_ALL);
-	WARN_ON(VALID_PAGE(mmu->root_hpa));
+	WARN_ON(VALID_PAGE(mmu->root.hpa));
 	if (mmu->pae_root) {
 		for (i = 0; i < 4; ++i)
WARN_ON(IS_VALID_PAE_ROOT(mmu->pae_root[i])); @@ -5266,7 +5259,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t c= r2_or_gpa, u64 error_code, int r, emulation_type =3D EMULTYPE_PF; bool direct =3D vcpu->arch.mmu->direct_map; =20 - if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) + if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa))) return RET_PF_RETRY; =20 r =3D RET_PF_INVALID; @@ -5338,7 +5331,7 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, st= ruct kvm_mmu *mmu, return; =20 if (root_hpa =3D=3D INVALID_PAGE) { - mmu->invlpg(vcpu, gva, mmu->root_hpa); + mmu->invlpg(vcpu, gva, mmu->root.hpa); =20 /* * INVLPG is required to invalidate any global mappings for the VA, @@ -5374,7 +5367,7 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t= gva, unsigned long pcid) uint i; =20 if (pcid =3D=3D kvm_get_active_pcid(vcpu)) { - mmu->invlpg(vcpu, gva, mmu->root_hpa); + mmu->invlpg(vcpu, gva, mmu->root.hpa); tlb_flush =3D true; } =20 @@ -5487,8 +5480,8 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, st= ruct kvm_mmu *mmu) struct page *page; int i; =20 - mmu->root_hpa =3D INVALID_PAGE; - mmu->root_pgd =3D 0; + mmu->root.hpa =3D INVALID_PAGE; + mmu->root.pgd =3D 0; for (i =3D 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) mmu->prev_roots[i] =3D KVM_MMU_ROOT_INFO_INVALID; =20 diff --git a/arch/x86/kvm/mmu/mmu_audit.c b/arch/x86/kvm/mmu/mmu_audit.c index f31fdb874f1f..3e5d62a25350 100644 --- a/arch/x86/kvm/mmu/mmu_audit.c +++ b/arch/x86/kvm/mmu/mmu_audit.c @@ -56,11 +56,11 @@ static void mmu_spte_walk(struct kvm_vcpu *vcpu, inspec= t_spte_fn fn) int i; struct kvm_mmu_page *sp; =20 - if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) + if (!VALID_PAGE(vcpu->arch.mmu->root.hpa)) return; =20 if (vcpu->arch.mmu->root_level >=3D PT64_ROOT_4LEVEL) { - hpa_t root =3D vcpu->arch.mmu->root_hpa; + hpa_t root =3D vcpu->arch.mmu->root.hpa; =20 sp =3D to_shadow_page(root); __mmu_spte_walk(vcpu, sp, fn, vcpu->arch.mmu->root_level); diff --git a/arch/x86/kvm/mmu/paging_tmpl.h 
b/arch/x86/kvm/mmu/paging_tmpl.h index 5b5bdac97c7b..346f3bad3cb9 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -668,7 +668,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault, if (FNAME(gpte_changed)(vcpu, gw, top_level)) goto out_gpte_changed; =20 - if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) + if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa))) goto out_gpte_changed; =20 for (shadow_walk_init(&it, vcpu, fault->addr); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 8def8f810cb0..debf08212f12 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -657,7 +657,7 @@ static inline void tdp_mmu_set_spte_no_dirty_log(struct= kvm *kvm, else =20 #define tdp_mmu_for_each_pte(_iter, _mmu, _start, _end) \ - for_each_tdp_pte(_iter, to_shadow_page(_mmu->root_hpa), _start, _end) + for_each_tdp_pte(_iter, to_shadow_page(_mmu->root.hpa), _start, _end) =20 /* * Yield if the MMU lock is contended or this thread needs to return contr= ol diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h index 3f987785702a..57c73d8f76ce 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.h +++ b/arch/x86/kvm/mmu/tdp_mmu.h @@ -95,7 +95,7 @@ static inline bool is_tdp_mmu_page(struct kvm_mmu_page *s= p) { return sp->tdp_mmu static inline bool is_tdp_mmu(struct kvm_mmu *mmu) { struct kvm_mmu_page *sp; - hpa_t hpa =3D mmu->root_hpa; + hpa_t hpa =3D mmu->root.hpa; =20 if (WARN_ON(!VALID_PAGE(hpa))) return false; diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index c73e4d938ddc..29289ecca223 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -5466,7 +5466,7 @@ static int handle_invept(struct kvm_vcpu *vcpu) VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID); =20 roots_to_free =3D 0; - if (nested_ept_root_matches(mmu->root_hpa, mmu->root_pgd, + if (nested_ept_root_matches(mmu->root.hpa, mmu->root.pgd, operand.eptp)) roots_to_free |=3D 
KVM_MMU_ROOT_CURRENT; =20 diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 70e7f00362bc..5542a2b536e0 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2957,7 +2957,7 @@ static inline int vmx_get_current_vpid(struct kvm_vcp= u *vcpu) static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu) { struct kvm_mmu *mmu =3D vcpu->arch.mmu; - u64 root_hpa =3D mmu->root_hpa; + u64 root_hpa =3D mmu->root.hpa; =20 /* No flush required if the current context is invalid. */ if (!VALID_PAGE(root_hpa)) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2685fb62807e..0d3646535cc5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -762,7 +762,7 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vc= pu, if ((fault->error_code & PFERR_PRESENT_MASK) && !(fault->error_code & PFERR_RSVD_MASK)) kvm_mmu_invalidate_gva(vcpu, fault_mmu, fault->address, - fault_mmu->root_hpa); + fault_mmu->root.hpa); =20 fault_mmu->inject_page_fault(vcpu, fault); return fault->nested_page_fault; --=20 2.31.1 From nobody Mon Jun 29 12:39:09 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F280AC433F5 for ; Wed, 9 Feb 2022 17:01:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231791AbiBIRBF (ORCPT ); Wed, 9 Feb 2022 12:01:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33946 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237624AbiBIRAy (ORCPT ); Wed, 9 Feb 2022 12:00:54 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 92A3CC05CB87 for ; Wed, 9 Feb 2022 09:00:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; 
t=1644426056; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UmiLwad0yNHqYcNsluzF3JRoNRWaXkEyVI1t62KrVok=; b=OiD7WnRIDmh4DeDoLG2p0w/9q7nJuo6lApc2+3E3e1OF4wDPcGF9PFoIMufUPfrt7xqOAD PJ8ZxuLYnPyFx5IxjHOaMPB+IQCQ/u45NO/QeOSVeWlaaMPbCLj5az6wJyw5CW0peCH/K6 feuP+b1eQU8e8AnDfm2RrpQQidCgObY= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-462-U5WUf8sOO3O9cGvcDwstzQ-1; Wed, 09 Feb 2022 12:00:55 -0500 X-MC-Unique: U5WUf8sOO3O9cGvcDwstzQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id F0EBB101F7A1; Wed, 9 Feb 2022 17:00:52 +0000 (UTC) Received: from virtlab701.virt.lab.eng.bos.redhat.com (virtlab701.virt.lab.eng.bos.redhat.com [10.19.152.228]) by smtp.corp.redhat.com (Postfix) with ESMTP id 754F47CD66; Wed, 9 Feb 2022 17:00:52 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com Subject: [PATCH 08/12] KVM: MMU: do not consult levels when freeing roots Date: Wed, 9 Feb 2022 12:00:16 -0500 Message-Id: <20220209170020.1775368-9-pbonzini@redhat.com> In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com> References: <20220209170020.1775368-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Right now, PGD caching requires a complicated dance of first computing the MMU 
role and passing it to __kvm_mmu_new_pgd, and then separately calli= ng kvm_init_mmu. Part of this is due to kvm_mmu_free_roots using mmu->root_level and mmu->shadow_root_level to distinguish whether the page table uses a single root or 4 PAE roots. Because kvm_init_mmu can overwrite mmu->root_level, kvm_mmu_free_roots must be called before kvm_init_mmu. However, even after kvm_init_mmu there is a way to detect whether the page = table has a single root or four, because the pae_root does not have an associated struct kvm_mmu_page. Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 3c3f597ea00d..95d0fa0bb876 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3219,12 +3219,15 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, stru= ct kvm_mmu *mmu, struct kvm *kvm =3D vcpu->kvm; int i; LIST_HEAD(invalid_list); - bool free_active_root =3D roots_to_free & KVM_MMU_ROOT_CURRENT; + bool free_active_root; =20 BUILD_BUG_ON(KVM_MMU_NUM_PREV_ROOTS >=3D BITS_PER_LONG); =20 /* Before acquiring the MMU lock, see if we need to do any real work. 
*/ - if (!(free_active_root && VALID_PAGE(mmu->root.hpa))) { + free_active_root =3D (roots_to_free & KVM_MMU_ROOT_CURRENT) + && VALID_PAGE(mmu->root.hpa); + + if (!free_active_root) { for (i =3D 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) if ((roots_to_free & KVM_MMU_ROOT_PREVIOUS(i)) && VALID_PAGE(mmu->prev_roots[i].hpa)) @@ -3242,8 +3245,7 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct= kvm_mmu *mmu, &invalid_list); =20 if (free_active_root) { - if (mmu->shadow_root_level >=3D PT64_ROOT_4LEVEL && - (mmu->root_level >=3D PT64_ROOT_4LEVEL || mmu->direct_map)) { + if (to_shadow_page(mmu->root.hpa)) { mmu_free_root_page(kvm, &mmu->root.hpa, &invalid_list); } else if (mmu->pae_root) { for (i =3D 0; i < 4; ++i) { --=20 2.31.1 From nobody Mon Jun 29 12:39:09 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7D6ECC433F5 for ; Wed, 9 Feb 2022 17:01:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237653AbiBIRB3 (ORCPT ); Wed, 9 Feb 2022 12:01:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237730AbiBIRA7 (ORCPT ); Wed, 9 Feb 2022 12:00:59 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 0FC83C05CB89 for ; Wed, 9 Feb 2022 09:01:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1644426061; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=q9h2kopg+//QvhPwKzfjAfT60rLQ7JGTbMd+9FyXI5k=; 
b=hmEbrXewAnqASuWuKzx6hWod6Z92RKl0XdgkGloLfrMEog15B8Ezf7BIqSOL9GPH0zM7vH XqsFKJY/0Xw3XynYAbWrZ+2tH7VuWiU1RPTJO1lREP4f1rcTf8Xst1WOsFYHaPChmlprZ5 hxpM7orZXx9hRU8XDn9u6ohoJOCogP4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-466-99PD-VGPP3C1-fXmnX7zuQ-1; Wed, 09 Feb 2022 12:00:59 -0500 X-MC-Unique: 99PD-VGPP3C1-fXmnX7zuQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 93570101F7A4; Wed, 9 Feb 2022 17:00:53 +0000 (UTC) Received: from virtlab701.virt.lab.eng.bos.redhat.com (virtlab701.virt.lab.eng.bos.redhat.com [10.19.152.228]) by smtp.corp.redhat.com (Postfix) with ESMTP id 178EE7CD6F; Wed, 9 Feb 2022 17:00:53 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com Subject: [PATCH 09/12] KVM: MMU: look for a cached PGD when going from 32-bit to 64-bit Date: Wed, 9 Feb 2022 12:00:17 -0500 Message-Id: <20220209170020.1775368-10-pbonzini@redhat.com> In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com> References: <20220209170020.1775368-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Right now, PGD caching avoids placing a PAE root in the cache by using the old value of mmu->root_level and mmu->shadow_root_level; it does not look for a cached PGD if the old root is a PAE one, and then frees it using kvm_mmu_free_roots. Change the logic instead to free the uncacheable root early. 
This way, __kvm_mmu_new_pgd is able to look up the cache when going from 32-bit to 64-bit (if there is a hit, the invalid root becomes the least recently used). An example of this is nested virtualization with shadow paging, when a 64-bit L1 runs a 32-bit L2. As a side effect (which is actually the reason why this patch was written), PGD caching does not use the old value of mmu->root_level and mmu->shadow_root_level anymore. Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 71 ++++++++++++++++++++++++++++++++---------- 1 file changed, 54 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 95d0fa0bb876..f61208ccce43 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4087,20 +4087,20 @@ static inline bool is_root_usable(struct kvm_mmu_ro= ot_info *root, gpa_t pgd, union kvm_mmu_page_role role) { return (role.direct || pgd =3D=3D root->pgd) && - VALID_PAGE(root->hpa) && to_shadow_page(root->hpa) && + VALID_PAGE(root->hpa) && role.word =3D=3D to_shadow_page(root->hpa)->role.word; } =20 /* * Find out if a previously cached root matching the new pgd/role is avail= able. - * The current root is also inserted into the cache. - * If a matching root was found, it is assigned to kvm_mmu->root.hpa and t= rue is - * returned. - * Otherwise, the LRU root from the cache is assigned to kvm_mmu->root.hpa= and - * false is returned. This root should now be freed by the caller. + * If a matching root is found, it is assigned to kvm_mmu->root and + * true is returned. + * If no match is found, the current root becomes the MRU of the cache + * if valid (thus evicting the LRU root), kvm_mmu->root is left invalid, + * and false is returned. 
*/ -static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd, - union kvm_mmu_page_role new_role) +static bool cached_root_find_and_promote(struct kvm_vcpu *vcpu, gpa_t new_= pgd, + union kvm_mmu_page_role new_role) { uint i; struct kvm_mmu *mmu =3D vcpu->arch.mmu; @@ -4109,13 +4109,48 @@ static bool cached_root_available(struct kvm_vcpu *= vcpu, gpa_t new_pgd, return true; =20 for (i =3D 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) { + /* + * The swaps end up rotating the cache like this: + * C 0 1 2 3 (on entry to the function) + * 0 C 1 2 3 + * 1 C 0 2 3 + * 2 C 0 1 3 + * 3 C 0 1 2 (on exit from the loop) + */ swap(mmu->root, mmu->prev_roots[i]); - if (is_root_usable(&mmu->root, new_pgd, new_role)) - break; + return true; } =20 - return i < KVM_MMU_NUM_PREV_ROOTS; + kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT); + return false; +} + +/* + * Find out if a previously cached root matching the new pgd/role is avail= able. + * If a matching root is found, it is assigned to kvm_mmu->root and true + * is returned. The current, invalid root goes to the bottom of the cache. + * If no match is found, kvm_mmu->root is left invalid and false is return= ed. + */ +static bool cached_root_find_and_replace(struct kvm_vcpu *vcpu, gpa_t new_= pgd, + union kvm_mmu_page_role new_role) +{ + uint i; + struct kvm_mmu *mmu =3D vcpu->arch.mmu; + + for (i =3D 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) + if (is_root_usable(&mmu->prev_roots[i], new_pgd, new_role)) + goto hit; + + return false; + +hit: + swap(mmu->root, mmu->prev_roots[i]); + /* Bubble up the remaining roots. 
*/ + for (; i < KVM_MMU_NUM_PREV_ROOTS - 1; i++) + mmu->prev_roots[i] =3D mmu->prev_roots[i + 1]; + mmu->prev_roots[i].hpa =3D INVALID_PAGE; + return true; } =20 static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd, @@ -4124,22 +4159,24 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, = gpa_t new_pgd, struct kvm_mmu *mmu =3D vcpu->arch.mmu; =20 /* - * For now, limit the fast switch to 64-bit hosts+VMs in order to avoid + * For now, limit the caching to 64-bit hosts+VMs in order to avoid * having to deal with PDPTEs. We may add support for 32-bit hosts/VMs * later if necessary. */ - if (mmu->shadow_root_level >=3D PT64_ROOT_4LEVEL && - mmu->root_level >=3D PT64_ROOT_4LEVEL) - return cached_root_available(vcpu, new_pgd, new_role); + if (VALID_PAGE(mmu->root.hpa) && !to_shadow_page(mmu->root.hpa)) + kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT); =20 - return false; + if (VALID_PAGE(mmu->root.hpa)) + return cached_root_find_and_promote(vcpu, new_pgd, new_role); + else + return cached_root_find_and_replace(vcpu, new_pgd, new_role); } =20 static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, union kvm_mmu_page_role new_role) { if (!fast_pgd_switch(vcpu, new_pgd, new_role)) { - kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT); + /* kvm_mmu_ensure_valid_pgd will set up a new root. 
*/ return; } =20 --=20 2.31.1 From nobody Mon Jun 29 12:39:09 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7EDE6C433F5 for ; Wed, 9 Feb 2022 17:01:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237825AbiBIRB0 (ORCPT ); Wed, 9 Feb 2022 12:01:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33970 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237704AbiBIRA6 (ORCPT ); Wed, 9 Feb 2022 12:00:58 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id E9902C05CB88 for ; Wed, 9 Feb 2022 09:01:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1644426061; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tR1GvTEwRwp3TbS3xow6kXaZGlbaFMOGHUB9ngIKM98=; b=hCi2g5gH8Rwp0TPgvGBQWjsfmCVY282iH+t53Xfju5oW4bBF3q9VIDW1NjFu8WXlDEi3sz FCpa2ud/mDjn+vRjk+gY1CTzfND+7VDjL2/0VsFwq1Sxxn4sVhuqWLzEQ5JlpB+e50EsCd 75Q4BP6jHAJ7lvrFOozXwhrt/d0cIGE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-615-vilrr8bKP26rBZunguo2ng-1; Wed, 09 Feb 2022 12:00:57 -0500 X-MC-Unique: vilrr8bKP26rBZunguo2ng-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B7B9992504; 
Wed, 9 Feb 2022 17:00:54 +0000 (UTC) Received: from virtlab701.virt.lab.eng.bos.redhat.com (virtlab701.virt.lab.eng.bos.redhat.com [10.19.152.228]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3966574E8C; Wed, 9 Feb 2022 17:00:54 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com Subject: [PATCH 10/12] KVM: MMU: load new PGD after the shadow MMU is initialized Date: Wed, 9 Feb 2022 12:00:18 -0500 Message-Id: <20220209170020.1775368-11-pbonzini@redhat.com> In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com> References: <20220209170020.1775368-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Now that __kvm_mmu_new_pgd does not look at the MMU's root_level and shadow_root_level anymore, pull the PGD load after the initialization of the shadow MMUs. Besides being more intuitive, this enables future simplifications and optimizations because it's not necessary anymore to compute the role outside kvm_init_mmu. In particular, kvm_mmu_reset_context was not attempting to use a cached PGD to avoid having to figure out the new role. It will soon be able to follow what nested_{vmx,svm}_load_cr3 are doing, and avoid unloading all the cached roots. 
Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 37 +++++++++++++++++-------------------- arch/x86/kvm/svm/nested.c | 6 +++--- arch/x86/kvm/vmx/nested.c | 6 +++--- 3 files changed, 23 insertions(+), 26 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index f61208ccce43..df9e0a43513c 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4882,9 +4882,8 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, u= nsigned long cr0, =20 new_role =3D kvm_calc_shadow_npt_root_page_role(vcpu, &regs); =20 - __kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base); - shadow_mmu_init_context(vcpu, context, &regs, new_role); + __kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base); } EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu); =20 @@ -4922,27 +4921,25 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu,= bool execonly, kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty, execonly, level); =20 - __kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base); - - if (new_role.as_u64 =3D=3D context->mmu_role.as_u64) - return; - - context->mmu_role.as_u64 =3D new_role.as_u64; + if (new_role.as_u64 !=3D context->mmu_role.as_u64) { + context->mmu_role.as_u64 =3D new_role.as_u64; =20 - context->shadow_root_level =3D level; + context->shadow_root_level =3D level; =20 - context->ept_ad =3D accessed_dirty; - context->page_fault =3D ept_page_fault; - context->gva_to_gpa =3D ept_gva_to_gpa; - context->sync_page =3D ept_sync_page; - context->invlpg =3D ept_invlpg; - context->root_level =3D level; - context->direct_map =3D false; + context->ept_ad =3D accessed_dirty; + context->page_fault =3D ept_page_fault; + context->gva_to_gpa =3D ept_gva_to_gpa; + context->sync_page =3D ept_sync_page; + context->invlpg =3D ept_invlpg; + context->root_level =3D level; + context->direct_map =3D false; + update_permission_bitmask(context, true); + context->pkru_mask =3D 0; + reset_rsvds_bits_mask_ept(vcpu, context, execonly, huge_page_level); + reset_ept_shadow_zero_bits_mask(context, 
execonly); + } =20 - update_permission_bitmask(context, true); - context->pkru_mask =3D 0; - reset_rsvds_bits_mask_ept(vcpu, context, execonly, huge_page_level); - reset_ept_shadow_zero_bits_mask(context, execonly); + __kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base); } EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu); =20 diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index f284e61451c8..96bab464967f 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -492,14 +492,14 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu,= unsigned long cr3, CC(!load_pdptrs(vcpu, cr3))) return -EINVAL; =20 - if (!nested_npt) - kvm_mmu_new_pgd(vcpu, cr3); - vcpu->arch.cr3 =3D cr3; =20 /* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */ kvm_init_mmu(vcpu); =20 + if (!nested_npt) + kvm_mmu_new_pgd(vcpu, cr3); + return 0; } =20 diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 29289ecca223..abfcd71f787f 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -1126,15 +1126,15 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcp= u, unsigned long cr3, return -EINVAL; } =20 - if (!nested_ept) - kvm_mmu_new_pgd(vcpu, cr3); - vcpu->arch.cr3 =3D cr3; kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3); =20 /* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. 
*/ kvm_init_mmu(vcpu); =20 + if (!nested_ept) + kvm_mmu_new_pgd(vcpu, cr3); + return 0; } =20 --=20 2.31.1 From nobody Mon Jun 29 12:39:09 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38F21C433F5 for ; Wed, 9 Feb 2022 17:01:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237753AbiBIRBQ (ORCPT ); Wed, 9 Feb 2022 12:01:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33992 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237681AbiBIRAz (ORCPT ); Wed, 9 Feb 2022 12:00:55 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 1E60BC05CB88 for ; Wed, 9 Feb 2022 09:00:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1644426058; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IQ8QeMPdVkeRP9kEW9HbE/igjhwB/TDxHP64swgeSBk=; b=FCnAENX1nraIheQFZ93wVJuV4ghcYzv0DaXIqnybcOOrSDluw/d3eGGihDNMtiYScoia1T MvO5L4fe0r/b6BNJYAAHsNjdBdMpDUpFHdINKhQQUgyxAi8XIFC3BKN/UYK7y2J/ULyb0R nP0TSdRSjp7IFtaBBM1zkgB7ZlvIXtQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-211-fVeOpJBmPhOlv92GnKX_mg-1; Wed, 09 Feb 2022 12:00:57 -0500 X-MC-Unique: fVeOpJBmPhOlv92GnKX_mg-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate 
requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7F7DE92500; Wed, 9 Feb 2022 17:00:55 +0000 (UTC) Received: from virtlab701.virt.lab.eng.bos.redhat.com (virtlab701.virt.lab.eng.bos.redhat.com [10.19.152.228]) by smtp.corp.redhat.com (Postfix) with ESMTP id D122674E8C; Wed, 9 Feb 2022 17:00:54 +0000 (UTC) From: Paolo Bonzini To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com Subject: [PATCH 11/12] KVM: MMU: remove kvm_mmu_calc_root_page_role Date: Wed, 9 Feb 2022 12:00:19 -0500 Message-Id: <20220209170020.1775368-12-pbonzini@redhat.com> In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com> References: <20220209170020.1775368-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Since the guest PGD is now loaded after the MMU has been set up completely, the desired role for a cache hit is simply the current mmu_role. There is no need to compute it again, so __kvm_mmu_new_pgd can be folded into kvm_mmu_new_pgd. For the !tdp_enabled case, it would also have been possible to use the role that is already in vcpu->arch.mmu. 
Signed-off-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 29 ++++------------------------- 1 file changed, 4 insertions(+), 25 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index df9e0a43513c..38b40ddcaad7 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -190,8 +190,6 @@ struct kmem_cache *mmu_page_header_cache; static struct percpu_counter kvm_total_used_mmu_pages; =20 static void mmu_spte_set(u64 *sptep, u64 spte); -static union kvm_mmu_page_role -kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu); =20 struct kvm_mmu_role_regs { const unsigned long cr0; @@ -4172,9 +4170,9 @@ static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gp= a_t new_pgd, return cached_root_find_and_replace(vcpu, new_pgd, new_role); } =20 -static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, - union kvm_mmu_page_role new_role) +void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd) { + union kvm_mmu_page_role new_role =3D vcpu->arch.mmu->mmu_role.base; if (!fast_pgd_switch(vcpu, new_pgd, new_role)) { /* kvm_mmu_ensure_valid_pgd will set up a new root. 
*/ return; @@ -4209,11 +4207,6 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu,= gpa_t new_pgd, __clear_sp_write_flooding_count( to_shadow_page(vcpu->arch.mmu->root.hpa)); } - -void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd) -{ - __kvm_mmu_new_pgd(vcpu, new_pgd, kvm_mmu_calc_root_page_role(vcpu)); -} EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd); =20 static unsigned long get_cr3(struct kvm_vcpu *vcpu) @@ -4883,7 +4876,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, u= nsigned long cr0, new_role =3D kvm_calc_shadow_npt_root_page_role(vcpu, ®s); =20 shadow_mmu_init_context(vcpu, context, ®s, new_role); - __kvm_mmu_new_pgd(vcpu, nested_cr3, new_role.base); + kvm_mmu_new_pgd(vcpu, nested_cr3); } EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu); =20 @@ -4939,7 +4932,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, b= ool execonly, reset_ept_shadow_zero_bits_mask(context, execonly); } =20 - __kvm_mmu_new_pgd(vcpu, new_eptp, new_role.base); + kvm_mmu_new_pgd(vcpu, new_eptp); } EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu); =20 @@ -5024,20 +5017,6 @@ void kvm_init_mmu(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_init_mmu); =20 -static union kvm_mmu_page_role -kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu) -{ - struct kvm_mmu_role_regs regs =3D vcpu_to_role_regs(vcpu); - union kvm_mmu_role role; - - if (tdp_enabled) - role =3D kvm_calc_tdp_mmu_root_page_role(vcpu, ®s, true); - else - role =3D kvm_calc_shadow_mmu_root_page_role(vcpu, ®s, true); - - return role.base; -} - void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu) { /* --=20 2.31.1 From nobody Mon Jun 29 12:39:09 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDD28C433FE for ; Wed, 9 Feb 2022 17:01:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237820AbiBIRBU (ORCPT ); 
	Wed, 9 Feb 2022 12:01:20 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33972 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S237698AbiBIRA5 (ORCPT );
	Wed, 9 Feb 2022 12:00:57 -0500
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124])
	by lindbergh.monkeyblade.net (Postfix) with ESMTP id 425C6C05CB82
	for ; Wed, 9 Feb 2022 09:01:00 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1644426059;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=N3kTVV9G0T5QAqpQwcJtBoAGDXQI3g2qBn7ZlIGXs3M=;
	b=D9W9Va7zK5jx/3PY5cSFgu+yVaN5KzV/ZgMfGfIVSNLZtCegJ58ijGgsuIp+DzdQz9V0NP
	N73k9VHz/QigSp6h0CG+u014/Vt8xqcb/34mfUewdTioqk01EkhHWHxyw4+K6zpkYHMOlL
	5VTCidfKgyxm7TRjdG82ljjBEn0opsY=
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4])
	by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2,
	cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
	id us-mta-304-d-POX9QgMvCtU3wFX26-Cw-1; Wed, 09 Feb 2022 12:00:57 -0500
X-MC-Unique: d-POX9QgMvCtU3wFX26-Cw-1
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 2269818397D4;
	Wed, 9 Feb 2022 17:00:56 +0000 (UTC)
Received: from virtlab701.virt.lab.eng.bos.redhat.com (virtlab701.virt.lab.eng.bos.redhat.com [10.19.152.228])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 998C5708F1;
	Wed, 9 Feb 2022 17:00:55 +0000 (UTC)
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: vkuznets@redhat.com, mlevitsk@redhat.com, dmatlack@google.com, seanjc@google.com
Subject: [PATCH 12/12] KVM: x86: do not unload MMU roots on all role changes
Date: Wed, 9 Feb 2022 12:00:20 -0500
Message-Id: <20220209170020.1775368-13-pbonzini@redhat.com>
In-Reply-To: <20220209170020.1775368-1-pbonzini@redhat.com>
References: <20220209170020.1775368-1-pbonzini@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

kvm_mmu_reset_context is called on all role changes and right now it
calls kvm_mmu_unload. With the legacy MMU this is a relatively cheap
operation; the previous PGDs remain in the hash table and are picked
up immediately on the next page fault. With the TDP MMU, however, the
roots are thrown away for good and a full rebuild of the page tables is
necessary, which is many times more expensive.

Fortunately, throwing away the roots is not necessary except when
the manual says a TLB flush is required:

- changing CR0.PG from 1 to 0 (because it flushes the TLB according
  to the x86 architecture specification)

- changing CPUID (which changes the interpretation of page tables in
  ways not reflected by the role)

- changing CR4.SMEP from 0 to 1 (not doing so actually breaks access.c!)

Except for these cases, once the MMU has updated the CPU/MMU roles
and metadata it is enough to force-reload the current value of CR3.
KVM will look up the cached roots for an entry with the right role and
PGD, and only if the cache misses will a new root be created.

Measuring with vmexit.flat from kvm-unit-tests shows the following
improvement:

             TDP      legacy   shadow
   before    46754    5096     5150
   after     4879     4875     5006

These numbers are for very small page tables. The impact is however
much larger when running as an L1 hypervisor, because the new page
tables cause extra work for L0 to shadow them.
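The decision logic described above can be sketched as a small, self-contained
C model. This is a hedged illustration rather than the kernel code: the
toy_* names, the action flags and the reduced role masks TOY_CR0_ROLE_BITS
and TOY_CR4_ROLE_BITS are assumptions made for the example (the masks KVM
actually uses are not reproduced here). Only the bit positions of CR0.WP,
CR0.PG, CR4.PAE, CR4.PGE, CR4.PCIDE and CR4.SMEP are architectural.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 control-register bit positions. */
#define TOY_CR0_WP	(UINT64_C(1) << 16)
#define TOY_CR0_PG	(UINT64_C(1) << 31)
#define TOY_CR4_PAE	(UINT64_C(1) << 5)
#define TOY_CR4_PGE	(UINT64_C(1) << 7)
#define TOY_CR4_PCIDE	(UINT64_C(1) << 17)
#define TOY_CR4_SMEP	(UINT64_C(1) << 20)

/* Assumption: reduced "role bits" masks, chosen only for this sketch. */
#define TOY_CR0_ROLE_BITS	(TOY_CR0_WP | TOY_CR0_PG)
#define TOY_CR4_ROLE_BITS	(TOY_CR4_PAE | TOY_CR4_SMEP)

/* Hypothetical action flags; a result may combine several of them. */
enum toy_action {
	TOY_NOTHING	  = 0,
	TOY_RESET_CONTEXT = 1 << 0,	/* recompute roles, reload CR3 via the root cache */
	TOY_UNLOAD_ROOTS  = 1 << 1,	/* throw away all roots before resetting */
	TOY_FLUSH_GUEST	  = 1 << 2,	/* flush the guest TLB */
	TOY_MMU_RELOAD	  = 1 << 3,	/* slow path used for CR4.PCIDE changes */
};

static unsigned int toy_post_set_cr0(uint64_t old_cr0, uint64_t cr0)
{
	unsigned int actions = TOY_NOTHING;

	if ((cr0 ^ old_cr0) & TOY_CR0_ROLE_BITS) {
		/* Roots are only dropped when paging goes from on to off. */
		if ((old_cr0 & TOY_CR0_PG) && !(cr0 & TOY_CR0_PG))
			actions |= TOY_UNLOAD_ROOTS;
		actions |= TOY_RESET_CONTEXT;
	}
	return actions;
}

static unsigned int toy_post_set_cr4(uint64_t old_cr4, uint64_t cr4)
{
	if ((cr4 ^ old_cr4) & TOY_CR4_ROLE_BITS) {
		unsigned int actions = TOY_RESET_CONTEXT;

		/* Setting SMEP (0 -> 1) additionally needs a guest TLB flush. */
		if ((cr4 & TOY_CR4_SMEP) && !(old_cr4 & TOY_CR4_SMEP))
			actions |= TOY_FLUSH_GUEST;
		return actions;
	}
	if ((cr4 ^ old_cr4) & TOY_CR4_PCIDE)
		return TOY_MMU_RELOAD;
	if ((cr4 ^ old_cr4) & TOY_CR4_PGE)
		return TOY_FLUSH_GUEST;
	return TOY_NOTHING;
}
```

For example, in this model clearing CR0.PG yields TOY_UNLOAD_ROOTS together
with TOY_RESET_CONTEXT, while toggling only CR0.WP yields TOY_RESET_CONTEXT
alone, so the cached roots stay available for the CR3 reload.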
Reported-by: Brad Spengler
Signed-off-by: Paolo Bonzini
---
 arch/x86/kvm/mmu/mmu.c |  7 ++++---
 arch/x86/kvm/x86.c     | 27 ++++++++++++++++++---------
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 38b40ddcaad7..dbd4e98ba426 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5020,8 +5020,8 @@ EXPORT_SYMBOL_GPL(kvm_init_mmu);
 void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * Invalidate all MMU roles to force them to reinitialize as CPUID
-	 * information is factored into reserved bit calculations.
+	 * Invalidate all MMU roles and roots to force them to reinitialize,
+	 * as CPUID information is factored into reserved bit calculations.
 	 *
 	 * Correctly handling multiple vCPU models with respect to paging and
 	 * physical address properties) in a single VM would require tracking
@@ -5034,6 +5034,7 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
 	vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
 	vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;
+	kvm_mmu_unload(vcpu);
 	kvm_mmu_reset_context(vcpu);
 
 	/*
@@ -5045,8 +5046,8 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
 {
-	kvm_mmu_unload(vcpu);
 	kvm_init_mmu(vcpu);
+	kvm_mmu_new_pgd(vcpu, vcpu->arch.cr3);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0d3646535cc5..97c4f5fc291f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -873,8 +873,12 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
 		kvm_async_pf_hash_reset(vcpu);
 	}
 
-	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
+	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS) {
+		/* Flush the TLB if CR0 is changed 1 -> 0. */
+		if ((old_cr0 & X86_CR0_PG) && !(cr0 & X86_CR0_PG))
+			kvm_mmu_unload(vcpu);
 		kvm_mmu_reset_context(vcpu);
+	}
 
 	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
 	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
@@ -1067,15 +1071,18 @@ void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned lon
 	 * free them all. KVM_REQ_MMU_RELOAD is fit for the both cases; it
 	 * is slow, but changing CR4.PCIDE is a rare case.
 	 *
-	 * If CR4.PGE is changed, the guest TLB must be flushed.
+	 * Setting SMEP also needs to flush the TLB, in addition to resetting
+	 * the MMU.
 	 *
-	 * Note: resetting MMU is a superset of KVM_REQ_MMU_RELOAD and
-	 * KVM_REQ_MMU_RELOAD is a superset of KVM_REQ_TLB_FLUSH_GUEST, hence
-	 * the usage of "else if".
+	 * If CR4.PGE is changed, the guest TLB must be flushed. Because
+	 * the shadow MMU ignores global pages, this bit is not part of
+	 * KVM_MMU_CR4_ROLE_BITS.
 	 */
-	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
+	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS) {
 		kvm_mmu_reset_context(vcpu);
-	else if ((cr4 ^ old_cr4) & X86_CR4_PCIDE)
+		if ((cr4 & X86_CR4_SMEP) && !(old_cr4 & X86_CR4_SMEP))
+			kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+	} else if ((cr4 ^ old_cr4) & X86_CR4_PCIDE)
 		kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
 	else if ((cr4 ^ old_cr4) & X86_CR4_PGE)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
@@ -11329,8 +11336,10 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
 	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
 	 */
-	if (old_cr0 & X86_CR0_PG)
-		kvm_mmu_reset_context(vcpu);
+	if (old_cr0 & X86_CR0_PG) {
+		kvm_mmu_unload(vcpu);
+		kvm_init_mmu(vcpu);
+	}
 
 	/*
 	 * Intel's SDM states that all TLB entries are flushed on INIT. AMD's
-- 
2.31.1