From nobody Tue Apr 16 20:22:38 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 439DBC25B08 for ; Fri, 5 Aug 2022 23:05:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241549AbiHEXFc (ORCPT ); Fri, 5 Aug 2022 19:05:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59398 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237177AbiHEXFX (ORCPT ); Fri, 5 Aug 2022 19:05:23 -0400 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6EEC21277D for ; Fri, 5 Aug 2022 16:05:22 -0700 (PDT) Received: by mail-pj1-x1049.google.com with SMTP id oo11-20020a17090b1c8b00b001f285a3bdf0so5043884pjb.5 for ; Fri, 05 Aug 2022 16:05:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:from:to:cc; bh=cVyg5lVhUGta27OyTveQuZSdsBeAtej8VaD6SjStUhY=; b=OBrKgXyIpckqO9CrRKQ6JD4KRVGxW16h0LR+qvoW/sZRw5MpW/Z9S0i8q2GProLabD U11zjkDbuoq485Mk9Gd/+WBd0Y7EDcsnfKEGq88nBSWANQXbpWNkzagKz8yzPTvahbgr Tx+W5fNIO9/woKPBCVsAR1d+3ya8ZuMcfUgqPC5wcdVXVlcbDMH07bemHy8Uc3zWnN88 yep++LgjoXTqfsrDX/l1pXFO0TlR2UuqX1RCPnZCmLK6gik8SrDD6XuOceyCetZ/Ec+S jHqPwi4f3xEfzKsUsShxmzbZT0pkQ9aGQ3hAMH6YuWAVcs2kQoxnYtEDMGqPPww34yeU wQVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc; bh=cVyg5lVhUGta27OyTveQuZSdsBeAtej8VaD6SjStUhY=; b=iJO5FVrmcFQ5FPKt1qfqLkck6LVlnMTzU1hScO+tJcw6CREPouqBrimqhId8/52MNK tpK0zLfm5Son2yMjYC1m4JhvnHAh2OMHsb/hnYEHb/Ve1HCYHPSDgsINHDcfUbcY16Pm oxagUzSzp0QfJUiNN2B6SYGkO4iPsZh66c8mYSdr0RHd3BFCSRaNBNZ0CPZvD6GByDwt pa6O4eSFHfIixEVmEg9c13p86jg9DT7S2kRXkLc9EsUzrzwMYsQBjVwWPDg4KNcIHYhI I1N8FrVJMzK8KAtUW8VEDh+3ZtTFMSSTe9J7W7dKjlfsJq1r/Yvh0lmCUOT8v9Qk7K5b HqEA== X-Gm-Message-State: ACgBeo0cl8LSBXFnhp2ubLWYrP9kDzF5MpLEgj5SiQ0FEIZnnrbWCxpy b8vEAXNE4bLLsbGafnGaShLTadiAFn8= X-Google-Smtp-Source: AA6agR6n7rwy6mGJvNzb2TddHLX9FDpvrM4ae0/FLkoU/yQ2ZlFbW1POomBPv4xPpQsAGD08UagtVoVsl78= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6a00:27a0:b0:52e:316c:851e with SMTP id bd32-20020a056a0027a000b0052e316c851emr9010748pfb.68.1659740722006; Fri, 05 Aug 2022 16:05:22 -0700 (PDT) Reply-To: Sean Christopherson Date: Fri, 5 Aug 2022 23:05:06 +0000 In-Reply-To: <20220805230513.148869-1-seanjc@google.com> Message-Id: <20220805230513.148869-2-seanjc@google.com> Mime-Version: 1.0 References: <20220805230513.148869-1-seanjc@google.com> X-Mailer: git-send-email 2.37.1.559.g78731f0fdb-goog Subject: [PATCH v3 1/8] KVM: x86/mmu: Bug the VM if KVM attempts to double count an NX huge page From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack , Yan Zhao , Mingwei Zhang , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" WARN and kill the VM if KVM attempts to double count an NX huge page, i.e. 
attempts to re-tag a shadow page with "NX huge page disallowed". KVM does NX huge page accounting only when linking a new shadow page, and it should be impossible for a new shadow page to be already accounted. E.g. even in the TDP MMU case, where vCPUs can race to install a new shadow page, only the "winner" will account the installed page. Kill the VM instead of continuing on as either KVM has an egregious bug, e.g. didn't zero-initialize the data, or there's host data corruption, in which carrying on is dangerous, e.g. could cause silent data corruption in the guest. Reported-by: David Matlack Signed-off-by: Sean Christopherson Reviewed-by: Mingwei Zhang --- arch/x86/kvm/mmu/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 3e1317325e1f..36b898dbde91 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -804,7 +804,7 @@ static void account_shadowed(struct kvm *kvm, struct kv= m_mmu_page *sp) =20 void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp) { - if (sp->lpage_disallowed) + if (KVM_BUG_ON(sp->lpage_disallowed, kvm)) return; =20 ++kvm->stat.nx_lpage_splits; --=20 2.37.1.559.g78731f0fdb-goog From nobody Tue Apr 16 20:22:38 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BD3C7C00140 for ; Fri, 5 Aug 2022 23:05:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241594AbiHEXFf (ORCPT ); Fri, 5 Aug 2022 19:05:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59556 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238675AbiHEXF0 (ORCPT ); Fri, 5 Aug 2022 19:05:26 -0400 Received: from mail-pf1-x449.google.com (mail-pf1-x449.google.com [IPv6:2607:f8b0:4864:20::449]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2C5751057C for ; Fri, 5 Aug 2022 16:05:24 -0700 (PDT) Received: by mail-pf1-x449.google.com with SMTP id j23-20020aa78017000000b0052ee4264488so442775pfi.16 for ; Fri, 05 Aug 2022 16:05:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:from:to:cc; bh=mFUF1YXrRbDDczd6BbaTXj4rrrJUlppqKSMgMMo2ORU=; b=ZBRM95/VxLzLXI5Ah3DYM++BJhXadZKNC1i6TIiU6X6PCqaQrXv+hRCY6lbBKczoqw jfG7znD0490L7D6hl8FL2APaIeRzXYc0G/ZmDvjhc4uaNz0eBPzoPBtN5nFoPzzBCxJU a4a722tNm5CSq/H5QnJuVwJaiAeWIfv3t73C2e5QDhNJdxH8IR6qoExQXpV9glvJnkqJ U/59OmTAvwUnh2lC1psVmX5ZNbQLbEwYyRvu3pMVI85PgxPooUztB+yCXUH+8pEnEVk+ Bt/h/wKaHXCMC89or55gDgXOxRXSWq8ptNwCoJ7M/bpNsEivJNMArG672+9USpTXqtgB d9Rg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc; bh=mFUF1YXrRbDDczd6BbaTXj4rrrJUlppqKSMgMMo2ORU=; b=HTZpGyQhGCheeJh+nS14bLU0GoHvrrgQC/AnM1flhYh5zoG0T8mymzr6vrQOABg/3t PkPq7aGTOAOcf8AwMp23e0uwnFO4ygPGmQFYyptKd/1cSZ443v3PUR/mpveLTRffds+t 6UDI/EjyL4HmUYIGKL+xvclr7xHY/yxCGANZVFlf7Iv+zzUyt3e5lY/2rP4MDpXtvzUm LSde7u3mwgDaiMXloIKQ3/kc2W7UsfCnejH4hI+k/6xuzU338FlPgpA1+fwMcyGyUqFN f9lWuOdo1gpLwHV+SgpHjsg1jbFQ5UT5V5KFcsNM/2U5R9chya35q2hLQnaMKUmHc/Qa SbJQ== X-Gm-Message-State: ACgBeo3xEmebvsdOA5l6YjegiMvn05gyTWzAZyA/3iAMkmnn7qIxi+Gd W5/2LOSJD/hmvhlXtFUFzBMsdvI1C18= X-Google-Smtp-Source: 
AA6agR7xf2hc36UYUl74VM07zEr3ZgeM0ZUqWOJI0fG7BHFwTzBnEVpoKnh9bd/zQtvNo9AAlzrqd8zhvoU= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:90a:d3d8:b0:1f2:2cd:a1ca with SMTP id d24-20020a17090ad3d800b001f202cda1camr18446417pjw.135.1659740723749; Fri, 05 Aug 2022 16:05:23 -0700 (PDT) Reply-To: Sean Christopherson Date: Fri, 5 Aug 2022 23:05:07 +0000 In-Reply-To: <20220805230513.148869-1-seanjc@google.com> Message-Id: <20220805230513.148869-3-seanjc@google.com> Mime-Version: 1.0 References: <20220805230513.148869-1-seanjc@google.com> X-Mailer: git-send-email 2.37.1.559.g78731f0fdb-goog Subject: [PATCH v3 2/8] KVM: x86/mmu: Tag disallowed NX huge pages even if they're not tracked From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack , Yan Zhao , Mingwei Zhang , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Tag shadow pages that cannot be replaced with an NX huge page regardless of whether or not zapping the page would allow KVM to immediately create a huge page, e.g. because something else prevents creating a huge page. I.e. track pages that are disallowed from being NX huge pages regardless of whether or not the page could have been huge at the time of fault. KVM currently tracks pages that were disallowed from being huge due to the NX workaround if and only if the page could otherwise be huge. But that fails to handle the scenario where whatever restriction prevented KVM from installing a huge page goes away, e.g. if dirty logging is disabled, the host mapping level changes, etc... Failure to tag shadow pages appropriately could theoretically lead to false negatives, e.g. if a fetch fault requests a small page and thus isn't tracked, and a read/write fault later requests a huge page, KVM will not reject the huge page as it should. To avoid yet another flag, initialize the list_head and use list_empty() to determine whether or not a page is on the list of NX huge pages that should be recovered. Note, the TDP MMU accounting is still flawed as fixing the TDP MMU is more involved due to mmu_lock being held for read. This will be addressed in a future commit.
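For context, the list_empty() trick described above works because an initialized list_head points at itself; as long as removal uses list_del_init() rather than list_del(), a node's own list_head doubles as an "is this page being tracked?" flag, with no separate bool needed. Below is a minimal standalone sketch of that idiom with simplified re-implementations of the kernel list helpers; it is not the actual KVM code, and the names (struct sp, recovery_list) are purely illustrative:

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);	/* points at itself again => "not on any list" */
}

/* Illustrative shadow-page stand-in; only the link field matters here. */
struct sp { struct list_head link; };

int main(void)
{
	struct list_head recovery_list;
	struct sp sp;

	INIT_LIST_HEAD(&recovery_list);
	INIT_LIST_HEAD(&sp.link);

	/* Not tracked yet: the node's list_head points at itself. */
	printf("tracked: %d\n", !list_empty(&sp.link));	/* 0 */

	list_add_tail(&sp.link, &recovery_list);
	printf("tracked: %d\n", !list_empty(&sp.link));	/* 1 */

	list_del_init(&sp.link);
	printf("tracked: %d\n", !list_empty(&sp.link));	/* 0 */
	return 0;
}

This is why the patch below initializes the link in the shadow-page init paths and switches the removal side to list_del_init(): the link's state alone tells unaccount_huge_nx_page() whether there is anything to undo.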
Fixes: 5bcaf3e1715f ("KVM: x86/mmu: Account NX huge page disallowed iff hug= e page was requested") Signed-off-by: Sean Christopherson Reviewed-by: Mingwei Zhang --- arch/x86/kvm/mmu/mmu.c | 27 +++++++++++++++++++-------- arch/x86/kvm/mmu/mmu_internal.h | 10 +++++++++- arch/x86/kvm/mmu/paging_tmpl.h | 6 +++--- arch/x86/kvm/mmu/tdp_mmu.c | 4 +++- 4 files changed, 34 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 36b898dbde91..55dac44f3397 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -802,15 +802,20 @@ static void account_shadowed(struct kvm *kvm, struct = kvm_mmu_page *sp) kvm_flush_remote_tlbs_with_address(kvm, gfn, 1); } =20 -void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp) +void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp, + bool nx_huge_page_possible) { - if (KVM_BUG_ON(sp->lpage_disallowed, kvm)) + if (KVM_BUG_ON(!list_empty(&sp->lpage_disallowed_link), kvm)) + return; + + sp->lpage_disallowed =3D true; + + if (!nx_huge_page_possible) return; =20 ++kvm->stat.nx_lpage_splits; list_add_tail(&sp->lpage_disallowed_link, &kvm->arch.lpage_disallowed_mmu_pages); - sp->lpage_disallowed =3D true; } =20 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp) @@ -832,9 +837,13 @@ static void unaccount_shadowed(struct kvm *kvm, struct= kvm_mmu_page *sp) =20 void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp) { - --kvm->stat.nx_lpage_splits; sp->lpage_disallowed =3D false; - list_del(&sp->lpage_disallowed_link); + + if (list_empty(&sp->lpage_disallowed_link)) + return; + + --kvm->stat.nx_lpage_splits; + list_del_init(&sp->lpage_disallowed_link); } =20 static struct kvm_memory_slot * @@ -2115,6 +2124,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page= (struct kvm *kvm, =20 set_page_private(virt_to_page(sp->spt), (unsigned long)sp); =20 + INIT_LIST_HEAD(&sp->lpage_disallowed_link); + /* * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages() * depends on valid pages being added to the head of the list. See @@ -3112,9 +3123,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct= kvm_page_fault *fault) continue; =20 link_shadow_page(vcpu, it.sptep, sp); - if (fault->is_tdp && fault->huge_page_disallowed && - fault->req_level >=3D it.level) - account_huge_nx_page(vcpu->kvm, sp); + if (fault->is_tdp && fault->huge_page_disallowed) + account_huge_nx_page(vcpu->kvm, sp, + fault->req_level >=3D it.level); } =20 if (WARN_ON_ONCE(it.level !=3D fault->goal_level)) diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index 582def531d4d..cca1ad75d096 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -100,6 +100,13 @@ struct kvm_mmu_page { }; }; =20 + /* + * Tracks shadow pages that, if zapped, would allow KVM to create an NX + * huge page. A shadow page will have lpage_disallowed set but not be + * on the list if a huge page is disallowed for other reasons, e.g. + * because KVM is shadowing a PTE at the same gfn, the memslot isn't + * properly aligned, etc... 
+ */ struct list_head lpage_disallowed_link; #ifdef CONFIG_X86_32 /* @@ -315,7 +322,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *= fault, u64 spte, int cur_ =20 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); =20 -void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp); +void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp, + bool nx_huge_page_possible); void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp); =20 #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index f5958071220c..e450f49f2225 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -713,9 +713,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault, continue; =20 link_shadow_page(vcpu, it.sptep, sp); - if (fault->huge_page_disallowed && - fault->req_level >=3D it.level) - account_huge_nx_page(vcpu->kvm, sp); + if (fault->huge_page_disallowed) + account_huge_nx_page(vcpu->kvm, sp, + fault->req_level >=3D it.level); } =20 if (WARN_ON_ONCE(it.level !=3D fault->goal_level)) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index bf2ccf9debca..903d0d3497b6 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -284,6 +284,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm= _vcpu *vcpu) static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep, gfn_t gfn, union kvm_mmu_page_role role) { + INIT_LIST_HEAD(&sp->lpage_disallowed_link); + set_page_private(virt_to_page(sp->spt), (unsigned long)sp); =20 sp->role =3D role; @@ -1130,7 +1132,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct td= p_iter *iter, spin_lock(&kvm->arch.tdp_mmu_pages_lock); list_add(&sp->link, &kvm->arch.tdp_mmu_pages); if (account_nx) - account_huge_nx_page(kvm, sp); + account_huge_nx_page(kvm, sp, true); spin_unlock(&kvm->arch.tdp_mmu_pages_lock); =20 return 0; --=20 2.37.1.559.g78731f0fdb-goog From nobody Tue Apr 16 20:22:38 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C37ACC00140 for ; Fri, 5 Aug 2022 23:05:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241612AbiHEXFk (ORCPT ); Fri, 5 Aug 2022 19:05:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59626 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241504AbiHEXF1 (ORCPT ); Fri, 5 Aug 2022 19:05:27 -0400 Received: from mail-pl1-x64a.google.com (mail-pl1-x64a.google.com [IPv6:2607:f8b0:4864:20::64a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 48AE715A03 for ; Fri, 5 Aug 2022 16:05:26 -0700 (PDT) Received: by mail-pl1-x64a.google.com with SMTP id l11-20020a170902f68b00b0016ee1c2c0deso2321159plg.2 for ; Fri, 05 Aug 2022 16:05:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:from:to:cc; bh=+l6bBcXTApvZtl0Y2+jf0FOOSqJKr9DboP5GF6rCibM=; b=PUO142F+86e5XUxBJdXC55zmshXQD4ljCn0pvKXpz3CS8bF8PIRoMWkN/Q0PhRJ6l7 xPmKOM6Phk2GyF7LsmRBkXFpVfC3xH8qYVZXd0EG2KGV55fPXGaMMMbg8IqrH+QgAgCC j1tQsgC5xVG5f/GA47k1G7hRbMRBGrNtKLLoKWsTTRT5My+9FN/T5iMrusdXMyODijzV s2YZDX75oi7aENroHJtQoWSImSDgDHKjDGCrcDOwXYmtJPCWE8/xR/Q3MVeiMM71kK0K 
+2EifhO7AZ7rmkG/96XcaZaQEiclw/LfLcEFVPJ8btl7AoufefBNKwcJ1L/gcmbFdSeW XpoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc; bh=+l6bBcXTApvZtl0Y2+jf0FOOSqJKr9DboP5GF6rCibM=; b=1l4yuIGm2+/4bhhcs+yLoPDW3yG0h9huHdxCkajrdCNPpR+hIj8DwNT9c119B4PjFf ZliraxLkd0QN3vridXjsUrxSocQgtf9lQO7uDg3YipFZaP9l+ar6mjDnkE1nwuTJ1ZsY UIAeGsap4ah0IjAquLhaOIqUil3NMuO4Ufo6GQbBReghqhZfdCjCbCCa+mPKD3U3kdGI SRtavATXwp6JKZRX4soUJHcvUc2grX4lacf2zlnFNQSvwEWGcG1bilKCVW2+Q6lJEg9i 4Tsr4RLrWW4XCx2xyMwqJjwrmWV9MAtk5LOK7uAjLio2/I6RgNRk+0oNg5OacRwRU6Fb SMzA== X-Gm-Message-State: ACgBeo2R1iM/mIFBVqy+H4ivyBpPqZ3z52kssVYDBg/KoY+oRnJWLqNu YosqoqIXIgOEg/6/WsZM9bAgFrWo0no= X-Google-Smtp-Source: AA6agR5PSlVPciHau1SH11NqKay41PMXGdnN7+f355Tq21zOjpbGTMMLXImg4EId9u3+i7fD2gxMyLMDxOs= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:90a:df03:b0:1f3:396c:dd94 with SMTP id gp3-20020a17090adf0300b001f3396cdd94mr1070561pjb.1.1659740725378; Fri, 05 Aug 2022 16:05:25 -0700 (PDT) Reply-To: Sean Christopherson Date: Fri, 5 Aug 2022 23:05:08 +0000 In-Reply-To: <20220805230513.148869-1-seanjc@google.com> Message-Id: <20220805230513.148869-4-seanjc@google.com> Mime-Version: 1.0 References: <20220805230513.148869-1-seanjc@google.com> X-Mailer: git-send-email 2.37.1.559.g78731f0fdb-goog Subject: [PATCH v3 3/8] KVM: x86/mmu: Rename NX huge pages fields/functions for consistency From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack , Yan Zhao , Mingwei Zhang , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Rename most of the variables/functions involved in the NX huge page mitigation to provide consistency, e.g. lpage vs huge page, and NX huge vs huge NX, and also to provide clarity, e.g. to make it obvious the flag applies only to the NX huge page mitigation, not to any condition that prevents creating a huge page. Leave the nx_lpage_splits stat alone as the name is ABI and thus set in stone. 
Signed-off-by: Sean Christopherson Reviewed-by: Mingwei Zhang --- arch/x86/include/asm/kvm_host.h | 8 ++-- arch/x86/kvm/mmu/mmu.c | 70 +++++++++++++++++---------------- arch/x86/kvm/mmu/mmu_internal.h | 22 +++++++---- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- arch/x86/kvm/mmu/tdp_mmu.c | 8 ++-- 5 files changed, 59 insertions(+), 51 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index e8281d64a431..5634347e5d05 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1143,7 +1143,7 @@ struct kvm_arch { struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES]; struct list_head active_mmu_pages; struct list_head zapped_obsolete_pages; - struct list_head lpage_disallowed_mmu_pages; + struct list_head possible_nx_huge_pages; struct kvm_page_track_notifier_node mmu_sp_tracker; struct kvm_page_track_notifier_head track_notifier_head; /* @@ -1259,7 +1259,7 @@ struct kvm_arch { bool sgx_provisioning_allowed; =20 struct kvm_pmu_event_filter __rcu *pmu_event_filter; - struct task_struct *nx_lpage_recovery_thread; + struct task_struct *nx_huge_page_recovery_thread; =20 #ifdef CONFIG_X86_64 /* @@ -1304,8 +1304,8 @@ struct kvm_arch { * - tdp_mmu_roots (above) * - tdp_mmu_pages (above) * - the link field of struct kvm_mmu_pages used by the TDP MMU - * - lpage_disallowed_mmu_pages - * - the lpage_disallowed_link field of struct kvm_mmu_pages used + * - possible_nx_huge_pages; + * - the possible_nx_huge_page_link field of struct kvm_mmu_pages used * by the TDP MMU * It is acceptable, but not necessary, to acquire this lock when * the thread holds the MMU lock in write mode. diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 55dac44f3397..53d0dafa68ff 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -802,20 +802,20 @@ static void account_shadowed(struct kvm *kvm, struct = kvm_mmu_page *sp) kvm_flush_remote_tlbs_with_address(kvm, gfn, 1); } =20 -void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp, +void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp, bool nx_huge_page_possible) { - if (KVM_BUG_ON(!list_empty(&sp->lpage_disallowed_link), kvm)) + if (KVM_BUG_ON(!list_empty(&sp->possible_nx_huge_page_link), kvm)) return; =20 - sp->lpage_disallowed =3D true; + sp->nx_huge_page_disallowed =3D true; =20 if (!nx_huge_page_possible) return; =20 ++kvm->stat.nx_lpage_splits; - list_add_tail(&sp->lpage_disallowed_link, - &kvm->arch.lpage_disallowed_mmu_pages); + list_add_tail(&sp->possible_nx_huge_page_link, + &kvm->arch.possible_nx_huge_pages); } =20 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp) @@ -835,15 +835,15 @@ static void unaccount_shadowed(struct kvm *kvm, struc= t kvm_mmu_page *sp) kvm_mmu_gfn_allow_lpage(slot, gfn); } =20 -void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp) +void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp) { - sp->lpage_disallowed =3D false; + sp->nx_huge_page_disallowed =3D false; =20 - if (list_empty(&sp->lpage_disallowed_link)) + if (list_empty(&sp->possible_nx_huge_page_link)) return; =20 --kvm->stat.nx_lpage_splits; - list_del_init(&sp->lpage_disallowed_link); + list_del_init(&sp->possible_nx_huge_page_link); } =20 static struct kvm_memory_slot * @@ -2124,7 +2124,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page= (struct kvm *kvm, =20 set_page_private(virt_to_page(sp->spt), (unsigned long)sp); =20 - INIT_LIST_HEAD(&sp->lpage_disallowed_link); + 
INIT_LIST_HEAD(&sp->possible_nx_huge_page_link); =20 /* * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages() @@ -2483,8 +2483,8 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kv= m, zapped_root =3D !is_obsolete_sp(kvm, sp); } =20 - if (sp->lpage_disallowed) - unaccount_huge_nx_page(kvm, sp); + if (sp->nx_huge_page_disallowed) + unaccount_nx_huge_page(kvm, sp); =20 sp->role.invalid =3D 1; =20 @@ -3124,7 +3124,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct= kvm_page_fault *fault) =20 link_shadow_page(vcpu, it.sptep, sp); if (fault->is_tdp && fault->huge_page_disallowed) - account_huge_nx_page(vcpu->kvm, sp, + account_nx_huge_page(vcpu->kvm, sp, fault->req_level >=3D it.level); } =20 @@ -5981,7 +5981,7 @@ int kvm_mmu_init_vm(struct kvm *kvm) =20 INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages); - INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages); + INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages); spin_lock_init(&kvm->arch.mmu_unsync_pages_lock); =20 r =3D kvm_mmu_init_tdp_mmu(kvm); @@ -6699,7 +6699,7 @@ static int set_nx_huge_pages(const char *val, const s= truct kernel_param *kp) kvm_mmu_zap_all_fast(kvm); mutex_unlock(&kvm->slots_lock); =20 - wake_up_process(kvm->arch.nx_lpage_recovery_thread); + wake_up_process(kvm->arch.nx_huge_page_recovery_thread); } mutex_unlock(&kvm_lock); } @@ -6825,7 +6825,7 @@ static int set_nx_huge_pages_recovery_param(const cha= r *val, const struct kernel mutex_lock(&kvm_lock); =20 list_for_each_entry(kvm, &vm_list, vm_list) - wake_up_process(kvm->arch.nx_lpage_recovery_thread); + wake_up_process(kvm->arch.nx_huge_page_recovery_thread); =20 mutex_unlock(&kvm_lock); } @@ -6833,7 +6833,7 @@ static int set_nx_huge_pages_recovery_param(const cha= r *val, const struct kernel return err; } =20 -static void kvm_recover_nx_lpages(struct kvm *kvm) +static void kvm_recover_nx_huge_pages(struct kvm *kvm) { unsigned long nx_lpage_splits =3D kvm->stat.nx_lpage_splits; int rcu_idx; @@ -6856,23 +6856,25 @@ static void kvm_recover_nx_lpages(struct kvm *kvm) ratio =3D READ_ONCE(nx_huge_pages_recovery_ratio); to_zap =3D ratio ? DIV_ROUND_UP(nx_lpage_splits, ratio) : 0; for ( ; to_zap; --to_zap) { - if (list_empty(&kvm->arch.lpage_disallowed_mmu_pages)) + if (list_empty(&kvm->arch.possible_nx_huge_pages)) break; =20 /* * We use a separate list instead of just using active_mmu_pages - * because the number of lpage_disallowed pages is expected to - * be relatively small compared to the total. + * because the number of shadow pages that be replaced with an + * NX huge page is expected to be relatively small compared to + * the total number of shadow pages. And because the TDP MMU + * doesn't use active_mmu_pages. 
*/ - sp =3D list_first_entry(&kvm->arch.lpage_disallowed_mmu_pages, + sp =3D list_first_entry(&kvm->arch.possible_nx_huge_pages, struct kvm_mmu_page, - lpage_disallowed_link); - WARN_ON_ONCE(!sp->lpage_disallowed); + possible_nx_huge_page_link); + WARN_ON_ONCE(!sp->nx_huge_page_disallowed); if (is_tdp_mmu_page(sp)) { flush |=3D kvm_tdp_mmu_zap_sp(kvm, sp); } else { kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list); - WARN_ON_ONCE(sp->lpage_disallowed); + WARN_ON_ONCE(sp->nx_huge_page_disallowed); } =20 if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) { @@ -6893,7 +6895,7 @@ static void kvm_recover_nx_lpages(struct kvm *kvm) srcu_read_unlock(&kvm->srcu, rcu_idx); } =20 -static long get_nx_lpage_recovery_timeout(u64 start_time) +static long get_nx_huge_page_recovery_timeout(u64 start_time) { bool enabled; uint period; @@ -6904,19 +6906,19 @@ static long get_nx_lpage_recovery_timeout(u64 start= _time) : MAX_SCHEDULE_TIMEOUT; } =20 -static int kvm_nx_lpage_recovery_worker(struct kvm *kvm, uintptr_t data) +static int kvm_nx_huge_page_recovery_worker(struct kvm *kvm, uintptr_t dat= a) { u64 start_time; long remaining_time; =20 while (true) { start_time =3D get_jiffies_64(); - remaining_time =3D get_nx_lpage_recovery_timeout(start_time); + remaining_time =3D get_nx_huge_page_recovery_timeout(start_time); =20 set_current_state(TASK_INTERRUPTIBLE); while (!kthread_should_stop() && remaining_time > 0) { schedule_timeout(remaining_time); - remaining_time =3D get_nx_lpage_recovery_timeout(start_time); + remaining_time =3D get_nx_huge_page_recovery_timeout(start_time); set_current_state(TASK_INTERRUPTIBLE); } =20 @@ -6925,7 +6927,7 @@ static int kvm_nx_lpage_recovery_worker(struct kvm *k= vm, uintptr_t data) if (kthread_should_stop()) return 0; =20 - kvm_recover_nx_lpages(kvm); + kvm_recover_nx_huge_pages(kvm); } } =20 @@ -6933,17 +6935,17 @@ int kvm_mmu_post_init_vm(struct kvm *kvm) { int err; =20 - err =3D kvm_vm_create_worker_thread(kvm, kvm_nx_lpage_recovery_worker, 0, + err =3D kvm_vm_create_worker_thread(kvm, kvm_nx_huge_page_recovery_worker= , 0, "kvm-nx-lpage-recovery", - &kvm->arch.nx_lpage_recovery_thread); + &kvm->arch.nx_huge_page_recovery_thread); if (!err) - kthread_unpark(kvm->arch.nx_lpage_recovery_thread); + kthread_unpark(kvm->arch.nx_huge_page_recovery_thread); =20 return err; } =20 void kvm_mmu_pre_destroy_vm(struct kvm *kvm) { - if (kvm->arch.nx_lpage_recovery_thread) - kthread_stop(kvm->arch.nx_lpage_recovery_thread); + if (kvm->arch.nx_huge_page_recovery_thread) + kthread_stop(kvm->arch.nx_huge_page_recovery_thread); } diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index cca1ad75d096..67879459a25c 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -57,7 +57,13 @@ struct kvm_mmu_page { bool tdp_mmu_page; bool unsync; u8 mmu_valid_gen; - bool lpage_disallowed; /* Can't be replaced by an equiv large page */ + + /* + * The shadow page can't be replaced by an equivalent huge page + * because it is being used to map an executable page in the guest + * and the NX huge page mitigation is enabled. + */ + bool nx_huge_page_disallowed; =20 /* * The following two entries are used to key the shadow page in the @@ -102,12 +108,12 @@ struct kvm_mmu_page { =20 /* * Tracks shadow pages that, if zapped, would allow KVM to create an NX - * huge page. A shadow page will have lpage_disallowed set but not be - * on the list if a huge page is disallowed for other reasons, e.g. 
- * because KVM is shadowing a PTE at the same gfn, the memslot isn't - * properly aligned, etc... + * huge page. A shadow page will have nx_huge_page_disallowed set but + * not be on the list if a huge page is disallowed for other reasons, + * e.g. because KVM is shadowing a PTE at the same gfn, the memslot + * isn't properly aligned, etc... */ - struct list_head lpage_disallowed_link; + struct list_head possible_nx_huge_page_link; #ifdef CONFIG_X86_32 /* * Used out of the mmu-lock to avoid reading spte values while an @@ -322,8 +328,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *= fault, u64 spte, int cur_ =20 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); =20 -void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp, +void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp, bool nx_huge_page_possible); -void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp); +void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp); =20 #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index e450f49f2225..259c0f019f09 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -714,7 +714,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault, =20 link_shadow_page(vcpu, it.sptep, sp); if (fault->huge_page_disallowed) - account_huge_nx_page(vcpu->kvm, sp, + account_nx_huge_page(vcpu->kvm, sp, fault->req_level >=3D it.level); } =20 diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 903d0d3497b6..0e94182c87be 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -284,7 +284,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm= _vcpu *vcpu) static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep, gfn_t gfn, union kvm_mmu_page_role role) { - INIT_LIST_HEAD(&sp->lpage_disallowed_link); + INIT_LIST_HEAD(&sp->possible_nx_huge_page_link); =20 set_page_private(virt_to_page(sp->spt), (unsigned long)sp); =20 @@ -392,8 +392,8 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct k= vm_mmu_page *sp, lockdep_assert_held_write(&kvm->mmu_lock); =20 list_del(&sp->link); - if (sp->lpage_disallowed) - unaccount_huge_nx_page(kvm, sp); + if (sp->nx_huge_page_disallowed) + unaccount_nx_huge_page(kvm, sp); =20 if (shared) spin_unlock(&kvm->arch.tdp_mmu_pages_lock); @@ -1132,7 +1132,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct td= p_iter *iter, spin_lock(&kvm->arch.tdp_mmu_pages_lock); list_add(&sp->link, &kvm->arch.tdp_mmu_pages); if (account_nx) - account_huge_nx_page(kvm, sp, true); + account_nx_huge_page(kvm, sp, true); spin_unlock(&kvm->arch.tdp_mmu_pages_lock); =20 return 0; --=20 2.37.1.559.g78731f0fdb-goog From nobody Tue Apr 16 20:22:38 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21EBBC25B08 for ; Fri, 5 Aug 2022 23:05:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241647AbiHEXFo (ORCPT ); Fri, 5 Aug 2022 19:05:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59736 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241540AbiHEXF3 (ORCPT ); Fri, 5 Aug 2022 19:05:29 -0400 Received: from mail-pl1-x64a.google.com (mail-pl1-x64a.google.com 
[IPv6:2607:f8b0:4864:20::64a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 99E5A1057C for ; Fri, 5 Aug 2022 16:05:27 -0700 (PDT) Received: by mail-pl1-x64a.google.com with SMTP id o12-20020a170902d4cc00b0016e81c62cc8so2329537plg.13 for ; Fri, 05 Aug 2022 16:05:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:from:to:cc; bh=WLuSlvIJZlcnazsELn/bSEkU+G0vGBR7V6zPFbRjYl0=; b=lf0DhdSASK1h/vy4nEBEmemTayzmtWDsnzYMSZwpBW9myRUV58cgyU8eR0d0pC7jTg 2mepqFjjY6zRmUMexsuN1ybaJtRpI+dCTQ1O1smEHwP03qsrKRWnlfG+6rLsgIS37O3F 3m9r2tO1Gm4Rrzt4pKZaG0E53Z5YYde4MiY6FylL498tj1bzL+FYjGGdtEA6UngKbnCU xvW/iwSC7BT7Hxj1A/WVBFfAgEijmvt5sfhQjHrNuGu+2mIubdVpPQyoijWZjF2im8Tj sLAqQAQNOaVcPGkw5MQ+AYfkErKbepIjtd3H77szTo6fZ8ZDMIKAjAH3vr4Us5q/RGwe xjRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc; bh=WLuSlvIJZlcnazsELn/bSEkU+G0vGBR7V6zPFbRjYl0=; b=023VpTT21TcKtuGZnGe0NRD/v03EoVSvmGx8INeC1uD7uPRzaYGeV5jsFsceNEUm2v 5xLrFRiH8Dmri+tBnpt0BN4l59/3jPwc2Gohx50dwT0+TABO9t967cW2IwgnOrV2rT3g lPB0EiLqincxnBxO6WSiU6sRvuRUqwGEcMKvYOrL8mmAMd4oBAGbMiw8V56Ylre7AiN7 AmbMexjdel7G/fzrRlZlpG7zERrTyMs+ZjI/x9DrBnkyzn84wewhX6k7jejma1HrzdGc pN5SJPGe0bGQjP3UWCh+ami0JQ5NBqtDQhSKejaWXFo/IDNZraBnO6YA8O3l/YeJeWRH UgkA== X-Gm-Message-State: ACgBeo2/Hs2K0k+SGEhpjwX0pGtXDUaWO6gePeQHHlCQFuXg7sEdVxTA XtQyUFbs5W9bm30rTX1gOF9sXeMKjBc= X-Google-Smtp-Source: AA6agR432WEnrCUzqIvmpkhvNwROdM1p7cqZaz4OmRA8YYmCaKpLSXfFcLCkyxy4lyZWKWcPig9qP215a3s= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:90b:4cc5:b0:1f5:395:6c71 with SMTP id nd5-20020a17090b4cc500b001f503956c71mr18623544pjb.132.1659740727212; Fri, 05 Aug 2022 16:05:27 -0700 (PDT) Reply-To: Sean Christopherson Date: Fri, 5 Aug 2022 23:05:09 +0000 In-Reply-To: <20220805230513.148869-1-seanjc@google.com> Message-Id: <20220805230513.148869-5-seanjc@google.com> Mime-Version: 1.0 References: <20220805230513.148869-1-seanjc@google.com> X-Mailer: git-send-email 2.37.1.559.g78731f0fdb-goog Subject: [PATCH v3 4/8] KVM: x86/mmu: Properly account NX huge page workaround for nonpaging MMUs From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack , Yan Zhao , Mingwei Zhang , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Account and track NX huge pages for nonpaging MMUs so that a future enhancement to precisely check if a shadow page can't be replaced by a NX huge page doesn't get false positives. Without correct tracking, KVM can get stuck in a loop if an instruction is fetching and writing data on the same huge page, e.g. KVM installs a small executable page on the fetch fault, replaces it with an NX huge page on the write fault, and faults again on the fetch. Alternatively, and perhaps ideally, KVM would simply not enforce the workaround for nonpaging MMUs. The guest has no page tables to abuse and KVM is guaranteed to switch to a different MMU on CR0.PG being toggled so there's no security or performance concerns. However, getting make_spte() to play nice now and in the future is unnecessarily complex. 
In the current code base, make_spte() can enforce the mitigation if TDP is enabled or the MMU is indirect, but make_spte() may not always have a vCPU/MMU to work with, e.g. if KVM were to support in-line huge page promotion when disabling dirty logging. Without a vCPU/MMU, KVM could either pass in the correct information and/or derive it from the shadow page, but the former is ugly and the latter subtly non-trivial due to the possibility of direct shadow pages in indirect MMUs. Given that using shadow paging with an unpaged guest is far from top priority _and_ has been subjected to the workaround since its inception, keep it simple and just fix the accounting glitch. Signed-off-by: Sean Christopherson Reviewed-by: David Matlack Reviewed-by: Mingwei Zhang --- arch/x86/kvm/mmu/mmu.c | 2 +- arch/x86/kvm/mmu/spte.c | 12 ++++++++++++ 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 53d0dafa68ff..345b6b22ab68 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3123,7 +3123,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct= kvm_page_fault *fault) continue; =20 link_shadow_page(vcpu, it.sptep, sp); - if (fault->is_tdp && fault->huge_page_disallowed) + if (fault->huge_page_disallowed) account_nx_huge_page(vcpu->kvm, sp, fault->req_level >=3D it.level); } diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 7314d27d57a4..52186b795bce 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -147,6 +147,18 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_p= age *sp, if (!prefetch) spte |=3D spte_shadow_accessed_mask(spte); =20 + /* + * For simplicity, enforce the NX huge page mitigation even if not + * strictly necessary. KVM could ignore the mitigation if paging is + * disabled in the guest, as the guest doesn't have an page tables to + * abuse. But to safely ignore the mitigation, KVM would have to + * ensure a new MMU is loaded (or all shadow pages zapped) when CR0.PG + * is toggled on, and that's a net negative for performance when TDP is + * enabled. When TDP is disabled, KVM will always switch to a new MMU + * when CR0.PG is toggled, but leveraging that to ignore the mitigation + * would tie make_spte() further to vCPU/MMU state, and add complexity + * just to optimize a mode that is anything but performance critical. 
+ */ if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(vcpu->kvm)) { pte_access &=3D ~ACC_EXEC_MASK; --=20 2.37.1.559.g78731f0fdb-goog From nobody Tue Apr 16 20:22:38 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 503C5C00140 for ; Fri, 5 Aug 2022 23:05:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241663AbiHEXFu (ORCPT ); Fri, 5 Aug 2022 19:05:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59968 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241562AbiHEXFc (ORCPT ); Fri, 5 Aug 2022 19:05:32 -0400 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E9D21D32A for ; Fri, 5 Aug 2022 16:05:29 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id h185-20020a636cc2000000b00419b8e7df69so1765139pgc.18 for ; Fri, 05 Aug 2022 16:05:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:from:to:cc; bh=OW9TQ+LIu6gItsbFLVpaBRe5T84EmDWNHujqjldprDw=; b=SgOShPtcsyD6DJfgAOln2HyXTBcVLdGwM0h3i6vA6Gfdh/CuyKzeT5MtWB7WsXZCfj AvoOw031s2l0qlb370kwpA+z8zRJInInIYnoAoIE6H4SNfjOFAzO4NOs9dwag16kNXEt ebBCtMGitxbO2xvtwAf0NTXJ+Io/Aw63CGUAxo7SOOIcxRzSQtt0QLUVPEm3b8eRuInT z8BG4VEXuGkiosxmGxt3F9db/xpg4xwIpmpsYJh8X1bhxvP9i1FS0p8xl/+Lg2B7SlEY pZwbG2+KP5al+yYRD1DQBtCVn7emmbd+Dzs/QWBm6ndxfv/Q82UEFK66xc8zEWCsLAlS hHTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc; bh=OW9TQ+LIu6gItsbFLVpaBRe5T84EmDWNHujqjldprDw=; b=Y+4qs1Vl+2J7oHGYqnAmCoVqZ+iAwRwSj/dgh3FrAyoPjNGjQeLGi4d3JeCSsQhMd7 1AyawvWS7EIF2evrTaR1UmLXTUr6CNsz8j2LosWIQzUOhU9SdOYdoWY7kNWagMV//ery qa9ugS3xLHLLv3ijRd3Ri+2iHeJig9qMHuq4FBZB/NwB3Wud9D8Ze/upBeFtAbQ5lrjc zVhKCNGhfw+w9CaHRHiSICRu29nkTKeh4Mif4R+25GcnAXF6II1cN2ox6/E+hoFwBYPq EESctsrNa7uiuPiqVXzQMQIA+2blXG4PlE3LBHGDrIhFmvJDEqGErIObepwj4S9Bu60c 4Nfw== X-Gm-Message-State: ACgBeo2ri8iZ/xl/CCdboxRTo2lWTLAiNU2eU7vIri+9DnQDB47vLVYV yOS6SYKfNLFnmB7eq1ooFYXo/irzcUo= X-Google-Smtp-Source: AA6agR5OZjOLPpfT49XnSCfB9PI/nziY6RyBZB6mzyD88PBYt0wT22mMnzb6fjVe62SAqPSzm+pNkSlzkGk= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a63:8049:0:b0:41b:e8db:d912 with SMTP id j70-20020a638049000000b0041be8dbd912mr7233418pgd.380.1659740728787; Fri, 05 Aug 2022 16:05:28 -0700 (PDT) Reply-To: Sean Christopherson Date: Fri, 5 Aug 2022 23:05:10 +0000 In-Reply-To: <20220805230513.148869-1-seanjc@google.com> Message-Id: <20220805230513.148869-6-seanjc@google.com> Mime-Version: 1.0 References: <20220805230513.148869-1-seanjc@google.com> X-Mailer: git-send-email 2.37.1.559.g78731f0fdb-goog Subject: [PATCH v3 5/8] KVM: x86/mmu: Set disallowed_nx_huge_page in TDP MMU before setting SPTE From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack , Yan Zhao , Mingwei Zhang , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: 
text/plain; charset="utf-8" Set nx_huge_page_disallowed in TDP MMU shadow pages before making the SP visible to other readers, i.e. before setting its SPTE. This will allow KVM to query the flag when determining if a shadow page can be replaced by a NX huge page without violating the rules of the mitigation. Note, the shadow/legacy MMU holds mmu_lock for write, so it's impossible for another CPU to see a shadow page without an up-to-date nx_huge_page_disallowed, i.e. only the TDP MMU needs the complicated dance. Signed-off-by: Sean Christopherson Reviewed-by: David Matlack --- arch/x86/kvm/mmu/mmu.c | 28 +++++++++++++------- arch/x86/kvm/mmu/mmu_internal.h | 5 ++-- arch/x86/kvm/mmu/tdp_mmu.c | 46 +++++++++++++++++++++++---------- 3 files changed, 53 insertions(+), 26 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 345b6b22ab68..f81ddedbe2f7 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -802,22 +802,25 @@ static void account_shadowed(struct kvm *kvm, struct = kvm_mmu_page *sp) kvm_flush_remote_tlbs_with_address(kvm, gfn, 1); } =20 -void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp, - bool nx_huge_page_possible) +void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp) { if (KVM_BUG_ON(!list_empty(&sp->possible_nx_huge_page_link), kvm)) return; =20 - sp->nx_huge_page_disallowed =3D true; - - if (!nx_huge_page_possible) - return; - ++kvm->stat.nx_lpage_splits; list_add_tail(&sp->possible_nx_huge_page_link, &kvm->arch.possible_nx_huge_pages); } =20 +static void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp, + bool nx_huge_page_possible) +{ + sp->nx_huge_page_disallowed =3D true; + + if (nx_huge_page_possible) + track_possible_nx_huge_page(kvm, sp); +} + static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp) { struct kvm_memslots *slots; @@ -835,10 +838,8 @@ static void unaccount_shadowed(struct kvm *kvm, struct= kvm_mmu_page *sp) kvm_mmu_gfn_allow_lpage(slot, gfn); } =20 -void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp) +void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *s= p) { - sp->nx_huge_page_disallowed =3D false; - if (list_empty(&sp->possible_nx_huge_page_link)) return; =20 @@ -846,6 +847,13 @@ void unaccount_nx_huge_page(struct kvm *kvm, struct kv= m_mmu_page *sp) list_del_init(&sp->possible_nx_huge_page_link); } =20 +static void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *s= p) +{ + sp->nx_huge_page_disallowed =3D false; + + untrack_possible_nx_huge_page(kvm, sp); +} + static struct kvm_memory_slot * gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn, bool no_dirty_log) diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index 67879459a25c..22152241bd29 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -328,8 +328,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *= fault, u64 spte, int cur_ =20 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); =20 -void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp, - bool nx_huge_page_possible); -void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp); +void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp); +void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *s= p); =20 #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 
0e94182c87be..34994ca3d45b 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -392,8 +392,19 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct = kvm_mmu_page *sp, lockdep_assert_held_write(&kvm->mmu_lock); =20 list_del(&sp->link); - if (sp->nx_huge_page_disallowed) - unaccount_nx_huge_page(kvm, sp); + + /* + * Ensure nx_huge_page_disallowed is read after observing the present + * shadow page. A different vCPU may have _just_ finished installing + * the shadow page if mmu_lock is held for read. Pairs with the + * smp_wmb() in kvm_tdp_mmu_map(). + */ + smp_rmb(); + + if (sp->nx_huge_page_disallowed) { + sp->nx_huge_page_disallowed =3D false; + untrack_possible_nx_huge_page(kvm, sp); + } =20 if (shared) spin_unlock(&kvm->arch.tdp_mmu_pages_lock); @@ -1107,16 +1118,13 @@ static int tdp_mmu_map_handle_target_level(struct k= vm_vcpu *vcpu, * @kvm: kvm instance * @iter: a tdp_iter instance currently on the SPTE that should be set * @sp: The new TDP page table to install. - * @account_nx: True if this page table is being installed to split a - * non-executable huge page. * @shared: This operation is running under the MMU lock in read mode. * * Returns: 0 if the new page table was installed. Non-0 if the page table * could not be installed (e.g. the atomic compare-exchange faile= d). */ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter, - struct kvm_mmu_page *sp, bool account_nx, - bool shared) + struct kvm_mmu_page *sp, bool shared) { u64 spte =3D make_nonleaf_spte(sp->spt, !kvm_ad_enabled()); int ret =3D 0; @@ -1131,8 +1139,6 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct td= p_iter *iter, =20 spin_lock(&kvm->arch.tdp_mmu_pages_lock); list_add(&sp->link, &kvm->arch.tdp_mmu_pages); - if (account_nx) - account_nx_huge_page(kvm, sp, true); spin_unlock(&kvm->arch.tdp_mmu_pages_lock); =20 return 0; @@ -1145,6 +1151,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct td= p_iter *iter, int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_mmu *mmu =3D vcpu->arch.mmu; + struct kvm *kvm =3D vcpu->kvm; struct tdp_iter iter; struct kvm_mmu_page *sp; int ret; @@ -1181,9 +1188,6 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) } =20 if (!is_shadow_present_pte(iter.old_spte)) { - bool account_nx =3D fault->huge_page_disallowed && - fault->req_level >=3D iter.level; - /* * If SPTE has been frozen by another thread, just * give up and retry, avoiding unnecessary page table @@ -1195,10 +1199,26 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault) sp =3D tdp_mmu_alloc_sp(vcpu); tdp_mmu_init_child_sp(sp, &iter); =20 - if (tdp_mmu_link_sp(vcpu->kvm, &iter, sp, account_nx, true)) { + sp->nx_huge_page_disallowed =3D fault->huge_page_disallowed; + + /* + * Ensure nx_huge_page_disallowed is visible before the + * SP is marked present, as mmu_lock is held for read. + * Pairs with the smp_rmb() in tdp_mmu_unlink_sp(). + */ + smp_wmb(); + + if (tdp_mmu_link_sp(kvm, &iter, sp, true)) { tdp_mmu_free_sp(sp); break; } + + if (fault->huge_page_disallowed && + fault->req_level >=3D iter.level) { + spin_lock(&kvm->arch.tdp_mmu_pages_lock); + track_possible_nx_huge_page(kvm, sp); + spin_unlock(&kvm->arch.tdp_mmu_pages_lock); + } } } =20 @@ -1486,7 +1506,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, s= truct tdp_iter *iter, * correctness standpoint since the translation will be the same either * way. 
*/ - ret =3D tdp_mmu_link_sp(kvm, iter, sp, false, shared); + ret =3D tdp_mmu_link_sp(kvm, iter, sp, shared); if (ret) goto out; =20 --=20 2.37.1.559.g78731f0fdb-goog From nobody Tue Apr 16 20:22:38 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9D907C00140 for ; Fri, 5 Aug 2022 23:05:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241671AbiHEXFx (ORCPT ); Fri, 5 Aug 2022 19:05:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60076 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241576AbiHEXFe (ORCPT ); Fri, 5 Aug 2022 19:05:34 -0400 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B85F5165A0 for ; Fri, 5 Aug 2022 16:05:31 -0700 (PDT) Received: by mail-pj1-x1049.google.com with SMTP id o21-20020a17090a9f9500b001f0574225faso5041021pjp.6 for ; Fri, 05 Aug 2022 16:05:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:from:to:cc; bh=7iv2nkYs2Vt5ti4O0QvNxO0OSZhrVlUXfvDgfst3my8=; b=Ju2r70VXaQaBCfA4/9Ehm4gB0L5NiBP1hooL4HnyKWIcEYGbQOZWdNUOPU1XmaSkcq /PuKMks4HOOCC3C9X/jiH9r3zNN7Qjeful+CPThn0pd4iiaAkCPXOMzi2tXWo+WmWCgK E2Kk25DioybC9HTR+89W1FGK/A1wQ2tA9hCs3vH3sQztLf0cJm6AWKmkSreEBBN9YdBn TBzhe98kIwtY05XH3SzepbCXIXeU6qRbH7b3M+QLMp4rZEhh1g5Ye4LmLb5vOBsGo3OU kBAVT8NcLgSipzYSEN90in0Uh3DftRjtewjeunalmlUnV0s3KbWIKEpPB4bjvIask8aT PuXg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc; bh=7iv2nkYs2Vt5ti4O0QvNxO0OSZhrVlUXfvDgfst3my8=; b=v7UV34huuOF1ri+1kFqSGsx+ORq3Nn4CvlcaS1v4/VX0I123HtT4sG7wTeW8/wkrzz LsEEx3tL8OnJ3CHmXKI5opw42DwyeYcOSHDcJ/tzb8XBDjmwhkKPRsa8QZCa12KBhtjR +/dTKxonhKr8tixhz4dO/hGCypVPimatk26SD86AoeJgnUbpCHymROtM/QUB41dGFAay ISSLDobPC++35eNYBVtq8KnAJQ9lyuOAxRHDiWMUp7RkcmlnycFuAjecYHSEsIDJtFB0 hVMN/blB9H6h3ZIqsIhTJaAluA/BnHRS3lwiZ7ubjzcsLiiVTl0UdXq9pKB+UkWFkHIP IRpA== X-Gm-Message-State: ACgBeo1oJSROd3PrldxONjEz5VfoOmAVwO2N/RhpEK73fJSakSPJhhAp ZccCOrOVz06C0wtWusQR4DpV4lgDdo0= X-Google-Smtp-Source: AA6agR6Wuy0rugKFS69l72MhpsgCDGkbaHDDzKGZo1mdiNyxouwP60a7Vc0b0twmCliZeFK+daVqExOnEYE= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:90b:3807:b0:1f4:ecf7:5987 with SMTP id mq7-20020a17090b380700b001f4ecf75987mr9545835pjb.13.1659740730404; Fri, 05 Aug 2022 16:05:30 -0700 (PDT) Reply-To: Sean Christopherson Date: Fri, 5 Aug 2022 23:05:11 +0000 In-Reply-To: <20220805230513.148869-1-seanjc@google.com> Message-Id: <20220805230513.148869-7-seanjc@google.com> Mime-Version: 1.0 References: <20220805230513.148869-1-seanjc@google.com> X-Mailer: git-send-email 2.37.1.559.g78731f0fdb-goog Subject: [PATCH v3 6/8] KVM: x86/mmu: Track the number of TDP MMU pages, but not the actual pages From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack , Yan Zhao , Mingwei Zhang , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: 
text/plain; charset="utf-8" Track the number of TDP MMU "shadow" pages instead of tracking the pages themselves. With the NX huge page list manipulation moved out of the common linking flow, elminating the list-based tracking means the happy path of adding a shadow page doesn't need to acquire a spinlock and can instead inc/dec an atomic. Keep the tracking as the WARN during TDP MMU teardown on leaked shadow pages is very, very useful for detecting KVM bugs. Tracking the number of pages will also make it trivial to expose the counter to userspace as a stat in the future, which may or may not be desirable. Note, the TDP MMU needs to use a separate counter (and stat if that ever comes to be) from the existing n_used_mmu_pages. The TDP MMU doesn't bother supporting the shrinker nor does it honor KVM_SET_NR_MMU_PAGES (because the TDP MMU consumes so few pages relative to shadow paging), and including TDP MMU pages in that counter would break both the shrinker and shadow MMUs, e.g. if a VM is using nested TDP. Cc: Yan Zhao Reviewed-by: Mingwei Zhang Reviewed-by: David Matlack Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm_host.h | 11 +++-------- arch/x86/kvm/mmu/tdp_mmu.c | 28 +++++++++++++--------------- 2 files changed, 16 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 5634347e5d05..7ac0c5612319 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1271,6 +1271,9 @@ struct kvm_arch { */ bool tdp_mmu_enabled; =20 + /* The number of TDP MMU pages across all roots. */ + atomic64_t tdp_mmu_pages; + /* * List of struct kvm_mmu_pages being used as roots. * All struct kvm_mmu_pages in the list should have @@ -1291,18 +1294,10 @@ struct kvm_arch { */ struct list_head tdp_mmu_roots; =20 - /* - * List of struct kvmp_mmu_pages not being used as roots. - * All struct kvm_mmu_pages in the list should have - * tdp_mmu_page set and a tdp_mmu_root_count of 0. - */ - struct list_head tdp_mmu_pages; - /* * Protects accesses to the following fields when the MMU lock * is held in read mode: * - tdp_mmu_roots (above) - * - tdp_mmu_pages (above) * - the link field of struct kvm_mmu_pages used by the TDP MMU * - possible_nx_huge_pages; * - the possible_nx_huge_page_link field of struct kvm_mmu_pages used diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 34994ca3d45b..526d38704e5c 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -29,7 +29,6 @@ int kvm_mmu_init_tdp_mmu(struct kvm *kvm) kvm->arch.tdp_mmu_enabled =3D true; INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots); spin_lock_init(&kvm->arch.tdp_mmu_pages_lock); - INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages); kvm->arch.tdp_mmu_zap_wq =3D wq; return 1; } @@ -54,7 +53,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) /* Also waits for any queued work items. 
*/ destroy_workqueue(kvm->arch.tdp_mmu_zap_wq); =20 - WARN_ON(!list_empty(&kvm->arch.tdp_mmu_pages)); + WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages)); WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots)); =20 /* @@ -386,12 +385,7 @@ static void handle_changed_spte_dirty_log(struct kvm *= kvm, int as_id, gfn_t gfn, static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp, bool shared) { - if (shared) - spin_lock(&kvm->arch.tdp_mmu_pages_lock); - else - lockdep_assert_held_write(&kvm->mmu_lock); - - list_del(&sp->link); + atomic64_dec(&kvm->arch.tdp_mmu_pages); =20 /* * Ensure nx_huge_page_disallowed is read after observing the present @@ -401,10 +395,16 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct= kvm_mmu_page *sp, */ smp_rmb(); =20 - if (sp->nx_huge_page_disallowed) { - sp->nx_huge_page_disallowed =3D false; - untrack_possible_nx_huge_page(kvm, sp); - } + if (!sp->nx_huge_page_disallowed) + return; + + if (shared) + spin_lock(&kvm->arch.tdp_mmu_pages_lock); + else + lockdep_assert_held_write(&kvm->mmu_lock); + + sp->nx_huge_page_disallowed =3D false; + untrack_possible_nx_huge_page(kvm, sp); =20 if (shared) spin_unlock(&kvm->arch.tdp_mmu_pages_lock); @@ -1137,9 +1137,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct td= p_iter *iter, tdp_mmu_set_spte(kvm, iter, spte); } =20 - spin_lock(&kvm->arch.tdp_mmu_pages_lock); - list_add(&sp->link, &kvm->arch.tdp_mmu_pages); - spin_unlock(&kvm->arch.tdp_mmu_pages_lock); + atomic64_inc(&kvm->arch.tdp_mmu_pages); =20 return 0; } --=20 2.37.1.559.g78731f0fdb-goog From nobody Tue Apr 16 20:22:38 2024 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3B23C19F2D for ; Fri, 5 Aug 2022 23:06:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241697AbiHEXGD (ORCPT ); Fri, 5 Aug 2022 19:06:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59928 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241558AbiHEXFg (ORCPT ); Fri, 5 Aug 2022 19:05:36 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2FF31167FB for ; Fri, 5 Aug 2022 16:05:32 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-324f98aed9eso32147907b3.16 for ; Fri, 05 Aug 2022 16:05:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:from:to:cc; bh=TOisPftkQNAh306F8cmwV+0sj7RxsHKWkdSzODJuSsI=; b=o4fDSZPkxKiPaP1ZjpXoHjoxuDVjJi6t82FWq9ZUMChWvlbdqt1r4KMrBNsKn+J04e RIpJ3imoWYECPQOKGUEiGnvR+4DE/97Gky3qs18My5TZ+0Yl/0i2pnIaXJKr2+ZROX4C XVct4Y3FPGYGxfixOkB3WV1dHPSP8KNmn595p0E4nmIKokqXRlSi/R+ZlFVNLD9DQCb6 XmP3lXZafcf1BEdriQR0k3yNLRHiTo/DSy9s3sLFeVmYdSHukGURlSD2wv/m3skoP8tl nIysYc+fPln12FolrNqIMa71j0lPEhKn+6JzUG3zWJdsziBsBCLLVJiIbwR/ZV5mcptT znsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc; bh=TOisPftkQNAh306F8cmwV+0sj7RxsHKWkdSzODJuSsI=; b=bBaG6S36tBamAa9yruIQmTWStWUkZYdt31T00UmlBa4PTdf1XmBs//69i6Eaqpj6Rj 4TdK+riUw6Qw87UIftUiqc1FkG2sdeZya3ybqa5IOgiWtla2npHl19cG//qSizBzISI/ 
From nobody Tue Apr 16 20:22:38 2024
Reply-To: Sean Christopherson
Date: Fri, 5 Aug 2022 23:05:12 +0000
In-Reply-To: <20220805230513.148869-1-seanjc@google.com>
Message-Id: <20220805230513.148869-8-seanjc@google.com>
References: <20220805230513.148869-1-seanjc@google.com>
Subject: [PATCH v3 7/8] KVM: x86/mmu: Add helper to convert SPTE value to its shadow page
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Yan Zhao, Mingwei Zhang, Ben Gardon

Add a helper to convert a SPTE to its shadow page to deduplicate a variety
of flows and hopefully avoid future bugs, e.g. if KVM attempts to get the
shadow page for a SPTE without dropping high bits.

Opportunistically add a comment in mmu_free_root_page() documenting why it
treats the root HPA as a SPTE.

No functional change intended.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c          | 17 ++++++++++-------
 arch/x86/kvm/mmu/mmu_internal.h | 12 ------------
 arch/x86/kvm/mmu/spte.h         | 17 +++++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.h      |  2 ++
 4 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f81ddedbe2f7..1442129c85e0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1796,7 +1796,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
                        continue;
                }

-               child = to_shadow_page(ent & SPTE_BASE_ADDR_MASK);
+               child = spte_to_child_sp(ent);

                if (child->unsync_children) {
                        if (mmu_pages_add(pvec, child, i))
@@ -2355,7 +2355,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
                 * so we should update the spte at this point to get
                 * a new sp with the correct access.
                 */
-               child = to_shadow_page(*sptep & SPTE_BASE_ADDR_MASK);
+               child = spte_to_child_sp(*sptep);
                if (child->role.access == direct_access)
                        return;

@@ -2376,7 +2376,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
        if (is_last_spte(pte, sp->role.level)) {
                drop_spte(kvm, spte);
        } else {
-               child = to_shadow_page(pte & SPTE_BASE_ADDR_MASK);
+               child = spte_to_child_sp(pte);
                drop_parent_pte(child, spte);

                /*
@@ -2815,7 +2815,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
                        struct kvm_mmu_page *child;
                        u64 pte = *sptep;

-                       child = to_shadow_page(pte & SPTE_BASE_ADDR_MASK);
+                       child = spte_to_child_sp(pte);
                        drop_parent_pte(child, sptep);
                        flush = true;
                } else if (pfn != spte_to_pfn(*sptep)) {
@@ -3427,7 +3427,11 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
        if (!VALID_PAGE(*root_hpa))
                return;

-       sp = to_shadow_page(*root_hpa & SPTE_BASE_ADDR_MASK);
+       /*
+        * The "root" may be a special root, e.g. a PAE entry, treat it as a
+        * SPTE to ensure any non-PA bits are dropped.
+        */
+       sp = spte_to_child_sp(*root_hpa);
        if (WARN_ON(!sp))
                return;

@@ -3912,8 +3916,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
                hpa_t root = vcpu->arch.mmu->pae_root[i];

                if (IS_VALID_PAE_ROOT(root)) {
-                       root &= SPTE_BASE_ADDR_MASK;
-                       sp = to_shadow_page(root);
+                       sp = spte_to_child_sp(root);
                        mmu_sync_children(vcpu, sp, true);
                }
        }
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 22152241bd29..dbaf6755c5a7 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -133,18 +133,6 @@ struct kvm_mmu_page {

 extern struct kmem_cache *mmu_page_header_cache;

-static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
-{
-       struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
-
-       return (struct kvm_mmu_page *)page_private(page);
-}
-
-static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
-{
-       return to_shadow_page(__pa(sptep));
-}
-
 static inline int kvm_mmu_role_as_id(union kvm_mmu_page_role role)
 {
        return role.smm ?
               1 : 0;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index cabe3fbb4f39..37aa4a9c3d75 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -207,6 +207,23 @@ static inline int spte_index(u64 *sptep)
  */
 extern u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;

+static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
+{
+       struct page *page = pfn_to_page((shadow_page) >> PAGE_SHIFT);
+
+       return (struct kvm_mmu_page *)page_private(page);
+}
+
+static inline struct kvm_mmu_page *spte_to_child_sp(u64 spte)
+{
+       return to_shadow_page(spte & SPTE_BASE_ADDR_MASK);
+}
+
+static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
+{
+       return to_shadow_page(__pa(sptep));
+}
+
 static inline bool is_mmio_spte(u64 spte)
 {
        return (spte & shadow_mmio_mask) == shadow_mmio_value &&
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index c163f7cc23ca..d3714200b932 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -5,6 +5,8 @@

 #include

+#include "spte.h"
+
 hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);

 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
-- 
2.37.1.559.g78731f0fdb-goog
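The point of the new helper above is that the SPTE_BASE_ADDR_MASK masking
now happens in exactly one place, so no caller can forget to strip the
non-address bits. A rough stand-alone sketch of the same idea -- with an
invented mask value and a fake page lookup instead of the kernel's
pfn_to_page()/page_private() machinery -- might look like:

#include <stdint.h>
#include <stdio.h>

/* Invented stand-ins; the real mask and metadata live in spte.h/mmu code. */
#define BASE_ADDR_MASK  0x000ffffffffff000ULL   /* PA bits 51:12 */

struct shadow_page {
        uint64_t pa;    /* physical address of the backing page table */
};

/* Fake physical-address -> metadata lookup, purely for the example. */
static struct shadow_page *pa_to_sp(uint64_t pa)
{
        static struct shadow_page sp;

        sp.pa = pa;
        return &sp;
}

/*
 * The helper's job: strip attribute/software bits exactly once, so callers
 * cannot forget the mask (the bug class the patch guards against).
 */
static struct shadow_page *spte_to_child(uint64_t spte)
{
        return pa_to_sp(spte & BASE_ADDR_MASK);
}

int main(void)
{
        /* A present SPTE with both high and low attribute bits set. */
        uint64_t spte = 0x8000000123456007ULL;

        printf("child page at 0x%llx\n",
               (unsigned long long)spte_to_child(spte)->pa);
        return 0;
}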
From nobody Tue Apr 16 20:22:38 2024
Reply-To: Sean Christopherson
Date: Fri, 5 Aug 2022 23:05:13 +0000
In-Reply-To: <20220805230513.148869-1-seanjc@google.com>
Message-Id: <20220805230513.148869-9-seanjc@google.com>
References: <20220805230513.148869-1-seanjc@google.com>
Subject: [PATCH v3 8/8] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Yan Zhao, Mingwei Zhang, Ben Gardon

From: Mingwei Zhang

Explicitly check if an NX huge page is disallowed when determining if a
page fault needs to be forced to use a smaller sized page. KVM currently
assumes that the NX huge page mitigation is the only scenario where KVM
will force a shadow page instead of a huge page, and so unnecessarily
keeps an existing shadow page instead of replacing it with a huge page.

Any scenario that causes KVM to zap leaf SPTEs may result in having a SP
that can be made huge without violating the NX huge page mitigation.
E.g. prior to commit 5ba7c4c6d1c7 ("KVM: x86/MMU: Zap non-leaf SPTEs when
disabling dirty logging"), KVM would keep shadow pages after disabling
dirty logging due to a live migration being canceled, resulting in
degraded performance due to running with 4KiB pages instead of huge pages.

Although the dirty logging case is "fixed", that fix is coincidental,
i.e. is an implementation detail, and there are other scenarios where KVM
will zap leaf SPTEs. E.g. zapping leaf SPTEs in response to a host page
migration (mmu_notifier invalidation) to create a huge page would yield a
similar result; KVM would see the shadow-present non-leaf SPTE and assume
a huge page is disallowed.

Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Reviewed-by: Ben Gardon
Reviewed-by: David Matlack
Signed-off-by: Mingwei Zhang
[sean: add barrier comments, use spte_to_child_sp(), massage changelog]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c     | 17 +++++++++++++++--
 arch/x86/kvm/mmu/tdp_mmu.c |  3 ++-
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1442129c85e0..3ddfc82868fd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3090,6 +3090,19 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
            cur_level == fault->goal_level &&
            is_shadow_present_pte(spte) &&
            !is_large_pte(spte)) {
+               u64 page_mask;
+
+               /*
+                * Ensure nx_huge_page_disallowed is read after checking for a
+                * present shadow page. A different vCPU may be concurrently
+                * installing the shadow page if mmu_lock is held for read.
+                * Pairs with the smp_wmb() in kvm_tdp_mmu_map().
+                */
+               smp_rmb();
+
+               if (!spte_to_child_sp(spte)->nx_huge_page_disallowed)
+                       return;
+
                /*
                 * A small SPTE exists for this pfn, but FNAME(fetch)
                 * and __direct_map would like to create a large PTE
@@ -3097,8 +3110,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
                 * patching back for them into pfn the next 9 bits of
                 * the address.
                 */
-               u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
-                               KVM_PAGES_PER_HPAGE(cur_level - 1);
+               page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
+                           KVM_PAGES_PER_HPAGE(cur_level - 1);
                fault->pfn |= fault->gfn & page_mask;
                fault->goal_level--;
        }
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 526d38704e5c..c5314ca95e08 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1202,7 +1202,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
                /*
                 * Ensure nx_huge_page_disallowed is visible before the
                 * SP is marked present, as mmu_lock is held for read.
-                * Pairs with the smp_rmb() in tdp_mmu_unlink_sp().
+                * Pairs with the smp_rmb() in tdp_mmu_unlink_sp() and
+                * in disallowed_hugepage_adjust().
                 */
                smp_wmb();

-- 
2.37.1.559.g78731f0fdb-goog
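For completeness, the barrier pairing described in the comments above is the
usual publish/consume pattern: the writer must make nx_huge_page_disallowed
visible before the SPTE is marked present, and a reader that observes a
present SPTE must order its read of the flag after that observation. A
user-space sketch of the same ordering -- C11 release/acquire standing in
for the kernel's smp_wmb()/smp_rmb(), all names invented:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Invented stand-ins for the shadow page flag and a "present" SPTE. */
static bool nx_huge_page_disallowed;
static atomic_bool spte_present;

/* Writer, analogous to kvm_tdp_mmu_map(): set the flag, then publish. */
static void install_sp(bool disallowed)
{
        nx_huge_page_disallowed = disallowed;
        /* Release ordering plays the role of smp_wmb(). */
        atomic_store_explicit(&spte_present, true, memory_order_release);
}

/* Reader, analogous to disallowed_hugepage_adjust(). */
static void adjust(void)
{
        /* Acquire ordering plays the role of "see present SPTE" + smp_rmb(). */
        if (!atomic_load_explicit(&spte_present, memory_order_acquire))
                return;

        if (!nx_huge_page_disallowed)
                return; /* not an NX-mitigation page, nothing to adjust */

        printf("NX mitigation: fall back to a smaller page\n");
}

int main(void)
{
        install_sp(true);
        adjust();
        return 0;
}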