From nobody Sun Feb 8 18:49:03 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:15:03 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260129011517.3545883-1-seanjc@google.com>
X-Mailer: git-send-email
 2.53.0.rc1.217.geba53bf80e-goog
Message-ID: <20260129011517.3545883-32-seanjc@google.com>
Subject: [RFC PATCH v5 31/45] KVM: x86/mmu: Prevent hugepage promotion for mirror roots in fault path
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
 kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao,
 Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li,
 Isaku Yamahata
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Rick Edgecombe

Disallow hugepage promotion in the TDP MMU for mirror roots as KVM doesn't
currently support promoting S-EPT entries due to the complexity incurred by
the TDX-Module's rules for hugepage promotion.

 - The current TDX-Module requires all 4KB leafs to be either all PENDING
   or all ACCEPTED before a successful promotion to 2MB. This requirement
   prevents successful page merging after partially converting a 2MB
   range from private to shared and then back to private, which is the
   primary scenario necessitating page promotion.

 - The TDX-Module effectively requires a break-before-make sequence (to
   satisfy its TLB flushing rules), i.e. creates a window of time where a
   different vCPU can encounter faults on a SPTE that KVM is trying to
   promote to a hugepage. To avoid unexpected BUSY errors, KVM would need
   to FREEZE the non-leaf SPTE before replacing it with a huge SPTE.

Disable hugepage promotion for all map() operations, as supporting page
promotion when building the initial image is still non-trivial, and the
vast majority of images are ~4MB or less, i.e. the benefit of creating
hugepages during TD build time is minimal.

Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
[sean: check root, add comment, rewrite changelog]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c     |  3 ++-
 arch/x86/kvm/mmu/tdp_mmu.c | 12 +++++++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4ecbf216d96f..45650f70eeab 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3419,7 +3419,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	    cur_level == fault->goal_level &&
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte) &&
-	    spte_to_child_sp(spte)->nx_huge_page_disallowed) {
+	    ((spte_to_child_sp(spte)->nx_huge_page_disallowed) ||
+	     is_mirror_sp(spte_to_child_sp(spte)))) {
 		/*
 		 * A small SPTE exists for this pfn, but FNAME(fetch),
 		 * direct_map(), or kvm_tdp_mmu_map() would like to create a
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 01e3e4f4baa5..f8ebdd0c6114 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1222,7 +1222,17 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
 		int r;
 
-		if (fault->nx_huge_page_workaround_enabled)
+		/*
+		 * Don't replace a page table (non-leaf) SPTE with a huge SPTE
+		 * (a.k.a. hugepage promotion) if the NX hugepage workaround is
+		 * enabled, as doing so will cause significant thrashing if one
+		 * or more leaf SPTEs needs to be executable.
+		 *
+		 * Disallow hugepage promotion for mirror roots as KVM doesn't
+		 * (yet) support promoting S-EPT entries while holding mmu_lock
+		 * for read (due to complexity induced by the TDX-Module APIs).
+		 */
+		if (fault->nx_huge_page_workaround_enabled || is_mirror_sp(root))
 			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
 
 		/*
-- 
2.53.0.rc1.217.geba53bf80e-goog
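
For readers following the logic rather than applying the patch, the combined
effect of the two hunks can be sketched as a standalone C model. This is a
simplified illustration, not kernel code: every model_* type and helper is a
hypothetical stand-in for the real KVM structures and SPTE predicates, the
"cur_level > 4K" guard is an assumption about the part of the condition the
hunk does not show, and any other side effects of the real function are
omitted.

```c
#include <stdbool.h>

/* Hypothetical page-table levels; NOT the kernel's PG_LEVEL_* definitions. */
enum { MODEL_PG_LEVEL_4K = 1, MODEL_PG_LEVEL_2M = 2 };

/* Hypothetical stand-in for the child shadow page a non-leaf SPTE points at. */
struct model_sp {
	bool nx_huge_page_disallowed;
	bool is_mirror;
};

/* Hypothetical stand-in for the fault state; only the goal level is modeled. */
struct model_fault {
	int goal_level;
};

/*
 * Caller-side gate, modeled on the kvm_tdp_mmu_map() hunk: run the
 * adjustment if the NX workaround is enabled or the root is a mirror root.
 */
static bool model_should_adjust(bool nx_workaround_enabled, bool root_is_mirror)
{
	return nx_workaround_enabled || root_is_mirror;
}

/*
 * Modeled on the patched condition in disallowed_hugepage_adjust(): if a
 * present, non-huge SPTE already exists at the goal level and its child
 * page table must stay small (NX workaround) or is a mirror page, lower the
 * goal by one level instead of promoting. Returns true if the goal changed.
 */
static bool model_disallowed_hugepage_adjust(struct model_fault *fault,
					     bool spte_present, bool spte_huge,
					     const struct model_sp *child,
					     int cur_level)
{
	if (cur_level > MODEL_PG_LEVEL_4K &&	/* assumed guard, see lead-in */
	    cur_level == fault->goal_level &&
	    spte_present && !spte_huge &&
	    (child->nx_huge_page_disallowed || child->is_mirror)) {
		fault->goal_level--;
		return true;
	}
	return false;
}
```

In this model, a fault with a 2MB goal that finds an existing non-huge SPTE
under a mirror root has its goal lowered to 4KB, i.e. the fault maps a 4KB
page rather than promoting; the same fault under a non-mirror root with the
NX workaround flag clear on the child keeps its 2MB goal.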