From: Sean Christopherson
Date: Tue, 30 Aug 2022 23:55:29 +0000
Subject: [PATCH v4 1/9] KVM: x86/mmu: Bug the VM if KVM attempts to double count an NX huge page
Message-ID: <20220830235537.4004585-2-seanjc@google.com>
In-Reply-To: <20220830235537.4004585-1-seanjc@google.com>
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Mingwei Zhang, Yan Zhao, Ben Gardon

WARN and kill the VM if KVM attempts to double count an NX huge page,
i.e. attempts to re-tag a shadow page with "NX huge page disallowed".
KVM does NX huge page accounting only when linking a new shadow page, and
it should be impossible for a new shadow page to be already accounted.
E.g. even in the TDP MMU case, where vCPUs can race to install a new
shadow page, only the "winner" will account the installed page.

Kill the VM instead of continuing on as either KVM has an egregious bug, e.g.
didn't zero-initialize the data, or there's host data corruption, in which
case carrying on is dangerous, e.g. could cause silent data corruption in
the guest.

Reported-by: David Matlack
Signed-off-by: Sean Christopherson
Reviewed-by: Mingwei Zhang
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 32b60a6b83bd..74afee3f2476 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -804,7 +804,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	if (sp->lpage_disallowed)
+	if (KVM_BUG_ON(sp->lpage_disallowed, kvm))
 		return;
 
 	++kvm->stat.nx_lpage_splits;
-- 
2.37.2.672.g94769d06f0-goog
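To make the failure mode of patch 1/9 concrete, here is a minimal userspace model of the new check. `struct vm`, `struct shadow_page` and `vm_bug_on()` are hypothetical stand-ins, not KVM's types or macros; `vm_bug_on()` only approximates KVM_BUG_ON() (warn on the first hit, mark the VM as bugged so it can be killed, and return the condition so the caller bails out).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct kvm and struct kvm_mmu_page. */
struct vm {
	bool bugged;			/* models the "VM is bugged" state */
	unsigned long nx_lpage_splits;	/* models kvm->stat.nx_lpage_splits */
};

struct shadow_page {
	bool lpage_disallowed;
};

/*
 * Approximates KVM_BUG_ON(cond, kvm): on the first hit, warn and mark the
 * VM as bugged; always return cond so the caller can bail out.
 */
static bool vm_bug_on(bool cond, struct vm *vm)
{
	if (cond && !vm->bugged) {
		fprintf(stderr, "WARNING: KVM bug detected, killing VM\n");
		vm->bugged = true;
	}
	return cond;
}

/* Mirrors account_huge_nx_page() as modified by patch 1/9. */
static void account_huge_nx_page(struct vm *vm, struct shadow_page *sp)
{
	if (vm_bug_on(sp->lpage_disallowed, vm))
		return;	/* already accounted: refuse to double count */

	++vm->nx_lpage_splits;
	sp->lpage_disallowed = true;
}
```

Calling account_huge_nx_page() twice on the same page leaves the counter at 1 and the VM marked as bugged, instead of silently corrupting the stat and the list.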

From: Sean Christopherson
Date: Tue, 30 Aug 2022 23:55:30 +0000
Subject: [PATCH v4 2/9] KVM: x86/mmu: Tag disallowed NX huge pages even if they're not tracked
Message-ID: <20220830235537.4004585-3-seanjc@google.com>
In-Reply-To: <20220830235537.4004585-1-seanjc@google.com>
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Mingwei Zhang, Yan Zhao, Ben Gardon

Tag shadow pages that cannot be replaced with an NX huge page regardless
of whether or not zapping the page would allow KVM to immediately create
a huge page, e.g. because something else prevents creating a huge page.
I.e. track pages that are disallowed from being NX huge pages regardless
of whether or not the page could have been huge at the time of fault.

KVM currently tracks pages that were disallowed from being huge due to
the NX workaround if and only if the page could otherwise be huge. But
that fails to handle the scenario where whatever restriction prevented
KVM from installing a huge page goes away, e.g. if dirty logging is
disabled, the host mapping level changes, etc...

Failure to tag shadow pages appropriately could theoretically lead to
false negatives, e.g. if a fetch fault requests a small page and thus
isn't tracked, and a read/write fault later requests a huge page, KVM
will not reject the huge page as it should.

To avoid yet another flag, initialize the list_head and use list_empty()
to determine whether or not a page is on the list of NX huge pages that
should be recovered.

Note, the TDP MMU accounting is still flawed as fixing the TDP MMU is
more involved due to mmu_lock being held for read. This will be
addressed in a future commit.
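The list_empty() approach relies on the kernel's circular list convention: an initialized list_head that is not on any list points at itself, and list_del_init() restores that state, so "is this page tracked?" needs no extra flag. The sketch below re-implements just enough of that idiom in userspace to model account/unaccount after this patch. The `ll_*` helpers, `struct vm` and `struct shadow_page` are hypothetical stand-ins for <linux/list.h>, struct kvm and struct kvm_mmu_page, and the KVM_BUG_ON() is reduced to setting a flag.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct ll_node {
	struct ll_node *next, *prev;
};

static void ll_init(struct ll_node *n)
{
	n->next = n;
	n->prev = n;
}

/* Like list_empty(): true for an empty head or an unlinked node. */
static bool ll_empty(const struct ll_node *n)
{
	return n->next == n;
}

static void ll_add_tail(struct ll_node *n, struct ll_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Like list_del_init(): unlink, then point the node back at itself. */
static void ll_del_init(struct ll_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	ll_init(n);
}

/* Hypothetical stand-ins for struct kvm and struct kvm_mmu_page. */
struct vm {
	struct ll_node possible;	/* pages worth zapping for recovery */
	unsigned long nx_lpage_splits;
	bool bugged;
};

struct shadow_page {
	bool lpage_disallowed;		/* tag: NX forbids a huge page here */
	struct ll_node link;		/* on vm->possible only if tracked */
};

static void account(struct vm *vm, struct shadow_page *sp, bool possible)
{
	if (!ll_empty(&sp->link)) {	/* models the KVM_BUG_ON() */
		vm->bugged = true;
		return;
	}
	sp->lpage_disallowed = true;	/* always tag... */
	if (!possible)
		return;			/* ...but track only if recoverable */
	++vm->nx_lpage_splits;
	ll_add_tail(&sp->link, &vm->possible);
}

static void unaccount(struct vm *vm, struct shadow_page *sp)
{
	sp->lpage_disallowed = false;
	if (ll_empty(&sp->link))
		return;			/* tagged but never tracked */
	--vm->nx_lpage_splits;
	ll_del_init(&sp->link);
}
```

A page accounted with `possible == false` is tagged but stays self-linked, so unaccount() clears the tag and leaves the counter and the list untouched.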
Fixes: 5bcaf3e1715f ("KVM: x86/mmu: Account NX huge page disallowed iff huge page was requested")
Signed-off-by: Sean Christopherson
Reviewed-by: Mingwei Zhang
---
 arch/x86/kvm/mmu/mmu.c          | 27 +++++++++++++++++++--------
 arch/x86/kvm/mmu/mmu_internal.h | 10 +++++++++-
 arch/x86/kvm/mmu/paging_tmpl.h  |  6 +++---
 arch/x86/kvm/mmu/tdp_mmu.c      |  4 +++-
 4 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 74afee3f2476..564a80a86984 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -802,15 +802,20 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
 }
 
-void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			  bool nx_huge_page_possible)
 {
-	if (KVM_BUG_ON(sp->lpage_disallowed, kvm))
+	if (KVM_BUG_ON(!list_empty(&sp->lpage_disallowed_link), kvm))
+		return;
+
+	sp->lpage_disallowed = true;
+
+	if (!nx_huge_page_possible)
 		return;
 
 	++kvm->stat.nx_lpage_splits;
 	list_add_tail(&sp->lpage_disallowed_link,
 		      &kvm->arch.lpage_disallowed_mmu_pages);
-	sp->lpage_disallowed = true;
 }
 
 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
@@ -832,9 +837,13 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	--kvm->stat.nx_lpage_splits;
 	sp->lpage_disallowed = false;
-	list_del(&sp->lpage_disallowed_link);
+
+	if (list_empty(&sp->lpage_disallowed_link))
+		return;
+
+	--kvm->stat.nx_lpage_splits;
+	list_del_init(&sp->lpage_disallowed_link);
 }
 
 static struct kvm_memory_slot *
@@ -2127,6 +2136,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
+	INIT_LIST_HEAD(&sp->lpage_disallowed_link);
+
 	/*
 	 * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
 	 * depends on valid pages being added to the head of the list. See
@@ -3124,9 +3135,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->is_tdp && fault->huge_page_disallowed &&
-		    fault->req_level >= it.level)
-			account_huge_nx_page(vcpu->kvm, sp);
+		if (fault->is_tdp && fault->huge_page_disallowed)
+			account_huge_nx_page(vcpu->kvm, sp,
+					     fault->req_level >= it.level);
 	}
 
 	if (WARN_ON_ONCE(it.level != fault->goal_level))
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 582def531d4d..cca1ad75d096 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -100,6 +100,13 @@ struct kvm_mmu_page {
 		};
 	};
 
+	/*
+	 * Tracks shadow pages that, if zapped, would allow KVM to create an NX
+	 * huge page. A shadow page will have lpage_disallowed set but not be
+	 * on the list if a huge page is disallowed for other reasons, e.g.
+	 * because KVM is shadowing a PTE at the same gfn, the memslot isn't
+	 * properly aligned, etc...
+	 */
 	struct list_head lpage_disallowed_link;
 #ifdef CONFIG_X86_32
 	/*
@@ -315,7 +322,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 
 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 
-void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			  bool nx_huge_page_possible);
 void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 39e0205e7300..260dc8bc3d4f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -713,9 +713,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->huge_page_disallowed &&
-		    fault->req_level >= it.level)
-			account_huge_nx_page(vcpu->kvm, sp);
+		if (fault->huge_page_disallowed)
+			account_huge_nx_page(vcpu->kvm, sp,
+					     fault->req_level >= it.level);
 	}
 
 	if (WARN_ON_ONCE(it.level != fault->goal_level))
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 672f0432d777..80a4a1a09131 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -284,6 +284,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
 			    gfn_t gfn, union kvm_mmu_page_role role)
 {
+	INIT_LIST_HEAD(&sp->lpage_disallowed_link);
+
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
 	sp->role = role;
@@ -1141,7 +1143,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	list_add(&sp->link, &kvm->arch.tdp_mmu_pages);
 	if (account_nx)
-		account_huge_nx_page(kvm, sp);
+		account_huge_nx_page(kvm, sp, true);
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 	tdp_account_mmu_page(kvm, sp);
 
-- 
2.37.2.672.g94769d06f0-goog

From: Sean Christopherson
Date: Tue, 30 Aug 2022 23:55:31 +0000
Subject: [PATCH v4 3/9] KVM: x86/mmu: Rename NX huge pages fields/functions for consistency
Message-ID: <20220830235537.4004585-4-seanjc@google.com>
In-Reply-To: <20220830235537.4004585-1-seanjc@google.com>
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Mingwei Zhang, Yan Zhao, Ben Gardon

Rename most of the variables/functions involved in the NX huge page
mitigation to provide consistency, e.g. lpage vs huge page, and NX huge
vs huge NX, and also to provide clarity, e.g. to make it obvious the flag
applies only to the NX huge page mitigation, not to any condition that
prevents creating a huge page.

Add a comment explaining what the newly named "possible_nx_huge_pages"
tracks.

Leave the nx_lpage_splits stat alone as the name is ABI and thus set in
stone.

Signed-off-by: Sean Christopherson
Reviewed-by: Mingwei Zhang
---
 arch/x86/include/asm/kvm_host.h | 19 +++++++--
 arch/x86/kvm/mmu/mmu.c          | 70 +++++++++++++++++----------------
 arch/x86/kvm/mmu/mmu_internal.h | 22 +++++++----
 arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      |  8 ++--
 5 files changed, 70 insertions(+), 51 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c96c43c313a..48e51600f1be 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1143,7 +1143,18 @@ struct kvm_arch {
 	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
-	struct list_head lpage_disallowed_mmu_pages;
+	/*
+	 * A list of kvm_mmu_page structs that, if zapped, could possibly be
+	 * replaced by an NX huge page. A shadow page is on this list if its
+	 * existence disallows an NX huge page (nx_huge_page_disallowed is set)
+	 * and there are no other conditions that prevent a huge page, e.g.
+	 * the backing host page is huge, dirty logging is not enabled for its
+	 * memslot, etc... Note, zapping shadow pages on this list doesn't
+	 * guarantee an NX huge page will be created in its stead, e.g. if the
+	 * guest attempts to execute from the region then KVM obviously can't
+	 * create an NX huge page (without hanging the guest).
+	 */
+	struct list_head possible_nx_huge_pages;
 	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 	/*
@@ -1259,7 +1270,7 @@ struct kvm_arch {
 	bool sgx_provisioning_allowed;
 
 	struct kvm_pmu_event_filter __rcu *pmu_event_filter;
-	struct task_struct *nx_lpage_recovery_thread;
+	struct task_struct *nx_huge_page_recovery_thread;
 
 #ifdef CONFIG_X86_64
 	/*
@@ -1304,8 +1315,8 @@ struct kvm_arch {
 	 * - tdp_mmu_roots (above)
 	 * - tdp_mmu_pages (above)
 	 * - the link field of struct kvm_mmu_pages used by the TDP MMU
-	 * - lpage_disallowed_mmu_pages
-	 * - the lpage_disallowed_link field of struct kvm_mmu_pages used
+	 * - possible_nx_huge_pages
+	 * - the possible_nx_huge_page_link field of struct kvm_mmu_pages used
 	 *   by the TDP MMU
 	 * It is acceptable, but not necessary, to acquire this lock when
 	 * the thread holds the MMU lock in write mode.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 564a80a86984..a39dc886c5b8 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -802,20 +802,20 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
 }
 
-void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 			  bool nx_huge_page_possible)
 {
-	if (KVM_BUG_ON(!list_empty(&sp->lpage_disallowed_link), kvm))
+	if (KVM_BUG_ON(!list_empty(&sp->possible_nx_huge_page_link), kvm))
 		return;
 
-	sp->lpage_disallowed = true;
+	sp->nx_huge_page_disallowed = true;
 
 	if (!nx_huge_page_possible)
 		return;
 
 	++kvm->stat.nx_lpage_splits;
-	list_add_tail(&sp->lpage_disallowed_link,
-		      &kvm->arch.lpage_disallowed_mmu_pages);
+	list_add_tail(&sp->possible_nx_huge_page_link,
+		      &kvm->arch.possible_nx_huge_pages);
 }
 
 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
@@ -835,15 +835,15 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 		kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
 
-void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	sp->lpage_disallowed = false;
+	sp->nx_huge_page_disallowed = false;
 
-	if (list_empty(&sp->lpage_disallowed_link))
+	if (list_empty(&sp->possible_nx_huge_page_link))
 		return;
 
 	--kvm->stat.nx_lpage_splits;
-	list_del_init(&sp->lpage_disallowed_link);
+	list_del_init(&sp->possible_nx_huge_page_link);
 }
 
 static struct kvm_memory_slot *
@@ -2136,7 +2136,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
-	INIT_LIST_HEAD(&sp->lpage_disallowed_link);
+	INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
 
 	/*
 	 * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
@@ -2495,8 +2495,8 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 		zapped_root = !is_obsolete_sp(kvm, sp);
 	}
 
-	if (sp->lpage_disallowed)
-		unaccount_huge_nx_page(kvm, sp);
+	if (sp->nx_huge_page_disallowed)
+		unaccount_nx_huge_page(kvm, sp);
 
 	sp->role.invalid = 1;
 
@@ -3136,7 +3136,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 
 		link_shadow_page(vcpu, it.sptep, sp);
 		if (fault->is_tdp && fault->huge_page_disallowed)
-			account_huge_nx_page(vcpu->kvm, sp,
+			account_nx_huge_page(vcpu->kvm, sp,
 					     fault->req_level >= it.level);
 	}
 
@@ -5980,7 +5980,7 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
-	INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages);
+	INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
 	r = kvm_mmu_init_tdp_mmu(kvm);
@@ -6665,7 +6665,7 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
 			kvm_mmu_zap_all_fast(kvm);
 			mutex_unlock(&kvm->slots_lock);
 
-			wake_up_process(kvm->arch.nx_lpage_recovery_thread);
+			wake_up_process(kvm->arch.nx_huge_page_recovery_thread);
 		}
 		mutex_unlock(&kvm_lock);
 	}
@@ -6797,7 +6797,7 @@ static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel
 		mutex_lock(&kvm_lock);
 
 		list_for_each_entry(kvm, &vm_list, vm_list)
-			wake_up_process(kvm->arch.nx_lpage_recovery_thread);
+			wake_up_process(kvm->arch.nx_huge_page_recovery_thread);
 
 		mutex_unlock(&kvm_lock);
 	}
@@ -6805,7 +6805,7 @@ static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel
 	return err;
 }
 
-static void kvm_recover_nx_lpages(struct kvm *kvm)
+static void kvm_recover_nx_huge_pages(struct kvm *kvm)
 {
 	unsigned long nx_lpage_splits = kvm->stat.nx_lpage_splits;
 	int rcu_idx;
@@ -6828,23 +6828,25 @@ static void kvm_recover_nx_lpages(struct kvm *kvm)
 	ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
 	to_zap = ratio ? DIV_ROUND_UP(nx_lpage_splits, ratio) : 0;
 	for ( ; to_zap; --to_zap) {
-		if (list_empty(&kvm->arch.lpage_disallowed_mmu_pages))
+		if (list_empty(&kvm->arch.possible_nx_huge_pages))
 			break;
 
 		/*
 		 * We use a separate list instead of just using active_mmu_pages
-		 * because the number of lpage_disallowed pages is expected to
-		 * be relatively small compared to the total.
+		 * because the number of shadow pages that can be replaced with an
+		 * NX huge page is expected to be relatively small compared to
+		 * the total number of shadow pages. And because the TDP MMU
+		 * doesn't use active_mmu_pages.
 		 */
-		sp = list_first_entry(&kvm->arch.lpage_disallowed_mmu_pages,
+		sp = list_first_entry(&kvm->arch.possible_nx_huge_pages,
 				      struct kvm_mmu_page,
-				      lpage_disallowed_link);
-		WARN_ON_ONCE(!sp->lpage_disallowed);
+				      possible_nx_huge_page_link);
+		WARN_ON_ONCE(!sp->nx_huge_page_disallowed);
 		if (is_tdp_mmu_page(sp)) {
 			flush |= kvm_tdp_mmu_zap_sp(kvm, sp);
 		} else {
 			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
-			WARN_ON_ONCE(sp->lpage_disallowed);
+			WARN_ON_ONCE(sp->nx_huge_page_disallowed);
 		}
 
 		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
@@ -6865,7 +6867,7 @@ static void kvm_recover_nx_lpages(struct kvm *kvm)
 	srcu_read_unlock(&kvm->srcu, rcu_idx);
 }
 
-static long get_nx_lpage_recovery_timeout(u64 start_time)
+static long get_nx_huge_page_recovery_timeout(u64 start_time)
 {
 	bool enabled;
 	uint period;
@@ -6876,19 +6878,19 @@ static long get_nx_lpage_recovery_timeout(u64 start_time)
 		       : MAX_SCHEDULE_TIMEOUT;
 }
 
-static int kvm_nx_lpage_recovery_worker(struct kvm *kvm, uintptr_t data)
+static int kvm_nx_huge_page_recovery_worker(struct kvm *kvm, uintptr_t data)
 {
 	u64 start_time;
 	long remaining_time;
 
 	while (true) {
 		start_time = get_jiffies_64();
-		remaining_time = get_nx_lpage_recovery_timeout(start_time);
+		remaining_time = get_nx_huge_page_recovery_timeout(start_time);
 
 		set_current_state(TASK_INTERRUPTIBLE);
 		while (!kthread_should_stop() && remaining_time > 0) {
 			schedule_timeout(remaining_time);
-			remaining_time = get_nx_lpage_recovery_timeout(start_time);
+			remaining_time = get_nx_huge_page_recovery_timeout(start_time);
 			set_current_state(TASK_INTERRUPTIBLE);
 		}
 
@@ -6897,7 +6899,7 @@ static int kvm_nx_lpage_recovery_worker(struct kvm *kvm, uintptr_t data)
 		if (kthread_should_stop())
 			return 0;
 
-		kvm_recover_nx_lpages(kvm);
+		kvm_recover_nx_huge_pages(kvm);
 	}
 }
 
@@ -6905,17 +6907,17 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
 {
 	int err;
 
-	err = kvm_vm_create_worker_thread(kvm, kvm_nx_lpage_recovery_worker, 0,
+	err = kvm_vm_create_worker_thread(kvm, kvm_nx_huge_page_recovery_worker, 0,
 					  "kvm-nx-lpage-recovery",
-					  &kvm->arch.nx_lpage_recovery_thread);
+					  &kvm->arch.nx_huge_page_recovery_thread);
 	if (!err)
-		kthread_unpark(kvm->arch.nx_lpage_recovery_thread);
+		kthread_unpark(kvm->arch.nx_huge_page_recovery_thread);
 
 	return err;
 }
 
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
 {
-	if (kvm->arch.nx_lpage_recovery_thread)
-		kthread_stop(kvm->arch.nx_lpage_recovery_thread);
+	if (kvm->arch.nx_huge_page_recovery_thread)
+		kthread_stop(kvm->arch.nx_huge_page_recovery_thread);
 }
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index cca1ad75d096..67879459a25c 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -57,7 +57,13 @@ struct kvm_mmu_page {
 	bool tdp_mmu_page;
 	bool unsync;
 	u8 mmu_valid_gen;
-	bool lpage_disallowed; /* Can't be replaced by an equiv large page */
+
+	/*
+	 * The shadow page can't be replaced by an equivalent huge page
+	 * because it is being used to map an executable page in the guest
+	 * and the NX huge page mitigation is enabled.
+	 */
+	bool nx_huge_page_disallowed;
 
 	/*
 	 * The following two entries are used to key the shadow page in the
@@ -102,12 +108,12 @@ struct kvm_mmu_page {
 
 	/*
 	 * Tracks shadow pages that, if zapped, would allow KVM to create an NX
-	 * huge page. A shadow page will have lpage_disallowed set but not be
-	 * on the list if a huge page is disallowed for other reasons, e.g.
-	 * because KVM is shadowing a PTE at the same gfn, the memslot isn't
-	 * properly aligned, etc...
+	 * huge page. A shadow page will have nx_huge_page_disallowed set but
+	 * not be on the list if a huge page is disallowed for other reasons,
+	 * e.g. because KVM is shadowing a PTE at the same gfn, the memslot
+	 * isn't properly aligned, etc...
 	 */
-	struct list_head lpage_disallowed_link;
+	struct list_head possible_nx_huge_page_link;
 #ifdef CONFIG_X86_32
 	/*
 	 * Used out of the mmu-lock to avoid reading spte values while an
@@ -322,8 +328,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 
 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 
-void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 			  bool nx_huge_page_possible);
-void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 260dc8bc3d4f..f7dc752f98f1 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -714,7 +714,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 
 		link_shadow_page(vcpu, it.sptep, sp);
 		if (fault->huge_page_disallowed)
-			account_huge_nx_page(vcpu->kvm, sp,
+			account_nx_huge_page(vcpu->kvm, sp,
 					     fault->req_level >= it.level);
 	}
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 80a4a1a09131..73eb28ed1f03 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -284,7 +284,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
 			    gfn_t gfn, union kvm_mmu_page_role role)
 {
-	INIT_LIST_HEAD(&sp->lpage_disallowed_link);
+	INIT_LIST_HEAD(&sp->possible_nx_huge_page_link);
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
@@ -403,8 +403,8 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 		lockdep_assert_held_write(&kvm->mmu_lock);
 
 	list_del(&sp->link);
-	if (sp->lpage_disallowed)
-		unaccount_huge_nx_page(kvm, sp);
+	if (sp->nx_huge_page_disallowed)
+		unaccount_nx_huge_page(kvm, sp);
 
 	if (shared)
 		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
@@ -1143,7 +1143,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	list_add(&sp->link, &kvm->arch.tdp_mmu_pages);
 	if (account_nx)
-		account_huge_nx_page(kvm, sp, true);
+		account_nx_huge_page(kvm, sp, true);
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 	tdp_account_mmu_page(kvm, sp);
 
-- 
2.37.2.672.g94769d06f0-goog
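The recovery pass renamed in patch 3/9, kvm_recover_nx_huge_pages(), does not drain possible_nx_huge_pages in one go: each pass derives a budget from the nx_lpage_splits stat and nx_huge_pages_recovery_ratio, per the `to_zap = ratio ? DIV_ROUND_UP(nx_lpage_splits, ratio) : 0` line in the diff context, and stops early if the list empties. A minimal userspace sketch of just that arithmetic, assuming the list holds `list_len` entries; `pages_to_zap()` and `list_len` are hypothetical names, and locking, rescheduling and the zapping itself are left out.

```c
/* Same rounding as the kernel's DIV_ROUND_UP() helper. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical model of the per-pass budget: a ratio of 0 zaps nothing,
 * otherwise roughly 1/ratio of nx_lpage_splits is zapped, capped by the
 * number of entries actually on the possible_nx_huge_pages list (the
 * list_empty() break in the real loop).
 */
static unsigned long pages_to_zap(unsigned long nx_lpage_splits,
				  unsigned int ratio, unsigned long list_len)
{
	unsigned long to_zap = ratio ? DIV_ROUND_UP(nx_lpage_splits, ratio) : 0;

	return to_zap < list_len ? to_zap : list_len;
}
```

For example, with 100 split pages and a ratio of 60, one pass zaps at most DIV_ROUND_UP(100, 60) = 2 pages; rounding up guarantees progress whenever the ratio is non-zero and at least one page is tracked.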
tQD2qDarGnbMtEl2MSORgWmFRMxa6g1vKY6H5LEi2kjlcmncnc8FOrt5w1Q382SqghJQ SoVA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc; bh=dW3v8nB6rc55x7pB4U5xsxMaQfNaI01R1eoBhCvYIrI=; b=EiFGi0ON4c+gDNmrSmHyMvJSi/oVAHDGwlhf9ZG5pknG11tx7SVExTfT7HmyN/efPf ooPzUuugYYrAKoBed3N5XYgEwh2oQu8xUvJhmUZq/X72tzvbyrjqQ+AITUvxT/A0eJZX RblpLOprRpZ2QLrCJf6kGVDRClh2+4hhSaDwNgGW+Lljr/K3eAksAtvklrt3CpWXVOcT +JnevkddydbZmrBhydDsG6fu5T0k90x5fNLW7419trr8EW72sgN2KaQmQ3ouSAfsfSKP shMTwt9++LQOkGLdzre6NBhr4Y7pFvAslboZ8Q1metWeA/UOnRDx1tS6KVYXOz372gcG 3rBw== X-Gm-Message-State: ACgBeo2BRXVay48uZqxwN9RiauP9pLmZe9mJXvVAtjMtdYKGRO7TkDWn 3b8Dg03ZQQgox4s7TjKUmPGZfeDeKbE= X-Google-Smtp-Source: AA6agR4ROM8mxlz/MLLrkmUlD65Ei6K1cq7X2/ast/Mc3yX2lMU40jjXsqWJkg61bd5HrphHm2EWvxJrK9k= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6a00:1384:b0:538:73c5:91ff with SMTP id t4-20020a056a00138400b0053873c591ffmr7841260pfg.54.1661903746666; Tue, 30 Aug 2022 16:55:46 -0700 (PDT) Reply-To: Sean Christopherson Date: Tue, 30 Aug 2022 23:55:32 +0000 In-Reply-To: <20220830235537.4004585-1-seanjc@google.com> Mime-Version: 1.0 References: <20220830235537.4004585-1-seanjc@google.com> X-Mailer: git-send-email 2.37.2.672.g94769d06f0-goog Message-ID: <20220830235537.4004585-5-seanjc@google.com> Subject: [PATCH v4 4/9] KVM: x86/mmu: Properly account NX huge page workaround for nonpaging MMUs From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack , Mingwei Zhang , Yan Zhao , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Account and track NX huge pages for nonpaging MMUs so that a future enhancement to precisely check 
if a shadow page can't be replaced by a NX huge page doesn't get false positives. Without correct tracking, KVM can get stuck in a loop if an instruction is fetching and writing data on the same huge page, e.g. KVM installs a small executable page on the fetch fault, replaces it with an NX huge page on the write fault, and faults again on the fetch. Alternatively, and perhaps ideally, KVM would simply not enforce the workaround for nonpaging MMUs. The guest has no page tables to abuse and KVM is guaranteed to switch to a different MMU on CR0.PG being toggled so there's no security or performance concerns. However, getting make_spte() to play nice now and in the future is unnecessarily complex. In the current code base, make_spte() can enforce the mitigation if TDP is enabled or the MMU is indirect, but make_spte() may not always have a vCPU/MMU to work with, e.g. if KVM were to support in-line huge page promotion when disabling dirty logging. Without a vCPU/MMU, KVM could either pass in the correct information and/or derive it from the shadow page, but the former is ugly and the latter subtly non-trivial due to the possibility of direct shadow pages in indirect MMUs. Given that using shadow paging with an unpaged guest is far from top priority _and_ has been subjected to the workaround since its inception, keep it simple and just fix the accounting glitch. 
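The mitigation discussed above comes down to two rules: make_spte() strips execute permission from any mapping larger than 4KiB, and an instruction fetch that faults while the workaround is enabled is mapped at 4KiB so it can stay executable. The following is a minimal userspace sketch of those two rules, assuming illustrative values: the PG_LEVEL_* and ACC_* names mirror KVM's, but the toy_ helpers are hypothetical and are not KVM functions.

```c
#include <stdbool.h>

/* Illustrative stand-ins mirroring KVM's names; values are assumptions. */
enum toy_pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

#define ACC_EXEC_MASK  0x1u
#define ACC_WRITE_MASK 0x2u

/*
 * Hypothetical model of the check in make_spte(): with the NX huge page
 * mitigation enabled, a huge (> 4KiB) mapping never keeps execute permission.
 */
static unsigned int toy_apply_nx_huge_page_mitigation(int level,
						      unsigned int pte_access,
						      bool nx_huge_pages_enabled)
{
	if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
	    nx_huge_pages_enabled)
		pte_access &= ~ACC_EXEC_MASK;
	return pte_access;
}

/*
 * Hypothetical model of the mapping level chosen for a fault, assuming the
 * backing memory would otherwise allow a 2MiB mapping: a fetch must stay
 * executable, so it is limited to 4KiB; a data access may use a huge page.
 */
static int toy_max_level_for_fault(bool is_fetch, bool nx_huge_pages_enabled)
{
	if (is_fetch && nx_huge_pages_enabled)
		return PG_LEVEL_4K;
	return PG_LEVEL_2M;
}
```

Alternating the two cases on the same huge page (a fetch installs a small executable page, a write replaces it with an NX huge page) reproduces the loop the commit message describes. The patch addresses that through the accounting rather than by changing these rules.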
Signed-off-by: Sean Christopherson
Reviewed-by: David Matlack
Reviewed-by: Mingwei Zhang
---
 arch/x86/kvm/mmu/mmu.c  |  2 +-
 arch/x86/kvm/mmu/spte.c | 12 ++++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a39dc886c5b8..04eb87f5a39d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3135,7 +3135,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->is_tdp && fault->huge_page_disallowed)
+		if (fault->huge_page_disallowed)
 			account_nx_huge_page(vcpu->kvm, sp,
 					     fault->req_level >= it.level);
 	}
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 2e08b2a45361..c0fd7e049b4e 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -161,6 +161,18 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if (!prefetch)
 		spte |= spte_shadow_accessed_mask(spte);
 
+	/*
+	 * For simplicity, enforce the NX huge page mitigation even if not
+	 * strictly necessary. KVM could ignore the mitigation if paging is
+	 * disabled in the guest, as the guest doesn't have any page tables to
+	 * abuse. But to safely ignore the mitigation, KVM would have to
+	 * ensure a new MMU is loaded (or all shadow pages zapped) when CR0.PG
+	 * is toggled on, and that's a net negative for performance when TDP is
+	 * enabled. When TDP is disabled, KVM will always switch to a new MMU
+	 * when CR0.PG is toggled, but leveraging that to ignore the mitigation
+	 * would tie make_spte() further to vCPU/MMU state, and add complexity
+	 * just to optimize a mode that is anything but performance critical.
+	 */
 	if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
 	    is_nx_huge_page_enabled(vcpu->kvm)) {
 		pte_access &= ~ACC_EXEC_MASK;
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:43:32 2026
Date: Tue, 30 Aug 2022 23:55:33 +0000
In-Reply-To: <20220830235537.4004585-1-seanjc@google.com>
References: <20220830235537.4004585-1-seanjc@google.com>
Message-ID: <20220830235537.4004585-6-seanjc@google.com>
Subject: [PATCH v4 5/9] KVM: x86/mmu: Document implicit barriers/ordering in TDP MMU shared mode
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Mingwei Zhang, Yan Zhao, Ben Gardon

Add comments to tdp_mmu_set_spte_atomic() and kvm_tdp_mmu_read_spte() to
document the ordering guarantees they provide, which ensure that any
changes made to a child shadow page are visible before a SPTE is marked
present, e.g. that there's no risk of concurrent readers observing a
stale PFN for a shadow-present SPTE.

No functional change intended.
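The pairing described above is the classic publish/subscribe pattern: fully initialize the child, publish it with an atomic compare-exchange, and have readers check the present bit before following the pointer. The following is a minimal portable sketch using C11 `<stdatomic.h>`. All toy_ names and the one-bit "present" encoding are hypothetical. The reader uses an acquire load, which is the portable C11 way to obtain the ordering that KVM gets from READ_ONCE() plus the address dependency on x86. It illustrates the pattern and is not KVM's code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: bit 0 = present, remaining bits = child pointer. */
#define TOY_SPTE_PRESENT ((uintptr_t)1)

/* Contains an int, so its alignment keeps bit 0 of its address clear. */
struct toy_child_sp {
	int level;
	bool nx_huge_page_disallowed;
};

typedef _Atomic uintptr_t toy_spte_t;

/*
 * Writer: fully initialize a child that no other thread can see yet, then
 * publish it. The default (seq_cst) compare-exchange orders the plain
 * stores above before the SPTE becomes visible, playing the role of the
 * implicit barrier in tdp_mmu_set_spte_atomic(). Returns false if the SPTE
 * no longer holds old_spte, i.e. another thread won the race.
 */
static bool toy_link_child(toy_spte_t *sptep, uintptr_t old_spte,
			   struct toy_child_sp *child, int level,
			   bool nx_huge_page_disallowed)
{
	child->level = level;
	child->nx_huge_page_disallowed = nx_huge_page_disallowed;

	uintptr_t new_spte = (uintptr_t)child | TOY_SPTE_PRESENT;
	return atomic_compare_exchange_strong(sptep, &old_spte, new_spte);
}

/*
 * Reader: load the SPTE once and follow the pointer only if it is present.
 * The acquire load guarantees the child's fields are read after the SPTE
 * that published them.
 */
static struct toy_child_sp *toy_read_child(toy_spte_t *sptep)
{
	uintptr_t spte = atomic_load_explicit(sptep, memory_order_acquire);

	if (!(spte & TOY_SPTE_PRESENT))
		return NULL;
	return (struct toy_child_sp *)(spte & ~TOY_SPTE_PRESENT);
}
```

The loser of a race simply discards its child, much as kvm_tdp_mmu_map() calls tdp_mmu_free_sp() when tdp_mmu_link_sp() fails.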
Suggested-by: Paolo Bonzini
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_iter.h | 6 ++++++
 arch/x86/kvm/mmu/tdp_mmu.c  | 5 +++++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index f0af385c56e0..9d982ccf4567 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -13,6 +13,12 @@
  * to be zapped while holding mmu_lock for read, and to allow TLB flushes to be
  * batched without having to collect the list of zapped SPs. Flows that can
  * remove SPs must service pending TLB flushes prior to dropping RCU protection.
+ *
+ * The READ_ONCE() ensures that, if the SPTE points at a child shadow page, all
+ * fields in struct kvm_mmu_page will be read after the caller observes the
+ * present SPTE (KVM must check that the SPTE is present before following the
+ * SPTE's pfn to its associated shadow page). Pairs with the implicit memory
+ * barrier in tdp_mmu_set_spte_atomic().
  */
 static inline u64 kvm_tdp_mmu_read_spte(tdp_ptep_t sptep)
 {
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 73eb28ed1f03..d1079fabe14c 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -658,6 +658,11 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
 	/*
+	 * The atomic CMPXCHG64 provides an implicit memory barrier and ensures
+	 * that, if the SPTE points at a shadow page, all struct kvm_mmu_page
+	 * fields are visible to readers before the SPTE is marked present.
+	 * Pairs with ordering guarantees provided by kvm_tdp_mmu_read_spte().
+	 *
 	 * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and
 	 * does not hold the mmu_lock.
 	 */
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:43:32 2026
Date: Tue, 30 Aug 2022 23:55:34 +0000
In-Reply-To: <20220830235537.4004585-1-seanjc@google.com>
References: <20220830235537.4004585-1-seanjc@google.com>
Message-ID: <20220830235537.4004585-7-seanjc@google.com>
Subject: [PATCH v4 6/9] KVM: x86/mmu: Set disallowed_nx_huge_page in TDP MMU before setting SPTE
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Mingwei Zhang, Yan Zhao, Ben Gardon

Set nx_huge_page_disallowed in TDP MMU shadow pages before making the SP
visible to other readers, i.e. before setting its SPTE. This will allow
KVM to query the flag when determining if a shadow page can be replaced
by an NX huge page without violating the rules of the mitigation.

Note, the shadow/legacy MMU holds mmu_lock for write, so it's impossible
for another CPU to see a shadow page without an up-to-date
nx_huge_page_disallowed, i.e. only the TDP MMU needs the complicated
dance.
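The resulting split can be sketched in plain C11. The flag is set while the page is still private, before the SPTE is published, and the page is added to the possible-NX list only after a successful link, under tdp_mmu_pages_lock. Everything in this sketch is hypothetical: a tiny intrusive list stands in for the kernel's list_head, an atomic_flag spinlock stands in for spin_lock(), a bool stands in for KVM_BUG_ON() killing the VM, and the publish step appears only as a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal intrusive doubly linked list, modeled loosely on list_head. */
struct toy_list { struct toy_list *prev, *next; };

static void toy_list_init(struct toy_list *h) { h->prev = h->next = h; }
static bool toy_list_empty(const struct toy_list *h) { return h->next == h; }

static void toy_list_add_tail(struct toy_list *n, struct toy_list *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void toy_list_del_init(struct toy_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	toy_list_init(n);
}

struct toy_sp {
	bool nx_huge_page_disallowed;
	struct toy_list possible_nx_huge_page_link;
};

struct toy_vm {
	atomic_flag pages_lock;		/* stand-in for tdp_mmu_pages_lock */
	struct toy_list possible_nx_huge_pages;
	long nx_lpage_splits;
	bool bugged;			/* stand-in for KVM_BUG_ON() */
};

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void toy_vm_init(struct toy_vm *vm)
{
	atomic_flag_clear(&vm->pages_lock);
	toy_list_init(&vm->possible_nx_huge_pages);
	vm->nx_lpage_splits = 0;
	vm->bugged = false;
}

/* Caller holds pages_lock. Refuses to count the same page twice. */
static void toy_track_possible_nx_huge_page(struct toy_vm *vm, struct toy_sp *sp)
{
	if (!toy_list_empty(&sp->possible_nx_huge_page_link)) {
		vm->bugged = true;
		return;
	}
	vm->nx_lpage_splits++;
	toy_list_add_tail(&sp->possible_nx_huge_page_link,
			  &vm->possible_nx_huge_pages);
}

/* Caller holds pages_lock. Untracking an untracked page is a no-op. */
static void toy_untrack_possible_nx_huge_page(struct toy_vm *vm, struct toy_sp *sp)
{
	if (toy_list_empty(&sp->possible_nx_huge_page_link))
		return;
	vm->nx_lpage_splits--;
	toy_list_del_init(&sp->possible_nx_huge_page_link);
}

/* Ordering used by the TDP MMU fault path after this patch. */
static void toy_install_child(struct toy_vm *vm, struct toy_sp *sp,
			      bool huge_page_disallowed, bool nx_huge_page_possible)
{
	toy_list_init(&sp->possible_nx_huge_page_link);
	/* 1. Set the flag while the page is still private to this thread. */
	sp->nx_huge_page_disallowed = huge_page_disallowed;
	/* 2. Publish the SPTE here (an atomic compare-exchange in real code). */
	/* 3. Only after a successful link, track the page under the lock. */
	if (huge_page_disallowed && nx_huge_page_possible) {
		toy_lock(&vm->pages_lock);
		toy_track_possible_nx_huge_page(vm, sp);
		toy_unlock(&vm->pages_lock);
	}
}
```

Because untracking checks for an empty link first, it is idempotent, while a second track trips the bug flag. This mirrors the KVM_BUG_ON() on a non-empty possible_nx_huge_page_link in track_possible_nx_huge_page().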
Signed-off-by: Sean Christopherson
Reviewed-by: David Matlack
---
 arch/x86/kvm/mmu/mmu.c          | 28 ++++++++++++++++++----------
 arch/x86/kvm/mmu/mmu_internal.h |  5 ++---
 arch/x86/kvm/mmu/tdp_mmu.c      | 31 ++++++++++++++++++-------------
 3 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 04eb87f5a39d..de06c1f87635 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -802,22 +802,25 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
 }
 
-void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-			  bool nx_huge_page_possible)
+void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	if (KVM_BUG_ON(!list_empty(&sp->possible_nx_huge_page_link), kvm))
 		return;
 
-	sp->nx_huge_page_disallowed = true;
-
-	if (!nx_huge_page_possible)
-		return;
-
 	++kvm->stat.nx_lpage_splits;
 	list_add_tail(&sp->possible_nx_huge_page_link,
 		      &kvm->arch.possible_nx_huge_pages);
 }
 
+static void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+				 bool nx_huge_page_possible)
+{
+	sp->nx_huge_page_disallowed = true;
+
+	if (nx_huge_page_possible)
+		track_possible_nx_huge_page(kvm, sp);
+}
+
 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	struct kvm_memslots *slots;
@@ -835,10 +838,8 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
 
-void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	sp->nx_huge_page_disallowed = false;
-
 	if (list_empty(&sp->possible_nx_huge_page_link))
 		return;
 
@@ -846,6 +847,13 @@ void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 	list_del_init(&sp->possible_nx_huge_page_link);
 }
 
+static void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	sp->nx_huge_page_disallowed = false;
+
+	untrack_possible_nx_huge_page(kvm, sp);
+}
+
 static struct kvm_memory_slot *
 gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
 			    bool no_dirty_log)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 67879459a25c..22152241bd29 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -328,8 +328,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 
 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 
-void account_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp,
-			  bool nx_huge_page_possible);
-void unaccount_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d1079fabe14c..fd38465aee9e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -403,8 +403,11 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp,
 		lockdep_assert_held_write(&kvm->mmu_lock);
 
 	list_del(&sp->link);
-	if (sp->nx_huge_page_disallowed)
-		unaccount_nx_huge_page(kvm, sp);
+
+	if (sp->nx_huge_page_disallowed) {
+		sp->nx_huge_page_disallowed = false;
+		untrack_possible_nx_huge_page(kvm, sp);
+	}
 
 	if (shared)
 		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
@@ -1123,16 +1126,13 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
  * @kvm: kvm instance
  * @iter: a tdp_iter instance currently on the SPTE that should be set
  * @sp: The new TDP page table to install.
- * @account_nx: True if this page table is being installed to split a
- *              non-executable huge page.
  * @shared: This operation is running under the MMU lock in read mode.
  *
  * Returns: 0 if the new page table was installed. Non-0 if the page table
  *          could not be installed (e.g. the atomic compare-exchange failed).
  */
 static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
-			   struct kvm_mmu_page *sp, bool account_nx,
-			   bool shared)
+			   struct kvm_mmu_page *sp, bool shared)
 {
 	u64 spte = make_nonleaf_spte(sp->spt, !kvm_ad_enabled());
 	int ret = 0;
@@ -1147,8 +1147,6 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 
 	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	list_add(&sp->link, &kvm->arch.tdp_mmu_pages);
-	if (account_nx)
-		account_nx_huge_page(kvm, sp, true);
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 	tdp_account_mmu_page(kvm, sp);
 
@@ -1162,6 +1160,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
+	struct kvm *kvm = vcpu->kvm;
 	struct tdp_iter iter;
 	struct kvm_mmu_page *sp;
 	int ret;
@@ -1198,9 +1197,6 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 
 		if (!is_shadow_present_pte(iter.old_spte)) {
-			bool account_nx = fault->huge_page_disallowed &&
-					  fault->req_level >= iter.level;
-
 			/*
 			 * If SPTE has been frozen by another thread, just
 			 * give up and retry, avoiding unnecessary page table
@@ -1212,10 +1208,19 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			sp = tdp_mmu_alloc_sp(vcpu);
 			tdp_mmu_init_child_sp(sp, &iter);
 
-			if (tdp_mmu_link_sp(vcpu->kvm, &iter, sp, account_nx, true)) {
+			sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
+
+			if (tdp_mmu_link_sp(kvm, &iter, sp, true)) {
 				tdp_mmu_free_sp(sp);
 				break;
 			}
+
+			if (fault->huge_page_disallowed &&
+			    fault->req_level >= iter.level) {
+				spin_lock(&kvm->arch.tdp_mmu_pages_lock);
+				track_possible_nx_huge_page(kvm, sp);
+				spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+			}
 		}
 	}
 
@@ -1503,7 +1508,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
 	 * correctness standpoint since the translation will be the same either
 	 * way.
 	 */
-	ret = tdp_mmu_link_sp(kvm, iter, sp, false, shared);
+	ret = tdp_mmu_link_sp(kvm, iter, sp, shared);
 	if (ret)
 		goto out;
 
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:43:32 2026
Date: Tue, 30 Aug 2022 23:55:35 +0000
In-Reply-To: <20220830235537.4004585-1-seanjc@google.com>
References: <20220830235537.4004585-1-seanjc@google.com>
Message-ID: <20220830235537.4004585-8-seanjc@google.com>
Subject: [PATCH v4 7/9] KVM: x86/mmu: Track the number of TDP MMU pages, but not the actual pages
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Mingwei Zhang, Yan Zhao, Ben Gardon

Track the number of TDP MMU "shadow" pages instead of tracking the pages
themselves. With the NX huge page list manipulation moved out of the
common linking flow, eliminating the list-based tracking means the happy
path of adding a shadow page doesn't need to acquire a spinlock and can
instead inc/dec an atomic.
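Replacing the spinlock-protected list with a counter means the link/unlink hot path performs only an atomic increment or decrement, while teardown can still warn about leaks. The following is a minimal C11 sketch of that idea. The toy_ names are hypothetical, a relaxed atomic_llong stands in for atomic64_t (whose void-returning inc/dec operations are unordered in the kernel), and fprintf() stands in for WARN_ON().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_vm {
	/* Number of live TDP MMU pages across all roots. */
	atomic_llong tdp_mmu_pages;
};

/*
 * Hot path: no lock, just a counter update. Relaxed ordering is enough
 * here, assuming teardown runs only after every other user has been
 * joined (which supplies the needed synchronization).
 */
static void toy_account_mmu_page(struct toy_vm *vm)
{
	atomic_fetch_add_explicit(&vm->tdp_mmu_pages, 1, memory_order_relaxed);
}

static void toy_unaccount_mmu_page(struct toy_vm *vm)
{
	atomic_fetch_sub_explicit(&vm->tdp_mmu_pages, 1, memory_order_relaxed);
}

/* Teardown: a non-zero count means a page was leaked, i.e. a KVM bug. */
static bool toy_check_no_leaked_pages(struct toy_vm *vm)
{
	long long leaked = atomic_load(&vm->tdp_mmu_pages);

	if (leaked)
		fprintf(stderr, "WARN: %lld TDP MMU page(s) leaked\n", leaked);
	return leaked == 0;
}
```

The tradeoff is that a counter can no longer be walked to find the leaked pages. It can only report that some were leaked, which is all the teardown WARN needs.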
Keep the tracking as the WARN during TDP MMU teardown on leaked shadow
pages is very, very useful for detecting KVM bugs. Tracking the number of
pages will also make it trivial to expose the counter to userspace as a
stat in the future, which may or may not be desirable.

Note, the TDP MMU needs to use a separate counter (and stat if that ever
comes to be) from the existing n_used_mmu_pages. The TDP MMU doesn't
bother supporting the shrinker nor does it honor KVM_SET_NR_MMU_PAGES
(because the TDP MMU consumes so few pages relative to shadow paging),
and including TDP MMU pages in that counter would break both the shrinker
and shadow MMUs, e.g. if a VM is using nested TDP.

Cc: Yan Zhao
Reviewed-by: Mingwei Zhang
Reviewed-by: David Matlack
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h | 11 +++--------
 arch/x86/kvm/mmu/tdp_mmu.c      | 20 +++++++++-----------
 2 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 48e51600f1be..6c2113e6d19c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1282,6 +1282,9 @@ struct kvm_arch {
 	 */
 	bool tdp_mmu_enabled;
 
+	/* The number of TDP MMU pages across all roots. */
+	atomic64_t tdp_mmu_pages;
+
 	/*
 	 * List of struct kvm_mmu_pages being used as roots.
 	 * All struct kvm_mmu_pages in the list should have
@@ -1302,18 +1305,10 @@ struct kvm_arch {
 	 */
 	struct list_head tdp_mmu_roots;
 
-	/*
-	 * List of struct kvmp_mmu_pages not being used as roots.
-	 * All struct kvm_mmu_pages in the list should have
-	 * tdp_mmu_page set and a tdp_mmu_root_count of 0.
- */ - struct list_head tdp_mmu_pages; - /* * Protects accesses to the following fields when the MMU lock * is held in read mode: * - tdp_mmu_roots (above) - * - tdp_mmu_pages (above) * - the link field of struct kvm_mmu_pages used by the TDP MMU * - possible_nx_huge_pages; * - the possible_nx_huge_page_link field of struct kvm_mmu_pages used diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index fd38465aee9e..92ad533f4f25 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -29,7 +29,6 @@ int kvm_mmu_init_tdp_mmu(struct kvm *kvm) kvm->arch.tdp_mmu_enabled =3D true; INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots); spin_lock_init(&kvm->arch.tdp_mmu_pages_lock); - INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages); kvm->arch.tdp_mmu_zap_wq =3D wq; return 1; } @@ -54,7 +53,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) /* Also waits for any queued work items. */ destroy_workqueue(kvm->arch.tdp_mmu_zap_wq); =20 - WARN_ON(!list_empty(&kvm->arch.tdp_mmu_pages)); + WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages)); WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots)); =20 /* @@ -377,11 +376,13 @@ static void handle_changed_spte_dirty_log(struct kvm = *kvm, int as_id, gfn_t gfn, static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp) { kvm_account_pgtable_pages((void *)sp->spt, +1); + atomic64_inc(&kvm->arch.tdp_mmu_pages); } =20 static void tdp_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *s= p) { kvm_account_pgtable_pages((void *)sp->spt, -1); + atomic64_dec(&kvm->arch.tdp_mmu_pages); } =20 /** @@ -397,17 +398,17 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct= kvm_mmu_page *sp, bool shared) { tdp_unaccount_mmu_page(kvm, sp); + + if (!sp->nx_huge_page_disallowed) + return; + if (shared) spin_lock(&kvm->arch.tdp_mmu_pages_lock); else lockdep_assert_held_write(&kvm->mmu_lock); =20 - list_del(&sp->link); - - if (sp->nx_huge_page_disallowed) { - sp->nx_huge_page_disallowed =3D false; - 
untrack_possible_nx_huge_page(kvm, sp); - } + sp->nx_huge_page_disallowed =3D false; + untrack_possible_nx_huge_page(kvm, sp); =20 if (shared) spin_unlock(&kvm->arch.tdp_mmu_pages_lock); @@ -1145,9 +1146,6 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct td= p_iter *iter, tdp_mmu_set_spte(kvm, iter, spte); } =20 - spin_lock(&kvm->arch.tdp_mmu_pages_lock); - list_add(&sp->link, &kvm->arch.tdp_mmu_pages); - spin_unlock(&kvm->arch.tdp_mmu_pages_lock); tdp_account_mmu_page(kvm, sp); =20 return 0; --=20 2.37.2.672.g94769d06f0-goog From nobody Tue Apr 7 05:43:32 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 714D5ECAAA1 for ; Tue, 30 Aug 2022 23:57:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231167AbiH3X5C (ORCPT ); Tue, 30 Aug 2022 19:57:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53174 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231717AbiH3X4j (ORCPT ); Tue, 30 Aug 2022 19:56:39 -0400 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A7AF271BF8 for ; Tue, 30 Aug 2022 16:55:54 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id q7-20020a63e947000000b004297f1e1f86so6197069pgj.12 for ; Tue, 30 Aug 2022 16:55:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc; bh=Djp3CXhw97imusM87qiMQnhEDBYlX/mb/nkPMjzyLGc=; b=RcXGoYFdqVscbUJ83ofAPJuLPdh5Y43S2GvyFTmMbuLk8YKjr2uRLfFJ2wKT+wkPFN /3YVCWJMNvTZG0O3MDKPGMXfDmWGDfL+USqDAMoh/Ppzu8Pu5SzZrX3kNOvJ1MapzB50 eqnAyUGDpVcf+046aPpiiOEB6kLPzaOVRXBeCSmFY5cYmod+Klcuye1PDX89IjPB7ToF 
XLxSesmFTk8WN6mlxXM1NTk0Atuu+3FCR72Owz3WpQ8SpOlewzb85nMOILh3BdEVgf9Z j5z1C5o6h0IBW0Q2MVenfN2MSgSrOq0JFszq/3JpLyBjrFpsi/muF3iNopOwN9gXEPQe hnOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc; bh=Djp3CXhw97imusM87qiMQnhEDBYlX/mb/nkPMjzyLGc=; b=67JTfEtzzOGAOr42JskWYs1PNKd2zT/TCh7Du6T+5Tfe0ocx8745ujMkmDL8/D/Akc IeGIvrLBUcvSCIorJfNiDPT1KnszlKvPyOoJPZJBjyj6c05sS0oSzvSIUR924VkoV6PE 7eavhaomjmJ7JuI/pAd83Wu7u0lY82CTWcpm3GAN6F1i24M3A+UFmir5o7Gzq9f+/y9t o9w+aDy0rgmFT/p2fWRcefY7qOX/Wms/N0p8OtBRRZ3bin1yRR+fmJBa/29TgvlS7YZv QTi1h6BrA8204bXG9qpcigSdj3UU3+eDuNIp182Xoqdx+0vPrTDvHHTxT+AbhlIZ0LWp fcKA== X-Gm-Message-State: ACgBeo1ds6qaXLstw3xUG88DvQCHg4X+htP7lV3eULzigQ5z4DNYKh7o QllN22cQSr16qpzUuWYmS2ZuMtiAuhY= X-Google-Smtp-Source: AA6agR5qvKfNO3A+YaO7bFe5xrcBFzFnWapEEKvmJAxGpy4+c5IU9DOBxOYlUNevzADEpUnEeymJXH35EoQ= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:902:e883:b0:175:22e8:f30a with SMTP id w3-20020a170902e88300b0017522e8f30amr4430989plg.127.1661903753536; Tue, 30 Aug 2022 16:55:53 -0700 (PDT) Reply-To: Sean Christopherson Date: Tue, 30 Aug 2022 23:55:36 +0000 In-Reply-To: <20220830235537.4004585-1-seanjc@google.com> Mime-Version: 1.0 References: <20220830235537.4004585-1-seanjc@google.com> X-Mailer: git-send-email 2.37.2.672.g94769d06f0-goog Message-ID: <20220830235537.4004585-9-seanjc@google.com> Subject: [PATCH v4 8/9] KVM: x86/mmu: Add helper to convert SPTE value to its shadow page From: Sean Christopherson To: Sean Christopherson , Paolo Bonzini Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack , Mingwei Zhang , Yan Zhao , Ben Gardon Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add a helper to convert a SPTE to its 
shadow page to deduplicate a variety of flows and hopefully avoid future
bugs, e.g. if KVM attempts to get the shadow page for a SPTE without
dropping high bits.

Opportunistically add a comment in mmu_free_root_page() documenting why
it treats the root HPA as a SPTE.

No functional change intended.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c          | 17 ++++++++++-------
 arch/x86/kvm/mmu/mmu_internal.h | 12 ------------
 arch/x86/kvm/mmu/spte.h         | 17 +++++++++++++++++
 arch/x86/kvm/mmu/tdp_mmu.h      |  2 ++
 4 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index de06c1f87635..4737da767a40 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1808,7 +1808,7 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 			continue;
 		}
 
-		child = to_shadow_page(ent & SPTE_BASE_ADDR_MASK);
+		child = spte_to_child_sp(ent);
 
 		if (child->unsync_children) {
 			if (mmu_pages_add(pvec, child, i))
@@ -2367,7 +2367,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 		 * so we should update the spte at this point to get
 		 * a new sp with the correct access.
 		 */
-		child = to_shadow_page(*sptep & SPTE_BASE_ADDR_MASK);
+		child = spte_to_child_sp(*sptep);
 		if (child->role.access == direct_access)
 			return;
 
@@ -2388,7 +2388,7 @@ static int mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 		if (is_last_spte(pte, sp->role.level)) {
 			drop_spte(kvm, spte);
 		} else {
-			child = to_shadow_page(pte & SPTE_BASE_ADDR_MASK);
+			child = spte_to_child_sp(pte);
 			drop_parent_pte(child, spte);
 
 			/*
@@ -2827,7 +2827,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
 			struct kvm_mmu_page *child;
 			u64 pte = *sptep;
 
-			child = to_shadow_page(pte & SPTE_BASE_ADDR_MASK);
+			child = spte_to_child_sp(pte);
 			drop_parent_pte(child, sptep);
 			flush = true;
 		} else if (pfn != spte_to_pfn(*sptep)) {
@@ -3439,7 +3439,11 @@ static void mmu_free_root_page(struct kvm *kvm, hpa_t *root_hpa,
 	if (!VALID_PAGE(*root_hpa))
 		return;
 
-	sp = to_shadow_page(*root_hpa & SPTE_BASE_ADDR_MASK);
+	/*
+	 * The "root" may be a special root, e.g. a PAE entry, treat it as a
+	 * SPTE to ensure any non-PA bits are dropped.
+	 */
+	sp = spte_to_child_sp(*root_hpa);
 	if (WARN_ON(!sp))
 		return;
 
@@ -3924,8 +3928,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 		hpa_t root = vcpu->arch.mmu->pae_root[i];
 
 		if (IS_VALID_PAE_ROOT(root)) {
-			root &= SPTE_BASE_ADDR_MASK;
-			sp = to_shadow_page(root);
+			sp = spte_to_child_sp(root);
 			mmu_sync_children(vcpu, sp, true);
 		}
 	}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 22152241bd29..dbaf6755c5a7 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -133,18 +133,6 @@ struct kvm_mmu_page {
 
 extern struct kmem_cache *mmu_page_header_cache;
 
-static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
-{
-	struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
-
-	return (struct kvm_mmu_page *)page_private(page);
-}
-
-static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
-{
-	return to_shadow_page(__pa(sptep));
-}
-
 static inline int kvm_mmu_role_as_id(union kvm_mmu_page_role role)
 {
 	return role.smm ? 1 : 0;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 7670c13ce251..7e5343339b90 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -219,6 +219,23 @@ static inline int spte_index(u64 *sptep)
  */
 extern u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask;
 
+static inline struct kvm_mmu_page *to_shadow_page(hpa_t shadow_page)
+{
+	struct page *page = pfn_to_page((shadow_page) >> PAGE_SHIFT);
+
+	return (struct kvm_mmu_page *)page_private(page);
+}
+
+static inline struct kvm_mmu_page *spte_to_child_sp(u64 spte)
+{
+	return to_shadow_page(spte & SPTE_BASE_ADDR_MASK);
+}
+
+static inline struct kvm_mmu_page *sptep_to_sp(u64 *sptep)
+{
+	return to_shadow_page(__pa(sptep));
+}
+
 static inline bool is_mmio_spte(u64 spte)
 {
 	return (spte & shadow_mmio_mask) == shadow_mmio_value &&
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index c163f7cc23ca..d3714200b932 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -5,6 +5,8 @@
 
 #include 
 
+#include "spte.h"
+
 hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu);
 
 __must_check static inline bool kvm_tdp_mmu_get_root(struct kvm_mmu_page *root)
-- 
2.37.2.672.g94769d06f0-goog

From nobody Tue Apr 7 05:43:32 2026
Reply-To: Sean Christopherson
Date: Tue, 30 Aug 2022 23:55:37 +0000
In-Reply-To: <20220830235537.4004585-1-seanjc@google.com>
References: <20220830235537.4004585-1-seanjc@google.com>
Message-ID: <20220830235537.4004585-10-seanjc@google.com>
Subject: [PATCH v4 9/9] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Matlack, Mingwei Zhang, Yan Zhao, Ben Gardon

From: Mingwei Zhang

Explicitly check if a NX huge page is disallowed when determining if a
page fault needs to be forced to use a smaller sized page. KVM currently
assumes that the NX huge page mitigation is the only scenario where KVM
will force a shadow page instead of a huge page, and so unnecessarily
keeps an existing shadow page instead of replacing it with a huge page.

Any scenario that causes KVM to zap leaf SPTEs may result in having a SP
that can be made huge without violating the NX huge page mitigation.
E.g. prior to commit 5ba7c4c6d1c7 ("KVM: x86/MMU: Zap non-leaf SPTEs when
disabling dirty logging"), KVM would keep shadow pages after disabling
dirty logging due to a live migration being canceled, resulting in
degraded performance due to running with 4kb pages instead of huge pages.

Although the dirty logging case is "fixed", that fix is coincidental,
i.e. is an implementation detail, and there are other scenarios where KVM
will zap leaf SPTEs. E.g. zapping leaf SPTEs in response to a host page
migration (mmu_notifier invalidation) to create a huge page would yield a
similar result; KVM would see the shadow-present non-leaf SPTE and assume
a huge page is disallowed.
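
The behavioral change can be illustrated with a minimal userspace sketch
that models only the decision made by disallowed_hugepage_adjust(). The
mock_sp and mock_spte types and the must_split() helper are hypothetical
stand-ins, not KVM APIs; MOCK_PG_LEVEL_4K is assumed to be level 1, and the
real function's rewriting of fault->pfn and fault->goal_level is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kvm_mmu_page: only the NX flag. */
struct mock_sp {
	bool nx_huge_page_disallowed;
};

/* Hypothetical stand-in for a SPTE and the shadow page it points at. */
struct mock_spte {
	bool present;          /* models is_shadow_present_pte() */
	bool large;            /* models is_large_pte() */
	struct mock_sp *child; /* models spte_to_child_sp(); non-leaf only */
};

#define MOCK_PG_LEVEL_4K 1 /* assumption: 4KiB pages are level 1 */

/*
 * Returns true if the fault would be forced down one level, i.e. the
 * existing non-leaf SPTE is kept instead of being replaced by a huge page.
 * check_nx == false models the old behavior, true the patched behavior.
 */
static bool must_split(const struct mock_spte *spte, int cur_level,
		       int goal_level, bool check_nx)
{
	if (cur_level <= MOCK_PG_LEVEL_4K || cur_level != goal_level ||
	    !spte->present || spte->large)
		return false;

	/* A present, non-leaf SPTE always points at a child shadow page. */
	assert(spte->child);

	/* The new check: only NX-disallowed shadow pages block a huge page. */
	if (check_nx && !spte->child->nx_huge_page_disallowed)
		return false;

	return true;
}
```

For a leftover non-leaf SPTE whose child shadow page was not created by the
NX mitigation, must_split() returns true with check_nx == false and false
with check_nx == true, which is the scenario described above.
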

Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Reviewed-by: Ben Gardon
Reviewed-by: David Matlack
Signed-off-by: Mingwei Zhang
[sean: use spte_to_child_sp(), massage changelog]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4737da767a40..d1fc087f86bf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3102,6 +3102,11 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	    cur_level == fault->goal_level &&
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte)) {
+		u64 page_mask;
+
+		if (!spte_to_child_sp(spte)->nx_huge_page_disallowed)
+			return;
+
 		/*
 		 * A small SPTE exists for this pfn, but FNAME(fetch)
 		 * and __direct_map would like to create a large PTE
@@ -3109,8 +3114,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 		 * patching back for them into pfn the next 9 bits of
 		 * the address.
 		 */
-		u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
-				KVM_PAGES_PER_HPAGE(cur_level - 1);
+		page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
+			    KVM_PAGES_PER_HPAGE(cur_level - 1);
 		fault->pfn |= fault->gfn & page_mask;
 		fault->goal_level--;
 	}
-- 
2.37.2.672.g94769d06f0-goog
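
The page_mask arithmetic in the second hunk ("patching back ... the next 9
bits of the address") can be checked with a small userspace sketch. It
assumes x86 KVM's convention that each paging level adds 9 GFN bits, i.e.
KVM_PAGES_PER_HPAGE(level) == 1 << ((level - 1) * 9); the MOCK_* macros,
struct mock_fault and drop_one_level() are hypothetical names for this
illustration, and fault->pfn is assumed to already be aligned down to
goal_level before the adjustment runs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors x86 KVM's layout, 9 GFN bits per paging level. */
#define MOCK_HPAGE_GFN_SHIFT(level)	(((level) - 1) * 9)
#define MOCK_PAGES_PER_HPAGE(level)	(1ULL << MOCK_HPAGE_GFN_SHIFT(level))

/* Hypothetical stand-in for the struct kvm_page_fault fields used here. */
struct mock_fault {
	uint64_t gfn;
	uint64_t pfn;		/* assumed aligned down to goal_level */
	int goal_level;
};

/* Force the fault down one level, as the patched hunk does. */
static void drop_one_level(struct mock_fault *fault)
{
	int cur_level = fault->goal_level;
	uint64_t page_mask;

	assert(cur_level > 1);

	/*
	 * Selects the 9 GFN bits that index the next-smaller level, e.g.
	 * 0x200 - 0x1 = 0x1ff for level 2, 0x40000 - 0x200 = 0x3fe00 for level 3.
	 */
	page_mask = MOCK_PAGES_PER_HPAGE(cur_level) -
		    MOCK_PAGES_PER_HPAGE(cur_level - 1);
	fault->pfn |= fault->gfn & page_mask;
	fault->goal_level--;
}
```

For example, a level-2 (2MiB) fault on GFN 0x80123 with aligned PFN 0x12400
ends up mapping PFN 0x12523 at level 1, while a level-3 (1GiB) fault on GFN
0xc0345 with PFN 0x40000 becomes a level-2 fault on PFN 0x40200, which is
still 2MiB-aligned.
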