From nobody Thu Sep 18 05:38:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack
Subject: [RFC PATCH v2 01/15] KVM: TDP_MMU: Go to next level if smaller private mapping exists
Date: Thu, 8 Dec 2022 15:35:36 -0800

From: Xiaoyao Li

A private page cannot be mapped as a large page if any smaller mapping
already exists within its range.  The fault handler has to wait until all
of the not-yet-mapped smaller pages are mapped, and only then can the range
be promoted to a larger mapping.
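[Editor's illustration, not part of the patch: a minimal standalone C sketch
of the rule described above -- when a smaller mapping already exists under
the candidate huge page, the fault is mapped one level lower instead of as a
huge page (the behavior disallowed_hugepage_adjust() enforces in the diff
below).  The level constants and function names here are made up for
illustration only.]

#include <stdbool.h>
#include <stdio.h>

enum { LEVEL_4K = 1, LEVEL_2M = 2 };

static int pick_map_level(int goal_level, bool smaller_mapping_exists)
{
	/* Demote the mapping level rather than installing a huge page. */
	if (goal_level > LEVEL_4K && smaller_mapping_exists)
		return goal_level - 1;
	return goal_level;
}

int main(void)
{
	printf("goal 2M, no smaller mapping -> map at level %d\n",
	       pick_map_level(LEVEL_2M, false));
	printf("goal 2M, smaller mapping    -> map at level %d\n",
	       pick_map_level(LEVEL_2M, true));
	return 0;
}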
Signed-off-by: Xiaoyao Li
---
 arch/x86/kvm/mmu/tdp_mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 65b9af33a0eb..cb36089a40da 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1373,7 +1373,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	tdp_mmu_for_each_pte(iter, mmu, is_private, raw_gfn, raw_gfn + 1) {
 		int r;
 
-		if (fault->nx_huge_page_workaround_enabled)
+		if (fault->nx_huge_page_workaround_enabled ||
+		    kvm_gfn_shared_mask(vcpu->kvm))
 			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
 
 		if (iter.level == fault->goal_level)
-- 
2.25.1

From nobody Thu Sep 18 05:38:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack
Subject: [RFC PATCH v2 02/15] KVM: TDX: Pass page level to cache flush before TDX SEAMCALL
Date: Thu, 8 Dec 2022 15:35:37 -0800
Message-Id: <6134a82d4a78d5a7aee4d684d7d4dab34f7f1bcf.1670541736.git.isaku.yamahata@intel.com>
Content-Transfer-Encoding: quoted-printable

From: Xiaoyao Li

tdh_mem_page_aug() will support 2MB large pages in the near future.
Cache flush also needs to be 2MB instead of 4KB in such cases. Introduce a helper function to flush cache with page size info in preparation for large pages. Signed-off-by: Xiaoyao Li Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx_ops.h | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h index 86330d0e4b22..4db983b1dc94 100644 --- a/arch/x86/kvm/vmx/tdx_ops.h +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -6,6 +6,7 @@ =20 #include =20 +#include #include #include #include @@ -18,6 +19,11 @@ =20 void pr_tdx_error(u64 op, u64 error_code, const struct tdx_module_output *= out); =20 +static inline void tdx_clflush_page(hpa_t addr, enum pg_level level) +{ + clflush_cache_range(__va(addr), KVM_HPAGE_SIZE(level)); +} + /* * TDX module acquires its internal lock for resources. It doesn't spin t= o get * locks because of its restrictions of allowed execution time. Instead, = it @@ -50,21 +56,21 @@ static inline u64 seamcall_sept(u64 op, u64 rcx, u64 rd= x, u64 r8, u64 r9, =20 static inline u64 tdh_mng_addcx(hpa_t tdr, hpa_t addr) { - clflush_cache_range(__va(addr), PAGE_SIZE); + tdx_clflush_page(addr, PG_LEVEL_4K); return __seamcall(TDH_MNG_ADDCX, addr, tdr, 0, 0, NULL); } =20 static inline u64 tdh_mem_page_add(hpa_t tdr, gpa_t gpa, hpa_t hpa, hpa_t = source, struct tdx_module_output *out) { - clflush_cache_range(__va(hpa), PAGE_SIZE); + tdx_clflush_page(hpa, PG_LEVEL_4K); return seamcall_sept(TDH_MEM_PAGE_ADD, gpa, tdr, hpa, source, out); } =20 static inline u64 tdh_mem_sept_add(hpa_t tdr, gpa_t gpa, int level, hpa_t = page, struct tdx_module_output *out) { - clflush_cache_range(__va(page), PAGE_SIZE); + tdx_clflush_page(page, PG_LEVEL_4K); return seamcall_sept(TDH_MEM_SEPT_ADD, gpa | level, tdr, page, 0, out); } =20 @@ -76,21 +82,21 @@ static inline u64 tdh_mem_sept_remove(hpa_t tdr, gpa_t = gpa, int level, =20 static inline u64 tdh_vp_addcx(hpa_t tdvpr, hpa_t addr) { - clflush_cache_range(__va(addr), PAGE_SIZE); + tdx_clflush_page(addr, PG_LEVEL_4K); return __seamcall(TDH_VP_ADDCX, addr, tdvpr, 0, 0, NULL); } =20 static inline u64 tdh_mem_page_relocate(hpa_t tdr, gpa_t gpa, hpa_t hpa, struct tdx_module_output *out) { - clflush_cache_range(__va(hpa), PAGE_SIZE); + tdx_clflush_page(hpa, PG_LEVEL_4K); return __seamcall(TDH_MEM_PAGE_RELOCATE, gpa, tdr, hpa, 0, out); } =20 static inline u64 tdh_mem_page_aug(hpa_t tdr, gpa_t gpa, hpa_t hpa, struct tdx_module_output *out) { - clflush_cache_range(__va(hpa), PAGE_SIZE); + tdx_clflush_page(hpa, PG_LEVEL_4K); return seamcall_sept(TDH_MEM_PAGE_AUG, gpa, tdr, hpa, 0, out); } =20 @@ -107,13 +113,13 @@ static inline u64 tdh_mng_key_config(hpa_t tdr) =20 static inline u64 tdh_mng_create(hpa_t tdr, int hkid) { - clflush_cache_range(__va(tdr), PAGE_SIZE); + tdx_clflush_page(tdr, PG_LEVEL_4K); return __seamcall(TDH_MNG_CREATE, tdr, hkid, 0, 0, NULL); } =20 static inline u64 tdh_vp_create(hpa_t tdr, hpa_t tdvpr) { - clflush_cache_range(__va(tdvpr), PAGE_SIZE); + tdx_clflush_page(tdvpr, PG_LEVEL_4K); return __seamcall(TDH_VP_CREATE, tdvpr, tdr, 0, 0, NULL); } =20 --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3A33C4167B for ; Thu, 8 Dec 2022 23:36:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id 
S230199AbiLHXgH (ORCPT ); Thu, 8 Dec 2022 18:36:07 -0500
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack
Subject: [RFC PATCH v2 03/15] KVM: TDX: Pass KVM page level to tdh_mem_page_add() and tdh_mem_page_aug()
Date: Thu, 8 Dec 2022 15:35:38 -0800
Message-Id: <3ec36b03fd43de9b0756dfad7bc789483381643b.1670541736.git.isaku.yamahata@intel.com>
Content-Transfer-Encoding: quoted-printable

From: Xiaoyao Li

Level info is needed in tdx_clflush_page() to generate the correct page
size.  Besides, explicitly pass the level info to the SEAMCALL instead of
assuming it is zero (4KB).  This works naturally when 2MB support lands.
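[Editor's illustration, not part of the patch: a standalone C sketch of the
level encoding these wrappers use -- the TDX SEPT level is KVM's pg_level
minus one, and seamcall_sept() receives it in the low bits of the
page-aligned GPA operand ("gpa | level"), as the diff below shows.  The
constants and names here are simplified for illustration.]

#include <stdint.h>
#include <stdio.h>

enum pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

static int pg_level_to_tdx_sept_level(enum pg_level level)
{
	return level - 1;	/* 4K -> 0, 2M -> 1, 1G -> 2 */
}

static uint64_t seamcall_gpa_operand(uint64_t gpa, enum pg_level level)
{
	/* The level rides in the otherwise-zero low bits of the aligned GPA. */
	return gpa | pg_level_to_tdx_sept_level(level);
}

int main(void)
{
	printf("2MB page at GPA 0x200000 -> operand 0x%llx\n",
	       (unsigned long long)seamcall_gpa_operand(0x200000, PG_LEVEL_2M));
	return 0;
}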
Signed-off-by: Xiaoyao Li Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 7 ++++--- arch/x86/kvm/vmx/tdx_ops.h | 19 ++++++++++++------- 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 0645d28b93a7..4f8479c20553 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1298,6 +1298,7 @@ static void tdx_unpin(struct kvm *kvm, kvm_pfn_t pfn) static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level, kvm_pfn_t pfn) { + int tdx_level =3D pg_level_to_tdx_sept_level(level); struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); hpa_t hpa =3D pfn_to_hpa(pfn); gpa_t gpa =3D gfn_to_gpa(gfn); @@ -1319,7 +1320,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm,= gfn_t gfn, if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) return -EINVAL; =20 - err =3D tdh_mem_page_aug(kvm_tdx->tdr.pa, gpa, hpa, &out); + err =3D tdh_mem_page_aug(kvm_tdx->tdr.pa, gpa, tdx_level, hpa, &out); if (err =3D=3D TDX_ERROR_SEPT_BUSY) { tdx_unpin(kvm, pfn); return -EAGAIN; @@ -1358,8 +1359,8 @@ static int tdx_sept_set_private_spte(struct kvm *kvm,= gfn_t gfn, kvm_tdx->source_pa =3D INVALID_PAGE; =20 do { - err =3D tdh_mem_page_add(kvm_tdx->tdr.pa, gpa, hpa, source_pa, - &out); + err =3D tdh_mem_page_add(kvm_tdx->tdr.pa, gpa, tdx_level, hpa, + source_pa, &out); /* * This path is executed during populating initial guest memory * image. i.e. before running any vcpu. Race is rare. diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h index 4db983b1dc94..4b03acce5003 100644 --- a/arch/x86/kvm/vmx/tdx_ops.h +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -19,6 +19,11 @@ =20 void pr_tdx_error(u64 op, u64 error_code, const struct tdx_module_output *= out); =20 +static inline enum pg_level tdx_sept_level_to_pg_level(int tdx_level) +{ + return tdx_level + 1; +} + static inline void tdx_clflush_page(hpa_t addr, enum pg_level level) { clflush_cache_range(__va(addr), KVM_HPAGE_SIZE(level)); @@ -60,11 +65,11 @@ static inline u64 tdh_mng_addcx(hpa_t tdr, hpa_t addr) return __seamcall(TDH_MNG_ADDCX, addr, tdr, 0, 0, NULL); } =20 -static inline u64 tdh_mem_page_add(hpa_t tdr, gpa_t gpa, hpa_t hpa, hpa_t = source, - struct tdx_module_output *out) +static inline u64 tdh_mem_page_add(hpa_t tdr, gpa_t gpa, int level, hpa_t = hpa, + hpa_t source, struct tdx_module_output *out) { - tdx_clflush_page(hpa, PG_LEVEL_4K); - return seamcall_sept(TDH_MEM_PAGE_ADD, gpa, tdr, hpa, source, out); + tdx_clflush_page(hpa, tdx_sept_level_to_pg_level(level)); + return seamcall_sept(TDH_MEM_PAGE_ADD, gpa | level, tdr, hpa, source, out= ); } =20 static inline u64 tdh_mem_sept_add(hpa_t tdr, gpa_t gpa, int level, hpa_t = page, @@ -93,11 +98,11 @@ static inline u64 tdh_mem_page_relocate(hpa_t tdr, gpa_= t gpa, hpa_t hpa, return __seamcall(TDH_MEM_PAGE_RELOCATE, gpa, tdr, hpa, 0, out); } =20 -static inline u64 tdh_mem_page_aug(hpa_t tdr, gpa_t gpa, hpa_t hpa, +static inline u64 tdh_mem_page_aug(hpa_t tdr, gpa_t gpa, int level, hpa_t = hpa, struct tdx_module_output *out) { - tdx_clflush_page(hpa, PG_LEVEL_4K); - return seamcall_sept(TDH_MEM_PAGE_AUG, gpa, tdr, hpa, 0, out); + tdx_clflush_page(hpa, tdx_sept_level_to_pg_level(level)); + return seamcall_sept(TDH_MEM_PAGE_AUG, gpa | level, tdr, hpa, 0, out); } =20 static inline u64 tdh_mem_range_block(hpa_t tdr, gpa_t gpa, int level, --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CDB7AC4167B for ; Thu, 8 Dec 2022 23:36:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230252AbiLHXgO (ORCPT ); Thu, 8 Dec 2022 18:36:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230076AbiLHXfz (ORCPT ); Thu, 8 Dec 2022 18:35:55 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E9096BCA7; Thu, 8 Dec 2022 15:35:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670542554; x=1702078554; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=+sQP4o0OJhYPC5qL5BglaZeRAlQ+9kwaddW+ppgbLVk=; b=RLwThGfJGe21vbvqdstR1tedwwtkM7smjuq/XxWH5B0sXvDZPgy454kl eTyvXQxTKbMGwlcGQjIMF3o96vUZodte6mEzkFdDWZoYPglms9S7WbW7f ubLyPpGrkB4chNXeVkKlgC/LZjGtfLTyN9oKA1tXR9lyyZmBo+3T5B5vL jj8Zc0nF+iQ6IHEw5C/tUKuKWoZHUCtGujWFQ77CRR4NuNLPPiAYMaEjr tBsmiWo+0Y3nfcviqIwDOiNIqq/WbwOHXtMPO0HX+WIH1hABjtC4cQMU6 D1lwJQ+HzOMqq+lh1rkN5TLbN/gAgmAj4m51Cp8L4MYacUi0FHTJHe9QG Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="403586493" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="403586493" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:53 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="677950904" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="677950904" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:53 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack Subject: [RFC PATCH v2 04/15] KVM: TDX: Pass size to tdx_measure_page() Date: Thu, 8 Dec 2022 15:35:39 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Xiaoyao Li Extend tdx_measure_page() to pass size info so that it can measure large page as well. 
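[Editor's illustration, not part of the patch: a standalone C sketch of the
chunked measurement loop tdx_measure_page() performs -- the region is
extended into the TD measurement one fixed-size chunk at a time, so the
size passed in must be a multiple of the chunk size.  The 256-byte chunk
size below is an assumed stand-in for TDX_EXTENDMR_CHUNKSIZE, not a value
taken from this series.]

#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE 256	/* illustrative stand-in for TDX_EXTENDMR_CHUNKSIZE */

static void measure_region(unsigned long gpa, int size)
{
	assert(size % CHUNK_SIZE == 0);

	/* One TDH.MR.EXTEND-style operation per chunk of the region. */
	for (int off = 0; off < size; off += CHUNK_SIZE)
		printf("extend measurement at gpa 0x%lx\n", gpa + off);
}

int main(void)
{
	measure_region(0x200000, 4096);	/* one 4KB page: 16 chunks */
	return 0;
}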
Signed-off-by: Xiaoyao Li Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 4f8479c20553..5b392d9baa25 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1273,13 +1273,15 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t = root_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); } =20 -static void tdx_measure_page(struct kvm_tdx *kvm_tdx, hpa_t gpa) +static void tdx_measure_page(struct kvm_tdx *kvm_tdx, hpa_t gpa, int size) { struct tdx_module_output out; u64 err; int i; =20 - for (i =3D 0; i < PAGE_SIZE; i +=3D TDX_EXTENDMR_CHUNKSIZE) { + WARN_ON_ONCE(size % TDX_EXTENDMR_CHUNKSIZE); + + for (i =3D 0; i < size; i +=3D TDX_EXTENDMR_CHUNKSIZE) { err =3D tdh_mr_extend(kvm_tdx->tdr.pa, gpa + i, &out); if (KVM_BUG_ON(err, &kvm_tdx->kvm)) { pr_tdx_error(TDH_MR_EXTEND, err, &out); @@ -1371,7 +1373,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm,= gfn_t gfn, tdx_unpin(kvm, pfn); return -EIO; } else if (measure) - tdx_measure_page(kvm_tdx, gpa); + tdx_measure_page(kvm_tdx, gpa, KVM_HPAGE_SIZE(level)); =20 return 0; } --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59BF0C001B2 for ; Thu, 8 Dec 2022 23:36:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230227AbiLHXgL (ORCPT ); Thu, 8 Dec 2022 18:36:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45548 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230078AbiLHXfz (ORCPT ); Thu, 8 Dec 2022 18:35:55 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 12FC36ACF1; Thu, 8 Dec 2022 15:35:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670542555; x=1702078555; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=ZkzU93yGsqlH6pguX+JaKA3Qg8R4oA+6wDbuZ9tuIN8=; b=bCanLORXpMn0H27fXDrMW3lqXHm5WFmKg021NZGK3sbH5IkJGBH6lz4C 9Sxmq7A0EILJfK60+fyASQ8zR/HRCzlsVvCbNvdggDlJLgu7MIyzVq9Iz hWdNcBzlPwJ+QgD3Qj9nTL/pjFYXjU60b5pMkP7RCvgQ8DhYz91Yhasul zHHKUMpWCovNwUSIO2rx3UzP+szv+GCbwEGDI+01iJ+iNQeXo82S73b04 TGoMYycwDa93H8S4wuS+L6mTlyGSLY5p7NuWOJNoz8Ur6xzT4xfs+Acm7 CPdeAOwcqjgLR+TtyJ5ewTCm32S6+7xdGjFu552N9jgwjE4GiqX/geUyQ w==; X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="403586503" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="403586503" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:54 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="677950918" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="677950918" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:54 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack Subject: [RFC PATCH v2 05/15] KVM: TDX: Pass size to 
reclaim_page() Date: Thu, 8 Dec 2022 15:35:40 -0800 Message-Id: <1844264da0ccee5570a6eace87d386426a9ac4c7.1670541736.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Xiaoyao Li A 2MB large page can be tdh_mem_page_aug()'ed to TD directly. In this case, it needs to reclaim and clear the page as 2MB size. Signed-off-by: Xiaoyao Li Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 5b392d9baa25..df213b488f89 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -184,14 +184,17 @@ void tdx_hardware_disable(void) tdx_disassociate_vp(&tdx->vcpu); } =20 -static void tdx_clear_page(unsigned long page_pa) +static void tdx_clear_page(unsigned long page_pa, int size) { const void *zero_page =3D (const void *) __va(page_to_phys(ZERO_PAGE(0))); void *page =3D __va(page_pa); unsigned long i; =20 + WARN_ON_ONCE(size % PAGE_SIZE); + if (!static_cpu_has(X86_FEATURE_MOVDIR64B)) { - clear_page(page); + for (i =3D 0; i < size; i +=3D PAGE_SIZE) + clear_page(page + i); return; } =20 @@ -204,7 +207,7 @@ static void tdx_clear_page(unsigned long page_pa) * The cache line could be poisoned (even without MKTME-i), clear the * poison bit. */ - for (i =3D 0; i < PAGE_SIZE; i +=3D 64) + for (i =3D 0; i < size; i +=3D 64) movdir64b(page + i, zero_page); /* * MOVDIR64B store uses WC buffer. Prevent following memory reads @@ -213,7 +216,8 @@ static void tdx_clear_page(unsigned long page_pa) __mb(); } =20 -static int tdx_reclaim_page(hpa_t pa, bool do_wb, u16 hkid) +static int tdx_reclaim_page(hpa_t pa, enum pg_level level, + bool do_wb, u16 hkid) { struct tdx_module_output out; u64 err; @@ -231,8 +235,10 @@ static int tdx_reclaim_page(hpa_t pa, bool do_wb, u16 = hkid) pr_tdx_error(TDH_PHYMEM_PAGE_RECLAIM, err, &out); return -EIO; } + /* out.r8 =3D=3D tdx sept page level */ + WARN_ON_ONCE(out.r8 !=3D pg_level_to_tdx_sept_level(level)); =20 - if (do_wb) { + if (do_wb && level =3D=3D PG_LEVEL_4K) { /* * Only TDR page gets into this path. No contention is expected * because the last page of TD. @@ -244,7 +250,7 @@ static int tdx_reclaim_page(hpa_t pa, bool do_wb, u16 h= kid) } } =20 - tdx_clear_page(pa); + tdx_clear_page(pa, KVM_HPAGE_SIZE(level)); return 0; } =20 @@ -275,7 +281,7 @@ static void tdx_reclaim_td_page(struct tdx_td_page *pag= e) * was already flushed by TDH.PHYMEM.CACHE.WB before here, So * cache doesn't need to be flushed again. */ - if (tdx_reclaim_page(page->pa, false, 0)) + if (tdx_reclaim_page(page->pa, PG_LEVEL_4K, false, 0)) return; =20 page->added =3D false; @@ -428,7 +434,8 @@ void tdx_vm_free(struct kvm *kvm) * TDX global HKID is needed. */ if (kvm_tdx->tdr.added && - tdx_reclaim_page(kvm_tdx->tdr.pa, true, tdx_global_keyid)) + tdx_reclaim_page(kvm_tdx->tdr.pa, PG_LEVEL_4K, true, + tdx_global_keyid)) return; =20 free_page((unsigned long)__va(kvm_tdx->tdr.pa)); @@ -1398,7 +1405,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm= , gfn_t gfn, * The HKID assigned to this TD was already freed and cache * was already flushed. We don't have to flush again. 
*/ - err =3D tdx_reclaim_page(hpa, false, 0); + err =3D tdx_reclaim_page(hpa, level, false, 0); if (KVM_BUG_ON(err, kvm)) return -EIO; tdx_unpin(kvm, pfn); @@ -1537,7 +1544,7 @@ static int tdx_sept_free_private_spt(struct kvm *kvm,= gfn_t gfn, * already flushed. We don't have to flush again. */ if (!is_hkid_assigned(kvm_tdx)) - return tdx_reclaim_page(__pa(private_spt), false, 0); + return tdx_reclaim_page(__pa(private_spt), PG_LEVEL_4K, false, 0); =20 /* * free_private_spt() is (obviously) called when a shadow page is being --=20 2.25.1
From nobody Thu Sep 18 05:38:29 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack
Subject: [RFC PATCH v2 06/15] KVM: TDX: Update tdx_sept_{set,drop}_private_spte() to support large page
Date: Thu, 8 Dec 2022 15:35:41 -0800
Content-Transfer-Encoding: quoted-printable

From: Xiaoyao Li

Allow large page level AUG and REMOVE for TDX pages.
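[Editor's illustration, not part of the patch: a standalone C sketch of the
pattern this change introduces -- operations that only work on 4KB frames
(the get_page()/put_page() pinning and the per-page WBINVD) are applied to
every 4KB frame covered by the mapping level.  The 512-entries-per-level
arithmetic mirrors KVM_PAGES_PER_HPAGE(); everything else is made up for
illustration.]

#include <stdio.h>

static unsigned long pages_per_hpage(int level)
{
	/* level 1 = 4KB, 2 = 2MB, 3 = 1GB, with 512 entries per table level */
	unsigned long n = 1;

	while (--level > 0)
		n *= 512;
	return n;
}

static void unpin_range(unsigned long pfn, int level)
{
	/* A 2MB (level 2) mapping drops the reference on all 512 4KB frames. */
	for (unsigned long i = 0; i < pages_per_hpage(level); i++)
		printf("put_page(pfn 0x%lx)\n", pfn + i);
}

int main(void)
{
	unpin_range(0x1000, 2);
	return 0;
}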
Signed-off-by: Xiaoyao Li Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 63 +++++++++++++++++++++--------------------- 1 file changed, 32 insertions(+), 31 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index df213b488f89..d5f93115f3ba 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1297,11 +1297,12 @@ static void tdx_measure_page(struct kvm_tdx *kvm_td= x, hpa_t gpa, int size) } } =20 -static void tdx_unpin(struct kvm *kvm, kvm_pfn_t pfn) +static void tdx_unpin(struct kvm *kvm, kvm_pfn_t pfn, int level) { - struct page *page =3D pfn_to_page(pfn); + int i; =20 - put_page(page); + for (i =3D 0; i < KVM_PAGES_PER_HPAGE(level); i++) + put_page(pfn_to_page(pfn + i)); } =20 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, @@ -1315,28 +1316,26 @@ static int tdx_sept_set_private_spte(struct kvm *kv= m, gfn_t gfn, hpa_t source_pa; bool measure; u64 err; + int i; =20 if (WARN_ON_ONCE(is_error_noslot_pfn(pfn) || !kvm_pfn_to_refcounted_page(pfn))) return 0; =20 /* To prevent page migration, do nothing on mmu notifier. */ - get_page(pfn_to_page(pfn)); + for (i =3D 0; i < KVM_PAGES_PER_HPAGE(level); i++) + get_page(pfn_to_page(pfn + i)); =20 /* Build-time faults are induced and handled via TDH_MEM_PAGE_ADD. */ if (likely(is_td_finalized(kvm_tdx))) { - /* TODO: handle large pages. */ - if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) - return -EINVAL; - err =3D tdh_mem_page_aug(kvm_tdx->tdr.pa, gpa, tdx_level, hpa, &out); if (err =3D=3D TDX_ERROR_SEPT_BUSY) { - tdx_unpin(kvm, pfn); + tdx_unpin(kvm, pfn, level); return -EAGAIN; } if (KVM_BUG_ON(err, kvm)) { pr_tdx_error(TDH_MEM_PAGE_AUG, err, &out); - tdx_unpin(kvm, pfn); + tdx_unpin(kvm, pfn, level); return -EIO; } return 0; @@ -1359,7 +1358,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm,= gfn_t gfn, * always uses vcpu 0's page table and protected by vcpu->mutex). */ if (KVM_BUG_ON(kvm_tdx->source_pa =3D=3D INVALID_PAGE, kvm)) { - tdx_unpin(kvm, pfn); + tdx_unpin(kvm, pfn, level); return -EINVAL; } =20 @@ -1377,7 +1376,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm,= gfn_t gfn, } while (err =3D=3D TDX_ERROR_SEPT_BUSY); if (KVM_BUG_ON(err, kvm)) { pr_tdx_error(TDH_MEM_PAGE_ADD, err, &out); - tdx_unpin(kvm, pfn); + tdx_unpin(kvm, pfn, level); return -EIO; } else if (measure) tdx_measure_page(kvm_tdx, gpa, KVM_HPAGE_SIZE(level)); @@ -1394,11 +1393,9 @@ static int tdx_sept_drop_private_spte(struct kvm *kv= m, gfn_t gfn, gpa_t gpa =3D gfn_to_gpa(gfn); hpa_t hpa =3D pfn_to_hpa(pfn); hpa_t hpa_with_hkid; + int r =3D 0; u64 err; - - /* TODO: handle large pages. */ - if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) - return -EINVAL; + int i; =20 if (!is_hkid_assigned(kvm_tdx)) { /* @@ -1408,7 +1405,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm= , gfn_t gfn, err =3D tdx_reclaim_page(hpa, level, false, 0); if (KVM_BUG_ON(err, kvm)) return -EIO; - tdx_unpin(kvm, pfn); + tdx_unpin(kvm, pfn, level); return 0; } =20 @@ -1425,21 +1422,25 @@ static int tdx_sept_drop_private_spte(struct kvm *k= vm, gfn_t gfn, return -EIO; } =20 - hpa_with_hkid =3D set_hkid_to_hpa(hpa, (u16)kvm_tdx->hkid); - do { - /* - * TDX_OPERAND_BUSY can happen on locking PAMT entry. Because - * this page was removed above, other thread shouldn't be - * repeatedly operating on this page. Just retry loop. 
- */ - err =3D tdh_phymem_page_wbinvd(hpa_with_hkid); - } while (err =3D=3D (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX)); - if (KVM_BUG_ON(err, kvm)) { - pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL); - return -EIO; + for (i =3D 0; i < KVM_PAGES_PER_HPAGE(level); i++) { + hpa_with_hkid =3D set_hkid_to_hpa(hpa, (u16)kvm_tdx->hkid); + do { + /* + * TDX_OPERAND_BUSY can happen on locking PAMT entry. + * Because this page was removed above, other thread + * shouldn't be repeatedly operating on this page. + * Simple retry should work. + */ + err =3D tdh_phymem_page_wbinvd(hpa_with_hkid); + } while (err =3D=3D (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX)); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL); + r =3D -EIO; + } else + tdx_unpin(kvm, pfn + i, PG_LEVEL_4K); + hpa +=3D PAGE_SIZE; } - tdx_unpin(kvm, pfn); - return 0; + return r; } =20 static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 71ACEC4332F for ; Thu, 8 Dec 2022 23:36:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230273AbiLHXgX (ORCPT ); Thu, 8 Dec 2022 18:36:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45596 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230095AbiLHXf5 (ORCPT ); Thu, 8 Dec 2022 18:35:57 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CC9FB6BC9E; Thu, 8 Dec 2022 15:35:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670542556; x=1702078556; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8wumzRDYjXgIUi2sy8Lc0u3WG0A4Eh8mHFht+xhL328=; b=C8lPjx1H5ypeXKRYnHzpgSfx4ReR+eVvbEoWIkbDDLrlT7w7UAWVPM9j qXguh44sbeExFp+BEK9fq0mcm5p+HYzlP/7APsbDNm8Aemb9qYHe2Y/mO 3eLooNJqHNbM4jaL5QIiLANYvvUz3IMTsC8UQ53i3SsdURnfIuvpicPfX 6D0ZYC9NNiPX8HMWMrXkT6Gye2Y3Tr3kyIcU47VUmDtvZs2sBtWNkhEV2 0Uq5D2jAz025GrnigLU17gKoRiWVA77voRdoQS/gjQaqZy3mcL95ejjXa t/cb1vOldwvBBezmm2CrrDrAJmMbzS1EK2G19DVt9AVTIRQFCRL/8ma5w Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="403586514" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="403586514" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:56 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="677950944" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="677950944" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:55 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack Subject: [RFC PATCH v2 07/15] KVM: MMU: Introduce level info in PFERR code Date: Thu, 8 Dec 2022 15:35:42 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" 
From: Xiaoyao Li For TDX, EPT violation can happen when TDG.MEM.PAGE.ACCEPT. And TDG.MEM.PAGE.ACCEPT contains the desired accept page level of TD guest. 1. KVM can map it with 4KB page while TD guest wants to accept 2MB page. TD geust will get TDX_PAGE_SIZE_MISMATCH and it should try to accept 4KB size. 2. KVM can map it with 2MB page while TD guest wants to accept 4KB page. KVM needs to honor it because a) there is no way to tell guest KVM maps it as 2MB size. And b) guest accepts it in 4KB size since guest knows some other 4KB page in the same 2MB range will be used as shared page. For case 2, it need to pass desired page level to MMU's page_fault_handler. Use bit 29:31 of kvm PF error code for this purpose. Signed-off-by: Xiaoyao Li Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm_host.h | 3 +++ arch/x86/kvm/mmu/mmu.c | 5 +++++ arch/x86/kvm/vmx/common.h | 6 +++++- arch/x86/kvm/vmx/tdx.c | 15 ++++++++++++++- arch/x86/kvm/vmx/tdx.h | 19 +++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 2 +- 6 files changed, 47 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 0a2126bd85d2..f9d64775193b 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -257,6 +257,8 @@ enum x86_intercept_stage; #define PFERR_FETCH_BIT 4 #define PFERR_PK_BIT 5 #define PFERR_SGX_BIT 15 +#define PFERR_LEVEL_START_BIT 29 +#define PFERR_LEVEL_END_BIT 31 #define PFERR_GUEST_FINAL_BIT 32 #define PFERR_GUEST_PAGE_BIT 33 #define PFERR_IMPLICIT_ACCESS_BIT 48 @@ -268,6 +270,7 @@ enum x86_intercept_stage; #define PFERR_FETCH_MASK BIT(PFERR_FETCH_BIT) #define PFERR_PK_MASK BIT(PFERR_PK_BIT) #define PFERR_SGX_MASK BIT(PFERR_SGX_BIT) +#define PFERR_LEVEL_MASK GENMASK_ULL(PFERR_LEVEL_END_BIT, PFERR_LEVEL_STAR= T_BIT) #define PFERR_GUEST_FINAL_MASK BIT_ULL(PFERR_GUEST_FINAL_BIT) #define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT) #define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 630e36bbd61f..02adc3c23627 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4554,6 +4554,11 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *v= cpu, =20 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { + u8 err_level =3D (fault->error_code & PFERR_LEVEL_MASK) >> PFERR_LEVEL_ST= ART_BIT; + + if (err_level) + fault->max_level =3D min(fault->max_level, err_level); + /* * If the guest's MTRRs may be used to compute the "real" memtype, * restrict the mapping level to ensure KVM uses a consistent memtype diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 65abda49debe..995cf22f47cf 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -78,7 +78,8 @@ static inline void vmx_handle_external_interrupt_irqoff(s= truct kvm_vcpu *vcpu, } =20 static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t = gpa, - unsigned long exit_qualification) + unsigned long exit_qualification, + int err_page_level) { u64 error_code; =20 @@ -98,6 +99,9 @@ static inline int __vmx_handle_ept_violation(struct kvm_v= cpu *vcpu, gpa_t gpa, error_code |=3D (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) !=3D = 0 ? 
PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; =20 + if (err_page_level > 0) + error_code |=3D (err_page_level << PFERR_LEVEL_START_BIT) & PFERR_LEVEL_= MASK; + return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); } =20 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index d5f93115f3ba..51b9e250437e 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1606,7 +1606,20 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, i= nt delivery_mode, =20 static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu) { + union tdx_ext_exit_qualification ext_exit_qual; unsigned long exit_qual; + int err_page_level =3D 0; + + ext_exit_qual.full =3D tdexit_ext_exit_qual(vcpu); + + if (ext_exit_qual.type >=3D NUM_EXT_EXIT_QUAL) { + pr_err("EPT violation at gpa 0x%lx, with invalid ext exit qualification = type 0x%x\n", + tdexit_gpa(vcpu), ext_exit_qual.type); + kvm_vm_bugged(vcpu->kvm); + return 0; + } else if (ext_exit_qual.type =3D=3D EXT_EXIT_QUAL_ACCEPT) { + err_page_level =3D ext_exit_qual.req_sept_level + 1; + } =20 if (kvm_is_private_gpa(vcpu->kvm, tdexit_gpa(vcpu))) { /* @@ -1633,7 +1646,7 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *= vcpu) } =20 trace_kvm_page_fault(vcpu, tdexit_gpa(vcpu), exit_qual); - return __vmx_handle_ept_violation(vcpu, tdexit_gpa(vcpu), exit_qual); + return __vmx_handle_ept_violation(vcpu, tdexit_gpa(vcpu), exit_qual, err_= page_level); } =20 static int tdx_handle_ept_misconfig(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 6eca8bf4d9e1..e09757acc6f9 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -62,6 +62,25 @@ union tdx_exit_reason { u64 full; }; =20 +union tdx_ext_exit_qualification { + struct { + u64 type : 4; + u64 reserved0 : 28; + u64 req_sept_level : 3; + u64 err_sept_level : 3; + u64 err_sept_state : 8; + u64 err_sept_is_leaf : 1; + u64 reserved1 : 17; + }; + u64 full; +}; + +enum tdx_ext_exit_qualification_type { + EXT_EXIT_QUAL_NONE, + EXT_EXIT_QUAL_ACCEPT, + NUM_EXT_EXIT_QUAL, +}; + struct vcpu_tdx { struct kvm_vcpu vcpu; =20 diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c74477b3b358..30a715c22647 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5784,7 +5784,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) if (unlikely(allow_smaller_maxphyaddr && kvm_vcpu_is_illegal_gpa(vcpu, gp= a))) return kvm_emulate_instruction(vcpu, 0); =20 - return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification); + return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification, 0); } =20 static int handle_ept_misconfig(struct kvm_vcpu *vcpu) --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AD778C4332F for ; Thu, 8 Dec 2022 23:36:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230145AbiLHXg1 (ORCPT ); Thu, 8 Dec 2022 18:36:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45620 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230101AbiLHXf6 (ORCPT ); Thu, 8 Dec 2022 18:35:58 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CBF1B6BCB3; Thu, 8 Dec 2022 15:35:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; 
d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670542557; x=1702078557; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=55CT66bcMdxZrRpBG4Gr5xUaPrsTcdj1LMl8jhgsMD8=; b=Aa0evoS9bfohRZQopQNhzqPoBHCqm2M6cLOWBNZVdOg8aCOoDRZ2dSMA Z2y/oOgL8/6rEbl7pN2AiHD1CDg102xovrC5yGWNZbieecWUSt1+Dq5QC lRIuNOQV7ky6cu5xUcglf+jNdFdrWhbI/RgcUvgNTm2VrzFMTVgccULOu MFBtNO3MuQO0+lok2FFXFn9jLWGwS+UsiJlU3Sn+hpEmzXJX7SMm1LWUN adWxGza4M3e/RIFj8CySEJ+XalrP4Y+dRO+NCwMoWSWGWmbmzpEEnZIDS Q1tqtfa/9AhS+mMvwMFLpq3AkFlsr2NeRKm6jpBxfysIuIP16emRVq/Sb w==; X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="403586521" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="403586521" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:57 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="677950961" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="677950961" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:56 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack Subject: [RFC PATCH v2 08/15] KVM: TDX: Pin pages via get_page() right before ADD/AUG'ed to TDs Date: Thu, 8 Dec 2022 15:35:43 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Xiaoyao Li When kvm_faultin_pfn(), it doesn't have the info regarding which page level will the gfn be mapped at. Hence it doesn't know to pin a 4K page or a 2M page. Move the guest private pages pinning logic right before TDH_MEM_PAGE_ADD/AUG() since at that time it knows the page level info. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/vmx/tdx.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 51b9e250437e..fad7413f7eab 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1297,7 +1297,8 @@ static void tdx_measure_page(struct kvm_tdx *kvm_tdx,= hpa_t gpa, int size) } } =20 -static void tdx_unpin(struct kvm *kvm, kvm_pfn_t pfn, int level) +static void tdx_unpin(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, + enum pg_level level) { int i; =20 @@ -1330,12 +1331,12 @@ static int tdx_sept_set_private_spte(struct kvm *kv= m, gfn_t gfn, if (likely(is_td_finalized(kvm_tdx))) { err =3D tdh_mem_page_aug(kvm_tdx->tdr.pa, gpa, tdx_level, hpa, &out); if (err =3D=3D TDX_ERROR_SEPT_BUSY) { - tdx_unpin(kvm, pfn, level); + tdx_unpin(kvm, gfn, pfn, level); return -EAGAIN; } if (KVM_BUG_ON(err, kvm)) { pr_tdx_error(TDH_MEM_PAGE_AUG, err, &out); - tdx_unpin(kvm, pfn, level); + tdx_unpin(kvm, gfn, pfn, level); return -EIO; } return 0; @@ -1358,7 +1359,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm,= gfn_t gfn, * always uses vcpu 0's page table and protected by vcpu->mutex). 
*/ if (KVM_BUG_ON(kvm_tdx->source_pa =3D=3D INVALID_PAGE, kvm)) { - tdx_unpin(kvm, pfn, level); + tdx_unpin(kvm, gfn, pfn, level); return -EINVAL; } =20 @@ -1376,7 +1377,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm,= gfn_t gfn, } while (err =3D=3D TDX_ERROR_SEPT_BUSY); if (KVM_BUG_ON(err, kvm)) { pr_tdx_error(TDH_MEM_PAGE_ADD, err, &out); - tdx_unpin(kvm, pfn, level); + tdx_unpin(kvm, gfn, pfn, level); return -EIO; } else if (measure) tdx_measure_page(kvm_tdx, gpa, KVM_HPAGE_SIZE(level)); @@ -1405,7 +1406,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm= , gfn_t gfn, err =3D tdx_reclaim_page(hpa, level, false, 0); if (KVM_BUG_ON(err, kvm)) return -EIO; - tdx_unpin(kvm, pfn, level); + tdx_unpin(kvm, gfn, pfn, level); return 0; } =20 @@ -1437,7 +1438,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm= , gfn_t gfn, pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL); r =3D -EIO; } else - tdx_unpin(kvm, pfn + i, PG_LEVEL_4K); + tdx_unpin(kvm, gfn + i, pfn + i, PG_LEVEL_4K); hpa +=3D PAGE_SIZE; } return r; --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C694C4332F for ; Thu, 8 Dec 2022 23:36:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230289AbiLHXgc (ORCPT ); Thu, 8 Dec 2022 18:36:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230109AbiLHXf7 (ORCPT ); Thu, 8 Dec 2022 18:35:59 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA8FD6BC84; Thu, 8 Dec 2022 15:35:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670542558; x=1702078558; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DfwJ+I06Y8UDWays5yaVFXu8YHnX7jZC5476+tFtbmw=; b=g7Cb8Em523Z5iWuwqoIpLBFu2HAjamfgsKtH8Fp6dv1tI65RPYmhMfa+ nZM71cHP8lQbovCtWCzqmOMoyo0q4T3Jy/Zdh663urbS9Txg738VZoUgM kTc4ILkif9Hg34JPYo+Uj4bCeUItfIYO9QJJ9yONoy8wKwnQXRbi67acd nowtpcT8v/H70LbiY/uyhzPX0LK0AKDLp5VsQ3WJi1+sHAMmK+pdWPL3q 8zGug8u3Eu9vmNNFbVlZFLuzZc8VTlLvD2IQ79ZK6xzkLUFZifYmvkvzX 2XQkN1km65GbO8q2vv3tby9gMLl7/mZLBCpIOasmiOscEOiaf5/A+D3g7 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="403586532" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="403586532" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:58 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="677950980" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="677950980" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:57 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack Subject: [RFC PATCH v2 09/15] KVM: TDX: Pass desired page level in err code for page fault handler Date: Thu, 8 Dec 2022 15:35:44 -0800 Message-Id: <29cdf1d12999e132f889e830385c4fce4f7c8e82.1670541736.git.isaku.yamahata@intel.com> X-Mailer: 
git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Xiaoyao Li For TDX, EPT violation can happen when TDG.MEM.PAGE.ACCEPT. And TDG.MEM.PAGE.ACCEPT contains the desired accept page level of TD guest. 1. KVM can map it with 4KB page while TD guest wants to accept 2MB page. TD geust will get TDX_PAGE_SIZE_MISMATCH and it should try to accept 4KB size. 2. KVM can map it with 2MB page while TD guest wants to accept 4KB page. KVM needs to honor it because a) there is no way to tell guest KVM maps it as 2MB size. And b) guest accepts it in 4KB size since guest knows some other 4KB page in the same 2MB range will be used as shared page. For case 2, it need to pass desired page level to MMU's page_fault_handler. Use bit 29:31 of kvm PF error code for this purpose. Signed-off-by: Xiaoyao Li --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/vmx/common.h | 2 +- arch/x86/kvm/vmx/tdx.c | 7 ++++++- arch/x86/kvm/vmx/tdx.h | 19 ------------------- arch/x86/kvm/vmx/tdx_arch.h | 19 +++++++++++++++++++ arch/x86/kvm/vmx/vmx.c | 2 +- 6 files changed, 29 insertions(+), 22 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index f9d64775193b..bb790466ebae 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -279,6 +279,8 @@ enum x86_intercept_stage; PFERR_WRITE_MASK | \ PFERR_PRESENT_MASK) =20 +#define PFERR_LEVEL(err_code) (((err_code) & PFERR_LEVEL_MASK) >> PFERR_LE= VEL_START_BIT) + /* apic attention bits */ #define KVM_APIC_CHECK_VAPIC 0 /* diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h index 995cf22f47cf..69464ae0f7e8 100644 --- a/arch/x86/kvm/vmx/common.h +++ b/arch/x86/kvm/vmx/common.h @@ -99,7 +99,7 @@ static inline int __vmx_handle_ept_violation(struct kvm_v= cpu *vcpu, gpa_t gpa, error_code |=3D (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) !=3D = 0 ? PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK; =20 - if (err_page_level > 0) + if (err_page_level > PG_LEVEL_NONE) error_code |=3D (err_page_level << PFERR_LEVEL_START_BIT) & PFERR_LEVEL_= MASK; =20 return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index fad7413f7eab..39760ee95f04 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2312,6 +2312,7 @@ static int tdx_init_mem_region(struct kvm *kvm, struc= t kvm_tdx_cmd *cmd) struct kvm_tdx_init_mem_region region; struct kvm_vcpu *vcpu; struct page *page; + u64 error_code; kvm_pfn_t pfn; int idx, ret =3D 0; =20 @@ -2364,7 +2365,11 @@ static int tdx_init_mem_region(struct kvm *kvm, stru= ct kvm_tdx_cmd *cmd) kvm_tdx->source_pa =3D pfn_to_hpa(page_to_pfn(page)) | (cmd->flags & KVM_TDX_MEASURE_MEMORY_REGION); =20 - pfn =3D kvm_mmu_map_tdp_page(vcpu, region.gpa, TDX_SEPT_PFERR, + /* TODO: large page support. 
*/ + error_code =3D TDX_SEPT_PFERR; + error_code |=3D (PG_LEVEL_4K << PFERR_LEVEL_START_BIT) & + PFERR_LEVEL_MASK; + pfn =3D kvm_mmu_map_tdp_page(vcpu, region.gpa, error_code, PG_LEVEL_4K); if (is_error_noslot_pfn(pfn) || kvm->vm_bugged) ret =3D -EFAULT; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index e09757acc6f9..6eca8bf4d9e1 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -62,25 +62,6 @@ union tdx_exit_reason { u64 full; }; =20 -union tdx_ext_exit_qualification { - struct { - u64 type : 4; - u64 reserved0 : 28; - u64 req_sept_level : 3; - u64 err_sept_level : 3; - u64 err_sept_state : 8; - u64 err_sept_is_leaf : 1; - u64 reserved1 : 17; - }; - u64 full; -}; - -enum tdx_ext_exit_qualification_type { - EXT_EXIT_QUAL_NONE, - EXT_EXIT_QUAL_ACCEPT, - NUM_EXT_EXIT_QUAL, -}; - struct vcpu_tdx { struct kvm_vcpu vcpu; =20 diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index 18604734fb14..471a9f61fc81 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -163,4 +163,23 @@ struct td_params { #define TDX_MIN_TSC_FREQUENCY_KHZ (100 * 1000) #define TDX_MAX_TSC_FREQUENCY_KHZ (10 * 1000 * 1000) =20 +union tdx_ext_exit_qualification { + struct { + u64 type : 4; + u64 reserved0 : 28; + u64 req_sept_level : 3; + u64 err_sept_level : 3; + u64 err_sept_state : 8; + u64 err_sept_is_leaf : 1; + u64 reserved1 : 17; + }; + u64 full; +}; + +enum tdx_ext_exit_qualification_type { + EXT_EXIT_QUAL_NONE =3D 0, + EXT_EXIT_QUAL_ACCEPT, + NUM_EXT_EXIT_QUAL, +}; + #endif /* __KVM_X86_TDX_ARCH_H */ diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 30a715c22647..420c4cf9adeb 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5784,7 +5784,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu) if (unlikely(allow_smaller_maxphyaddr && kvm_vcpu_is_illegal_gpa(vcpu, gp= a))) return kvm_emulate_instruction(vcpu, 0); =20 - return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification, 0); + return __vmx_handle_ept_violation(vcpu, gpa, exit_qualification, PG_LEVEL= _NONE); } =20 static int handle_ept_misconfig(struct kvm_vcpu *vcpu) --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ACAF0C04FDE for ; Thu, 8 Dec 2022 23:36:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230320AbiLHXgg (ORCPT ); Thu, 8 Dec 2022 18:36:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230158AbiLHXgE (ORCPT ); Thu, 8 Dec 2022 18:36:04 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C39796BC9B; Thu, 8 Dec 2022 15:35:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670542559; x=1702078559; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=evLGS0Ysb2C50awEgkYSSa/pcL5o4ysLqfSgQX2aZMQ=; b=kx9t+P60XYxObBRurtVa+M2S82eYBF1lDYGmvDX1kw34VJc5SxQFFDwq CabbFWYM65rYk9PyHndhJN6/3WbmSfqs4bBBjL0ePbI4TTVNZmdFBwOJ1 9qT9ug828+SNceb/qLiG8pQmi28AOgChrCVuljvPYj/cwFBfZxFQ6YUP+ WRHDut1STje1tU4u9Q3vck1UfqQQAAEkGonto/1TawIdYcwEWodhcBLjS 
ec1ns3CQlvXZv2Xa4gu5Tp0jFJob8mMVENPiwPwDWdl3C1Efin1gITH22 IF+bJCVt0JO20tumDhPe7MmpIukapajDNbW61HoKFWwbvVuyGGnxOUM/q A==; X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="403586544" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="403586544" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:59 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="677950990" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="677950990" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:58 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack Subject: [RFC PATCH v2 10/15] KVM: x86/tdp_mmu: Split the large page when zap leaf Date: Thu, 8 Dec 2022 15:35:45 -0800 Message-Id: <7edb5526907c1d36e78647577cbf562e9155a76a.1670541736.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Xiaoyao Li When TDX enabled, a large page cannot be zapped if it contains mixed pages. In this case, it has to split the large page. Signed-off-by: Xiaoyao Li --- arch/x86/kvm/mmu/mmu.c | 9 +++++ arch/x86/kvm/mmu/mmu_internal.h | 2 ++ arch/x86/kvm/mmu/tdp_mmu.c | 62 +++++++++++++++++++++++++++++++-- 3 files changed, 71 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 02adc3c23627..7f56b1dd76fa 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -7318,6 +7318,15 @@ static bool linfo_is_mixed(struct kvm_lpage_info *li= nfo) return linfo->disallow_lpage & KVM_LPAGE_PRIVATE_SHARED_MIXED; } =20 +bool kvm_mem_attr_is_mixed(struct kvm_memory_slot *slot, gfn_t gfn, int le= vel) +{ + struct kvm_lpage_info *linfo =3D lpage_info_slot(gfn & KVM_HPAGE_MA= SK(level), + slot, level); + + WARN_ON_ONCE(level =3D=3D PG_LEVEL_4K); + return linfo_is_mixed(linfo); +} + static void linfo_set_mixed(gfn_t gfn, struct kvm_memory_slot *slot, int level, bool mixed) { diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index 641afc4e90cb..2b7c16dfdf5e 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -435,6 +435,8 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cach= e *mc); void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp); void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *s= p); =20 +bool kvm_mem_attr_is_mixed(struct kvm_memory_slot *slot, gfn_t gfn, int le= vel); + #ifndef CONFIG_HAVE_KVM_RESTRICTED_MEM static inline int kvm_restricted_mem_get_pfn(struct kvm_memory_slot *slot, gfn_t gfn, kvm_pfn_t *pfn, int *order) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index cb36089a40da..e9af8c95a3ae 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1102,6 +1102,14 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_= mmu_page *sp) return true; } =20 + +static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm, + struct tdp_iter *iter, + bool shared); + +static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter, + struct 
kvm_mmu_page *sp, bool shared); + /* * If can_yield is true, will release the MMU lock and reschedule if the * scheduler needs the CPU or there is contention on the MMU lock. If this @@ -1113,6 +1121,7 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct= kvm_mmu_page *root, gfn_t start, gfn_t end, bool can_yield, bool flush, bool zap_private) { + struct kvm_mmu_page *split_sp =3D NULL; struct tdp_iter iter; =20 end =3D min(end, tdp_mmu_max_gfn_exclusive()); @@ -1144,12 +1153,63 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, stru= ct kvm_mmu_page *root, !is_last_spte(iter.old_spte, iter.level)) continue; =20 + if (kvm_gfn_shared_mask(kvm) && is_large_pte(iter.old_spte)) { + gfn_t gfn =3D iter.gfn & ~kvm_gfn_shared_mask(kvm); + gfn_t mask =3D KVM_PAGES_PER_HPAGE(iter.level) - 1; + struct kvm_memory_slot *slot; + struct kvm_mmu_page *sp; + + slot =3D gfn_to_memslot(kvm, gfn); + if (kvm_mem_attr_is_mixed(slot, gfn, iter.level) || + (gfn & mask) < start || + end < (gfn & mask) + KVM_PAGES_PER_HPAGE(iter.level)) { + WARN_ON_ONCE(!can_yield); + if (split_sp) { + sp =3D split_sp; + split_sp =3D NULL; + } else { + WARN_ON(iter.yielded); + if (flush) { + kvm_flush_remote_tlbs(kvm); + flush =3D false; + } + sp =3D tdp_mmu_alloc_sp_for_split(kvm, &iter, false); + if (iter.yielded) { + split_sp =3D sp; + continue; + } + } + KVM_BUG_ON(!sp, kvm); + + if (tdp_mmu_split_huge_page(kvm, &iter, sp, false)) { + kvm_flush_remote_tlbs(kvm); + flush =3D false; + /* force retry on this gfn. */ + iter.yielded =3D true; + } else + flush =3D true; + continue; + } + } + tdp_mmu_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE); flush =3D true; } =20 rcu_read_unlock(); =20 + if (split_sp) { + WARN_ON(!can_yield); + if (flush) { + kvm_flush_remote_tlbs(kvm); + flush =3D false; + } + + write_unlock(&kvm->mmu_lock); + tdp_mmu_free_sp(split_sp); + write_lock(&kvm->mmu_lock); + } + /* * Because this flow zaps _only_ leaf SPTEs, the caller doesn't need * to provide RCU protection as no 'struct kvm_mmu_page' will be freed. @@ -1691,8 +1751,6 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_spli= t(struct kvm *kvm, =20 KVM_BUG_ON(kvm_mmu_page_role_is_private(role) !=3D is_private_sptep(iter->sptep), kvm); - /* TODO: Large page isn't supported for private SPTE yet. 
*/ - KVM_BUG_ON(kvm_mmu_page_role_is_private(role), kvm); =20 /* * Since we are allocating while under the MMU lock we have to be --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD5DBC4332F for ; Thu, 8 Dec 2022 23:36:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230171AbiLHXgj (ORCPT ); Thu, 8 Dec 2022 18:36:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45618 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230189AbiLHXgF (ORCPT ); Thu, 8 Dec 2022 18:36:05 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 340D16FF21; Thu, 8 Dec 2022 15:36:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670542561; x=1702078561; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=fdZOUAgVPY7OENFbNI36Sie/heTRpm7Xf4nyeZ5gKeQ=; b=Eh0dWeqJ9+88sN0Af+kU8Jnls/V/JVTwhCtUfR3tVXjFk/H9z2TlZqki SanHZfoM4QGeDoSefWY7Vfip55m5Jz1L3vbkSn6aYuIczhbNaVGwBVtsa Uxv4fwP9hFhTUGYfofMlhSVmOB6Us2cJHtKGvzqMk5VfBOYOkg89nHrQF VnHa9/uOLtoExp/U8TPjIaScvOHBnOl7IjPti+sR4xxFffu15eq4xuuAv xYZGyEeIhaarzcHaQQLxxOPA2huniR8lIQKIoOQoYFUKnWB78Kb0gWTkD 0MB3j0/2Y6swcb+/0rn7+aMACPLTFW4UCskru3AnODElgagaCinOYUUdl A==; X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="403586551" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="403586551" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:59 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="677950998" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="677950998" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:59 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack Subject: [RFC PATCH v2 11/15] KVM: x86/tdp_mmu, TDX: Split a large page when 4KB page within it converted to shared Date: Thu, 8 Dec 2022 15:35:46 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Xiaoyao Li When mapping the shared page for TDX, it needs to zap private alias. In the case that private page is mapped as large page (2MB), it can be removed directly only when the whole 2MB is converted to shared. Otherwise, it has to split 2MB page into 512 4KB page, and only remove the pages that converted to shared. When a present large leaf spte switches to present non-leaf spte, TDX needs to split the corresponding SEPT page to reflect it. 
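As a rough stand-alone illustration of the rule above (toy flags and sizes, not the KVM data structures): a 2MB private mapping covers 512 4KB guest pages, it can be removed directly only when every one of them has been converted to shared, and otherwise it has to be demoted so that only the converted 4KB entries are zapped.

#include <stdbool.h>
#include <stdio.h>

#define PAGES_PER_2M 512    /* number of 4KB pages covered by one 2MB mapping */

/* Toy conversion state: one flag per 4KB page, true = converted to shared. */
static bool shared[PAGES_PER_2M];

/* The private 2MB mapping can be dropped whole only if every page is shared. */
static bool whole_2m_converted(void)
{
        for (int i = 0; i < PAGES_PER_2M; i++)
                if (!shared[i])
                        return false;
        return true;
}

int main(void)
{
        shared[3] = true;    /* the guest converted a single 4KB page */

        if (whole_2m_converted()) {
                printf("remove the 2MB private mapping directly\n");
        } else {
                printf("demote 2MB -> 512 x 4KB, then zap only the shared entries:\n");
                for (int i = 0; i < PAGES_PER_2M; i++)
                        if (shared[i])
                                printf("  zap 4KB entry %d\n", i);
        }
        return 0;
}
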
Signed-off-by: Xiaoyao Li Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/mmu/tdp_mmu.c | 24 +++++++++++++++--------- arch/x86/kvm/vmx/tdx.c | 25 +++++++++++++++++++++++-- arch/x86/kvm/vmx/tdx_arch.h | 1 + arch/x86/kvm/vmx/tdx_ops.h | 7 +++++++ 6 files changed, 49 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 0cf928d12067..1e86542141f7 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -97,6 +97,7 @@ KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) KVM_X86_OP(load_mmu_pgd) KVM_X86_OP_OPTIONAL(link_private_spt) KVM_X86_OP_OPTIONAL(free_private_spt) +KVM_X86_OP_OPTIONAL(split_private_spt) KVM_X86_OP_OPTIONAL(set_private_spte) KVM_X86_OP_OPTIONAL(remove_private_spte) KVM_X86_OP_OPTIONAL(zap_private_spte) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index bb790466ebae..282b083f9b6a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1687,6 +1687,8 @@ struct kvm_x86_ops { void *private_spt); int (*free_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, void *private_spt); + int (*split_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, + void *private_spt); int (*set_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level, kvm_pfn_t pfn); int (*remove_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level leve= l, diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index e9af8c95a3ae..6fd982e3701e 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -566,18 +566,24 @@ static int __must_check handle_changed_private_spte(s= truct kvm *kvm, gfn_t gfn, =20 lockdep_assert_held(&kvm->mmu_lock); if (is_present) { - /* TDP MMU doesn't change present -> present */ - KVM_BUG_ON(was_present, kvm); + void *private_spt; =20 - /* - * Use different call to either set up middle level - * private page table, or leaf. - */ - if (is_leaf) + if (level > PG_LEVEL_4K && was_leaf && !is_leaf) { + /* + * splitting large page into 4KB. 
+ * tdp_mmu_split_huage_page() =3D> tdp_mmu_link_sp() + */ + private_spt =3D get_private_spt(gfn, new_spte, level); + KVM_BUG_ON(!private_spt, kvm); + ret =3D static_call(kvm_x86_zap_private_spte)(kvm, gfn, level); + kvm_flush_remote_tlbs(kvm); + if (!ret) + ret =3D static_call(kvm_x86_split_private_spt)(kvm, gfn, + level, private_spt); + } else if (is_leaf) ret =3D static_call(kvm_x86_set_private_spte)(kvm, gfn, level, new_pfn); else { - void *private_spt =3D get_private_spt(gfn, new_spte, level); - + private_spt =3D get_private_spt(gfn, new_spte, level); KVM_BUG_ON(!private_spt, kvm); ret =3D static_call(kvm_x86_link_private_spt)(kvm, gfn, level, private_= spt); } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 39760ee95f04..ce7026136334 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1465,6 +1465,28 @@ static int tdx_sept_link_private_spt(struct kvm *kvm= , gfn_t gfn, return 0; } =20 +static int tdx_sept_split_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + gpa_t gpa =3D gfn_to_gpa(gfn); + hpa_t hpa =3D __pa(private_spt); + struct tdx_module_output out; + u64 err; + + /* See comment in tdx_sept_set_private_spte() */ + err =3D tdh_mem_page_demote(kvm_tdx->tdr.pa, gpa, tdx_level, hpa, &out); + if (err =3D=3D TDX_ERROR_SEPT_BUSY) + return -EAGAIN; + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_PAGE_DEMOTE, err, &out); + return -EIO; + } + + return 0; +} + static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level) { @@ -1474,8 +1496,6 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm,= gfn_t gfn, struct tdx_module_output out; u64 err; =20 - /* For now large page isn't supported yet. 
*/ - WARN_ON_ONCE(level !=3D PG_LEVEL_4K); err =3D tdh_mem_range_block(kvm_tdx->tdr.pa, gpa, tdx_level, &out); if (err =3D=3D TDX_ERROR_SEPT_BUSY) return -EAGAIN; @@ -2660,6 +2680,7 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x86= _ops) =20 x86_ops->link_private_spt =3D tdx_sept_link_private_spt; x86_ops->free_private_spt =3D tdx_sept_free_private_spt; + x86_ops->split_private_spt =3D tdx_sept_split_private_spt; x86_ops->set_private_spte =3D tdx_sept_set_private_spte; x86_ops->remove_private_spte =3D tdx_sept_remove_private_spte; x86_ops->zap_private_spte =3D tdx_sept_zap_private_spte; diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index 471a9f61fc81..508d9a1139ce 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -21,6 +21,7 @@ #define TDH_MNG_CREATE 9 #define TDH_VP_CREATE 10 #define TDH_MNG_RD 11 +#define TDH_MEM_PAGE_DEMOTE 15 #define TDH_MR_EXTEND 16 #define TDH_MR_FINALIZE 17 #define TDH_VP_FLUSH 18 diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h index 4b03acce5003..60cbc7f94b18 100644 --- a/arch/x86/kvm/vmx/tdx_ops.h +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -133,6 +133,13 @@ static inline u64 tdh_mng_rd(hpa_t tdr, u64 field, str= uct tdx_module_output *out return __seamcall(TDH_MNG_RD, tdr, field, 0, 0, out); } =20 +static inline u64 tdh_mem_page_demote(hpa_t tdr, gpa_t gpa, int level, hpa= _t page, + struct tdx_module_output *out) +{ + tdx_clflush_page(page, PG_LEVEL_4K); + return seamcall_sept(TDH_MEM_PAGE_DEMOTE, gpa | level, tdr, page, 0, out); +} + static inline u64 tdh_mr_extend(hpa_t tdr, gpa_t gpa, struct tdx_module_output *out) { --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50DC2C4332F for ; Thu, 8 Dec 2022 23:36:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229995AbiLHXgp (ORCPT ); Thu, 8 Dec 2022 18:36:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45882 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230190AbiLHXgF (ORCPT ); Thu, 8 Dec 2022 18:36:05 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 341386FF32; Thu, 8 Dec 2022 15:36:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670542561; x=1702078561; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Vc7FVWd44aFBz55cz2TC2iZlc7y40fTjQ4VzQwXZurc=; b=h70Duv976HgJGsJhqMJ6XYOVt4OT6T+ZpTtQY4mMxs+wSi2cJd27PvcC 0H8tibXaTJYOb0TTTxb29fdzNmGvrz737+4dH7eKD/KgEWwic3rDdK50J A9fKZFfVZeFIFRRALKy6hx5+GQEtOzm4yGO/EHJV/HkdOXa0KCJScc/oD KQJJe2noQFZUSoFEwYBcdV6mIl4hyX4OUb5rdWafqHiG1vW/Q30rD2os2 XBVuTXTruOTt4iVazgWn7jGTmBKVvpQki3jO9IQD/yaqm0n2jSYroVKUe Aamw13sEwQN5b7XtyznIqcBBvapSZik50+RXRa3/2TfA7zb/Aw2nWZgSs g==; X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="403586555" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="403586555" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:36:00 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="677951011" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="677951011" Received: 
from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:35:59 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack Subject: [RFC PATCH v2 12/15] KVM: x86/tdp_mmu: Try to merge pages into a large page Date: Thu, 8 Dec 2022 15:35:47 -0800 Message-Id: <559a026e54bc66eef65d76e8472bbff18e07a04d.1670541736.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata When a large page is passed to the KVM page fault handler and some of sub pages are already populated, try to merge sub pages into a large page. This situation can happen when the guest converts small pages into shared and convert it back into private. When a large page is passed to KVM mmu page fault handler and the spte corresponding to the page is non-leaf (one or more of sub pages are already populated at lower page level), the current kvm mmu zaps non-leaf spte at a large page level, and populate a leaf spte at that level. Thus small pages are converted into a large page. However, it doesn't work for TDX because zapping and re-populating results in zeroing page content. Instead, populate all small pages and merge them into a large page. Merging pages into a large page can fail when some sub pages are accepted and some are not. In such case, with the assumption that guest tries to accept at large page size for performance when possible, don't try to be smart to identify which page is still pending, map all pages at lower page level, and let vcpu re-execute. 
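As a rough stand-alone illustration (assumed toy entry format, not the real SPTE encoding): all 512 child entries must be populated and physically contiguous before a 2MB merge is worth attempting; when that does not hold, or the later promote fails because some pages are still pending acceptance, the fault is resolved at 4KB and the merge is left for a later re-execution.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CHILDREN_PER_2M 512

/* Toy child entry; the real SPTE encoding in KVM is different. */
struct child {
        bool     present;
        uint64_t pfn;
};

/*
 * Promotion to 2MB only makes sense once every child entry is populated and
 * points at the physically contiguous frame expected for its index.
 */
static bool ready_to_merge(const struct child c[CHILDREN_PER_2M], uint64_t base_pfn)
{
        for (int i = 0; i < CHILDREN_PER_2M; i++)
                if (!c[i].present || c[i].pfn != base_pfn + i)
                        return false;
        return true;
}

int main(void)
{
        static struct child c[CHILDREN_PER_2M];
        uint64_t base_pfn = 0x100000;

        for (int i = 0; i < CHILDREN_PER_2M; i++) {
                c[i].present = true;
                c[i].pfn = base_pfn + i;
        }
        c[7].present = false;    /* one 4KB page not populated yet */

        if (ready_to_merge(c, base_pfn))
                printf("try to merge the 512 children into one 2MB mapping\n");
        else
                printf("serve the fault at 4KB and retry the merge later\n");
        return 0;
}
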
Signed-off-by: Isaku Yamahata --- arch/x86/include/asm/kvm-x86-ops.h | 2 + arch/x86/include/asm/kvm_host.h | 3 + arch/x86/kvm/mmu/tdp_mmu.c | 127 ++++++++++++++++++++++++++++- 3 files changed, 131 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 1e86542141f7..83f99a9fb3c2 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -98,9 +98,11 @@ KVM_X86_OP(load_mmu_pgd) KVM_X86_OP_OPTIONAL(link_private_spt) KVM_X86_OP_OPTIONAL(free_private_spt) KVM_X86_OP_OPTIONAL(split_private_spt) +KVM_X86_OP_OPTIONAL(merge_private_spt) KVM_X86_OP_OPTIONAL(set_private_spte) KVM_X86_OP_OPTIONAL(remove_private_spte) KVM_X86_OP_OPTIONAL(zap_private_spte) +KVM_X86_OP_OPTIONAL(unzap_private_spte) KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) KVM_X86_OP(get_l2_tsc_multiplier) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 282b083f9b6a..5d728fd294e0 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1689,11 +1689,14 @@ struct kvm_x86_ops { void *private_spt); int (*split_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, void *private_spt); + int (*merge_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, + void *private_spt); int (*set_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level, kvm_pfn_t pfn); int (*remove_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level leve= l, kvm_pfn_t pfn); int (*zap_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level); + int (*unzap_private_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level= ); =20 bool (*has_wbinvd_exit)(void); =20 diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 6fd982e3701e..766795005a39 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1309,6 +1309,131 @@ void kvm_tdp_mmu_invalidate_all_roots(struct kvm *k= vm) } } =20 +static int tdp_mmu_merge_private_spt(struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault, + struct tdp_iter *iter, u64 new_spte) +{ + u64 *sptep =3D rcu_dereference(iter->sptep); + struct kvm_mmu_page *child_sp; + struct kvm *kvm =3D vcpu->kvm; + bool unzap_retry =3D false; + int level =3D iter->level; + gfn_t gfn =3D iter->gfn; + u64 old_spte =3D *sptep; + tdp_ptep_t child_pt; + u64 child_spte; + int ret; + int i; + + /* + * TDX KVM supports only 2MB large page. It's not supported to merge + * 2MB pages into 1GB page at the moment. + */ + WARN_ON_ONCE(fault->goal_level !=3D PG_LEVEL_2M); + WARN_ON_ONCE(iter->level !=3D PG_LEVEL_2M); + + /* Freeze the spte to prevent other threads from working spte. */ + if (!try_cmpxchg64(sptep, &iter->old_spte, REMOVED_SPTE)) + return -EBUSY; + + /* Prevent the Secure-EPT entry from being used. */ + ret =3D static_call(kvm_x86_zap_private_spte)(kvm, gfn, level); + if (ret) + goto out; + kvm_flush_remote_tlbs_with_address(kvm, gfn, KVM_PAGES_PER_HPAGE(level)); + + /* + * All child pages are required to be populated for merging them into a + * large page. Populate all child spte. + */ + child_pt =3D spte_to_child_pt(iter->old_spte, iter->level); + child_sp =3D sptep_to_sp(child_pt); + unzap_retry =3D false; + /* Step down to spte. */ + tdp_iter_next(iter); + for (i =3D 0; i < SPTE_ENT_PER_PAGE; i++, tdp_iter_next(iter)) { + WARN_ON_ONCE(iter->level !=3D PG_LEVEL_4K); + if (is_shadow_present_pte(iter->old_spte)) { + /* TODO: relocate page for huge page. 
*/ + WARN_ON_ONCE(spte_to_pfn(iter->old_spte) !=3D spte_to_pfn(new_spte) + i= ); + continue; + } + + WARN_ON_ONCE(spte_to_pfn(iter->old_spte) !=3D spte_to_pfn(new_spte) + i); + child_spte =3D make_huge_page_split_spte(kvm, new_spte, child_sp->role, = i); + /* + * Because other thread may have started to operate on this spte + * before freezing parent spte, Use atomic version to prevent + * race. + */ + ret =3D tdp_mmu_set_spte_atomic(vcpu->kvm, iter, child_spte); + if (ret =3D=3D -EBUSY || ret =3D=3D -EAGAIN) + /* + * There was a race condition. Populate remaining 4K + * spte to resolve fault->gfn to guarantee the forward + * progress. + */ + unzap_retry =3D true; + else if (ret) + goto unzap; + } + if (unzap_retry) { + ret =3D RET_PF_RETRY; + goto unzap; + } + + /* Merge pages into a large page. */ + ret =3D static_call(kvm_x86_merge_private_spt)(kvm, gfn, level, + kvm_mmu_private_spt(child_sp)); + /* + * Failed to merge pages because some pages are accepted and some are + * pending. Since the child page was mapped above, let vcpu run. + */ + if (ret =3D=3D -EAGAIN) + ret =3D RET_PF_RETRY; + if (ret) + goto unzap; + + /* Unfreeze spte. */ + __kvm_tdp_mmu_write_spte(sptep, new_spte); + + /* + * Free unused child sp. Secure-EPT page was already freed at TDX level + * by kvm_x86_merge_private_spt(). + */ + tdp_mmu_free_sp(child_sp); + return RET_PF_RETRY; + +unzap: + if (static_call(kvm_x86_unzap_private_spte)(kvm, gfn, level)) + old_spte =3D SHADOW_NONPRESENT_VALUE | + (spte_to_pfn(old_spte) << PAGE_SHIFT) | + PT_PAGE_SIZE_MASK; +out: + __kvm_tdp_mmu_write_spte(sptep, old_spte); + return ret; +} + +static int __tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault, + struct tdp_iter *iter, u64 new_spte) +{ + /* + * The private page has smaller-size pages. For example, the child + * pages was converted from shared to page, and now it can be mapped as + * a large page. Try to merge small pages into a large page. + */ + if (fault->slot && + kvm_gfn_shared_mask(vcpu->kvm) && + iter->level > PG_LEVEL_4K && + kvm_is_private_gpa(vcpu->kvm, gfn_to_gpa(fault->gfn)) && + is_shadow_present_pte(iter->old_spte) && + !is_large_pte(iter->old_spte)) + return tdp_mmu_merge_private_spt(vcpu, fault, iter, new_spte); + + return tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte); +} + /* * Installs a last-level SPTE to handle a TDP page fault. 
* (NPT/EPT violation/misconfiguration) @@ -1341,7 +1466,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm= _vcpu *vcpu, =20 if (new_spte =3D=3D iter->old_spte) ret =3D RET_PF_SPURIOUS; - else if (tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte)) + else if (__tdp_mmu_map_handle_target_level(vcpu, fault, iter, new_spte)) return RET_PF_RETRY; else if (is_shadow_present_pte(iter->old_spte) && !is_last_spte(iter->old_spte, iter->level)) --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41AABC4332F for ; Thu, 8 Dec 2022 23:37:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230353AbiLHXgu (ORCPT ); Thu, 8 Dec 2022 18:36:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45982 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230080AbiLHXgI (ORCPT ); Thu, 8 Dec 2022 18:36:08 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9677E6FF0F; Thu, 8 Dec 2022 15:36:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670542562; x=1702078562; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zWuGHJKvb2rhePR8KFWGDOd9HnNPDNaoiycd/Rf+gtg=; b=LyZ1e3/bN0YMeDdzoaIemiu61PXZQAQ+UECywFsJgp+ckjz1Oy1Fwhaa UUoZ6ixKHQKTnp4ngMk0Jpkrgo/+O54HKi8LI9ZFSdgazY5eZ/lm3SLy2 aCGLMWTzmOj7LdQBSQrweusLIb4oOvI+1icapYyuavoROnx5yJX4f+8Ly fn2n8v1Iszk8iYZtjTwOp5WXlzQ/y5lcvN++aAHz4m6Yja0WaMSamcIwF vq9ditkIlAQ50Nziwgne0Zjn4UGHMRzv9zKWP4ri9jrxhYeCdFUqL57QH JoRq1Pr38SmjkDhSGwhWiB+QyOGNZanx8wwaQo1H+ZJ8k5Rm1q3p+CDo4 Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="403586561" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="403586561" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:36:00 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="677951029" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="677951029" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:36:00 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack Subject: [RFC PATCH v2 13/15] KVM: x86/tdp_mmu: TDX: Implement merge pages into a large page Date: Thu, 8 Dec 2022 15:35:48 -0800 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata Implement merge_private_stp callback. 
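As a rough stand-alone sketch of the error handling the callback needs (symbolic stand-ins for the TDX status words, not the real values from tdx_errno.h): Secure-EPT contention and an accepted/pending mix are retryable and map to -EAGAIN, so the fault handler keeps the 4KB mappings and lets the vcpu re-execute; anything else is treated as fatal.

#include <errno.h>
#include <stdio.h>

/*
 * Stand-ins for the TDX module status words the callback has to classify;
 * the real 64-bit values live in arch/x86/kvm/vmx/tdx_errno.h.
 */
enum promote_status {
        PROMOTE_OK,                  /* children merged into one 2MB mapping  */
        PROMOTE_SEPT_BUSY,           /* Secure-EPT contention, retry later    */
        PROMOTE_PENDING_ACCEPT,      /* some 4KB pages not yet accepted by TD */
        PROMOTE_OTHER_ERROR,         /* unexpected failure                    */
};

/* Retryable outcomes become -EAGAIN so the fault is served at 4KB for now. */
static int promote_status_to_errno(enum promote_status st)
{
        switch (st) {
        case PROMOTE_OK:
                return 0;
        case PROMOTE_SEPT_BUSY:
        case PROMOTE_PENDING_ACCEPT:
                return -EAGAIN;
        default:
                return -EIO;
        }
}

int main(void)
{
        printf("%d\n", promote_status_to_errno(PROMOTE_PENDING_ACCEPT));
        return 0;
}
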
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/vmx/tdx.c | 70 ++++++++++++++++++++++++++++++++++++ arch/x86/kvm/vmx/tdx_arch.h | 1 + arch/x86/kvm/vmx/tdx_errno.h | 2 ++ arch/x86/kvm/vmx/tdx_ops.h | 6 ++++ 4 files changed, 79 insertions(+) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index ce7026136334..f20e931cf983 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1487,6 +1487,47 @@ static int tdx_sept_split_private_spt(struct kvm *kv= m, gfn_t gfn, return 0; } =20 +static int tdx_sept_merge_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + struct tdx_module_output out; + gpa_t gpa =3D gfn_to_gpa(gfn); + u64 err; + + /* See comment in tdx_sept_set_private_spte() */ + err =3D tdh_mem_page_promote(kvm_tdx->tdr.pa, gpa, tdx_level, &out); + if (err =3D=3D TDX_ERROR_SEPT_BUSY) + return -EAGAIN; + if (err =3D=3D TDX_EPT_INVALID_PROMOTE_CONDITIONS) + /* + * Some pages are accepted, some pending. Need to wait for TD + * to accept all pages. Tell it the caller. + */ + return -EAGAIN; + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_PAGE_PROMOTE, err, &out); + return -EIO; + } + WARN_ON_ONCE(out.rcx !=3D __pa(private_spt)); + + /* + * TDH.MEM.PAGE.PROMOTE frees the Secure-EPT page for the lower level. + * Flush cache for reuse. + */ + do { + err =3D tdh_phymem_page_wbinvd(set_hkid_to_hpa(__pa(private_spt), + to_kvm_tdx(kvm)->hkid)); + } while (err =3D=3D (TDX_OPERAND_BUSY | TDX_OPERAND_ID_RCX)); + if (WARN_ON_ONCE(err)) { + pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL); + return -EIO; + } + + return 0; +} + static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level) { @@ -1556,6 +1597,33 @@ static void tdx_track(struct kvm_tdx *kvm_tdx) =20 } =20 +static int tdx_sept_unzap_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level) +{ + int tdx_level =3D pg_level_to_tdx_sept_level(level); + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + gpa_t gpa =3D gfn_to_gpa(gfn); + struct tdx_module_output out; + u64 err; + + do { + err =3D tdh_mem_range_unblock(kvm_tdx->tdr.pa, gpa, tdx_level, &out); + + /* + * tdh_mem_range_block() is accompanied with tdx_track() via kvm + * remote tlb flush. Wait for the caller of + * tdh_mem_range_block() to complete TDX track. 
+ */ + } while (err =3D=3D (TDX_TLB_TRACKING_NOT_DONE | TDX_OPERAND_ID_SEPT)); + if (err =3D=3D TDX_ERROR_SEPT_BUSY) + return -EAGAIN; + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_RANGE_UNBLOCK, err, &out); + return -EIO; + } + return 0; +} + static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, enum pg_level level, void *private_spt) { @@ -2681,9 +2749,11 @@ int __init tdx_hardware_setup(struct kvm_x86_ops *x8= 6_ops) x86_ops->link_private_spt =3D tdx_sept_link_private_spt; x86_ops->free_private_spt =3D tdx_sept_free_private_spt; x86_ops->split_private_spt =3D tdx_sept_split_private_spt; + x86_ops->merge_private_spt =3D tdx_sept_merge_private_spt; x86_ops->set_private_spte =3D tdx_sept_set_private_spte; x86_ops->remove_private_spte =3D tdx_sept_remove_private_spte; x86_ops->zap_private_spte =3D tdx_sept_zap_private_spte; + x86_ops->unzap_private_spte =3D tdx_sept_unzap_private_spte; =20 return 0; } diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h index 508d9a1139ce..3a3c9c608bf0 100644 --- a/arch/x86/kvm/vmx/tdx_arch.h +++ b/arch/x86/kvm/vmx/tdx_arch.h @@ -29,6 +29,7 @@ #define TDH_MNG_KEY_FREEID 20 #define TDH_MNG_INIT 21 #define TDH_VP_INIT 22 +#define TDH_MEM_PAGE_PROMOTE 23 #define TDH_VP_RD 26 #define TDH_MNG_KEY_RECLAIMID 27 #define TDH_PHYMEM_PAGE_RECLAIM 28 diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/kvm/vmx/tdx_errno.h index 389b1b53da25..74a5777c05f1 100644 --- a/arch/x86/kvm/vmx/tdx_errno.h +++ b/arch/x86/kvm/vmx/tdx_errno.h @@ -19,6 +19,8 @@ #define TDX_KEY_CONFIGURED 0x0000081500000000ULL #define TDX_NO_HKID_READY_TO_WBCACHE 0x0000082100000000ULL #define TDX_EPT_WALK_FAILED 0xC0000B0000000000ULL +#define TDX_TLB_TRACKING_NOT_DONE 0xC0000B0800000000ULL +#define TDX_EPT_INVALID_PROMOTE_CONDITIONS 0xC0000B0900000000ULL =20 /* * TDG.VP.VMCALL Status Codes (returned in R10) diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h index 60cbc7f94b18..5d2d0b1eed28 100644 --- a/arch/x86/kvm/vmx/tdx_ops.h +++ b/arch/x86/kvm/vmx/tdx_ops.h @@ -140,6 +140,12 @@ static inline u64 tdh_mem_page_demote(hpa_t tdr, gpa_t= gpa, int level, hpa_t pag return seamcall_sept(TDH_MEM_PAGE_DEMOTE, gpa | level, tdr, page, 0, out); } =20 +static inline u64 tdh_mem_page_promote(hpa_t tdr, gpa_t gpa, int level, + struct tdx_module_output *out) +{ + return seamcall_sept(TDH_MEM_PAGE_PROMOTE, gpa | level, tdr, 0, 0, out); +} + static inline u64 tdh_mr_extend(hpa_t tdr, gpa_t gpa, struct tdx_module_output *out) { --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D427CC4167B for ; Thu, 8 Dec 2022 23:37:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230368AbiLHXg7 (ORCPT ); Thu, 8 Dec 2022 18:36:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45664 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230208AbiLHXgJ (ORCPT ); Thu, 8 Dec 2022 18:36:09 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B82BE70B81; Thu, 8 Dec 2022 15:36:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670542562; x=1702078562; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=OruNx6JSRjFi416SwFgD4HRIwqgwPkuEngRvcyMCZnU=; b=hsMgtmVxyBGqayhCC/kDtWG5WGiiLYX1BXgftIOLtfswYYcHn8LhXX4C FcKuFDnVWWH8zMysPD6HjUpcOFsU37to9skkhsxjMs7uKB0bn6xAiTRJu nE7S7zXiRPxJGmRDRj+28ROgoK3xCCNNOhsZntsJUICbRBC5BC4IeAKi2 pKsWMPAEMWV91uDrU3HpqOjXHJeZ7jXG8z9ymsR8QyW16DfVmMoxnpAnE 1guA5OmHTHvWBn5GpJH4PNcSio6tbGhSUK5cfw7BmljwMef0UrBoaaW1L 3YTZpNBkdXF2CYXzxfkqAKjIoeI9Q/6RnSfvB6Ju3ekCy9EEmiBxf+hKF w==; X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="403586567" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="403586567" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:36:01 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="677951038" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="677951038" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:36:01 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack Subject: [RFC PATCH v2 14/15] KVM: x86/mmu: Make kvm fault handelr aware of large page of private memslot Date: Thu, 8 Dec 2022 15:35:49 -0800 Message-Id: <96c1c1e2281e9172244c838ccccc4c32bc150b34.1670541736.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Isaku Yamahata struct kvm_page_fault.req_level is the page level which takes care of the faulted-in page size. For now its calculation is only for the conventional kvm memslot by host_pfn_mapping_level() that traverses page table. However, host_pfn_mapping_level() cannot be used for private kvm memslot because pages of private kvm memlost aren't mapped into user virtual address space. Instead page order is given when getting pfn. Remember it in struct kvm_page_fault and use it. 
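As a rough stand-alone sketch of the order-to-level conversion this relies on (assumed level numbering, not the KVM helper itself): the backing store returns an allocation order together with the pfn, and that order, rather than a host page-table walk, bounds the level the fault can be mapped at.

#include <stdio.h>

/* Assumed level numbering, mirroring PG_LEVEL_4K / PG_LEVEL_2M / PG_LEVEL_1G. */
enum pg_level { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

/*
 * An order-N allocation spans 2^N 4KB pages: order 0 -> 4KB, order 9 -> 2MB,
 * order 18 -> 1GB.  The backing order therefore bounds the mapping level.
 */
static enum pg_level order_to_level(int order)
{
        if (order >= 18)
                return PG_LEVEL_1G;
        if (order >= 9)
                return PG_LEVEL_2M;
        return PG_LEVEL_4K;
}

static int clamp_req_level(int order, int vm_max_level)
{
        int host_level = order_to_level(order);

        return host_level < vm_max_level ? host_level : vm_max_level;
}

int main(void)
{
        /* An order-9 (2MB) backing folio with a 2MB VM limit maps at 2MB ... */
        printf("req level = %d\n", clamp_req_level(9, PG_LEVEL_2M));
        /* ... while an order-0 backing forces 4KB even though 2MB is allowed. */
        printf("req level = %d\n", clamp_req_level(0, PG_LEVEL_2M));
        return 0;
}
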
Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/mmu.c | 36 +++++++++++++++++++++------------ arch/x86/kvm/mmu/mmu_internal.h | 8 ++++++++ 2 files changed, 31 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7f56b1dd76fa..c011ea0b6444 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3132,12 +3132,12 @@ static int host_pfn_mapping_level(struct kvm *kvm, = gfn_t gfn, return level; } =20 -int kvm_mmu_max_mapping_level(struct kvm *kvm, - const struct kvm_memory_slot *slot, gfn_t gfn, - int max_level, bool is_private) +static int __kvm_mmu_max_mapping_level(struct kvm *kvm, + const struct kvm_memory_slot *slot, + gfn_t gfn, int max_level, int host_level, + bool faultin_private) { struct kvm_lpage_info *linfo; - int host_level; =20 max_level =3D min(max_level, max_huge_page_level); for ( ; max_level > PG_LEVEL_4K; max_level--) { @@ -3146,16 +3146,24 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, break; } =20 - if (is_private) - return max_level; - if (max_level =3D=3D PG_LEVEL_4K) return PG_LEVEL_4K; =20 - host_level =3D host_pfn_mapping_level(kvm, gfn, slot); + if (!faultin_private) { + WARN_ON_ONCE(host_level !=3D PG_LEVEL_NONE); + host_level =3D host_pfn_mapping_level(kvm, gfn, slot); + } + WARN_ON_ONCE(host_level =3D=3D PG_LEVEL_NONE); return min(host_level, max_level); } =20 +int kvm_mmu_max_mapping_level(struct kvm *kvm, + const struct kvm_memory_slot *slot, gfn_t gfn, + int max_level, bool faultin_private) +{ + return __kvm_mmu_max_mapping_level(kvm, slot, gfn, max_level, PG_LEVEL_NO= NE, faultin_private); +} + void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault = *fault) { struct kvm_memory_slot *slot =3D fault->slot; @@ -3176,9 +3184,10 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, = struct kvm_page_fault *fault * Enforce the iTLB multihit workaround after capturing the requested * level, which will be used to do precise, accurate accounting. 
*/ - fault->req_level =3D kvm_mmu_max_mapping_level(vcpu->kvm, slot, - fault->gfn, fault->max_level, - fault->is_private); + fault->req_level =3D __kvm_mmu_max_mapping_level(vcpu->kvm, slot, + fault->gfn, fault->max_level, + fault->host_level, + kvm_is_faultin_private(fault)); if (fault->req_level =3D=3D PG_LEVEL_4K || fault->huge_page_disallowed) return; =20 @@ -4311,7 +4320,8 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *v= cpu, if (kvm_restricted_mem_get_pfn(slot, fault->gfn, &fault->pfn, &order)) return RET_PF_RETRY; =20 - fault->max_level =3D min(order_to_level(order), fault->max_level); + fault->host_level =3D order_to_level(order); + fault->max_level =3D min((u8)fault->host_level, fault->max_level); fault->map_writable =3D !(slot->flags & KVM_MEM_READONLY); return RET_PF_CONTINUE; } @@ -4355,7 +4365,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, s= truct kvm_page_fault *fault if (fault->is_private !=3D kvm_mem_is_private(vcpu->kvm, fault->gfn)) return kvm_do_memory_fault_exit(vcpu, fault); =20 - if (fault->is_private && kvm_slot_can_be_private(slot)) + if (kvm_is_faultin_private(fault)) return kvm_faultin_pfn_private(vcpu, fault); =20 async =3D false; diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index 2b7c16dfdf5e..f7ff39b90ba4 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -333,6 +333,7 @@ struct kvm_page_fault { kvm_pfn_t pfn; hva_t hva; bool map_writable; + enum pg_level host_level; /* valid only for private memslot && private gf= n */ }; =20 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault= ); @@ -446,4 +447,11 @@ static inline int kvm_restricted_mem_get_pfn(struct kv= m_memory_slot *slot, } #endif /* CONFIG_HAVE_KVM_RESTRICTED_MEM */ =20 +static inline bool kvm_is_faultin_private(const struct kvm_page_fault *fau= lt) +{ + if (IS_ENABLED(CONFIG_HAVE_KVM_RESTRICTED_MEM)) + return fault->is_private && kvm_slot_can_be_private(fault->slot); + return false; +} + #endif /* __KVM_X86_MMU_INTERNAL_H */ --=20 2.25.1 From nobody Thu Sep 18 05:38:29 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EAD18C04FDE for ; Thu, 8 Dec 2022 23:37:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230390AbiLHXhG (ORCPT ); Thu, 8 Dec 2022 18:37:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45996 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230211AbiLHXgJ (ORCPT ); Thu, 8 Dec 2022 18:36:09 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CC6D26FF06; Thu, 8 Dec 2022 15:36:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670542562; x=1702078562; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=I6uMLGCJw4mzbJuuwI6vQO3aYEzwoK3zqbKOtUoipBg=; b=dKrl8SEB70NOzPikWx0wEjhhOR92vBKxlO0Q0KsEDUZ7BjeCg7g4zDUq LHCBM812YPRzlOS14LDRWzaataY3QKs1LrpxrKbZBl64ehTbR1hPbg0yJ +B507tFeRmUKcBiuGlQsbBKZTK0HIGpDPQ6cKmaaJ21bC7E7vSrMD/Ofa TDVrtG6KtYjWrSfBINGJuTT/2AhkkNDJlQ3jzgylp+pm40mOqbgteQfQx bF7WSd17gOZJ+Vme6XNttu390wDVLjdYcdDaiCn/7jDRLrxEJ/vVeiquO qLKZT7DDs741mvgEjGV4r6Rc0dJw8dEHe3lnuqBvIFFQacUKLEGYN/3Z+ Q==; X-IronPort-AV: 
E=McAfee;i="6500,9779,10555"; a="403586571" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="403586571" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:36:02 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10555"; a="677951044" X-IronPort-AV: E=Sophos;i="5.96,228,1665471600"; d="scan'208";a="677951044" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Dec 2022 15:36:01 -0800 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , Sean Christopherson , David Matlack Subject: [RFC PATCH v2 15/15] KVM: TDX: Allow 2MB large page for TD GUEST Date: Thu, 8 Dec 2022 15:35:50 -0800 Message-Id: <4533c730c0d853ac637596d71374f4573a4e91cd.1670541736.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: Xiaoyao Li Now that everything is there to support 2MB page for TD guest. Because TDX module TDH.MEM.PAGE.AUG supports 4KB page and 2MB page, set struct kvm_arch.tdp_max_page_level to 2MB page level. Signed-off-by: Xiaoyao Li Signed-off-by: Isaku Yamahata --- arch/x86/kvm/mmu/tdp_mmu.c | 9 ++------- arch/x86/kvm/vmx/tdx.c | 4 ++-- 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 766795005a39..c133dda53e12 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -1592,14 +1592,9 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kv= m_page_fault *fault) =20 sp->nx_huge_page_disallowed =3D fault->huge_page_disallowed; =20 - if (is_shadow_present_pte(iter.old_spte)) { - /* - * TODO: large page support. - * Doesn't support large page for TDX now - */ - KVM_BUG_ON(is_private_sptep(iter.sptep), vcpu->kvm); + if (is_shadow_present_pte(iter.old_spte)) r =3D tdp_mmu_split_huge_page(kvm, &iter, sp, true); - } else + else r =3D tdp_mmu_link_sp(kvm, &iter, sp, true); =20 /* diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index f20e931cf983..1977a2a5b4c4 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -481,8 +481,8 @@ int tdx_vm_init(struct kvm *kvm) */ kvm_mmu_set_mmio_spte_value(kvm, 0); =20 - /* TODO: Enable 2mb and 1gb large page support. */ - kvm->arch.tdp_max_page_level =3D PG_LEVEL_4K; + /* TDH.MEM.PAGE.AUG supports up to 2MB page. */ + kvm->arch.tdp_max_page_level =3D PG_LEVEL_2M; =20 kvm_tdx->hkid =3D -1; =20 --=20 2.25.1
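For reference, a tiny stand-alone sketch of the size covered by each TDP level (assumed numbering PG_LEVEL_4K = 1, PG_LEVEL_2M = 2): each level multiplies the coverage by 512, so capping tdp_max_page_level at PG_LEVEL_2M limits TD guest private mappings to 2MB.

#include <stdio.h>

/* Assumed numbering: PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3. */
static unsigned long long level_to_bytes(int level)
{
        /* Each level multiplies the coverage by 512 (9 address bits). */
        unsigned long long size = 4096;

        while (--level > 0)
                size *= 512;
        return size;
}

int main(void)
{
        /* tdp_max_page_level = 2 caps TD guest mappings at 2MB = 2097152 bytes. */
        printf("%llu\n", level_to_bytes(2));
        return 0;
}
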