From nobody Sat Oct 4 06:33:13 2025
From: Adrian Hunter
To: Dave Hansen, pbonzini@redhat.com, seanjc@google.com,
 vannapurve@google.com
Cc: Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
 x86@kernel.org, H Peter Anvin, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org,
 kai.huang@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com,
 tony.lindgren@linux.intel.com, binbin.wu@linux.intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com
Subject: [PATCH V7 1/3] x86/tdx: Eliminate duplicate code in tdx_clear_page()
Date: Tue, 19 Aug 2025 18:58:09 +0300
Message-ID: <20250819155811.136099-2-adrian.hunter@intel.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250819155811.136099-1-adrian.hunter@intel.com>
References: <20250819155811.136099-1-adrian.hunter@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Organization: Intel Finland Oy, Registered Address: c/o Alberga Business Park,
 6 krs, Bertel Jungin Aukio 5, 02600 Espoo, Business Identity Code:
 0357606 - 4, Domiciled in Helsinki

tdx_clear_page() and reset_tdx_pages() duplicate the TDX page clearing
logic. Rename reset_tdx_pages() to tdx_quirk_reset_paddr() and create
tdx_quirk_reset_page() to call tdx_quirk_reset_paddr() and be used in
place of tdx_clear_page().

The new name reflects that, in fact, the clearing is necessary only for
hardware with a certain quirk. That is dealt with in a subsequent patch
but doing the rename here avoids additional churn.

Note reset_tdx_pages() is slightly different from tdx_clear_page() because,
more appropriately, it uses mb() in place of __mb(). Except when extra
debugging is enabled (kcsan at present), mb() just calls __mb().

Reviewed-by: Kirill A. Shutemov
Reviewed-by: Binbin Wu
Reviewed-by: Xiaoyao Li
Acked-by: Kai Huang
Acked-by: Sean Christopherson
Reviewed-by: Rick Edgecombe
Signed-off-by: Adrian Hunter
Acked-by: Vishal Annapurve
---

Changes in V7:

	Add Rick's Rev'd-by

Changes in V6:

	Add Sean's Ack

Changes in V5:

	None

Changes in V4:

	Add and use tdx_quirk_reset_page() for KVM (Sean)

Changes in V3:

	Explain "quirk" rename in commit message (Rick)
	Explain mb() change in commit message (Rick)
	Add Rev'd-by, Ack'd-by tags

Changes in V2:

	Rename reset_tdx_pages() to tdx_quirk_reset_paddr()
	Call tdx_quirk_reset_paddr() directly

 arch/x86/include/asm/tdx.h  |  2 ++
 arch/x86/kvm/vmx/tdx.c      | 25 +++----------------------
 arch/x86/virt/vmx/tdx/tdx.c | 10 ++++++++--
 3 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 7ddef3a69866..57b46f05ff97 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -131,6 +131,8 @@ int tdx_guest_keyid_alloc(void);
 u32 tdx_get_nr_guest_keyids(void);
 void tdx_guest_keyid_free(unsigned int keyid);
 
+void tdx_quirk_reset_page(struct page *page);
+
 struct tdx_td {
 	/* TD root structure: */
 	struct page *tdr_page;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 573d6f7d1694..ebb36229c7c8 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -283,25 +283,6 @@ static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
 	vcpu->cpu = -1;
 }
 
-static void tdx_clear_page(struct page *page)
-{
-	const void *zero_page = (const void *) page_to_virt(ZERO_PAGE(0));
-	void *dest = page_to_virt(page);
-	unsigned long i;
-
-	/*
-	 * The page could have been poisoned. MOVDIR64B also clears
-	 * the poison bit so the kernel can safely use the page again.
-	 */
-	for (i = 0; i < PAGE_SIZE; i += 64)
-		movdir64b(dest + i, zero_page);
-	/*
-	 * MOVDIR64B store uses WC buffer. Prevent following memory reads
-	 * from seeing potentially poisoned cache.
-	 */
-	__mb();
-}
-
 static void tdx_no_vcpus_enter_start(struct kvm *kvm)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
@@ -347,7 +328,7 @@ static int tdx_reclaim_page(struct page *page)
 
 	r = __tdx_reclaim_page(page);
 	if (!r)
-		tdx_clear_page(page);
+		tdx_quirk_reset_page(page);
 	return r;
 }
 
@@ -596,7 +577,7 @@ static void tdx_reclaim_td_control_pages(struct kvm *kvm)
 		pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err);
 		return;
 	}
-	tdx_clear_page(kvm_tdx->td.tdr_page);
+	tdx_quirk_reset_page(kvm_tdx->td.tdr_page);
 
 	__free_page(kvm_tdx->td.tdr_page);
 	kvm_tdx->td.tdr_page = NULL;
@@ -1717,7 +1698,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
 		pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err);
 		return -EIO;
 	}
-	tdx_clear_page(page);
+	tdx_quirk_reset_page(page);
 	tdx_unpin(kvm, page);
 	return 0;
 }
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index c7a9a087ccaf..fc8d8e444f15 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -637,7 +637,7 @@ static int tdmrs_set_up_pamt_all(struct tdmr_info_list *tdmr_list,
  * clear these pages. Note this function doesn't flush cache of
  * these TDX private pages. The caller should make sure of that.
  */
-static void reset_tdx_pages(unsigned long base, unsigned long size)
+static void tdx_quirk_reset_paddr(unsigned long base, unsigned long size)
 {
 	const void *zero_page = (const void *)page_address(ZERO_PAGE(0));
 	unsigned long phys, end;
@@ -654,9 +654,15 @@ static void reset_tdx_pages(unsigned long base, unsigned long size)
 	mb();
 }
 
+void tdx_quirk_reset_page(struct page *page)
+{
+	tdx_quirk_reset_paddr(page_to_phys(page), PAGE_SIZE);
+}
+EXPORT_SYMBOL_GPL(tdx_quirk_reset_page);
+
 static void tdmr_reset_pamt(struct tdmr_info *tdmr)
 {
-	tdmr_do_pamt_func(tdmr, reset_tdx_pages);
+	tdmr_do_pamt_func(tdmr, tdx_quirk_reset_paddr);
 }
 
 static void tdmrs_reset_pamt_all(struct tdmr_info_list *tdmr_list)
-- 
2.48.1

From nobody Sat Oct 4 06:33:13 2025
From: Adrian Hunter
To: Dave Hansen, pbonzini@redhat.com, seanjc@google.com,
 vannapurve@google.com
Cc: Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
 x86@kernel.org, H Peter Anvin, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org,
 kai.huang@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com,
 tony.lindgren@linux.intel.com, binbin.wu@linux.intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com
Subject: [PATCH V7 2/3] x86/tdx: Tidy reset_pamt functions
Date: Tue, 19 Aug 2025 18:58:10 +0300
Message-ID: <20250819155811.136099-3-adrian.hunter@intel.com>
In-Reply-To: <20250819155811.136099-1-adrian.hunter@intel.com>
References: <20250819155811.136099-1-adrian.hunter@intel.com>

tdx_quirk_reset_paddr() was renamed to reflect that, in fact, the clearing
is necessary only for hardware with a certain quirk. That is dealt with in
a subsequent patch.

Rename reset_pamt functions to contain "quirk" to reflect the new
functionality, and remove the now misleading comment.

Reviewed-by: Rick Edgecombe
Acked-by: Kai Huang
Reviewed-by: Binbin Wu
Signed-off-by: Adrian Hunter
Acked-by: Vishal Annapurve
---

Changes in V7:

	Add Rick's Rev'd-by
	Add Kai's Ack
	Add Binbin's Rev'd-by

Changes in V6:

	None

Changes in V5:

	New patch

 arch/x86/virt/vmx/tdx/tdx.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index fc8d8e444f15..9e4638f68ba0 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -660,17 +660,17 @@ void tdx_quirk_reset_page(struct page *page)
 }
 EXPORT_SYMBOL_GPL(tdx_quirk_reset_page);
 
-static void tdmr_reset_pamt(struct tdmr_info *tdmr)
+static void tdmr_quirk_reset_pamt(struct tdmr_info *tdmr)
 {
 	tdmr_do_pamt_func(tdmr, tdx_quirk_reset_paddr);
 }
 
-static void tdmrs_reset_pamt_all(struct tdmr_info_list *tdmr_list)
+static void tdmrs_quirk_reset_pamt_all(struct tdmr_info_list *tdmr_list)
 {
 	int i;
 
 	for (i = 0; i < tdmr_list->nr_consumed_tdmrs; i++)
-		tdmr_reset_pamt(tdmr_entry(tdmr_list, i));
+		tdmr_quirk_reset_pamt(tdmr_entry(tdmr_list, i));
 }
 
 static unsigned long tdmrs_count_pamt_kb(struct tdmr_info_list *tdmr_list)
@@ -1142,15 +1142,7 @@ static int init_tdx_module(void)
 	 * to the kernel.
 	 */
 	wbinvd_on_all_cpus();
-	/*
-	 * According to the TDX hardware spec, if the platform
-	 * doesn't have the "partial write machine check"
-	 * erratum, any kernel read/write will never cause #MC
-	 * in kernel space, thus it's OK to not convert PAMTs
-	 * back to normal. But do the conversion anyway here
-	 * as suggested by the TDX spec.
-	 */
-	tdmrs_reset_pamt_all(&tdx_tdmr_list);
+	tdmrs_quirk_reset_pamt_all(&tdx_tdmr_list);
 err_free_pamts:
 	tdmrs_free_pamt_all(&tdx_tdmr_list);
 err_free_tdmrs:
-- 
2.48.1

From nobody Sat Oct 4 06:33:13 2025
From: Adrian Hunter
To: Dave Hansen, pbonzini@redhat.com, seanjc@google.com,
 vannapurve@google.com
Cc: Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
 x86@kernel.org, H Peter Anvin, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org,
 kai.huang@intel.com, reinette.chatre@intel.com, xiaoyao.li@intel.com,
 tony.lindgren@linux.intel.com, binbin.wu@linux.intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com
Subject: [PATCH V7 3/3] x86/tdx: Skip clearing reclaimed pages unless
 X86_BUG_TDX_PW_MCE is present
Date: Tue, 19 Aug 2025 18:58:11 +0300
Message-ID: <20250819155811.136099-4-adrian.hunter@intel.com>
In-Reply-To: <20250819155811.136099-1-adrian.hunter@intel.com>
References: <20250819155811.136099-1-adrian.hunter@intel.com>

Avoid clearing reclaimed TDX private pages unless the platform is affected
by the X86_BUG_TDX_PW_MCE erratum. This significantly reduces VM shutdown
time on unaffected systems.

Background

KVM currently clears reclaimed TDX private pages using MOVDIR64B, which:

   - Clears the TD Owner bit (which identifies TDX private memory) and
     integrity metadata without triggering integrity violations.
   - Clears poison from cache lines without consuming it, avoiding MCEs on
     access (refer TDX Module Base spec. 1348549-006US section 6.5.
     Handling Machine Check Events during Guest TD Operation).

The TDX module also uses MOVDIR64B to initialize private pages before use.
If cache flushing is needed, it sets TDX_FEATURES.CLFLUSH_BEFORE_ALLOC.
However, KVM currently flushes unconditionally, refer commit 94c477a751c7b
("x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages")

In contrast, when private pages are reclaimed, the TDX Module handles
flushing via the TDH.PHYMEM.CACHE.WB SEAMCALL.

Problem

Clearing all private pages during VM shutdown is costly. For guests
with a large amount of memory it can take minutes.

Solution

TDX Module Base Architecture spec. documents that private pages reclaimed
from a TD should be initialized using MOVDIR64B, in order to avoid
integrity violation or TD bit mismatch detection when later being read
using a shared HKID, refer April 2025 spec. "Page Initialization" in
section "8.6.2.
Platforms not Using ACT: Required Cache Flush and Initialization by the
Host VMM"

That is an overstatement and will be clarified in coming versions of the
spec. In fact, as outlined in "Table 16.2: Non-ACT Platforms Checks on
Memory" and "Table 16.3: Non-ACT Platforms Checks on Memory Reads in Li
Mode" in the same spec, there is no issue accessing such reclaimed pages
using a shared key that does not have integrity enabled. Linux always uses
KeyID 0 which never has integrity enabled. KeyID 0 is also the TME KeyID
which disallows integrity, refer "TME Policy/Encryption Algorithm" bit
description in "Intel Architecture Memory Encryption Technologies" spec
version 1.6 April 2025. So there is no need to clear pages to avoid
integrity violations.

There remains a risk of poison consumption. However, in the context of
TDX, it is expected that there would be a machine check associated with the
original poisoning. On some platforms that results in a panic. However
platforms may support "SEAM_NR" Machine Check capability, in which case
Linux machine check handler marks the page as poisoned, which prevents it
from being allocated anymore, refer commit 7911f145de5fe ("x86/mce:
Implement recovery for errors in TDX/SEAM non-root mode")

Improvement

By skipping the clearing step on unaffected platforms, shutdown time
can improve by up to 40%.

On platforms with the X86_BUG_TDX_PW_MCE erratum (SPR and EMR), continue
clearing because these platforms may trigger poison on partial writes to
previously-private pages, even with KeyID 0, refer commit 1e536e1068970
("x86/cpu: Detect TDX partial write machine check erratum")

Reviewed-by: Kirill A. Shutemov
Acked-by: Kai Huang
Reviewed-by: Rick Edgecombe
Reviewed-by: Xiaoyao Li
Reviewed-by: Binbin Wu
Signed-off-by: Adrian Hunter
Acked-by: Vishal Annapurve
---

Changes in V7:

	Add Binbin's Rev'd-by

Changes in V6:

	Add Xiaoyao's Rev'd-by

Changes in V5:

	None

Changes in V4:

	Add TDX Module Base spec. version (Rick)
	Add Rick's Rev'd-by

Changes in V3:

	Remove "flush cache" comments (Rick)
	Update function comment to better relate to "quirk" naming (Rick)
	Add "via MOVDIR64B" to comment (Xiaoyao)
	Add Rev'd-by, Ack'd-by tags

Changes in V2:

	Improve the comment

 arch/x86/virt/vmx/tdx/tdx.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 9e4638f68ba0..823850399bb7 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -633,15 +633,19 @@ static int tdmrs_set_up_pamt_all(struct tdmr_info_list *tdmr_list,
 }
 
 /*
- * Convert TDX private pages back to normal by using MOVDIR64B to
- * clear these pages. Note this function doesn't flush cache of
- * these TDX private pages. The caller should make sure of that.
+ * Convert TDX private pages back to normal by using MOVDIR64B to clear these
+ * pages. Typically, any write to the page will convert it from TDX private back
+ * to normal kernel memory. Systems with the X86_BUG_TDX_PW_MCE erratum need to
+ * do the conversion explicitly via MOVDIR64B.
  */
 static void tdx_quirk_reset_paddr(unsigned long base, unsigned long size)
 {
 	const void *zero_page = (const void *)page_address(ZERO_PAGE(0));
 	unsigned long phys, end;
 
+	if (!boot_cpu_has_bug(X86_BUG_TDX_PW_MCE))
+		return;
+
 	end = base + size;
 	for (phys = base; phys < end; phys += 64)
 		movdir64b(__va(phys), zero_page);
-- 
2.48.1
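
The control flow that patch 3/3 adds to tdx_quirk_reset_paddr() can be
tried outside the kernel with a small userspace sketch. This is an
illustration only, not the kernel code: platform_has_pw_mce_quirk is a
hypothetical stand-in for boot_cpu_has_bug(X86_BUG_TDX_PW_MCE),
quirk_reset_range() is a made-up name, and memset() stands in for
MOVDIR64B without any of its poison-clearing or TD Owner bit semantics.
The trailing mb() is omitted because it has no meaning for this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 64	/* MOVDIR64B writes 64 bytes at a time */

/* Hypothetical stand-in for boot_cpu_has_bug(X86_BUG_TDX_PW_MCE). */
static bool platform_has_pw_mce_quirk;

/*
 * Zero [base, base + size) in 64-byte chunks, but only when the
 * (simulated) platform has the quirk.  Returns the number of chunks
 * written so that callers can observe whether the work was skipped.
 */
static size_t quirk_reset_range(unsigned char *base, size_t size)
{
	size_t off, chunks = 0;

	assert(base != NULL);

	/* Unaffected platforms skip the clearing entirely. */
	if (!platform_has_pw_mce_quirk)
		return 0;

	for (off = 0; off < size; off += CHUNK_SIZE) {
		size_t left = size - off;
		size_t n = left < CHUNK_SIZE ? left : CHUNK_SIZE;

		/* memset() is only a placeholder for movdir64b(). */
		memset(base + off, 0, n);
		chunks++;
	}

	return chunks;
}
```

With the flag clear, a call such as quirk_reset_range(buf, 4096) leaves
the buffer untouched and returns 0. With the flag set, it zeroes the
buffer in 64 chunks. That skipped loop is the shutdown-time saving the
commit message describes for unaffected platforms.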