From: Binbin Wu <binbin.wu@linux.intel.com>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, chao.gao@intel.com,
    kai.huang@intel.com, David.Laight@ACULAB.COM, robert.hu@linux.intel.com,
    binbin.wu@linux.intel.com
Subject: [PATCH v9 1/6] KVM: x86: Consolidate flags for __linearize()
Date: Tue, 6 Jun 2023 17:18:37 +0800
Message-Id: <20230606091842.13123-2-binbin.wu@linux.intel.com>

Consolidate the two bool parameters (write/fetch) of __linearize() into a
'u32 flags' parameter to make the function more concise and easier to
extend, i.e. to support Intel Linear Address Masking (LAM), which allows
the high non-address bits of a linear address to be used as metadata.

Define two flags to replace the two bools. A new flag will be added to
support LAM, to skip masking off the metadata bits of a linear address
under some conditions.

No functional change intended.
Signed-off-by: Binbin Wu
Reviewed-by: Chao Gao
Acked-by: Kai Huang
---
 arch/x86/kvm/emulate.c     | 19 ++++++++++---------
 arch/x86/kvm/kvm_emulate.h |  4 ++++
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 936a397a08cd..e89afc39e56f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -687,8 +687,8 @@ static unsigned insn_alignment(struct x86_emulate_ctxt *ctxt, unsigned size)
 static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
                                        struct segmented_address addr,
                                        unsigned *max_size, unsigned size,
-                                       bool write, bool fetch,
-                                       enum x86emul_mode mode, ulong *linear)
+                                       enum x86emul_mode mode, ulong *linear,
+                                       u32 flags)
 {
     struct desc_struct desc;
     bool usable;
@@ -718,10 +718,10 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
             goto bad;
         /* code segment in protected mode or read-only data segment */
         if ((((ctxt->mode != X86EMUL_MODE_REAL) && (desc.type & 8))
-                    || !(desc.type & 2)) && write)
+                    || !(desc.type & 2)) && (flags & X86EMUL_F_WRITE))
             goto bad;
         /* unreadable code segment */
-        if (!fetch && (desc.type & 8) && !(desc.type & 2))
+        if (!(flags & X86EMUL_F_FETCH) && (desc.type & 8) && !(desc.type & 2))
             goto bad;
         lim = desc_limit_scaled(&desc);
         if (!(desc.type & 8) && (desc.type & 4)) {
@@ -757,8 +757,8 @@ static int linearize(struct x86_emulate_ctxt *ctxt,
              ulong *linear)
 {
     unsigned max_size;
-    return __linearize(ctxt, addr, &max_size, size, write, false,
-                       ctxt->mode, linear);
+    return __linearize(ctxt, addr, &max_size, size, ctxt->mode, linear,
+                       write ? X86EMUL_F_WRITE : 0);
 }

 static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)
@@ -771,7 +771,8 @@ static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)

     if (ctxt->op_bytes != sizeof(unsigned long))
         addr.ea = dst & ((1UL << (ctxt->op_bytes << 3)) - 1);
-    rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode, &linear);
+    rc = __linearize(ctxt, addr, &max_size, 1, ctxt->mode, &linear,
+                     X86EMUL_F_FETCH);
     if (rc == X86EMUL_CONTINUE)
         ctxt->_eip = addr.ea;
     return rc;
@@ -907,8 +908,8 @@ static int __do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
      * boundary check itself.  Instead, we use max_size to check
      * against op_size.
      */
-    rc = __linearize(ctxt, addr, &max_size, 0, false, true, ctxt->mode,
-                     &linear);
+    rc = __linearize(ctxt, addr, &max_size, 0, ctxt->mode, &linear,
+                     X86EMUL_F_FETCH);
     if (unlikely(rc != X86EMUL_CONTINUE))
         return rc;

diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index ab65f3a47dfd..5b9ec610b2cb 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -88,6 +88,10 @@ struct x86_instruction_info {
 #define X86EMUL_IO_NEEDED       5 /* IO is needed to complete emulation */
 #define X86EMUL_INTERCEPTED     6 /* Intercepted by nested VMCB/VMCS */

+/* x86-specific emulation flags */
+#define X86EMUL_F_FETCH     BIT(0)
+#define X86EMUL_F_WRITE     BIT(1)
+
 struct x86_emulate_ops {
     void (*vm_bugged)(struct x86_emulate_ctxt *ctxt);
     /*
--
2.25.1
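[Editorial illustration, not part of the patch] A minimal standalone sketch of the call pattern the consolidation above moves to: callers compose X86EMUL_F_* flags instead of passing positional bools, so a future flag (such as the LAM-skip hint added later in this series) does not grow the parameter list. The helper below is hypothetical and only mirrors the flag handling, not the rest of __linearize().

#include <stdio.h>

#define X86EMUL_F_FETCH (1u << 0)
#define X86EMUL_F_WRITE (1u << 1)

/* Hypothetical stand-in for __linearize(): only the flag checks are shown. */
static int check_access(unsigned int flags)
{
    if (flags & X86EMUL_F_WRITE)
        printf("apply write checks (read-only/code segment)\n");
    if (!(flags & X86EMUL_F_FETCH))
        printf("apply data-read checks (unreadable code segment)\n");
    return 0;
}

int main(void)
{
    check_access(X86EMUL_F_FETCH);                   /* instruction fetch */
    check_access(X86EMUL_F_WRITE);                   /* memory write */
    check_access(X86EMUL_F_FETCH | X86EMUL_F_WRITE); /* flags compose freely */
    return 0;
}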
From: Binbin Wu <binbin.wu@linux.intel.com>
Subject: [PATCH v9 2/6] KVM: x86: Virtualize CR4.LAM_SUP
Date: Tue, 6 Jun 2023 17:18:38 +0800
Message-Id: <20230606091842.13123-3-binbin.wu@linux.intel.com>

From: Robert Hoo

Allow guests to set the new CR4 control bit to enable the Intel CPU
feature Linear Address Masking (LAM) on supervisor pointers.

LAM modifies the checking applied to 64-bit linear addresses, allowing
software to use the untranslated address bits for metadata; the metadata
bits are masked off before the address is used to access memory. LAM uses
CR4.LAM_SUP (bit 28) to configure LAM for supervisor pointers. LAM also
changes VMENTER to allow the bit to be set in VMCS's HOST_CR4 and
GUEST_CR4 for virtualization. Note that CR4.LAM_SUP may be set even
outside 64-bit mode, but it takes no effect there since LAM only applies
to 64-bit linear addresses.

Move CR4.LAM_SUP out of CR4_RESERVED_BITS; whether it is reserved now
depends on whether the vCPU supports the LAM feature. Leave the bit
intercepted to avoid a VMREAD every time KVM fetches its value, with the
expectation that the guest won't toggle the bit frequently.

Set the CR4.LAM_SUP bit in the emulated IA32_VMX_CR4_FIXED1 MSR so that
guests can enable LAM for supervisor pointers in nested VMX operation.

Hardware is not required to flush the TLB when CR4.LAM_SUP is toggled, so
KVM doesn't need to emulate a TLB flush for it. There is no connection to
other features or VMX execution controls, so no other code is needed in
{kvm,vmx}_set_cr4().

Signed-off-by: Robert Hoo
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
Reviewed-by: Chao Gao
Tested-by: Xuelian Guo
Reviewed-by: Kai Huang
---
 arch/x86/include/asm/kvm_host.h | 3 ++-
 arch/x86/kvm/vmx/vmx.c          | 3 +++
 arch/x86/kvm/x86.h              | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fb9d1f2d6136..c6f03d151c31 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -125,7 +125,8 @@
               | X86_CR4_PGE | X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_PCIDE \
               | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
               | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
-              | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP))
+              | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
+              | X86_CR4_LAM_SUP))

 #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2d9d155691a7..0dd2970ba5c8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7600,6 +7600,9 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
     cr4_fixed1_update(X86_CR4_UMIP,       ecx, feature_bit(UMIP));
     cr4_fixed1_update(X86_CR4_LA57,       ecx, feature_bit(LA57));

+    entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 1);
+    cr4_fixed1_update(X86_CR4_LAM_SUP,    eax, feature_bit(LAM));
+
 #undef cr4_fixed1_update
 }

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..24e2b56356b8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -528,6 +528,8 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
         __reserved_bits |= X86_CR4_VMXE;        \
     if (!__cpu_has(__c, X86_FEATURE_PCID))      \
         __reserved_bits |= X86_CR4_PCIDE;       \
+    if (!__cpu_has(__c, X86_FEATURE_LAM))       \
+        __reserved_bits |= X86_CR4_LAM_SUP;     \
     __reserved_bits;                            \
 })
--
2.25.1
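[Editorial illustration, not part of the patch] A minimal standalone sketch, with made-up helper names and only the LAM-dependent bit shown, of what the CR4_RESERVED_BITS and __cr4_reserved_bits() changes above achieve: CR4.LAM_SUP (bit 28) is treated as reserved, and a guest attempt to set it is rejected, unless the vCPU's CPUID advertises LAM.

#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_LAM_SUP (1ULL << 28)

/* Other conditionally reserved CR4 bits are omitted for brevity. */
static bool cr4_value_is_legal(uint64_t cr4, bool guest_has_lam)
{
    uint64_t reserved_bits = 0;

    if (!guest_has_lam)
        reserved_bits |= X86_CR4_LAM_SUP;

    return (cr4 & reserved_bits) == 0;
}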
From: Binbin Wu <binbin.wu@linux.intel.com>
Subject: [PATCH v9 3/6] KVM: x86: Virtualize CR3.LAM_{U48,U57}
Date: Tue, 6 Jun 2023 17:18:39 +0800
Message-Id: <20230606091842.13123-4-binbin.wu@linux.intel.com>

From: Robert Hoo

Allow guests to set the two new CR3 non-address control bits to enable the
Intel CPU feature Linear Address Masking (LAM) on user pointers.

LAM modifies the checking applied to 64-bit linear addresses, allowing
software to use the untranslated address bits for metadata; the metadata
bits are masked off before the address is used to access memory. LAM uses
two new CR3 non-address bits, LAM_U48 (bit 62) and LAM_U57 (bit 61), to
configure LAM for user pointers. LAM also changes VMENTER to allow both
bits to be set in VMCS's HOST_CR3 and GUEST_CR3 for virtualization.

When EPT is on, CR3 is not trapped by KVM and it's up to the guest to set
either of the two LAM control bits. However, when EPT is off, the actual
CR3 used by the guest is generated from the shadow MMU root, which is
different from the CR3 that is *set* by the guest, and KVM needs to
manually apply any active control bits to VMCS's GUEST_CR3 based on the
cached CR3 *seen* by the guest.
KVM manually checks the guest's CR3 to make sure it points to a valid
guest physical address (i.e. to support a smaller MAXPHYSADDR in the
guest). Extend this check to allow the two LAM control bits to be set. To
make the check generic, introduce a new vCPU field 'cr3_ctrl_bits' to
record all feature control bits that the guest is allowed to set. After
the check, the non-address bits of the guest CR3 are stripped off to
extract the guest physical address.

In the nested case, for a guest which supports LAM, both VMCS12's HOST_CR3
and GUEST_CR3 are allowed to have the new LAM control bits set, i.e. when
L0 enters L1 to emulate a VMEXIT from L2 to L1 or when L0 enters L2
directly. KVM also manually checks that VMCS12's HOST_CR3 and GUEST_CR3
are valid physical addresses. Extend those checks to allow the new LAM
control bits too.

Note, LAM doesn't have a global control bit to turn it on/off completely;
whether it can be enabled depends purely on the hardware's CPUID. That
means, when EPT is on, even if KVM doesn't expose LAM to the guest, the
guest can still set the LAM control bits in CR3 without causing problems.
This is an unfortunate virtualization hole. KVM could intercept CR3 in
this case and inject a fault, but that would hurt performance when running
a normal VM without LAM support, which is undesirable. Simply let the
guest do such an illegal thing; in the worst case the guest is killed when
KVM eventually finds out about the illegal behaviour, and the guest is to
blame.

Opportunistically use GENMASK_ULL() to define __PT_BASE_ADDR_MASK.
Opportunistically use kvm_vcpu_is_legal_cr3() to check CR3 in SVM nested
code, to provide a clear distinction between CR3 and GPA checks.

Suggested-by: Sean Christopherson
Signed-off-by: Robert Hoo
Co-developed-by: Binbin Wu
Signed-off-by: Binbin Wu
Tested-by: Xuelian Guo
Reviewed-by: Kai Huang
Reviewed-by: Chao Gao
---
 arch/x86/include/asm/kvm_host.h | 5 +++++
 arch/x86/kvm/cpuid.h            | 5 +++++
 arch/x86/kvm/mmu.h              | 5 +++++
 arch/x86/kvm/mmu/mmu.c          | 8 +++++++-
 arch/x86/kvm/mmu/mmu_internal.h | 1 +
 arch/x86/kvm/mmu/paging_tmpl.h  | 3 ++-
 arch/x86/kvm/mmu/spte.h         | 2 +-
 arch/x86/kvm/svm/nested.c       | 4 ++--
 arch/x86/kvm/vmx/nested.c       | 4 ++--
 arch/x86/kvm/vmx/vmx.c          | 8 +++++++-
 arch/x86/kvm/x86.c              | 4 ++--
 11 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c6f03d151c31..46471dd9cc1b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -727,6 +727,11 @@ struct kvm_vcpu_arch {
     unsigned long cr0_guest_owned_bits;
     unsigned long cr2;
     unsigned long cr3;
+    /*
+     * CR3 non-address feature control bits.
+     * Guest CR3 may contain any of those bits at runtime.
+     */
+    u64 cr3_ctrl_bits;
     unsigned long cr4;
     unsigned long cr4_guest_owned_bits;
     unsigned long cr4_guest_rsvd_bits;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index b1658c0de847..ef8e1b912d7d 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -42,6 +42,11 @@ static inline int cpuid_maxphyaddr(struct kvm_vcpu *vcpu)
     return vcpu->arch.maxphyaddr;
 }

+static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
+{
+    return !((cr3 & vcpu->arch.reserved_gpa_bits) & ~vcpu->arch.cr3_ctrl_bits);
+}
+
 static inline bool kvm_vcpu_is_legal_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 {
     return !(gpa & vcpu->arch.reserved_gpa_bits);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..81d8a433dae1 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -144,6 +144,11 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
     return kvm_get_pcid(vcpu, kvm_read_cr3(vcpu));
 }

+static inline u64 kvm_get_active_cr3_ctrl_bits(struct kvm_vcpu *vcpu)
+{
+    return kvm_read_cr3(vcpu) & vcpu->arch.cr3_ctrl_bits;
+}
+
 static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 {
     u64 root_hpa = vcpu->arch.mmu->root.hpa;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8961f45e3b1..deea9a9f0c75 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3812,7 +3812,13 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
     hpa_t root;

     root_pgd = kvm_mmu_get_guest_pgd(vcpu, mmu);
-    root_gfn = root_pgd >> PAGE_SHIFT;
+    /*
+     * Guest PGD can be CR3 or EPTP (for nested EPT case). CR3 may contain
+     * additional control bits (e.g. LAM control bits). To be generic,
+     * unconditionally strip non-address bits when computing the GFN since
+     * the guest PGD has already been checked for validity.
+     */
+    root_gfn = (root_pgd & __PT_BASE_ADDR_MASK) >> PAGE_SHIFT;

     if (mmu_check_root(vcpu, root_gfn))
         return 1;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d39af5639ce9..7d2105432d66 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -21,6 +21,7 @@ extern bool dbg;
 #endif

 /* Page table builder macros common to shadow (host) PTEs and guest PTEs. */
+#define __PT_BASE_ADDR_MASK GENMASK_ULL(51, 12)
 #define __PT_LEVEL_SHIFT(level, bits_per_level) \
     (PAGE_SHIFT + ((level) - 1) * (bits_per_level))
 #define __PT_INDEX(address, level, bits_per_level) \
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0662e0278e70..394733ac9088 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -62,7 +62,7 @@
 #endif

 /* Common logic, but per-type values.  These also need to be undefined. */
-#define PT_BASE_ADDR_MASK ((pt_element_t)(((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1)))
+#define PT_BASE_ADDR_MASK ((pt_element_t)__PT_BASE_ADDR_MASK)
 #define PT_LVL_ADDR_MASK(lvl) __PT_LVL_ADDR_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
 #define PT_LVL_OFFSET_MASK(lvl) __PT_LVL_OFFSET_MASK(PT_BASE_ADDR_MASK, lvl, PT_LEVEL_BITS)
 #define PT_INDEX(addr, lvl) __PT_INDEX(addr, lvl, PT_LEVEL_BITS)
@@ -324,6 +324,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
     trace_kvm_mmu_pagetable_walk(addr, access);
 retry_walk:
     walker->level = mmu->cpu_role.base.level;
+    /* gpte_to_gfn() will strip non-address bits. */
     pte = kvm_mmu_get_guest_pgd(vcpu, mmu);
     have_ad = PT_HAVE_ACCESSED_DIRTY(mmu);

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 1279db2eab44..777f7d443e3b 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -36,7 +36,7 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
 #ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
 #define SPTE_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1))
 #else
-#define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
+#define SPTE_BASE_ADDR_MASK __PT_BASE_ADDR_MASK
 #endif

 #define SPTE_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 96936ddf1b3c..1df801a48451 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -311,7 +311,7 @@ static bool __nested_vmcb_check_save(struct kvm_vcpu *vcpu,
     if ((save->efer & EFER_LME) && (save->cr0 & X86_CR0_PG)) {
         if (CC(!(save->cr4 & X86_CR4_PAE)) ||
             CC(!(save->cr0 & X86_CR0_PE)) ||
-            CC(kvm_vcpu_is_illegal_gpa(vcpu, save->cr3)))
+            CC(!kvm_vcpu_is_legal_cr3(vcpu, save->cr3)))
             return false;
     }

@@ -520,7 +520,7 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
                    bool nested_npt, bool reload_pdptrs)
 {
-    if (CC(kvm_vcpu_is_illegal_gpa(vcpu, cr3)))
+    if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3)))
         return -EINVAL;

     if (reload_pdptrs && !nested_npt && is_pae_paging(vcpu) &&
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e35cf0bd0df9..11b12a75ca91 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1085,7 +1085,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3,
                    bool nested_ept, bool reload_pdptrs,
                    enum vm_entry_failure_code *entry_failure_code)
 {
-    if (CC(kvm_vcpu_is_illegal_gpa(vcpu, cr3))) {
+    if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3))) {
         *entry_failure_code = ENTRY_FAIL_DEFAULT;
         return -EINVAL;
     }
@@ -2913,7 +2913,7 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,

     if (CC(!nested_host_cr0_valid(vcpu, vmcs12->host_cr0)) ||
         CC(!nested_host_cr4_valid(vcpu, vmcs12->host_cr4)) ||
-        CC(kvm_vcpu_is_illegal_gpa(vcpu, vmcs12->host_cr3)))
+        CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
         return -EINVAL;

     if (CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0dd2970ba5c8..52dcf3c00bb8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3358,7 +3358,8 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa,
             update_guest_cr3 = false;
         vmx_ept_load_pdptrs(vcpu);
     } else {
-        guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu);
+        guest_cr3 = root_hpa | kvm_get_active_pcid(vcpu) |
+                kvm_get_active_cr3_ctrl_bits(vcpu);
     }

     if (update_guest_cr3)
@@ -7740,6 +7741,11 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
         vmx->msr_ia32_feature_control_valid_bits &= ~FEAT_CTL_SGX_LC_ENABLED;

+    if (guest_cpuid_has(vcpu, X86_FEATURE_LAM))
+        vcpu->arch.cr3_ctrl_bits |= X86_CR3_LAM_U48 | X86_CR3_LAM_U57;
+    else
+        vcpu->arch.cr3_ctrl_bits &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
+
     /* Refresh #PF interception to account for MAXPHYADDR changes. */
     vmx_update_exception_bitmap(vcpu);
 }

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5ad55ef71433..709fc920f378 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1275,7 +1275,7 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
      * stuff CR3, e.g. for RSM emulation, and there is no guarantee that
      * the current vCPU mode is accurate.
      */
-    if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
+    if (!kvm_vcpu_is_legal_cr3(vcpu, cr3))
         return 1;

     if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, cr3))
@@ -11456,7 +11456,7 @@ static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
          */
         if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
             return false;
-        if (kvm_vcpu_is_illegal_gpa(vcpu, sregs->cr3))
+        if (!kvm_vcpu_is_legal_cr3(vcpu, sregs->cr3))
             return false;
     } else {
         /*
--
2.25.1
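[Editorial illustration, not part of the patch] A standalone sketch, with simplified names and masks, of the CR3 handling introduced above: the LAM control bits are excluded from the reserved-GPA check applied to a guest CR3 value, and all non-address bits are stripped before the shadow root GFN is derived.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define X86_CR3_LAM_U57     (1ULL << 61)
#define X86_CR3_LAM_U48     (1ULL << 62)
/* Bits 51:12 of CR3 hold the page-table base address. */
#define PT_BASE_ADDR_MASK   (((1ULL << 52) - 1) & ~0xfffULL)

/* Mirrors kvm_vcpu_is_legal_cr3(): control bits are not treated as reserved. */
static bool cr3_is_legal(uint64_t cr3, uint64_t reserved_gpa_bits,
                         uint64_t cr3_ctrl_bits)
{
    return !((cr3 & reserved_gpa_bits) & ~cr3_ctrl_bits);
}

/* Mirrors the mmu_alloc_shadow_roots() change: strip non-address bits. */
static uint64_t cr3_to_root_gfn(uint64_t cr3)
{
    return (cr3 & PT_BASE_ADDR_MASK) >> 12;
}

int main(void)
{
    /* Assume a 46-bit guest MAXPHYSADDR, so bits 63:46 are reserved GPA bits. */
    uint64_t reserved_gpa_bits = ~((1ULL << 46) - 1);
    uint64_t ctrl_bits = X86_CR3_LAM_U48 | X86_CR3_LAM_U57;
    uint64_t cr3 = X86_CR3_LAM_U57 | 0x12345000ULL | 0x01; /* LAM bit + base + PCID */

    printf("legal: %d, root gfn: 0x%llx\n",
           cr3_is_legal(cr3, reserved_gpa_bits, ctrl_bits),
           (unsigned long long)cr3_to_root_gfn(cr3));
    return 0;
}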
From: Binbin Wu <binbin.wu@linux.intel.com>
Subject: [PATCH v9 4/6] KVM: x86: Introduce untag_addr() in kvm_x86_ops
Date: Tue, 6 Jun 2023 17:18:40 +0800
Message-Id: <20230606091842.13123-5-binbin.wu@linux.intel.com>

Introduce a new optional interface untag_addr() in kvm_x86_ops to untag
the metadata from a linear address, and implement the LAM version in VMX.

When a feature like Intel Linear Address Masking or AMD Upper Address
Ignore is enabled, a linear address may be tagged with metadata. The
linear address should be checked for modified canonicality and untagged
in instruction emulation and VM-exit handlers if LAM or UAI is applicable.

Introduce untag_addr() in kvm_x86_ops to hide the vendor-specific code.
Pass 'flags' to avoid having to distinguish the processor vendor in the
common emulator path for cases whose untag policies may differ in the
future. For VMX, the LAM version is implemented.

Signed-off-by: Binbin Wu
Tested-by: Xuelian Guo
Reviewed-by: Chao Gao
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 +
 arch/x86/kvm/kvm_emulate.h         |  1 +
 arch/x86/kvm/vmx/vmx.c             | 73 ++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h             |  2 +
 5 files changed, 79 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 13bc212cd4bc..c0cebe671d41 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -52,6 +52,7 @@ KVM_X86_OP(cache_reg)
 KVM_X86_OP(get_rflags)
 KVM_X86_OP(set_rflags)
 KVM_X86_OP(get_if_flag)
+KVM_X86_OP_OPTIONAL(untag_addr)
 KVM_X86_OP(flush_tlb_all)
 KVM_X86_OP(flush_tlb_current)
 KVM_X86_OP_OPTIONAL(flush_remote_tlbs)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 46471dd9cc1b..62a72560fa65 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1588,6 +1588,8 @@ struct kvm_x86_ops {
     void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
     bool (*get_if_flag)(struct kvm_vcpu *vcpu);

+    void (*untag_addr)(struct kvm_vcpu *vcpu, gva_t *gva, u32 flags);
+
     void (*flush_tlb_all)(struct kvm_vcpu *vcpu);
     void (*flush_tlb_current)(struct kvm_vcpu *vcpu);
     int (*flush_remote_tlbs)(struct kvm *kvm);
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 5b9ec610b2cb..c2091e24a6b9 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -91,6 +91,7 @@ struct x86_instruction_info {
 /* x86-specific emulation flags */
 #define X86EMUL_F_FETCH     BIT(0)
 #define X86EMUL_F_WRITE     BIT(1)
+#define X86EMUL_F_SKIPLAM   BIT(2)

 struct x86_emulate_ops {
     void (*vm_bugged)(struct x86_emulate_ctxt *ctxt);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 52dcf3c00bb8..82a225d1000e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8133,6 +8133,77 @@ static void vmx_vm_destroy(struct kvm *kvm)
     free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
 }

+#define LAM_S57_EN_MASK (X86_CR4_LAM_SUP | X86_CR4_LA57)
+static int lam_sign_extend_bit(struct kvm_vcpu *vcpu, gva_t addr)
+{
+    u64 cr3, cr4;
+
+    /*
+     * The LAM identification of a pointer as user or supervisor is
+     * based solely on the value of pointer bit 63.
+     */
+    if (!(addr >> 63)) {
+        cr3 = kvm_read_cr3(vcpu);
+        if (cr3 & X86_CR3_LAM_U57)
+            return 56;
+        if (cr3 & X86_CR3_LAM_U48)
+            return 47;
+    } else {
+        cr4 = kvm_read_cr4_bits(vcpu, LAM_S57_EN_MASK);
+        if (cr4 == LAM_S57_EN_MASK)
+            return 56;
+        if (cr4 & X86_CR4_LAM_SUP)
+            return 47;
+    }
+    return -1;
+}
+
+/*
+ * Only called in 64-bit mode.
+ *
+ * LAM has a modified canonical check when applicable:
+ * LAM_S48                : [ 1 ][ metadata ][ 1 ]
+ *                            63               47
+ * LAM_U48                : [ 0 ][ metadata ][ 0 ]
+ *                            63               47
+ * LAM_S57                : [ 1 ][ metadata ][ 1 ]
+ *                            63               56
+ * LAM_U57 + 5-lvl paging : [ 0 ][ metadata ][ 0 ]
+ *                            63               56
+ * LAM_U57 + 4-lvl paging : [ 0 ][ metadata ][ 0...0 ]
+ *                            63               56..47
+ *
+ * Untag the metadata bits by sign-extending the value of bit 47 (LAM48) or
+ * bit 56 (LAM57). The resulting address after untag isn't guaranteed to be
+ * canonical. Callers should perform the original canonical check and raise
+ * #GP/#SS if the address is non-canonical.
+ *
+ * Note that KVM masks the metadata in addresses, performs the (original)
+ * canonicality checking and then walks page table. This is slightly
+ * different from hardware behavior but achieves the same effect.
+ * Specifically, if LAM is enabled, the processor performs a modified
+ * canonicality checking where the metadata are ignored instead of
+ * masked. After the modified canonicality checking, the processor masks
+ * the metadata before passing addresses for paging translation.
+ */
+void vmx_untag_addr(struct kvm_vcpu *vcpu, gva_t *gva, u32 flags)
+{
+    int sign_ext_bit;
+
+    /*
+     * Check LAM_U48 in cr3_ctrl_bits to avoid guest_cpuid_has().
+     * If not set, the vCPU doesn't support LAM.
+     */
+    if (!(vcpu->arch.cr3_ctrl_bits & X86_CR3_LAM_U48) ||
+        (flags & X86EMUL_F_SKIPLAM) || WARN_ON_ONCE(!is_64_bit_mode(vcpu)))
+        return;
+
+    sign_ext_bit = lam_sign_extend_bit(vcpu, *gva);
+    if (sign_ext_bit > 0)
+        *gva = (sign_extend64(*gva, sign_ext_bit) & ~BIT_ULL(63)) |
+               (*gva & BIT_ULL(63));
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
     .name = KBUILD_MODNAME,

@@ -8181,6 +8252,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
     .set_rflags = vmx_set_rflags,
     .get_if_flag = vmx_get_if_flag,

+    .untag_addr = vmx_untag_addr,
+
     .flush_tlb_all = vmx_flush_tlb_all,
     .flush_tlb_current = vmx_flush_tlb_current,
     .flush_tlb_gva = vmx_flush_tlb_gva,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9e66531861cf..c4bbd3024fa8 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -433,6 +433,8 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type);
 u64 vmx_get_l2_tsc_offset(struct kvm_vcpu *vcpu);
 u64 vmx_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu);

+void vmx_untag_addr(struct kvm_vcpu *vcpu, gva_t *gva, u32 flags);
+
 static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
                          int type, bool value)
 {
--
2.25.1
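[Editorial illustration, not part of the patch] A user-space sketch of the sign-extension untag rule implemented by vmx_untag_addr() above: the metadata bits are replaced by copies of bit 47 (LAM48) or bit 56 (LAM57) while bit 63 is preserved, and the untagged result must still pass the ordinary canonical check. The pointer value below is made up.

#include <stdint.h>
#include <stdio.h>

/* Sign-extend 'va' from 'sign_ext_bit', then restore the original bit 63. */
static uint64_t lam_untag(uint64_t va, int sign_ext_bit)
{
    uint64_t sign = (va >> sign_ext_bit) & 1;          /* value of the chosen bit */
    uint64_t mask = ~0ULL << sign_ext_bit;             /* bits [63:sign_ext_bit] */
    uint64_t ext  = sign ? (va | mask) : (va & ~mask); /* metadata replaced by the sign */

    return (ext & ~(1ULL << 63)) | (va & (1ULL << 63)); /* keep original bit 63 */
}

int main(void)
{
    /* Hypothetical LAM_U57 user pointer with metadata in bits 62:57. */
    uint64_t tagged = 0x7e00001234567890ULL;

    printf("untagged: 0x%016llx\n",
           (unsigned long long)lam_untag(tagged, 56));
    /* Prints 0x0000001234567890, which then passes the canonical check. */
    return 0;
}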
From: Binbin Wu <binbin.wu@linux.intel.com>
Subject: [PATCH v9 5/6] KVM: x86: Untag address when LAM applicable
Date: Tue, 6 Jun 2023 17:18:41 +0800
Message-Id: <20230606091842.13123-6-binbin.wu@linux.intel.com>

Untag the address of a 64-bit memory/MMIO operand in instruction
emulation and VM-exit handlers when LAM is applicable.

For instruction emulation, untag the address in __linearize() before the
canonical check. LAM doesn't apply to addresses used for instruction
fetches or to those that specify the targets of jump and call
instructions, so use X86EMUL_F_SKIPLAM to skip the LAM untag for them.

For VM-exit handlers related to 64-bit linear addresses:
- Cases that need to untag the address:
  Operand(s) of VMX instructions and INVPCID.
  Operand(s) of SGX ENCLS.
- Cases LAM doesn't apply to:
  Operand of INVLPG.
  Linear address in the INVPCID descriptor (no change needed).
  Linear address in the INVVPID descriptor (it has been confirmed,
  although it is not called out in the LAM spec; no change needed).
  BASEADDR specified in the SECS of ECREATE (no change needed).

Note:
LAM doesn't apply to writes to control registers or MSRs.
LAM masking is applied before paging, so the faulting linear address in
CR2 doesn't contain the metadata.
The guest linear address saved in the VMCS doesn't contain metadata.
Co-developed-by: Robert Hoo
Signed-off-by: Robert Hoo
Signed-off-by: Binbin Wu
Reviewed-by: Chao Gao
Tested-by: Xuelian Guo
---
 arch/x86/kvm/emulate.c     | 16 +++++++++++++---
 arch/x86/kvm/kvm_emulate.h |  2 ++
 arch/x86/kvm/vmx/nested.c  |  2 ++
 arch/x86/kvm/vmx/sgx.c     |  1 +
 arch/x86/kvm/x86.c         |  7 +++++++
 5 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e89afc39e56f..c135adb26f1e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -701,6 +701,7 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
     *max_size = 0;
     switch (mode) {
     case X86EMUL_MODE_PROT64:
+        ctxt->ops->untag_addr(ctxt, &la, flags);
         *linear = la;
         va_bits = ctxt_virt_addr_bits(ctxt);
         if (!__is_canonical_address(la, va_bits))
@@ -771,8 +772,12 @@ static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)

     if (ctxt->op_bytes != sizeof(unsigned long))
         addr.ea = dst & ((1UL << (ctxt->op_bytes << 3)) - 1);
+    /*
+     * LAM doesn't apply to addresses that specify the targets of jump and
+     * call instructions.
+     */
     rc = __linearize(ctxt, addr, &max_size, 1, ctxt->mode, &linear,
-             X86EMUL_F_FETCH);
+             X86EMUL_F_FETCH | X86EMUL_F_SKIPLAM);
     if (rc == X86EMUL_CONTINUE)
         ctxt->_eip = addr.ea;
     return rc;
@@ -907,9 +912,11 @@ static int __do_insn_fetch_bytes(struct x86_emulate_ctxt *ctxt, int op_size)
      * __linearize is called with size 0 so that it does not do any
      * boundary check itself.  Instead, we use max_size to check
      * against op_size.
+     *
+     * LAM doesn't apply to addresses used for instruction fetches.
      */
     rc = __linearize(ctxt, addr, &max_size, 0, ctxt->mode, &linear,
-             X86EMUL_F_FETCH);
+             X86EMUL_F_FETCH | X86EMUL_F_SKIPLAM);
     if (unlikely(rc != X86EMUL_CONTINUE))
         return rc;

@@ -3442,8 +3449,11 @@ static int em_invlpg(struct x86_emulate_ctxt *ctxt)
 {
     int rc;
     ulong linear;
+    unsigned max_size;

-    rc = linearize(ctxt, ctxt->src.addr.mem, 1, false, &linear);
+    /* LAM doesn't apply to invlpg */
+    rc = __linearize(ctxt, ctxt->src.addr.mem, &max_size, 1, ctxt->mode,
+             &linear, X86EMUL_F_SKIPLAM);
     if (rc == X86EMUL_CONTINUE)
         ctxt->ops->invlpg(ctxt, linear);
     /* Disable writeback. */
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index c2091e24a6b9..3875ab175cd2 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -230,6 +230,8 @@ struct x86_emulate_ops {
     int (*leave_smm)(struct x86_emulate_ctxt *ctxt);
     void (*triple_fault)(struct x86_emulate_ctxt *ctxt);
     int (*set_xcr)(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr);
+
+    void (*untag_addr)(struct x86_emulate_ctxt *ctxt, gva_t *addr, u32 flags);
 };

 /* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 11b12a75ca91..6c8dab9999f2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4981,6 +4981,7 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
     else
         *ret = off;

+    vmx_untag_addr(vcpu, ret, 0);
     /* Long mode: #GP(0)/#SS(0) if the memory address is in a
      * non-canonical form. This is the only check on the memory
      * destination for long mode!
@@ -5798,6 +5799,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu)
     vpid02 = nested_get_vpid02(vcpu);
     switch (type) {
     case VMX_VPID_EXTENT_INDIVIDUAL_ADDR:
+        /* LAM doesn't apply to the address in descriptor of invvpid */
         if (!operand.vpid ||
             is_noncanonical_address(operand.gla, vcpu))
             return nested_vmx_fail(vcpu,
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 2261b684a7d4..b4faa94bace7 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -37,6 +37,7 @@ static int sgx_get_encls_gva(struct kvm_vcpu *vcpu, unsigned long offset,
     if (!IS_ALIGNED(*gva, alignment)) {
         fault = true;
     } else if (likely(is_64_bit_mode(vcpu))) {
+        vmx_untag_addr(vcpu, gva, 0);
         fault = is_noncanonical_address(*gva, vcpu);
     } else {
         *gva &= 0xffffffff;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 709fc920f378..ed2dca55573b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8296,6 +8296,11 @@ static void emulator_vm_bugged(struct x86_emulate_ctxt *ctxt)
         kvm_vm_bugged(kvm);
 }

+static void emulator_untag_addr(struct x86_emulate_ctxt *ctxt, gva_t *addr, u32 flags)
+{
+    static_call(kvm_x86_untag_addr)(emul_to_vcpu(ctxt), addr, flags);
+}
+
 static const struct x86_emulate_ops emulate_ops = {
     .vm_bugged = emulator_vm_bugged,
     .read_gpr = emulator_read_gpr,
@@ -8341,6 +8346,7 @@ static const struct x86_emulate_ops emulate_ops = {
     .leave_smm = emulator_leave_smm,
     .triple_fault = emulator_triple_fault,
     .set_xcr = emulator_set_xcr,
+    .untag_addr = emulator_untag_addr,
 };

 static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
@@ -13367,6 +13373,7 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)

     switch (type) {
     case INVPCID_TYPE_INDIV_ADDR:
+        /* LAM doesn't apply to the address in descriptor of invpcid */
         if ((!pcid_enabled && (operand.pcid != 0)) ||
             is_noncanonical_address(operand.gla, vcpu)) {
             kvm_inject_gp(vcpu, 0);
--
2.25.1
From: Binbin Wu <binbin.wu@linux.intel.com>
Subject: [PATCH v9 6/6] KVM: x86: Expose LAM feature to userspace VMM
Date: Tue, 6 Jun 2023 17:18:42 +0800
Message-Id: <20230606091842.13123-7-binbin.wu@linux.intel.com>

From: Robert Hoo

The LAM feature is enumerated by CPUID.7.1:EAX.LAM[bit 26]. Expose the
feature to the userspace VMM as the final step, after the following
support is in place:
- CR4.LAM_SUP virtualization
- CR3.LAM_U48 and CR3.LAM_U57 virtualization
- Checking and untagging of 64-bit linear addresses when LAM applies in
  instruction emulation and VM-exit handlers

LAM support in SGX enclave mode needs additional enabling and is not
included in this patch series.

Signed-off-by: Robert Hoo
Signed-off-by: Binbin Wu
Reviewed-by: Jingqi Liu
Reviewed-by: Chao Gao
Tested-by: Xuelian Guo
Reviewed-by: Kai Huang
---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 241f554f1764..166243fb5705 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -643,7 +643,7 @@ void kvm_set_cpu_caps(void)
     kvm_cpu_cap_mask(CPUID_7_1_EAX,
         F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
         F(FZRM) | F(FSRS) | F(FSRC) |
-        F(AMX_FP16) | F(AVX_IFMA)
+        F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
     );

     kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
--
2.25.1
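[Editorial illustration, not part of the series] A hedged user-space sketch of probing the bit this patch exposes: LAM is enumerated in CPUID.(EAX=7,ECX=1):EAX bit 26. It uses the GCC/Clang <cpuid.h> helper and, run inside a guest whose VMM passes the leaf through, would report whether the feature is enumerated.

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 7, sub-leaf 1: EAX bit 26 is LAM. */
    if (__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx) && (eax & (1u << 26)))
        printf("LAM enumerated\n");
    else
        printf("LAM not enumerated\n");
    return 0;
}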