From nobody Tue Jun 23 22:30:54 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21778C433EF for ; Fri, 25 Feb 2022 08:53:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238734AbiBYIx6 (ORCPT ); Fri, 25 Feb 2022 03:53:58 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54314 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234642AbiBYIxt (ORCPT ); Fri, 25 Feb 2022 03:53:49 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 164C821EB90; Fri, 25 Feb 2022 00:53:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645779198; x=1677315198; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=FhrN15/8PwnponpH69R70tbVnJZAuSjDUdMncdqEgmk=; b=bMdi2+aj9dEEBrxKwvzrInGi+IoIp8olCW7n8X7ZGDvB+TB9QizZbWnD lIOnEc2DTAcJEPk/NIkoeDc9VIAgRA4EtYAiIeYTEMSwyqC5ChHnDMaKv PydgBvVk5CVuALFyOpwFOw1TvmjWmTMymSFSBXWGktG32cbCPYjWXxcGf XPq1UYxOoehqyKjAZX8wv1I+w73sIX4z9nY0BKMOR4MC8Qiej9Lj8+8fm f/eHzrcqbQQjNWbUMlEafcniB3TmD+EevnxyQfbcsEnDJC3XwL0MpPpqr uUsdnmxzdgM3SxOXYenYM3aCDiBv0Z2kfKnhAmdcz+041O9sjWWEd7P0/ A==; X-IronPort-AV: E=McAfee;i="6200,9189,10268"; a="250037326" X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="250037326" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:17 -0800 X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="549186415" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:06 -0800 From: Zeng Guang To: Paolo Bonzini , Sean 
Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang , Robert Hoo Subject: [PATCH v6 1/9] x86/cpu: Add new VMX feature, Tertiary VM-Execution control Date: Fri, 25 Feb 2022 16:22:15 +0800 Message-Id: <20220225082223.18288-2-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220225082223.18288-1-guang.zeng@intel.com> References: <20220225082223.18288-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Robert Hoo A new 64-bit control field, "tertiary processor-based VM-execution controls", is defined [1]. It is controlled by bit 17 of the primary processor-based VM-execution controls. Unlike its sibling VM-execution fields, the tertiary VM-execution controls field is 64 bits wide, so it occupies two vmx_feature_leafs, TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH. Its companion VMX capability reporting MSR, MSR_IA32_VMX_PROCBASED_CTLS3 (0x492), is also semantically different from its siblings: all 64 bits are allowed-1 settings, rather than 32 bits of allowed-0 and 32 bits of allowed-1 settings [1][2]. Therefore, its handling in init_vmx_capabilities() differs slightly from the others. [1] ISE 6.2 "VMCS Changes" https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html [2] SDM Vol3.
Appendix A.3 Signed-off-by: Robert Hoo Signed-off-by: Zeng Guang Reviewed-by: Sean Christopherson Reviewed-by: Maxim Levitsky --- arch/x86/include/asm/msr-index.h | 1 + arch/x86/include/asm/vmxfeatures.h | 3 ++- arch/x86/kernel/cpu/feat_ctl.c | 9 ++++++++- 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-in= dex.h index 3faf0f97edb1..1d180f883c32 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -938,6 +938,7 @@ #define MSR_IA32_VMX_TRUE_EXIT_CTLS 0x0000048f #define MSR_IA32_VMX_TRUE_ENTRY_CTLS 0x00000490 #define MSR_IA32_VMX_VMFUNC 0x00000491 +#define MSR_IA32_VMX_PROCBASED_CTLS3 0x00000492 =20 /* VMX_BASIC bits and bitmasks */ #define VMX_BASIC_VMCS_SIZE_SHIFT 32 diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxf= eatures.h index d9a74681a77d..ff20776dc83b 100644 --- a/arch/x86/include/asm/vmxfeatures.h +++ b/arch/x86/include/asm/vmxfeatures.h @@ -5,7 +5,7 @@ /* * Defines VMX CPU feature bits */ -#define NVMXINTS 3 /* N 32-bit words worth of info */ +#define NVMXINTS 5 /* N 32-bit words worth of info */ =20 /* * Note: If the comment begins with a quoted string, that string is used @@ -43,6 +43,7 @@ #define VMX_FEATURE_RDTSC_EXITING ( 1*32+ 12) /* "" VM-Exit on RDTSC */ #define VMX_FEATURE_CR3_LOAD_EXITING ( 1*32+ 15) /* "" VM-Exit on writes t= o CR3 */ #define VMX_FEATURE_CR3_STORE_EXITING ( 1*32+ 16) /* "" VM-Exit on reads f= rom CR3 */ +#define VMX_FEATURE_TERTIARY_CONTROLS ( 1*32+ 17) /* "" Enable Tertiary VM= -Execution Controls */ #define VMX_FEATURE_CR8_LOAD_EXITING ( 1*32+ 19) /* "" VM-Exit on writes t= o CR8 */ #define VMX_FEATURE_CR8_STORE_EXITING ( 1*32+ 20) /* "" VM-Exit on reads f= rom CR8 */ #define VMX_FEATURE_VIRTUAL_TPR ( 1*32+ 21) /* "vtpr" TPR virtualization,= a.k.a. 
TPR shadow */ diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c index da696eb4821a..993697e71854 100644 --- a/arch/x86/kernel/cpu/feat_ctl.c +++ b/arch/x86/kernel/cpu/feat_ctl.c @@ -15,6 +15,8 @@ enum vmx_feature_leafs { MISC_FEATURES =3D 0, PRIMARY_CTLS, SECONDARY_CTLS, + TERTIARY_CTLS_LOW, + TERTIARY_CTLS_HIGH, NR_VMX_FEATURE_WORDS, }; =20 @@ -22,7 +24,7 @@ enum vmx_feature_leafs { =20 static void init_vmx_capabilities(struct cpuinfo_x86 *c) { - u32 supported, funcs, ept, vpid, ign; + u32 supported, funcs, ept, vpid, ign, low, high; =20 BUILD_BUG_ON(NVMXINTS !=3D NR_VMX_FEATURE_WORDS); =20 @@ -42,6 +44,11 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c) rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS2, &ign, &supported); c->vmx_capability[SECONDARY_CTLS] =3D supported; =20 + /* All 64 bits of tertiary controls MSR are allowed-1 settings. */ + rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &low, &high); + c->vmx_capability[TERTIARY_CTLS_LOW] =3D low; + c->vmx_capability[TERTIARY_CTLS_HIGH] =3D high; + rdmsr(MSR_IA32_VMX_PINBASED_CTLS, ign, supported); rdmsr_safe(MSR_IA32_VMX_VMFUNC, &ign, &funcs); =20 --=20 2.27.0 From nobody Tue Jun 23 22:30:54 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 38371C433F5 for ; Fri, 25 Feb 2022 08:53:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238740AbiBYIyB (ORCPT ); Fri, 25 Feb 2022 03:54:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235283AbiBYIxv (ORCPT ); Fri, 25 Feb 2022 03:53:51 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ECF1621BC6D; Fri, 25 Feb 2022 00:53:18 -0800 (PST) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645779198; x=1677315198; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=9vxkWcshmsYGm/OC7DWmGR3u/fniLJbXHpffUlprtmI=; b=FXKIuOn20nZrpUiFv6em0wBGaWY4oLJkYDNNWgVh/SePuJGmRPrrasEr PfCe1ka2McLZUnwZDEH1Q/JTOaaDc3TCCor9H3FtRu8cx1PFxk4u7JKHS meMMUo0FuYDeonW43gKqWeFtw1vFP8efUk/EV1+8uI4kGGjV89mvYTuAl FTG/ESaOruC5KqzSC/hGGod8z0Iy/mWm13gfhBzetaUMbkb6Fs77oSXS7 8XMgZD/xaZQDA4tlRXjppITfzv+qT2OcYO2NSTCTG1ReWIdatpXBrgpu9 lEpP3OTbV2AgtED5Ni4WTFX/HAtevvjR/zBuluzbW4uSjylECnzRkRn7m g==; X-IronPort-AV: E=McAfee;i="6200,9189,10268"; a="250037334" X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="250037334" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:18 -0800 X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="549186426" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:12 -0800 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. 
Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang , Robert Hoo Subject: [PATCH v6 2/9] KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation Date: Fri, 25 Feb 2022 16:22:16 +0800 Message-Id: <20220225082223.18288-3-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220225082223.18288-1-guang.zeng@intel.com> References: <20220225082223.18288-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Robert Hoo Unlike the existing control fields, the tertiary VM-execution control is 64 bits wide. Extend BUILD_CONTROLS_SHADOW() with a 'bits' parameter so that it can generate the auxiliary functions for both 32-bit and 64-bit fields. Suggested-by: Sean Christopherson Signed-off-by: Robert Hoo Signed-off-by: Zeng Guang Reviewed-by: Maxim Levitsky --- arch/x86/kvm/vmx/vmx.h | 59 ++++++++++++++++++++++-------------------- 1 file changed, 31 insertions(+), 28 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 7f2c82e7f38f..e07c76974fb0 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -456,35 +456,38 @@ static inline u8 vmx_get_rvi(void) return vmcs_read16(GUEST_INTR_STATUS) & 0xff; } =20 -#define BUILD_CONTROLS_SHADOW(lname, uname) \ -static inline void lname##_controls_set(struct vcpu_vmx *vmx, u32 val) = \ -{ \ - if (vmx->loaded_vmcs->controls_shadow.lname !=3D val) { \ - vmcs_write32(uname, val); \ - vmx->loaded_vmcs->controls_shadow.lname =3D val; \ - } \ -} \ -static inline u32 __##lname##_controls_get(struct loaded_vmcs *vmcs) \ -{ \ - return vmcs->controls_shadow.lname; \ -} \ -static inline u32 lname##_controls_get(struct vcpu_vmx *vmx) \ -{ \ - return __##lname##_controls_get(vmx->loaded_vmcs); \ -} \ -static inline void 
lname##_controls_setbit(struct vcpu_vmx *vmx, u32 val) = \ -{ \ - lname##_controls_set(vmx, lname##_controls_get(vmx) | val); \ -} \ -static inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u32 val= ) \ -{ \ - lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val); \ +#define BUILD_CONTROLS_SHADOW(lname, uname, bits) \ +static inline \ +void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val) \ +{ \ + if (vmx->loaded_vmcs->controls_shadow.lname !=3D val) { \ + vmcs_write##bits(uname, val); \ + vmx->loaded_vmcs->controls_shadow.lname =3D val; \ + } \ +} \ +static inline u##bits __##lname##_controls_get(struct loaded_vmcs *vmcs)\ +{ \ + return vmcs->controls_shadow.lname; \ +} \ +static inline u##bits lname##_controls_get(struct vcpu_vmx *vmx) \ +{ \ + return __##lname##_controls_get(vmx->loaded_vmcs); \ +} \ +static inline \ +void lname##_controls_setbit(struct vcpu_vmx *vmx, u##bits val) \ +{ \ + lname##_controls_set(vmx, lname##_controls_get(vmx) | val); \ +} \ +static inline \ +void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##bits val) \ +{ \ + lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val); \ } -BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS) -BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS) -BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL) -BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL) -BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL) +BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32) +BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32) +BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL, 32) +BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL, 32) +BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL, 32) =20 /* * VMX_REGS_LAZY_LOAD_SET - The set of registers that will be updated in t= he --=20 2.27.0 From nobody Tue Jun 23 22:30:54 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 26712C433FE for ; Fri, 25 Feb 2022 08:53:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238755AbiBYIyE (ORCPT ); Fri, 25 Feb 2022 03:54:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54414 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238729AbiBYIx4 (ORCPT ); Fri, 25 Feb 2022 03:53:56 -0500 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 502C3222193; Fri, 25 Feb 2022 00:53:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645779205; x=1677315205; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=yTY896h9n4CpiCbCpKBJ0GaKczA+xxx5GcZrobDyGbg=; b=FJP0i/VAsWOeIXYJWs7ie3igpuYyWOVHfYIbaXRun55/i172xQ13wwUc vUDs1H3hYa+E9/3kMlUxB8UYm18PVNoHOVHVCKC2t4OqappwaBas/Plqa 2grBEtOC9Z5wKq3dHnwGqzOfshVHLVrhBcqAPq9XvamzpIArevNSSh6dB 0G/ZI43YUV6MJDTwRxh7thlDj2ssSUhQsOePpPdCDu1YSN+yXrt7g0dvG KoN/BYqMYWQUiZgjNJNWJTrWxHhfEtOJBIsU59UKJGdclQfBz2dsjZZyd AnGEonYtBFsmrm7QHWM6SmPru7BcmOcxLn0k+tbuJ97jELxHvZ5CPDx09 Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10268"; a="252186060" X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="252186060" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:25 -0800 X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="549186456" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:18 -0800 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar 
, Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang , Robert Hoo Subject: [PATCH v6 3/9] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config Date: Fri, 25 Feb 2022 16:22:17 +0800 Message-Id: <20220225082223.18288-4-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220225082223.18288-1-guang.zeng@intel.com> References: <20220225082223.18288-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Robert Hoo Check the VMX features in the tertiary execution control during VMCS config setup. The sub-features to enable in the tertiary execution control are adjusted according to hardware capabilities, although no sub-feature is enabled in this patch. EVMCSv1 doesn't support the tertiary VM-execution control, so disable it when EVMCSv1 is in use. Also define the auxiliary functions for the tertiary control field here, using the new BUILD_CONTROLS_SHADOW().
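For illustration only, the difference between adjusting a classic 32-bit control and the 64-bit tertiary control can be sketched in user space as below. This is a rough model, not the code in this patch: the helper names adjust_ctls_32() and adjust_ctls_64() are hypothetical, and the capability MSR value is passed in as a plain argument instead of being read with rdmsr.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the control adjustment. The capability
 * MSR value is an argument here; kernel code would read it from the MSR.
 */

/*
 * Classic layout: the low 32 bits report controls that must be 1,
 * the high 32 bits report controls that may be 1.
 */
static int adjust_ctls_32(uint32_t ctl_min, uint32_t ctl_opt,
			  uint64_t msr_val, uint32_t *result)
{
	uint32_t must_be_one = (uint32_t)msr_val;
	uint32_t may_be_one = (uint32_t)(msr_val >> 32);
	uint32_t ctl = ctl_min | ctl_opt;

	ctl &= may_be_one;	/* drop bits the CPU cannot set */
	ctl |= must_be_one;	/* force bits the CPU requires */

	/* Fail if a required bit did not survive the adjustment. */
	if (ctl_min & ~ctl)
		return -EIO;

	*result = ctl;
	return 0;
}

/* Tertiary layout: all 64 bits of the MSR are allowed-1 settings. */
static int adjust_ctls_64(uint64_t ctl_min, uint64_t ctl_opt,
			  uint64_t allowed1, uint64_t *result)
{
	/* Fail if a required bit is not supported by the CPU. */
	if (ctl_min & ~allowed1)
		return -EIO;

	*result = (ctl_min | ctl_opt) & allowed1;
	return 0;
}
```

The 64-bit variant needs no "must be 1" step because the tertiary capability MSR carries no allowed-0 half; unsupported optional bits are simply masked off.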
Signed-off-by: Robert Hoo Signed-off-by: Zeng Guang Reviewed-by: Maxim Levitsky --- arch/x86/include/asm/vmx.h | 3 +++ arch/x86/kvm/vmx/capabilities.h | 7 ++++++ arch/x86/kvm/vmx/evmcs.c | 2 ++ arch/x86/kvm/vmx/evmcs.h | 1 + arch/x86/kvm/vmx/vmcs.h | 1 + arch/x86/kvm/vmx/vmx.c | 38 ++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/vmx.h | 1 + 7 files changed, 52 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 0ffaa3156a4e..8c929596a299 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -31,6 +31,7 @@ #define CPU_BASED_RDTSC_EXITING VMCS_CONTROL_BIT(RDTSC_EXI= TING) #define CPU_BASED_CR3_LOAD_EXITING VMCS_CONTROL_BIT(CR3_LOAD_EXITING) #define CPU_BASED_CR3_STORE_EXITING VMCS_CONTROL_BIT(CR3_STORE_EXITING) +#define CPU_BASED_ACTIVATE_TERTIARY_CONTROLS VMCS_CONTROL_BIT(TERTIARY_CON= TROLS) #define CPU_BASED_CR8_LOAD_EXITING VMCS_CONTROL_BIT(CR8_LOAD_= EXITING) #define CPU_BASED_CR8_STORE_EXITING VMCS_CONTROL_BIT(CR8_STORE= _EXITING) #define CPU_BASED_TPR_SHADOW VMCS_CONTROL_BIT(VIRTUAL_T= PR) @@ -221,6 +222,8 @@ enum vmcs_field { ENCLS_EXITING_BITMAP_HIGH =3D 0x0000202F, TSC_MULTIPLIER =3D 0x00002032, TSC_MULTIPLIER_HIGH =3D 0x00002033, + TERTIARY_VM_EXEC_CONTROL =3D 0x00002034, + TERTIARY_VM_EXEC_CONTROL_HIGH =3D 0x00002035, GUEST_PHYSICAL_ADDRESS =3D 0x00002400, GUEST_PHYSICAL_ADDRESS_HIGH =3D 0x00002401, VMCS_LINK_POINTER =3D 0x00002800, diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilitie= s.h index 3f430e218375..31f3d88b3e4d 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -59,6 +59,7 @@ struct vmcs_config { u32 pin_based_exec_ctrl; u32 cpu_based_exec_ctrl; u32 cpu_based_2nd_exec_ctrl; + u64 cpu_based_3rd_exec_ctrl; u32 vmexit_ctrl; u32 vmentry_ctrl; struct nested_vmx_msrs nested; @@ -131,6 +132,12 @@ static inline bool cpu_has_secondary_exec_ctrls(void) CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; } =20 +static inline bool 
cpu_has_tertiary_exec_ctrls(void) +{ + return vmcs_config.cpu_based_exec_ctrl & + CPU_BASED_ACTIVATE_TERTIARY_CONTROLS; +} + static inline bool cpu_has_vmx_virtualize_apic_accesses(void) { return vmcs_config.cpu_based_2nd_exec_ctrl & diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c index 87e3dc10edf4..6a61b1ae7942 100644 --- a/arch/x86/kvm/vmx/evmcs.c +++ b/arch/x86/kvm/vmx/evmcs.c @@ -297,8 +297,10 @@ const unsigned int nr_evmcs_1_fields =3D ARRAY_SIZE(vm= cs_field_to_evmcs_1); #if IS_ENABLED(CONFIG_HYPERV) __init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf) { + vmcs_conf->cpu_based_exec_ctrl &=3D ~EVMCS1_UNSUPPORTED_EXEC_CTRL; vmcs_conf->pin_based_exec_ctrl &=3D ~EVMCS1_UNSUPPORTED_PINCTRL; vmcs_conf->cpu_based_2nd_exec_ctrl &=3D ~EVMCS1_UNSUPPORTED_2NDEXEC; + vmcs_conf->cpu_based_3rd_exec_ctrl =3D 0; =20 vmcs_conf->vmexit_ctrl &=3D ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL; vmcs_conf->vmentry_ctrl &=3D ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL; diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h index 8d70f9aea94b..f886a8ff0342 100644 --- a/arch/x86/kvm/vmx/evmcs.h +++ b/arch/x86/kvm/vmx/evmcs.h @@ -50,6 +50,7 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs); */ #define EVMCS1_UNSUPPORTED_PINCTRL (PIN_BASED_POSTED_INTR | \ PIN_BASED_VMX_PREEMPTION_TIMER) +#define EVMCS1_UNSUPPORTED_EXEC_CTRL (CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) #define EVMCS1_UNSUPPORTED_2NDEXEC \ (SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY | \ SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | \ diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h index e325c290a816..e18dc68eeeeb 100644 --- a/arch/x86/kvm/vmx/vmcs.h +++ b/arch/x86/kvm/vmx/vmcs.h @@ -50,6 +50,7 @@ struct vmcs_controls_shadow { u32 pin; u32 exec; u32 secondary_exec; + u64 tertiary_exec; }; =20 /* diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c569dc2b9192..8a5713d49635 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2422,6 +2422,21 @@ static __init int 
adjust_vmx_controls(u32 ctl_min, u= 32 ctl_opt, return 0; } =20 +static __init int adjust_vmx_controls_64(u64 ctl_min, u64 ctl_opt, + u32 msr, u64 *result) +{ + u64 allowed1; + + rdmsrl(msr, allowed1); + + /* Ensure minimum (required) set of control bits are supported. */ + if (ctl_min & ~allowed1) + return -EIO; + + *result =3D (ctl_min | ctl_opt) & allowed1; + return 0; +} + static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf, struct vmx_capability *vmx_cap) { @@ -2430,6 +2445,7 @@ static __init int setup_vmcs_config(struct vmcs_confi= g *vmcs_conf, u32 _pin_based_exec_control =3D 0; u32 _cpu_based_exec_control =3D 0; u32 _cpu_based_2nd_exec_control =3D 0; + u64 _cpu_based_3rd_exec_control =3D 0; u32 _vmexit_control =3D 0; u32 _vmentry_control =3D 0; =20 @@ -2451,7 +2467,8 @@ static __init int setup_vmcs_config(struct vmcs_confi= g *vmcs_conf, =20 opt =3D CPU_BASED_TPR_SHADOW | CPU_BASED_USE_MSR_BITMAPS | - CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; + CPU_BASED_ACTIVATE_SECONDARY_CONTROLS | + CPU_BASED_ACTIVATE_TERTIARY_CONTROLS; if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS, &_cpu_based_exec_control) < 0) return -EIO; @@ -2525,6 +2542,16 @@ static __init int setup_vmcs_config(struct vmcs_conf= ig *vmcs_conf, "1-setting enable VPID VM-execution control\n"); } =20 + if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) { + u64 opt3 =3D 0; + u64 min3 =3D 0; + + if (adjust_vmx_controls_64(min3, opt3, + MSR_IA32_VMX_PROCBASED_CTLS3, + &_cpu_based_3rd_exec_control)) + return -EIO; + } + min =3D VM_EXIT_SAVE_DEBUG_CONTROLS | VM_EXIT_ACK_INTR_ON_EXIT; #ifdef CONFIG_X86_64 min |=3D VM_EXIT_HOST_ADDR_SPACE_SIZE; @@ -2611,6 +2638,7 @@ static __init int setup_vmcs_config(struct vmcs_confi= g *vmcs_conf, vmcs_conf->pin_based_exec_ctrl =3D _pin_based_exec_control; vmcs_conf->cpu_based_exec_ctrl =3D _cpu_based_exec_control; vmcs_conf->cpu_based_2nd_exec_ctrl =3D _cpu_based_2nd_exec_control; + vmcs_conf->cpu_based_3rd_exec_ctrl =3D 
_cpu_based_3rd_exec_control; vmcs_conf->vmexit_ctrl =3D _vmexit_control; vmcs_conf->vmentry_ctrl =3D _vmentry_control; =20 @@ -4230,6 +4258,11 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx) return exec_control; } =20 +static u64 vmx_tertiary_exec_control(struct vcpu_vmx *vmx) +{ + return vmcs_config.cpu_based_3rd_exec_ctrl; +} + /* * Adjust a single secondary execution control bit to intercept/allow an * instruction in the guest. This is usually done based on whether or not= a @@ -4395,6 +4428,9 @@ static void init_vmcs(struct vcpu_vmx *vmx) if (cpu_has_secondary_exec_ctrls()) secondary_exec_controls_set(vmx, vmx_secondary_exec_control(vmx)); =20 + if (cpu_has_tertiary_exec_ctrls()) + tertiary_exec_controls_set(vmx, vmx_tertiary_exec_control(vmx)); + if (kvm_vcpu_apicv_active(&vmx->vcpu)) { vmcs_write64(EOI_EXIT_BITMAP0, 0); vmcs_write64(EOI_EXIT_BITMAP1, 0); diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index e07c76974fb0..d4a647d3ed4a 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -488,6 +488,7 @@ BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32) BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL, 32) BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL, 32) BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL, 32) +BUILD_CONTROLS_SHADOW(tertiary_exec, TERTIARY_VM_EXEC_CONTROL, 64) =20 /* * VMX_REGS_LAZY_LOAD_SET - The set of registers that will be updated in t= he --=20 2.27.0 From nobody Tue Jun 23 22:30:54 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A0202C433F5 for ; Fri, 25 Feb 2022 08:53:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237545AbiBYIyL (ORCPT ); Fri, 25 Feb 2022 03:54:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54546 "EHLO 
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238758AbiBYIyF (ORCPT ); Fri, 25 Feb 2022 03:54:05 -0500 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 10DE7222187; Fri, 25 Feb 2022 00:53:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645779214; x=1677315214; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=6nN+aBuikGWyJ9R1HQeB9Y3n4h+iAnnSA4KQ/7gg/yc=; b=EfowDODoAVx6dNlhcUWBJUWMtw8DRP7hcH4hMUzg0Bl4mSxDaHGAv7ya K9VZRl4MFSnzX+6c5COdcwyeDMSN6OfwtCek+KdGefrSEWsbYv1DkdX6e JfsIvX5QG4PBYOHO+wG8OaJeiqN7a76mQsMvQlpp45Gy11qcINEJro/Ut 837Oe84wKji1uNF4oUCdQYpwsSwbp6AH3VT3+Jpqe0vXBqPMdeCrzvG7D ZweKk1zlP3XkmMMIr4pU6p62OGPM6DaVK+ss7Rws08DHl0kguPuixtTx/ aQLwySL5q6r1wjE1kClX+FLwj6WQa9+FVLX9RTcq9+UqFvEXwOzZ3VGcY g==; X-IronPort-AV: E=McAfee;i="6200,9189,10268"; a="277090064" X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="277090064" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:33 -0800 X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="549186493" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:24 -0800 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. 
Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang , Robert Hoo Subject: [PATCH v6 4/9] KVM: VMX: dump_vmcs() reports tertiary_exec_control field as well Date: Fri, 25 Feb 2022 16:22:18 +0800 Message-Id: <20220225082223.18288-5-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220225082223.18288-1-guang.zeng@intel.com> References: <20220225082223.18288-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Robert Hoo Add tertiary_exec_control field report in dump_vmcs() Signed-off-by: Robert Hoo Signed-off-by: Zeng Guang Reviewed-by: Maxim Levitsky --- arch/x86/kvm/vmx/vmx.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 8a5713d49635..7beba7a9f247 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -5891,6 +5891,7 @@ void dump_vmcs(struct kvm_vcpu *vcpu) struct vcpu_vmx *vmx =3D to_vmx(vcpu); u32 vmentry_ctl, vmexit_ctl; u32 cpu_based_exec_ctrl, pin_based_exec_ctrl, secondary_exec_control; + u64 tertiary_exec_control; unsigned long cr4; int efer_slot; =20 @@ -5904,9 +5905,16 @@ void dump_vmcs(struct kvm_vcpu *vcpu) cpu_based_exec_ctrl =3D vmcs_read32(CPU_BASED_VM_EXEC_CONTROL); pin_based_exec_ctrl =3D vmcs_read32(PIN_BASED_VM_EXEC_CONTROL); cr4 =3D vmcs_readl(GUEST_CR4); - secondary_exec_control =3D 0; + if (cpu_has_secondary_exec_ctrls()) secondary_exec_control =3D vmcs_read32(SECONDARY_VM_EXEC_CONTROL); + else + secondary_exec_control =3D 0; + + if (cpu_has_tertiary_exec_ctrls()) + tertiary_exec_control =3D vmcs_read64(TERTIARY_VM_EXEC_CONTROL); + else + tertiary_exec_control =3D 0; =20 pr_err("VMCS %p, last attempted VM-entry on CPU %d\n", vmx->loaded_vmcs->vmcs, 
vcpu->arch.last_vmentry_cpu); @@ -6006,9 +6014,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu) vmx_dump_msrs("host autoload", &vmx->msr_autoload.host); =20 pr_err("*** Control State ***\n"); - pr_err("PinBased=3D%08x CPUBased=3D%08x SecondaryExec=3D%08x\n", - pin_based_exec_ctrl, cpu_based_exec_ctrl, secondary_exec_control); - pr_err("EntryControls=3D%08x ExitControls=3D%08x\n", vmentry_ctl, vmexit_= ctl); + pr_err("CPUBased=3D0x%08x SecondaryExec=3D0x%08x TertiaryExec=3D0x%016llx= \n", + cpu_based_exec_ctrl, secondary_exec_control, tertiary_exec_control= ); + pr_err("PinBased=3D0x%08x EntryControls=3D%08x ExitControls=3D%08x\n", + pin_based_exec_ctrl, vmentry_ctl, vmexit_ctl); pr_err("ExceptionBitmap=3D%08x PFECmask=3D%08x PFECmatch=3D%08x\n", vmcs_read32(EXCEPTION_BITMAP), vmcs_read32(PAGE_FAULT_ERROR_CODE_MASK), --=20 2.27.0 From nobody Tue Jun 23 22:30:54 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1868CC433EF for ; Fri, 25 Feb 2022 08:53:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238777AbiBYIyO (ORCPT ); Fri, 25 Feb 2022 03:54:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238770AbiBYIyI (ORCPT ); Fri, 25 Feb 2022 03:54:08 -0500 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D2922255BC; Fri, 25 Feb 2022 00:53:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645779216; x=1677315216; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=aAScniTQ6hnuqdjEfXh/mJB2BVkSgcocqcMrF3sJiqY=; b=bdJGMu/u+y48mUpyesbwHsvLx9TKoFTD9YdKkET/bhixqsbzYlVZDhZu 
rY6vF+M0fxrx78Xd3FwJsucwOvDdyhTLwm1Nd3lfH/k30Vt5LS+OdMoYP RHjvkPFxUoZOiScR/I37AhMyrVIAa9Wil6hpMj1PSbX+BCLere7MCoRLc SHLdu+TXyarKYHmc2w4E6v+4bj6O+vveX/RU7nfVEmtj8yhWnXJ/7jzdH SotZDypo1fPH7GkFLpRNKfv0tvkugzF9hLWKN/RzstSP7zeTdu+VmIaAE qQ0X1JkN7Ng2RbMN+IKurfLlNIpUbuSrwF/mOjVV3YxJ6j5Mh35b3GA8M Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10268"; a="277090079" X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="277090079" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:36 -0800 X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="549186527" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:30 -0800 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang Subject: [PATCH v6 5/9] KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode Date: Fri, 25 Feb 2022 16:22:19 +0800 Message-Id: <20220225082223.18288-6-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220225082223.18288-1-guang.zeng@intel.com> References: <20220225082223.18288-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Upcoming Intel CPUs will support virtual x2APIC MSR writes to the vICR, i.e. will trap and generate an APIC-write VM-Exit instead of intercepting the WRMSR. Add support for handling "nodecode" x2APIC writes, which were previously impossible. 
Note, x2APIC MSR writes are 64 bits wide. Signed-off-by: Zeng Guang --- arch/x86/kvm/lapic.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 629c116b0d3e..e4bcdab1fac0 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -67,6 +67,7 @@ static bool lapic_timer_advance_dynamic __read_mostly; #define LAPIC_TIMER_ADVANCE_NS_MAX 5000 /* step-by-step approximation to mitigate fluctuation */ #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8 +static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data); =20 static inline void __kvm_lapic_set_reg(char *regs, int reg_off, u32 val) { @@ -2227,10 +2228,28 @@ EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi); /* emulate APIC access in a trap manner */ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset) { - u32 val =3D kvm_lapic_get_reg(vcpu->arch.apic, offset); + struct kvm_lapic *apic =3D vcpu->arch.apic; + u64 val; + + if (apic_x2apic_mode(apic)) { + /* + * When guest APIC is in x2APIC mode and IPI virtualization + * is enabled, accessing APIC_ICR may cause trap-like VM-exit + * on Intel hardware. Other offsets are not possible. 
+ */ + if (WARN_ON_ONCE(offset !=3D APIC_ICR)) + return; =20 - /* TODO: optimize to just emulate side effect w/o one more write */ - kvm_lapic_reg_write(vcpu->arch.apic, offset, val); + kvm_lapic_msr_read(apic, offset, &val); + if (val & APIC_ICR_BUSY) + kvm_x2apic_icr_write(apic, val); + else + kvm_apic_send_ipi(apic, (u32)val, (u32)(val >> 32)); + } else { + val =3D kvm_lapic_get_reg(apic, offset); + /* TODO: optimize to just emulate side effect w/o one more write */ + kvm_lapic_reg_write(apic, offset, (u32)val); + } } EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode); =20 --=20 2.27.0 From nobody Tue Jun 23 22:30:54 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91231C433EF for ; Fri, 25 Feb 2022 08:54:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237978AbiBYIyj (ORCPT ); Fri, 25 Feb 2022 03:54:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55300 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238812AbiBYIy1 (ORCPT ); Fri, 25 Feb 2022 03:54:27 -0500 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2191B22B97C; Fri, 25 Feb 2022 00:53:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645779230; x=1677315230; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=QMLp8NmC2YlCPDw6nrDcudbrHfqDQnsY4QCUiVPXsdQ=; b=jxLNK1mqIAusMlApkJx+D5IzvA/x1afkU2BgmN0oIyZYqNcFoVh1o5m9 vOpXErfhT9I2z027slixYRYTXLvRPJWLs7WFdNRVrmGILMJouD3VK7Qk9 GC1NJ9Ath6G+4U9E7OJrCr+Jrovd2hcBqiPM+1XeGwtMoK4U9Pz14Cgkb t6dMNIlpK/hdh35Hsxi4YwaSV2WyuJdOEuANzfl6YPcvcpsrVwCb4cKPp d4AK4wSI0/CW5LzhoWQbS/91Jwo09kt812qIApnj0npsqLaIjQEA4ybNH 
r300gLUDhTLjAVGgtNpPfXPdXCRE26zlgvHmARLUZKmdRVQcTtQ2ZiudX g==; X-IronPort-AV: E=McAfee;i="6200,9189,10268"; a="232429507" X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="232429507" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:42 -0800 X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="549186556" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:36 -0800 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang , Maxim Levitsky Subject: [PATCH v6 6/9] KVM: x86: lapic: don't allow to change APIC ID unconditionally Date: Fri, 25 Feb 2022 16:22:20 +0800 Message-Id: <20220225082223.18288-7-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220225082223.18288-1-guang.zeng@intel.com> References: <20220225082223.18288-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Maxim Levitsky No normal guest has any reason to change physical APIC IDs, and allowing this introduces bugs into APIC acceleration code. Recent Intel hardware simply ignores writes to APIC_ID in xAPIC mode. More background can be found at: https://lore.kernel.org/lkml/Yfw5ddGNOnDqxMLs@google.com/ There appears to be little value in supporting a writable xAPIC ID in guests, other than supporting some old and crazy use cases which probably would fail on real hardware.
So, make xAPIC ID read-only for KVM guests. Signed-off-by: Maxim Levitsky Signed-off-by: Zeng Guang --- arch/x86/kvm/lapic.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e4bcdab1fac0..b38288c8a94f 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2044,10 +2044,17 @@ static int kvm_lapic_reg_write(struct kvm_lapic *ap= ic, u32 reg, u32 val) =20 switch (reg) { case APIC_ID: /* Local APIC ID */ - if (!apic_x2apic_mode(apic)) - kvm_apic_set_xapic_id(apic, val >> 24); - else + if (apic_x2apic_mode(apic)) { ret =3D 1; + break; + } + /* Don't allow changing APIC ID to avoid unexpected issues */ + if ((val >> 24) !=3D apic->vcpu->vcpu_id) { + kvm_vm_bugged(apic->vcpu->kvm); + break; + } + + kvm_apic_set_xapic_id(apic, val >> 24); break; =20 case APIC_TASKPRI: @@ -2631,11 +2638,15 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu) static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s, bool set) { - if (apic_x2apic_mode(vcpu->arch.apic)) { - u32 *id =3D (u32 *)(s->regs + APIC_ID); - u32 *ldr =3D (u32 *)(s->regs + APIC_LDR); - u64 icr; + u32 *id =3D (u32 *)(s->regs + APIC_ID); + u32 *ldr =3D (u32 *)(s->regs + APIC_LDR); + u64 icr; =20 + if (!apic_x2apic_mode(vcpu->arch.apic)) { + /* Don't allow changing APIC ID to avoid unexpected issues */ + if ((*id >> 24) !=3D vcpu->vcpu_id) + return -EINVAL; + } else { if (vcpu->kvm->arch.x2apic_format) { if (*id !=3D vcpu->vcpu_id) return -EINVAL; --=20 2.27.0 From nobody Tue Jun 23 22:30:54 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 759F7C433EF for ; Fri, 25 Feb 2022 08:54:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238701AbiBYIyd (ORCPT ); Fri, 25 Feb 2022 
03:54:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55352 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238782AbiBYIy0 (ORCPT ); Fri, 25 Feb 2022 03:54:26 -0500 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D937322B94A; Fri, 25 Feb 2022 00:53:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645779228; x=1677315228; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=dvbJIK0QMNlrOEILYNkrNk+qxgBojkw4JwEFJuXi5gc=; b=PeJ7T45d8VQj+3CttbTn2BZLYvBTmfK7ij+OHJC7rk1NcvDHVxmjUfRa i46Mc2e6UX+KpjxVIo/Zm3GGleBuF/W7c7WmPd9JUR0x+X3sRf7qT9YOS YMDT+G6r61iy0+twsxK235Cd375FUdVztWQem5LUsA2/3s560MEv35QJY +rgsZAIQIP9k+v/g/pPKk/awKVTbTJddUbaU4FSClzRaqc4QUjhj8tbo/ YykzwnwQLyZJCDrCRE6rTlAObhKdleuDTpOprIzUx4SA7EFe5zmHdHNzr rDzsSlpGnnRPo3N8w3bI7iIqAXWY6A5Okf29N/vFxr2pWHz4EvPkERFh7 g==; X-IronPort-AV: E=McAfee;i="6200,9189,10268"; a="252186106" X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="252186106" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:48 -0800 X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="549186584" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:42 -0800 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. 
Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang Subject: [PATCH v6 7/9] KVM: VMX: enable IPI virtualization Date: Fri, 25 Feb 2022 16:22:21 +0800 Message-Id: <20220225082223.18288-8-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220225082223.18288-1-guang.zeng@intel.com> References: <20220225082223.18288-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Gao Chao With IPI virtualization enabled, the processor emulates writes to APIC registers that would send IPIs. The processor sets the bit corresponding to the vector in the target vCPU's PIR and may send a notification (IPI) specified by the NDST and NV fields in the target vCPU's Posted-Interrupt Descriptor (PID). This is similar to what the IOMMU engine does when handling posted interrupts from devices. The processor uses a PID-pointer table to locate the PID of a vCPU by the vCPU's APIC ID. As with VT-d PI, if a vCPU goes into the blocked state, the VMM needs to switch its notification vector to the wakeup vector. This ensures that when an IPI for a blocked vCPU arrives, the VMM gets control and can wake the vCPU up. If a vCPU is preempted, its posted interrupt notification is suppressed. Note that IPI virtualization can only virtualize physical-addressing, flat mode, unicast IPIs. Sending other IPIs still causes a trap-like APIC-write VM-exit that the VMM must handle.
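For illustration, the PID-pointer table described above can be modeled in user space roughly as follows. This is a sketch under stated assumptions, not the KVM implementation: struct pid_table_model and both helpers are hypothetical names, and only the entry layout (PID address with bit 0 as the valid bit, indexed by APIC ID and bounded by a last index) follows this patch. What the processor does on a miss is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PID_TABLE_ENTRY_VALID 1ULL	/* bit 0, same value as in this patch */

/* Hypothetical user-space model of the PID-pointer table, not KVM code. */
struct pid_table_model {
	uint64_t *entries;	/* one 64-bit entry per APIC ID */
	uint16_t last_index;	/* highest APIC ID the table covers */
};

/*
 * Publish the PID address for an APIC ID. The address must leave bit 0
 * clear, which holds for the 64-byte-aligned descriptors KVM uses.
 */
static bool pid_table_set(struct pid_table_model *t, uint32_t apic_id,
			  uint64_t pid_addr)
{
	if (apic_id > t->last_index || (pid_addr & PID_TABLE_ENTRY_VALID))
		return false;

	t->entries[apic_id] = pid_addr | PID_TABLE_ENTRY_VALID;
	return true;
}

/* Return the PID address for an APIC ID, or 0 if out of range or invalid. */
static uint64_t pid_table_lookup(const struct pid_table_model *t,
				 uint32_t apic_id)
{
	uint64_t entry;

	if (apic_id > t->last_index)
		return 0;

	entry = t->entries[apic_id];
	if (!(entry & PID_TABLE_ENTRY_VALID))
		return 0;

	return entry & ~PID_TABLE_ENTRY_VALID;
}
```

The model keeps the bounds check and the valid bit separate because the patch programs them separately, as LAST_PID_POINTER_INDEX and as bit 0 of each table entry.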
Signed-off-by: Gao Chao Signed-off-by: Zeng Guang --- arch/x86/include/asm/vmx.h | 8 ++++ arch/x86/include/asm/vmxfeatures.h | 2 + arch/x86/kvm/vmx/capabilities.h | 6 +++ arch/x86/kvm/vmx/posted_intr.c | 12 ++++- arch/x86/kvm/vmx/vmx.c | 74 +++++++++++++++++++++++++++--- arch/x86/kvm/vmx/vmx.h | 3 ++ 6 files changed, 97 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 8c929596a299..b79b6438acaa 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -76,6 +76,11 @@ #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE VMCS_CONTROL_BIT(USR_WAIT_PAU= SE) #define SECONDARY_EXEC_BUS_LOCK_DETECTION VMCS_CONTROL_BIT(BUS_LOCK_DETECT= ION) =20 +/* + * Definitions of Tertiary Processor-Based VM-Execution Controls. + */ +#define TERTIARY_EXEC_IPI_VIRT VMCS_CONTROL_BIT(IPI_VIRT) + #define PIN_BASED_EXT_INTR_MASK VMCS_CONTROL_BIT(INTR_EXIT= ING) #define PIN_BASED_NMI_EXITING VMCS_CONTROL_BIT(NMI_EXITI= NG) #define PIN_BASED_VIRTUAL_NMIS VMCS_CONTROL_BIT(VIRTUAL_N= MIS) @@ -159,6 +164,7 @@ static inline int vmx_misc_mseg_revid(u64 vmx_misc) enum vmcs_field { VIRTUAL_PROCESSOR_ID =3D 0x00000000, POSTED_INTR_NV =3D 0x00000002, + LAST_PID_POINTER_INDEX =3D 0x00000008, GUEST_ES_SELECTOR =3D 0x00000800, GUEST_CS_SELECTOR =3D 0x00000802, GUEST_SS_SELECTOR =3D 0x00000804, @@ -224,6 +230,8 @@ enum vmcs_field { TSC_MULTIPLIER_HIGH =3D 0x00002033, TERTIARY_VM_EXEC_CONTROL =3D 0x00002034, TERTIARY_VM_EXEC_CONTROL_HIGH =3D 0x00002035, + PID_POINTER_TABLE =3D 0x00002042, + PID_POINTER_TABLE_HIGH =3D 0x00002043, GUEST_PHYSICAL_ADDRESS =3D 0x00002400, GUEST_PHYSICAL_ADDRESS_HIGH =3D 0x00002401, VMCS_LINK_POINTER =3D 0x00002800, diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxf= eatures.h index ff20776dc83b..7ce616af2db2 100644 --- a/arch/x86/include/asm/vmxfeatures.h +++ b/arch/x86/include/asm/vmxfeatures.h @@ -86,4 +86,6 @@ #define VMX_FEATURE_ENCLV_EXITING ( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf= 
dependent) */ #define VMX_FEATURE_BUS_LOCK_DETECTION ( 2*32+ 30) /* "" VM-Exit when bus = lock caused */ =20 +/* Tertiary Processor-Based VM-Execution Controls, word 3 */ +#define VMX_FEATURE_IPI_VIRT (3*32 + 4) /* "" Enable IPI virtualization = */ #endif /* _ASM_X86_VMXFEATURES_H */ diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilitie= s.h index 31f3d88b3e4d..5f656c9e33be 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -13,6 +13,7 @@ extern bool __read_mostly enable_ept; extern bool __read_mostly enable_unrestricted_guest; extern bool __read_mostly enable_ept_ad_bits; extern bool __read_mostly enable_pml; +extern bool __read_mostly enable_ipiv; extern int __read_mostly pt_mode; =20 #define PT_MODE_SYSTEM 0 @@ -283,6 +284,11 @@ static inline bool cpu_has_vmx_apicv(void) cpu_has_vmx_posted_intr(); } =20 +static inline bool cpu_has_vmx_ipiv(void) +{ + return vmcs_config.cpu_based_3rd_exec_ctrl & TERTIARY_EXEC_IPI_VIRT; +} + static inline bool cpu_has_vmx_flexpriority(void) { return cpu_has_vmx_tpr_shadow() && diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index aa1fe9085d77..90124a30c074 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -177,11 +177,21 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu = *vcpu) local_irq_restore(flags); } =20 +static bool vmx_can_use_ipiv_pi(struct kvm *kvm) +{ + return irqchip_in_kernel(kvm) && enable_ipiv; +} + +static bool vmx_can_use_posted_interrupts(struct kvm *kvm) +{ + return vmx_can_use_ipiv_pi(kvm) || vmx_can_use_vtd_pi(kvm); +} + void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu) { struct pi_desc *pi_desc =3D vcpu_to_pi_desc(vcpu); =20 - if (!vmx_can_use_vtd_pi(vcpu->kvm)) + if (!vmx_can_use_posted_interrupts(vcpu->kvm)) return; =20 if (kvm_vcpu_is_blocking(vcpu) && !vmx_interrupt_blocked(vcpu)) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 7beba7a9f247..0cb141c277ef 100644 --- 
a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -105,6 +105,9 @@ module_param(fasteoi, bool, S_IRUGO); =20 module_param(enable_apicv, bool, S_IRUGO); =20 +bool __read_mostly enable_ipiv =3D true; +module_param(enable_ipiv, bool, 0444); + /* * If nested=3D1, nested virtualization is supported, i.e., guests may use * VMX and be a hypervisor for its own guests. If nested=3D0, guests may n= ot @@ -227,6 +230,11 @@ static const struct { }; =20 #define L1D_CACHE_ORDER 4 + +/* PID(Posted-Interrupt Descriptor)-pointer table entry is 64-bit long */ +#define MAX_PID_TABLE_ORDER get_order(KVM_MAX_VCPU_IDS * sizeof(u64)) +#define PID_TABLE_ENTRY_VALID 1 + static void *vmx_l1d_flush_pages; =20 static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf) @@ -2543,7 +2551,7 @@ static __init int setup_vmcs_config(struct vmcs_confi= g *vmcs_conf, } =20 if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) { - u64 opt3 =3D 0; + u64 opt3 =3D TERTIARY_EXEC_IPI_VIRT; u64 min3 =3D 0; =20 if (adjust_vmx_controls_64(min3, opt3, @@ -3898,6 +3906,8 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_v= cpu *vcpu) vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW); vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W); vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_= W); + if (enable_ipiv) + vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR),MSR_TYPE_RW); } } =20 @@ -4219,14 +4229,21 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_= vcpu *vcpu) =20 pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx)); if (cpu_has_secondary_exec_ctrls()) { - if (kvm_vcpu_apicv_active(vcpu)) + if (kvm_vcpu_apicv_active(vcpu)) { secondary_exec_controls_setbit(vmx, SECONDARY_EXEC_APIC_REGISTER_VIRT | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); - else + if (cpu_has_tertiary_exec_ctrls() && enable_ipiv) + tertiary_exec_controls_setbit(vmx, + TERTIARY_EXEC_IPI_VIRT); + } else { secondary_exec_controls_clearbit(vmx, 
SECONDARY_EXEC_APIC_REGISTER_VIRT | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); + if (cpu_has_tertiary_exec_ctrls()) + tertiary_exec_controls_clearbit(vmx, + TERTIARY_EXEC_IPI_VIRT); + } } =20 vmx_update_msr_bitmap_x2apic(vcpu); @@ -4260,7 +4277,16 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx) =20 static u64 vmx_tertiary_exec_control(struct vcpu_vmx *vmx) { - return vmcs_config.cpu_based_3rd_exec_ctrl; + u64 exec_control =3D vmcs_config.cpu_based_3rd_exec_ctrl; + + /* + * IPI virtualization relies on APICv. Disable IPI + * virtualization if APICv is inhibited. + */ + if (!enable_ipiv || !kvm_vcpu_apicv_active(&vmx->vcpu)) + exec_control &=3D ~TERTIARY_EXEC_IPI_VIRT; + + return exec_control; } =20 /* @@ -4412,6 +4438,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx= *vmx) =20 static void init_vmcs(struct vcpu_vmx *vmx) { + struct kvm_vcpu *vcpu =3D &vmx->vcpu; + struct kvm_vmx *kvm_vmx =3D to_kvm_vmx(vcpu->kvm); + if (nested) nested_vmx_set_vmcs_shadowing_bitmap(); =20 @@ -4431,7 +4460,7 @@ static void init_vmcs(struct vcpu_vmx *vmx) if (cpu_has_tertiary_exec_ctrls()) tertiary_exec_controls_set(vmx, vmx_tertiary_exec_control(vmx)); =20 - if (kvm_vcpu_apicv_active(&vmx->vcpu)) { + if (kvm_vcpu_apicv_active(vcpu)) { vmcs_write64(EOI_EXIT_BITMAP0, 0); vmcs_write64(EOI_EXIT_BITMAP1, 0); vmcs_write64(EOI_EXIT_BITMAP2, 0); @@ -4441,6 +4470,13 @@ static void init_vmcs(struct vcpu_vmx *vmx) =20 vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR); vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc))); + + if (enable_ipiv) { + WRITE_ONCE(kvm_vmx->pid_table[vcpu->vcpu_id], + __pa(&vmx->pi_desc) | PID_TABLE_ENTRY_VALID); + vmcs_write64(PID_POINTER_TABLE, __pa(kvm_vmx->pid_table)); + vmcs_write16(LAST_PID_POINTER_INDEX, kvm_vmx->pid_last_index); + } } =20 if (!kvm_pause_in_guest(vmx->vcpu.kvm)) { @@ -4492,7 +4528,7 @@ static void init_vmcs(struct vcpu_vmx *vmx) vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1); } =20 - vmx_write_encls_bitmap(&vmx->vcpu, NULL); + 
vmx_write_encls_bitmap(vcpu, NULL); =20 if (vmx_pt_mode_is_host_guest()) { memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc)); @@ -4508,7 +4544,7 @@ static void init_vmcs(struct vcpu_vmx *vmx) =20 if (cpu_has_vmx_tpr_shadow()) { vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0); - if (cpu_need_tpr_shadow(&vmx->vcpu)) + if (cpu_need_tpr_shadow(vcpu)) vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, __pa(vmx->vcpu.arch.apic->regs)); vmcs_write32(TPR_THRESHOLD, 0); @@ -7165,6 +7201,18 @@ static int vmx_vm_init(struct kvm *kvm) break; } } + + if (enable_ipiv) { + struct page *pages; + + pages =3D alloc_pages(GFP_KERNEL | __GFP_ZERO, MAX_PID_TABLE_ORDER); + if (!pages) + return -ENOMEM; + + to_kvm_vmx(kvm)->pid_table =3D (void *)page_address(pages); + to_kvm_vmx(kvm)->pid_last_index =3D KVM_MAX_VCPU_IDS - 1; + } + return 0; } =20 @@ -7756,6 +7804,14 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bi= t) return supported & BIT(bit); } =20 +static void vmx_vm_destroy(struct kvm *kvm) +{ + struct kvm_vmx *kvm_vmx =3D to_kvm_vmx(kvm); + + if (kvm_vmx->pid_table) + free_pages((unsigned long)kvm_vmx->pid_table, MAX_PID_TABLE_ORDER); +} + static struct kvm_x86_ops vmx_x86_ops __initdata =3D { .name =3D "kvm_intel", =20 @@ -7768,6 +7824,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata =3D { =20 .vm_size =3D sizeof(struct kvm_vmx), .vm_init =3D vmx_vm_init, + .vm_destroy =3D vmx_vm_destroy, =20 .vcpu_create =3D vmx_create_vcpu, .vcpu_free =3D vmx_free_vcpu, @@ -8022,6 +8079,9 @@ static __init int hardware_setup(void) if (!enable_apicv) vmx_x86_ops.sync_pir_to_irr =3D NULL; =20 + if (!enable_apicv || !cpu_has_vmx_ipiv()) + enable_ipiv =3D false; + if (cpu_has_vmx_tsc_scaling()) { kvm_has_tsc_control =3D true; kvm_max_tsc_scaling_ratio =3D KVM_VMX_TSC_MULTIPLIER_MAX; diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index d4a647d3ed4a..e7b0c00c9d43 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -365,6 +365,9 @@ struct kvm_vmx { unsigned int tss_addr; bool 
ept_identity_pagetable_done; gpa_t ept_identity_map_addr; + /* PID table for IPI virtualization */ + u64 *pid_table; + u16 pid_last_index; }; =20 bool nested_vmx_allowed(struct kvm_vcpu *vcpu); --=20 2.27.0 From nobody Tue Jun 23 22:30:54 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 96715C433FE for ; Fri, 25 Feb 2022 08:54:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238804AbiBYIyg (ORCPT ); Fri, 25 Feb 2022 03:54:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55308 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238814AbiBYIy1 (ORCPT ); Fri, 25 Feb 2022 03:54:27 -0500 Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1870B22320E; Fri, 25 Feb 2022 00:53:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645779234; x=1677315234; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=uS226ko481dKVKU+nkMhxwjiC1Kg1uHtGTxSWuBb7xs=; b=TynXPmLmXwXxjbTcWECvLb4ztIu4U9AbsecKzIFHRROoEsUzU6OStHD1 pXJozf8Kklb5XTrSsTDOOzCrWv1xou4CSjvD44jFq61EJJmPlXnf6J/Sq 8LowEwCyjCBpFA+nphdIgK95XZSFzjtpoYqUEgI0VfOmHphi97BnwKe77 ZwHVE1jimdCpaGfy5FsoxwIiMF62x+IblByWhVY1/zobcmCU7Af/EhL/v iY4XLlKyVMtOFnhYb4h+gwm+1Xh3sApKgB8CB3q0xSyV7Di1o799UqDbt /nXOzZtWdZRigkjBLF0uaEoqG1WuQp/dHe8erHTwrz36D0JB5otQicnIe Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10268"; a="252186122" X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="252186122" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:54 -0800 X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="549186594" 
Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:48 -0800 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang Subject: [PATCH v6 8/9] KVM: x86: Allow userspace set maximum VCPU id for VM Date: Fri, 25 Feb 2022 16:22:22 +0800 Message-Id: <20220225082223.18288-9-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220225082223.18288-1-guang.zeng@intel.com> References: <20220225082223.18288-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Introduce a new max_vcpu_id field in KVM for the x86 architecture. Userspace can set the maximum possible vCPU ID for the current VM using KVM_CAP_MAX_VCPU_ID with the KVM_ENABLE_CAP ioctl(). This is done for x86 only because the sole use case is to guide memory allocation for the PID-pointer table, a structure needed to enable VMX IPI virtualization. By default, max_vcpu_id is set to KVM_MAX_VCPU_IDS. Suggested-by: Sean Christopherson Signed-off-by: Zeng Guang Reviewed-by: Maxim Levitsky --- No new KVM capability is added to advertise support for a configurable maximum vCPU ID to userspace, because max_vcpu_id is just a hint/commitment that allows KVM to reduce the size of the PID-pointer table. I am not 100% sure whether this is the proper approach.
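To make the intended semantics concrete, the two checks this patch adds can be sketched in user space as below. This is only a model under stated assumptions: struct vm_model and the vm_model_*() helpers are hypothetical, and MODEL_MAX_VCPU_IDS is an assumed stand-in for KVM_MAX_VCPU_IDS.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_VCPU_IDS 4096u	/* assumed stand-in for KVM_MAX_VCPU_IDS */

/* Hypothetical per-VM state, modeling only kvm->arch.max_vcpu_id. */
struct vm_model {
	uint32_t max_vcpu_id;
};

/* VM creation: default to the global limit. */
static void vm_model_init(struct vm_model *vm)
{
	vm->max_vcpu_id = MODEL_MAX_VCPU_IDS;
}

/* Models the KVM_CAP_MAX_VCPU_ID case: reject values above the global limit. */
static int vm_model_set_max_vcpu_id(struct vm_model *vm, uint64_t requested)
{
	if (requested > MODEL_MAX_VCPU_IDS)
		return -E2BIG;

	vm->max_vcpu_id = (uint32_t)requested;
	return 0;
}

/* Models the vCPU creation check: IDs must be strictly below max_vcpu_id. */
static int vm_model_check_vcpu_id(const struct vm_model *vm, uint32_t vcpu_id)
{
	return vcpu_id >= vm->max_vcpu_id ? -E2BIG : 0;
}
```

As written in the diff, max_vcpu_id acts as an exclusive bound: with max_vcpu_id set to 512, vCPU IDs 0 through 511 are accepted and 512 is rejected.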
arch/x86/include/asm/kvm_host.h | 6 ++++++ arch/x86/kvm/x86.c | 11 +++++++++++ 2 files changed, 17 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 6dcccb304775..db16aebd946c 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1233,6 +1233,12 @@ struct kvm_arch { hpa_t hv_root_tdp; spinlock_t hv_root_tdp_lock; #endif + /* + * VM-scope maximum vCPU ID. Used to determine the size of structures + * that increase along with the maximum vCPU ID, in which case, using + * the global KVM_MAX_VCPU_IDS may lead to significant memory waste. + */ + u32 max_vcpu_id; }; =20 struct kvm_vm_stat { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4f6fe9974cb5..ca17cc452bd3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5994,6 +5994,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, kvm->arch.exit_on_emulation_error =3D cap->args[0]; r =3D 0; break; + case KVM_CAP_MAX_VCPU_ID: + if (cap->args[0] <=3D KVM_MAX_VCPU_IDS) { + kvm->arch.max_vcpu_id =3D cap->args[0]; + r =3D 0; + } else + r =3D -E2BIG; + break; default: r =3D -EINVAL; break; @@ -11067,6 +11074,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) struct page *page; int r; =20 + if (vcpu->vcpu_id >=3D vcpu->kvm->arch.max_vcpu_id) + return -E2BIG; + vcpu->arch.last_vmentry_cpu =3D -1; vcpu->arch.regs_avail =3D ~0; vcpu->arch.regs_dirty =3D ~0; @@ -11589,6 +11599,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long= type) spin_lock_init(&kvm->arch.hv_root_tdp_lock); kvm->arch.hv_root_tdp =3D INVALID_PAGE; #endif + kvm->arch.max_vcpu_id =3D KVM_MAX_VCPU_IDS; =20 INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn); INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn); --=20 2.27.0 From nobody Tue Jun 23 22:30:54 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org 
[23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F00FC433EF for ; Fri, 25 Feb 2022 08:54:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238816AbiBYIyl (ORCPT ); Fri, 25 Feb 2022 03:54:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55444 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238758AbiBYIyb (ORCPT ); Fri, 25 Feb 2022 03:54:31 -0500 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1F539222193; Fri, 25 Feb 2022 00:54:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1645779240; x=1677315240; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=oeVLXWVPJ86PvAjYfho9TLMR4DXscOSnF7oYXF2Q+Ic=; b=cMv+RCYj/gEzQQMbUEhUbJpi4wE3OS0EJW/+2SphfT6te2PJrmB77DI0 fLvn4jK9nPX4D5JMmBmlIEnIlKE3xU4eJLvUROYGXS8nL5Tqcleq9xEYY qzAJ8qLHxYUugVkXhMQ2dq7kf+BK0H0zx+JvYp2lyrvCQLxD7F8wEkt+V zAfPMpjvsN//wz8IvcTm9p/LyJnyBvw8yC4SONekWJQmQkxgxJhObfzF+ cqOm5Z0UxbN/H5wyDjq6zUsQ2foK05PV29dxzn4xdYZHJtF1un7V9QxB1 hO+8ZZwcViBKXmTOF3Nv9C2tP2JzngiqOqw0qxpVt8BRuaHvpDmbAhAud Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10268"; a="250037416" X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="250037416" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:59 -0800 X-IronPort-AV: E=Sophos;i="5.90,136,1643702400"; d="scan'208";a="549186616" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Feb 2022 00:53:54 -0800 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. 
Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang Subject: [PATCH v6 9/9] KVM: VMX: Optimize memory allocation for PID-pointer table Date: Fri, 25 Feb 2022 16:22:23 +0800 Message-Id: <20220225082223.18288-10-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20220225082223.18288-1-guang.zeng@intel.com> References: <20220225082223.18288-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" KVM currently allocates 8 pages up front for the Posted-Interrupt Descriptor pointer (PID-pointer) table to accommodate vCPUs with APIC IDs up to KVM_MAX_VCPU_IDS - 1. This wastes memory because most VMs have fewer than 512 vCPUs and need only one page. If the userspace VMM specifies the maximum practical vCPU ID before creating vCPUs, IPIv can allocate only the memory the PID-pointer table actually needs and reduce the memory footprint of VMs.
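The page counts above follow from the 8-byte entry size. A minimal user-space sketch of the arithmetic, assuming 4 KiB pages: model_get_order() is a hypothetical stand-in that mimics the kernel's get_order() for non-zero sizes, and pid_table_pages() is likewise illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/*
 * Hypothetical stand-in for get_order(): the smallest order such that
 * (page size << order) >= size. Only meaningful for size > 0.
 */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	while (((size_t)1 << (MODEL_PAGE_SHIFT + order)) < size)
		order++;

	return order;
}

/* Pages allocated for a table covering vCPU IDs 0..max_vcpu_id - 1. */
static size_t pid_table_pages(uint32_t max_vcpu_id)
{
	size_t bytes = (size_t)max_vcpu_id * sizeof(uint64_t);

	return (size_t)1 << model_get_order(bytes);
}
```

With these assumptions, 512 entries fit in exactly one page, 4096 entries take the 8 pages mentioned above, and because allocations are rounded up to a power-of-two number of pages, 513 entries already take 2 pages.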
Suggested-by: Sean Christopherson Signed-off-by: Zeng Guang --- arch/x86/kvm/vmx/vmx.c | 45 ++++++++++++++++++++++++++++-------------- 1 file changed, 30 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 0cb141c277ef..22bfb4953289 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -230,9 +230,6 @@ static const struct { }; =20 #define L1D_CACHE_ORDER 4 - -/* PID(Posted-Interrupt Descriptor)-pointer table entry is 64-bit long */ -#define MAX_PID_TABLE_ORDER get_order(KVM_MAX_VCPU_IDS * sizeof(u64)) #define PID_TABLE_ENTRY_VALID 1 =20 static void *vmx_l1d_flush_pages; @@ -4434,6 +4431,24 @@ static u32 vmx_secondary_exec_control(struct vcpu_vm= x *vmx) return exec_control; } =20 +static int vmx_alloc_pid_table(struct kvm_vmx *kvm_vmx) +{ + struct page *pages; + + if(kvm_vmx->pid_table) + return 0; + + pages =3D alloc_pages(GFP_KERNEL | __GFP_ZERO, + get_order(kvm_vmx->kvm.arch.max_vcpu_id * sizeof(u64))); + + if (!pages) + return -ENOMEM; + + kvm_vmx->pid_table =3D (void *)page_address(pages); + kvm_vmx->pid_last_index =3D kvm_vmx->kvm.arch.max_vcpu_id - 1; + return 0; +} + #define VMX_XSS_EXIT_BITMAP 0 =20 static void init_vmcs(struct vcpu_vmx *vmx) @@ -7159,6 +7174,16 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu) goto free_vmcs; } =20 + if (enable_ipiv && kvm_vcpu_apicv_active(vcpu)) { + struct kvm_vmx *kvm_vmx =3D to_kvm_vmx(vcpu->kvm); + + mutex_lock(&vcpu->kvm->lock); + err =3D vmx_alloc_pid_table(kvm_vmx); + mutex_unlock(&vcpu->kvm->lock); + if (err) + goto free_vmcs; + } + return 0; =20 free_vmcs: @@ -7202,17 +7227,6 @@ static int vmx_vm_init(struct kvm *kvm) } } =20 - if (enable_ipiv) { - struct page *pages; - - pages =3D alloc_pages(GFP_KERNEL | __GFP_ZERO, MAX_PID_TABLE_ORDER); - if (!pages) - return -ENOMEM; - - to_kvm_vmx(kvm)->pid_table =3D (void *)page_address(pages); - to_kvm_vmx(kvm)->pid_last_index =3D KVM_MAX_VCPU_IDS - 1; - } - return 0; } =20 @@ -7809,7 +7823,8 @@ static void 
vmx_vm_destroy(struct kvm *kvm) struct kvm_vmx *kvm_vmx =3D to_kvm_vmx(kvm); =20 if (kvm_vmx->pid_table) - free_pages((unsigned long)kvm_vmx->pid_table, MAX_PID_TABLE_ORDER); + free_pages((unsigned long)kvm_vmx->pid_table, + get_order((kvm_vmx->pid_last_index + 1) * sizeof(u64))); } =20 static struct kvm_x86_ops vmx_x86_ops __initdata =3D { --=20 2.27.0