From nobody Wed Jul 1 04:16:44 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34855C433FE for ; Fri, 31 Dec 2021 14:59:02 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230496AbhLaO67 (ORCPT ); Fri, 31 Dec 2021 09:58:59 -0500
Received: from mga02.intel.com ([134.134.136.20]:14076 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229474AbhLaO65 (ORCPT ); Fri, 31 Dec 2021 09:58:57 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640962737; x=1672498737; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=3pU68yOPBfgSPZj6Jckk7INJBpH2nTc4/p8+Eo8w+R8=; b=njF/81ZraWYfnTuSbfv7zHXFDP2IMi5/RXy66TqV0aDB1xvxT2s1OwNX 9Q+s6FhsyzUEB/9EhZZZmGg24+j7EcByGotGr4GUulaO9LYodRQjBh4gJ kpzKCnIDF++MOtZnyh7XuuHFTpe4nCJ2p8X4cjQiotOnw0QZgetDrqGiE MW+A7o7eD60MknnXvr+9qRz4I2riCFKRy2jyb7OgBqzyoLtG6LJluR6vI KoAWjmvGcZxIN5uXhdBART/mDPAtqu9NQ/P2qHeocuPzMCONuiNCg9T6d awb1qrEW33ewCb0i0rpe9zVn84jq01OQkV/xia0e+vn737NLRodke5Qnc A==;
X-IronPort-AV: E=McAfee;i="6200,9189,10213"; a="229132977"
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="229132977"
Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:58:57 -0800
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="524758399"
Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:58:51 -0800
From: Zeng Guang
To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, Dave Hansen,
	Tony Luck, Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	"H. Peter Anvin", Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu, Gao Chao,
	Zeng Guang, Robert Hoo
Subject: [PATCH v5 1/8] x86/cpu: Add new VMX feature, Tertiary VM-Execution control
Date: Fri, 31 Dec 2021 22:28:42 +0800
Message-Id: <20211231142849.611-2-guang.zeng@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211231142849.611-1-guang.zeng@intel.com>
References: <20211231142849.611-1-guang.zeng@intel.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Robert Hoo

A new 64-bit control field, "tertiary processor-based VM-execution
controls", is defined [1]. It is controlled by bit 17 of the primary
processor-based VM-execution controls.

Unlike the other VM-execution control fields, the tertiary VM-execution
controls field is 64 bits wide, so it occupies two vmx_feature_leafs,
TERTIARY_CTLS_LOW and TERTIARY_CTLS_HIGH.

Its companion VMX capability reporting MSR, MSR_IA32_VMX_PROCBASED_CTLS3
(0x492), also differs semantically from the other capability MSRs: all
64 bits are allowed-1 settings, rather than 32 bits of allowed-0 and
32 bits of allowed-1 settings [1][2]. Therefore, init_vmx_capabilities()
handles it slightly differently from the others.

[1] ISE 6.2 "VMCS Changes"
https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
[2] SDM Vol3. Appendix A.3

Signed-off-by: Robert Hoo
Signed-off-by: Zeng Guang
Reviewed-by: Sean Christopherson
---
 arch/x86/include/asm/msr-index.h   | 1 +
 arch/x86/include/asm/vmxfeatures.h | 3 ++-
 arch/x86/kernel/cpu/feat_ctl.c     | 9 ++++++++-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 01e2650b9585..4914de76ea51 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -921,6 +921,7 @@
 #define MSR_IA32_VMX_TRUE_EXIT_CTLS	0x0000048f
 #define MSR_IA32_VMX_TRUE_ENTRY_CTLS	0x00000490
 #define MSR_IA32_VMX_VMFUNC		0x00000491
+#define MSR_IA32_VMX_PROCBASED_CTLS3	0x00000492
 
 /* VMX_BASIC bits and bitmasks */
 #define VMX_BASIC_VMCS_SIZE_SHIFT	32
diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
index d9a74681a77d..ff20776dc83b 100644
--- a/arch/x86/include/asm/vmxfeatures.h
+++ b/arch/x86/include/asm/vmxfeatures.h
@@ -5,7 +5,7 @@
 /*
  * Defines VMX CPU feature bits
  */
-#define NVMXINTS			3 /* N 32-bit words worth of info */
+#define NVMXINTS			5 /* N 32-bit words worth of info */
 
 /*
  * Note: If the comment begins with a quoted string, that string is used
@@ -43,6 +43,7 @@
 #define VMX_FEATURE_RDTSC_EXITING	( 1*32+ 12) /* "" VM-Exit on RDTSC */
 #define VMX_FEATURE_CR3_LOAD_EXITING	( 1*32+ 15) /* "" VM-Exit on writes to CR3 */
 #define VMX_FEATURE_CR3_STORE_EXITING	( 1*32+ 16) /* "" VM-Exit on reads from CR3 */
+#define VMX_FEATURE_TERTIARY_CONTROLS	( 1*32+ 17) /* "" Enable Tertiary VM-Execution Controls */
 #define VMX_FEATURE_CR8_LOAD_EXITING	( 1*32+ 19) /* "" VM-Exit on writes to CR8 */
 #define VMX_FEATURE_CR8_STORE_EXITING	( 1*32+ 20) /* "" VM-Exit on reads from CR8 */
 #define VMX_FEATURE_VIRTUAL_TPR		( 1*32+ 21) /* "vtpr" TPR virtualization, a.k.a. TPR shadow */
diff --git a/arch/x86/kernel/cpu/feat_ctl.c b/arch/x86/kernel/cpu/feat_ctl.c
index da696eb4821a..993697e71854 100644
--- a/arch/x86/kernel/cpu/feat_ctl.c
+++ b/arch/x86/kernel/cpu/feat_ctl.c
@@ -15,6 +15,8 @@ enum vmx_feature_leafs {
 	MISC_FEATURES = 0,
 	PRIMARY_CTLS,
 	SECONDARY_CTLS,
+	TERTIARY_CTLS_LOW,
+	TERTIARY_CTLS_HIGH,
 	NR_VMX_FEATURE_WORDS,
 };
 
@@ -22,7 +24,7 @@ enum vmx_feature_leafs {
 
 static void init_vmx_capabilities(struct cpuinfo_x86 *c)
 {
-	u32 supported, funcs, ept, vpid, ign;
+	u32 supported, funcs, ept, vpid, ign, low, high;
 
 	BUILD_BUG_ON(NVMXINTS != NR_VMX_FEATURE_WORDS);
 
@@ -42,6 +44,11 @@ static void init_vmx_capabilities(struct cpuinfo_x86 *c)
 	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS2, &ign, &supported);
 	c->vmx_capability[SECONDARY_CTLS] = supported;
 
+	/* All 64 bits of tertiary controls MSR are allowed-1 settings. */
+	rdmsr_safe(MSR_IA32_VMX_PROCBASED_CTLS3, &low, &high);
+	c->vmx_capability[TERTIARY_CTLS_LOW] = low;
+	c->vmx_capability[TERTIARY_CTLS_HIGH] = high;
+
 	rdmsr(MSR_IA32_VMX_PINBASED_CTLS, ign, supported);
 	rdmsr_safe(MSR_IA32_VMX_VMFUNC, &ign, &funcs);
 
-- 
2.27.0

From nobody Wed Jul 1 04:16:44 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1047DC433F5 for ; Fri, 31 Dec 2021 14:59:10 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231127AbhLaO7H (ORCPT ); Fri, 31 Dec 2021 09:59:07 -0500
Received: from mga18.intel.com ([134.134.136.126]:29026 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229474AbhLaO7D (ORCPT ); Fri, 31 Dec 2021 09:59:03 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640962743; x=1672498743; h=from:to:cc:subject:date:message-id:in-reply-to: references;
bh=OVSz8IsiD1AyzuC6JkviSS4vl4sM3klnHE+FMiHvbOo=; b=WnNkvYqa7wR3PxQj/SI8/YeG196h8Hh2MfDLjPJtqRioN6k7VBObzl8e 1q7N6TFZ1qUT8PBbduYzAR3DEycTA207S3Owj6bwcWW/KfCuTXEQJtaAQ PAQtX0BJgsuY5zhAKIttL+hlbwqTS8yGKDQlOtN5hlIBvylbI+eVg01xb 3c+Ps2bEhK0u99C+boHP2ndYjOT5olFbdkFZeyM7YLtMsWuS7fcf+wsrS h69Pg6L8NbvcKjMMcpL0sULm13hlRgOmM7H54WFbMmjd/Uu42pJYQXD69 hKl4Fo0qNt4Fd+ZLwhthPdJG0x9Xp90J3pg/LyFa7TXBH3UuAd5i2tmKx g==;
X-IronPort-AV: E=McAfee;i="6200,9189,10213"; a="228641845"
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="228641845"
Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:03 -0800
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="524758418"
Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:58:57 -0800
From: Zeng Guang
To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, Dave Hansen,
	Tony Luck, Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	"H. Peter Anvin", Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu, Gao Chao,
	Zeng Guang, Robert Hoo
Subject: [PATCH v5 2/8] KVM: VMX: Extend BUILD_CONTROLS_SHADOW macro to support 64-bit variation
Date: Fri, 31 Dec 2021 22:28:43 +0800
Message-Id: <20211231142849.611-3-guang.zeng@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211231142849.611-1-guang.zeng@intel.com>
References: <20211231142849.611-1-guang.zeng@intel.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Robert Hoo

Unlike the existing control fields, the tertiary VM-execution control
field is 64 bits wide. Extend BUILD_CONTROLS_SHADOW() with a 'bits'
parameter so that it can generate the auxiliary functions for both
32-bit and 64-bit fields.

Suggested-by: Sean Christopherson
Signed-off-by: Robert Hoo
Signed-off-by: Zeng Guang
---
 arch/x86/kvm/vmx/vmx.h | 59 ++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 4df2ac24ffc1..07e1753225bf 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -443,35 +443,38 @@ static inline u8 vmx_get_rvi(void)
 	return vmcs_read16(GUEST_INTR_STATUS) & 0xff;
 }
 
-#define BUILD_CONTROLS_SHADOW(lname, uname)				    \
-static inline void lname##_controls_set(struct vcpu_vmx *vmx, u32 val)	    \
-{									    \
-	if (vmx->loaded_vmcs->controls_shadow.lname != val) {		    \
-		vmcs_write32(uname, val);				    \
-		vmx->loaded_vmcs->controls_shadow.lname = val;		    \
-	}								    \
-}									    \
-static inline u32 __##lname##_controls_get(struct loaded_vmcs *vmcs)	    \
-{									    \
-	return vmcs->controls_shadow.lname;				    \
-}									    \
-static inline u32 lname##_controls_get(struct vcpu_vmx *vmx)		    \
-{									    \
-	return __##lname##_controls_get(vmx->loaded_vmcs);		    \
-}									    \
-static inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u32 val)   \
-{									    \
-	lname##_controls_set(vmx, lname##_controls_get(vmx) | val);	    \
-}									    \
-static inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u32 val) \
-{									    \
-	lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val);	    \
+#define BUILD_CONTROLS_SHADOW(lname, uname, bits)			\
+static inline								\
+void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val)		\
+{									\
+	if (vmx->loaded_vmcs->controls_shadow.lname != val) {		\
+		vmcs_write##bits(uname, val);				\
+		vmx->loaded_vmcs->controls_shadow.lname = val;		\
+	}								\
+}									\
+static inline u##bits __##lname##_controls_get(struct loaded_vmcs *vmcs)\
+{									\
+	return vmcs->controls_shadow.lname;				\
+}									\
+static inline u##bits lname##_controls_get(struct vcpu_vmx *vmx)	\
+{									\
+	return __##lname##_controls_get(vmx->loaded_vmcs);		\
+}									\
+static inline								\
+void lname##_controls_setbit(struct vcpu_vmx *vmx, u##bits val)	\
+{									\
+	lname##_controls_set(vmx, lname##_controls_get(vmx) | val);	\
+}									\
+static inline								\
+void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##bits val)	\
+{									\
+	lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val);	\
 }
-BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS)
-BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS)
-BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL)
-BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL)
-BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL)
+BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32)
+BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32)
+BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL, 32)
+BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL, 32)
+BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL, 32)
 
 static inline void vmx_register_cache_reset(struct kvm_vcpu *vcpu)
 {
-- 
2.27.0

From nobody Wed Jul 1 04:16:44 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15926C433FE for ; Fri, 31 Dec 2021 14:59:16 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231149AbhLaO7O (ORCPT ); Fri, 31 Dec 2021 09:59:14 -0500
Received: from mga18.intel.com ([134.134.136.126]:29033 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230516AbhLaO7J (ORCPT ); Fri, 31 Dec 2021 09:59:09 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640962749; x=1672498749; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=kV6uZpZsPoQezpqTs1llnARoklxbYvyakusC1X0vRSg=; b=W2HSXv+lNUc/yDXTfyHiuy0LkabVOyz95cvp/wYTSVjwRfFAnnnym4oO xuxMf46y4YCse20p4hwlrTo8OpP4DyOPQoh88TdXsyhDnHvfVXmk2Cbxo
IH434SUCLoELeqA6hm3y6w98Fg/fEvdIGReybbp3CY3My3RIS4gC6IbnQ AOAmZzgy6SxAXLSSTPO5KucysF7dWldr3vZzFNSX/wtgkv8+qqccTzBob XdUxW5Nn64DJit8ztv2OxmxeXuQJEoqFsJSwjwlR5t4vRVWjtw7DnDhi2 +XyI0aS1kx+VQ4gtEnxLPYdNo+9Gu6AL2am3rm6k3Df3IehRK+Tj4Qg8R Q==;
X-IronPort-AV: E=McAfee;i="6200,9189,10213"; a="228641856"
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="228641856"
Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:09 -0800
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="524758439"
Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:03 -0800
From: Zeng Guang
To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, Dave Hansen,
	Tony Luck, Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	"H. Peter Anvin", Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu, Gao Chao,
	Zeng Guang, Robert Hoo
Subject: [PATCH v5 3/8] KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config
Date: Fri, 31 Dec 2021 22:28:44 +0800
Message-Id: <20211231142849.611-4-guang.zeng@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211231142849.611-1-guang.zeng@intel.com>
References: <20211231142849.611-1-guang.zeng@intel.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Robert Hoo

Check for the tertiary VM-execution controls when setting up the VMCS
config. The sub-features to enable are adjusted according to hardware
capabilities, although this patch does not enable any of them yet.

EVMCSv1 doesn't support the tertiary VM-execution controls, so disable
them when EVMCSv1 is in use. Also define the auxiliary functions for the
tertiary control field here, using the new BUILD_CONTROLS_SHADOW().

Signed-off-by: Robert Hoo
Signed-off-by: Zeng Guang
---
 arch/x86/include/asm/vmx.h      |  3 +++
 arch/x86/kvm/vmx/capabilities.h |  7 ++++++
 arch/x86/kvm/vmx/evmcs.c        |  2 ++
 arch/x86/kvm/vmx/evmcs.h        |  1 +
 arch/x86/kvm/vmx/vmcs.h         |  1 +
 arch/x86/kvm/vmx/vmx.c          | 38 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.h          |  1 +
 7 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 0ffaa3156a4e..8c929596a299 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -31,6 +31,7 @@
 #define CPU_BASED_RDTSC_EXITING			VMCS_CONTROL_BIT(RDTSC_EXITING)
 #define CPU_BASED_CR3_LOAD_EXITING		VMCS_CONTROL_BIT(CR3_LOAD_EXITING)
 #define CPU_BASED_CR3_STORE_EXITING		VMCS_CONTROL_BIT(CR3_STORE_EXITING)
+#define CPU_BASED_ACTIVATE_TERTIARY_CONTROLS	VMCS_CONTROL_BIT(TERTIARY_CONTROLS)
 #define CPU_BASED_CR8_LOAD_EXITING		VMCS_CONTROL_BIT(CR8_LOAD_EXITING)
 #define CPU_BASED_CR8_STORE_EXITING		VMCS_CONTROL_BIT(CR8_STORE_EXITING)
 #define CPU_BASED_TPR_SHADOW			VMCS_CONTROL_BIT(VIRTUAL_TPR)
@@ -221,6 +222,8 @@ enum vmcs_field {
 	ENCLS_EXITING_BITMAP_HIGH	= 0x0000202F,
 	TSC_MULTIPLIER			= 0x00002032,
 	TSC_MULTIPLIER_HIGH		= 0x00002033,
+	TERTIARY_VM_EXEC_CONTROL	= 0x00002034,
+	TERTIARY_VM_EXEC_CONTROL_HIGH	= 0x00002035,
 	GUEST_PHYSICAL_ADDRESS		= 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH	= 0x00002401,
 	VMCS_LINK_POINTER		= 0x00002800,
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 4705ad55abb5..38d414f64e61 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -59,6 +59,7 @@ struct vmcs_config {
 	u32 pin_based_exec_ctrl;
 	u32 cpu_based_exec_ctrl;
 	u32 cpu_based_2nd_exec_ctrl;
+	u64 cpu_based_3rd_exec_ctrl;
 	u32 vmexit_ctrl;
 	u32 vmentry_ctrl;
 	struct nested_vmx_msrs nested;
@@ -131,6 +132,12 @@ static inline bool cpu_has_secondary_exec_ctrls(void)
 		CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
 }
 
+static inline bool cpu_has_tertiary_exec_ctrls(void)
+{
+	return vmcs_config.cpu_based_exec_ctrl &
+		CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
+}
+
 static inline bool cpu_has_vmx_virtualize_apic_accesses(void)
 {
 	return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
index ba6f99f584ac..03e7c80186fb 100644
--- a/arch/x86/kvm/vmx/evmcs.c
+++ b/arch/x86/kvm/vmx/evmcs.c
@@ -298,8 +298,10 @@ const unsigned int nr_evmcs_1_fields = ARRAY_SIZE(vmcs_field_to_evmcs_1);
 
 __init void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
 {
+	vmcs_conf->cpu_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_EXEC_CTRL;
 	vmcs_conf->pin_based_exec_ctrl &= ~EVMCS1_UNSUPPORTED_PINCTRL;
 	vmcs_conf->cpu_based_2nd_exec_ctrl &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
+	vmcs_conf->cpu_based_3rd_exec_ctrl = 0;
 
 	vmcs_conf->vmexit_ctrl &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
 	vmcs_conf->vmentry_ctrl &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 16731d2cf231..65fd2b9f893c 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -50,6 +50,7 @@ DECLARE_STATIC_KEY_FALSE(enable_evmcs);
  */
 #define EVMCS1_UNSUPPORTED_PINCTRL (PIN_BASED_POSTED_INTR | \
 				    PIN_BASED_VMX_PREEMPTION_TIMER)
+#define EVMCS1_UNSUPPORTED_EXEC_CTRL (CPU_BASED_ACTIVATE_TERTIARY_CONTROLS)
 #define EVMCS1_UNSUPPORTED_2NDEXEC					\
 	(SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |				\
 	 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |			\
diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
index 6e5de2e2b0da..b9d18cfcf837 100644
--- a/arch/x86/kvm/vmx/vmcs.h
+++ b/arch/x86/kvm/vmx/vmcs.h
@@ -50,6 +50,7 @@ struct vmcs_controls_shadow {
 	u32 pin;
 	u32 exec;
 	u32 secondary_exec;
+	u64 tertiary_exec;
 };
 
 /*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5aadad3e7367..fb0f600368c6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2383,6 +2383,21 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
 	return 0;
 }
 
+static __init int adjust_vmx_controls_64(u64 ctl_min, u64 ctl_opt,
+					 u32 msr, u64 *result)
+{
+	u64 allowed1;
+
+	rdmsrl(msr, allowed1);
+
+	/* Ensure minimum (required) set of control bits are supported. */
+	if (ctl_min & ~allowed1)
+		return -EIO;
+
+	*result = (ctl_min | ctl_opt) & allowed1;
+	return 0;
+}
+
 static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 				    struct vmx_capability *vmx_cap)
 {
@@ -2391,6 +2406,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	u32 _pin_based_exec_control = 0;
 	u32 _cpu_based_exec_control = 0;
 	u32 _cpu_based_2nd_exec_control = 0;
+	u64 _cpu_based_3rd_exec_control = 0;
 	u32 _vmexit_control = 0;
 	u32 _vmentry_control = 0;
 
@@ -2412,7 +2428,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
-	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
+	      CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
 				&_cpu_based_exec_control) < 0)
 		return -EIO;
@@ -2486,6 +2503,16 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 				"1-setting enable VPID VM-execution control\n");
 	}
 
+	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
+		u64 opt3 = 0;
+		u64 min3 = 0;
+
+		if (adjust_vmx_controls_64(min3, opt3,
+					   MSR_IA32_VMX_PROCBASED_CTLS3,
+					   &_cpu_based_3rd_exec_control))
+			return -EIO;
+	}
+
 	min = VM_EXIT_SAVE_DEBUG_CONTROLS | VM_EXIT_ACK_INTR_ON_EXIT;
 #ifdef CONFIG_X86_64
 	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
@@ -2573,6 +2600,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control;
 	vmcs_conf->cpu_based_exec_ctrl = _cpu_based_exec_control;
 	vmcs_conf->cpu_based_2nd_exec_ctrl = _cpu_based_2nd_exec_control;
+	vmcs_conf->cpu_based_3rd_exec_ctrl = _cpu_based_3rd_exec_control;
 	vmcs_conf->vmexit_ctrl = _vmexit_control;
 	vmcs_conf->vmentry_ctrl = _vmentry_control;
 
@@ -4128,6 +4156,11 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
 	return exec_control;
 }
 
+static u64 vmx_tertiary_exec_control(struct vcpu_vmx *vmx)
+{
+	return vmcs_config.cpu_based_3rd_exec_ctrl;
+}
+
 /*
  * Adjust a single secondary execution control bit to intercept/allow an
  * instruction in the guest. This is usually done based on whether or not a
@@ -4293,6 +4326,9 @@ static void init_vmcs(struct vcpu_vmx *vmx)
 	if (cpu_has_secondary_exec_ctrls())
 		secondary_exec_controls_set(vmx, vmx_secondary_exec_control(vmx));
 
+	if (cpu_has_tertiary_exec_ctrls())
+		tertiary_exec_controls_set(vmx, vmx_tertiary_exec_control(vmx));
+
 	if (kvm_vcpu_apicv_active(&vmx->vcpu)) {
 		vmcs_write64(EOI_EXIT_BITMAP0, 0);
 		vmcs_write64(EOI_EXIT_BITMAP1, 0);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 07e1753225bf..ee94068ca8fb 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -475,6 +475,7 @@ BUILD_CONTROLS_SHADOW(vm_exit, VM_EXIT_CONTROLS, 32)
 BUILD_CONTROLS_SHADOW(pin, PIN_BASED_VM_EXEC_CONTROL, 32)
 BUILD_CONTROLS_SHADOW(exec, CPU_BASED_VM_EXEC_CONTROL, 32)
 BUILD_CONTROLS_SHADOW(secondary_exec, SECONDARY_VM_EXEC_CONTROL, 32)
+BUILD_CONTROLS_SHADOW(tertiary_exec, TERTIARY_VM_EXEC_CONTROL, 64)
 
 static inline void vmx_register_cache_reset(struct kvm_vcpu *vcpu)
 {
-- 
2.27.0

From nobody Wed Jul 1 04:16:44 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A36DC433F5 for ; Fri, 31 Dec 2021 14:59:25 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230512AbhLaO7W
(ORCPT ); Fri, 31 Dec 2021 09:59:22 -0500
Received: from mga03.intel.com ([134.134.136.65]:47298 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231154AbhLaO7P (ORCPT ); Fri, 31 Dec 2021 09:59:15 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640962755; x=1672498755; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=VFfD9eTpxtjDaqvPE492xPzn5UqT5dKTnIj6sllM0iQ=; b=gdoCzp9FTl60e+tmw9/S1WMIfkk0aw+KhxbYabIMhGW/8o96rTA1cU62 oaeH9YTQuBG0tnrCVbeoiQn6B+Oz7iHhQgwVfA9uf8J1reB6eg3sdrDrA wzgg3Pe1RE0oR6YyARUEWQDfIfLOBAlDQll4aDNbwkkTzS7D50isozSF8 bCOY2pRlBsgXf3CYT4xI3KV+k1aNLcM1OXbN6zZoroG3hOEFIhCPK9CaZ OL/TkDGYwHn19ywDydSWWNF0FIEPpHkVtWINGbCN8KVAdnUjPQkqBM+jg hJxHGJXianEqNuUhojnEH+fcu8mQ83RZ/8LeCNEyg6zqQAoPUFOW0u6fy A==;
X-IronPort-AV: E=McAfee;i="6200,9189,10213"; a="241773897"
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="241773897"
Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:14 -0800
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="524758450"
Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:09 -0800
From: Zeng Guang
To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, Dave Hansen,
	Tony Luck, Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	"H. Peter Anvin", Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu, Gao Chao,
	Zeng Guang, Robert Hoo
Subject: [PATCH v5 4/8] KVM: VMX: dump_vmcs() reports tertiary_exec_control field as well
Date: Fri, 31 Dec 2021 22:28:45 +0800
Message-Id: <20211231142849.611-5-guang.zeng@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211231142849.611-1-guang.zeng@intel.com>
References: <20211231142849.611-1-guang.zeng@intel.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Robert Hoo

Report the tertiary_exec_control field in dump_vmcs().

Signed-off-by: Robert Hoo
Signed-off-by: Zeng Guang
---
 arch/x86/kvm/vmx/vmx.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fb0f600368c6..5716db9704c0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5729,6 +5729,7 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 vmentry_ctl, vmexit_ctl;
 	u32 cpu_based_exec_ctrl, pin_based_exec_ctrl, secondary_exec_control;
+	u64 tertiary_exec_control = 0;
 	unsigned long cr4;
 	int efer_slot;
 
@@ -5746,6 +5747,9 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 	if (cpu_has_secondary_exec_ctrls())
 		secondary_exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
 
+	if (cpu_has_tertiary_exec_ctrls())
+		tertiary_exec_control = vmcs_read64(TERTIARY_VM_EXEC_CONTROL);
+
 	pr_err("VMCS %p, last attempted VM-entry on CPU %d\n",
 	       vmx->loaded_vmcs->vmcs, vcpu->arch.last_vmentry_cpu);
 	pr_err("*** Guest State ***\n");
@@ -5844,8 +5848,9 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
 		vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
 
 	pr_err("*** Control State ***\n");
-	pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n",
-	       pin_based_exec_ctrl, cpu_based_exec_ctrl, secondary_exec_control);
+	pr_err("PinBased=0x%08x CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
+	       pin_based_exec_ctrl, cpu_based_exec_ctrl, secondary_exec_control,
+	       tertiary_exec_control);
 	pr_err("EntryControls=%08x ExitControls=%08x\n", vmentry_ctl, vmexit_ctl);
 	pr_err("ExceptionBitmap=%08x PFECmask=%08x PFECmatch=%08x\n",
 	       vmcs_read32(EXCEPTION_BITMAP),
-- 
2.27.0

From nobody Wed Jul 1 04:16:44 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9559DC433EF for ; Fri, 31 Dec 2021 14:59:35 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231243AbhLaO7e (ORCPT ); Fri, 31 Dec 2021 09:59:34 -0500
Received: from mga06.intel.com ([134.134.136.31]:23761 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231166AbhLaO7V (ORCPT ); Fri, 31 Dec 2021 09:59:21 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640962760; x=1672498760; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=GYYW4VDo4aLjLd+zWJjAaLbWaXT9U1/xExbUlzLY79w=; b=j8t/RnGFAgRfEh+USF9E6wnjVRmLRdJEZH72xn+kYjrm2g+nbW5nJh/R pN9sGSJftzNDa8LXW4owLtnf1X0sbfmg7a+E+IqT2har0VSPzYdzIkRKD BYFuYbmdRWRtGYfscfCN7xizD/f5zwo6hJ6joMuAgDx6ey6gRZMGiy7+A xmiX2zfozbFLrGSAH0PcMEOuZ8vagPhBAtmlWonyapM2D1QEtGGH3ZW0o T2kgHD8h5XZISWY2em7+rqT8OMg+1qMGbtRxYGra7IGDs8LjTTJ/X1q96 CIQhgeVUkXput6POvmR69e5sKfmVfDAx+ByhF/YAZT5PVHw5CddwQD7XP g==;
X-IronPort-AV: E=McAfee;i="6200,9189,10213"; a="302569970"
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="302569970"
Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:20 -0800
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="524758458"
Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:14 -0800
From: Zeng Guang
To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, Dave Hansen,
	Tony Luck, Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	"H. Peter Anvin", Kim Phillips, Jarkko Sakkinen, Jethro Beekman,
	Kai Huang
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu, Gao Chao,
	Zeng Guang
Subject: [PATCH v5 5/8] KVM: x86: Support interrupt dispatch in x2APIC mode with APIC-write VM exit
Date: Fri, 31 Dec 2021 22:28:46 +0800
Message-Id: <20211231142849.611-6-guang.zeng@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20211231142849.611-1-guang.zeng@intel.com>
References: <20211231142849.611-1-guang.zeng@intel.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

In VMX non-root operation, new behavior applies to the virtualization of
WRMSR to the vICR in x2APIC mode. Depending on the settings of the
VM-execution controls, the CPU may write the 64-bit value to offset 300H
on the virtual-APIC page (vICR) and then produce an APIC-write VM-exit.
KVM needs to retrieve the value written by the CPU and emulate the vICR
write to deliver the interrupt.

KVM currently does not handle a 64-bit vICR value in the trap-like
APIC-write VM-exit: emulating a write to APIC_ICR with
kvm_lapic_reg_write() requires that APIC_ICR2 has already been
programmed correctly. In the APIC-write VM-exit above, however, the CPU
writes all 64 bits at APIC_ICR rather than programming the upper and
lower 32 bits into APIC_ICR2 and APIC_ICR respectively. So KVM needs to
retrieve the whole 64-bit value and program the upper 32 bits into
APIC_ICR2 first.
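
The ordering requirement described above can be illustrated with a
minimal user-space sketch. This is not the KVM implementation:
struct sketch_apic, sketch_reg_write() and sketch_write_nodecode() are
hypothetical stand-ins, and the virtual-APIC page is modeled as a plain
byte array. The sketch only assumes, as the text states, that the ICR
write path takes its destination from whatever ICR2 currently holds.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical offsets for this sketch: ICR at 300H, ICR2 at 310H. */
#define SKETCH_APIC_ICR   0x300u
#define SKETCH_APIC_ICR2  0x310u

/* Hypothetical stand-in for the virtual-APIC page plus the last emulated IPI. */
struct sketch_apic {
	uint8_t  regs[4096];
	uint32_t sent_dest;     /* ICR2 contents at the time ICR was written */
	uint32_t sent_icr_low;  /* value written to ICR */
	int      ipis_sent;
};

/* Models a 32-bit register write; a write to ICR "sends" an IPI. */
static void sketch_reg_write(struct sketch_apic *apic, uint32_t off, uint32_t val)
{
	memcpy(apic->regs + off, &val, sizeof(val));
	if (off == SKETCH_APIC_ICR) {
		/* The destination comes from ICR2, so ICR2 must be valid by now. */
		memcpy(&apic->sent_dest, apic->regs + SKETCH_APIC_ICR2,
		       sizeof(apic->sent_dest));
		apic->sent_icr_low = val;
		apic->ipis_sent++;
	}
}

/* Trap-like emulation: the value is already on the page when we get here. */
static void sketch_write_nodecode(struct sketch_apic *apic, uint32_t off, int x2apic)
{
	uint64_t val = 0;
	uint32_t lo = 0;

	off &= 0xff0;
	if (off == SKETCH_APIC_ICR && x2apic) {
		/* 64-bit vICR: program the upper half into ICR2 first. */
		memcpy(&val, apic->regs + off, sizeof(val));
		sketch_reg_write(apic, SKETCH_APIC_ICR2, (uint32_t)(val >> 32));
	} else {
		memcpy(&lo, apic->regs + off, sizeof(lo));
		val = lo;
	}
	sketch_reg_write(apic, off, (uint32_t)val);
}
```

If the ICR2 write were skipped, sent_dest would reflect a stale ICR2
value rather than the upper 32 bits written by the CPU.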

Signed-off-by: Zeng Guang
---
 arch/x86/kvm/lapic.c | 12 +++++++++---
 arch/x86/kvm/lapic.h |  5 +++++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f206fc35deff..3ce7142ba00e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2186,15 +2186,21 @@ EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
 /* emulate APIC access in a trap manner */
 void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 {
-	u32 val = 0;
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	u64 val = 0;
 
 	/* hw has done the conditional check and inst decode */
 	offset &= 0xff0;
 
-	kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
+	/* exception dealing with 64bit data on vICR in x2apic mode */
+	if ((offset == APIC_ICR) && apic_x2apic_mode(apic)) {
+		val = kvm_lapic_get_reg64(apic, offset);
+		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(val>>32));
+	} else
+		kvm_lapic_reg_read(apic, offset, 4, &val);
 
 	/* TODO: optimize to just emulate side effect w/o one more write */
-	kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
+	kvm_lapic_reg_write(apic, offset, (u32)val);
 }
 EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 2b44e533fc8d..91864e401a64 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -158,6 +158,11 @@ static inline u32 kvm_lapic_get_reg(struct kvm_lapic *apic, int reg_off)
 	return *((u32 *) (apic->regs + reg_off));
 }
 
+static inline u64 kvm_lapic_get_reg64(struct kvm_lapic *apic, int reg_off)
+{
+	return *((u64 *) (apic->regs + reg_off));
+}
+
 static inline void __kvm_lapic_set_reg(char *regs, int reg_off, u32 val)
 {
 	*((u32 *) (regs + reg_off)) = val;
-- 
2.27.0

From nobody Wed Jul 1 04:16:44 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4BFD9C433F5 for ; Fri, 31 Dec 2021 14:59:52 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231192AbhLaO7v (ORCPT ); Fri, 31 Dec 2021 09:59:51 -0500
Received: from mga01.intel.com ([192.55.52.88]:17561 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231197AbhLaO71 (ORCPT ); Fri, 31 Dec 2021 09:59:27 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640962767; x=1672498767; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=lLNpEPeCbg7gqzROgjx2JisXx9uznvx/oFvCKKg5h78=; b=ju9qwqSTzU87yepcaJ5uxzm8P3H9y9ZP768t3kMa2kLVHzaOc6X/hCty EamdpK3hbtFov6w3j9+YUqsVQw88p9KAYmuktqSYOKNSdiBeo0ixUW59t zu5/l/lepu7jI8UhKWezPGJQBERex6/9T5U9mWZugazsLGR/lWmyg5ckR /yeTCVIrBXDbmooNthzWjzP8dLDrTy/CWtzsvPjKC22nExuhEga+HmWGo XC+wVpTljMuGAE2spIhExo218OcC2m2NgDoOgLMkUEL7nItm+W/Pjx2PZ CHYPlGcmdCrcOPxEwlamdeiCQAEJnNJtZlNG4bEMNylgcFblXMnVpYdDm w==;
X-IronPort-AV: E=McAfee;i="6200,9189,10213"; a="266070176"
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="266070176"
Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:26 -0800
X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="524758480"
Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:20 -0800
From: Zeng Guang
To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, Dave Hansen,
	Tony Luck, Kan Liang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	"H.
Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang Subject: [PATCH v5 6/8] KVM: VMX: enable IPI virtualization Date: Fri, 31 Dec 2021 22:28:47 +0800 Message-Id: <20211231142849.611-7-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211231142849.611-1-guang.zeng@intel.com> References: <20211231142849.611-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Gao Chao With IPI virtualization enabled, the processor emulates writes to APIC registers that would send IPIs. The processor sets the bit corresponding to the vector in the target vCPU's PIR and may send a notification (IPI) specified by the NDST and NV fields in the target vCPU's Posted-Interrupt Descriptor (PID). This is similar to what the IOMMU engine does when handling posted interrupts from devices. The processor uses a PID-pointer table to locate the PID of a vCPU by the vCPU's APIC ID. As with VT-d PI, if a vCPU enters the blocked state, the VMM needs to switch its notification vector to the wakeup vector. This ensures that when an IPI for a blocked vCPU arrives, the VMM gets control and can wake up the blocked vCPU. If a vCPU is preempted, its posted-interrupt notification is suppressed. Note that IPI virtualization can only virtualize physical-addressing, flat mode, unicast IPIs. Sending other IPIs still causes a trap-like APIC-write VM-exit and must be handled by the VMM.
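For illustration only (not part of the patch): a minimal user-space sketch of how a PID-pointer table entry is formed and looked up by APIC ID, assuming each entry is the PID's physical address with bit 0 used as the valid flag and that a PID is 64-byte aligned. pid_table_make_entry() and pid_table_lookup() are hypothetical helpers; returning 0 is this sketch's way of saying "no usable entry", standing in for the fallback to a VM exit.

```c
#include <assert.h>
#include <stdint.h>

#define PID_TABLE_ENTRY_VALID 1ULL

/* Build a table entry from a PID physical address (assumed 64-byte aligned). */
static uint64_t pid_table_make_entry(uint64_t pid_pa)
{
	assert((pid_pa & 0x3f) == 0);
	return pid_pa | PID_TABLE_ENTRY_VALID;
}

/*
 * Look up the PID for a destination APIC ID. Returns the PID physical
 * address, or 0 when the ID is beyond the last index or the entry is
 * not marked valid (0 is used as a sentinel in this sketch only).
 */
static uint64_t pid_table_lookup(const uint64_t *table, uint16_t last_index,
				 uint32_t apic_id)
{
	uint64_t entry;

	if (apic_id > last_index)
		return 0;

	entry = table[apic_id];
	if (!(entry & PID_TABLE_ENTRY_VALID))
		return 0;

	return entry & ~PID_TABLE_ENTRY_VALID;
}
```

A vCPU whose entry is missing or invalid is simply not reachable through the table; other entries are unaffected.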
Signed-off-by: Gao Chao Signed-off-by: Zeng Guang --- arch/x86/include/asm/vmx.h | 8 ++++ arch/x86/include/asm/vmxfeatures.h | 2 + arch/x86/kvm/vmx/capabilities.h | 7 +++ arch/x86/kvm/vmx/posted_intr.c | 9 +++- arch/x86/kvm/vmx/vmx.c | 74 +++++++++++++++++++++++++++--- arch/x86/kvm/vmx/vmx.h | 3 ++ 6 files changed, 94 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 8c929596a299..b79b6438acaa 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -76,6 +76,11 @@ #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE VMCS_CONTROL_BIT(USR_WAIT_PAU= SE) #define SECONDARY_EXEC_BUS_LOCK_DETECTION VMCS_CONTROL_BIT(BUS_LOCK_DETECT= ION) =20 +/* + * Definitions of Tertiary Processor-Based VM-Execution Controls. + */ +#define TERTIARY_EXEC_IPI_VIRT VMCS_CONTROL_BIT(IPI_VIRT) + #define PIN_BASED_EXT_INTR_MASK VMCS_CONTROL_BIT(INTR_EXIT= ING) #define PIN_BASED_NMI_EXITING VMCS_CONTROL_BIT(NMI_EXITI= NG) #define PIN_BASED_VIRTUAL_NMIS VMCS_CONTROL_BIT(VIRTUAL_N= MIS) @@ -159,6 +164,7 @@ static inline int vmx_misc_mseg_revid(u64 vmx_misc) enum vmcs_field { VIRTUAL_PROCESSOR_ID =3D 0x00000000, POSTED_INTR_NV =3D 0x00000002, + LAST_PID_POINTER_INDEX =3D 0x00000008, GUEST_ES_SELECTOR =3D 0x00000800, GUEST_CS_SELECTOR =3D 0x00000802, GUEST_SS_SELECTOR =3D 0x00000804, @@ -224,6 +230,8 @@ enum vmcs_field { TSC_MULTIPLIER_HIGH =3D 0x00002033, TERTIARY_VM_EXEC_CONTROL =3D 0x00002034, TERTIARY_VM_EXEC_CONTROL_HIGH =3D 0x00002035, + PID_POINTER_TABLE =3D 0x00002042, + PID_POINTER_TABLE_HIGH =3D 0x00002043, GUEST_PHYSICAL_ADDRESS =3D 0x00002400, GUEST_PHYSICAL_ADDRESS_HIGH =3D 0x00002401, VMCS_LINK_POINTER =3D 0x00002800, diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxf= eatures.h index ff20776dc83b..7ce616af2db2 100644 --- a/arch/x86/include/asm/vmxfeatures.h +++ b/arch/x86/include/asm/vmxfeatures.h @@ -86,4 +86,6 @@ #define VMX_FEATURE_ENCLV_EXITING ( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf= 
dependent) */ #define VMX_FEATURE_BUS_LOCK_DETECTION ( 2*32+ 30) /* "" VM-Exit when bus = lock caused */ =20 +/* Tertiary Processor-Based VM-Execution Controls, word 3 */ +#define VMX_FEATURE_IPI_VIRT (3*32 + 4) /* "" Enable IPI virtualization = */ #endif /* _ASM_X86_VMXFEATURES_H */ diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilitie= s.h index 38d414f64e61..78b0525dd991 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -12,6 +12,7 @@ extern bool __read_mostly enable_ept; extern bool __read_mostly enable_unrestricted_guest; extern bool __read_mostly enable_ept_ad_bits; extern bool __read_mostly enable_pml; +extern bool __read_mostly enable_ipiv; extern int __read_mostly pt_mode; =20 #define PT_MODE_SYSTEM 0 @@ -283,6 +284,12 @@ static inline bool cpu_has_vmx_apicv(void) cpu_has_vmx_posted_intr(); } =20 +static inline bool cpu_has_vmx_ipiv(void) +{ + return vmcs_config.cpu_based_3rd_exec_ctrl & + TERTIARY_EXEC_IPI_VIRT; +} + static inline bool cpu_has_vmx_flexpriority(void) { return cpu_has_vmx_tpr_shadow() && diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c index 1c94783b5a54..bd9c9a89726a 100644 --- a/arch/x86/kvm/vmx/posted_intr.c +++ b/arch/x86/kvm/vmx/posted_intr.c @@ -85,11 +85,16 @@ static bool vmx_can_use_vtd_pi(struct kvm *kvm) irq_remapping_cap(IRQ_POSTING_CAP); } =20 +static bool vmx_can_use_ipiv_pi(struct kvm *kvm) +{ + return irqchip_in_kernel(kvm) && enable_apicv && enable_ipiv; +} + void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu) { struct pi_desc *pi_desc =3D vcpu_to_pi_desc(vcpu); =20 - if (!vmx_can_use_vtd_pi(vcpu->kvm)) + if (!(vmx_can_use_ipiv_pi(vcpu->kvm) || vmx_can_use_vtd_pi(vcpu->kvm))) return; =20 /* Set SN when the vCPU is preempted */ @@ -147,7 +152,7 @@ int pi_pre_block(struct kvm_vcpu *vcpu) struct pi_desc old, new; struct pi_desc *pi_desc =3D vcpu_to_pi_desc(vcpu); =20 - if (!vmx_can_use_vtd_pi(vcpu->kvm)) + if (!(vmx_can_use_ipiv_pi(vcpu->kvm) || 
vmx_can_use_vtd_pi(vcpu->kvm))) return 0; =20 WARN_ON(irqs_disabled()); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 5716db9704c0..2e65464d6dee 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -104,6 +104,9 @@ module_param(fasteoi, bool, S_IRUGO); =20 module_param(enable_apicv, bool, S_IRUGO); =20 +bool __read_mostly enable_ipiv =3D true; +module_param(enable_ipiv, bool, 0444); + /* * If nested=3D1, nested virtualization is supported, i.e., guests may use * VMX and be a hypervisor for its own guests. If nested=3D0, guests may n= ot @@ -224,6 +227,11 @@ static const struct { }; =20 #define L1D_CACHE_ORDER 4 + +/* PID(Posted-Interrupt Descriptor)-pointer table entry is 64-bit long */ +#define MAX_PID_TABLE_ORDER get_order(KVM_MAX_VCPU_IDS * sizeof(u64)) +#define PID_TABLE_ENTRY_VALID 1 + static void *vmx_l1d_flush_pages; =20 static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf) @@ -2504,7 +2512,7 @@ static __init int setup_vmcs_config(struct vmcs_confi= g *vmcs_conf, } =20 if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) { - u64 opt3 =3D 0; + u64 opt3 =3D TERTIARY_EXEC_IPI_VIRT; u64 min3 =3D 0; =20 if (adjust_vmx_controls_64(min3, opt3, @@ -3841,6 +3849,8 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_v= cpu *vcpu) vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW); vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W); vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_= W); + vmx_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR), + MSR_TYPE_RW, !enable_ipiv); } } =20 @@ -4117,14 +4127,21 @@ static void vmx_refresh_apicv_exec_ctrl(struct kvm_= vcpu *vcpu) =20 pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx)); if (cpu_has_secondary_exec_ctrls()) { - if (kvm_vcpu_apicv_active(vcpu)) + if (kvm_vcpu_apicv_active(vcpu)) { secondary_exec_controls_setbit(vmx, SECONDARY_EXEC_APIC_REGISTER_VIRT | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); - else 
+ if (cpu_has_tertiary_exec_ctrls() && enable_ipiv) + tertiary_exec_controls_setbit(vmx, + TERTIARY_EXEC_IPI_VIRT); + } else { secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_APIC_REGISTER_VIRT | SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY); + if (cpu_has_tertiary_exec_ctrls()) + tertiary_exec_controls_clearbit(vmx, + TERTIARY_EXEC_IPI_VIRT); + } } =20 vmx_update_msr_bitmap_x2apic(vcpu); @@ -4158,7 +4175,16 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx) =20 static u64 vmx_tertiary_exec_control(struct vcpu_vmx *vmx) { - return vmcs_config.cpu_based_3rd_exec_ctrl; + u64 exec_control =3D vmcs_config.cpu_based_3rd_exec_ctrl; + + /* + * IPI virtualization relies on APICv. Disable IPI + * virtualization if APICv is inhibited. + */ + if (!enable_ipiv || !kvm_vcpu_apicv_active(&vmx->vcpu)) + exec_control &=3D ~TERTIARY_EXEC_IPI_VIRT; + + return exec_control; } =20 /* @@ -4310,6 +4336,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx= *vmx) =20 static void init_vmcs(struct vcpu_vmx *vmx) { + struct kvm_vcpu *vcpu =3D &vmx->vcpu; + struct kvm_vmx *kvm_vmx =3D to_kvm_vmx(vcpu->kvm); + if (nested) nested_vmx_set_vmcs_shadowing_bitmap(); =20 @@ -4329,7 +4358,7 @@ static void init_vmcs(struct vcpu_vmx *vmx) if (cpu_has_tertiary_exec_ctrls()) tertiary_exec_controls_set(vmx, vmx_tertiary_exec_control(vmx)); =20 - if (kvm_vcpu_apicv_active(&vmx->vcpu)) { + if (kvm_vcpu_apicv_active(vcpu)) { vmcs_write64(EOI_EXIT_BITMAP0, 0); vmcs_write64(EOI_EXIT_BITMAP1, 0); vmcs_write64(EOI_EXIT_BITMAP2, 0); @@ -4339,6 +4368,13 @@ static void init_vmcs(struct vcpu_vmx *vmx) =20 vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR); vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc))); + + if (enable_ipiv) { + WRITE_ONCE(kvm_vmx->pid_table[vcpu->vcpu_id], + __pa(&vmx->pi_desc) | PID_TABLE_ENTRY_VALID); + vmcs_write64(PID_POINTER_TABLE, __pa(kvm_vmx->pid_table)); + vmcs_write16(LAST_PID_POINTER_INDEX, kvm_vmx->pid_last_index); + } } =20 if (!kvm_pause_in_guest(vmx->vcpu.kvm)) { @@ 
-4390,7 +4426,7 @@ static void init_vmcs(struct vcpu_vmx *vmx) vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1); } =20 - vmx_write_encls_bitmap(&vmx->vcpu, NULL); + vmx_write_encls_bitmap(vcpu, NULL); =20 if (vmx_pt_mode_is_host_guest()) { memset(&vmx->pt_desc, 0, sizeof(vmx->pt_desc)); @@ -4406,7 +4442,7 @@ static void init_vmcs(struct vcpu_vmx *vmx) =20 if (cpu_has_vmx_tpr_shadow()) { vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0); - if (cpu_need_tpr_shadow(&vmx->vcpu)) + if (cpu_need_tpr_shadow(vcpu)) vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, __pa(vmx->vcpu.arch.apic->regs)); vmcs_write32(TPR_THRESHOLD, 0); @@ -6963,6 +6999,18 @@ static int vmx_vm_init(struct kvm *kvm) break; } } + + if (enable_ipiv) { + struct page *pages; + + pages =3D alloc_pages(GFP_KERNEL | __GFP_ZERO, MAX_PID_TABLE_ORDER); + if (!pages) + return -ENOMEM; + + to_kvm_vmx(kvm)->pid_table =3D (void *)page_address(pages); + to_kvm_vmx(kvm)->pid_last_index =3D KVM_MAX_VCPU_IDS - 1; + } + return 0; } =20 @@ -7577,6 +7625,14 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bi= t) return supported & BIT(bit); } =20 +static void vmx_vm_destroy(struct kvm *kvm) +{ + struct kvm_vmx *kvm_vmx =3D to_kvm_vmx(kvm); + + if (kvm_vmx->pid_table) + free_pages((unsigned long)kvm_vmx->pid_table, MAX_PID_TABLE_ORDER); +} + static struct kvm_x86_ops vmx_x86_ops __initdata =3D { .name =3D "kvm_intel", =20 @@ -7589,6 +7645,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata =3D { =20 .vm_size =3D sizeof(struct kvm_vmx), .vm_init =3D vmx_vm_init, + .vm_destroy =3D vmx_vm_destroy, =20 .vcpu_create =3D vmx_create_vcpu, .vcpu_free =3D vmx_free_vcpu, @@ -7828,6 +7885,9 @@ static __init int hardware_setup(void) if (!enable_apicv) vmx_x86_ops.sync_pir_to_irr =3D NULL; =20 + if (!enable_apicv || !cpu_has_vmx_ipiv()) + enable_ipiv =3D false; + if (cpu_has_vmx_tsc_scaling()) { kvm_has_tsc_control =3D true; kvm_max_tsc_scaling_ratio =3D KVM_VMX_TSC_MULTIPLIER_MAX; diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 
ee94068ca8fb..c8ae1458eb9e 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -353,6 +353,9 @@ struct kvm_vmx { unsigned int tss_addr; bool ept_identity_pagetable_done; gpa_t ept_identity_map_addr; + /* PID table for IPI virtualization */ + u64 *pid_table; + u16 pid_last_index; }; =20 bool nested_vmx_allowed(struct kvm_vcpu *vcpu); --=20 2.27.0 From nobody Wed Jul 1 04:16:44 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E4CABC433FE for ; Fri, 31 Dec 2021 14:59:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231201AbhLaO7l (ORCPT ); Fri, 31 Dec 2021 09:59:41 -0500 Received: from mga12.intel.com ([192.55.52.136]:41074 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231219AbhLaO7c (ORCPT ); Fri, 31 Dec 2021 09:59:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640962772; x=1672498772; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=613t3uRkyd2KPwLGDQ6MS1aekZdhOm2qmRirEa3kN+A=; b=baPQ93eZUAiC33QSH9T6LjI0n7rAuubKU0FQzyavTxaq0nLUgXj6Du+/ Vx+XLnCy3ixrDfX/wxOcNGPm/R1Q1zM6wqmJ2BEwGKpvd86KfGXq712N5 sFnUkrA0okeAH7GXVvxVickCKGjP5dPBOAkPiVEUpk96DG0PozB2L0cUH zj1N2RF5JIZnF+ds8wYZ1uQHVa0WQuN1aUxcMvGlpqegaqgL7uiX//2N4 5Tt4mnzP7fgPR9cEywhtYlKzTNNc3SlRKYLO4xeWQ0JM1l3mPcpGa5N4W jnD4mWHmywkdc3AT3ekv9HPNVuhzZAwULNWxmNMRfPWo36bELkfpbwWDP g==; X-IronPort-AV: E=McAfee;i="6200,9189,10213"; a="221815013" X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="221815013" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:31 -0800 X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="524758492" Received: from 
arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:26 -0800 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang Subject: [PATCH v5 7/8] KVM: VMX: Update PID-pointer table entry when APIC ID is changed Date: Fri, 31 Dec 2021 22:28:48 +0800 Message-Id: <20211231142849.611-8-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211231142849.611-1-guang.zeng@intel.com> References: <20211231142849.611-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" In xAPIC mode, the guest is allowed to modify the APIC ID at runtime. If IPI virtualization is enabled, the corresponding entry in the PID-pointer table needs to be updated accordingly.
Signed-off-by: Zeng Guang --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/lapic.c | 7 +++++-- arch/x86/kvm/vmx/vmx.c | 12 ++++++++++++ 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 2164b9f4c7b0..753bf2a7cebc 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1493,6 +1493,7 @@ struct kvm_x86_ops { int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err); =20 void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector); + void (*update_ipiv_pid_entry)(struct kvm_vcpu *vcpu, u8 old_id, u8 new_id= ); }; =20 struct kvm_x86_nested_ops { diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 3ce7142ba00e..83c2c7594bcd 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2007,9 +2007,12 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 = reg, u32 val) =20 switch (reg) { case APIC_ID: /* Local APIC ID */ - if (!apic_x2apic_mode(apic)) + if (!apic_x2apic_mode(apic)) { + u8 old_id =3D kvm_lapic_get_reg(apic, APIC_ID) >> 24; + kvm_apic_set_xapic_id(apic, val >> 24); - else + kvm_x86_ops.update_ipiv_pid_entry(apic->vcpu, old_id, val >> 24); + } else ret =3D 1; break; =20 diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 2e65464d6dee..f21ce15c5eb8 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7633,6 +7633,17 @@ static void vmx_vm_destroy(struct kvm *kvm) free_pages((unsigned long)kvm_vmx->pid_table, MAX_PID_TABLE_ORDER); } =20 +static void vmx_update_ipiv_pid_entry(struct kvm_vcpu *vcpu, u8 old_id, u8= new_id) +{ + if (enable_ipiv && kvm_vcpu_apicv_active(vcpu)) { + u64 *pid_table =3D to_kvm_vmx(vcpu->kvm)->pid_table; + + WRITE_ONCE(pid_table[old_id], 0); + WRITE_ONCE(pid_table[new_id], __pa(&to_vmx(vcpu)->pi_desc) | + PID_TABLE_ENTRY_VALID); + } +} + static struct kvm_x86_ops vmx_x86_ops __initdata =3D { .name =3D "kvm_intel", =20 @@ -7770,6 +7781,7 @@ static struct 
kvm_x86_ops vmx_x86_ops __initdata =3D { .complete_emulated_msr =3D kvm_complete_insn_gp, =20 .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, + .update_ipiv_pid_entry =3D vmx_update_ipiv_pid_entry, }; =20 static __init void vmx_setup_user_return_msrs(void) --=20 2.27.0 From nobody Wed Jul 1 04:16:44 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8573EC433EF for ; Fri, 31 Dec 2021 14:59:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231187AbhLaO7o (ORCPT ); Fri, 31 Dec 2021 09:59:44 -0500 Received: from mga09.intel.com ([134.134.136.24]:8181 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230370AbhLaO7h (ORCPT ); Fri, 31 Dec 2021 09:59:37 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1640962777; x=1672498777; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=T9lZstSgT64E1WfAYsIVUx9bWf4gyoY3EbMk1/GBfFk=; b=ekAgyMjXzX7Rxk6PX9awTN12In19UTTDLMTPaCFG/pI45wgKX7wURFuh Qneafsgs+yknYT9fJinGg8WxIK0YPAIZRZHLthivOERkaohkZbhwtxl1n UJ4TnmNy2+7hJqKakWRiSTm3H5zVm+5vu/YYvREHGXbYtX8a0iaSgHjJ+ Ok/AFvpINIas8cGLnXLBk8TKUCvaVRD9VltE+i+LApo7UbZK97mUUpT2x UOSL+VZKa2qokkg7fJxs7JCy5XVLo6VfA9uvTpICwaJzXLuizGr+0Hxkr 5hCbbdyaQ+sMeNAFB2DZM7Bl6MN3RJ11rNQpeq/YZMg35V4vgCrXlO3sg g==; X-IronPort-AV: E=McAfee;i="6200,9189,10213"; a="241618910" X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="241618910" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:37 -0800 X-IronPort-AV: E=Sophos;i="5.88,251,1635231600"; d="scan'208";a="524758501" Received: from arthur-vostro-3668.sh.intel.com ([10.239.13.120]) by orsmga008-auth.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 31 Dec 2021 06:59:31 -0800 From: Zeng Guang To: Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, Dave Hansen , Tony Luck , Kan Liang , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , Kim Phillips , Jarkko Sakkinen , Jethro Beekman , Kai Huang Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Robert Hu , Gao Chao , Zeng Guang Subject: [PATCH v5 8/8] KVM: VMX: Resize PID-pointer table on demand for IPI virtualization Date: Fri, 31 Dec 2021 22:28:49 +0800 Message-Id: <20211231142849.611-9-guang.zeng@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20211231142849.611-1-guang.zeng@intel.com> References: <20211231142849.611-1-guang.zeng@intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" KVM currently allocates 8 pages in advance for the Posted-Interrupt Descriptor pointer (PID-pointer) table to accommodate vCPUs with APIC IDs up to KVM_MAX_VCPU_IDS - 1. This policy wastes memory because most VMs have fewer than 512 vCPUs and therefore need only one page. To reduce memory consumption for most VMs, KVM initially allocates one page for the PID-pointer table of each VM and grows the table on demand according to the maximum APIC ID of all vCPUs of the VM. Growing the PID-pointer table involves allocating a new table, requesting all vCPUs to update the related VMCS fields, and freeing the old table. In the worst case, when the new memory allocation fails, KVM keeps using the present PID-pointer table. IPI virtualization then does not take effect for the vCPUs that have no entry in the table, without affecting the others.
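For illustration only (not part of the patch): a minimal user-space sketch of the sizing arithmetic behind the on-demand growth, assuming 4 KiB pages and 8-byte table entries, so that one page holds 512 entries. order_for_size() is a hypothetical stand-in for the kernel's get_order(); pid_table_order_for_id() and pid_table_last_index() are names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KiB pages */
#define PID_ENTRY_SIZE  8u     /* each PID-pointer table entry is 64 bits */

/* Smallest order such that (PAGE_SIZE_BYTES << order) >= size. */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while (((size_t)PAGE_SIZE_BYTES << order) < size)
		order++;
	return order;
}

/* Allocation order needed so that entry max_apic_id fits in the table. */
static unsigned int pid_table_order_for_id(uint32_t max_apic_id)
{
	return order_for_size(((size_t)max_apic_id + 1) * PID_ENTRY_SIZE);
}

/* Last valid index of a table allocated with the given order. */
static uint32_t pid_table_last_index(unsigned int order)
{
	return (uint32_t)(((size_t)PAGE_SIZE_BYTES << order) / PID_ENTRY_SIZE) - 1;
}
```

Under these assumptions, APIC IDs 0 through 511 fit in a single page, and a table covering IDs up to 4095 needs an order-3 allocation, which matches the 8 pages mentioned above.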
Signed-off-by: Zeng Guang --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 3 ++ arch/x86/kvm/vmx/vmx.c | 77 +++++++++++++++++++++++++----- arch/x86/kvm/vmx/vmx.h | 6 +++ arch/x86/kvm/x86.c | 2 + 5 files changed, 78 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index cefe1d81e2e8..847246f2537d 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -121,6 +121,7 @@ KVM_X86_OP_NULL(enable_direct_tlbflush) KVM_X86_OP_NULL(migrate_timers) KVM_X86_OP(msr_filter_changed) KVM_X86_OP_NULL(complete_emulated_msr) +KVM_X86_OP(update_ipiv_pid_table) =20 #undef KVM_X86_OP #undef KVM_X86_OP_NULL diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 753bf2a7cebc..24990d4e94c4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -102,6 +102,8 @@ #define KVM_REQ_MSR_FILTER_CHANGED KVM_ARCH_REQ(29) #define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \ KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) +#define KVM_REQ_PID_TABLE_UPDATE \ + KVM_ARCH_REQ_FLAGS(31, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) =20 #define CR0_RESERVED_BITS \ (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \ @@ -1494,6 +1496,7 @@ struct kvm_x86_ops { =20 void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector); void (*update_ipiv_pid_entry)(struct kvm_vcpu *vcpu, u8 old_id, u8 new_id= ); + void (*update_ipiv_pid_table)(struct kvm_vcpu *vcpu); }; =20 struct kvm_x86_nested_ops { diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index f21ce15c5eb8..fb8e2b52b5f7 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -228,8 +228,9 @@ static const struct { =20 #define L1D_CACHE_ORDER 4 =20 -/* PID(Posted-Interrupt Descriptor)-pointer table entry is 64-bit long */ -#define MAX_PID_TABLE_ORDER get_order(KVM_MAX_VCPU_IDS * sizeof(u64)) +/* Each entry in 
PID(Posted-Interrupt Descriptor)-pointer table is 8 bytes= */ +#define table_index_to_size(index) ((index) << 3) +#define table_size_to_index(size) ((size) >> 3) #define PID_TABLE_ENTRY_VALID 1 =20 static void *vmx_l1d_flush_pages; @@ -4332,6 +4333,42 @@ static u32 vmx_secondary_exec_control(struct vcpu_vm= x *vmx) return exec_control; } =20 +static int vmx_alloc_pid_table(struct kvm_vmx *kvm_vmx, int order) +{ + u64 *pid_table; + + pid_table =3D (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order); + if (!pid_table) + return -ENOMEM; + + kvm_vmx->pid_table =3D pid_table; + kvm_vmx->pid_last_index =3D table_size_to_index(PAGE_SIZE << order) - 1; + return 0; +} + +static int vmx_expand_pid_table(struct kvm_vmx *kvm_vmx, int entry_idx) +{ + u64 *last_pid_table; + int last_table_size, new_order; + + if (entry_idx <=3D kvm_vmx->pid_last_index) + return 0; + + last_pid_table =3D kvm_vmx->pid_table; + last_table_size =3D table_index_to_size(kvm_vmx->pid_last_index + 1); + new_order =3D get_order(table_index_to_size(entry_idx + 1)); + + if (vmx_alloc_pid_table(kvm_vmx, new_order)) + return -ENOMEM; + + memcpy(kvm_vmx->pid_table, last_pid_table, last_table_size); + kvm_make_all_cpus_request(&kvm_vmx->kvm, KVM_REQ_PID_TABLE_UPDATE); + + /* Now old PID table can be freed safely as no vCPU is using it. */ + free_pages((unsigned long)last_pid_table, get_order(last_table_size)); + return 0; +} + #define VMX_XSS_EXIT_BITMAP 0 =20 static void init_vmcs(struct vcpu_vmx *vmx) @@ -4370,10 +4407,19 @@ static void init_vmcs(struct vcpu_vmx *vmx) vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc))); =20 if (enable_ipiv) { - WRITE_ONCE(kvm_vmx->pid_table[vcpu->vcpu_id], - __pa(&vmx->pi_desc) | PID_TABLE_ENTRY_VALID); + down_write(&kvm_vmx->pid_table_lock); + + /* + * In case new memory allocation for PID table fails, + * skip setting Posted-Interrupt descriptor of current + * vCPU which index is beyond present table limit. 
+ */ + if (!vmx_expand_pid_table(kvm_vmx, vcpu->vcpu_id)) + WRITE_ONCE(kvm_vmx->pid_table[vcpu->vcpu_id], + __pa(&vmx->pi_desc) | PID_TABLE_ENTRY_VALID); vmcs_write64(PID_POINTER_TABLE, __pa(kvm_vmx->pid_table)); vmcs_write16(LAST_PID_POINTER_INDEX, kvm_vmx->pid_last_index); + up_write(&kvm_vmx->pid_table_lock); } } =20 @@ -7001,14 +7047,11 @@ static int vmx_vm_init(struct kvm *kvm) } =20 if (enable_ipiv) { - struct page *pages; + struct kvm_vmx *kvm_vmx =3D to_kvm_vmx(kvm); =20 - pages =3D alloc_pages(GFP_KERNEL | __GFP_ZERO, MAX_PID_TABLE_ORDER); - if (!pages) + if (vmx_alloc_pid_table(kvm_vmx, 0)) return -ENOMEM; - - to_kvm_vmx(kvm)->pid_table =3D (void *)page_address(pages); - to_kvm_vmx(kvm)->pid_last_index =3D KVM_MAX_VCPU_IDS - 1; + init_rwsem(&kvm_vmx->pid_table_lock); } =20 return 0; @@ -7630,7 +7673,18 @@ static void vmx_vm_destroy(struct kvm *kvm) struct kvm_vmx *kvm_vmx =3D to_kvm_vmx(kvm); =20 if (kvm_vmx->pid_table) - free_pages((unsigned long)kvm_vmx->pid_table, MAX_PID_TABLE_ORDER); + free_pages((unsigned long)kvm_vmx->pid_table, + get_order(table_index_to_size(kvm_vmx->pid_last_index))); +} + +static void vmx_update_ipiv_pid_table(struct kvm_vcpu *vcpu) +{ + struct kvm_vmx *kvm_vmx =3D to_kvm_vmx(vcpu->kvm); + + down_read(&kvm_vmx->pid_table_lock); + vmcs_write64(PID_POINTER_TABLE, __pa(kvm_vmx->pid_table)); + vmcs_write16(LAST_PID_POINTER_INDEX, kvm_vmx->pid_last_index); + up_read(&kvm_vmx->pid_table_lock); } =20 static void vmx_update_ipiv_pid_entry(struct kvm_vcpu *vcpu, u8 old_id, u8= new_id) @@ -7782,6 +7836,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata =3D { =20 .vcpu_deliver_sipi_vector =3D kvm_vcpu_deliver_sipi_vector, .update_ipiv_pid_entry =3D vmx_update_ipiv_pid_entry, + .update_ipiv_pid_table =3D vmx_update_ipiv_pid_table, }; =20 static __init void vmx_setup_user_return_msrs(void) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index c8ae1458eb9e..8c437a7be08a 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ 
b/arch/x86/kvm/vmx/vmx.h @@ -356,6 +356,12 @@ struct kvm_vmx { /* PID table for IPI virtualization */ u64 *pid_table; u16 pid_last_index; + /* + * Protects accesses to pid_table and pid_last_index. + * Request to reallocate and update PID table could + * happen on multiple vCPUs simultaneously. + */ + struct rw_semaphore pid_table_lock; }; =20 bool nested_vmx_allowed(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9a2972fdae82..97ec2adb76bd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9783,6 +9783,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) =20 if (kvm_check_request(KVM_REQ_UPDATE_CPU_DIRTY_LOGGING, vcpu)) static_call(kvm_x86_update_cpu_dirty_logging)(vcpu); + if (kvm_check_request(KVM_REQ_PID_TABLE_UPDATE, vcpu)) + static_call(kvm_x86_update_ipiv_pid_table)(vcpu); } =20 if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win || --=20 2.27.0