From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 01/10] KVM: x86: Add is_vm_type_supported callback
Date: Thu, 20 Jul 2023 16:32:47 -0700

From: Isaku Yamahata <isaku.yamahata@intel.com>

For TDX, allow the backend to override the set of supported VM types.
Add KVM_X86_TDX_VM and KVM_X86_SNP_VM to reserve the bits.
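
Purely as an illustration (not part of this patch), userspace could probe and
consume the new VM types roughly as below.  The sketch assumes a
KVM_CAP_VM_TYPES capability that reports the supported KVM_X86_*_VM values as
a bitmap; the capability name is an assumption here, not something this patch
defines.

/* Hypothetical userspace sketch; KVM_CAP_VM_TYPES is an assumption. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int create_tdx_vm(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR);
	long types = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_VM_TYPES);

	if (kvm_fd < 0 || types < 0 || !(types & (1UL << KVM_X86_TDX_VM))) {
		fprintf(stderr, "KVM_X86_TDX_VM not supported\n");
		return -1;
	}
	/* KVM_CREATE_VM takes the machine type as its argument. */
	return ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_TDX_VM);
}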

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
Changes v3 -> v4:
- Added KVM_X86_SNP_VM

Changes v2 -> v3:
- no change
- didn't bother to rename KVM_X86_PROTECTED_VM to KVM_X86_SW_PROTECTED_VM

Changes v1 -> v2:
- no change
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/include/uapi/asm/kvm.h    |  2 ++
 arch/x86/kvm/svm/svm.c             |  7 +++++++
 arch/x86/kvm/vmx/vmx.c             |  7 +++++++
 arch/x86/kvm/x86.c                 | 12 +++++++++++-
 arch/x86/kvm/x86.h                 |  2 ++
 7 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 13bc212cd4bc..c0143906fe6d 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -20,6 +20,7 @@ KVM_X86_OP(hardware_disable)
 KVM_X86_OP(hardware_unsetup)
 KVM_X86_OP(has_emulated_msr)
 KVM_X86_OP(vcpu_after_set_cpuid)
+KVM_X86_OP(is_vm_type_supported)
 KVM_X86_OP(vm_init)
 KVM_X86_OP_OPTIONAL(vm_destroy)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_precreate)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bbefd79b7950..2c9350aa0da4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1543,6 +1543,7 @@ struct kvm_x86_ops {
 	bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
 	void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);

+	bool (*is_vm_type_supported)(unsigned long vm_type);
 	unsigned int vm_size;
 	int (*vm_init)(struct kvm *kvm);
 	void (*vm_destroy)(struct kvm *kvm);
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index a448d0964fc0..aa7a56a47564 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -564,5 +564,7 @@ struct kvm_pmu_event_filter {

 #define KVM_X86_DEFAULT_VM	0
 #define KVM_X86_SW_PROTECTED_VM	1
+#define KVM_X86_TDX_VM		2
+#define KVM_X86_SNP_VM		3

 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d381ad424554..d681dd7ad397 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4768,6 +4768,12 @@ static void svm_vm_destroy(struct kvm *kvm)
 	sev_vm_destroy(kvm);
 }

+static bool svm_is_vm_type_supported(unsigned long type)
+{
+	/* FIXME: Check if CPU is capable of SEV-SNP. */
+	return __kvm_is_vm_type_supported(type);
+}
+
 static int svm_vm_init(struct kvm *kvm)
 {
 	if (!pause_filter_count || !pause_filter_thresh)
@@ -4796,6 +4802,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.vcpu_free = svm_vcpu_free,
 	.vcpu_reset = svm_vcpu_reset,

+	.is_vm_type_supported = svm_is_vm_type_supported,
 	.vm_size = sizeof(struct kvm_svm),
 	.vm_init = svm_vm_init,
 	.vm_destroy = svm_vm_destroy,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 946380b53cf5..693f07b80966 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7511,6 +7511,12 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 	return err;
 }

+static bool vmx_is_vm_type_supported(unsigned long type)
+{
+	/* TODO: Check if TDX is supported. */
+	return __kvm_is_vm_type_supported(type);
+}
+
 #define L1TF_MSG_SMT "L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"
 #define L1TF_MSG_L1D "L1TF CPU bug present and virtualization mitigation disabled, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.\n"

@@ -8180,6 +8186,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.hardware_disable = vmx_hardware_disable,
 	.has_emulated_msr = vmx_has_emulated_msr,

+	.is_vm_type_supported = vmx_is_vm_type_supported,
 	.vm_size = sizeof(struct kvm_vmx),
 	.vm_init = vmx_vm_init,
 	.vm_destroy = vmx_vm_destroy,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index de195ad83ec0..fd6c05d1883c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4427,12 +4427,18 @@ static int kvm_ioctl_get_supported_hv_cpuid(struct kvm_vcpu *vcpu,
 	return 0;
 }

-static bool kvm_is_vm_type_supported(unsigned long type)
+bool __kvm_is_vm_type_supported(unsigned long type)
 {
 	return type == KVM_X86_DEFAULT_VM ||
 	       (type == KVM_X86_SW_PROTECTED_VM &&
		IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_enabled);
 }
+EXPORT_SYMBOL_GPL(__kvm_is_vm_type_supported);
+
+static bool kvm_is_vm_type_supported(unsigned long type)
+{
+	return static_call(kvm_x86_is_vm_type_supported)(type);
+}

 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 {
@@ -4628,6 +4634,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = BIT(KVM_X86_DEFAULT_VM);
 		if (kvm_is_vm_type_supported(KVM_X86_SW_PROTECTED_VM))
 			r |= BIT(KVM_X86_SW_PROTECTED_VM);
+		if (kvm_is_vm_type_supported(KVM_X86_TDX_VM))
+			r |= BIT(KVM_X86_TDX_VM);
+		if (kvm_is_vm_type_supported(KVM_X86_SNP_VM))
+			r |= BIT(KVM_X86_SNP_VM);
 		break;
 	default:
 		break;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..7de3a45f655a 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -9,6 +9,8 @@
 #include "kvm_cache_regs.h"
 #include "kvm_emulate.h"

+bool __kvm_is_vm_type_supported(unsigned long type);
+
 struct kvm_caps {
 	/* control of guest tsc rate supported? */
 	bool has_tsc_control;
-- 
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 02/10] KVM: x86/mmu: Guard against collision with KVM-defined PFERR_IMPLICIT_ACCESS
Date: Thu, 20 Jul 2023 16:32:48 -0700

From: Sean Christopherson

Add an assertion in kvm_mmu_page_fault() to ensure the error code
provided by hardware doesn't conflict with KVM's software-defined
IMPLICIT_ACCESS flag.  In the unlikely scenario that future hardware
starts using bit 48 for a hardware-defined flag, preserving the bit
could result in KVM incorrectly interpreting the unknown flag as KVM's
IMPLICIT_ACCESS flag.

WARN so that any such conflict can be surfaced to KVM developers and
resolved, but otherwise ignore the bit, as KVM can't possibly rely on a
flag it knows nothing about.
Fixes: 4f4aa80e3b88 ("KVM: X86: Handle implicit supervisor access with SMAP")
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 05943ccb55a4..a9bbc20c7dfd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5822,6 +5822,17 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	int r, emulation_type = EMULTYPE_PF;
 	bool direct = vcpu->arch.mmu->root_role.direct;

+	/*
+	 * IMPLICIT_ACCESS is a KVM-defined flag used to correctly perform SMAP
+	 * checks when emulating instructions that trigger implicit accesses.
+	 * WARN if hardware generates a fault with an error code that collides
+	 * with the KVM-defined value.  Clear the flag and continue on, i.e.
+	 * don't terminate the VM, as KVM can't possibly be relying on a flag
+	 * that KVM doesn't know about.
+	 */
+	if (WARN_ON_ONCE(error_code & PFERR_IMPLICIT_ACCESS))
+		error_code &= ~PFERR_IMPLICIT_ACCESS;
+
 	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
 		return RET_PF_RETRY;

-- 
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 03/10] KVM: x86/mmu: Pass around full 64-bit error code for the KVM page fault
Date: Thu, 20 Jul 2023 16:32:49 -0700

From: Isaku Yamahata <isaku.yamahata@intel.com>

The full 64-bit error code, i.e. more information about the fault, will
be needed by the KVM page fault handlers for protected VMs (TDX and
SEV-SNP).  Update kvm_mmu_do_page_fault() to accept the 64-bit value so
it can pass it on to the callbacks.  Previously the upper 32 bits of the
error code were discarded in kvm_mmu_page_fault() by lower_32_bits();
now the full 64 bits are passed down.

Currently two hardware-defined bits, PFERR_GUEST_FINAL_MASK and
PFERR_GUEST_PAGE_MASK, and one software-defined bit,
PFERR_IMPLICIT_ACCESS, are defined above bit 31.

PFERR_IMPLICIT_ACCESS:
commit 4f4aa80e3b88 ("KVM: X86: Handle implicit supervisor access with SMAP")
introduced the software-defined bit PFERR_IMPLICIT_ACCESS at bit 48 to
indicate an implicit access for SMAP during instruction emulation.
Concretely, emulator_read_std() and emulator_write_std() set the bit,
and permission_fault() checks the bit as an SMAP implicit access.  The
vendor page fault handler shouldn't pass the bit to
kvm_mmu_page_fault().

PFERR_GUEST_FINAL_MASK and PFERR_GUEST_PAGE_MASK:
commit 147277540bbc ("kvm: svm: Add support for additional SVM NPF error codes")
introduced them to optimize nested page fault handling.  No other code
paths use the bits, so the two bits can be safely passed down without a
functional change.

The accesses of fault->error_code are as follows:
- FNAME(page_fault): PFERR_IMPLICIT_ACCESS shouldn't be passed down.
  PFERR_GUEST_FINAL_MASK and PFERR_GUEST_PAGE_MASK aren't used.
- kvm_mmu_page_fault(): masks explicitly with PFERR_RSVD_MASK;
  PFERR_NESTED_GUEST_PAGE is the only use beyond masking the upper
  32 bits.
- mmutrace: change u32 -> u64
- pgprintk(): change %x -> %llx

No functional change is intended.  This is a preparation to pass more
information in the page fault error code.
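
To see concretely what the old lower_32_bits() truncation lost, consider the
standalone sketch below; the PFERR bit positions mirror the kernel
definitions, but the program itself is for illustration only.

/* Standalone illustration of the truncation this patch removes. */
#include <stdint.h>
#include <stdio.h>

#define PFERR_PRESENT_MASK     (1ULL << 0)
#define PFERR_GUEST_FINAL_MASK (1ULL << 32)	/* hardware-defined, upper half */

int main(void)
{
	uint64_t error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRESENT_MASK;
	uint32_t truncated = (uint32_t)error_code;	/* what lower_32_bits() did */

	/* Prints "100000001 vs 1": the PFERR_GUEST_* bits were dropped. */
	printf("%llx vs %x\n", (unsigned long long)error_code, truncated);
	return 0;
}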

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
Changes v2 -> v3:
- Made it depend on the patch that clears PFERR_IMPLICIT_ACCESS
- drop clearing the upper 32 bits, instead just pass the whole 64 bits
- update commit message to mention PFERR_IMPLICIT_ACCESS and
  PFERR_NESTED_GUEST_PAGE

Changes v1 -> v2:
- no change
---
 arch/x86/kvm/mmu/mmu.c          | 5 ++---
 arch/x86/kvm/mmu/mmu_internal.h | 4 ++--
 arch/x86/kvm/mmu/mmutrace.h     | 2 +-
 arch/x86/kvm/mmu/paging_tmpl.h  | 2 +-
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a9bbc20c7dfd..a2fe091e327a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4523,7 +4523,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
				struct kvm_page_fault *fault)
 {
-	pgprintk("%s: gva %lx error %x\n", __func__, fault->addr, fault->error_code);
+	pgprintk("%s: gva %llx error %llx\n", __func__, fault->addr, fault->error_code);

 	/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
 	fault->max_level = PG_LEVEL_2M;
@@ -5844,8 +5844,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	}

 	if (r == RET_PF_INVALID) {
-		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
-					  lower_32_bits(error_code), false,
+		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
					  &emulation_type);
 		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index f1786698ae00..7f9ec1e5b136 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -191,7 +191,7 @@ static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
 struct kvm_page_fault {
 	/* arguments to kvm_mmu_do_page_fault. */
 	const gpa_t addr;
-	const u32 error_code;
+	const u64 error_code;
 	const bool prefetch;

 	/* Derived from error_code. */
@@ -283,7 +283,7 @@ enum {
 };

 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefetch, int *emulation_type)
+					u64 err, bool prefetch, int *emulation_type)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 2d7555381955..2e77883c92f6 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -261,7 +261,7 @@ TRACE_EVENT(
 	TP_STRUCT__entry(
 		__field(int, vcpu_id)
 		__field(gpa_t, cr2_or_gpa)
-		__field(u32, error_code)
+		__field(u64, error_code)
 		__field(u64 *, sptep)
 		__field(u64, old_spte)
 		__field(u64, new_spte)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 0662e0278e70..42d48b1ec7b3 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -758,7 +758,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	struct guest_walker walker;
 	int r;

-	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
+	pgprintk("%s: addr %llx err %llx\n", __func__, fault->addr, fault->error_code);
 	WARN_ON_ONCE(fault->is_tdp);

 	/*
-- 
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 04/10] KVM: x86: Introduce PFERR_GUEST_ENC_MASK to indicate fault is private
Date: Thu, 20 Jul 2023 16:32:50 -0700

From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a PFERR code to designate that the page fault is private and that
it requires looking up the memory attributes.  The vendor KVM page
fault handler should set the PFERR_GUEST_ENC_MASK bit based on its
fault information; it may use the hardware value directly, or parse it,
to decide whether to set the bit.  For KVM_X86_SW_PROTECTED_VM, consult
the memory attributes to determine whether the fault is private.
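
For illustration, a vendor fault handler might set the bit along the lines of
the hypothetical TDX-style sketch below; exit_qual_to_pferr() and
gpa_shared_bit() are assumed helpers, not functions this series defines.

/* Hypothetical vendor-side sketch; the helper names are assumptions. */
static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
				    u64 exit_qual)
{
	u64 error_code = exit_qual_to_pferr(exit_qual);	/* assumed helper */

	/* For TDX, a GPA without the shared bit is a private access. */
	if (!(gpa & gpa_shared_bit(vcpu->kvm)))		/* assumed helper */
		error_code |= PFERR_GUEST_ENC_MASK;

	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
}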
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
Changes v3 -> v4:
- rename back struct kvm_page_fault::private => is_private
- catch up rename: KVM_X86_PROTECTED_VM => KVM_X86_SW_PROTECTED_VM

Changes v2 -> v3:
- Revive PFERR_GUEST_ENC_MASK
- rename struct kvm_page_fault::is_private => private
- Add check for KVM_X86_PROTECTED_VM

Changes v1 -> v2:
- Introduced fault type and replaced is_private with fault_type.
- Add kvm_get_fault_type() to encapsulate the difference.
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu/mmu.c          |  8 ++++++--
 arch/x86/kvm/mmu/mmu_internal.h | 14 +++++++++++++-
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c9350aa0da4..ab7d080bf544 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -255,6 +255,7 @@ enum x86_intercept_stage;
 #define PFERR_SGX_BIT 15
 #define PFERR_GUEST_FINAL_BIT 32
 #define PFERR_GUEST_PAGE_BIT 33
+#define PFERR_GUEST_ENC_BIT 34
 #define PFERR_IMPLICIT_ACCESS_BIT 48

 #define PFERR_PRESENT_MASK BIT(PFERR_PRESENT_BIT)
@@ -266,6 +267,7 @@ enum x86_intercept_stage;
 #define PFERR_SGX_MASK BIT(PFERR_SGX_BIT)
 #define PFERR_GUEST_FINAL_MASK BIT_ULL(PFERR_GUEST_FINAL_BIT)
 #define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
+#define PFERR_GUEST_ENC_MASK BIT_ULL(PFERR_GUEST_ENC_BIT)
 #define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)

 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |	\
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a2fe091e327a..d2ebe26fb822 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4399,8 +4399,12 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return RET_PF_EMULATE;
 	}

-	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn))
-		return kvm_do_memory_fault_exit(vcpu, fault);
+	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+		if (vcpu->kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM)
+			return RET_PF_RETRY;
+		else
+			return kvm_do_memory_fault_exit(vcpu, fault);
+	}

 	if (fault->is_private)
 		return kvm_faultin_pfn_private(vcpu, fault);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 7f9ec1e5b136..4f8f83546c37 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -282,6 +282,18 @@ enum {
 	RET_PF_SPURIOUS,
 };

+static inline bool kvm_is_fault_private(struct kvm *kvm, gpa_t gpa, u64 error_code)
+{
+	/*
+	 * This is racy with mmu_seq.  If we hit a race, it would result in a
+	 * spurious KVM_EXIT_MEMORY_FAULT.
+	 */
+	if (kvm->arch.vm_type == KVM_X86_SW_PROTECTED_VM)
+		return kvm_mem_is_private(kvm, gpa_to_gfn(gpa));
+
+	return error_code & PFERR_GUEST_ENC_MASK;
+}
+
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
					u64 err, bool prefetch, int *emulation_type)
 {
@@ -295,13 +307,13 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		.user = err & PFERR_USER_MASK,
 		.prefetch = prefetch,
 		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+		.is_private = kvm_is_fault_private(vcpu->kvm, cr2_or_gpa, err),
 		.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(vcpu->kvm),

 		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
 		.req_level = PG_LEVEL_4K,
 		.goal_level = PG_LEVEL_4K,
-		.is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
 	};
 	int r;

-- 
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 05/10] KVM: Add new members to struct kvm_gfn_range to operate on
Date: Thu, 20 Jul 2023 16:32:51 -0700

From: Isaku Yamahata <isaku.yamahata@intel.com>

TDX needs to know which mapping to operate on: the shared EPT or the
secure EPT.  The following sequence to convert a GPA to private doesn't
work for TDX because the page can already be private:

1) Update the memory attributes to private in the memory attributes
   xarray.
2) Zap the GPA range irrespective of private-or-shared; even if the
   page is already private, zap the entry.
3) Take an EPT violation on the GPA.
4) Populate the GPA as private; the page is zeroed and the guest has
   to accept the page again.

In step 2, TDX wants to zap only shared pages and skip private ones.

Add new members, only_private and only_shared, to struct kvm_gfn_range
to indicate which mapping (private vs. shared) to operate on, and
update the mmu notifier, the set-memory-attributes ioctl, and the KVM
gmem callback to initialize them:

- If operating on a file that backs shared pages (the mmu notifier),
  zap shared pages only.
  (only_private, only_shared) = (false, true)

- If operating on a file that backs private pages (KVM gmem), zap
  private pages only.
  (only_private, only_shared) = (true, false)

- If setting memory attributes, the vendor callback checks the new
  attributes and makes a decision.  SNP would do nothing and handle it
  later with the gmem callback.  The TDX callback would zap private
  pages only when converting to shared, and shared pages only when
  converting to private.
  (only_private, only_shared) = (false, false)

- If operating on both backing files, i.e. when destroying the guest,
  zap both private and shared pages.
  (only_private, only_shared) = (true, true)
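
As a sketch of the intended consumer (not part of this patch), a hypothetical
TDX unmap callback could honor the flags as below; the tdx_zap_*_range()
helpers are assumptions for illustration.

static bool tdx_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
	bool flush = false;

	/* Shared mappings are included unless the caller asked private-only. */
	if (!range->only_private)
		flush |= tdx_zap_shared_range(kvm, range->start, range->end);
	/* Private mappings are included unless the caller asked shared-only. */
	if (!range->only_shared)
		flush |= tdx_zap_private_range(kvm, range->start, range->end);

	return flush;
}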
Suggested-by: Sean Christopherson
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
Changes v3 -> v4:
- rebased to v11 KVM gmem.

Changes v2 -> v3:
- Drop the KVM_GFN_RANGE flags
- Updated struct kvm_gfn_range
- Change kvm_arch_set_memory_attributes() to return bool for flush
- Added set_memory_attributes x86 op for vendor backends
- Refined commit message to describe TDX care concretely

Changes v1 -> v2:
- consolidate KVM_GFN_RANGE_FLAGS_GMEM_{PUNCH_HOLE, RELEASE} into
  KVM_GFN_RANGE_FLAGS_GMEM.
- Update the commit message to describe TDX more.  Drop SEV_SNP.
---
 include/linux/kvm_host.h | 2 ++
 virt/kvm/guest_mem.c     | 2 ++
 virt/kvm/kvm_main.c      | 4 ++++
 3 files changed, 8 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 091bc89ae805..ce4d91585368 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -268,6 +268,8 @@ struct kvm_gfn_range {
 		u64 raw;
 	} arg;
 	bool may_block;
+	bool only_private;
+	bool only_shared;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index 384671a55b41..ac185c776cda 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -105,6 +105,8 @@ static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
 			.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
 			.slot = slot,
 			.may_block = true,
+			.only_private = true,
+			.only_shared = false,
 		};

 		flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ee331cf8ba54..4e2a2463ab19 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -603,6 +603,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 			 */
 			gfn_range.arg.raw = range->arg.raw;
 			gfn_range.may_block = range->may_block;
+			gfn_range.only_private = false;
+			gfn_range.only_shared = true;

 			/*
 			 * {gfn(page) | page intersects with [hva_start, hva_end)} =
@@ -2405,6 +2407,8 @@ static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,

 	gfn_range.arg.raw = range->arg.raw;
 	gfn_range.may_block = range->may_block;
+	gfn_range.only_private = false;
+	gfn_range.only_shared = false;

 	for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
 		slots = __kvm_memslots(kvm, i);
-- 
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 06/10] KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
Date: Thu, 20 Jul 2023 16:32:52 -0700

From: Brijesh Singh

While resolving an RMP page fault, there may be cases where the page
level between the RMP entry and the TDP does not match and the 2M RMP
entry must be split into 4K RMP entries, or a 2M TDP page needs to be
broken into multiple 4K pages.

To keep the RMP and TDP page levels in sync, zap the gfn range after
splitting the pages in the RMP entry.  The zap forces the TDP mapping
to be rebuilt at the new page level.
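
For illustration, the expected caller would look roughly like the sketch
below; snp_rmp_split() is an assumed helper, and the exact flow belongs to the
later SNP patches, not this one.

/* Hypothetical SNP-side sketch; snp_rmp_split() is an assumption. */
static int snp_handle_rmp_size_mismatch(struct kvm *kvm, gfn_t gfn)
{
	/* PTRS_PER_PMD (512) 4K pages make up one 2M mapping on x86-64. */
	gfn_t start = gfn & ~(gfn_t)(PTRS_PER_PMD - 1);
	int r = snp_rmp_split(gfn);	/* split the 2M RMP entry into 4K */

	/* Zap the covering range so the TDP is rebuilt at the new level. */
	if (!r)
		kvm_zap_gfn_range(kvm, start, start + PTRS_PER_PMD);
	return r;
}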

Signed-off-by: Brijesh Singh
Signed-off-by: Ashish Kalra
Signed-off-by: Michael Roth
Link: https://lore.kernel.org/r/20230612042559.375660-39-michael.roth@amd.com
---
Changes v3 -> v4:
- removed a redundant blank line

Changes v2 -> v3:
- Newly added
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/mmu.h              | 2 --
 arch/x86/kvm/mmu/mmu.c          | 1 +
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ab7d080bf544..e4f2938bb1fc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1842,6 +1842,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);

 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 92d5a1924fc1..963c734642f6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -235,8 +235,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return -(u32)fault & errcode;
 }

-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);

 int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d2ebe26fb822..a73ddb43a2cf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6759,6 +6759,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,

 	return need_tlb_flush;
 }
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);

 static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
					   const struct kvm_memory_slot *slot)
-- 
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 07/10] KVM: x86: Add gmem hook for initializing private memory
Date: Thu, 20 Jul 2023 16:32:53 -0700

From: Michael Roth

All gmem pages are expected to be 'private' as defined by a particular
arch/platform.  Platforms like SEV-SNP require additional operations to
move these pages into a private state, so implement a hook that can be
used to prepare this memory prior to mapping it into a guest.

In the case of SEV-SNP, whether or not a 2MB page can be mapped via a
2MB mapping in the guest's nested page table depends on whether or not
any subpages within the range have already been initialized as private
in the RMP table, so this hook will also be used by the KVM MMU to
clamp the maximum mapping size accordingly.
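
A backend implementation might look roughly like the sketch below; this is a
hypothetical SNP-flavored example, with snp_make_page_private() and
snp_rmp_is_mixed() as assumed helpers.

static int snp_gmem_prepare(struct kvm *kvm, struct kvm_memory_slot *slot,
			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level)
{
	int rc = snp_make_page_private(kvm, pfn, gfn);	/* assumed helper */

	if (rc)
		return rc;

	/* A 2M NPT mapping requires a uniformly-private 2M RMP range. */
	if (*max_level > PG_LEVEL_4K && snp_rmp_is_mixed(pfn))
		*max_level = PG_LEVEL_4K;

	return 0;
}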

Signed-off-by: Michael Roth
Link: https://lore.kernel.org/r/20230612042559.375660-2-michael.roth@amd.com
---
Changes v2 -> v3:
- Newly added
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  3 +++
 arch/x86/kvm/mmu/mmu.c             | 12 ++++++++++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c0143906fe6d..a4cb248519cf 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -134,6 +134,7 @@ KVM_X86_OP(msr_filter_changed)
 KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
+KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)

 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e4f2938bb1fc..de7f0dffa135 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1735,6 +1735,9 @@ struct kvm_x86_ops {
 	 * Returns vCPU specific APICv inhibit reasons
 	 */
 	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
+
+	int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
+			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
 };

 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a73ddb43a2cf..35bb14363828 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4352,6 +4352,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
				   struct kvm_page_fault *fault)
 {
 	int max_order, r;
+	u8 max_level;

 	if (!kvm_slot_can_be_private(fault->slot))
 		return kvm_do_memory_fault_exit(vcpu, fault);
@@ -4361,8 +4362,15 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
 	if (r)
 		return r;

-	fault->max_level = min(kvm_max_level_for_order(max_order),
-			       fault->max_level);
+	max_level = kvm_max_level_for_order(max_order);
+	r = static_call(kvm_x86_gmem_prepare)(vcpu->kvm, fault->slot, fault->pfn,
+					      fault->gfn, &max_level);
+	if (r) {
+		kvm_release_pfn_clean(fault->pfn);
+		return r;
+	}
+
+	fault->max_level = min(max_level, fault->max_level);
 	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
 	return RET_PF_CONTINUE;
 }
-- 
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 08/10] KVM: x86: Add gmem hook for invalidating private memory
Date: Thu, 20 Jul 2023 16:32:54 -0700

From: Michael Roth

TODO: add a CONFIG option that can be used to completely skip the arch
invalidation loop and avoid __weak references for arch/platforms that
don't need an additional invalidation hook.

In some cases, like with SEV-SNP, guest memory needs to be updated in a
platform-specific manner before it can be safely freed back to the
host.  Add hooks to wire up handling of this sort when freeing memory
in response to FALLOC_FL_PUNCH_HOLE operations.

Also issue invalidations of all allocated pages when releasing the gmem
file so that the pages are not left in an unusable state when they get
freed back to the host.
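
For illustration, a backend hook could be as simple as the sketch below; this
is a hypothetical SNP-flavored example and snp_make_page_shared() is an
assumed helper.

static void snp_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start,
				kvm_pfn_t end)
{
	kvm_pfn_t pfn;

	/* Return each pfn to the shared state before it goes back to the host. */
	for (pfn = start; pfn < end; pfn++)
		snp_make_page_shared(kvm, pfn);	/* assumed helper */
}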

Signed-off-by: Michael Roth
Link: https://lore.kernel.org/r/20230612042559.375660-3-michael.roth@amd.com
---
Changes v2 -> v3:
- Newly added
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/x86.c                 |  6 +++++
 include/linux/kvm_host.h           |  3 +++
 virt/kvm/guest_mem.c               | 42 ++++++++++++++++++++++++++++++
 5 files changed, 53 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index a4cb248519cf..d520c6370cd6 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -135,6 +135,7 @@ KVM_X86_OP(complete_emulated_msr)
 KVM_X86_OP(vcpu_deliver_sipi_vector)
 KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
 KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
+KVM_X86_OP_OPTIONAL(gmem_invalidate)

 #undef KVM_X86_OP
 #undef KVM_X86_OP_OPTIONAL
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index de7f0dffa135..440a4a13a93f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1738,6 +1738,7 @@ struct kvm_x86_ops {

 	int (*gmem_prepare)(struct kvm *kvm, struct kvm_memory_slot *slot,
			    kvm_pfn_t pfn, gfn_t gfn, u8 *max_level);
+	void (*gmem_invalidate)(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
 };

 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd6c05d1883c..2ae40fa8e178 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13284,6 +13284,12 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_no_poll);

+#ifdef CONFIG_KVM_PRIVATE_MEM
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+	static_call_cond(kvm_x86_gmem_invalidate)(kvm, start, end);
+}
+#endif

 int kvm_spec_ctrl_test_value(u64 value)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ce4d91585368..6c5d39e429e9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2360,6 +2360,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 #ifdef CONFIG_KVM_PRIVATE_MEM
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
+void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2368,6 +2369,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+
+static inline void kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end) { }
 #endif /* CONFIG_KVM_PRIVATE_MEM */

 #endif
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index ac185c776cda..a14eaac9dbad 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -129,6 +129,46 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
 	KVM_MMU_UNLOCK(kvm);
 }

+void __weak kvm_arch_gmem_invalidate(struct kvm *kvm, kvm_pfn_t start, kvm_pfn_t end)
+{
+}
+
+/* Handle arch-specific hooks needed before releasing guarded pages. */
+static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct inode *inode,
+					   pgoff_t start, pgoff_t end)
+{
+	pgoff_t file_end = i_size_read(inode) >> PAGE_SHIFT;
+	pgoff_t index = start;
+
+	end = min(end, file_end);
+
+	while (index < end) {
+		struct folio *folio;
+		unsigned int order;
+		struct page *page;
+		kvm_pfn_t pfn;
+
+		folio = __filemap_get_folio(inode->i_mapping, index,
+					    FGP_LOCK, 0);
+		if (!folio) {
+			index++;
+			continue;
+		}
+
+		page = folio_file_page(folio, index);
+		pfn = page_to_pfn(page);
+		order = folio_order(folio);
+
+		kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));
+
+		index = folio_next_index(folio);
+		folio_unlock(folio);
+		folio_put(folio);
+
+		cond_resched();
+	}
+}
+
 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct list_head *gmem_list = &inode->i_mapping->private_list;
@@ -145,6 +185,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_begin(gmem, start, end);

+	kvm_gmem_issue_arch_invalidate(gmem->kvm, inode, start, end);
 	truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);

 	list_for_each_entry(gmem, gmem_list, entry)
@@ -255,6 +296,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
 	 * memory, as its lifetime is associated with the inode, not the file.
 	 */
 	kvm_gmem_invalidate_begin(gmem, 0, -1ul);
+	kvm_gmem_issue_arch_invalidate(gmem->kvm, inode, 0, -1ul);
 	kvm_gmem_invalidate_end(gmem, 0, -1ul);

 	mutex_unlock(&kvm->slots_lock);
-- 
2.25.1

From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v4 09/10] KVM: x86: Make struct sev_cmd common for KVM_MEM_ENC_OP
Date: Thu, 20 Jul 2023 16:32:55 -0700

From: Isaku Yamahata <isaku.yamahata@intel.com>

TDX KVM will use KVM_MEM_ENC_OP.  Rename struct sev_cmd and make it
common to both vendor backends, SEV and TDX, as the uABI struct for
KVM_MEM_ENC_OP.

The TDX backend wants to return a 64-bit error code instead of 32 bits.
To keep the ABI for the SEV backend, use a union to accommodate the
64-bit member.  Opportunistically turn the implicit padding after the
id member into an explicit flags member for future use and clarity.

Some data structures for sub-commands could be made common as well.
The current candidates would be KVM_SEV{,_ES}_INIT,
KVM_SEV_LAUNCH_FINISH, KVM_SEV_LAUNCH_UPDATE_VMSA, KVM_SEV_DBG_DECRYPT,
and KVM_SEV_DBG_ENCRYPT.

Only compile tested for the SEV code.
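
For illustration, userspace would drive the common struct roughly as below;
KVM_MEMORY_ENCRYPT_OP is the existing VM ioctl, and reading error64 for a
64-bit TDX status is the point of the union.  The sketch assumes struct
kvm_mem_enc_cmd lands in the uAPI headers as defined by this patch.

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int issue_mem_enc_op(int vm_fd, __u32 id, __u64 data)
{
	struct kvm_mem_enc_cmd cmd = {
		.id = id,
		.flags = 0,	/* must be zero unless the sub-command uses it */
		.data = data,	/* immediate or pointer, per sub-command */
	};
	int r = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);

	if (r)
		fprintf(stderr, "sub-command %u failed: %d (error64 0x%llx)\n",
			id, r, (unsigned long long)cmd.error64);
	return r;
}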
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 440a4a13a93f..5ede982442a0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1710,7 +1710,7 @@ struct kvm_x86_ops {
 	void (*enable_smi_window)(struct kvm_vcpu *vcpu);
 #endif
 
-	int (*mem_enc_ioctl)(struct kvm *kvm, void __user *argp);
+	int (*mem_enc_ioctl)(struct kvm *kvm, struct kvm_mem_enc_cmd *cmd);
 	int (*mem_enc_register_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*mem_enc_unregister_region)(struct kvm *kvm, struct kvm_enc_region *argp);
 	int (*vm_copy_enc_context_from)(struct kvm *kvm, unsigned int source_fd);
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index aa7a56a47564..32883e520b00 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -562,6 +562,39 @@ struct kvm_pmu_event_filter {
 /* x86-specific KVM_EXIT_HYPERCALL flags. */
 #define KVM_EXIT_HYPERCALL_LONG_MODE	BIT(0)
 
+struct kvm_mem_enc_cmd {
+	/* sub-command id of KVM_MEM_ENC_OP. */
+	__u32 id;
+	/*
+	 * Auxiliary flags for sub-command.  If sub-command doesn't use it,
+	 * set zero.
+	 */
+	__u32 flags;
+	/*
+	 * Data for sub-command.  An immediate or a pointer to the actual
+	 * data in process virtual address.  If sub-command doesn't use it,
+	 * set zero.
+	 */
+	__u64 data;
+	/*
+	 * Supplemental error code in the case of error.
+	 * SEV error code from the PSP or TDX SEAMCALL status code.
+	 * The caller should set zero.
+	 */
+	union {
+		struct {
+			__u32 error;
+			/*
+			 * KVM_SEV_LAUNCH_START and KVM_SEV_RECEIVE_START
+			 * require extra data.  Not included in struct
+			 * kvm_sev_launch_start or struct kvm_sev_receive_start.
+			 */
+			__u32 sev_fd;
+		};
+		__u64 error64;
+	};
+};
+
 #define KVM_X86_DEFAULT_VM	0
 #define KVM_X86_SW_PROTECTED_VM	1
 #define KVM_X86_TDX_VM		2
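
For a sense of how userspace would drive the unified command, here is a minimal, hypothetical sketch: it mirrors the uapi struct locally (a merged kernel would export it via <asm/kvm.h>), assumes vm_fd is a SEV-enabled VM file descriptor the caller already created, and only sets sev_fd for the two sub-commands that consume it:

	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Local mirror of the uapi struct added above (C11 anonymous members). */
	struct kvm_mem_enc_cmd {
		uint32_t id;
		uint32_t flags;		/* must be zero for now */
		uint64_t data;
		union {
			struct {
				uint32_t error;
				uint32_t sev_fd;
			};
			uint64_t error64;
		};
	};

	static int mem_enc_op(int vm_fd, uint32_t id, uint64_t data, int sev_fd)
	{
		struct kvm_mem_enc_cmd cmd;

		memset(&cmd, 0, sizeof(cmd));	/* unused fields must be zero */
		cmd.id = id;
		cmd.data = data;
		/* Only these two sub-commands consume sev_fd. */
		if (id == KVM_SEV_LAUNCH_START || id == KVM_SEV_RECEIVE_START)
			cmd.sev_fd = sev_fd;

		if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0) {
			fprintf(stderr, "sub-command %u failed, fw error %u\n",
				cmd.id, cmd.error);
			return -1;
		}
		return 0;
	}

The kernel writes the command back on return, so the supplemental error (or, for a TDX backend, error64) is available to the caller after the ioctl.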
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 07756b7348ae..94e13bb49c86 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1835,30 +1835,39 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
 	return ret;
 }
 
-int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
+int sev_mem_enc_ioctl(struct kvm *kvm, struct kvm_mem_enc_cmd *cmd)
 {
-	struct kvm_sev_cmd sev_cmd;
+	struct kvm_sev_cmd *sev_cmd = (struct kvm_sev_cmd *)cmd;
 	int r;
 
+	/* TODO: replace struct kvm_sev_cmd with kvm_mem_enc_cmd. */
+	BUILD_BUG_ON(sizeof(*sev_cmd) != sizeof(*cmd));
+	BUILD_BUG_ON(offsetof(struct kvm_sev_cmd, id) !=
+		     offsetof(struct kvm_mem_enc_cmd, id));
+	BUILD_BUG_ON(sizeof(sev_cmd->id) != sizeof(cmd->id));
+	BUILD_BUG_ON(offsetof(struct kvm_sev_cmd, data) !=
+		     offsetof(struct kvm_mem_enc_cmd, data));
+	BUILD_BUG_ON(sizeof(sev_cmd->data) != sizeof(cmd->data));
+	BUILD_BUG_ON(offsetof(struct kvm_sev_cmd, error) !=
+		     offsetof(struct kvm_mem_enc_cmd, error));
+	BUILD_BUG_ON(sizeof(sev_cmd->error) != sizeof(cmd->error));
+	BUILD_BUG_ON(offsetof(struct kvm_sev_cmd, sev_fd) !=
+		     offsetof(struct kvm_mem_enc_cmd, sev_fd));
+	BUILD_BUG_ON(sizeof(sev_cmd->sev_fd) != sizeof(cmd->sev_fd));
+
 	if (!sev_enabled)
 		return -ENOTTY;
 
-	if (!argp)
-		return 0;
-
-	if (copy_from_user(&sev_cmd, argp, sizeof(struct kvm_sev_cmd)))
-		return -EFAULT;
-
 	mutex_lock(&kvm->lock);
 
 	/* Only the enc_context_owner handles some memory enc operations. */
 	if (is_mirroring_enc_context(kvm) &&
-	    !is_cmd_allowed_from_mirror(sev_cmd.id)) {
+	    !is_cmd_allowed_from_mirror(sev_cmd->id)) {
 		r = -EINVAL;
 		goto out;
 	}
 
-	switch (sev_cmd.id) {
+	switch (sev_cmd->id) {
 	case KVM_SEV_ES_INIT:
 		if (!sev_es_enabled) {
 			r = -ENOTTY;
@@ -1866,67 +1875,64 @@ int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp)
 		}
 		fallthrough;
 	case KVM_SEV_INIT:
-		r = sev_guest_init(kvm, &sev_cmd);
+		r = sev_guest_init(kvm, sev_cmd);
 		break;
 	case KVM_SEV_LAUNCH_START:
-		r = sev_launch_start(kvm, &sev_cmd);
+		r = sev_launch_start(kvm, sev_cmd);
 		break;
 	case KVM_SEV_LAUNCH_UPDATE_DATA:
-		r = sev_launch_update_data(kvm, &sev_cmd);
+		r = sev_launch_update_data(kvm, sev_cmd);
 		break;
 	case KVM_SEV_LAUNCH_UPDATE_VMSA:
-		r = sev_launch_update_vmsa(kvm, &sev_cmd);
+		r = sev_launch_update_vmsa(kvm, sev_cmd);
 		break;
 	case KVM_SEV_LAUNCH_MEASURE:
-		r = sev_launch_measure(kvm, &sev_cmd);
+		r = sev_launch_measure(kvm, sev_cmd);
 		break;
 	case KVM_SEV_LAUNCH_FINISH:
-		r = sev_launch_finish(kvm, &sev_cmd);
+		r = sev_launch_finish(kvm, sev_cmd);
 		break;
 	case KVM_SEV_GUEST_STATUS:
-		r = sev_guest_status(kvm, &sev_cmd);
+		r = sev_guest_status(kvm, sev_cmd);
 		break;
 	case KVM_SEV_DBG_DECRYPT:
-		r = sev_dbg_crypt(kvm, &sev_cmd, true);
+		r = sev_dbg_crypt(kvm, sev_cmd, true);
 		break;
 	case KVM_SEV_DBG_ENCRYPT:
-		r = sev_dbg_crypt(kvm, &sev_cmd, false);
+		r = sev_dbg_crypt(kvm, sev_cmd, false);
 		break;
 	case KVM_SEV_LAUNCH_SECRET:
-		r = sev_launch_secret(kvm, &sev_cmd);
+		r = sev_launch_secret(kvm, sev_cmd);
 		break;
 	case KVM_SEV_GET_ATTESTATION_REPORT:
-		r = sev_get_attestation_report(kvm, &sev_cmd);
+		r = sev_get_attestation_report(kvm, sev_cmd);
 		break;
 	case KVM_SEV_SEND_START:
-		r = sev_send_start(kvm, &sev_cmd);
+		r = sev_send_start(kvm, sev_cmd);
 		break;
 	case KVM_SEV_SEND_UPDATE_DATA:
-		r = sev_send_update_data(kvm, &sev_cmd);
+		r = sev_send_update_data(kvm, sev_cmd);
 		break;
 	case KVM_SEV_SEND_FINISH:
-		r = sev_send_finish(kvm, &sev_cmd);
+		r = sev_send_finish(kvm, sev_cmd);
 		break;
 	case KVM_SEV_SEND_CANCEL:
-		r = sev_send_cancel(kvm, &sev_cmd);
+		r = sev_send_cancel(kvm, sev_cmd);
 		break;
 	case KVM_SEV_RECEIVE_START:
-		r = sev_receive_start(kvm, &sev_cmd);
+		r = sev_receive_start(kvm, sev_cmd);
 		break;
 	case KVM_SEV_RECEIVE_UPDATE_DATA:
-		r = sev_receive_update_data(kvm, &sev_cmd);
+		r = sev_receive_update_data(kvm, sev_cmd);
 		break;
 	case KVM_SEV_RECEIVE_FINISH:
-		r = sev_receive_finish(kvm, &sev_cmd);
+		r = sev_receive_finish(kvm, sev_cmd);
 		break;
 	default:
 		r = -EINVAL;
 		goto out;
 	}
 
-	if (copy_to_user(argp, &sev_cmd, sizeof(struct kvm_sev_cmd)))
-		r = -EFAULT;
-
 out:
 	mutex_unlock(&kvm->lock);
 	return r;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 18af7e712a5a..74ecab20c24b 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -716,7 +716,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
 extern unsigned int max_sev_asid;
 
 void sev_vm_destroy(struct kvm *kvm);
-int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp);
+int sev_mem_enc_ioctl(struct kvm *kvm, struct kvm_mem_enc_cmd *cmd);
 int sev_mem_enc_register_region(struct kvm *kvm,
 				struct kvm_enc_region *range);
 int sev_mem_enc_unregister_region(struct kvm *kvm,
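
The BUILD_BUG_ON() block pins down the cast from struct kvm_mem_enc_cmd to struct kvm_sev_cmd: if either struct's layout ever drifts, the build fails instead of the uABI silently corrupting. The same technique in standalone C11, using static_assert() over two hypothetical structs:

	#include <assert.h>
	#include <stddef.h>
	#include <stdint.h>

	/* Hypothetical pair of structs that must stay layout-compatible so a
	 * pointer to one may safely be cast to a pointer to the other. */
	struct old_cmd { uint32_t id; uint32_t pad;   uint64_t data; };
	struct new_cmd { uint32_t id; uint32_t flags; uint64_t data; };

	/* Compile-time layout checks, the userspace analogue of BUILD_BUG_ON(). */
	static_assert(sizeof(struct old_cmd) == sizeof(struct new_cmd),
		      "struct sizes diverged");
	static_assert(offsetof(struct old_cmd, id) == offsetof(struct new_cmd, id),
		      "'id' moved");
	static_assert(offsetof(struct old_cmd, data) == offsetof(struct new_cmd, data),
		      "'data' moved");

	int main(void) { return 0; }	/* compiling at all is the test */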
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ae40fa8e178..ab36e8940f1b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7040,11 +7040,25 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		goto out;
 	}
 	case KVM_MEMORY_ENCRYPT_OP: {
+		struct kvm_mem_enc_cmd cmd;
+
 		r = -ENOTTY;
 		if (!kvm_x86_ops.mem_enc_ioctl)
 			goto out;
 
-		r = static_call(kvm_x86_mem_enc_ioctl)(kvm, argp);
+		if (!argp) {
+			r = 0;
+			goto out;
+		}
+
+		if (copy_from_user(&cmd, argp, sizeof(cmd))) {
+			r = -EFAULT;
+			goto out;
+		}
+		r = static_call(kvm_x86_mem_enc_ioctl)(kvm, &cmd);
+		if (copy_to_user(argp, &cmd, sizeof(cmd)))
+			r = -EFAULT;
+
 		break;
 	}
 	case KVM_MEMORY_ENCRYPT_REG_REGION: {
-- 
2.25.1
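
With the user-copy hoisted into kvm_arch_vm_ioctl(), a vendor backend now receives a kernel copy of the command and can report status through the union without touching userspace memory itself. Purely as a sketch of where the 64-bit error64 member comes into play, a hypothetical TDX handler might look like this; the KVM_TDX_INIT_VM id and tdx_seamcall_init_vm() helper are invented for illustration and are not part of this series:

	/* Hypothetical sketch of a TDX backend for .mem_enc_ioctl. */
	static int tdx_mem_enc_ioctl(struct kvm *kvm, struct kvm_mem_enc_cmd *cmd)
	{
		u64 seamcall_status;
		int r;

		if (cmd->flags)
			return -EINVAL;

		switch (cmd->id) {
		case KVM_TDX_INIT_VM:	/* invented sub-command id */
			r = tdx_seamcall_init_vm(kvm, cmd->data, &seamcall_status);
			/* The full 64-bit SEAMCALL status fits, thanks to the
			 * union in the uABI struct; no further ABI churn. */
			cmd->error64 = seamcall_status;
			break;
		default:
			r = -EINVAL;
		}
		return r;
	}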

From nobody Mon Sep 8 09:47:34 2025
From: isaku.yamahata@intel.com
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Michael Roth,
    Paolo Bonzini, Sean Christopherson, erdemaktas@google.com, Sagi Shahar,
    David Matlack, Kai Huang, Zhi Wang, chen.bo@intel.com,
    linux-coco@lists.linux.dev, Chao Peng, Ackerley Tng, Vishal Annapurve,
    Yuan Yao
Subject: [RFC PATCH v4 10/10] KVM: x86: KVM_MEM_ENC_OP: Check that unused fields (flags, error) are zero
Date: Thu, 20 Jul 2023 16:32:56 -0700

From: Isaku Yamahata

Check that the fields of KVM_MEM_ENC_OP that a sub-command does not use
(flags, error) are set to zero. Note that this breaks the uABI, as the
current code does not check the padding and the sev_fd member when they
are unused.

Signed-off-by: Isaku Yamahata
---
Changes v3 -> v4:
- newly added
---
 arch/x86/kvm/x86.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab36e8940f1b..1d6085af6a00 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7055,6 +7055,22 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 			r = -EFAULT;
 			goto out;
 		}
+		/* No sub-command uses flags at the moment. */
+		if (cmd.flags) {
+			r = -EINVAL;
+			goto out;
+		}
+		if (cmd.id != KVM_SEV_LAUNCH_START &&
+		    cmd.id != KVM_SEV_RECEIVE_START && cmd.error64) {
+			r = -EINVAL;
+			goto out;
+		}
+		if ((cmd.id == KVM_SEV_LAUNCH_START ||
+		     cmd.id == KVM_SEV_RECEIVE_START) && cmd.error) {
+			r = -EINVAL;
+			goto out;
+		}
+
 		r = static_call(kvm_x86_mem_enc_ioctl)(kvm, &cmd);
 		if (copy_to_user(argp, &cmd, sizeof(cmd)))
 			r = -EFAULT;
-- 
2.25.1
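
To make the new validation concrete, here is a hypothetical userspace fragment showing inputs that the added checks would now reject with -EINVAL; it reuses the struct kvm_mem_enc_cmd mirror from the earlier sketch and assumes vm_fd is a SEV-enabled VM file descriptor:

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static void demo_rejected_inputs(int vm_fd)
	{
		struct kvm_mem_enc_cmd cmd;

		/* 1. Nonzero flags: no sub-command defines any flags yet. */
		memset(&cmd, 0, sizeof(cmd));
		cmd.id = KVM_SEV_LAUNCH_MEASURE;
		cmd.flags = 1;
		ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);	/* now -EINVAL */

		/* 2. Nonzero error64 on a sub-command that takes no sev_fd
		 * (sev_fd aliases the upper half of error64). */
		memset(&cmd, 0, sizeof(cmd));
		cmd.id = KVM_SEV_LAUNCH_MEASURE;
		cmd.sev_fd = 42;
		ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);	/* now -EINVAL */

		/* 3. Nonzero error on LAUNCH_START: the caller must pass zero. */
		memset(&cmd, 0, sizeof(cmd));
		cmd.id = KVM_SEV_LAUNCH_START;
		cmd.error = 0xdead;
		ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);	/* now -EINVAL */
	}

Before this patch, all three calls would have reached the SEV backend with the stray bits silently ignored, which is exactly the uABI laxity the commit message calls out.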