From nobody Sun May 10 23:26:02 2026
From: Maxim Levitsky
To: kvm@vger.kernel.org
Cc: Rodrigo Vivi, Paolo Bonzini, intel-gfx@lists.freedesktop.org,
 Joonas Lahtinen, Jani Nikula, Thomas Gleixner,
 linux-kernel@vger.kernel.org, Wanpeng Li, Jim Mattson, Tvrtko Ursulin,
 "H. Peter Anvin", Vitaly Kuznetsov, Zhi Wang, Daniel Vetter,
 intel-gvt-dev@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
 x86@kernel.org, David Airlie, Sean Christopherson, Ingo Molnar,
 Joerg Roedel, Dave Hansen, Borislav Petkov, Zhenyu Wang, Maxim Levitsky
Subject: [RFC PATCH v2 01/10] KVM: x86: mmu: allow to enable write tracking externally
Date: Thu, 21 Apr 2022 08:12:35 +0300
Message-Id: <20220421051244.187733-2-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This will be used to enable write tracking from the nested AVIC code.
It can also be used by the GVT-g module to enable write tracking only
when it is actually used, instead of always enabling it whenever the
module is compiled into the kernel.

No functional change intended.

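As a rough userspace illustration of the enable path this patch exports
(check the flag, take the lock, recheck, allocate, then publish the flag
with release semantics), a minimal sketch could look like the following.
The names vm_state, tracking_enabled() and enable_write_tracking() are
hypothetical stand-ins, and C11 atomics plus a pthread mutex substitute
for smp_load_acquire()/smp_store_release() and slots_arch_lock; this is
not the kernel code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-VM state; not the kernel's struct kvm. */
struct vm_state {
	pthread_mutex_t lock;         /* plays the role of slots_arch_lock */
	atomic_bool tracking_enabled; /* plays the role of mmu_page_tracking_enabled */
	unsigned short *track_counts; /* metadata that the flag publishes */
	size_t npages;                /* assumed to be non-zero in this sketch */
};

static void vm_state_init(struct vm_state *vm, size_t npages)
{
	pthread_mutex_init(&vm->lock, NULL);
	atomic_init(&vm->tracking_enabled, false);
	vm->track_counts = NULL;
	vm->npages = npages;
}

/* Acquire load: a reader that sees 'true' also sees track_counts. */
static bool tracking_enabled(struct vm_state *vm)
{
	return atomic_load_explicit(&vm->tracking_enabled,
				    memory_order_acquire);
}

/* Idempotent enable: returns 0 on success, -1 on allocation failure. */
static int enable_write_tracking(struct vm_state *vm)
{
	int ret = 0;

	/* Fast path: already enabled, no lock needed. */
	if (tracking_enabled(vm))
		return 0;

	pthread_mutex_lock(&vm->lock);

	/* Recheck under the lock; another caller may have won the race. */
	if (tracking_enabled(vm))
		goto out_unlock;

	vm->track_counts = calloc(vm->npages, sizeof(*vm->track_counts));
	if (!vm->track_counts) {
		ret = -1;
		goto out_unlock;
	}

	/* Release store: publish the flag only after the pointer is set. */
	atomic_store_explicit(&vm->tracking_enabled, true,
			      memory_order_release);

out_unlock:
	pthread_mutex_unlock(&vm->lock);
	return ret;
}
```

Because the call is idempotent, an external user can invoke it whenever
it starts needing write tracking without knowing whether something else
already enabled it.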
Signed-off-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm_host.h       |  2 +-
 arch/x86/include/asm/kvm_page_track.h |  1 +
 arch/x86/kvm/mmu.h                    |  8 +++++---
 arch/x86/kvm/mmu/mmu.c                | 17 ++++++++++-------
 arch/x86/kvm/mmu/page_track.c         | 10 ++++++++--
 5 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c20f715f0094..ae41d2df69fe9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1234,7 +1234,7 @@ struct kvm_arch {
 	 * is used as one input when determining whether certain memslot
 	 * related allocations are necessary.
 	 */
-	bool shadow_root_allocated;
+	bool mmu_page_tracking_enabled;
 
 #if IS_ENABLED(CONFIG_HYPERV)
 	hpa_t hv_root_tdp;
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a9..955a5ae07b10e 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -50,6 +50,7 @@ int kvm_page_track_init(struct kvm *kvm);
 void kvm_page_track_cleanup(struct kvm *kvm);
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
+int kvm_page_track_write_tracking_enable(struct kvm *kvm);
 int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
 
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 671cfeccf04e9..44d15551f7156 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -269,7 +269,7 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
 
-static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
+static inline bool mmu_page_tracking_enabled(struct kvm *kvm)
 {
 	/*
 	 * Read shadow_root_allocated before related pointers. Hence, threads
@@ -277,9 +277,11 @@ static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
 	 * see the pointers. Pairs with smp_store_release in
 	 * mmu_first_shadow_root_alloc.
 	 */
-	return smp_load_acquire(&kvm->arch.shadow_root_allocated);
+	return smp_load_acquire(&kvm->arch.mmu_page_tracking_enabled);
 }
 
+int mmu_enable_write_tracking(struct kvm *kvm);
+
 #ifdef CONFIG_X86_64
 static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
 #else
@@ -288,7 +290,7 @@ static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
 
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
-	return !is_tdp_mmu_enabled(kvm) || kvm_shadow_root_allocated(kvm);
+	return !is_tdp_mmu_enabled(kvm) || mmu_page_tracking_enabled(kvm);
 }
 
 static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 69a30d6d1e2b9..2c4edae4b026d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3368,7 +3368,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	return r;
 }
 
-static int mmu_first_shadow_root_alloc(struct kvm *kvm)
+int mmu_enable_write_tracking(struct kvm *kvm)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *slot;
@@ -3378,21 +3378,20 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 	 * Check if this is the first shadow root being allocated before
 	 * taking the lock.
 	 */
-	if (kvm_shadow_root_allocated(kvm))
+	if (mmu_page_tracking_enabled(kvm))
 		return 0;
 
 	mutex_lock(&kvm->slots_arch_lock);
 
 	/* Recheck, under the lock, whether this is the first shadow root. */
-	if (kvm_shadow_root_allocated(kvm))
+	if (mmu_page_tracking_enabled(kvm))
 		goto out_unlock;
 
 	/*
 	 * Check if anything actually needs to be allocated, e.g. all metadata
 	 * will be allocated upfront if TDP is disabled.
 	 */
-	if (kvm_memslots_have_rmaps(kvm) &&
-	    kvm_page_track_write_tracking_enabled(kvm))
+	if (kvm_memslots_have_rmaps(kvm) && mmu_page_tracking_enabled(kvm))
 		goto out_success;
 
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
@@ -3422,7 +3421,7 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 	 * all the related pointers are set.
 	 */
 out_success:
-	smp_store_release(&kvm->arch.shadow_root_allocated, true);
+	smp_store_release(&kvm->arch.mmu_page_tracking_enabled, true);
 
 out_unlock:
 	mutex_unlock(&kvm->slots_arch_lock);
@@ -3459,7 +3458,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	r = mmu_first_shadow_root_alloc(vcpu->kvm);
+	r = mmu_enable_write_tracking(vcpu->kvm);
 	if (r)
 		return r;
 
@@ -5727,6 +5726,10 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	node->track_write = kvm_mmu_pte_write;
 	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
 	kvm_page_track_register_notifier(kvm, node);
+
+	if (IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) || !tdp_enabled)
+		mmu_enable_write_tracking(kvm);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2e09d1b6249f3..8857d629036d7 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -21,10 +21,16 @@
 
 bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
 {
-	return IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) ||
-	       !tdp_enabled || kvm_shadow_root_allocated(kvm);
+	return mmu_page_tracking_enabled(kvm);
 }
 
+int kvm_page_track_write_tracking_enable(struct kvm *kvm)
+{
+	return mmu_enable_write_tracking(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_page_track_write_tracking_enable);
+
+
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
 {
 	int i;
-- 
2.26.3

From nobody Sun May 10 23:26:02 2026
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 02/10] x86: KVMGT: use kvm_page_track_write_tracking_enable
Date: Thu, 21 Apr 2022 08:12:36 +0300
Message-Id: <20220421051244.187733-3-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

This allows write tracking to be enabled only when KVMGT is actually
used, so it carries no penalty otherwise.

Tested by booting a VM with a kvmgt mdev device.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/Kconfig             | 3 ---
 arch/x86/kvm/mmu/mmu.c           | 2 +-
 drivers/gpu/drm/i915/Kconfig     | 1 -
 drivers/gpu/drm/i915/gvt/kvmgt.c | 5 +++++
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index e3cbd77061364..41341905d3734 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -126,7 +126,4 @@ config KVM_XEN
 
 	  If in doubt, say "N".
 
-config KVM_EXTERNAL_WRITE_TRACKING
-	bool
-
 endif # VIRTUALIZATION
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2c4edae4b026d..23f895d439cf5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5727,7 +5727,7 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
 	kvm_page_track_register_notifier(kvm, node);
 
-	if (IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) || !tdp_enabled)
+	if (!tdp_enabled)
 		mmu_enable_write_tracking(kvm);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 98c5450b8eacc..7d8346f4bae11 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -130,7 +130,6 @@ config DRM_I915_GVT_KVMGT
 	depends on DRM_I915_GVT
 	depends on KVM
 	depends on VFIO_MDEV
-	select KVM_EXTERNAL_WRITE_TRACKING
 	default n
 	help
 	  Choose this option if you want to enable KVMGT support for
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 057ec44901045..4c62ab3ef245d 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1933,6 +1933,7 @@ static int kvmgt_guest_init(struct mdev_device *mdev)
 	struct intel_vgpu *vgpu;
 	struct kvmgt_vdev *vdev;
 	struct kvm *kvm;
+	int ret;
 
 	vgpu = mdev_get_drvdata(mdev);
 	if (handle_valid(vgpu->handle))
@@ -1948,6 +1949,10 @@ static int kvmgt_guest_init(struct mdev_device *mdev)
 	if (__kvmgt_vgpu_exist(vgpu, kvm))
 		return -EEXIST;
 
+	ret = kvm_page_track_write_tracking_enable(kvm);
+	if (ret)
+		return ret;
+
 	info = vzalloc(sizeof(struct kvmgt_guest_info));
 	if (!info)
 		return -ENOMEM;
-- 
2.26.3

From nobody Sun May 10 23:26:02 2026
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 03/10] KVM: x86: mmu: add gfn_in_memslot helper
Date: Thu, 21 Apr 2022 08:12:37 +0300
Message-Id: <20220421051244.187733-4-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

This is a tiny refactoring that makes it a bit cleaner to check whether
a GPA/GFN is within a memslot.

Signed-off-by: Maxim Levitsky
---
 include/linux/kvm_host.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 252ee4a61b58b..12e261559070b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1580,6 +1580,13 @@ int kvm_request_irq_source_id(struct kvm *kvm);
 void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
 bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args);
 
+
+static inline bool gfn_in_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	return (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages);
+}
+
+
 /*
  * Returns a pointer to the memslot if it contains gfn.
  * Otherwise returns NULL.
@@ -1590,12 +1597,13 @@ try_get_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
 	if (!slot)
 		return NULL;
 
-	if (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages)
+	if (gfn_in_memslot(slot, gfn))
 		return slot;
 	else
 		return NULL;
 }
 
+
 /*
  * Returns a pointer to the memslot that contains gfn. Otherwise returns NULL.
  *
-- 
2.26.3

From nobody Sun May 10 23:26:02 2026
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 04/10] KVM: x86: mmu: tweak fast path for emulation of access to nested NPT pages
Date: Thu, 21 Apr 2022 08:12:38 +0300
Message-Id: <20220421051244.187733-5-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

---
 arch/x86/kvm/mmu/mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 23f895d439cf5..b63398dfdac3b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5315,8 +5315,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 	 */
 	if (vcpu->arch.mmu->root_role.direct &&
 	    (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
-		kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
-		return 1;
+		if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
+			return 1;
 	}
 
 	/*
-- 
2.26.3

From nobody Sun May 10 23:26:02 2026
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 05/10] KVM: x86: lapic: don't allow to change APIC ID when apic acceleration is enabled
Date: Thu, 21 Apr 2022 08:12:39 +0300
Message-Id: <20220421051244.187733-6-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

No normal guest has any reason to change physical APIC IDs, and
allowing this introduces bugs into APIC acceleration code.

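To illustrate the check this patch adds: in xAPIC mode the APIC ID lives
in bits 31:24 of the APIC_ID register, and a new value is accepted only
if it matches the vCPU ID while APIC acceleration is on. Below is a
minimal standalone sketch of that predicate. The helper names
xapic_id_from_reg() and apic_id_write_allowed() are hypothetical, and a
plain bool parameter stands in for enable_apicv; the patch itself marks
the VM as bugged on a mismatching register write and returns -EINVAL on
a mismatching state restore.

```c
#include <stdbool.h>
#include <stdint.h>

/* xAPIC mode: the APIC ID is the top byte of the 32-bit APIC_ID register. */
static inline uint32_t xapic_id_from_reg(uint32_t reg_val)
{
	return reg_val >> 24;
}

/*
 * Hypothetical predicate mirroring the patch's condition: with
 * acceleration enabled, only an ID equal to the vCPU ID is acceptable;
 * without acceleration, any ID may be written.
 */
static bool apic_id_write_allowed(bool accel_enabled, uint32_t reg_val,
				  uint32_t vcpu_id)
{
	if (!accel_enabled)
		return true;

	return xapic_id_from_reg(reg_val) == vcpu_id;
}
```

Keeping the APIC ID pinned to the vCPU ID is what lets the AVIC code in
the next patch drop its handling of APIC ID updates.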
Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/lapic.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 66b0eb0bda94e..56996aeca9881 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2046,10 +2046,20 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 	switch (reg) {
 	case APIC_ID:		/* Local APIC ID */
-		if (!apic_x2apic_mode(apic))
-			kvm_apic_set_xapic_id(apic, val >> 24);
-		else
+		if (apic_x2apic_mode(apic)) {
 			ret = 1;
+			break;
+		}
+		/*
+		 * Don't allow setting APIC ID with any APIC acceleration
+		 * enabled to avoid unexpected issues
+		 */
+		if (enable_apicv && ((val >> 24) != apic->vcpu->vcpu_id)) {
+			kvm_vm_bugged(apic->vcpu->kvm);
+			break;
+		}
+
+		kvm_apic_set_xapic_id(apic, val >> 24);
 		break;
 
 	case APIC_TASKPRI:
@@ -2617,8 +2627,16 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 		struct kvm_lapic_state *s, bool set)
 {
-	if (apic_x2apic_mode(vcpu->arch.apic)) {
-		u32 *id = (u32 *)(s->regs + APIC_ID);
+	u32 *id = (u32 *)(s->regs + APIC_ID);
+
+	if (!apic_x2apic_mode(vcpu->arch.apic)) {
+		/* Don't allow setting APIC ID with any APIC acceleration
+		 * enabled to avoid unexpected issues
+		 */
+		if (enable_apicv && (*id >> 24) != vcpu->vcpu_id)
+			return -EINVAL;
+	} else {
+
 		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
 		u64 icr;
 
-- 
2.26.3

From nobody Sun May 10 23:26:02 2026
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 06/10] KVM: x86: SVM: remove avic's broken code that updated APIC ID
Date: Thu, 21 Apr 2022 08:12:40 +0300
Message-Id: <20220421051244.187733-7-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

Now that KVM doesn't allow changing the APIC ID when AVIC is enabled,
remove the buggy AVIC code that tried to do so.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 35 -----------------------------------
 1 file changed, 35 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 9b859218af59c..f375ca1d6518e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -442,35 +442,6 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
-static int avic_handle_apic_id_update(struct kvm_vcpu *vcpu)
-{
-	u64 *old, *new;
-	struct vcpu_svm *svm = to_svm(vcpu);
-	u32 id = kvm_xapic_id(vcpu->arch.apic);
-
-	if (vcpu->vcpu_id == id)
-		return 0;
-
-	old = avic_get_physical_id_entry(vcpu, vcpu->vcpu_id);
-	new = avic_get_physical_id_entry(vcpu, id);
-	if (!new || !old)
-		return 1;
-
-	/* We need to move physical_id_entry to new offset */
-	*new = *old;
-	*old = 0ULL;
-	to_svm(vcpu)->avic_physical_id_cache = new;
-
-	/*
-	 * Also update the guest physical APIC ID in the logical
-	 * APIC ID table entry if already setup the LDR.
-	 */
-	if (svm->ldr_reg)
-		avic_handle_ldr_update(vcpu);
-
-	return 0;
-}
-
 static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -489,10 +460,6 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
 				AVIC_UNACCEL_ACCESS_OFFSET_MASK;
 
 	switch (offset) {
-	case APIC_ID:
-		if (avic_handle_apic_id_update(vcpu))
-			return 0;
-		break;
 	case APIC_LDR:
 		if (avic_handle_ldr_update(vcpu))
 			return 0;
@@ -584,8 +551,6 @@ int avic_init_vcpu(struct vcpu_svm *svm)
 
 void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 {
-	if (avic_handle_apic_id_update(vcpu) != 0)
-		return;
 	avic_handle_dfr_update(vcpu);
 	avic_handle_ldr_update(vcpu);
 }
-- 
2.26.3

From nobody Sun May 10 23:26:02 2026
From: Maxim Levitsky
To: kvm@vger.kernel.org
Subject: [RFC PATCH v2 07/10] KVM: x86: SVM: move avic state to separate struct
Date: Thu, 21 Apr 2022 08:12:41 +0300
Message-Id: <20220421051244.187733-8-mlevitsk@redhat.com>
In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com>
References: <20220421051244.187733-1-mlevitsk@redhat.com>

This will make the code a bit easier to read when nested AVIC support
is added.

No functional change intended.

Signed-off-by: Maxim Levitsky
---
 arch/x86/kvm/svm/avic.c | 49 +++++++++++++++++++++++------------------
 arch/x86/kvm/svm/svm.h  | 14 +++++++-----
 2 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index f375ca1d6518e..87756237c646d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -69,6 +69,8 @@ int avic_ga_log_notifier(u32 ga_tag)
 	unsigned long flags;
 	struct kvm_svm *kvm_svm;
 	struct kvm_vcpu *vcpu = NULL;
+	struct kvm_svm_avic *avic;
+
 	u32 vm_id = AVIC_GATAG_TO_VMID(ga_tag);
 	u32 vcpu_id = AVIC_GATAG_TO_VCPUID(ga_tag);
 
@@ -76,9 +78,13 @@ int avic_ga_log_notifier(u32 ga_tag)
 	trace_kvm_avic_ga_log(vm_id, vcpu_id);
 
 	spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
-	hash_for_each_possible(svm_vm_data_hash, kvm_svm, hnode, vm_id) {
-		if (kvm_svm->avic_vm_id != vm_id)
+	hash_for_each_possible(svm_vm_data_hash, avic, hnode, vm_id) {
+
+
+		if (avic->vm_id != vm_id)
 			continue;
+
+		kvm_svm = container_of(avic, struct kvm_svm, avic);
 		vcpu = kvm_get_vcpu_by_id(&kvm_svm->kvm, vcpu_id);
 		break;
 	}
@@ -98,18 +104,18 @@ int avic_ga_log_notifier(u32 ga_tag) void avic_vm_destroy(struct kvm *kvm) { unsigned long flags; - struct kvm_svm *kvm_svm =3D to_kvm_svm(kvm); + struct kvm_svm_avic *avic =3D &to_kvm_svm(kvm)->avic; =20 if (!enable_apicv) return; =20 - if (kvm_svm->avic_logical_id_table_page) - __free_page(kvm_svm->avic_logical_id_table_page); - if (kvm_svm->avic_physical_id_table_page) - __free_page(kvm_svm->avic_physical_id_table_page); + if (avic->logical_id_table_page) + __free_page(avic->logical_id_table_page); + if (avic->physical_id_table_page) + __free_page(avic->physical_id_table_page); =20 spin_lock_irqsave(&svm_vm_data_hash_lock, flags); - hash_del(&kvm_svm->hnode); + hash_del(&avic->hnode); spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags); } =20 @@ -117,10 +123,9 @@ int avic_vm_init(struct kvm *kvm) { unsigned long flags; int err =3D -ENOMEM; - struct kvm_svm *kvm_svm =3D to_kvm_svm(kvm); - struct kvm_svm *k2; struct page *p_page; struct page *l_page; + struct kvm_svm_avic *avic =3D &to_kvm_svm(kvm)->avic; u32 vm_id; =20 if (!enable_apicv) @@ -131,14 +136,14 @@ int avic_vm_init(struct kvm *kvm) if (!p_page) goto free_avic; =20 - kvm_svm->avic_physical_id_table_page =3D p_page; + avic->physical_id_table_page =3D p_page; =20 /* Allocating logical APIC ID table (4KB) */ l_page =3D alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); if (!l_page) goto free_avic; =20 - kvm_svm->avic_logical_id_table_page =3D l_page; + avic->logical_id_table_page =3D l_page; =20 spin_lock_irqsave(&svm_vm_data_hash_lock, flags); again: @@ -149,13 +154,15 @@ int avic_vm_init(struct kvm *kvm) } /* Is it still in use? 
Only possible if wrapped at least once */ if (next_vm_id_wrapped) { - hash_for_each_possible(svm_vm_data_hash, k2, hnode, vm_id) { - if (k2->avic_vm_id =3D=3D vm_id) + struct kvm_svm_avic *avic2; + + hash_for_each_possible(svm_vm_data_hash, avic2, hnode, vm_id) { + if (avic2->vm_id =3D=3D vm_id) goto again; } } - kvm_svm->avic_vm_id =3D vm_id; - hash_add(svm_vm_data_hash, &kvm_svm->hnode, kvm_svm->avic_vm_id); + avic->vm_id =3D vm_id; + hash_add(svm_vm_data_hash, &avic->hnode, avic->vm_id); spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags); =20 return 0; @@ -169,8 +176,8 @@ void avic_init_vmcb(struct vcpu_svm *svm, struct vmcb *= vmcb) { struct kvm_svm *kvm_svm =3D to_kvm_svm(svm->vcpu.kvm); phys_addr_t bpa =3D __sme_set(page_to_phys(svm->avic_backing_page)); - phys_addr_t lpa =3D __sme_set(page_to_phys(kvm_svm->avic_logical_id_table= _page)); - phys_addr_t ppa =3D __sme_set(page_to_phys(kvm_svm->avic_physical_id_tabl= e_page)); + phys_addr_t lpa =3D __sme_set(page_to_phys(kvm_svm->avic.logical_id_table= _page)); + phys_addr_t ppa =3D __sme_set(page_to_phys(kvm_svm->avic.physical_id_tabl= e_page)); =20 vmcb->control.avic_backing_page =3D bpa & AVIC_HPA_MASK; vmcb->control.avic_logical_id =3D lpa & AVIC_HPA_MASK; @@ -193,7 +200,7 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu = *vcpu, if (index >=3D AVIC_MAX_PHYSICAL_ID_COUNT) return NULL; =20 - avic_physical_id_table =3D page_address(kvm_svm->avic_physical_id_table_p= age); + avic_physical_id_table =3D page_address(kvm_svm->avic.physical_id_table_p= age); =20 return &avic_physical_id_table[index]; } @@ -387,7 +394,7 @@ static u32 *avic_get_logical_id_entry(struct kvm_vcpu *= vcpu, u32 ldr, bool flat) index =3D (cluster << 2) + apic; } =20 - logical_apic_id_table =3D (u32 *) page_address(kvm_svm->avic_logical_id_t= able_page); + logical_apic_id_table =3D (u32 *) page_address(kvm_svm->avic.logical_id_t= able_page); =20 return &logical_apic_id_table[index]; } @@ -737,7 +744,7 @@ int 
avic_pi_update_irte(struct kvm *kvm, unsigned int h= ost_irq, /* Try to enable guest_mode in IRTE */ pi.base =3D __sme_set(page_to_phys(svm->avic_backing_page) & AVIC_HPA_MASK); - pi.ga_tag =3D AVIC_GATAG(to_kvm_svm(kvm)->avic_vm_id, + pi.ga_tag =3D AVIC_GATAG(to_kvm_svm(kvm)->avic.vm_id, svm->vcpu.vcpu_id); pi.is_guest_mode =3D true; pi.vcpu_data =3D &vcpu_info; diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index e246793cbeaeb..96390fa5e3917 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -88,15 +88,17 @@ struct kvm_sev_info { atomic_t migration_in_progress; }; =20 -struct kvm_svm { - struct kvm kvm; =20 - /* Struct members for AVIC */ - u32 avic_vm_id; - struct page *avic_logical_id_table_page; - struct page *avic_physical_id_table_page; +struct kvm_svm_avic { + u32 vm_id; + struct page *logical_id_table_page; + struct page *physical_id_table_page; struct hlist_node hnode; +}; =20 +struct kvm_svm { + struct kvm kvm; + struct kvm_svm_avic avic; struct kvm_sev_info sev_info; }; =20 --=20 2.26.3 From nobody Sun May 10 23:26:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 40C8AC433F5 for ; Thu, 21 Apr 2022 05:14:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384341AbiDUFQy (ORCPT ); Thu, 21 Apr 2022 01:16:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50236 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1384328AbiDUFQk (ORCPT ); Thu, 21 Apr 2022 01:16:40 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id EF5B212AD1 for ; Wed, 20 Apr 2022 22:13:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; 
s=mimecast20190719; t=1650518024; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=httJCT0Tg3TZnTNvm8TBcLaMq9M0mwNcCFiF9q7hQXs=; b=igqxYCTx1Gf0JhOi1ho38yrDaJ2MIyZtKQyFmA5CTmuE9kd3EyxTAc1YXbCoRKSRPOr5u+ a9Y93uerCgpLM4YKqSvvNgS13yNtY4gKEFtWzNFWFB637ItNX3CWmRkjIQdtU1UEbv4Qgf xxCd78QziSgpIcyQZqe8FvwogifHp7Q= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-460-VNUGnnvNOnmOxtmgV5S5vw-1; Thu, 21 Apr 2022 01:13:38 -0400 X-MC-Unique: VNUGnnvNOnmOxtmgV5S5vw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C01BF803D65; Thu, 21 Apr 2022 05:13:37 +0000 (UTC) Received: from localhost.localdomain (unknown [10.40.194.231]) by smtp.corp.redhat.com (Postfix) with ESMTP id 70F05145BA5A; Thu, 21 Apr 2022 05:13:32 +0000 (UTC) From: Maxim Levitsky To: kvm@vger.kernel.org Cc: Rodrigo Vivi , Paolo Bonzini , intel-gfx@lists.freedesktop.org, Joonas Lahtinen , Jani Nikula , Thomas Gleixner , linux-kernel@vger.kernel.org, Wanpeng Li , Jim Mattson , Tvrtko Ursulin , "H. 
Peter Anvin" , Vitaly Kuznetsov , Zhi Wang , Daniel Vetter , intel-gvt-dev@lists.freedesktop.org, dri-devel@lists.freedesktop.org, x86@kernel.org, David Airlie , Sean Christopherson , Ingo Molnar , Joerg Roedel , Dave Hansen , Borislav Petkov , Zhenyu Wang , Maxim Levitsky Subject: [RFC PATCH v2 08/10] KVM: x86: rename .set_apic_access_page_addr to .reload_apic_pages Date: Thu, 21 Apr 2022 08:12:42 +0300 Message-Id: <20220421051244.187733-9-mlevitsk@redhat.com> In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com> References: <20220421051244.187733-1-mlevitsk@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 2.85 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" This will be used on SVM to reload the shadow page of the AVIC physid table. No functional change intended. Signed-off-by: Maxim Levitsky --- arch/x86/include/asm/kvm-x86-ops.h | 2 +- arch/x86/include/asm/kvm_host.h | 3 +-- arch/x86/kvm/vmx/vmx.c | 8 ++++---- arch/x86/kvm/x86.c | 6 +++--- 4 files changed, 9 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 96e4e9842dfc6..997edb7453ac2 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -82,7 +82,7 @@ KVM_X86_OP_OPTIONAL(hwapic_isr_update) KVM_X86_OP_OPTIONAL_RET0(guest_apic_has_interrupt) KVM_X86_OP_OPTIONAL(load_eoi_exitmap) KVM_X86_OP_OPTIONAL(set_virtual_apic_mode) -KVM_X86_OP_OPTIONAL(set_apic_access_page_addr) +KVM_X86_OP_OPTIONAL(reload_apic_pages) KVM_X86_OP(deliver_interrupt) KVM_X86_OP_OPTIONAL(sync_pir_to_irr) KVM_X86_OP_OPTIONAL_RET0(set_tss_addr) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index ae41d2df69fe9..f83cfcd7dd74c 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1415,7 +1415,7 @@ struct kvm_x86_ops { bool 
(*guest_apic_has_interrupt)(struct kvm_vcpu *vcpu); void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap); void (*set_virtual_apic_mode)(struct kvm_vcpu *vcpu); - void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu); + void (*reload_apic_pages)(struct kvm_vcpu *vcpu); void (*deliver_interrupt)(struct kvm_lapic *apic, int delivery_mode, int trig_mode, int vector); int (*sync_pir_to_irr)(struct kvm_vcpu *vcpu); @@ -1888,7 +1888,6 @@ int kvm_cpu_has_extint(struct kvm_vcpu *v); int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu); int kvm_cpu_get_interrupt(struct kvm_vcpu *v); void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event); - int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low, unsigned long ipi_bitmap_high, u32 min, unsigned long icr, int op_64_bit); diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index cf8581978bce3..7defd31703c61 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6339,7 +6339,7 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu) vmx_update_msr_bitmap_x2apic(vcpu); } =20 -static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu) +static void vmx_reload_apic_access_page(struct kvm_vcpu *vcpu) { struct page *page; =20 @@ -7777,7 +7777,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata =3D { .enable_irq_window =3D vmx_enable_irq_window, .update_cr8_intercept =3D vmx_update_cr8_intercept, .set_virtual_apic_mode =3D vmx_set_virtual_apic_mode, - .set_apic_access_page_addr =3D vmx_set_apic_access_page_addr, + .reload_apic_pages =3D vmx_reload_apic_access_page, .refresh_apicv_exec_ctrl =3D vmx_refresh_apicv_exec_ctrl, .load_eoi_exitmap =3D vmx_load_eoi_exitmap, .apicv_post_state_restore =3D vmx_apicv_post_state_restore, @@ -7940,12 +7940,12 @@ static __init int hardware_setup(void) enable_vnmi =3D 0; =20 /* - * set_apic_access_page_addr() is used to reload apic access + * kvm_vcpu_reload_apic_pages() is used to reload apic access * page upon invalidation. 
No need to do anything if not * using the APIC_ACCESS_ADDR VMCS field. */ if (!flexpriority_enabled) - vmx_x86_ops.set_apic_access_page_addr =3D NULL; + vmx_x86_ops.reload_apic_pages =3D NULL; =20 if (!cpu_has_vmx_tpr_shadow()) vmx_x86_ops.update_cr8_intercept =3D NULL; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ab336f7c82e4b..3ac2d0134271b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9949,12 +9949,12 @@ void kvm_arch_mmu_notifier_invalidate_range(struct = kvm *kvm, kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD); } =20 -static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) +static void kvm_vcpu_reload_apic_pages(struct kvm_vcpu *vcpu) { if (!lapic_in_kernel(vcpu)) return; =20 - static_call_cond(kvm_x86_set_apic_access_page_addr)(vcpu); + static_call_cond(kvm_x86_reload_apic_pages)(vcpu); } =20 void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu) @@ -10071,7 +10071,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) if (kvm_check_request(KVM_REQ_LOAD_EOI_EXITMAP, vcpu)) vcpu_load_eoi_exitmap(vcpu); if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu)) - kvm_vcpu_reload_apic_access_page(vcpu); + kvm_vcpu_reload_apic_pages(vcpu); if (kvm_check_request(KVM_REQ_HV_CRASH, vcpu)) { vcpu->run->exit_reason =3D KVM_EXIT_SYSTEM_EVENT; vcpu->run->system_event.type =3D KVM_SYSTEM_EVENT_CRASH; --=20 2.26.3 From nobody Sun May 10 23:26:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1A5CBC433EF for ; Thu, 21 Apr 2022 05:15:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384294AbiDUFSA (ORCPT ); Thu, 21 Apr 2022 01:18:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49822 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S1384354AbiDUFQr (ORCPT ); Thu, 21 Apr 2022 01:16:47 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id DB0BE13D34 for ; Wed, 20 Apr 2022 22:13:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1650518028; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0oDTC/F4WCpqkhQlb4Y7NxOZ6sI0GU5Tn9ocL+UaQDI=; b=LIyQcLIaKr/rDXO/yBUjKyhJ9GUVEOKGobb/PO1HfYtWT9Cqcbuswnn7Y/+lI9+q/EBjsI eGvI0gg0QjdtD5CErmuFkCrbwQZRfXsdIZrLPLS3ALF6zyD7y9v3pI64a0qh6mk5zDXsMx NwYQ4xH0FZBcEgc9a+AhMcrbRn5hnYE= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-591-DyysfhH4PSWgs0-iHThtvA-1; Thu, 21 Apr 2022 01:13:44 -0400 X-MC-Unique: DyysfhH4PSWgs0-iHThtvA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id A71B6280533B; Thu, 21 Apr 2022 05:13:43 +0000 (UTC) Received: from localhost.localdomain (unknown [10.40.194.231]) by smtp.corp.redhat.com (Postfix) with ESMTP id 27320145B96B; Thu, 21 Apr 2022 05:13:37 +0000 (UTC) From: Maxim Levitsky To: kvm@vger.kernel.org Cc: Rodrigo Vivi , Paolo Bonzini , intel-gfx@lists.freedesktop.org, Joonas Lahtinen , Jani Nikula , Thomas Gleixner , linux-kernel@vger.kernel.org, Wanpeng Li , Jim Mattson , Tvrtko Ursulin , "H. 
Peter Anvin" , Vitaly Kuznetsov , Zhi Wang , Daniel Vetter , intel-gvt-dev@lists.freedesktop.org, dri-devel@lists.freedesktop.org, x86@kernel.org, David Airlie , Sean Christopherson , Ingo Molnar , Joerg Roedel , Dave Hansen , Borislav Petkov , Zhenyu Wang , Maxim Levitsky Subject: [RFC PATCH v2 09/10] KVM: nSVM: implement support for nested AVIC Date: Thu, 21 Apr 2022 08:12:43 +0300 Message-Id: <20220421051244.187733-10-mlevitsk@redhat.com> In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com> References: <20220421051244.187733-1-mlevitsk@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 2.85 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" This implements initial support for using AVIC in a nested guest. Signed-off-by: Maxim Levitsky --- arch/x86/kvm/svm/avic.c | 850 +++++++++++++++++++++++++++++++++++++- arch/x86/kvm/svm/nested.c | 131 +++++- arch/x86/kvm/svm/svm.c | 18 + arch/x86/kvm/svm/svm.h | 150 +++++++ arch/x86/kvm/trace.h | 140 ++++++- arch/x86/kvm/x86.c | 11 + 6 files changed, 1282 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c index 87756237c646d..9176c35662ada 100644 --- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -51,6 +51,526 @@ static u32 next_vm_id =3D 0; static bool next_vm_id_wrapped =3D 0; static DEFINE_SPINLOCK(svm_vm_data_hash_lock); =20 +static u32 nested_avic_get_reg(struct kvm_vcpu *vcpu, int reg_off) +{ + struct vcpu_svm *svm =3D to_svm(vcpu); + + void *nested_apic_regs =3D svm->nested.l2_apic_access_page.hva; + + if (WARN_ON_ONCE(!nested_apic_regs)) + return 0; + + return *((u32 *) (nested_apic_regs + reg_off)); +} + +static inline struct kvm_vcpu *avic_vcpu_by_l1_apicid(struct kvm *kvm, + int l1_apicid) +{ + WARN_ON(l1_apicid =3D=3D -1); + return kvm_get_vcpu_by_id(kvm, l1_apicid); +} + +static void avic_physid_shadow_entry_set_vcpu(struct kvm 
*kvm, + struct avic_physid_table *t, + int n, + int new_l1_apicid) +{ + struct avic_physid_entry_descr *e =3D &t->entries[n]; + u64 sentry =3D READ_ONCE(*e->sentry); + u64 old_sentry =3D sentry; + struct kvm_svm *kvm_svm =3D to_kvm_svm(kvm); + struct kvm_vcpu *new_vcpu =3D NULL; + int l0_apicid =3D -1; + unsigned long flags; + + raw_spin_lock_irqsave(&kvm_svm->avic.table_entries_lock, flags); + + WARN_ON(!test_bit(n, t->valid_entires)); + + if (!list_empty(&e->link)) + list_del_init(&e->link); + + if (new_l1_apicid !=3D -1) + new_vcpu =3D avic_vcpu_by_l1_apicid(kvm, new_l1_apicid); + + if (new_vcpu) + list_add_tail(&e->link, &to_svm(new_vcpu)->nested.physid_ref_entries); + + if (new_vcpu && to_svm(new_vcpu)->nested_avic_active) + l0_apicid =3D kvm_cpu_get_apicid(new_vcpu->cpu); + + physid_entry_set_apicid(&sentry, l0_apicid); + + if (sentry !=3D old_sentry) + WRITE_ONCE(*e->sentry, sentry); + + raw_spin_unlock_irqrestore(&kvm_svm->avic.table_entries_lock, flags); +} + +static void avic_physid_shadow_entry_create(struct kvm *kvm, + struct avic_physid_table *t, + int n, + u64 gentry) +{ + struct avic_physid_entry_descr *e =3D &t->entries[n]; + struct page *backing_page; + u64 backing_page_gpa =3D physid_entry_get_backing_table(gentry); + int l1_apic_id =3D physid_entry_get_apicid(gentry); + hpa_t backing_page_hpa; + u64 sentry =3D 0; + + + if (backing_page_gpa =3D=3D INVALID_BACKING_PAGE) + return; + + /* Pin the APIC backing page */ + backing_page =3D gfn_to_page(kvm, gpa_to_gfn(backing_page_gpa)); + + if (is_error_page(backing_page)) + /* Invalid GPA in the guest entry - point to a dummy entry */ + backing_page_hpa =3D t->dummy_page_hpa; + else + backing_page_hpa =3D page_to_phys(backing_page); + + physid_entry_set_backing_table(&sentry, backing_page_hpa); + + e->gentry =3D gentry; + *e->sentry =3D sentry; + + if (test_and_set_bit(n, t->valid_entires)) + WARN_ON(1); + + if (backing_page_hpa !=3D t->dummy_page_hpa) + avic_physid_shadow_entry_set_vcpu(kvm, t, n, 
l1_apic_id); +} + +static void avic_physid_shadow_entry_remove(struct kvm *kvm, + struct avic_physid_table *t, + int n) +{ + struct avic_physid_entry_descr *e =3D &t->entries[n]; + struct kvm_svm *kvm_svm =3D to_kvm_svm(kvm); + hpa_t backing_page_hpa; + unsigned long flags; + + raw_spin_lock_irqsave(&kvm_svm->avic.table_entries_lock, flags); + + if (!test_and_clear_bit(n, t->valid_entires)) + WARN_ON(1); + + /* Release the APIC backing page */ + backing_page_hpa =3D physid_entry_get_backing_table(*e->sentry); + + if (backing_page_hpa !=3D t->dummy_page_hpa) + kvm_release_pfn_dirty(backing_page_hpa >> PAGE_SHIFT); + + if (!list_empty(&e->link)) + list_del_init(&e->link); + + e->gentry =3D 0; + *e->sentry =3D 0; + + raw_spin_unlock_irqrestore(&kvm_svm->avic.table_entries_lock, flags); +} + +static void avic_update_peer_physid_entries(struct kvm_vcpu *vcpu, int cpu) +{ + /* + * Update all shadow physid tables which contain entries + * which reference this vCPU with its new physical location + */ + struct kvm_svm *kvm_svm =3D to_kvm_svm(vcpu->kvm); + struct vcpu_svm *vcpu_svm =3D to_svm(vcpu); + struct avic_physid_entry_descr *e; + int nentries =3D 0; + int l0_apicid =3D -1; + unsigned long flags; + bool new_active =3D cpu !=3D -1; + + if (vcpu_svm->nested_avic_active =3D=3D new_active) + return; + + if (cpu !=3D -1) + l0_apicid =3D kvm_cpu_get_apicid(cpu); + + raw_spin_lock_irqsave(&kvm_svm->avic.table_entries_lock, flags); + + list_for_each_entry(e, &vcpu_svm->nested.physid_ref_entries, link) { + u64 sentry =3D READ_ONCE(*e->sentry); + u64 old_sentry =3D sentry; + + physid_entry_set_apicid(&sentry, l0_apicid); + + if (sentry !=3D old_sentry) + WRITE_ONCE(*e->sentry, sentry); + + nentries++; + } + + if (nentries) + trace_kvm_avic_physid_update_vcpu(vcpu->vcpu_id, cpu, nentries); + + vcpu_svm->nested_avic_active =3D new_active; + + raw_spin_unlock_irqrestore(&kvm_svm->avic.table_entries_lock, flags); +} + +static bool 
+avic_physid_shadow_table_setup_write_tracking(struct kvm *kvm, + struct avic_physid_table *t, + bool enable) +{ + struct kvm_memory_slot *slot; + + write_lock(&kvm->mmu_lock); + slot =3D gfn_to_memslot(kvm, t->gfn); + if (!slot) { + write_unlock(&kvm->mmu_lock); + return false; + } + + if (enable) + kvm_slot_page_track_add_page(kvm, slot, t->gfn, KVM_PAGE_TRACK_WRITE); + else + kvm_slot_page_track_remove_page(kvm, slot, t->gfn, KVM_PAGE_TRACK_WRITE); + write_unlock(&kvm->mmu_lock); + return true; +} + +static void +avic_physid_shadow_table_erase(struct kvm *kvm, struct avic_physid_table *= t) +{ + int i; + + if (!t->nentries) + return; + + avic_physid_shadow_table_setup_write_tracking(kvm, t, false); + + for_each_set_bit(i, t->valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT) + avic_physid_shadow_entry_remove(kvm, t, i); + + t->nentries =3D 0; + t->flood_count =3D 0; +} + +static struct avic_physid_table * +avic_physid_shadow_table_alloc(struct kvm *kvm, gfn_t gfn) +{ + struct avic_physid_entry_descr *e; + struct avic_physid_table *t; + struct kvm_svm *kvm_svm =3D to_kvm_svm(kvm); + u64 *shadow_table_address; + int i; + + if (kvm_page_track_write_tracking_enable(kvm)) + return NULL; + + lockdep_assert_held(&kvm_svm->avic.tables_lock); + + t =3D kzalloc(sizeof(*t), GFP_KERNEL_ACCOUNT); + if (!t) + return NULL; + + t->shadow_table =3D alloc_page(GFP_KERNEL_ACCOUNT|__GFP_ZERO); + if (!t->shadow_table) + goto err_free_table; + + shadow_table_address =3D page_address(t->shadow_table); + t->shadow_table_hpa =3D __sme_set(page_to_phys(t->shadow_table)); + + for (i =3D 0; i < ARRAY_SIZE(t->entries); i++) { + e =3D &t->entries[i]; + e->sentry =3D &shadow_table_address[i]; + e->gentry =3D 0; + INIT_LIST_HEAD(&e->link); + } + + t->gfn =3D gfn; + t->refcount =3D 1; + list_add_tail(&t->link, &kvm_svm->avic.physid_tables); + + t->dummy_page_hpa =3D page_to_phys(kvm_svm->avic.invalid_physid_page); + + trace_kvm_avic_physid_table_alloc(gfn_to_gpa(gfn)); + return t; + +err_free_table: + 
kfree(t); + return NULL; +} + +static void +avic_physid_shadow_table_free(struct kvm *kvm, struct avic_physid_table *t) +{ + struct kvm_svm *kvm_svm =3D to_kvm_svm(kvm); + + lockdep_assert_held(&kvm_svm->avic.tables_lock); + + WARN_ON(t->refcount); + + avic_physid_shadow_table_erase(kvm, t); + + trace_kvm_avic_physid_table_free(gfn_to_gpa(t->gfn)); + + hlist_del(&t->hash_link); + list_del(&t->link); + __free_page(t->shadow_table); + kfree(t); +} + +static struct avic_physid_table * +__avic_physid_shadow_table_get(struct hlist_head *head, gfn_t gfn) +{ + struct avic_physid_table *t; + + hlist_for_each_entry(t, head, hash_link) + if (t->gfn =3D=3D gfn) { + t->refcount++; + return t; + } + return NULL; +} + +struct avic_physid_table * +avic_physid_shadow_table_get(struct kvm_vcpu *vcpu, gfn_t gfn) +{ + struct kvm_svm *kvm_svm =3D to_kvm_svm(vcpu->kvm); + struct hlist_head *hlist; + struct avic_physid_table *t; + + mutex_lock(&kvm_svm->avic.tables_lock); + + hlist =3D &kvm_svm->avic.physid_gpa_hash[avic_physid_hash(gfn)]; + t =3D __avic_physid_shadow_table_get(hlist, gfn); + if (!t) { + t =3D avic_physid_shadow_table_alloc(vcpu->kvm, gfn); + if (!t) + goto out_unlock; + hlist_add_head(&t->hash_link, hlist); + } + t->flood_count =3D 0; +out_unlock: + mutex_unlock(&kvm_svm->avic.tables_lock); + return t; +} + +static void +__avic_physid_shadow_table_put(struct kvm *kvm, struct avic_physid_table *= t) +{ + WARN_ON(t->refcount <=3D 0); + if (--t->refcount =3D=3D 0) + avic_physid_shadow_table_free(kvm, t); +} + +void avic_physid_shadow_table_put(struct kvm *kvm, struct avic_physid_tabl= e *t) +{ + struct kvm_svm *kvm_svm =3D to_kvm_svm(kvm); + + mutex_lock(&kvm_svm->avic.tables_lock); + __avic_physid_shadow_table_put(kvm, t); + mutex_unlock(&kvm_svm->avic.tables_lock); +} + +static void avic_physid_shadow_table_invalidate(struct kvm *kvm, + struct avic_physid_table *t) +{ + avic_physid_shadow_table_erase(kvm, t); + kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD); 
+} + +int avic_physid_shadow_table_sync(struct kvm_vcpu *vcpu, + struct avic_physid_table *t, int nentries) +{ + struct kvm_svm *kvm_svm =3D to_kvm_svm(vcpu->kvm); + struct kvm_host_map map; + u64 *gentries; + int i; + int ret =3D 0; + + mutex_lock(&kvm_svm->avic.tables_lock); + + if (t->nentries >=3D nentries) + goto out_unlock; + + trace_kvm_avic_physid_table_reload(gfn_to_gpa(t->gfn), t->nentries, nentr= ies); + + if (t->nentries =3D=3D 0) { + if (!avic_physid_shadow_table_setup_write_tracking(vcpu->kvm, t, true)) { + ret =3D -EFAULT; + goto out_unlock; + } + } + + if (kvm_vcpu_map(vcpu, t->gfn, &map)) { + ret =3D -EFAULT; + goto out_unlock; + } + + gentries =3D (u64 *)map.hva; + + for (i =3D t->nentries ; i < nentries ; i++) + avic_physid_shadow_entry_create(vcpu->kvm, t, i, gentries[i]); + + /* publish the table before setting nentries */ + wmb(); + WRITE_ONCE(t->nentries, nentries); + + kvm_vcpu_unmap(vcpu, &map, false); +out_unlock: + mutex_unlock(&kvm_svm->avic.tables_lock); + return ret; +} + +static void avic_physid_shadow_table_track_write(struct kvm_vcpu *vcpu, + gpa_t gpa, + const u8 *new, + int bytes, + struct kvm_page_track_notifier_node *node) +{ + struct kvm_svm *kvm_svm =3D to_kvm_svm(vcpu->kvm); + struct hlist_head *hlist; + struct avic_physid_table *t; + gfn_t gfn =3D gpa_to_gfn(gpa); + unsigned int page_offset =3D offset_in_page(gpa); + unsigned int entry_offset =3D page_offset & 0x7; + int first =3D page_offset / sizeof(u64); + int last =3D (page_offset + bytes - 1) / sizeof(u64); + u64 new_entry, old_entry; + int l1_apic_id; + + if (WARN_ON_ONCE(bytes =3D=3D 0)) + return; + + mutex_lock(&kvm_svm->avic.tables_lock); + + hlist =3D &kvm_svm->avic.physid_gpa_hash[avic_physid_hash(gfn)]; + t =3D __avic_physid_shadow_table_get(hlist, gfn); + + if (!t) + goto out_unlock; + + trace_kvm_avic_physid_table_write(gpa, bytes); + + /* + * Update policy: + * + * Only a write to a single entry, entry that had a valid backing page + * on the last VM entry 
with this page, and only if the + * write touches only the is_running and/or apic_id part of this entry + * is allowed. + * + * Writes outside of known number of entries are ignored to support + * case when the guest is adding entries to end of the page + * in the process of a cpu hotplug. + * + * All other writes, which are not supposed to happen during + * use of the page, cause the page to be invalidated, + * and read as a whole, next time it is used by a vCPU for VM entry. + */ + + if (first >=3D t->nentries) + goto out_table_put; + + if (first !=3D last || !test_bit(first, t->valid_entires)) + goto invalidate; + + /* update the entry with written bytes */ + old_entry =3D t->entries[first].gentry; + new_entry =3D old_entry; + memcpy(((u8 *)&new_entry) + entry_offset, new, bytes); + + /* if backing page changed, invalidate the whole page*/ + if (physid_entry_get_backing_table(old_entry) !=3D + physid_entry_get_backing_table(new_entry)) + goto invalidate; + + if (++t->flood_count > t->nentries * AVIC_PHYSID_FLOOD_COUNT) + goto invalidate; + + /* Update the backing cpu */ + l1_apic_id =3D physid_entry_get_apicid(new_entry); + avic_physid_shadow_entry_set_vcpu(vcpu->kvm, t, first, l1_apic_id); + t->entries[first].gentry =3D new_entry; + goto out_table_put; +invalidate: + avic_physid_shadow_table_invalidate(vcpu->kvm, t); +out_table_put: + __avic_physid_shadow_table_put(vcpu->kvm, t); +out_unlock: + mutex_unlock(&kvm_svm->avic.tables_lock); +} + +static void avic_physid_shadow_table_flush_memslot(struct kvm *kvm, + struct kvm_memory_slot *slot, + struct kvm_page_track_notifier_node *node) +{ + struct kvm_svm *kvm_svm =3D to_kvm_svm(kvm); + struct avic_physid_table *t, *n; + int i; + + mutex_lock(&kvm_svm->avic.tables_lock); + + list_for_each_entry_safe(t, n, &kvm_svm->avic.physid_tables, link) { + + if (gfn_in_memslot(slot, t->gfn)) { + avic_physid_shadow_table_invalidate(kvm, t); + continue; + } + + for_each_set_bit(i, t->valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT) 
{ + u64 gentry =3D t->entries[i].gentry; + gpa_t gpa =3D physid_entry_get_backing_table(gentry); + + if (gfn_in_memslot(slot, gpa_to_gfn(gpa))) { + avic_physid_shadow_table_invalidate(kvm, t); + break; + } + } + } + mutex_unlock(&kvm_svm->avic.tables_lock); +} + +bool avic_nested_has_interrupt(struct kvm_vcpu *vcpu) +{ + int off; + + if (!nested_avic_in_use(vcpu)) + return false; + + for (off =3D 0x10; off < 0x80; off +=3D 0x10) + if (nested_avic_get_reg(vcpu, APIC_IRR + off)) + return true; + return false; +} + +void avic_reload_apic_pages(struct kvm_vcpu *vcpu) +{ + struct vcpu_svm *vcpu_svm =3D to_svm(vcpu); + struct avic_physid_table *t =3D vcpu_svm->nested.l2_physical_id_table; + + int nentries =3D vcpu_svm->nested.ctl.avic_physical_id & + AVIC_PHYSICAL_ID_TABLE_SIZE_MASK; + + if (t && is_guest_mode(vcpu) && nested_avic_in_use(vcpu)) + avic_physid_shadow_table_sync(vcpu, t, nentries); +} + +void avic_free_nested(struct kvm_vcpu *vcpu) +{ + struct avic_physid_table *t; + struct vcpu_svm *svm =3D to_svm(vcpu); + + t =3D svm->nested.l2_physical_id_table; + if (t) { + avic_physid_shadow_table_put(vcpu->kvm, t); + svm->nested.l2_physical_id_table =3D NULL; + } + + kvm_vcpu_unmap(vcpu, &svm->nested.l2_apic_access_page, true); + kvm_vcpu_unmap(vcpu, &svm->nested.l2_logical_id_table, true); +} + /* * This is a wrapper of struct amd_iommu_ir_data. 
  */
@@ -105,26 +625,38 @@ void avic_vm_destroy(struct kvm *kvm)
 {
 	unsigned long flags;
 	struct kvm_svm_avic *avic = &to_kvm_svm(kvm)->avic;
+	unsigned long i;
+	struct kvm_vcpu *vcpu;

 	if (!enable_apicv)
 		return;

+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		vcpu_load(vcpu);
+		avic_free_nested(vcpu);
+		vcpu_put(vcpu);
+	}
+
 	if (avic->logical_id_table_page)
 		__free_page(avic->logical_id_table_page);
 	if (avic->physical_id_table_page)
 		__free_page(avic->physical_id_table_page);
+	if (avic->invalid_physid_page)
+		__free_page(avic->invalid_physid_page);

 	spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
 	hash_del(&avic->hnode);
 	spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
+
+
+	kvm_page_track_unregister_notifier(kvm, &avic->write_tracker);
 }

 int avic_vm_init(struct kvm *kvm)
 {
 	unsigned long flags;
 	int err = -ENOMEM;
-	struct page *p_page;
-	struct page *l_page;
+	struct page *page;
 	struct kvm_svm_avic *avic = &to_kvm_svm(kvm)->avic;
 	u32 vm_id;

@@ -132,18 +664,26 @@ int avic_vm_init(struct kvm *kvm)
 		return 0;

 	/* Allocating physical APIC ID table (4KB) */
-	p_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
-	if (!p_page)
+	page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (!page)
 		goto free_avic;

-	avic->physical_id_table_page = p_page;
+	avic->physical_id_table_page = page;

 	/* Allocating logical APIC ID table (4KB) */
-	l_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
-	if (!l_page)
+	page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (!page)
+		goto free_avic;
+
+	avic->logical_id_table_page = page;
+
+	page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (!page)
 		goto free_avic;

-	avic->logical_id_table_page = l_page;
+	/* Allocating dummy page for invalid nested avic physid entries */
+	avic->invalid_physid_page = page;
+

 	spin_lock_irqsave(&svm_vm_data_hash_lock, flags);
  again:
@@ -165,6 +705,14 @@ int avic_vm_init(struct kvm *kvm)
 	hash_add(svm_vm_data_hash, &avic->hnode, avic->vm_id);
 	spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);

+	raw_spin_lock_init(&avic->table_entries_lock);
+	mutex_init(&avic->tables_lock);
+	INIT_LIST_HEAD(&avic->physid_tables);
+
+	avic->write_tracker.track_write = avic_physid_shadow_table_track_write;
+	avic->write_tracker.track_flush_slot = avic_physid_shadow_table_flush_memslot;
+
+	kvm_page_track_register_notifier(kvm, &avic->write_tracker);
 	return 0;

 free_avic:
@@ -316,6 +864,161 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
 	}
 }

+static void
+avic_kick_target_vcpu_nested_physical(struct vcpu_svm *svm,
+		int target_l2_apic_id,
+		int *index,
+		bool *invalid_page)
+{
+	u64 gentry, sentry;
+	int target_l1_apicid;
+	struct avic_physid_table *t = svm->nested.l2_physical_id_table;
+
+	if (WARN_ON_ONCE(!t))
+		return;
+
+	/*
+	 * This shouldn't normally happen as such condition
+	 * should cause AVIC_IPI_FAILURE_INVALID_TARGET vmexit,
+	 * however guest can change the page under us.
+	 */
+	if (target_l2_apic_id >= t->nentries)
+		return;
+
+	gentry = t->entries[target_l2_apic_id].gentry;
+	sentry = *t->entries[target_l2_apic_id].sentry;
+
+	/* Same reasoning as above */
+	if (!(gentry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK))
+		return;
+
+	/*
+	 * This races against the guest updating is_running bit.
+	 * Race itself happens on real hardware as well, and the guest
+	 * should use correct means to avoid it.
+	 *
+	 * AVIC hardware already set IRR and should have done memory
+	 * barrier, and then found that shadowed is_running is false.
+	 * We are doing another is_running check, completing it,
+	 * thus don't need additional memory barriers
+	 */
+
+	target_l1_apicid = physid_entry_get_apicid(gentry);
+
+	if (target_l1_apicid == -1) {
+
+		/* is_running is false, need to vmexit to the guest */
+		if (*index == -1) {
+			u64 backing_page_phys = physid_entry_get_backing_table(sentry);
+
+			*index = target_l2_apic_id;
+			if (backing_page_phys == t->dummy_page_hpa)
+				*invalid_page = true;
+		}
+	} else {
+		/* Wake up the target vCPU and hide the VM exit from the guest */
+		struct kvm_vcpu *target = avic_vcpu_by_l1_apicid(svm->vcpu.kvm, target_l1_apicid);
+
+		if (target && target != &svm->vcpu)
+			kvm_vcpu_wake_up(target);
+	}
+
+	trace_kvm_avic_nested_kick_vcpu(svm->vcpu.vcpu_id,
+			target_l2_apic_id,
+			target_l1_apicid);
+}
+
+static void
+avic_kick_target_vcpus_nested_logical(struct vcpu_svm *svm, unsigned long dest,
+		int *index, bool *invalid_page)
+{
+	int logical_id;
+	u8 cluster = 0;
+	u64 *logical_id_table = (u64 *)svm->nested.l2_logical_id_table.hva;
+	int physical_index = -1;
+
+	if (WARN_ON_ONCE(!logical_id_table))
+		return;
+
+	if (nested_avic_get_reg(&svm->vcpu, APIC_DFR) == APIC_DFR_CLUSTER) {
+		if (dest >= 0x40)
+			return;
+		cluster = dest & 0x3C;
+		dest &= 0x3;
+	}
+
+	for_each_set_bit(logical_id, &dest, 8) {
+		int logical_index = cluster | logical_id;
+		u64 log_gentry = logical_id_table[logical_index];
+		int l2_apicid = logid_get_physid(log_gentry);
+
+		/* Should not happen as in this case AVIC should VM exit
+		 * with 'invalid target'
+
+		 * However the guest can change the entry under KVM's back,
+		 * thus ignore this case.
+		 */
+		if (l2_apicid == -1)
+			continue;
+
+		avic_kick_target_vcpu_nested_physical(svm, l2_apicid,
+				&physical_index,
+				invalid_page);
+
+		/* Reported index is the index of the logical entry in this case */
+		if (physical_index != -1)
+			*index = logical_index;
+	}
+}
+
+static void
+avic_kick_target_vcpus_nested_broadcast(struct vcpu_svm *svm,
+		int *index, bool *invalid_page)
+{
+	struct avic_physid_table *t = svm->nested.l2_physical_id_table;
+	int l2_apicid;
+
+	/*
+	 * This races against guest changing valid bit in the table and/or
+	 * increasing nentries of the table.
+	 * In both cases the race would happen on real hardware as well
+	 * thus there is no need to take locks.
+	 */
+	for_each_set_bit(l2_apicid, t->valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT)
+		avic_kick_target_vcpu_nested_physical(svm, l2_apicid,
+				index, invalid_page);
+}
+
+
+static void avic_kick_target_vcpus_nested(struct kvm_vcpu *vcpu,
+		struct kvm_lapic *source,
+		u32 icrl, u32 icrh,
+		int *index, bool *invalid_page)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int dest = GET_APIC_DEST_FIELD(icrh);
+
+	switch (icrl & APIC_SHORT_MASK) {
+	case APIC_DEST_NOSHORT:
+		if (dest == 0xFF)
+			avic_kick_target_vcpus_nested_broadcast(svm,
+					index, invalid_page);
+		else if (icrl & APIC_DEST_MASK)
+			avic_kick_target_vcpus_nested_logical(svm, dest,
+					index, invalid_page);
+		else
+			avic_kick_target_vcpu_nested_physical(svm, dest,
+					index, invalid_page);
+		break;
+	case APIC_DEST_ALLINC:
+	case APIC_DEST_ALLBUT:
+		avic_kick_target_vcpus_nested_broadcast(svm, index, invalid_page);
+		break;
+	case APIC_DEST_SELF:
+		break;
+	}
+}
+
 int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -323,10 +1026,20 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
 	u32 icrl = svm->vmcb->control.exit_info_1;
 	u32 id = svm->vmcb->control.exit_info_2 >> 32;
 	u32 index = svm->vmcb->control.exit_info_2 & 0xFF;
+	int nindex = -1;
+	bool invalid_page
= false;
+
 	struct kvm_lapic *apic = vcpu->arch.apic;

 	trace_kvm_avic_incomplete_ipi(vcpu->vcpu_id, icrh, icrl, id, index);

+	if (is_guest_mode(&svm->vcpu)) {
+		if (WARN_ON_ONCE(!nested_avic_in_use(vcpu)))
+			return 1;
+		if (WARN_ON_ONCE(!svm->nested.l2_physical_id_table))
+			return 1;
+	}
+
 	switch (id) {
 	case AVIC_IPI_FAILURE_INVALID_INT_TYPE:
 		/*
@@ -338,23 +1051,49 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
 		 * which case KVM needs to emulate the ICR write as well in
 		 * order to clear the BUSY flag.
 		 */
+		if (is_guest_mode(&svm->vcpu)) {
+			nested_svm_vmexit(svm);
+			break;
+		}
+
 		if (icrl & APIC_ICR_BUSY)
 			kvm_apic_write_nodecode(vcpu, APIC_ICR);
 		else
 			kvm_apic_send_ipi(apic, icrl, icrh);
+
 		break;
 	case AVIC_IPI_FAILURE_TARGET_NOT_RUNNING:
 		/*
 		 * At this point, we expect that the AVIC HW has already
 		 * set the appropriate IRR bits on the valid target
 		 * vcpus. So, we just need to kick the appropriate vcpu.
+		 *
+		 * If nested we might also need to reflect the VM exit to
+		 * the guest
 		 */
-		avic_kick_target_vcpus(vcpu->kvm, apic, icrl, icrh);
+		if (!is_guest_mode(&svm->vcpu)) {
+			avic_kick_target_vcpus(vcpu->kvm, apic, icrl, icrh);
+			break;
+		}
+
+		avic_kick_target_vcpus_nested(vcpu, apic, icrl, icrh,
+				&nindex, &invalid_page);
+		if (nindex != -1) {
+			if (invalid_page)
+				id = AVIC_IPI_FAILURE_INVALID_BACKING_PAGE;
+
+			svm->vmcb->control.exit_info_2 = ((u64)id << 32) | nindex;
+			nested_svm_vmexit(svm);
+		}
 		break;
 	case AVIC_IPI_FAILURE_INVALID_TARGET:
+		if (is_guest_mode(&svm->vcpu))
+			nested_svm_vmexit(svm);
+		else
+			WARN_ON_ONCE(1);
 		break;
 	case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE:
-		WARN_ONCE(1, "Invalid backing page\n");
+		WARN_ON_ONCE(1);
 		break;
 	default:
 		pr_err("Unknown IPI interception\n");
@@ -370,6 +1109,48 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
 	return 0;
 }

+int avic_emulate_doorbell_write(struct kvm_vcpu *vcpu, u64 data)
+{
+	int source_l1_apicid = vcpu->vcpu_id;
+	int target_l1_apicid =
data & AVIC_DOORBELL_PHYSICAL_ID_MASK;
+	bool target_running, target_nested;
+	struct kvm_vcpu *target;
+
+	if (data & ~AVIC_DOORBELL_PHYSICAL_ID_MASK)
+		return 1;
+
+	target = avic_vcpu_by_l1_apicid(vcpu->kvm, target_l1_apicid);
+	if (!target)
+		/* Guest bug: targeting invalid APIC ID. */
+		return 0;
+
+	target_running = READ_ONCE(target->mode) == IN_GUEST_MODE;
+	target_nested = is_guest_mode(target);
+
+	trace_kvm_avic_nested_doorbell(source_l1_apicid, target_l1_apicid,
+			target_nested, target_running);
+
+	/*
+	 * Target is not in nested mode, thus the doorbell doesn't affect it.
+	 * If it became nested just now,
+	 * it means that it processed the doorbell on entry.
+	 */
+	if (!target_nested)
+		return 0;
+
+	/*
+	 * If the target vCPU is in guest mode, kick the real doorbell.
+	 * Otherwise we need to wake it up in case it is not scheduled to run.
+	 */
+	if (target_running)
+		wrmsr(MSR_AMD64_SVM_AVIC_DOORBELL,
+		      kvm_cpu_get_apicid(READ_ONCE(target->cpu)), 0);
+	else
+		kvm_vcpu_wake_up(target);
+
+	return 0;
+}
+
 static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
 {
 	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
@@ -463,9 +1244,13 @@ static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)

 static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
 	u32 offset = to_svm(vcpu)->vmcb->control.exit_info_1 &
 				AVIC_UNACCEL_ACCESS_OFFSET_MASK;

+	if (WARN_ON_ONCE(is_guest_mode(&svm->vcpu)))
+		return 0;
+
 	switch (offset) {
 	case APIC_LDR:
 		if (avic_handle_ldr_update(vcpu))
@@ -523,6 +1308,8 @@ int avic_unaccelerated_access_interception(struct kvm_vcpu *vcpu)
 		     AVIC_UNACCEL_ACCESS_WRITE_MASK;
 	bool trap = is_avic_unaccelerated_access_trap(offset);

+	WARN_ON_ONCE(is_guest_mode(&svm->vcpu));
+
 	trace_kvm_avic_unaccelerated_access(vcpu->vcpu_id, offset,
 					    trap, write, vector);
 	if (trap) {
@@ -908,18 +1695,51 @@ static void avic_vcpu_load(struct kvm_vcpu *vcpu)
 	int cpu = get_cpu();
 
 	WARN_ON(cpu != vcpu->cpu);
-
 	__avic_vcpu_load(vcpu, cpu);
-
 	put_cpu();
 }

 static void avic_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	preempt_disable();
-
 	__avic_vcpu_put(vcpu);
+	preempt_enable();
+}
+
+
+void __nested_avic_load(struct kvm_vcpu *vcpu, int cpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	lockdep_assert_preemption_disabled();
+
+	if (svm->nested.initialized && svm->avic_enabled)
+		avic_update_peer_physid_entries(vcpu, cpu);
+}
+
+void __nested_avic_put(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	lockdep_assert_preemption_disabled();
+
+	if (svm->nested.initialized && svm->avic_enabled)
+		avic_update_peer_physid_entries(vcpu, -1);
+}
+
+void nested_avic_load(struct kvm_vcpu *vcpu)
+{
+	int cpu = get_cpu();
+
+	WARN_ON(cpu != vcpu->cpu);
+	__nested_avic_load(vcpu, cpu);
+	put_cpu();
+}

+void nested_avic_put(struct kvm_vcpu *vcpu)
+{
+	preempt_disable();
+	__nested_avic_put(vcpu);
 	preempt_enable();
 }

@@ -983,3 +1803,7 @@ void avic_vcpu_unblocking(struct kvm_vcpu *vcpu)

 	avic_vcpu_load(vcpu);
 }
+
+/*
+ * TODO: Deal with AVIC errata in regard to flushing TLB on vCPU change
+ */
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index bed5e1692cef0..811fa79c51801 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -387,6 +387,14 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
 		memcpy(to->reserved_sw, from->reserved_sw,
 		       sizeof(struct hv_enlightenments));
 	}
+
+	/* copy avic related settings only when it is enabled */
+	if (from->int_ctl & AVIC_ENABLE_MASK) {
+		to->avic_vapic_bar = from->avic_vapic_bar;
+		to->avic_backing_page = from->avic_backing_page;
+		to->avic_logical_id = from->avic_logical_id;
+		to->avic_physical_id = from->avic_physical_id;
+	}
 }

 void nested_copy_vmcb_control_to_cache(struct vcpu_svm *svm,
@@ -539,6 +547,71 @@ void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm)
 	svm->nested.vmcb02.ptr->save.g_pat =
svm->vmcb01.ptr->save.g_pat;
 }

+
+static bool nested_vmcb02_prepare_avic(struct vcpu_svm *svm)
+{
+	struct vmcb *vmcb02 = svm->nested.vmcb02.ptr;
+	struct avic_physid_table *t = svm->nested.l2_physical_id_table;
+	gfn_t physid_gfn;
+	int physid_nentries;
+
+	if (!nested_avic_in_use(&svm->vcpu))
+		return true;
+
+	if (kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->nested.ctl.avic_backing_page & AVIC_HPA_MASK),
+			 &svm->nested.l2_apic_access_page))
+		goto error;
+
+	if (kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->nested.ctl.avic_logical_id & AVIC_HPA_MASK),
+			 &svm->nested.l2_logical_id_table))
+		goto error_unmap_backing_page;
+
+	physid_gfn = gpa_to_gfn(svm->nested.ctl.avic_physical_id &
+				AVIC_HPA_MASK);
+	physid_nentries = svm->nested.ctl.avic_physical_id &
+				AVIC_PHYSICAL_ID_TABLE_SIZE_MASK;
+
+	if (t && t->gfn != physid_gfn) {
+		avic_physid_shadow_table_put(svm->vcpu.kvm, t);
+		svm->nested.l2_physical_id_table = NULL;
+	}
+
+	if (!svm->nested.l2_physical_id_table) {
+		t = avic_physid_shadow_table_get(&svm->vcpu, physid_gfn);
+		if (!t)
+			goto error_unmap_logical_id_table;
+		svm->nested.l2_physical_id_table = t;
+	}
+
+	if (t->nentries < physid_nentries)
+		if (avic_physid_shadow_table_sync(&svm->vcpu, t, physid_nentries) < 0)
+			goto error_put_table;
+
+	/* Everything is setup, we can enable AVIC */
+	vmcb02->control.avic_vapic_bar =
+		svm->nested.ctl.avic_vapic_bar & VMCB_AVIC_APIC_BAR_MASK;
+	vmcb02->control.avic_backing_page =
+		pfn_to_hpa(svm->nested.l2_apic_access_page.pfn);
+	vmcb02->control.avic_logical_id =
+		pfn_to_hpa(svm->nested.l2_logical_id_table.pfn);
+	vmcb02->control.avic_physical_id =
+		(svm->nested.l2_physical_id_table->shadow_table_hpa) | physid_nentries;
+
+	vmcb02->control.int_ctl |= AVIC_ENABLE_MASK;
+	vmcb_mark_dirty(vmcb02, VMCB_AVIC);
+	return true;
+
+error_put_table:
+	avic_physid_shadow_table_put(svm->vcpu.kvm, t);
+	svm->nested.l2_physical_id_table = NULL;
+error_unmap_logical_id_table:
+
	kvm_vcpu_unmap(&svm->vcpu, &svm->nested.l2_logical_id_table, false);
+error_unmap_backing_page:
+	kvm_vcpu_unmap(&svm->vcpu, &svm->nested.l2_apic_access_page, false);
+error:
+	return false;
+}
+
 static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12)
 {
 	bool new_vmcb12 = false;
@@ -627,6 +700,17 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	else
 		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);

+	if (nested_avic_in_use(vcpu)) {
+
+		/*
+		 * Enabling AVIC implicitly disables the
+		 * V_IRQ, V_INTR_PRIO, V_IGN_TPR, and V_INTR_VECTOR
+		 * fields in the VMCB Control Word
+		 */
+		int_ctl_vmcb12_bits &= ~V_IRQ_INJECTION_BITS_MASK;
+	}
+
+
 	/* Copied from vmcb01. msrpm_base can be overwritten later. */
 	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
 	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
@@ -829,7 +913,10 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
 	if (enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, true))
 		goto out_exit_err;

-	if (nested_svm_vmrun_msrpm(svm))
+	if (!nested_svm_vmrun_msrpm(svm))
+		goto out_exit_err;
+
+	if (nested_vmcb02_prepare_avic(svm))
 		goto out;

 out_exit_err:
@@ -844,7 +931,6 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)

 out:
 	kvm_vcpu_unmap(vcpu, &map, true);
-
 	return ret;
 }

@@ -956,6 +1042,11 @@ int nested_svm_vmexit(struct vcpu_svm *svm)

 	nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr);

+	if (nested_avic_in_use(vcpu)) {
+		kvm_vcpu_unmap(vcpu, &svm->nested.l2_apic_access_page, true);
+		kvm_vcpu_unmap(vcpu, &svm->nested.l2_logical_id_table, true);
+	}
+
 	svm_switch_vmcb(svm, &svm->vmcb01);

 	if (unlikely(svm->lbrv_enabled && (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
@@ -1069,6 +1160,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
 	svm_vcpu_init_msrpm(&svm->vcpu, svm->nested.msrpm);

 	svm->nested.initialized = true;
+	nested_avic_load(&svm->vcpu);
 	return 0;

 err_free_vmcb02:
@@ -1078,6
+1170,8 @@ int svm_allocate_nested(struct vcpu_svm *svm)

 void svm_free_nested(struct vcpu_svm *svm)
 {
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+
 	if (!svm->nested.initialized)
 		return;

@@ -1096,6 +1190,11 @@ void svm_free_nested(struct vcpu_svm *svm)
 	 */
 	svm->nested.last_vmcb12_gpa = INVALID_GPA;

+	if (svm->avic_enabled) {
+		nested_avic_put(vcpu);
+		avic_free_nested(vcpu);
+	}
+
 	svm->nested.initialized = false;
 }

@@ -1116,8 +1215,10 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)

 		nested_svm_uninit_mmu_context(vcpu);
 		vmcb_mark_all_dirty(svm->vmcb);
-	}

+		kvm_vcpu_unmap(vcpu, &svm->nested.l2_apic_access_page, true);
+		kvm_vcpu_unmap(vcpu, &svm->nested.l2_logical_id_table, true);
+	}
 	kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
 }

@@ -1206,6 +1307,20 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 		vmexit = NESTED_EXIT_DONE;
 		break;
 	}
+	case SVM_EXIT_AVIC_UNACCELERATED_ACCESS: {
+		/*
+		 * Unaccelerated AVIC access is always reflected
+		 * and there is no intercept bit for it
+		 */
+		vmexit = NESTED_EXIT_DONE;
+		break;
+	}
+	case SVM_EXIT_AVIC_INCOMPLETE_IPI:
+		/*
+		 * Doesn't have an intercept bit, host needs to intercept
+		 * and in some cases reflect to the guest
+		 */
+		break;
 	default: {
 		if (vmcb12_is_intercept(&svm->nested.ctl, exit_code))
 			vmexit = NESTED_EXIT_DONE;
@@ -1296,6 +1411,7 @@ static int svm_check_nested_events(struct kvm_vcpu *vcpu)
 		kvm_event_needs_reinjection(vcpu) || svm->nested.nested_run_pending;
 	struct kvm_lapic *apic = vcpu->arch.apic;

+
 	if (lapic_in_kernel(vcpu) &&
 	    test_bit(KVM_APIC_INIT, &apic->pending_events)) {
 		if (block_nested_events)
@@ -1423,6 +1539,13 @@ static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst,
 	dst->pause_filter_count = from->pause_filter_count;
 	dst->pause_filter_thresh = from->pause_filter_thresh;
 	/* 'clean' and 'reserved_sw' are not changed by KVM */
+
+	if (from->int_ctl & AVIC_ENABLE_MASK) {
+		dst->avic_vapic_bar = from->avic_vapic_bar;
+
		dst->avic_backing_page = from->avic_backing_page;
+		dst->avic_logical_id = from->avic_logical_id;
+		dst->avic_physical_id = from->avic_physical_id;
+	}
 }

 static int svm_get_nested_state(struct kvm_vcpu *vcpu,
@@ -1644,7 +1767,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu)
 		if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3)))
 			return false;

-	if (!nested_svm_vmrun_msrpm(svm)) {
+	if (!nested_svm_vmrun_msrpm(svm) || !nested_vmcb02_prepare_avic(svm)) {
 		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
 		vcpu->run->internal.suberror =
 			KVM_INTERNAL_ERROR_EMULATION;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fc1725b7d05f6..3d9ab1e7b2b52 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1301,6 +1301,8 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)

 	svm->guest_state_loaded = false;

+	INIT_LIST_HEAD(&svm->nested.physid_ref_entries);
+
 	return 0;

 error_free_vmsa_page:
@@ -1390,8 +1392,11 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		sd->current_vmcb = svm->vmcb;
 		indirect_branch_prediction_barrier();
 	}
+
 	if (kvm_vcpu_apicv_active(vcpu))
 		__avic_vcpu_load(vcpu, cpu);
+
+	__nested_avic_load(vcpu, cpu);
 }

 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
@@ -1399,6 +1404,8 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_apicv_active(vcpu))
 		__avic_vcpu_put(vcpu);

+	__nested_avic_put(vcpu);
+
 	svm_prepare_host_switch(vcpu);

 	++vcpu->stat.host_state_reload;
@@ -2764,6 +2771,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	u32 ecx = msr->index;
 	u64 data = msr->data;
 	switch (ecx) {
+	case MSR_AMD64_SVM_AVIC_DOORBELL:
+		return avic_emulate_doorbell_write(vcpu, data);
 	case MSR_AMD64_TSC_RATIO:

 		if (!svm->tsc_scaling_enabled) {
@@ -4060,6 +4069,9 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC))
 			kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_X2APIC);
 	}
+
+
	svm->avic_enabled = enable_apicv && guest_cpuid_has(vcpu, X86_FEATURE_AVIC);
+
 	init_vmcb_after_set_cpuid(vcpu);
 }

@@ -4669,9 +4681,11 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.enable_nmi_window = svm_enable_nmi_window,
 	.enable_irq_window = svm_enable_irq_window,
 	.update_cr8_intercept = svm_update_cr8_intercept,
+	.reload_apic_pages = avic_reload_apic_pages,
 	.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
 	.check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons,
 	.apicv_post_state_restore = avic_apicv_post_state_restore,
+	.guest_apic_has_interrupt = avic_nested_has_interrupt,

 	.get_mt_mask = svm_get_mt_mask,
 	.get_exit_info = svm_get_exit_info,
@@ -4798,6 +4812,9 @@ static __init void svm_set_cpu_caps(void)
 	if (vgif)
 		kvm_cpu_cap_set(X86_FEATURE_VGIF);

+	if (enable_apicv)
+		kvm_cpu_cap_set(X86_FEATURE_AVIC);
+
 	/* Nested VM can receive #VMEXIT instead of triggering #GP */
 	kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 }
@@ -4923,6 +4940,7 @@ static __init int svm_hardware_setup(void)
 		svm_x86_ops.vcpu_blocking = NULL;
 		svm_x86_ops.vcpu_unblocking = NULL;
 		svm_x86_ops.vcpu_get_apicv_inhibit_reasons = NULL;
+		svm_x86_ops.guest_apic_has_interrupt = NULL;
 	}

 	if (vls) {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 96390fa5e3917..7d1a5028750e6 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -89,13 +90,36 @@ struct kvm_sev_info {
 };


+#define AVIC_PHYSID_HASH_SHIFT 8
+#define AVIC_PHYSID_HASH_SIZE (1 << AVIC_PHYSID_HASH_SHIFT)
+
 struct kvm_svm_avic {
 	u32 vm_id;
 	struct page *logical_id_table_page;
 	struct page *physical_id_table_page;
 	struct hlist_node hnode;
+
+	raw_spinlock_t table_entries_lock;
+	struct mutex tables_lock;
+
+	/* List of all shadow tables */
+	struct list_head physid_tables;
+
+	/* GPA hash table to find a shadow table via its GPA */
+	struct hlist_head
physid_gpa_hash[AVIC_PHYSID_HASH_SIZE];
+
+	struct kvm_page_track_notifier_node write_tracker;
+
+	struct page *invalid_physid_page;
 };

+
+static __always_inline unsigned int avic_physid_hash(gfn_t gfn)
+{
+	return hash_64(gfn, AVIC_PHYSID_HASH_SHIFT);
+}
+
+
 struct kvm_svm {
 	struct kvm kvm;
 	struct kvm_svm_avic avic;
@@ -145,6 +169,51 @@ struct vmcb_ctrl_area_cached {
 	u64 virt_ext;
 	u32 clean;
 	u8 reserved_sw[32];
+
+	u64 avic_vapic_bar;
+	u64 avic_backing_page;
+	u64 avic_logical_id;
+	u64 avic_physical_id;
+};
+
+struct avic_physid_entry_descr {
+	struct list_head link;
+
+	/* cached value of guest entry */
+	u64 gentry;
+
+	/* shadow table entry pointer*/
+	u64 *sentry;
+};
+
+#define AVIC_PHYSID_FLOOD_COUNT 5
+
+struct avic_physid_table {
+	/* List of all tables member */
+	struct list_head link;
+
+	/* GPA hash of all tables member */
+	struct hlist_node hash_link;
+
+	/* GPA of the table in guest memory*/
+	gfn_t gfn;
+
+	/* Number of entries that we shadow and which are valid*/
+	int nentries;
+	DECLARE_BITMAP(valid_entires, AVIC_MAX_PHYSICAL_ID_COUNT);
+
+	struct avic_physid_entry_descr entries[AVIC_MAX_PHYSICAL_ID_COUNT];
+
+	/* Guest visible shadow table */
+	struct page *shadow_table;
+	hpa_t shadow_table_hpa;
+	hpa_t dummy_page_hpa;
+
+	/* Number of vCPUs which are in nested mode and use this table */
+	int refcount;
+
+	/* Number of writes to this page between uses of it*/
+	int flood_count;
 };

 struct svm_nested_state {
@@ -180,6 +249,13 @@ struct svm_nested_state {
 	 * on its side.
 	 */
 	bool force_msr_bitmap_recalc;
+
+	/* All AVIC shadow PID table entry descriptors that reference this vCPU */
+	struct list_head physid_ref_entries;
+
+	struct kvm_host_map l2_apic_access_page;
+	struct kvm_host_map l2_logical_id_table;
+	struct avic_physid_table *l2_physical_id_table;
 };

 struct vcpu_sev_es_state {
@@ -242,11 +318,13 @@ struct vcpu_svm {
 	bool pause_filter_enabled : 1;
 	bool pause_threshold_enabled : 1;
 	bool vgif_enabled : 1;
+	bool avic_enabled : 1;

 	u32 ldr_reg;
 	u32 dfr_reg;
 	struct page *avic_backing_page;
 	u64 *avic_physical_id_cache;
+	bool nested_avic_active;

 	/*
 	 * Per-vcpu list of struct amd_svm_iommu_ir:
@@ -614,6 +692,11 @@ int avic_unaccelerated_access_interception(struct kvm_vcpu *vcpu);
 int avic_init_vcpu(struct vcpu_svm *svm);
 void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void __avic_vcpu_put(struct kvm_vcpu *vcpu);
+void __nested_avic_load(struct kvm_vcpu *vcpu, int cpu);
+void __nested_avic_put(struct kvm_vcpu *vcpu);
+void nested_avic_load(struct kvm_vcpu *vcpu);
+void nested_avic_put(struct kvm_vcpu *vcpu);
+
 void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu);
 void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
 void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu);
@@ -627,6 +710,73 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
 void avic_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void avic_ring_doorbell(struct kvm_vcpu *vcpu);
 unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu);
+int avic_emulate_doorbell_write(struct kvm_vcpu *vcpu, u64 data);
+void avic_reload_apic_pages(struct kvm_vcpu *vcpu);
+void avic_free_nested(struct kvm_vcpu *vcpu);
+bool avic_nested_has_interrupt(struct kvm_vcpu *vcpu);
+
+struct avic_physid_table *
+avic_physid_shadow_table_get(struct kvm_vcpu *vcpu, gfn_t gfn);
+void avic_physid_shadow_table_put(struct kvm *kvm, struct avic_physid_table *t);
+int avic_physid_shadow_table_sync(struct kvm_vcpu *vcpu,
+		struct avic_physid_table *t, int nentries);
+
+static inline bool nested_avic_in_use(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *vcpu_svm = to_svm(vcpu);
+
+	if (!vcpu_svm->avic_enabled)
+		return false;
+
+	if (!nested_npt_enabled(vcpu_svm))
+		return false;
+
+	return vcpu_svm->nested.ctl.int_ctl & AVIC_ENABLE_MASK;
+}
+
+#define INVALID_BACKING_PAGE (~(u64)0)
+
+static inline u64 physid_entry_get_backing_table(u64 entry)
+{
+	if (!(entry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK))
+		return INVALID_BACKING_PAGE;
+	return entry & AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK;
+}
+
+static inline int physid_entry_get_apicid(u64 entry)
+{
+	if (!(entry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK))
+		return -1;
+	if (!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK))
+		return -1;
+
+	return entry & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
+}
+
+static inline int logid_get_physid(u64 entry)
+{
+	if (!(entry & AVIC_LOGICAL_ID_ENTRY_VALID_BIT))
+		return -1;
+	return entry & AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
+}
+
+static inline void physid_entry_set_backing_table(u64 *entry, u64 value)
+{
+	*entry &= ~AVIC_PHYSICAL_ID_ENTRY_BACKING_PAGE_MASK;
+	*entry |= (AVIC_PHYSICAL_ID_ENTRY_VALID_MASK | value);
+}
+
+static inline void physid_entry_set_apicid(u64 *entry, int value)
+{
+	WARN_ON(!(*entry & AVIC_PHYSICAL_ID_ENTRY_VALID_MASK));
+
+	*entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
+
+	if (value == -1)
+		*entry &= ~(AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
+	else
+		*entry |= (AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK | value);
+}

 /* sev.c */

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index e3a24b8f04be8..e063580559e9f 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1385,7 +1385,7 @@ TRACE_EVENT(kvm_apicv_accept_irq,
 );

 /*
- * Tracepoint for AMD AVIC
+ * Tracepoints for AMD AVIC
  */
 TRACE_EVENT(kvm_avic_incomplete_ipi,
 	    TP_PROTO(u32 vcpu, u32 icrh, u32 icrl, u32 id, u32 index),
@@ -1459,6 +1459,144 @@ TRACE_EVENT(kvm_avic_ga_log,
 		  __entry->vmid,
__entry->vcpuid)
 );

+
+TRACE_EVENT(kvm_avic_physid_table_alloc,
+	    TP_PROTO(u64 gpa),
+	    TP_ARGS(gpa),
+
+	TP_STRUCT__entry(
+		__field(u64, gpa)
+	),
+
+	TP_fast_assign(
+		__entry->gpa = gpa;
+	),
+
+	TP_printk("table at gpa 0x%llx",
+		  __entry->gpa)
+);
+
+
+TRACE_EVENT(kvm_avic_physid_table_free,
+	    TP_PROTO(u64 gpa),
+	    TP_ARGS(gpa),
+
+	TP_STRUCT__entry(
+		__field(u64, gpa)
+	),
+
+	TP_fast_assign(
+		__entry->gpa = gpa;
+	),
+
+	TP_printk("table at gpa 0x%llx",
+		  __entry->gpa)
+);
+
+TRACE_EVENT(kvm_avic_physid_table_reload,
+	    TP_PROTO(u64 gpa, int nentries, int new_nentires),
+	    TP_ARGS(gpa, nentries, new_nentires),
+
+	TP_STRUCT__entry(
+		__field(u64, gpa)
+		__field(int, nentries)
+		__field(int, new_nentires)
+	),
+
+	TP_fast_assign(
+		__entry->gpa = gpa;
+		__entry->nentries = nentries;
+		__entry->new_nentires = new_nentires;
+	),
+
+	TP_printk("table at gpa 0x%llx, nentires %d -> %d",
+		  __entry->gpa, __entry->nentries, __entry->new_nentires)
+);
+
+TRACE_EVENT(kvm_avic_physid_table_write,
+	    TP_PROTO(u64 gpa, int bytes),
+	    TP_ARGS(gpa, bytes),
+
+	TP_STRUCT__entry(
+		__field(u64, gpa)
+		__field(int, bytes)
+	),
+
+	TP_fast_assign(
+		__entry->gpa = gpa;
+		__entry->bytes = bytes;
+	),
+
+	TP_printk("gpa 0x%llx, write of %d bytes",
+		  __entry->gpa, __entry->bytes)
+);
+
+TRACE_EVENT(kvm_avic_physid_update_vcpu,
+	    TP_PROTO(int vcpu_id, int cpu_id, int n),
+	    TP_ARGS(vcpu_id, cpu_id, n),
+
+	TP_STRUCT__entry(
+		__field(int, vcpu_id)
+		__field(int, cpu_id)
+		__field(int, n)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu_id;
+		__entry->cpu_id = cpu_id;
+		__entry->n = n;
+	),
+
+	TP_printk("vcpu %d cpu %d (%d entries)",
+		  __entry->vcpu_id, __entry->cpu_id, __entry->n)
+);
+
+TRACE_EVENT(kvm_avic_nested_doorbell,
+	    TP_PROTO(int source_l1_apicid, int target_l1_apicid, bool target_nested,
+		     bool target_running),
+	    TP_ARGS(source_l1_apicid, target_l1_apicid, target_nested,
+		    target_running),
+
+	TP_STRUCT__entry(
+		__field(int, source_l1_apicid)
+
		__field(int, target_l1_apicid)
+		__field(bool, target_nested)
+		__field(bool, target_running)
+	),
+
+	TP_fast_assign(
+		__entry->source_l1_apicid = source_l1_apicid;
+		__entry->target_l1_apicid = target_l1_apicid;
+		__entry->target_nested = target_nested;
+		__entry->target_running = target_running;
+	),
+
+	TP_printk("source %d target %d (nested: %d, running %d)",
+		  __entry->source_l1_apicid, __entry->target_l1_apicid,
+		  __entry->target_nested, __entry->target_running)
+);
+
+TRACE_EVENT(kvm_avic_nested_kick_vcpu,
+	    TP_PROTO(int source_l1_apic_id, int target_l2_apic_id, int target_l1_apic_id),
+	    TP_ARGS(source_l1_apic_id, target_l2_apic_id, target_l1_apic_id),
+
+	TP_STRUCT__entry(
+		__field(int, source_l1_apic_id)
+		__field(int, target_l2_apic_id)
+		__field(int, target_l1_apic_id)
+	),
+
+	TP_fast_assign(
+		__entry->source_l1_apic_id = source_l1_apic_id;
+		__entry->target_l2_apic_id = target_l2_apic_id;
+		__entry->target_l1_apic_id = target_l1_apic_id;
+	),
+
+	TP_printk("source l1 apic id: %d target l2 apic id: %d target l1 apic_id: %d",
+		  __entry->source_l1_apic_id, __entry->target_l2_apic_id,
+		  __entry->target_l1_apic_id)
+);
+
 TRACE_EVENT(kvm_hv_timer_state,
 	    TP_PROTO(unsigned int vcpu_id, unsigned int hv_timer_in_use),
 	    TP_ARGS(vcpu_id, hv_timer_in_use),
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3ac2d0134271b..94c663a555a0c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13063,9 +13063,20 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window_update);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
+
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_alloc);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_free);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_reload); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_table_write); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_physid_update_vcpu); + +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_nested_doorbell); +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_nested_kick_vcpu); + EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_accept_irq); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter); EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit); --=20 2.26.3 From nobody Sun May 10 23:26:02 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7A138C433F5 for ; Thu, 21 Apr 2022 05:15:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1384412AbiDUFS3 (ORCPT ); Thu, 21 Apr 2022 01:18:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50196 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1384363AbiDUFQr (ORCPT ); Thu, 21 Apr 2022 01:16:47 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A47A213D59 for ; Wed, 20 Apr 2022 22:13:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1650518033; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TE0/IyhbX+PJJtmqWsYqEi/mVebhKn0u1Uz64oTynUA=; b=DPvYTjKjOQqc7bYknPx+lcT0djp2xTE4Q2pRoYENN7C+SF/Ws+JaYdDDbaiBZDc1NZEMmX +xBuios5lhZSx2NZO+Us8uvqh+H6pAtInmQNxnheqxzLLxKKeZUOXdhauoCCYlgZVnPdc1 t0HfqGsK9zWMB4AyRcY1no0UtDZlMJM= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with 
STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-212-BwterifsOBidoQRuIu9gkQ-1; Thu, 21 Apr 2022 01:13:50 -0400 X-MC-Unique: BwterifsOBidoQRuIu9gkQ-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 5DC1A80005D; Thu, 21 Apr 2022 05:13:49 +0000 (UTC) Received: from localhost.localdomain (unknown [10.40.194.231]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0CEB1145BA5A; Thu, 21 Apr 2022 05:13:43 +0000 (UTC) From: Maxim Levitsky To: kvm@vger.kernel.org Cc: Rodrigo Vivi , Paolo Bonzini , intel-gfx@lists.freedesktop.org, Joonas Lahtinen , Jani Nikula , Thomas Gleixner , linux-kernel@vger.kernel.org, Wanpeng Li , Jim Mattson , Tvrtko Ursulin , "H. Peter Anvin" , Vitaly Kuznetsov , Zhi Wang , Daniel Vetter , intel-gvt-dev@lists.freedesktop.org, dri-devel@lists.freedesktop.org, x86@kernel.org, David Airlie , Sean Christopherson , Ingo Molnar , Joerg Roedel , Dave Hansen , Borislav Petkov , Zhenyu Wang , Maxim Levitsky Subject: [RFC PATCH v2 10/10] KVM: SVM: allow to avoid not needed updates to is_running Date: Thu, 21 Apr 2022 08:12:44 +0300 Message-Id: <20220421051244.187733-11-mlevitsk@redhat.com> In-Reply-To: <20220421051244.187733-1-mlevitsk@redhat.com> References: <20220421051244.187733-1-mlevitsk@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 2.85 on 10.11.54.7 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Optionally allow KVM to skip updating is_running unless the update is functionally needed, which is only when a vCPU halts or is in guest mode. Security-wise, this means that if a vCPU is scheduled out, other vCPUs can still send doorbell messages to the physical CPU on which this vCPU last ran.
If a malicious guest tries to do so, it can slow down the victim CPU by about 40% in my testing, so the relaxed mode should only be enabled if physical CPUs are not shared among guests. The option is avic_doorbell_strict; it is true by default, and setting it to false enables the relaxed, non-strict mode. Signed-off-by: Maxim Levitsky --- arch/x86/kvm/svm/avic.c | 19 ++++++++++++------- arch/x86/kvm/svm/svm.c | 19 ++++++++++++++----- arch/x86/kvm/svm/svm.h | 1 + 3 files changed, 27 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c index 9176c35662ada..1bfe58ee961b2 100644 --- a/arch/x86/kvm/svm/avic.c +++ b/arch/x86/kvm/svm/avic.c @@ -1641,7 +1641,7 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu= , int cpu, bool r) =20 void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { - u64 entry; + u64 old_entry, new_entry; int h_physical_id =3D kvm_cpu_get_apicid(cpu); struct vcpu_svm *svm =3D to_svm(vcpu); =20 @@ -1660,14 +1660,16 @@ void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cp= u) if (kvm_vcpu_is_blocking(vcpu)) return; =20 - entry =3D READ_ONCE(*(svm->avic_physical_id_cache)); - WARN_ON(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK); + old_entry =3D READ_ONCE(*(svm->avic_physical_id_cache)); + new_entry =3D old_entry; =20 - entry &=3D ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK; - entry |=3D (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK); - entry |=3D AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK; + new_entry &=3D ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK; + new_entry |=3D (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_M= ASK); + new_entry |=3D AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK; + + if (old_entry !=3D new_entry) + WRITE_ONCE(*(svm->avic_physical_id_cache), new_entry); =20 - WRITE_ONCE(*(svm->avic_physical_id_cache), entry); avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true); } =20 @@ -1777,6 +1779,9 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vc= pu) =20 void
avic_vcpu_blocking(struct kvm_vcpu *vcpu) { + if (!avic_doorbell_strict) + __nested_avic_put(vcpu); + if (!kvm_vcpu_apicv_active(vcpu)) return; =20 diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 3d9ab1e7b2b52..7e79fefc81650 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -190,6 +190,10 @@ module_param(avic, bool, 0444); static bool force_avic; module_param_unsafe(force_avic, bool, 0444); =20 +bool avic_doorbell_strict =3D true; +module_param(avic_doorbell_strict, bool, 0444); + + bool __read_mostly dump_invalid_vmcb; module_param(dump_invalid_vmcb, bool, 0644); =20 @@ -1395,16 +1399,21 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, in= t cpu) =20 if (kvm_vcpu_apicv_active(vcpu)) __avic_vcpu_load(vcpu, cpu); - __nested_avic_load(vcpu, cpu); } =20 static void svm_vcpu_put(struct kvm_vcpu *vcpu) { - if (kvm_vcpu_apicv_active(vcpu)) - __avic_vcpu_put(vcpu); - - __nested_avic_put(vcpu); + /* + * Forbid AVIC peers from sending interrupts + * to this CPU unless we are in non-strict mode, + * in which case we do so only when this vCPU blocks. + */ + if (avic_doorbell_strict) { + if (kvm_vcpu_apicv_active(vcpu)) + __avic_vcpu_put(vcpu); + __nested_avic_put(vcpu); + } =20 svm_prepare_host_switch(vcpu); =20 diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 7d1a5028750e6..7139bbb534f9e 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -36,6 +36,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly; extern bool npt_enabled; extern int vgif; extern bool intercept_smi; +extern bool avic_doorbell_strict; =20 /* * Clean bits in VMCB. --=20 2.26.3