Date: Fri, 12 Jul 2024 17:00:39 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
References: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Message-ID: <20240712-asi-rfc-24-v1-21-144b319a40d8@google.com>
Subject: [PATCH 21/26] KVM: x86: asi: Restricted address space for VM execution
From: Brendan Jackman
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra, Sean Christopherson,
 Paolo Bonzini, Alexandre Chartre, Liran Alon, Jan Setje-Eilers,
 Catalin Marinas, Will Deacon, Mark Rutland, Andrew Morton, Mel Gorman,
 Lorenzo Stoakes, David Hildenbrand, Vlastimil Babka, Michal Hocko,
 Khalid Aziz, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Valentin Schneider, Paul Turner, Reiji Watanabe,
 Junaid Shahid, Ofir Weisse, Yosry Ahmed, Patrick Bellasi, KP Singh,
 Alexandra Sandulescu, Matteo Rizzo, Jann Horn
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kvm@vger.kernel.org, Brendan Jackman

An ASI restricted address space is added for KVM. It is currently only
enabled for Intel CPUs.

This change incorporates an extra asi_exit() after vcpu_run() returns.
We expect later iterations of ASI to drop that call as we gain the
ability to context switch within the ASI domain.

Signed-off-by: Brendan Jackman
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/vmx.c          | 36 ++++++++++++++++++++++--------------
 arch/x86/kvm/x86.c              | 29 +++++++++++++++++++++++++++--
 4 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6efd1497b0263..6c3326cb8273c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include
 
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
 
@@ -1514,6 +1515,8 @@ struct kvm_arch {
 	 */
 #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
 	struct kvm_mmu_memory_cache split_desc_cache;
+
+	struct asi *asi;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9aaf83c8d57df..6f9a279c12dc7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4108,6 +4108,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
 	guest_state_enter_irqoff();
 
 	amd_clear_divider();
+	asi_enter(vcpu->kvm->arch.asi);
 
 	if (sev_es_guest(vcpu->kvm))
 		__svm_sev_es_vcpu_run(svm, spec_ctrl_intercepted,
@@ -4115,6 +4116,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
 	else
 		__svm_vcpu_run(svm, spec_ctrl_intercepted);
 
+	asi_relax();
 	guest_state_exit_irqoff();
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 22411f4aff530..1105d666a8ade 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -7255,14 +7256,32 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					unsigned int flags)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long cr3;
 
 	guest_state_enter_irqoff();
+	asi_enter(vcpu->kvm->arch.asi);
+
+	/*
+	 * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
+	 * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
+	 * it switches back to the current->mm, which can occur in KVM context
+	 * when switching to a temporary mm to patch kernel code, e.g. if KVM
+	 * toggles a static key while handling a VM-Exit.
+	 * Also, this must be done after asi_enter(), as it changes CR3
+	 * when switching address spaces.
+	 */
+	cr3 = __get_current_cr3_fast();
+	if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) {
+		vmcs_writel(HOST_CR3, cr3);
+		vmx->loaded_vmcs->host_state.cr3 = cr3;
+	}
 
 	/*
 	 * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
 	 * mitigation for MDS is done late in VMentry and is still
 	 * executed in spite of L1D Flush. This is because an extra VERW
 	 * should not matter much after the big hammer L1D Flush.
+	 * This is only after asi_enter() for performance reasons.
 	 */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
 		vmx_l1d_flush(vcpu);
@@ -7283,6 +7302,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
 	vmx->idt_vectoring_info = 0;
 
+	asi_relax();
+
 	vmx_enable_fb_clear(vmx);
 
 	if (unlikely(vmx->fail)) {
@@ -7311,7 +7332,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned long cr3, cr4;
+	unsigned long cr4;
 
 	/* Record the guest's net vcpu time for enforced NMI injections. */
 	if (unlikely(!enable_vnmi &&
@@ -7354,19 +7375,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 		vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
 	vcpu->arch.regs_dirty = 0;
 
-	/*
-	 * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
-	 * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
-	 * it switches back to the current->mm, which can occur in KVM context
-	 * when switching to a temporary mm to patch kernel code, e.g. if KVM
-	 * toggles a static key while handling a VM-Exit.
-	 */
-	cr3 = __get_current_cr3_fast();
-	if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) {
-		vmcs_writel(HOST_CR3, cr3);
-		vmx->loaded_vmcs->host_state.cr3 = cr3;
-	}
-
 	cr4 = cr4_read_shadow();
 	if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) {
 		vmcs_writel(HOST_CR4, cr4);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91478b769af08..b9947e88d4ac6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -85,6 +85,7 @@
 #include
 #include
 #include
+#include
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -318,6 +319,8 @@ u64 __read_mostly host_xcr0;
 
 static struct kmem_cache *x86_emulator_cache;
 
+static int __read_mostly kvm_asi_index = -1;
+
 /*
  * When called, it means the previous get/set msr reached an invalid msr.
  * Return true if we want to ignore/silent this failed msr access.
@@ -9750,6 +9753,11 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (r)
 		goto out_free_percpu;
 
+	r = asi_register_class("KVM", NULL);
+	if (r < 0)
+		goto out_mmu_exit;
+	kvm_asi_index = r;
+
 	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
 		host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 		kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
@@ -9767,7 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	r = ops->hardware_setup();
 	if (r != 0)
-		goto out_mmu_exit;
+		goto out_asi_unregister;
 
 	kvm_ops_update(ops);
 
@@ -9820,6 +9828,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 out_unwind_ops:
 	kvm_x86_ops.hardware_enable = NULL;
 	static_call(kvm_x86_hardware_unsetup)();
+out_asi_unregister:
+	asi_unregister_class(kvm_asi_index);
 out_mmu_exit:
 	kvm_mmu_vendor_module_exit();
 out_free_percpu:
@@ -9851,6 +9861,7 @@ void kvm_x86_vendor_exit(void)
 	cancel_work_sync(&pvclock_gtod_work);
 #endif
 	static_call(kvm_x86_hardware_unsetup)();
+	asi_unregister_class(kvm_asi_index);
 	kvm_mmu_vendor_module_exit();
 	free_percpu(user_return_msrs);
 	kmem_cache_destroy(x86_emulator_cache);
@@ -11436,6 +11447,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
 	r = vcpu_run(vcpu);
 
+	/*
+	 * At present ASI doesn't have the capability to transition directly
+	 * from the restricted address space to the user address space. So we
+	 * just return to the unrestricted address space in between.
+	 */
+	asi_exit();
+
 out:
 	kvm_put_guest_fpu(vcpu);
 	if (kvm_run->kvm_valid_regs)
@@ -12539,10 +12557,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	kvm_mmu_init_vm(kvm);
 
-	ret = static_call(kvm_x86_vm_init)(kvm);
+	ret = asi_init(kvm->mm, kvm_asi_index, &kvm->arch.asi);
 	if (ret)
 		goto out_uninit_mmu;
 
+	ret = static_call(kvm_x86_vm_init)(kvm);
+	if (ret)
+		goto out_asi_destroy;
+
 	INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
 	atomic_set(&kvm->arch.noncoherent_dma_count, 0);
 
@@ -12579,6 +12601,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	return 0;
 
+out_asi_destroy:
+	asi_destroy(kvm->arch.asi);
 out_uninit_mmu:
 	kvm_mmu_uninit_vm(kvm);
 	kvm_page_track_cleanup(kvm);
@@ -12720,6 +12744,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_destroy_vcpus(kvm);
 	kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 	kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
+	asi_destroy(kvm->arch.asi);
 	kvm_mmu_uninit_vm(kvm);
 	kvm_page_track_cleanup(kvm);
 	kvm_xen_destroy_vm(kvm);
-- 
2.45.2.993.g49e7a77208-goog
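
The call ordering in this patch (asi_enter() before VM-Enter, asi_relax() after VM-Exit, asi_exit() once vcpu_run() has returned) can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not kernel code: it assumes, based only on the comments in this patch, that asi_enter() switches to the restricted address space, that asi_relax() ends the critical section without switching back, and that asi_exit() returns to the unrestricted address space. Every toy_* name and both structs are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct asi. */
struct toy_asi {
	const char *class_name;
};

/* Toy per-CPU state; the real bookkeeping is an assumption here. */
struct toy_cpu {
	const struct toy_asi *target;	/* non-NULL between enter and relax */
	bool restricted;		/* restricted address space is live */
};

/* Assumed asi_enter() behaviour: open the critical section and switch. */
static void toy_asi_enter(struct toy_cpu *cpu, const struct toy_asi *asi)
{
	assert(asi != NULL);
	cpu->target = asi;
	cpu->restricted = true;
}

/* Assumed asi_relax() behaviour: close the critical section, no switch. */
static void toy_asi_relax(struct toy_cpu *cpu)
{
	cpu->target = NULL;
}

/* Assumed asi_exit() behaviour: back to the unrestricted address space. */
static void toy_asi_exit(struct toy_cpu *cpu)
{
	cpu->restricted = false;
}

/* Same ordering as vmx_vcpu_enter_exit(): enter, run the guest, relax. */
static void toy_vcpu_enter_exit(struct toy_cpu *cpu, const struct toy_asi *asi)
{
	toy_asi_enter(cpu, asi);
	/* VM-Enter and VM-Exit would happen here. */
	toy_asi_relax(cpu);
}

/*
 * Same ordering as kvm_arch_vcpu_ioctl_run(): run the guest some number of
 * times, then exit the restricted address space before going back to
 * userspace. Returns true if the CPU stayed restricted across iterations.
 */
static bool toy_vcpu_ioctl_run(struct toy_cpu *cpu, const struct toy_asi *asi,
			       int iterations)
{
	bool stayed_restricted = true;

	for (int i = 0; i < iterations; i++) {
		toy_vcpu_enter_exit(cpu, asi);
		stayed_restricted = stayed_restricted && cpu->restricted;
	}
	toy_asi_exit(cpu);
	return stayed_restricted;
}
```

In this model the CPU remains in the restricted address space between guest runs, which is why the explicit asi_exit() is needed before the return to userspace. The real implementation can also leave the restricted address space for other reasons, which the model does not cover.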