From: Binbin Wu
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com, adrian.hunter@intel.com,
 reinette.chatre@intel.com, xiaoyao.li@intel.com, tony.lindgren@linux.intel.com,
 isaku.yamahata@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com,
 linux-kernel@vger.kernel.org, binbin.wu@linux.intel.com
Subject: [PATCH 01/16] KVM: TDX: Add support for find pending IRQ in a protected local APIC
Date: Mon, 9 Dec 2024 09:07:15 +0800
Message-ID: <20241209010734.3543481-2-binbin.wu@linux.intel.com>
X-Mailer: git-send-email 2.46.0
In-Reply-To: <20241209010734.3543481-1-binbin.wu@linux.intel.com>
References: <20241209010734.3543481-1-binbin.wu@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson

Add a flag and a hook to KVM's local APIC management to support
determining whether or not a TDX guest has a pending IRQ.  For TDX vCPUs,
the virtual APIC page is owned by the TDX module and cannot be accessed by
KVM.  As a result, registers that are virtualized by the CPU, e.g. PPR,
cannot be read or written by KVM.  To deliver interrupts for TDX guests,
KVM must send an IRQ to the CPU on the posted interrupt notification
vector.  And to determine if a TDX vCPU has a pending interrupt, KVM must
check if there is an outstanding notification.

Return "no interrupt" in kvm_apic_has_interrupt() if the guest APIC is
protected to short-circuit the various other flows that try to pull an IRQ
out of the vAPIC.  The only valid operation is querying _if_ an IRQ is
pending; KVM can't do anything based on _which_ IRQ is pending.

Intentionally omit sanity checks from other flows, e.g. PPR update, so as
not to degrade non-TDX guests with unnecessary checks.  A well-behaved KVM
and userspace will never reach those flows for TDX guests, but reaching
them is not fatal if something does go awry.

Note, this doesn't handle interrupts that have been delivered to the vCPU
but not yet recognized by the core, i.e. interrupts that are sitting in
vmcs.GUEST_INTR_STATUS.  Querying that state requires a SEAMCALL and will
be supported in a future patch.

Signed-off-by: Sean Christopherson
Signed-off-by: Isaku Yamahata
Signed-off-by: Binbin Wu
---
TDX interrupts breakout:
- Dropped vt_protected_apic_has_interrupt() with KVM_BUG_ON(), wire in
  tdx_protected_apic_has_interrupt() directly. (Rick)
- Add {} on else in vt_hardware_setup()
---
 arch/x86/include/asm/kvm-x86-ops.h | 1 +
 arch/x86/include/asm/kvm_host.h    | 1 +
 arch/x86/kvm/irq.c                 | 3 +++
 arch/x86/kvm/lapic.c               | 3 +++
 arch/x86/kvm/lapic.h               | 2 ++
 arch/x86/kvm/vmx/main.c            | 3 +++
 arch/x86/kvm/vmx/tdx.c             | 6 ++++++
 arch/x86/kvm/vmx/x86_ops.h         | 2 ++
 8 files changed, 21 insertions(+)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index ec1b1b39c6b3..d5faaaee6ac0 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -114,6 +114,7 @@ KVM_X86_OP_OPTIONAL(pi_start_assignment)
 KVM_X86_OP_OPTIONAL(apicv_pre_state_restore)
 KVM_X86_OP_OPTIONAL(apicv_post_state_restore)
 KVM_X86_OP_OPTIONAL_RET0(dy_apicv_has_pending_interrupt)
+KVM_X86_OP_OPTIONAL(protected_apic_has_interrupt)
 KVM_X86_OP_OPTIONAL(set_hv_timer)
 KVM_X86_OP_OPTIONAL(cancel_hv_timer)
 KVM_X86_OP(setup_mce)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 37dc7edef1ca..32c7d58a5d68 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1811,6 +1811,7 @@ struct kvm_x86_ops {
 	void (*apicv_pre_state_restore)(struct kvm_vcpu *vcpu);
 	void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
 	bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);
+	bool (*protected_apic_has_interrupt)(struct kvm_vcpu *vcpu);
 
 	int (*set_hv_timer)(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc,
 			    bool *expired);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 63f66c51975a..f0644d0bbe11 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -100,6 +100,9 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 	if (kvm_cpu_has_extint(v))
 		return 1;
 
+	if (lapic_in_kernel(v) && v->arch.apic->guest_apic_protected)
+		return static_call(kvm_x86_protected_apic_has_interrupt)(v);
+
 	return kvm_apic_has_interrupt(v) != -1;	/* LAPIC */
 }
 EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 65412640cfc7..684777c2f0a4 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2920,6 +2920,9 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	if (!kvm_apic_present(vcpu))
 		return -1;
 
+	if (apic->guest_apic_protected)
+		return -1;
+
 	__apic_update_ppr(apic, &ppr);
 	return apic_has_interrupt_for_ppr(apic, ppr);
 }
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 1b8ef9856422..82355faf8c0d 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -65,6 +65,8 @@ struct kvm_lapic {
 	bool sw_enabled;
 	bool irr_pending;
 	bool lvt0_in_nmi_mode;
+	/* Select registers in the vAPIC cannot be read/written. */
+	bool guest_apic_protected;
 	/* Number of bits set in ISR. */
 	s16 isr_count;
 	/* The highest vector set in ISR; if -1 - invalid, must scan ISR. */
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 4f6faeb6e8e5..a964093b5c03 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -58,6 +58,8 @@ static __init int vt_hardware_setup(void)
 		vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
 		vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
 		vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
+	} else {
+		vt_x86_ops.protected_apic_has_interrupt = NULL;
 	}
 
 	return 0;
@@ -356,6 +358,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
 	.deliver_interrupt = vmx_deliver_interrupt,
 	.dy_apicv_has_pending_interrupt = pi_has_pending_interrupt,
+	.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.set_identity_map_addr = vmx_set_identity_map_addr,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d7cdb44be5cf..877cf9e1fd65 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -722,6 +722,7 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 
 	fpstate_set_confidential(&vcpu->arch.guest_fpu);
+	vcpu->arch.apic->guest_apic_protected = true;
 
 	vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;
 
@@ -764,6 +765,11 @@ void tdx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	local_irq_enable();
 }
 
+bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
+{
+	return pi_has_pending_interrupt(vcpu);
+}
+
 /*
  * Compared to vmx_prepare_switch_to_guest(), there is not much to do
  * as SEAMCALL/SEAMRET calls take care of most of save and restore.
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 1c18943e0e1d..a3a5b25976f0 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -133,6 +133,7 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu);
 fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit);
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void tdx_vcpu_put(struct kvm_vcpu *vcpu);
+bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu);
 int tdx_handle_exit(struct kvm_vcpu *vcpu,
 		enum exit_fastpath_completion fastpath);
 void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
@@ -171,6 +172,7 @@ static inline fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediat
 }
 static inline void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu) {}
 static inline void tdx_vcpu_put(struct kvm_vcpu *vcpu) {}
+static inline bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu) { return false; }
 static inline int tdx_handle_exit(struct kvm_vcpu *vcpu,
 		enum exit_fastpath_completion fastpath) { return 0; }
 static inline void tdx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, u64 *info1,
-- 
2.46.0
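
Illustrative note, not part of the patch: the split between the "if"
query and the "which" query described in the changelog can be sketched as
a small userspace model.  Everything prefixed with model_ is a
hypothetical stand-in invented for this sketch, not a KVM symbol.  The
model assumes a single "highest pending vector" field in place of the real
IRR/PPR logic and leaves out the ExtINT check that kvm_cpu_has_interrupt()
performs first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_lapic. */
struct model_lapic {
	bool guest_apic_protected;	/* vAPIC page is owned by the TDX module */
	int highest_pending_vector;	/* -1 if nothing is pending (non-TDX only) */
};

/* Hypothetical stand-in for struct kvm_vcpu. */
struct model_vcpu {
	struct model_lapic *apic;	/* NULL models "no in-kernel local APIC" */
	bool pi_notification_pending;	/* outstanding posted-interrupt notification */
};

/* Models the protected_apic_has_interrupt hook: answers "if", never "which". */
static bool model_protected_apic_has_interrupt(const struct model_vcpu *v)
{
	return v->pi_notification_pending;
}

/* Models kvm_apic_has_interrupt(): returns a vector, or -1 for "no interrupt". */
static int model_apic_has_interrupt(const struct model_vcpu *v)
{
	if (!v->apic)
		return -1;

	/* A protected vAPIC cannot be read, so no vector can be reported. */
	if (v->apic->guest_apic_protected)
		return -1;

	return v->apic->highest_pending_vector;
}

/* Models kvm_cpu_has_interrupt() without the ExtINT check. */
static bool model_cpu_has_interrupt(const struct model_vcpu *v)
{
	if (v->apic && v->apic->guest_apic_protected)
		return model_protected_apic_has_interrupt(v);

	return model_apic_has_interrupt(v) != -1;
}

/* Walks through the three interesting cases; returns 0 on success. */
int model_demo(void)
{
	struct model_lapic prot = { .guest_apic_protected = true,
				    .highest_pending_vector = 0x30 };
	struct model_lapic plain = { .guest_apic_protected = false,
				     .highest_pending_vector = 0x30 };
	struct model_vcpu tdx = { .apic = &prot, .pi_notification_pending = false };
	struct model_vcpu vmx = { .apic = &plain, .pi_notification_pending = false };

	/* Protected APIC: the vector field is ignored, only the hook counts. */
	assert(!model_cpu_has_interrupt(&tdx));
	tdx.pi_notification_pending = true;
	assert(model_cpu_has_interrupt(&tdx));
	assert(model_apic_has_interrupt(&tdx) == -1);

	/* Unprotected APIC: the vector lookup is used as before. */
	assert(model_cpu_has_interrupt(&vmx));
	assert(model_apic_has_interrupt(&vmx) == 0x30);

	return 0;
}
```

Calling model_demo() from a main() exercises all three cases.  The point
of the model is that callers wanting a vector always get -1 for a
protected APIC, while the boolean query is routed to the hook.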