From nobody Mon May 25 01:17:36 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B47F235DA76; Tue, 19 May 2026 18:27:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779215254; cv=none; b=Z+lXnr8/21joqvHpFGi6rJrWoPYsofBPNaL+RMywzVkrDs3ynGt44UYneSXuTmwwaJHrJ6UTRv/6jYzZHCQCDLkxJZVjwlhnsnzjKTQMJJrL/IUIS19iRxXyGleZVptyrAjgjbi0QuOhjqRVEXMLoryxD/XdZPIuPVB9JF0cY5w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779215254; c=relaxed/simple; bh=nNcHVQQAZrz37TCGM7RsPswbfbVQoxqX8QZgxhpQSZ0=; h=Date:From:To:Subject:Cc:In-Reply-To:References:MIME-Version: Message-ID:Content-Type; b=N1vdPPA3HbxHvoVB5SNGMQi7S3qj7V5b+YHwLgnQuMU/Xvk9A5QT4QMW1+qFRxwjzbMa7XNi03IlyHORHBzTQUt5K53HNA+fySCoIX0jIeI1pr38P+kxCpO6iJ/kE5dcC6vU3RTb3BhuCxHZezMh7zOsGZugCKGk4upLBzHz1+c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=Jw5zBB+X; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=fFLvUAzN; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Jw5zBB+X"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="fFLvUAzN" Date: Tue, 19 May 2026 18:27:29 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1779215250; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WBZhSQlN3fItW2ipQq2Cu8gWDOA/ojjU6f+hN+dPIMY=; b=Jw5zBB+XQZSYUkKOjrZwXgMc6/aadwomWihMFHfdm2TumZe9KHC5j3ovkJojuqdaHx/1Q4 LrKb5kRVZaHZTbFju5L8ELJZM2PBKvZDiuSLEMQQMiuGUrDEXlOyW9rO4mgvkiH28wpBVr ZIfK8RDsYUY86XGYTw4ZPsB6KNVB0alDN1VanrlabdoUufGPJa6XvGNjLVEqPbF5bdqEpF F3H8B2Gt7P5iS6im4mqG0YPXf2JsXZ+WJtwJhXuIJypTvUrOx8a/G1vRTuL05mWQRk4mQX g6PLi4ToZZNDJ9PG6ypKt0avcyRmt20AoGELhK3lVGL+RIwoQg4Edxm4ZAc/hw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1779215250; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WBZhSQlN3fItW2ipQq2Cu8gWDOA/ojjU6f+hN+dPIMY=; b=fFLvUAzNb5bKWno9nQncr+ztricOCR36UqE0MOJp5MgOEfPaqpN0TrbeROxhq3pYT5FpsV +cURJzvo0gAb9vAw== From: "tip-bot2 for Peter Zijlstra" Sender: tip-bot2@linutronix.de Reply-to: linux-kernel@vger.kernel.org To: linux-tip-commits@vger.kernel.org Subject: [tip: x86/urgent] x86/kvm/vmx: Move IRQ/NMI dispatch from KVM into x86 core Cc: Sean Christopherson , "Peter Zijlstra (Intel)" , Thomas Gleixner , "Verma, Vishal L" , Zhao Liu , Binbin Wu , x86@kernel.org, linux-kernel@vger.kernel.org In-Reply-To: <20260508091829.GO3126523@noisy.programming.kicks-ass.net> References: <20260508091829.GO3126523@noisy.programming.kicks-ass.net> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-ID: <177921524951.711.9142753022148052907.tip-bot2@tip-bot2> Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails Precedence: bulk Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable The following commit has been merged into the x86/urgent branch of tip: Commit-ID: 0701c9e17bd903d95b2ddf7dd2e1d8be5027f331 Gitweb: https://git.kernel.org/tip/0701c9e17bd903d95b2ddf7dd2e1d8be5= 027f331 Author: Peter Zijlstra AuthorDate: Fri, 08 May 2026 11:18:29 +02:00 Committer: Thomas Gleixner CommitterDate: Tue, 19 May 2026 20:25:51 +02:00 x86/kvm/vmx: Move IRQ/NMI dispatch from KVM into x86 core Move the VMX interrupt dispatch magic into the x86 core code. This isolates KVM from the FRED/IDT decisions and reduces the amount of EXPORT_SYMBOL_FOR_KVM(). Suggested-by: Sean Christopherson Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Tested-by: "Verma, Vishal L" Tested-by: Zhao Liu Tested-by: Zhao Liu Tested-by: Sean Christopherson Reviewed-by: Binbin Wu Acked-by: Sean Christopherson Link: https://patch.msgid.link/20260508091829.GO3126523@noisy.programming.k= icks-ass.net --- arch/x86/entry/Makefile | 2 +- arch/x86/entry/common.c | 48 ++++++++++++++++++++++++++++- arch/x86/entry/entry.S | 46 +++++++++++++++++++++++++++- arch/x86/entry/entry_64_fred.S | 1 +- arch/x86/include/asm/desc.h | 4 ++- arch/x86/include/asm/desc_defs.h | 2 +- arch/x86/include/asm/entry-common.h | 2 +- arch/x86/include/asm/fred.h | 1 +- arch/x86/kernel/idt.c | 15 +++++++++- arch/x86/kernel/nmi.c | 1 +- arch/x86/kvm/vmx/vmenter.S | 46 +--------------------------- arch/x86/kvm/vmx/vmx.c | 19 +---------- 12 files changed, 119 insertions(+), 68 deletions(-) create mode 100644 arch/x86/entry/common.c diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile index 72cae8e..83b4762 100644 --- a/arch/x86/entry/Makefile +++ b/arch/x86/entry/Makefile @@ -13,7 +13,7 @@ CFLAGS_REMOVE_syscall_64.o =3D $(CC_FLAGS_FTRACE) CFLAGS_syscall_32.o +=3D -fno-stack-protector CFLAGS_syscall_64.o +=3D -fno-stack-protector =20 -obj-y :=3D entry.o entry_$(BITS).o syscall_$(BITS).o +obj-y :=3D entry.o entry_$(BITS).o syscall_$(BITS).o common.o =20 obj-y +=3D vdso/ obj-y +=3D vsyscall/ diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c new file mode 100644 index 0000000..b62ac82 --- /dev/null +++ b/arch/x86/entry/common.c @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include +#include +#include +#include + +#if IS_ENABLED(CONFIG_KVM_INTEL) +/* + * On VMX, NMIs and IRQs (as configured by KVM) are acknowledged by hardwa= re as + * part of the VM-Exit, i.e. the event itself is consumed as part the VM-E= xit. + * x86_entry_from_kvm() is invoked by KVM to effectively forward NMIs and = IRQs + * to the kernel for servicing. On SVM, a.k.a. AMD, the NMI/IRQ VM-Exit is + * purely a signal that an NMI/IRQ is pending, i.e. the event that trigger= ed + * the VM-Exit is held pending until it's unblocked in the host. + */ +noinstr void x86_entry_from_kvm(unsigned int event_type, unsigned int vect= or) +{ + if (event_type =3D=3D EVENT_TYPE_EXTINT) { +#ifdef CONFIG_X86_64 + /* + * Use FRED dispatch, even when running IDT. The dispatch + * tables are kept in sync between FRED and IDT, and the FRED + * dispatch works well with CFI. + */ + fred_entry_from_kvm(event_type, vector); +#else + idt_entry_from_kvm(vector); +#endif + return; + } + + WARN_ON_ONCE(event_type !=3D EVENT_TYPE_NMI); + +#ifdef CONFIG_X86_64 + if (cpu_feature_enabled(X86_FEATURE_FRED)) + return fred_entry_from_kvm(event_type, vector); +#endif + + /* + * Notably, we must use IDT dispatch for NMI when running in IDT mode. + * The FRED NMI context is significantly different and will not work + * right (specifically FRED fixed the NMI recursion issue). + */ + idt_entry_from_kvm(vector); +} +EXPORT_SYMBOL_FOR_KVM(x86_entry_from_kvm); +#endif diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S index 6ba2b3a..a56e043 100644 --- a/arch/x86/entry/entry.S +++ b/arch/x86/entry/entry.S @@ -75,3 +75,49 @@ THUNK warn_thunk_thunk, __warn_thunk #if defined(CONFIG_STACKPROTECTOR) && defined(CONFIG_SMP) EXPORT_SYMBOL(__ref_stack_chk_guard); #endif + +#if IS_ENABLED(CONFIG_KVM_INTEL) +.macro IDT_DO_EVENT_IRQOFF call_insn call_target + /* + * Unconditionally create a stack frame, getting the correct RSP on the + * stack (for x86-64) would take two instructions anyways, and RBP can + * be used to restore RSP to make objtool happy (see below). + */ + push %_ASM_BP + mov %_ASM_SP, %_ASM_BP + +#ifdef CONFIG_X86_64 + /* + * Align RSP to a 16-byte boundary (to emulate CPU behavior) before + * creating the synthetic interrupt stack frame for the IRQ/NMI. + */ + and $-16, %rsp + push $__KERNEL_DS + push %rbp +#endif + pushf + push $__KERNEL_CS + \call_insn \call_target + + /* + * "Restore" RSP from RBP, even though IRET has already unwound RSP to + * the correct value. objtool doesn't know the callee will IRET and, + * without the explicit restore, thinks the stack is getting walloped. + * Using an unwind hint is problematic due to x86-64's dynamic alignment. + */ + leave + RET +.endm + +.pushsection .text, "ax" +SYM_FUNC_START(idt_do_interrupt_irqoff) + IDT_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1 +SYM_FUNC_END(idt_do_interrupt_irqoff) +.popsection + +.pushsection .noinstr.text, "ax" +SYM_FUNC_START(idt_do_nmi_irqoff) + IDT_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx +SYM_FUNC_END(idt_do_nmi_irqoff) +.popsection +#endif diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S index 894f7f1..0d2768a 100644 --- a/arch/x86/entry/entry_64_fred.S +++ b/arch/x86/entry/entry_64_fred.S @@ -147,5 +147,4 @@ SYM_FUNC_START(asm_fred_entry_from_kvm) RET =20 SYM_FUNC_END(asm_fred_entry_from_kvm) -EXPORT_SYMBOL_FOR_KVM(asm_fred_entry_from_kvm); #endif diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index ec95fe4..00aeae8 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -438,6 +438,10 @@ extern void idt_setup_traps(void); extern void idt_setup_apic_and_irq_gates(void); extern bool idt_is_f00f_address(unsigned long address); =20 +extern void idt_do_interrupt_irqoff(unsigned long address); +extern void idt_do_nmi_irqoff(void); +extern void idt_entry_from_kvm(unsigned int vector); + #ifdef CONFIG_X86_64 extern void idt_setup_early_pf(void); #else diff --git a/arch/x86/include/asm/desc_defs.h b/arch/x86/include/asm/desc_d= efs.h index 7e6b931..2f2ce8a 100644 --- a/arch/x86/include/asm/desc_defs.h +++ b/arch/x86/include/asm/desc_defs.h @@ -145,7 +145,7 @@ struct gate_struct { typedef struct gate_struct gate_desc; =20 #ifndef _SETUP -static inline unsigned long gate_offset(const gate_desc *g) +static __always_inline unsigned long gate_offset(const gate_desc *g) { #ifdef CONFIG_X86_64 return g->offset_low | ((unsigned long)g->offset_middle << 16) | diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/ent= ry-common.h index 7535131..eca24b5 100644 --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -97,4 +97,6 @@ static __always_inline void arch_exit_to_user_mode(void) } #define arch_exit_to_user_mode arch_exit_to_user_mode =20 +extern void x86_entry_from_kvm(unsigned int entry_type, unsigned int vecto= r); + #endif diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h index 2bb6567..18a2f81 100644 --- a/arch/x86/include/asm/fred.h +++ b/arch/x86/include/asm/fred.h @@ -110,7 +110,6 @@ static __always_inline unsigned long fred_event_data(st= ruct pt_regs *regs) { ret static inline void cpu_init_fred_exceptions(void) { } static inline void cpu_init_fred_rsps(void) { } static inline void fred_complete_exception_setup(void) { } -static inline void fred_entry_from_kvm(unsigned int type, unsigned int vec= tor) { } static inline void fred_sync_rsp0(unsigned long rsp0) { } static inline void fred_update_rsp0(void) { } #endif /* CONFIG_X86_FRED */ diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c index 2604565..7bcf1de 100644 --- a/arch/x86/kernel/idt.c +++ b/arch/x86/kernel/idt.c @@ -268,6 +268,21 @@ void __init idt_setup_early_pf(void) } #endif =20 +#if IS_ENABLED(CONFIG_KVM_INTEL) +noinstr void idt_entry_from_kvm(unsigned int vector) +{ + if (vector =3D=3D NMI_VECTOR) + return idt_do_nmi_irqoff(); + + /* + * Only the NMI path requires noinstr. + */ + instrumentation_begin(); + idt_do_interrupt_irqoff(gate_offset(idt_table + vector)); + instrumentation_end(); +} +#endif + static void __init idt_map_in_cea(void) { /* diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c index 3d239ed..52a3afb 100644 --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -614,7 +614,6 @@ DEFINE_IDTENTRY_RAW(exc_nmi_kvm_vmx) { exc_nmi(regs); } -EXPORT_SYMBOL_FOR_KVM(asm_exc_nmi_kvm_vmx); #endif =20 #ifdef CONFIG_NMI_CHECK_CPU diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S index 8a481da..ff1f254 100644 --- a/arch/x86/kvm/vmx/vmenter.S +++ b/arch/x86/kvm/vmx/vmenter.S @@ -31,38 +31,6 @@ #define VCPU_R15 __VCPU_REGS_R15 * WORD_SIZE #endif =20 -.macro VMX_DO_EVENT_IRQOFF call_insn call_target - /* - * Unconditionally create a stack frame, getting the correct RSP on the - * stack (for x86-64) would take two instructions anyways, and RBP can - * be used to restore RSP to make objtool happy (see below). - */ - push %_ASM_BP - mov %_ASM_SP, %_ASM_BP - -#ifdef CONFIG_X86_64 - /* - * Align RSP to a 16-byte boundary (to emulate CPU behavior) before - * creating the synthetic interrupt stack frame for the IRQ/NMI. - */ - and $-16, %rsp - push $__KERNEL_DS - push %rbp -#endif - pushf - push $__KERNEL_CS - \call_insn \call_target - - /* - * "Restore" RSP from RBP, even though IRET has already unwound RSP to - * the correct value. objtool doesn't know the callee will IRET and, - * without the explicit restore, thinks the stack is getting walloped. - * Using an unwind hint is problematic due to x86-64's dynamic alignment. - */ - leave - RET -.endm - .section .noinstr.text, "ax" =20 /** @@ -320,10 +288,6 @@ SYM_INNER_LABEL_ALIGN(vmx_vmexit, SYM_L_GLOBAL) =20 SYM_FUNC_END(__vmx_vcpu_run) =20 -SYM_FUNC_START(vmx_do_nmi_irqoff) - VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx -SYM_FUNC_END(vmx_do_nmi_irqoff) - #ifndef CONFIG_CC_HAS_ASM_GOTO_OUTPUT =20 /** @@ -375,13 +339,3 @@ SYM_FUNC_START(vmread_error_trampoline) RET SYM_FUNC_END(vmread_error_trampoline) #endif - -.section .text, "ax" - -#ifndef CONFIG_X86_FRED - -SYM_FUNC_START(vmx_do_interrupt_irqoff) - VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1 -SYM_FUNC_END(vmx_do_interrupt_irqoff) - -#endif diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 49feecb..b9103de 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7117,9 +7117,6 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 = *eoi_exit_bitmap) vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]); } =20 -void vmx_do_interrupt_irqoff(unsigned long entry); -void vmx_do_nmi_irqoff(void); - static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu) { /* @@ -7161,17 +7158,8 @@ static void handle_external_interrupt_irqoff(struct = kvm_vcpu *vcpu, "unexpected VM-Exit interrupt info: 0x%x", intr_info)) return; =20 - /* - * Invoke the kernel's IRQ handler for the vector. Use the FRED path - * when it's available even if FRED isn't fully enabled, e.g. even if - * FRED isn't supported in hardware, in order to avoid the indirect - * CALL in the non-FRED path. - */ kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ); - if (IS_ENABLED(CONFIG_X86_FRED)) - fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector); - else - vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector)= ); + x86_entry_from_kvm(EVENT_TYPE_EXTINT, vector); kvm_after_interrupt(vcpu); =20 vcpu->arch.at_instruction_boundary =3D true; @@ -7481,10 +7469,7 @@ noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu) return; =20 kvm_before_interrupt(vcpu, KVM_HANDLING_NMI); - if (cpu_feature_enabled(X86_FEATURE_FRED)) - fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR); - else - vmx_do_nmi_irqoff(); + x86_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR); kvm_after_interrupt(vcpu); } =20