From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Naveen N Rao
Date: Fri, 23 Jan 2026 14:45:12 -0800
Subject: [PATCH v2 2/4] KVM: SVM: Fix IRQ window inhibit handling across multiple vCPUs
Message-ID: <20260123224514.2509129-3-seanjc@google.com>
In-Reply-To: <20260123224514.2509129-1-seanjc@google.com>

IRQ window inhibits can be requested by multiple vCPUs at the same time,
to inject interrupts meant for different vCPUs. However, the AVIC
inhibit is VM-wide, so the first vCPU that obtains its IRQ window can
clear the inhibit prematurely, even though a second vCPU is still
waiting for its own IRQ window.

This is likely not a functional issue, since the other vCPU will again
see that interrupts are pending injection (due to KVM_REQ_EVENT) and
will request the IRQ window inhibit again. However, it can cause AVIC
to be toggled rapidly, resulting in high contention on
apicv_update_lock and degraded guest performance.

Address this by maintaining a VM-wide count of the number of vCPUs that
have requested an IRQ window. Set/clear the inhibit reason only when the
count transitions between 0 and 1. This ensures that the inhibit reason
is not cleared as long as any vCPU is still waiting for an IRQ window.
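The counting scheme described above can be sketched as a small userspace model. This is a hypothetical illustration, not KVM code: struct vm_model, struct vcpu_model and the vm_*/vcpu_* helpers are made-up names, a pthread mutex stands in for apicv_update_lock, and a plain bool stands in for the APICV_INHIBIT_REASON_IRQWIN bit. The per-vCPU irq_window flag plays the role of svm->avic_irq_window, so a vCPU that requests its window twice is only counted once. Like the patch, the sketch always takes the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of struct kvm_arch. */
struct vm_model {
	pthread_mutex_t update_lock;	/* models apicv_update_lock */
	atomic_int nr_irq_window_req;	/* models apicv_nr_irq_window_req */
	bool irqwin_inhibited;		/* models the IRQWIN inhibit reason */
};

/* Hypothetical stand-in for the relevant part of struct vcpu_svm. */
struct vcpu_model {
	struct vm_model *vm;
	bool irq_window;		/* models svm->avic_irq_window */
};

static void vm_init(struct vm_model *vm)
{
	pthread_mutex_init(&vm->update_lock, NULL);
	atomic_init(&vm->nr_irq_window_req, 0);
	vm->irqwin_inhibited = false;
}

/* Flip the inhibit only when the count goes 0->1 (inc) or 1->0 (dec). */
static void vm_inc_or_dec_irq_window(struct vm_model *vm, bool inc)
{
	int add = inc ? 1 : -1;

	pthread_mutex_lock(&vm->update_lock);
	/* atomic_fetch_add() returns the old value; add 'add' to get the new one. */
	if (atomic_fetch_add(&vm->nr_irq_window_req, add) + add == (int)inc)
		vm->irqwin_inhibited = inc;
	pthread_mutex_unlock(&vm->update_lock);
}

/* A vCPU needs an IRQ window: contribute to the count at most once. */
static void vcpu_request_irq_window(struct vcpu_model *vcpu)
{
	if (!vcpu->irq_window) {
		vcpu->irq_window = true;
		vm_inc_or_dec_irq_window(vcpu->vm, true);
	}
}

/* The vCPU's window opened (or is no longer needed): drop its count once. */
static void vcpu_irq_window_done(struct vcpu_model *vcpu)
{
	if (vcpu->irq_window) {
		vcpu->irq_window = false;
		vm_inc_or_dec_irq_window(vcpu->vm, false);
	}
}
```

With two vCPUs A and B both waiting, A's window opening leaves the inhibit set because B still holds a count; only B's window opening clears it. With a single set/clear bit, A's window would have cleared the inhibit while B was still waiting.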
Co-developed-by: Paolo Bonzini
Signed-off-by: Paolo Bonzini
Co-developed-by: Naveen N Rao (AMD)
Signed-off-by: Naveen N Rao (AMD)
Tested-by: Naveen N Rao (AMD)
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h | 19 ++++++++++++++++-
 arch/x86/kvm/svm/svm.c          | 36 +++++++++++++++++++++++----------
 arch/x86/kvm/svm/svm.h          |  1 +
 arch/x86/kvm/x86.c              | 19 +++++++++++++++++
 4 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e441f270f354..b08baeff98b2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1427,6 +1427,7 @@ struct kvm_arch {
 	struct kvm_pit *vpit;
 #endif
 	atomic_t vapics_in_nmi_mode;
+
 	struct mutex apic_map_lock;
 	struct kvm_apic_map __rcu *apic_map;
 	atomic_t apic_map_dirty;
@@ -1434,9 +1435,13 @@ struct kvm_arch {
 	bool apic_access_memslot_enabled;
 	bool apic_access_memslot_inhibited;
 
-	/* Protects apicv_inhibit_reasons */
+	/*
+	 * Protects apicv_inhibit_reasons and apicv_nr_irq_window_req (with an
+	 * asterisk, see kvm_inc_or_dec_irq_window_inhibit() for details).
+	 */
 	struct rw_semaphore apicv_update_lock;
 	unsigned long apicv_inhibit_reasons;
+	atomic_t apicv_nr_irq_window_req;
 
 	gpa_t wall_clock;
 
@@ -2309,6 +2314,18 @@ static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
 	kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
 }
 
+void kvm_inc_or_dec_irq_window_inhibit(struct kvm *kvm, bool inc);
+
+static inline void kvm_inc_apicv_irq_window_req(struct kvm *kvm)
+{
+	kvm_inc_or_dec_irq_window_inhibit(kvm, true);
+}
+
+static inline void kvm_dec_apicv_irq_window_req(struct kvm *kvm)
+{
+	kvm_inc_or_dec_irq_window_inhibit(kvm, false);
+}
+
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 		       void *insn, int insn_len);
 void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 24b9c2275821..559e8fa76b7e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3729,8 +3729,11 @@ static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 	 * the case in which the interrupt window was requested while L1 was
 	 * active (the vCPU was not running nested).
 	 */
-	if (!kvm_cpu_has_injectable_intr(vcpu) || is_guest_mode(vcpu))
-		kvm_clear_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
+	if (svm->avic_irq_window &&
+	    (!kvm_cpu_has_injectable_intr(vcpu) || is_guest_mode(vcpu))) {
+		svm->avic_irq_window = false;
+		kvm_dec_apicv_irq_window_req(svm->vcpu.kvm);
+	}
 
 	trace_kvm_inj_virq(intr->nr, intr->soft, reinjected);
 	++vcpu->stat.irq_injections;
@@ -3932,17 +3935,28 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
 	 */
 	if (vgif || gif_set(svm)) {
 		/*
-		 * IRQ window is not needed when AVIC is enabled,
-		 * unless we have pending ExtINT since it cannot be injected
-		 * via AVIC. In such case, KVM needs to temporarily disable AVIC,
-		 * and fallback to injecting IRQ via V_IRQ.
+		 * KVM only enables IRQ windows when AVIC is enabled if there's
+		 * pending ExtINT since it cannot be injected via AVIC (ExtINT
+		 * bypasses the local APIC). V_IRQ is ignored by hardware when
+		 * AVIC is enabled, and so KVM needs to temporarily disable
+		 * AVIC in order to detect when it's ok to inject the ExtINT.
 		 *
-		 * If running nested, AVIC is already locally inhibited
-		 * on this vCPU, therefore there is no need to request
-		 * the VM wide AVIC inhibition.
+		 * If running nested, AVIC is already locally inhibited on this
+		 * vCPU (L2 vCPUs use a different MMU that never maps the AVIC
+		 * backing page), therefore there is no need to increment the
+		 * VM-wide AVIC inhibit. KVM will re-evaluate events when the
+		 * vCPU exits to L1 and enable an IRQ window if the ExtINT is
+		 * still pending.
+		 *
+		 * Note, the IRQ window inhibit needs to be updated even if
+		 * AVIC is inhibited for a different reason, as KVM needs to
+		 * keep AVIC inhibited if the other reason is cleared and there
+		 * is still an injectable interrupt pending.
 		 */
-		if (!is_guest_mode(vcpu))
-			kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
+		if (enable_apicv && !svm->avic_irq_window && !is_guest_mode(vcpu)) {
+			svm->avic_irq_window = true;
+			kvm_inc_apicv_irq_window_req(vcpu->kvm);
+		}
 
 		svm_set_vintr(svm);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ebd7b36b1ceb..68675b25ef8e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -333,6 +333,7 @@ struct vcpu_svm {
 
 	bool guest_state_loaded;
 
+	bool avic_irq_window;
 	bool x2avic_msrs_intercepted;
 	bool lbr_msrs_intercepted;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8acfdfc583a1..2528dfffb42b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10994,6 +10994,25 @@ void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_or_clear_apicv_inhibit);
 
+void kvm_inc_or_dec_irq_window_inhibit(struct kvm *kvm, bool inc)
+{
+	int add = inc ? 1 : -1;
+
+	if (!enable_apicv)
+		return;
+
+	/*
+	 * Strictly speaking, the lock is only needed if going 0->1 or 1->0,
+	 * a la atomic_dec_and_mutex_lock. However, ExtINTs are rare and
+	 * only target a single CPU, so that is the common case; do not
+	 * bother eliding the down_write()/up_write() pair.
+	 */
+	guard(rwsem_write)(&kvm->arch.apicv_update_lock);
+	if (atomic_add_return(add, &kvm->arch.apicv_nr_irq_window_req) == inc)
+		__kvm_set_or_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_IRQWIN, inc);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_inc_or_dec_irq_window_inhibit);
+
 static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_apic_present(vcpu))
-- 
2.52.0.457.g6b5491de43-goog
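The comment in kvm_inc_or_dec_irq_window_inhibit() notes that the lock is strictly only needed for the 0->1 and 1->0 transitions, in the style of atomic_dec_and_mutex_lock(), and that the patch deliberately does not elide it. For comparison, here is a hypothetical userspace sketch of what the elided variant could look like. It is not part of the patch. add_unless() is a C11 stand-in for the kernel's atomic_add_unless(), struct irqwin_state and irqwin_update() are illustrative names, and the slow_path counter exists only to show when the lock is taken.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state: counter, inhibit flag, and the lock guarding the flag. */
struct irqwin_state {
	pthread_mutex_t lock;
	atomic_int count;
	bool inhibited;
	unsigned long slow_path;	/* lock acquisitions, for illustration */
};

static void irqwin_init(struct irqwin_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	atomic_init(&s->count, 0);
	s->inhibited = false;
	s->slow_path = 0;
}

/*
 * Add @a to @v unless @v == @u; return true if the add happened.
 * A C11 stand-in for the kernel's atomic_add_unless(). On CAS failure,
 * 'old' is reloaded with the current value and the check is repeated.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	do {
		if (old == u)
			return false;
	} while (!atomic_compare_exchange_weak(v, &old, old + a));
	return true;
}

static void irqwin_update(struct irqwin_state *s, bool inc)
{
	int add = inc ? 1 : -1;

	/*
	 * Fast path: changes such as 1->2 or 2->1 cannot flip the inhibit,
	 * so skip the lock unless an increment starts from 0 or a
	 * decrement starts from 1.
	 */
	if (add_unless(&s->count, add, inc ? 0 : 1))
		return;

	/*
	 * Slow path: only here can the count cross between 0 and 1, and the
	 * flag is updated under the same lock. A concurrent fast-path caller
	 * may return before the flag is published; this model only
	 * guarantees inhibited == (count > 0) once all updates complete.
	 */
	pthread_mutex_lock(&s->lock);
	s->slow_path++;
	if (atomic_fetch_add(&s->count, add) + add == (int)inc)
		s->inhibited = inc;
	pthread_mutex_unlock(&s->lock);
}
```

Only updates that start from 0 (increment) or 1 (decrement) take the lock, so the inhibit flag can only flip while the lock is held. The cost is extra complexity and the weaker ordering noted in the comment, which, per the patch's own reasoning, is unlikely to pay off when ExtINTs are rare.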