From nobody Sat Feb 7 09:42:42 2026
Reply-To: Sean Christopherson
Date: Fri, 23 Jan 2026 14:45:11 -0800
In-Reply-To: <20260123224514.2509129-1-seanjc@google.com>
References: <20260123224514.2509129-1-seanjc@google.com>
Message-ID: <20260123224514.2509129-2-seanjc@google.com>
Subject: [PATCH v2 1/4] KVM: SVM: Fix clearing IRQ window inhibit with nested guests
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Naveen N Rao

Clearing the IRQ window inhibit today relies on interrupt window
interception, but that is not always reachable when nested guests are
involved.

If L1 is intercepting IRQs, then interrupt_window_interception() will never
be reached while L2 is active, because the only reason KVM would set the
V_IRQ intercept in vmcb02 would be on behalf of L1, i.e. because of vmcb12.
svm_clear_vintr() always operates on (at least) vmcb01, and VMRUN
unconditionally sets GIF=1, which means that enter_svm_guest_mode() will
always do svm_clear_vintr() via svm_set_gif(svm, true). I.e. KVM will keep
the VM-wide inhibit set until control transfers back to L1 *and* an
interrupt window is triggered.

If L1 is not intercepting IRQs, KVM may immediately inject L1's ExtINT into
L2 if IRQs are enabled in L2, without taking an interrupt window
interception.

Address this by clearing the IRQ window inhibit when KVM actually injects
an interrupt and there are no further injectable interrupts. That way, if
L1 isn't intercepting IRQs, KVM will drop the inhibit as soon as an
interrupt is injected into L2. And if L1 is intercepting IRQs, KVM will
keep the inhibit until the IRQ is injected into L2. Either way, AVIC won't
be left inhibited.

Note, somewhat blindly invoking kvm_clear_apicv_inhibit() is both wrong and
suboptimal. If the IRQWIN inhibit isn't set, then the vCPU will
unnecessarily take apicv_update_lock for write.
And if a _different_ vCPU has an injectable IRQ, clearing IRQWIN may block
that vCPU's ability to inject its IRQ. Defer fixing both issues to a
future commit, as fixing one problem without also fixing the other would
also leave KVM in a temporarily bad state, as would fixing both issues
without fixing _this_ bug. I.e. it's not feasible to fix each bug
independently without there being some remaining flaw in KVM.

Co-developed-by: Naveen N Rao (AMD)
Signed-off-by: Naveen N Rao (AMD)
Tested-by: Naveen N Rao (AMD)
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/svm/svm.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7803d2781144..24b9c2275821 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3130,20 +3130,6 @@ static int interrupt_window_interception(struct kvm_vcpu *vcpu)
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 	svm_clear_vintr(to_svm(vcpu));
 
-	/*
-	 * If not running nested, for AVIC, the only reason to end up here is ExtINTs.
-	 * In this case AVIC was temporarily disabled for
-	 * requesting the IRQ window and we have to re-enable it.
-	 *
-	 * If running nested, still remove the VM wide AVIC inhibit to
-	 * support case in which the interrupt window was requested when the
-	 * vCPU was not running nested.
-
-	 * All vCPUs which run still run nested, will remain to have their
-	 * AVIC still inhibited due to per-cpu AVIC inhibition.
-	 */
-	kvm_clear_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
-
 	++vcpu->stat.irq_window_exits;
 	return 1;
 }
@@ -3732,6 +3718,20 @@ static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 		type = SVM_EVTINJ_TYPE_INTR;
 	}
 
+	/*
+	 * If AVIC was inhibited in order to detect an IRQ window, and there are
+	 * no other injectable interrupts pending or L2 is active (see below),
+	 * then drop the inhibit as the window has served its purpose.
+	 *
+	 * If L2 is active, this path is reachable if L1 is not intercepting
+	 * IRQs, i.e. if KVM is injecting L1 IRQs into L2.  AVIC is locally
+	 * inhibited while L2 is active; drop the VM-wide inhibit to optimize
+	 * the case in which the interrupt window was requested while L1 was
+	 * active (the vCPU was not running nested).
+	 */
+	if (!kvm_cpu_has_injectable_intr(vcpu) || is_guest_mode(vcpu))
+		kvm_clear_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
+
 	trace_kvm_inj_virq(intr->nr, intr->soft, reinjected);
 	++vcpu->stat.irq_injections;
 
-- 
2.52.0.457.g6b5491de43-goog

From nobody Sat Feb 7 09:42:42 2026
Reply-To: Sean Christopherson
Date: Fri, 23 Jan 2026 14:45:12 -0800
In-Reply-To: <20260123224514.2509129-1-seanjc@google.com>
References: <20260123224514.2509129-1-seanjc@google.com>
Message-ID: <20260123224514.2509129-3-seanjc@google.com>
Subject: [PATCH v2 2/4] KVM: SVM: Fix IRQ window inhibit handling across multiple vCPUs
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Naveen N Rao

IRQ window inhibits can be requested by multiple vCPUs at the same time
for injecting interrupts meant for different vCPUs. However, AVIC
inhibition is VM-wide and hence it is possible for the inhibition to be
cleared prematurely by the first vCPU that obtains the IRQ window even
though a second vCPU is still waiting for its IRQ window. This is likely
not a functional issue since the other vCPU will again see that
interrupts are pending to be injected (due to KVM_REQ_EVENT), and will
again request an IRQ window inhibit. However, this can result in AVIC
being rapidly toggled, causing high contention on apicv_update_lock and
degrading guest performance.

Address this by maintaining a VM-wide count of the number of vCPUs that
have requested an IRQ window.
Set/clear the inhibit reason when the count transitions between 0 and 1.
This ensures that the inhibit reason is not cleared as long as some vCPUs
are still waiting for an IRQ window.

Co-developed-by: Paolo Bonzini
Signed-off-by: Paolo Bonzini
Co-developed-by: Naveen N Rao (AMD)
Signed-off-by: Naveen N Rao (AMD)
Tested-by: Naveen N Rao (AMD)
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h | 19 ++++++++++++++++-
 arch/x86/kvm/svm/svm.c          | 36 +++++++++++++++++++++++----------
 arch/x86/kvm/svm/svm.h          |  1 +
 arch/x86/kvm/x86.c              | 19 +++++++++++++++++
 4 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e441f270f354..b08baeff98b2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1427,6 +1427,7 @@ struct kvm_arch {
 	struct kvm_pit *vpit;
 #endif
 	atomic_t vapics_in_nmi_mode;
+
 	struct mutex apic_map_lock;
 	struct kvm_apic_map __rcu *apic_map;
 	atomic_t apic_map_dirty;
@@ -1434,9 +1435,13 @@ struct kvm_arch {
 	bool apic_access_memslot_enabled;
 	bool apic_access_memslot_inhibited;
 
-	/* Protects apicv_inhibit_reasons */
+	/*
+	 * Protects apicv_inhibit_reasons and apicv_nr_irq_window_req (with an
+	 * asterisk, see kvm_inc_or_dec_irq_window_inhibit() for details).
+	 */
 	struct rw_semaphore apicv_update_lock;
 	unsigned long apicv_inhibit_reasons;
+	atomic_t apicv_nr_irq_window_req;
 
 	gpa_t wall_clock;
 
@@ -2309,6 +2314,18 @@ static inline void kvm_clear_apicv_inhibit(struct kvm *kvm,
 	kvm_set_or_clear_apicv_inhibit(kvm, reason, false);
 }
 
+void kvm_inc_or_dec_irq_window_inhibit(struct kvm *kvm, bool inc);
+
+static inline void kvm_inc_apicv_irq_window_req(struct kvm *kvm)
+{
+	kvm_inc_or_dec_irq_window_inhibit(kvm, true);
+}
+
+static inline void kvm_dec_apicv_irq_window_req(struct kvm *kvm)
+{
+	kvm_inc_or_dec_irq_window_inhibit(kvm, false);
+}
+
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 		       void *insn, int insn_len);
 void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 24b9c2275821..559e8fa76b7e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3729,8 +3729,11 @@ static void svm_inject_irq(struct kvm_vcpu *vcpu, bool reinjected)
 	 * the case in which the interrupt window was requested while L1 was
 	 * active (the vCPU was not running nested).
 	 */
-	if (!kvm_cpu_has_injectable_intr(vcpu) || is_guest_mode(vcpu))
-		kvm_clear_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
+	if (svm->avic_irq_window &&
+	    (!kvm_cpu_has_injectable_intr(vcpu) || is_guest_mode(vcpu))) {
+		svm->avic_irq_window = false;
+		kvm_dec_apicv_irq_window_req(svm->vcpu.kvm);
+	}
 
 	trace_kvm_inj_virq(intr->nr, intr->soft, reinjected);
 	++vcpu->stat.irq_injections;
@@ -3932,17 +3935,28 @@ static void svm_enable_irq_window(struct kvm_vcpu *vcpu)
 	 */
 	if (vgif || gif_set(svm)) {
 		/*
-		 * IRQ window is not needed when AVIC is enabled,
-		 * unless we have pending ExtINT since it cannot be injected
-		 * via AVIC. In such case, KVM needs to temporarily disable AVIC,
-		 * and fallback to injecting IRQ via V_IRQ.
+		 * KVM only enables IRQ windows when AVIC is enabled if there's
+		 * a pending ExtINT, since it cannot be injected via AVIC (ExtINT
+		 * bypasses the local APIC).  V_IRQ is ignored by hardware when
+		 * AVIC is enabled, and so KVM needs to temporarily disable
+		 * AVIC in order to detect when it's ok to inject the ExtINT.
 		 *
-		 * If running nested, AVIC is already locally inhibited
-		 * on this vCPU, therefore there is no need to request
-		 * the VM wide AVIC inhibition.
+		 * If running nested, AVIC is already locally inhibited on this
+		 * vCPU (L2 vCPUs use a different MMU that never maps the AVIC
+		 * backing page), therefore there is no need to increment the
+		 * VM-wide AVIC inhibit.  KVM will re-evaluate events when the
+		 * vCPU exits to L1 and enable an IRQ window if the ExtINT is
+		 * still pending.
+		 *
+		 * Note, the IRQ window inhibit needs to be updated even if
+		 * AVIC is inhibited for a different reason, as KVM needs to
+		 * keep AVIC inhibited if the other reason is cleared and there
+		 * is still an injectable interrupt pending.
 		 */
-		if (!is_guest_mode(vcpu))
-			kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_IRQWIN);
+		if (enable_apicv && !svm->avic_irq_window && !is_guest_mode(vcpu)) {
+			svm->avic_irq_window = true;
+			kvm_inc_apicv_irq_window_req(vcpu->kvm);
+		}
 
 		svm_set_vintr(svm);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ebd7b36b1ceb..68675b25ef8e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -333,6 +333,7 @@ struct vcpu_svm {
 
 	bool guest_state_loaded;
 
+	bool avic_irq_window;
 	bool x2avic_msrs_intercepted;
 	bool lbr_msrs_intercepted;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8acfdfc583a1..2528dfffb42b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10994,6 +10994,25 @@ void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_or_clear_apicv_inhibit);
 
+void kvm_inc_or_dec_irq_window_inhibit(struct kvm *kvm, bool inc)
+{
+	int add = inc ? 1 : -1;
+
+	if (!enable_apicv)
+		return;
+
+	/*
+	 * Strictly speaking, the lock is only needed if going 0->1 or 1->0,
+	 * a la atomic_dec_and_mutex_lock.  However, ExtINTs are rare and
+	 * only target a single CPU, so that is the common case; do not
+	 * bother eliding the down_write()/up_write() pair.
+	 */
+	guard(rwsem_write)(&kvm->arch.apicv_update_lock);
+	if (atomic_add_return(add, &kvm->arch.apicv_nr_irq_window_req) == inc)
+		__kvm_set_or_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_IRQWIN, inc);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_inc_or_dec_irq_window_inhibit);
+
 static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_apic_present(vcpu))
-- 
2.52.0.457.g6b5491de43-goog

From nobody Sat Feb 7 09:42:42 2026
Reply-To: Sean Christopherson
Date: Fri, 23 Jan 2026 14:45:13 -0800
In-Reply-To: <20260123224514.2509129-1-seanjc@google.com>
References: <20260123224514.2509129-1-seanjc@google.com>
Message-ID: <20260123224514.2509129-4-seanjc@google.com>
Subject: [PATCH v2 3/4] KVM: SVM: Optimize IRQ window inhibit handling
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Naveen N Rao

IRQ windows represent times during which an IRQ can be injected into a
vCPU, and thus represent times when a vCPU is running with RFLAGS.IF=1
and GIF enabled (TPR/PPR don't matter since KVM controls interrupt
injection and it only injects one interrupt at a time).

On SVM, when emulating the local APIC (i.e., AVIC disabled), KVM detects
IRQ windows by injecting a dummy virtual interrupt through VMCB.V_IRQ and
intercepting virtual interrupts (INTERCEPT_VINTR). This intercept
triggers as soon as the guest enables interrupts and is about to take
the dummy interrupt, at which point the actual interrupt can be injected
through VMCB.EVENTINJ.

When AVIC is enabled, VMCB.V_IRQ is ignored by the hardware and so
detecting IRQ windows requires AVIC to be inhibited. However, this is
only necessary for ExtINTs since all other interrupts can be injected
either by directly setting IRR in the APIC backing page and letting the
AVIC hardware inject the interrupt into the guest, or via VMCB.V_NMI for
NMIs.
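
As a rough illustration only, the conditions above can be modeled in a
few lines of C. This is a hypothetical userspace sketch, not KVM code:
the enum, struct and function names (pending_irq, vcpu_model,
irq_window_open(), needs_irq_window()) are invented here, and the model
ignores NMI windows, nested guests and the case where the window is
already open (KVM then injects directly).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of what is pending for a vCPU. */
enum pending_irq {
	PENDING_FIXED,	/* regular APIC interrupt */
	PENDING_NMI,	/* NMI: VMCB.V_NMI / NMI windows, not IRQ windows */
	PENDING_EXTINT,	/* ExtINT: bypasses the local APIC */
};

/* Hypothetical subset of guest state relevant to IRQ windows. */
struct vcpu_model {
	bool rflags_if;	/* guest RFLAGS.IF */
	bool gif;	/* SVM global interrupt flag */
};

/* An IRQ window is open when the guest can take a maskable interrupt. */
bool irq_window_open(const struct vcpu_model *v)
{
	return v->rflags_if && v->gif;
}

/*
 * Would KVM need to detect an IRQ window (V_IRQ + INTERCEPT_VINTR) before
 * injecting @p, assuming the window is currently closed?  Without AVIC,
 * every maskable interrupt is injected via VMCB.EVENTINJ and needs one.
 * With AVIC active, only ExtINT does, and since hardware ignores V_IRQ
 * while AVIC is enabled, that case also implies inhibiting AVIC.
 */
bool needs_irq_window(bool avic_active, enum pending_irq p)
{
	if (p == PENDING_NMI)
		return false;
	return p == PENDING_EXTINT || !avic_active;
}
```

With AVIC active, only the ExtINT case returns true, which is why the
IRQWIN inhibit is rare unless AVIC is already inhibited for another
reason.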

If AVIC is enabled but inhibited for some other reason, KVM has to
request an IRQ window inhibit every time it has to inject an interrupt
into the guest. This is because APICv inhibits are dynamic in nature, so
KVM has to be sure that AVIC is inhibited for purposes of discovering an
IRQ window even if the other inhibit is cleared in the meantime. This is
particularly problematic with APICV_INHIBIT_REASON_PIT_REINJ, which stays
set throughout the life of the guest and results in KVM rapidly toggling
the IRQ window inhibit, causing contention on apicv_update_lock.

Address this by setting and clearing APICV_INHIBIT_REASON_IRQWIN lazily:
if some other inhibit reason is already set, just increment the IRQ
window request count and do not update apicv_inhibit_reasons
immediately. If any other inhibit reason is set/cleared in the meantime,
re-evaluate APICV_INHIBIT_REASON_IRQWIN by checking the IRQ window
request count and update apicv_inhibit_reasons appropriately. Otherwise,
only the IRQ window request count is incremented/decremented each time
an IRQ window is requested.

This eliminates much of the contention on the apicv_update_lock
semaphore, and with it much of the performance degradation.
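
To make the lazy scheme concrete, here is a single-threaded C model of
it. It is a hypothetical sketch, not the kernel implementation: it drops
apicv_update_lock and atomics entirely, and the names (vm_model,
REASON_PIT_REINJ, REASON_IRQWIN, set_or_clear_inhibit(),
inc_or_dec_irq_window()) and bit numbers are invented for illustration.
It loosely mirrors __kvm_set_or_clear_apicv_inhibit() and
kvm_inc_or_dec_irq_window_inhibit() as changed by this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the real enum lives in kvm_host.h. */
#define REASON_PIT_REINJ	0
#define REASON_IRQWIN		1
#define BIT(n)			(1UL << (n))

struct vm_model {
	unsigned long inhibit_reasons;	/* models apicv_inhibit_reasons */
	int nr_irq_window_req;		/* models apicv_nr_irq_window_req */
};

/* Any set/clear re-derives the IRQWIN bit from the request count. */
void set_or_clear_inhibit(struct vm_model *vm, int reason, bool set)
{
	if (reason != REASON_IRQWIN) {
		if (set)
			vm->inhibit_reasons |= BIT(reason);
		else
			vm->inhibit_reasons &= ~BIT(reason);
	}

	if (vm->nr_irq_window_req)
		vm->inhibit_reasons |= BIT(REASON_IRQWIN);
	else
		vm->inhibit_reasons &= ~BIT(REASON_IRQWIN);
}

void inc_or_dec_irq_window(struct vm_model *vm, bool inc)
{
	vm->nr_irq_window_req += inc ? 1 : -1;
	assert(vm->nr_irq_window_req >= 0);

	/* Lazy path: another reason already inhibits APICv, only count. */
	if (vm->inhibit_reasons & ~BIT(REASON_IRQWIN))
		return;

	/* Eager path: toggle IRQWIN on 0->1 and 1->0 transitions only. */
	if (vm->nr_irq_window_req == (inc ? 1 : 0))
		set_or_clear_inhibit(vm, REASON_IRQWIN, inc);
}
```

With PIT_REINJ set, a request only bumps the counter; the IRQWIN bit
shows up once PIT_REINJ is cleared and the count is re-evaluated.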
Co-developed-by: Paolo Bonzini
Signed-off-by: Paolo Bonzini
Co-developed-by: Naveen N Rao (AMD)
Signed-off-by: Naveen N Rao (AMD)
Tested-by: Naveen N Rao (AMD)
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2528dfffb42b..822644d23933 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10953,7 +10953,11 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 
 	old = new = kvm->arch.apicv_inhibit_reasons;
 
-	set_or_clear_apicv_inhibit(&new, reason, set);
+	if (reason != APICV_INHIBIT_REASON_IRQWIN)
+		set_or_clear_apicv_inhibit(&new, reason, set);
+
+	set_or_clear_apicv_inhibit(&new, APICV_INHIBIT_REASON_IRQWIN,
+				   atomic_read(&kvm->arch.apicv_nr_irq_window_req));
 
 	if (!!old != !!new) {
 		/*
@@ -11001,6 +11005,26 @@ void kvm_inc_or_dec_irq_window_inhibit(struct kvm *kvm, bool inc)
 	if (!enable_apicv)
 		return;
 
+	/*
+	 * IRQ windows are requested either because of ExtINT injections, or
+	 * because APICv is already disabled/inhibited for another reason.
+	 * While ExtINT injections are rare and should not happen while the
+	 * vCPU is running its actual workload, it's worth avoiding thrashing
+	 * if the IRQ window is being requested because APICv is already
+	 * inhibited.  So, toggle the actual inhibit (which requires taking
+	 * the lock for write) if and only if there's no other inhibit.
+	 * kvm_set_or_clear_apicv_inhibit() always evaluates the IRQ window
+	 * count; thus the IRQ window inhibit call _will_ be lazily updated on
+	 * the next call, if it ever happens.
+	 */
+	if (READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~BIT(APICV_INHIBIT_REASON_IRQWIN)) {
+		guard(rwsem_read)(&kvm->arch.apicv_update_lock);
+		if (READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~BIT(APICV_INHIBIT_REASON_IRQWIN)) {
+			atomic_add(add, &kvm->arch.apicv_nr_irq_window_req);
+			return;
+		}
+	}
+
 	/*
 	 * Strictly speaking, the lock is only needed if going 0->1 or 1->0,
 	 * a la atomic_dec_and_mutex_lock.  However, ExtINTs are rare and
-- 
2.52.0.457.g6b5491de43-goog

From nobody Sat Feb 7 09:42:42 2026
Reply-To: Sean Christopherson
Date: Fri, 23 Jan 2026 14:45:14 -0800
In-Reply-To: <20260123224514.2509129-1-seanjc@google.com>
References: <20260123224514.2509129-1-seanjc@google.com>
Message-ID: <20260123224514.2509129-5-seanjc@google.com>
Subject: [PATCH v2 4/4] KVM: Isolate apicv_update_lock and apicv_nr_irq_window_req in a cacheline
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Naveen N Rao

Force apicv_update_lock and apicv_nr_irq_window_req to reside in their
own cacheline to avoid significant contention due to false sharing when
KVM is constantly creating IRQ windows. E.g. apicv_inhibit_reasons is
read on every VM-Enter; disabled_exits is read on page faults, on PAUSE
exits, if a vCPU is scheduled out, etc.; kvmclock_offset is read every
time a vCPU needs to refresh kvmclock, and so on and so forth.

Isolating the write-mostly fields from all other (read-mostly) fields
improves performance by 7-8% when running netperf TCP_RR between two
guests on the same physical host when using an in-kernel PIT in
re-inject mode.
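
A userspace approximation of the layout trick follows, assuming 64-byte
cache lines and using C11 alignas in place of the kernel's
____cacheline_aligned (which aligns to the configured SMP cache line
size). The struct and field names are hypothetical stand-ins for the
kvm_arch fields named above, with simplified types.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, as on typical x86 parts. */
#define CACHELINE 64

/* Hypothetical stand-in for struct kvm_arch; types are simplified. */
struct arch_model {
	unsigned long read_mostly_before;	/* e.g. disabled_exits */

	/* Write-mostly fields start on their own cache line... */
	alignas(CACHELINE) long update_lock;	/* stands in for the rwsem */
	int nr_irq_window_req;

	/* ...and the next read-mostly field starts on the following one. */
	alignas(CACHELINE) unsigned long inhibit_reasons;
};

/* Index of the cache line that a byte offset within the struct falls in. */
size_t line_of(size_t off)
{
	return off / CACHELINE;
}
```

Writes to update_lock or nr_irq_window_req then no longer invalidate the
line holding inhibit_reasons, which every VM-Enter reads.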

Reported-by: Naveen N Rao (AMD)
Closes: https://lore.kernel.org/all/yrxhngndj37edud6tj5y3vunaf7nirwor4n63yf4275wdocnd3@c77ujgialc6r
Tested-by: Naveen N Rao (AMD)
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b08baeff98b2..8a9f797b6a68 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1435,13 +1435,23 @@ struct kvm_arch {
 	bool apic_access_memslot_enabled;
 	bool apic_access_memslot_inhibited;
 
+	/*
+	 * Force apicv_update_lock and apicv_nr_irq_window_req to reside in a
+	 * dedicated cacheline.  They are write-mostly, whereas most everything
+	 * else in kvm_arch is read-mostly.  Note that apicv_inhibit_reasons is
+	 * read-mostly: toggling VM-wide inhibits is rare; _checking_ for
+	 * inhibits is common.
+	 */
+	____cacheline_aligned
 	/*
 	 * Protects apicv_inhibit_reasons and apicv_nr_irq_window_req (with an
 	 * asterisk, see kvm_inc_or_dec_irq_window_inhibit() for details).
 	 */
 	struct rw_semaphore apicv_update_lock;
-	unsigned long apicv_inhibit_reasons;
 	atomic_t apicv_nr_irq_window_req;
+	____cacheline_aligned
+
+	unsigned long apicv_inhibit_reasons;
 
 	gpa_t wall_clock;
 
-- 
2.52.0.457.g6b5491de43-goog
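
The 0<->1 transition test used by kvm_inc_or_dec_irq_window_inhibit()
(added in patch 2/4) relies on atomic_add_return() returning the new
value, which is then compared against the bool inc. A hypothetical C11
sketch of the same idea follows; C11's atomic_fetch_add() returns the
old value, so the sketch adds the delta back. The names are invented,
and the sketch omits apicv_update_lock, which the kernel holds for write
around this update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for apicv_nr_irq_window_req and the IRQWIN bit. */
static atomic_int nr_irq_window_req;
static bool irqwin_inhibited;

/*
 * The new count is 1 only on a 0->1 increment and 0 only on a 1->0
 * decrement, so only the first requester sets the inhibit and only the
 * last one clears it.
 */
void irq_window_req_update(bool inc)
{
	int add = inc ? 1 : -1;
	int new_count = atomic_fetch_add(&nr_irq_window_req, add) + add;

	assert(new_count >= 0);
	if (new_count == (inc ? 1 : 0))
		irqwin_inhibited = inc;
}
```

Two overlapping requests keep the inhibit set until both are released,
which is the behavior the per-VM count is meant to provide.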