From nobody Sat Jun 13 07:52:24 2026 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DF2B3D8122; Fri, 8 May 2026 18:17:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778264263; cv=none; b=E6Aj+QUJ+KDqGBkHbbUgUFTQ6VzpD5vpB7T/4SUOfPKTpZLcWOock8LHq0DMVCQDVFA44/t443Zuh+1FKX8HKyGEIV10PJj63ox8qE1awooAbNkgY3DT9BhgepiKwzQEydBXqSWoUGVXCK0tl6qzLeuCHNdYRm976Y8xLvH4t0c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778264263; c=relaxed/simple; bh=2ACizhkl4W/Aai3iUn5Na6hUxUGCawJ8JBkYrlgzOAQ=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uIS7P1GPtMv37gkdZvIXz+7d+PGq7mAsGJ9j6xPbFNYRdOWqbyxvRX+wK83rAh1U14pOc/tvrkTVSMFQ/eMGDCK9STkj55e+of4gE6PRZk+XUgiDkaQ71POCqamO6ZLlaXqUTHuVsmZCXi4V5//iXn75PKcTFlzitt4sZPvTbBs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=casper.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=a03sg80Y; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=casper.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="a03sg80Y" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; 
bh=FQOlSxV2N2SZxsJanRrE5lx7UIrWEeyRThnKAatYFck=; b=a03sg80YA+skTAfvvdbx/mOu6c jVfRWw2U0oJJ05+3OFSD/1AjPot4c0pwpTJUN1OgsnPEcSU95vWFaeKVsd7DDBEphoTJK0mzKC9s0 /UF+OQl9WyXxmATyBqsIESmPdfwtXG33lziwIH9emTVs2dc+hFZCSxl8QJV0QEe0/zbzPKZ9QBelg s+TQzPlmhcaYDfDEU2HHk1Tv4hcDu9aX8HNVHBit7wk2myTXMRSkG3754eNXd2qj+7YJ9LZxntF8g tKNztP9zzXfZb7rz5U2Ob/qTHWE3Wb/hXlFyI7o4tVbLvdqDT6zIu3uACmwSK5FqbQkgpY5pEQeVR 8qLu4eNg==; Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wLPlH-00000004XFE-1L2g; Fri, 08 May 2026 18:17:20 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.98.2 #2 (Red Hat Linux)) id 1wLPlH-0000000DYYd-03S5; Fri, 08 May 2026 19:17:19 +0100 From: David Woodhouse To: Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Paul Durrant , Peter Zijlstra , Will Deacon , Boqun Feng , Waiman Long , Sebastian Andrzej Siewior , Clark Williams , Steven Rostedt , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev, Mauricio Faria de Oliveira , kernel-dev@igalia.com, syzbot+208f7f3e5f59c11aeb90@syzkaller.appspotmail.com Subject: [PATCH 1/7] locking/rt: Use raw_spin_lock_irqsave() in __rwbase_read_unlock() Date: Fri, 8 May 2026 19:10:03 +0100 Message-ID: <20260508181717.3230988-2-dwmw2@infradead.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20260508181717.3230988-1-dwmw2@infradead.org> References: <20260508181717.3230988-1-dwmw2@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Content-Type: text/plain; charset="utf-8" From: David Woodhouse __rwbase_read_unlock() uses raw_spin_lock_irq()/raw_spin_unlock_irq() which unconditionally disables and re-enables interrupts. 
When read_unlock() is called from hardirq context (e.g. after a
successful read_trylock() in a timer callback), the raw_spin_unlock_irq()
incorrectly re-enables interrupts within the hardirq handler. This causes
lockdep warnings ('hardirqs_on_prepare' from hardirq context) and can
lead to IRQ state corruption.

Using read_trylock() in hardirq context on PREEMPT_RT is safe because it
does not record the lock owner. The read_unlock() acquires the wait_lock
which is hardirq safe. This change additionally allows rwlock_t during
early boot.

Switch to raw_spin_lock_irqsave()/raw_spin_unlock_irqrestore() to
preserve the caller's IRQ state.

Signed-off-by: David Woodhouse
Reviewed-by: Sebastian Andrzej Siewior
---
 kernel/locking/rwbase_rt.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 82e078c0665a..25744862d627 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -153,8 +153,9 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
 	struct rt_mutex_base *rtm = &rwb->rtmutex;
 	struct task_struct *owner;
 	DEFINE_RT_WAKE_Q(wqh);
+	unsigned long flags;
 
-	raw_spin_lock_irq(&rtm->wait_lock);
+	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
 	/*
 	 * Wake the writer, i.e. the rtmutex owner. It might release the
 	 * rtmutex concurrently in the fast path (due to a signal), but to
@@ -167,7 +168,7 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
 
 	/* Pairs with the preempt_enable in rt_mutex_wake_up_q() */
 	preempt_disable();
-	raw_spin_unlock_irq(&rtm->wait_lock);
+	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
 	rt_mutex_wake_up_q(&wqh);
 }
 
-- 
2.51.0

From nobody Sat Jun 13 07:52:24 2026
From: David Woodhouse
Subject: [PATCH 2/7] KVM: x86: Use gfn_to_pfn_cache for record_steal_time
Date: Fri, 8 May 2026 19:10:04 +0100
Message-ID: <20260508181717.3230988-3-dwmw2@infradead.org>
In-Reply-To: <20260508181717.3230988-1-dwmw2@infradead.org>
References: <20260508181717.3230988-1-dwmw2@infradead.org>

From: Carsten Stollmaier

This largely reverts commit 7e2175ebd695 ("KVM: x86: Fix recording of
guest steal time / preempted status"), which dropped the use of the
gfn_to_pfn_cache because it was not integrated with the MMU notifiers at
the time. That shortcoming has long since been addressed, making the GPC
work correctly for this use case.

Aside from cleaning up the last open-coded assembler access to user
addresses and associated explicit asm exception fixups, moving back to
the now-functional GPC also resolves an issue with contention on the
mmap_lock with userfaultfd.

The contention issue is as follows: On vcpu_run, before entering the
guest, the update of the steal time information causes a page fault if
the page is not present. In our scenario, this gets handled by
do_user_addr_fault() and subsequently handle_userfault(), because the
region is registered with userfaultfd. Since handle_userfault() uses
TASK_INTERRUPTIBLE, it is interruptible by signals. But
do_user_addr_fault() then busy-retries if the pending signal is
non-fatal, which leads to heavy contention on the mmap_lock.

By restoring the use of GPC for accessing the guest steal time, the
contention is avoided and refreshing the GPC happens when the vCPU is
next scheduled.

Since the gfn_to_pfn_cache gives a kernel mapping rather than a userspace
HVA, accesses are now plain C instead of unsafe_put_user() et al. Use
READ_ONCE()/WRITE_ONCE() to prevent the compiler from reordering or
tearing the accesses, and add an smp_wmb() before the final version
increment to ensure the data writes are ordered before the seqcount
update; the old unsafe_put_user() inline assembly acted as an implicit
compiler barrier.

In kvm_steal_time_set_preempted(), use read_trylock() instead of
read_lock_irqsave() since this is called from the scheduler path where
rwlock_t is not safe on PREEMPT_RT (it becomes sleepable). Since we only
trylock and bail on failure, there is no risk of deadlock with an
interrupt handler, so no need to disable interrupts at all. Setting the
preempted flag is best-effort anyway.
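
For illustration only, the version/steal/version write sequence can be
sketched in userspace C11. This is not the kernel code: struct
steal_record, steal_publish() and steal_read() are hypothetical names,
C11 atomics stand in for READ_ONCE()/WRITE_ONCE(), and a release fence
is used as a (stronger) approximation of smp_wmb().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest-visible steal-time record. */
struct steal_record {
	_Atomic uint32_t version;	/* odd while an update is in flight */
	_Atomic uint64_t steal;
};

/* Writer: bump version to odd, update the data, bump version to even. */
static void steal_publish(struct steal_record *st, uint64_t delta)
{
	uint32_t version = atomic_load_explicit(&st->version, memory_order_relaxed);
	uint64_t steal;

	if (version & 1)
		version += 1;	/* first write: treat odd junk as even */

	version += 1;		/* odd: update in progress */
	atomic_store_explicit(&st->version, version, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* approximates smp_wmb() */

	steal = atomic_load_explicit(&st->steal, memory_order_relaxed);
	atomic_store_explicit(&st->steal, steal + delta, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* data before final version */

	version += 1;		/* even: update complete */
	atomic_store_explicit(&st->version, version, memory_order_relaxed);
}

/* Reader: retry until an even, unchanged version brackets the data read. */
static uint64_t steal_read(struct steal_record *st)
{
	uint32_t before, after;
	uint64_t val;

	do {
		before = atomic_load_explicit(&st->version, memory_order_acquire);
		val = atomic_load_explicit(&st->steal, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&st->version, memory_order_relaxed);
	} while ((before & 1) || before != after);

	return val;
}
```

The second fence is the point of interest: without it, the final
version store could become visible before the steal update, and a
reader would accept a stale value under an even version.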

Signed-off-by: Carsten Stollmaier
Co-developed-by: David Woodhouse
Signed-off-by: David Woodhouse
---
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/x86.c              | 126 ++++++++++++++++----------------
 2 files changed, 66 insertions(+), 62 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c470e40a00aa..6f26c68db4b0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -959,7 +959,7 @@ struct kvm_vcpu_arch {
 		u8 preempted;
 		u64 msr_val;
 		u64 last_steal;
-		struct gfn_to_hva_cache cache;
+		struct gfn_to_pfn_cache cache;
 	} st;
 
 	u64 l1_tsc_offset;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a1b63c63d1a..ae71f28cc1c5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3747,10 +3747,8 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_service_local_tlb_flush_requests);
 
 static void record_steal_time(struct kvm_vcpu *vcpu)
 {
-	struct gfn_to_hva_cache *ghc = &vcpu->arch.st.cache;
-	struct kvm_steal_time __user *st;
-	struct kvm_memslots *slots;
-	gpa_t gpa = vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS;
+	struct gfn_to_pfn_cache *gpc = &vcpu->arch.st.cache;
+	struct kvm_steal_time *st;
 	u64 steal;
 	u32 version;
 
@@ -3765,42 +3763,26 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 	if (WARN_ON_ONCE(current->mm != vcpu->kvm->mm))
 		return;
 
-	slots = kvm_memslots(vcpu->kvm);
-
-	if (unlikely(slots->generation != ghc->generation ||
-		     gpa != ghc->gpa ||
-		     kvm_is_error_hva(ghc->hva) || !ghc->memslot)) {
+	read_lock(&gpc->lock);
+	while (!kvm_gpc_check(gpc, sizeof(*st))) {
 		/* We rely on the fact that it fits in a single page. */
 		BUILD_BUG_ON((sizeof(*st) - 1) & KVM_STEAL_VALID_BITS);
 
-		if (kvm_gfn_to_hva_cache_init(vcpu->kvm, ghc, gpa, sizeof(*st)) ||
-		    kvm_is_error_hva(ghc->hva) || !ghc->memslot)
+		read_unlock(&gpc->lock);
+
+		if (kvm_gpc_refresh(gpc, sizeof(*st)))
 			return;
+
+		read_lock(&gpc->lock);
 	}
 
-	st = (struct kvm_steal_time __user *)ghc->hva;
+	st = (struct kvm_steal_time *)gpc->khva;
 	/*
 	 * Doing a TLB flush here, on the guest's behalf, can avoid
 	 * expensive IPIs.
 	 */
 	if (guest_pv_has(vcpu, KVM_FEATURE_PV_TLB_FLUSH)) {
-		u8 st_preempted = 0;
-		int err = -EFAULT;
-
-		if (!user_access_begin(st, sizeof(*st)))
-			return;
-
-		asm volatile("1: xchgb %0, %2\n"
-			     "xor %1, %1\n"
-			     "2:\n"
-			     _ASM_EXTABLE_UA(1b, 2b)
-			     : "+q" (st_preempted),
-			       "+&r" (err),
-			       "+m" (st->preempted));
-		if (err)
-			goto out;
-
-		user_access_end();
+		u8 st_preempted = xchg(&st->preempted, 0);
 
 		vcpu->arch.st.preempted = 0;
 
@@ -3808,39 +3790,34 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 				       st_preempted & KVM_VCPU_FLUSH_TLB);
 		if (st_preempted & KVM_VCPU_FLUSH_TLB)
 			kvm_vcpu_flush_tlb_guest(vcpu);
-
-		if (!user_access_begin(st, sizeof(*st)))
-			goto dirty;
 	} else {
-		if (!user_access_begin(st, sizeof(*st)))
-			return;
-
-		unsafe_put_user(0, &st->preempted, out);
+		WRITE_ONCE(st->preempted, 0);
 		vcpu->arch.st.preempted = 0;
 	}
 
-	unsafe_get_user(version, &st->version, out);
+	version = READ_ONCE(st->version);
 	if (version & 1)
 		version += 1;  /* first time write, random junk */
 
 	version += 1;
-	unsafe_put_user(version, &st->version, out);
+	WRITE_ONCE(st->version, version);
 
 	smp_wmb();
 
-	unsafe_get_user(steal, &st->steal, out);
+	steal = READ_ONCE(st->steal);
 	steal += current->sched_info.run_delay -
 		vcpu->arch.st.last_steal;
 	vcpu->arch.st.last_steal = current->sched_info.run_delay;
-	unsafe_put_user(steal, &st->steal, out);
+	WRITE_ONCE(st->steal, steal);
+
+	smp_wmb();
 
 	version += 1;
-	unsafe_put_user(version, &st->version, out);
+	WRITE_ONCE(st->version, version);
+
+	kvm_gpc_mark_dirty_in_slot(gpc);
 
- out:
-	user_access_end();
- dirty:
-	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
+	read_unlock(&gpc->lock);
 }
 
 /*
@@ -4175,8 +4152,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		vcpu->arch.st.msr_val = data;
 
-		if (!(data & KVM_MSR_ENABLED))
-			break;
+		if (data & KVM_MSR_ENABLED)
+			kvm_gpc_activate(&vcpu->arch.st.cache, data & ~KVM_MSR_ENABLED,
+					 sizeof(struct kvm_steal_time));
+		else
+			kvm_gpc_deactivate(&vcpu->arch.st.cache);
 
 		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
 
@@ -5239,11 +5219,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 {
-	struct gfn_to_hva_cache *ghc = &vcpu->arch.st.cache;
-	struct kvm_steal_time __user *st;
-	struct kvm_memslots *slots;
+	struct gfn_to_pfn_cache *gpc = &vcpu->arch.st.cache;
+	struct kvm_steal_time *st;
 	static const u8 preempted = KVM_VCPU_PREEMPTED;
-	gpa_t gpa = vcpu->arch.st.msr_val & KVM_STEAL_VALID_BITS;
 
 	/*
 	 * The vCPU can be marked preempted if and only if the VM-Exit was on
@@ -5268,20 +5246,41 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	if (unlikely(current->mm != vcpu->kvm->mm))
 		return;
 
-	slots = kvm_memslots(vcpu->kvm);
-
-	if (unlikely(slots->generation != ghc->generation ||
-		     gpa != ghc->gpa ||
-		     kvm_is_error_hva(ghc->hva) || !ghc->memslot))
+	/*
+	 * Use a trylock as this is called from the scheduler path (via
+	 * kvm_sched_out), where rwlock_t is not safe on PREEMPT_RT (it
+	 * becomes sleepable). Setting preempted is best-effort anyway;
+	 * the old HVA-based code used copy_to_user_nofault() which could
+	 * also silently fail.
+	 *
+	 * Since we only trylock and bail on failure, there is no risk of
+	 * deadlock with an interrupt handler, so no need to disable
+	 * interrupts.
+	 */
+	if (!read_trylock(&gpc->lock))
 		return;
 
-	st = (struct kvm_steal_time __user *)ghc->hva;
+	if (!kvm_gpc_check(gpc, sizeof(*st)))
+		goto out_unlock_gpc;
+
+	st = (struct kvm_steal_time *)gpc->khva;
 	BUILD_BUG_ON(sizeof(st->preempted) != sizeof(preempted));
 
-	if (!copy_to_user_nofault(&st->preempted, &preempted, sizeof(preempted)))
-		vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
+	WRITE_ONCE(st->preempted, preempted);
+	vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
+
+	kvm_gpc_mark_dirty_in_slot(gpc);
+
+out_unlock_gpc:
+	read_unlock(&gpc->lock);
+}
 
-	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
+static void kvm_steal_time_reset(struct kvm_vcpu *vcpu)
+{
+	kvm_gpc_deactivate(&vcpu->arch.st.cache);
+	vcpu->arch.st.preempted = 0;
+	vcpu->arch.st.msr_val = 0;
+	vcpu->arch.st.last_steal = 0;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -12841,6 +12840,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	kvm_gpc_init(&vcpu->arch.pv_time, vcpu->kvm);
 
+	kvm_gpc_init(&vcpu->arch.st.cache, vcpu->kvm);
+
 	if (!irqchip_in_kernel(vcpu->kvm) || kvm_vcpu_is_reset_bsp(vcpu))
 		kvm_set_mp_state(vcpu, KVM_MP_STATE_RUNNABLE);
 	else
@@ -12948,6 +12949,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvm_clear_async_pf_completion_queue(vcpu);
 	kvm_mmu_unload(vcpu);
 
+	kvm_steal_time_reset(vcpu);
+
 	kvmclock_reset(vcpu);
 
 	for_each_possible_cpu(cpu)
@@ -13068,7 +13071,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 	vcpu->arch.apf.msr_en_val = 0;
 	vcpu->arch.apf.msr_int_val = 0;
-	vcpu->arch.st.msr_val = 0;
+
+	kvm_steal_time_reset(vcpu);
 
 	kvmclock_reset(vcpu);
 
-- 
2.51.0

From nobody Sat Jun 13 07:52:24 2026
From: David Woodhouse
Subject: [PATCH 3/7] KVM: x86/xen: Use read_trylock() for GPC locks in hardirq/atomic paths
Date: Fri, 8 May 2026 19:10:05 +0100
Message-ID: <20260508181717.3230988-4-dwmw2@infradead.org>
In-Reply-To: <20260508181717.3230988-1-dwmw2@infradead.org>
References: <20260508181717.3230988-1-dwmw2@infradead.org>

kvm_xen_set_evtchn_fast() is called from hardirq context (timer
callback, kvm_arch_set_irq_inatomic()). On PREEMPT_RT, rwlock_t is a
sleeping lock, so read_lock_irqsave() cannot be used in this context.
Switch to read_trylock() and return -EWOULDBLOCK on contention, which is
the designed fallback: there is always a slow path for the case where
the GPC is invalid and needs to be refreshed.

Reported-by: syzbot+208f7f3e5f59c11aeb90@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=208f7f3e5f59c11aeb90
Fixes: 14243b387137 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery")
Signed-off-by: David Woodhouse
---
 arch/x86/kvm/xen.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 91fd3673c09a..9bdb8e3cad58 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -697,6 +697,7 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
 int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
 {
 	struct gfn_to_pfn_cache *gpc = &v->arch.xen.vcpu_info_cache;
+	bool atomic = in_atomic() || !task_is_running(current);
 	unsigned long flags;
 	u8 rc = 0;
 
@@ -713,7 +714,15 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
 	BUILD_BUG_ON(sizeof(rc) !=
 		     sizeof_field(struct compat_vcpu_info, evtchn_upcall_pending));
 
-	read_lock_irqsave(&gpc->lock, flags);
+	if (atomic) {
+		local_irq_save(flags);
+		if (!read_trylock(&gpc->lock)) {
+			local_irq_restore(flags);
+			return 1;
+		}
+	} else {
+		read_lock_irqsave(&gpc->lock, flags);
+	}
 	while (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) {
 		read_unlock_irqrestore(&gpc->lock, flags);
 
@@ -725,7 +734,7 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
 		 * and we'll end up getting called again from a context where we *can*
 		 * fault in the page and wait for it.
 		 */
-		if (in_atomic() || !task_is_running(current))
+		if (atomic)
 			return 1;
 
 		if (kvm_gpc_refresh(gpc, sizeof(struct vcpu_info))) {
@@ -1794,7 +1803,6 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 	struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache;
 	struct kvm_vcpu *vcpu;
 	unsigned long *pending_bits, *mask_bits;
-	unsigned long flags;
 	int port_word_bit;
 	bool kick_vcpu = false;
 	int vcpu_idx, idx, rc;
@@ -1816,9 +1824,10 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 
 	idx = srcu_read_lock(&kvm->srcu);
 
-	read_lock_irqsave(&gpc->lock, flags);
-	if (!kvm_gpc_check(gpc, PAGE_SIZE))
+	if (!read_trylock(&gpc->lock))
 		goto out_rcu;
+	if (!kvm_gpc_check(gpc, PAGE_SIZE))
+		goto out_unlock;
 
 	if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
 		struct shared_info *shinfo = gpc->khva;
@@ -1847,11 +1856,10 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 	} else {
 		rc = 1; /* Delivered to the bitmap in shared_info. */
 		/* Now switch to the vCPU's vcpu_info to set the index and pending_sel */
-		read_unlock_irqrestore(&gpc->lock, flags);
+		read_unlock(&gpc->lock);
 		gpc = &vcpu->arch.xen.vcpu_info_cache;
 
-		read_lock_irqsave(&gpc->lock, flags);
-		if (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) {
+		if (!read_trylock(&gpc->lock)) {
 			/*
 			 * Could not access the vcpu_info. Set the bit in-kernel
 			 * and prod the vCPU to deliver it for itself.
@@ -1860,6 +1868,11 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 				kick_vcpu = true;
 			goto out_rcu;
 		}
+		if (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) {
+			if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
+				kick_vcpu = true;
+			goto out_unlock;
+		}
 
 		if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
 			struct vcpu_info *vcpu_info = gpc->khva;
@@ -1883,8 +1896,9 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
 		}
 	}
 
+ out_unlock:
+	read_unlock(&gpc->lock);
  out_rcu:
-	read_unlock_irqrestore(&gpc->lock, flags);
 	srcu_read_unlock(&kvm->srcu, idx);
 
 	if (kick_vcpu) {
-- 
2.51.0

From nobody Sat Jun 13 07:52:24 2026
From: David Woodhouse
Subject: [PATCH 4/7] KVM: x86/xen: Remove unnecessary irqsave from GPC lock usage in xen.c
Date: Fri, 8 May 2026 19:10:06 +0100
Message-ID: <20260508181717.3230988-5-dwmw2@infradead.org>
In-Reply-To: <20260508181717.3230988-1-dwmw2@infradead.org>
References: <20260508181717.3230988-1-dwmw2@infradead.org>

Now that the hardirq path (xen_timer_callback and set_evtchn_fast) uses
read_trylock() instead of read_lock_irqsave(), the remaining GPC lock
users in xen.c are only called from process context (vcpu_run, ioctls).
There is no need to disable interrupts to prevent concurrent access from
a hardirq user, since the hardirq path now only trylocks and never waits
for the lock.

Convert read_lock_irqsave()/read_unlock_irqrestore() to plain
read_lock()/read_unlock() in:
 - kvm_xen_update_runstate_guest()
 - kvm_xen_shared_info_init()
 - xen_get_guest_pvclock()
 - kvm_xen_inject_pending_events()
 - __kvm_xen_has_interrupt()
 - wait_pending_event()

Signed-off-by: David Woodhouse
---
 arch/x86/kvm/xen.c | 60 +++++++++++++++++++---------------------------
 1 file changed, 25 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 9bdb8e3cad58..b1fae42bf295 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -45,15 +45,15 @@ static int kvm_xen_shared_info_init(struct kvm *kvm)
 	int ret = 0;
 	int idx = srcu_read_lock(&kvm->srcu);
 
-	read_lock_irq(&gpc->lock);
+	read_lock(&gpc->lock);
 	while (!kvm_gpc_check(gpc, PAGE_SIZE)) {
-		read_unlock_irq(&gpc->lock);
+		read_unlock(&gpc->lock);
 
 		ret = kvm_gpc_refresh(gpc, PAGE_SIZE);
 		if (ret)
 			goto out;
 
-		read_lock_irq(&gpc->lock);
+		read_lock(&gpc->lock);
 	}
 
 	/*
@@ -96,7 +96,7 @@ static int kvm_xen_shared_info_init(struct kvm *kvm)
 	smp_wmb();
 
 	wc->version = wc_version + 1;
-	read_unlock_irq(&gpc->lock);
+	read_unlock(&gpc->lock);
 
 	kvm_make_all_cpus_request(kvm, KVM_REQ_MASTERCLOCK_UPDATE);
 
@@ -155,22 +155,21 @@ static int xen_get_guest_pvclock(struct kvm_vcpu *vcpu,
 				 struct gfn_to_pfn_cache *gpc,
 				 unsigned int offset)
 {
-	unsigned long flags;
 	int r;
 
-	read_lock_irqsave(&gpc->lock, flags);
+	read_lock(&gpc->lock);
 	while (!kvm_gpc_check(gpc, offset + sizeof(*hv_clock))) {
-		read_unlock_irqrestore(&gpc->lock, flags);
+		read_unlock(&gpc->lock);
 
 		r = kvm_gpc_refresh(gpc, offset + sizeof(*hv_clock));
 		if (r)
 			return r;
 
-		read_lock_irqsave(&gpc->lock, flags);
+		read_lock(&gpc->lock);
 	}
 
 	memcpy(hv_clock, gpc->khva + offset, sizeof(*hv_clock));
-	read_unlock_irqrestore(&gpc->lock, flags);
+	read_unlock(&gpc->lock);
 
 	/*
 	 * Sanity check TSC shift+multiplier to verify the guest's view of time
@@ -325,7 +324,6 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 	struct gfn_to_pfn_cache *gpc2 = &vx->runstate2_cache;
 	size_t user_len, user_len1, user_len2;
 	struct vcpu_runstate_info rs;
-	unsigned long flags;
 	size_t times_ofs;
 	uint8_t *update_bit = NULL;
 	uint64_t entry_time;
@@ -421,16 +419,14 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 	 * gfn_to_pfn caches that cover the region.
 	 */
 	if (atomic) {
-		local_irq_save(flags);
 		if (!read_trylock(&gpc1->lock)) {
-			local_irq_restore(flags);
 			return;
 		}
 	} else {
-		read_lock_irqsave(&gpc1->lock, flags);
+		read_lock(&gpc1->lock);
 	}
 	while (!kvm_gpc_check(gpc1, user_len1)) {
-		read_unlock_irqrestore(&gpc1->lock, flags);
+		read_unlock(&gpc1->lock);
 
 		/* When invoked from kvm_sched_out() we cannot sleep */
 		if (atomic)
@@ -439,7 +435,7 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 		if (kvm_gpc_refresh(gpc1, user_len1))
 			return;
 
-		read_lock_irqsave(&gpc1->lock, flags);
+		read_lock(&gpc1->lock);
 	}
 
 	if (likely(!user_len2)) {
@@ -467,7 +463,7 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 		lock_set_subclass(&gpc1->lock.dep_map, 1, _THIS_IP_);
 		if (atomic) {
 			if (!read_trylock(&gpc2->lock)) {
-				read_unlock_irqrestore(&gpc1->lock, flags);
+				read_unlock(&gpc1->lock);
 				return;
 			}
 		} else {
@@ -476,7 +472,7 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 
 		if (!kvm_gpc_check(gpc2, user_len2)) {
 			read_unlock(&gpc2->lock);
-			read_unlock_irqrestore(&gpc1->lock, flags);
+			read_unlock(&gpc1->lock);
 
 			/* When invoked from kvm_sched_out() we cannot sleep */
 			if (atomic)
@@ -581,7 +577,7 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
 	}
 
 	kvm_gpc_mark_dirty_in_slot(gpc1);
-	read_unlock_irqrestore(&gpc1->lock, flags);
+	read_unlock(&gpc1->lock);
 }
 
 void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
@@ -640,7 +636,6 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
 {
 	unsigned long evtchn_pending_sel = READ_ONCE(v->arch.xen.evtchn_pending_sel);
 	struct gfn_to_pfn_cache *gpc = &v->arch.xen.vcpu_info_cache;
-	unsigned long flags;
 
 	if (!evtchn_pending_sel)
 		return;
@@ -650,14 +645,14 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
 	 * does anyway. Page it in and retry the instruction. We're just a
 	 * little more honest about it.
 	 */
-	read_lock_irqsave(&gpc->lock, flags);
+	read_lock(&gpc->lock);
 	while (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) {
-		read_unlock_irqrestore(&gpc->lock, flags);
+		read_unlock(&gpc->lock);
 
 		if (kvm_gpc_refresh(gpc, sizeof(struct vcpu_info)))
 			return;
 
-		read_lock_irqsave(&gpc->lock, flags);
+		read_lock(&gpc->lock);
 	}
 
 	/* Now gpc->khva is a valid kernel address for the vcpu_info */
@@ -687,7 +682,7 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
 	}
 
 	kvm_gpc_mark_dirty_in_slot(gpc);
-	read_unlock_irqrestore(&gpc->lock, flags);
+	read_unlock(&gpc->lock);
 
 	/* For the per-vCPU lapic vector, deliver it as MSI.
*/ if (v->arch.xen.upcall_vector) @@ -698,7 +693,6 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v) { struct gfn_to_pfn_cache *gpc =3D &v->arch.xen.vcpu_info_cache; bool atomic =3D in_atomic() || !task_is_running(current); - unsigned long flags; u8 rc =3D 0; =20 /* @@ -715,16 +709,13 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v) sizeof_field(struct compat_vcpu_info, evtchn_upcall_pending)); =20 if (atomic) { - local_irq_save(flags); - if (!read_trylock(&gpc->lock)) { - local_irq_restore(flags); + if (!read_trylock(&gpc->lock)) return 1; - } } else { - read_lock_irqsave(&gpc->lock, flags); + read_lock(&gpc->lock); } while (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) { - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock(&gpc->lock); =20 /* * This function gets called from kvm_vcpu_block() after setting the @@ -744,11 +735,11 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v) */ return 0; } - read_lock_irqsave(&gpc->lock, flags); + read_lock(&gpc->lock); } =20 rc =3D ((struct vcpu_info *)gpc->khva)->evtchn_upcall_pending; - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock(&gpc->lock); return rc; } =20 @@ -1445,12 +1436,11 @@ static bool wait_pending_event(struct kvm_vcpu *vcp= u, int nr_ports, struct kvm *kvm =3D vcpu->kvm; struct gfn_to_pfn_cache *gpc =3D &kvm->arch.xen.shinfo_cache; unsigned long *pending_bits; - unsigned long flags; bool ret =3D true; int idx, i; =20 idx =3D srcu_read_lock(&kvm->srcu); - read_lock_irqsave(&gpc->lock, flags); + read_lock(&gpc->lock); if (!kvm_gpc_check(gpc, PAGE_SIZE)) goto out_rcu; =20 @@ -1471,7 +1461,7 @@ static bool wait_pending_event(struct kvm_vcpu *vcpu,= int nr_ports, } =20 out_rcu: - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock(&gpc->lock); srcu_read_unlock(&kvm->srcu, idx); =20 return ret; --=20 2.51.0 From nobody Sat Jun 13 07:52:24 2026 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No 
client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 646C13890E7; Fri, 8 May 2026 18:17:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778264261; cv=none; b=g0OcH6OOGREk9hV8fB82Dvop/aY3rzXSVjf2jrwZSSGNnxUG0wHwtgwhbkPkqEyrjOyIYO0jQ3omtR6j7aeu+m48cCrM1zsbH5wlOBkWfM8YxzeW+zoFp51/u0vC6Ro17zUl0R1NmMGzc0jRi2Jf4SKieyLOQnSnl9/nqn5HLik= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778264261; c=relaxed/simple; bh=+Q8fYgfGx0JQyWahkuakw0voj2ae6I7/Wap02lr1D/U=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=FAlStTztwbzZi2LazhtmlU7NT8YiGlEjN7V5gnk6ibsxR1/XvVCdbljAVyCmHduoUF7fC6u4L5PbpOnDwyG4HxlCLRyOCvEqEYiqgygl794cIxHkv7ZJhhpN4fGkmS5x0jnYV/0jsUEXcghN1X0i8G0H34nUhoxJaKjcpA24AwE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=casper.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=VO0DdZJd; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=casper.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="VO0DdZJd" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=kT9LrUUPvgw7yF7JUGVt2eIItVMRgHFKo8Fq70KxKGs=; b=VO0DdZJd2SLjQdm0Rr2aX2nCF6 aaUqJRYAUfbnYVGyvCMoMa/Vo472SGHURL4S40xOFRmg00YjlKY3RL9zZNpkmEm7gxozODJBtHLAM 
foJYRqOzkggps/J62IKSTHOOOQfmMh6WvfZW7JulKUjXSw8zRNY1KbTIzNRAJ13ZQvxzascS4pTvA B6qvusgYe+6cu+YYrGH0w9CwP1HeJ9AHTJH0cv3orw7AaDWCyZSVnORH1zzUq7f11vyAw//mzWdVS dQsgeg4Oq8QtBcpQ+EEYlAHKJNDaETjsVHp/oy6gTyumUg2YDlwkj7Qy4f6huXn9i91+Tu4dAddqx dH5qyRHg==; Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wLPlH-00000004XFL-2CuA; Fri, 08 May 2026 18:17:20 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.98.2 #2 (Red Hat Linux)) id 1wLPlH-0000000DYYt-1BgU; Fri, 08 May 2026 19:17:19 +0100 From: David Woodhouse To: Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Paul Durrant , Peter Zijlstra , Will Deacon , Boqun Feng , Waiman Long , Sebastian Andrzej Siewior , Clark Williams , Steven Rostedt , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev, Mauricio Faria de Oliveira , kernel-dev@igalia.com, syzbot+208f7f3e5f59c11aeb90@syzkaller.appspotmail.com Subject: [PATCH 5/7] KVM: x86: Remove unnecessary irqsave from kvm_setup_guest_pvclock() Date: Fri, 8 May 2026 19:10:07 +0100 Message-ID: <20260508181717.3230988-6-dwmw2@infradead.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20260508181717.3230988-1-dwmw2@infradead.org> References: <20260508181717.3230988-1-dwmw2@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Content-Type: text/plain; charset="utf-8" From: David Woodhouse kvm_setup_guest_pvclock() is only called from kvm_guest_time_update() which runs in process context (vcpu_enter_guest or ioctl). There is no hardirq path that takes the GPC read lock for pvclock, so irqsave is unnecessary. Convert to plain read_lock()/read_unlock(). 
Signed-off-by: David Woodhouse --- arch/x86/kvm/x86.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ae71f28cc1c5..e62f4a9ad334 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3270,18 +3270,17 @@ static void kvm_setup_guest_pvclock(struct pvclock_= vcpu_time_info *ref_hv_clock, { struct pvclock_vcpu_time_info *guest_hv_clock; struct pvclock_vcpu_time_info hv_clock; - unsigned long flags; =20 memcpy(&hv_clock, ref_hv_clock, sizeof(hv_clock)); =20 - read_lock_irqsave(&gpc->lock, flags); + read_lock(&gpc->lock); while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) { - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock(&gpc->lock); =20 if (kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock))) return; =20 - read_lock_irqsave(&gpc->lock, flags); + read_lock(&gpc->lock); } =20 guest_hv_clock =3D (void *)(gpc->khva + offset); @@ -3306,7 +3305,7 @@ static void kvm_setup_guest_pvclock(struct pvclock_vc= pu_time_info *ref_hv_clock, guest_hv_clock->version =3D ++hv_clock.version; =20 kvm_gpc_mark_dirty_in_slot(gpc); - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock(&gpc->lock); =20 trace_kvm_pvclock_update(vcpu->vcpu_id, &hv_clock); } --=20 2.51.0 From nobody Sat Jun 13 07:52:24 2026 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 37C2D37DEB6; Fri, 8 May 2026 18:17:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778264251; cv=none; b=RXGkSZSoz/9SgEKFdCEOi5vzNQzS5BIgRyTLL7F6vTEMpDepxujRjBM3UIrhBK1Vo85940rneEHnIsdBiS2+uoObrBEJ+FvI1k3fe3vBinED+PUImox9ovc4IEfu5S7DkveN6XLndZFJ/GkuS5sRffRX82bVZEWKHubEAg+ko30= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1778264251; c=relaxed/simple; bh=0inD9J5ZMx3ydStx8bgf42Nm3OYAdUYPQfpl1+huSLs=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fpGk2S1ObldFai73K/I4eOZ2sJyg8sTvXPRNjafU1Ynao0NKRPwYuPf738HtCzfFSUL0iMrKi1QEgMX3x1+49L8gjQ3jUDyKhLStFQTh2CD7ABuj/wSF7alZ1p67i61TqWHeySZ32p05N6doMxSrbFhCARYgD+f8NGYyHZcUhyc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=desiato.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=J+0DmznO; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=desiato.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="J+0DmznO" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=REZRf0+NDKaOilloT3mYryb1YEizmDl3e++BEDzpvTs=; b=J+0DmznONpTTOv6NJoAvLRI8yd SgId31jjSWhdoaJsPZAnyk+kaLE3xS/nwPAnRfyY73IOSe4s30XgrvWy/Sxx8CU0AlwcTJnavM590 JoJSa1loMa4PzcAjGUKJRulq4gSM9ZCxOkCV41sjrS8OG76XBccBNhWgCUuuWk2oDV/vA+xGDB3nW 8cA2iUw0O/CaGFRZ0dO4SYl5aDfhmMkkdpNRx1kLOVzbXjPby9aSyh95bEElPWhahTIGEBR1ysB6x WfpVrKHirWfchofiSl4u9CEtOax32DaaTL1FDbeLUxD8gXPb9foKQeEJnK2o8SIApKXQ8fevseTrX 2y9WNsKA==; Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by desiato.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wLPlI-000000072tq-0SK9; Fri, 08 May 2026 18:17:20 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.98.2 #2 (Red Hat Linux)) id 1wLPlH-0000000DYYx-1OTh; Fri, 08 May 2026 
19:17:19 +0100 From: David Woodhouse To: Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Paul Durrant , Peter Zijlstra , Will Deacon , Boqun Feng , Waiman Long , Sebastian Andrzej Siewior , Clark Williams , Steven Rostedt , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev, Mauricio Faria de Oliveira , kernel-dev@igalia.com, syzbot+208f7f3e5f59c11aeb90@syzkaller.appspotmail.com Subject: [PATCH 6/7] KVM: Remove unnecessary IRQ disabling from GPC lock in pfncache.c Date: Fri, 8 May 2026 19:10:08 +0100 Message-ID: <20260508181717.3230988-7-dwmw2@infradead.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20260508181717.3230988-1-dwmw2@infradead.org> References: <20260508181717.3230988-1-dwmw2@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html Content-Type: text/plain; charset="utf-8" From: David Woodhouse Now that all hardirq/atomic GPC users (xen_timer_callback(), kvm_xen_set_evtchn_fast()) use read_trylock() instead of read_lock(), no hardirq path ever spins waiting for the GPC rwlock. There is therefore no risk of deadlock between the write side and a hardirq reader, and no need to disable interrupts when taking the lock. Convert all read_lock_irq()/write_lock_irq() calls and their unlock counterparts to plain read_lock()/write_lock() and read_unlock()/write_unlock() in pfncache.c. 
Signed-off-by: David Woodhouse --- virt/kvm/pfncache.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c index 728d2c1b488a..70b102095173 100644 --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -29,12 +29,12 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,= unsigned long start, =20 spin_lock(&kvm->gpc_lock); list_for_each_entry(gpc, &kvm->gpc_list, list) { - read_lock_irq(&gpc->lock); + read_lock(&gpc->lock); =20 /* Only a single page so no need to care about length */ if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) && gpc->uhva >=3D start && gpc->uhva < end) { - read_unlock_irq(&gpc->lock); + read_unlock(&gpc->lock); =20 /* * There is a small window here where the cache could @@ -44,15 +44,15 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,= unsigned long start, * acquired. */ =20 - write_lock_irq(&gpc->lock); + write_lock(&gpc->lock); if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) && gpc->uhva >=3D start && gpc->uhva < end) gpc->valid =3D false; - write_unlock_irq(&gpc->lock); + write_unlock(&gpc->lock); continue; } =20 - read_unlock_irq(&gpc->lock); + read_unlock(&gpc->lock); } spin_unlock(&kvm->gpc_lock); } @@ -184,7 +184,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cac= he *gpc) mmu_seq =3D gpc->kvm->mmu_invalidate_seq; smp_rmb(); =20 - write_unlock_irq(&gpc->lock); + write_unlock(&gpc->lock); =20 /* * If the previous iteration "failed" due to an mmu_notifier @@ -225,7 +225,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cac= he *gpc) goto out_error; } =20 - write_lock_irq(&gpc->lock); + write_lock(&gpc->lock); =20 /* * Other tasks must wait for _this_ refresh to complete before @@ -248,7 +248,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cac= he *gpc) return 0; =20 out_error: - write_lock_irq(&gpc->lock); + write_lock(&gpc->lock); =20 return -EFAULT; } @@ -269,7 +269,7 @@ static int __kvm_gpc_refresh(struct 
gfn_to_pfn_cache *g= pc, gpa_t gpa, unsigned l =20 lockdep_assert_held(&gpc->refresh_lock); =20 - write_lock_irq(&gpc->lock); + write_lock(&gpc->lock); =20 if (!gpc->active) { ret =3D -EINVAL; @@ -355,7 +355,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *g= pc, gpa_t gpa, unsigned l unmap_old =3D (old_pfn !=3D gpc->pfn); =20 out_unlock: - write_unlock_irq(&gpc->lock); + write_unlock(&gpc->lock); =20 if (unmap_old) gpc_unmap(old_pfn, old_khva); @@ -417,9 +417,9 @@ static int __kvm_gpc_activate(struct gfn_to_pfn_cache *= gpc, gpa_t gpa, unsigned * refresh must not establish a mapping until the cache is * reachable by mmu_notifier events. */ - write_lock_irq(&gpc->lock); + write_lock(&gpc->lock); gpc->active =3D true; - write_unlock_irq(&gpc->lock); + write_unlock(&gpc->lock); } return __kvm_gpc_refresh(gpc, gpa, uhva); } @@ -458,7 +458,7 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc) * must stall mmu_notifier events until all users go away, i.e. * until gpc->lock is dropped and refresh is guaranteed to fail. 
*/ - write_lock_irq(&gpc->lock); + write_lock(&gpc->lock); gpc->active =3D false; gpc->valid =3D false; =20 @@ -473,7 +473,7 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc) =20 old_pfn =3D gpc->pfn; gpc->pfn =3D KVM_PFN_ERR_FAULT; - write_unlock_irq(&gpc->lock); + write_unlock(&gpc->lock); =20 spin_lock(&kvm->gpc_lock); list_del(&gpc->list); --=20 2.51.0 From nobody Sat Jun 13 07:52:24 2026 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4F0E1390CB8; Fri, 8 May 2026 18:17:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778264254; cv=none; b=O2BPgrDSgfTDk2iSsnZqBJm7bxVAI9I/T+nLKRQfPWtZ3B3DPYf+7aYoOGKuLBRAEjLWoSEtcR0fcF6tjGuthIz0YwUsor7WFq5tN32iuuSKpz7zkgQ3brytYBMiiWmspXYj4dmNTu0U9sW4ONNkQ7Vhl5xJP6oRxb58xFu2je8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778264254; c=relaxed/simple; bh=Qcd2atfGyJQWYahNhwH9zauz3NBcPwffSp7KVyZKEEQ=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ezJCChtE8yXFr68JYvoHGQq7U7S6/fJTSVDONjFTCIGIB7o3Ghv8T4cwkVKVjDObruXXABYro+1YQLFmxeGJ7/H/mb4ziZ60gUC0l/3tDBFQJtUs7YKZKBC9aH0SVfCxd253JPGuhjLaQXSxZPBOvxxMR042uhUwMJt6GG8t0jE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=desiato.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=c2wnVYp7; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=desiato.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; 
dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="c2wnVYp7" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=XXv2L/eBnTQ2xxqyNdJVoSpOW2dKadewk0bGdFKpA/4=; b=c2wnVYp7MfNMNl+9SNE14bmawl B4/ghYUnxTHZMyDnzjxao7QrWrsgk00ieRYr1nRkzVPlrYiFuLkRQq+j0zVYzcFhqC3WwDYduq5Xc C1B/ii5VytaxTa53R1+F36lQKlyxHnOolCH4xjWDHNFaqV7tfEIUTleL73dzmdeeiKutxUrjgxsIB i+gGuObGAE+XyIWE7CjztWDhBZbJzNGU1IzxgnXLo8C2b6OCeiLBQf47UOT8IXimQt9urUoC+cAnP H12RAQboOxjtMDcfpztJgJI1KKRk+gFkWCfQKxOsY5+RGyJwQeOm4dy9mGKvH5P0DmqDchDHDnYq0 utEhKssg==; Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by desiato.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wLPlI-000000072tr-0SKZ; Fri, 08 May 2026 18:17:20 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.98.2 #2 (Red Hat Linux)) id 1wLPlH-0000000DYZ1-1VvY; Fri, 08 May 2026 19:17:19 +0100 From: David Woodhouse To: Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Paul Durrant , Peter Zijlstra , Will Deacon , Boqun Feng , Waiman Long , Sebastian Andrzej Siewior , Clark Williams , Steven Rostedt , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev, Mauricio Faria de Oliveira , kernel-dev@igalia.com, syzbot+208f7f3e5f59c11aeb90@syzkaller.appspotmail.com Subject: [PATCH 7/7] KVM: x86/xen: Handle pending Xen timer events in vcpu_enter_guest() Date: Fri, 8 May 2026 19:10:09 +0100 Message-ID: <20260508181717.3230988-8-dwmw2@infradead.org> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20260508181717.3230988-1-dwmw2@infradead.org> References: <20260508181717.3230988-1-dwmw2@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html Content-Type: text/plain; charset="utf-8" From: David Woodhouse If xen_timer_callback() can't deliver an event directly to the guest (e.g. due to memslot changes causing the GPC to need refreshing), it sets the timer_pending flag and kicks the vCPU. However, the pending timer was only injected from the outer vcpu_run() loop via kvm_inject_pending_timer_irqs(), not from the inner loop in vcpu_enter_guest(). This means that the timer could be delayed until something else causes vcpu_enter_guest() to return to the outer loop. Thus, timer delivery could be delayed by a whole scheduler tick, or hypothetically for ever in a NOHZ_FULL environment. Subsume Xen timer handling into kvm_xen_has_pending_events() and kvm_xen_inject_pending_events(), and use those directly from the inner vcpu_enter_guest() loop. This ensures deferred timer delivery happens on the next VM-entry rather than waiting for the scheduler. 
Remove the Xen timer handling from kvm_inject_pending_timer_irqs() and from kvm_cpu_has_pending_timer(), since kvm_vcpu_has_events() already covers the wakeup case via kvm_xen_has_pending_events(). Pull the actual event injection into kvm_xen_inject_pending_events() and remove kvm_xen_inject_timer_irqs() to avoid a double check of arch.xen.timer_pending in caller and callee. Its other caller can just call kvm_xen_inject_pending_events() (to ensure pending timers are flushed when setting them from userspace). Signed-off-by: David Woodhouse --- arch/x86/kvm/irq.c | 4 ---- arch/x86/kvm/x86.c | 3 +++ arch/x86/kvm/xen.c | 35 +++++++++++++++++------------------ arch/x86/kvm/xen.h | 21 ++------------------- 4 files changed, 22 insertions(+), 41 deletions(-) diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index 9519fec09ee6..7527c9bfe244 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -30,8 +30,6 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) =20 if (lapic_in_kernel(vcpu)) r =3D apic_has_pending_timer(vcpu); - if (kvm_xen_timer_enabled(vcpu)) - r +=3D kvm_xen_has_pending_timer(vcpu); =20 return r; } @@ -170,8 +168,6 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcp= u) { if (lapic_in_kernel(vcpu)) kvm_inject_apic_timer_irqs(vcpu); - if (kvm_xen_timer_enabled(vcpu)) - kvm_xen_inject_timer_irqs(vcpu); } =20 void __kvm_migrate_timers(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e62f4a9ad334..c8e58a18a3e7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11254,6 +11254,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) } if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu)) record_steal_time(vcpu); + if (kvm_check_request(KVM_REQ_UNBLOCK, vcpu) && + kvm_xen_has_pending_events(vcpu)) + kvm_xen_inject_pending_events(vcpu); if (kvm_check_request(KVM_REQ_PMU, vcpu)) kvm_pmu_handle_event(vcpu); if (kvm_check_request(KVM_REQ_PMI, vcpu)) diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 
b1fae42bf295..16b8c154243c 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -105,22 +105,6 @@ static int kvm_xen_shared_info_init(struct kvm *kvm) return ret; } =20 -void kvm_xen_inject_timer_irqs(struct kvm_vcpu *vcpu) -{ - if (atomic_read(&vcpu->arch.xen.timer_pending) > 0) { - struct kvm_xen_evtchn e; - - e.vcpu_id =3D vcpu->vcpu_id; - e.vcpu_idx =3D vcpu->vcpu_idx; - e.port =3D vcpu->arch.xen.timer_virq; - e.priority =3D KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL; - - kvm_xen_set_evtchn(&e, vcpu->kvm); - - vcpu->arch.xen.timer_expires =3D 0; - atomic_set(&vcpu->arch.xen.timer_pending, 0); - } -} =20 static enum hrtimer_restart xen_timer_callback(struct hrtimer *timer) { @@ -634,9 +618,24 @@ void kvm_xen_inject_vcpu_vector(struct kvm_vcpu *v) */ void kvm_xen_inject_pending_events(struct kvm_vcpu *v) { - unsigned long evtchn_pending_sel =3D READ_ONCE(v->arch.xen.evtchn_pending= _sel); + unsigned long evtchn_pending_sel; struct gfn_to_pfn_cache *gpc =3D &v->arch.xen.vcpu_info_cache; =20 + if (kvm_xen_timer_enabled(v) && atomic_read(&v->arch.xen.timer_pending)) { + struct kvm_xen_evtchn e; + + e.vcpu_id =3D v->vcpu_id; + e.vcpu_idx =3D v->vcpu_idx; + e.port =3D v->arch.xen.timer_virq; + e.priority =3D KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL; + + kvm_xen_set_evtchn(&e, v->kvm); + + v->arch.xen.timer_expires =3D 0; + atomic_set(&v->arch.xen.timer_pending, 0); + } + + evtchn_pending_sel =3D READ_ONCE(v->arch.xen.evtchn_pending_sel); if (!evtchn_pending_sel) return; =20 @@ -1238,7 +1237,7 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, stru= ct kvm_xen_vcpu_attr *data) */ if (vcpu->arch.xen.timer_expires) { hrtimer_cancel(&vcpu->arch.xen.timer); - kvm_xen_inject_timer_irqs(vcpu); + kvm_xen_inject_pending_events(vcpu); } =20 data->u.timer.port =3D vcpu->arch.xen.timer_virq; diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h index 59e6128a7bd3..029026853af5 100644 --- a/arch/x86/kvm/xen.h +++ b/arch/x86/kvm/xen.h @@ -92,7 +92,8 @@ static inline int 
kvm_xen_has_interrupt(struct kvm_vcpu *= vcpu) static inline bool kvm_xen_has_pending_events(struct kvm_vcpu *vcpu) { return static_branch_unlikely(&kvm_xen_enabled.key) && - vcpu->arch.xen.evtchn_pending_sel; + (vcpu->arch.xen.evtchn_pending_sel || + atomic_read(&vcpu->arch.xen.timer_pending)); } =20 static inline bool kvm_xen_timer_enabled(struct kvm_vcpu *vcpu) @@ -100,15 +101,6 @@ static inline bool kvm_xen_timer_enabled(struct kvm_vc= pu *vcpu) return !!vcpu->arch.xen.timer_virq; } =20 -static inline int kvm_xen_has_pending_timer(struct kvm_vcpu *vcpu) -{ - if (kvm_xen_hypercall_enabled(vcpu->kvm) && kvm_xen_timer_enabled(vcpu)) - return atomic_read(&vcpu->arch.xen.timer_pending); - - return 0; -} - -void kvm_xen_inject_timer_irqs(struct kvm_vcpu *vcpu); #else static inline int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 = data) { @@ -164,15 +156,6 @@ static inline bool kvm_xen_has_pending_events(struct k= vm_vcpu *vcpu) return false; } =20 -static inline int kvm_xen_has_pending_timer(struct kvm_vcpu *vcpu) -{ - return 0; -} - -static inline void kvm_xen_inject_timer_irqs(struct kvm_vcpu *vcpu) -{ -} - static inline bool kvm_xen_timer_enabled(struct kvm_vcpu *vcpu) { return false; --=20 2.51.0