Reply-To: Sean Christopherson
Date: Mon, 19 May 2025 11:55:06 -0700
In-Reply-To:
 <20250519185514.2678456-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20250519185514.2678456-1-seanjc@google.com>
X-Mailer: git-send-email 2.49.0.1101.gccaa498523-goog
Message-ID: <20250519185514.2678456-5-seanjc@google.com>
Subject: [PATCH v2 04/12] KVM: Add irqfd to KVM's list via the vfs_poll() callback
From: Sean Christopherson
To: Paolo Bonzini, Ingo Molnar, Peter Zijlstra, Juri Lelli,
 Vincent Guittot, Marc Zyngier, Oliver Upton, Sean Christopherson
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 K Prateek Nayak, David Matlack, Juergen Gross, Stefano Stabellini,
 Oleksandr Tyshchenko
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add the irqfd structure to KVM's list of irqfds in kvm_irqfd_register(),
i.e. via the vfs_poll() callback.  This will allow taking irqfds.lock
across the entire registration sequence (add to waitqueue, add to list),
and more importantly will allow inserting into KVM's list if and only if
adding to the waitqueue succeeds (spoiler alert), without needing to
juggle return codes in weird ways.

Signed-off-by: Sean Christopherson
---
 virt/kvm/eventfd.c | 102 +++++++++++++++++++++++++--------------------
 1 file changed, 57 insertions(+), 45 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 8b9a87daa2bb..99274d60335d 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -245,34 +245,14 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 	return ret;
 }
 
-struct kvm_irqfd_pt {
-	struct kvm_kernel_irqfd *irqfd;
-	poll_table pt;
-};
-
-static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh,
-			       poll_table *pt)
-{
-	struct kvm_irqfd_pt *p = container_of(pt, struct kvm_irqfd_pt, pt);
-	struct kvm_kernel_irqfd *irqfd = p->irqfd;
-
-	/*
-	 * Add the irqfd as a priority waiter on the eventfd, with a custom
-	 * wake-up handler, so that KVM *and only KVM* is notified whenever the
-	 * underlying eventfd is signaled.
-	 */
-	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
-
-	add_wait_queue_priority(wqh, &irqfd->wait);
-}
-
-/* Must be called under irqfds.lock */
 static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
 {
 	struct kvm_kernel_irq_routing_entry *e;
 	struct kvm_kernel_irq_routing_entry entries[KVM_NR_IRQCHIPS];
 	int n_entries;
 
+	lockdep_assert_held(&kvm->irqfds.lock);
+
 	n_entries = kvm_irq_map_gsi(kvm, entries, irqfd->gsi);
 
 	write_seqcount_begin(&irqfd->irq_entry_sc);
@@ -286,6 +266,49 @@ static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
 	write_seqcount_end(&irqfd->irq_entry_sc);
 }
 
+struct kvm_irqfd_pt {
+	struct kvm_kernel_irqfd *irqfd;
+	struct kvm *kvm;
+	poll_table pt;
+	int ret;
+};
+
+static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh,
+			       poll_table *pt)
+{
+	struct kvm_irqfd_pt *p = container_of(pt, struct kvm_irqfd_pt, pt);
+	struct kvm_kernel_irqfd *irqfd = p->irqfd;
+	struct kvm_kernel_irqfd *tmp;
+	struct kvm *kvm = p->kvm;
+
+	spin_lock_irq(&kvm->irqfds.lock);
+
+	list_for_each_entry(tmp, &kvm->irqfds.items, list) {
+		if (irqfd->eventfd != tmp->eventfd)
+			continue;
+		/* This fd is used for another irq already. */
+		p->ret = -EBUSY;
+		spin_unlock_irq(&kvm->irqfds.lock);
+		return;
+	}
+
+	irqfd_update(kvm, irqfd);
+
+	list_add_tail(&irqfd->list, &kvm->irqfds.items);
+
+	spin_unlock_irq(&kvm->irqfds.lock);
+
+	/*
+	 * Add the irqfd as a priority waiter on the eventfd, with a custom
+	 * wake-up handler, so that KVM *and only KVM* is notified whenever the
+	 * underlying eventfd is signaled.
+	 */
+	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
+
+	add_wait_queue_priority(wqh, &irqfd->wait);
+	p->ret = 0;
+}
+
 #if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)
 void __attribute__((weak)) kvm_arch_irq_bypass_stop(
 				struct irq_bypass_consumer *cons)
@@ -315,7 +338,7 @@ bool __attribute__((weak)) kvm_arch_irqfd_route_changed(
 static int
 kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
-	struct kvm_kernel_irqfd *irqfd, *tmp;
+	struct kvm_kernel_irqfd *irqfd;
 	struct eventfd_ctx *eventfd = NULL, *resamplefd = NULL;
 	struct kvm_irqfd_pt irqfd_pt;
 	int ret;
@@ -414,32 +437,22 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	 */
 	idx = srcu_read_lock(&kvm->irq_srcu);
 
-	spin_lock_irq(&kvm->irqfds.lock);
-
-	ret = 0;
-	list_for_each_entry(tmp, &kvm->irqfds.items, list) {
-		if (irqfd->eventfd != tmp->eventfd)
-			continue;
-		/* This fd is used for another irq already. */
-		ret = -EBUSY;
-		goto fail_duplicate;
-	}
-
-	irqfd_update(kvm, irqfd);
-
-	list_add_tail(&irqfd->list, &kvm->irqfds.items);
-
-	spin_unlock_irq(&kvm->irqfds.lock);
-
 	/*
-	 * Register the irqfd with the eventfd by polling on the eventfd.  If
-	 * there was en event pending on the eventfd prior to registering,
-	 * manually trigger IRQ injection.
+	 * Register the irqfd with the eventfd by polling on the eventfd, and
+	 * simultaneously add the irqfd to KVM's list.  If there was an event
+	 * pending on the eventfd prior to registering, manually trigger IRQ
+	 * injection.
 	 */
 	irqfd_pt.irqfd = irqfd;
+	irqfd_pt.kvm = kvm;
 	init_poll_funcptr(&irqfd_pt.pt, kvm_irqfd_register);
 
 	events = vfs_poll(fd_file(f), &irqfd_pt.pt);
+
+	ret = irqfd_pt.ret;
+	if (ret)
+		goto fail_poll;
+
 	if (events & EPOLLIN)
 		schedule_work(&irqfd->inject);
 
@@ -460,8 +473,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 	srcu_read_unlock(&kvm->irq_srcu, idx);
 	return 0;
 
-fail_duplicate:
-	spin_unlock_irq(&kvm->irqfds.lock);
+fail_poll:
 	srcu_read_unlock(&kvm->irq_srcu, idx);
 fail:
 	if (irqfd->resampler)
-- 
2.49.0.1101.gccaa498523-goog
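
For readers unfamiliar with the pattern, the following is a minimal userspace sketch of how a void poll callback can report a status through its enclosing structure, as kvm_irqfd_register() does via kvm_irqfd_pt.ret. It is not kernel code. The names struct registry, struct register_pt, fake_poll(), do_register() and registry_add() are hypothetical stand-ins invented for illustration, container_of() is redefined locally, and locking (irqfds.lock) and the waitqueue insertion are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Local userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for poll_table: it only carries the callback. */
struct poll_table;
typedef void (*poll_fn)(struct poll_table *pt);
struct poll_table {
	poll_fn fn;
};

#define MAX_ITEMS 8

/* Hypothetical stand-in for kvm->irqfds.items, keyed by eventfd identity. */
struct registry {
	const void *items[MAX_ITEMS];
	int count;
};

/* Loosely mirrors struct kvm_irqfd_pt: inputs plus a slot for the status. */
struct register_pt {
	const void *eventfd;
	struct registry *reg;
	struct poll_table pt;
	int ret;
};

/* The callback returns void, so the outcome is stored in the wrapper. */
static void do_register(struct poll_table *pt)
{
	struct register_pt *p = container_of(pt, struct register_pt, pt);
	int i;

	for (i = 0; i < p->reg->count; i++) {
		if (p->reg->items[i] == p->eventfd) {
			/* This eventfd is already registered. */
			p->ret = -EBUSY;
			return;
		}
	}

	if (p->reg->count == MAX_ITEMS) {
		p->ret = -ENOSPC;
		return;
	}

	p->reg->items[p->reg->count++] = p->eventfd;
	p->ret = 0;
}

/* Stand-in for vfs_poll(): it simply invokes the registered callback. */
static void fake_poll(struct poll_table *pt)
{
	assert(pt->fn != NULL);
	pt->fn(pt);
}

/* Loosely mirrors the caller: set up the wrapper, poll, read back ret. */
int registry_add(struct registry *reg, const void *eventfd)
{
	struct register_pt rp = { .eventfd = eventfd, .reg = reg, .ret = 0 };

	rp.pt.fn = do_register;
	fake_poll(&rp.pt);
	return rp.ret;
}
```

Calling registry_add() twice with the same pointer yields 0 and then -EBUSY, and the second call leaves the registry unchanged. This is the property the patch is after: the duplicate check and the list insertion happen in one place, and the caller only has to inspect a single status field after polling.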