From nobody Fri Oct 31 09:41:29 2025 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C59152C3762 for ; Thu, 22 May 2025 23:52:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957959; cv=none; b=qMUbnamEH4AZKfSga7DawR4to8AYyS6y2ja32zIgp4ObOmpRE/63bzdQ1SjIMlGYSbP4r2geI8vnZwg88YNo5Qto/GA1QN667ZI4G9pid3ZvRG49cpeWrv0X5pMWvPqjjEUWGwOMq7SLMELinlBsIeDvDrkKLK5jaEvLxal9pwo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957959; c=relaxed/simple; bh=PS24WqnAD3SfQsFV7bXgSmj4OZ+55j2A7PCYOPxM2aA=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=rSxJvrNnKNznk9JA8YcR8O8E0SZ1CrARAN1wyKPMln0UN5E0YTl6bdiQXeijN1V00+A5INPbZvG34GXmOhQZQIaAYOCGi0GvlZX6dtzkvptA6EKNxRb7IboQfQiLg20beqBbthcrWm+VnpcIola+idJh1hugysf0Ar0Zrw3f5l8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=TSfRYypy; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="TSfRYypy" Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-b2c00e965d0so1147828a12.2 for ; Thu, 22 May 2025 16:52:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747957956; x=1748562756; 
darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=fNuJRaVfcGFWhloS77D8uzxF90F27fj3ZGGYCvgzPHs=; b=TSfRYypy7uw6SrGjp96sZasDo9w+4pEVRSewrcLxuT/DArmOT21XtQVGTethyLqN0H nHeXaVqC5FIk9+c810RfPtBd5g0ts9sqo3GtblT8f1MOdVpsKJhLwvKxzpDZ6/OIZDV8 gxFzIiNNDGUCsHLAzvNFVyQG4hiWr7ALY3LTls816ENJrYVf86JC+otFjkgm3aWXgzP4 v1LeHZ93s+QhdtQQmMQN6Dqq9XVz5UMWL2UJpI0+gC2ymD43imlzZz6U1qO9CLjxn7cM oUldgYJ5BIV2K64IqQ4OopXt4u+0llJcsioTUdyN0Wb1g1pDT2rZORwoDoMYt5kIyb2y dqpA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747957956; x=1748562756; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=fNuJRaVfcGFWhloS77D8uzxF90F27fj3ZGGYCvgzPHs=; b=A+mttCEqyNT8UU0UFGaSDt+QAMDzas0+R1i5YtvwiyxMbFpePX7SQ3dcibbctfT2VF 97qgKMi2NCXjzn8SeiBt0LKSGjJfBYv3UnRAJKKmNIq++HLd5dg94/eWZA1WD+aF0l7x lgmhuFNvUS5TJgs6kSg6poiRelbKt85paGGRzBn8zjr4AmhC4Xnt/sq65oejKOUTDUXw Ab8yvpXn9w8rQuY1G/26KI7dhTIqjjiGYEN9GzTlgLJSfx28J8bVjuBvMPWwr3sllAIY yjjf6AutfXEnNPE2ewdrR7VjmFUzKvT9+mHBD4rdVO91qRKMjJQ9qRJEJ60zqqfVcW2c 814A== X-Gm-Message-State: AOJu0YwVTxNNI3kIj40X/GVYRL4Rp4cYDRSd9p2ns2Y5LYtyDyfNif7p Wiqn4OliPYXrdx0rxqwJR98jaOlKpsVRMU0xshsdeULlC+/eltfW+Xjb8INQ+HQCzAR0MFKxijH HtdXvJg== X-Google-Smtp-Source: AGHT+IHgxAzddO6m+7e1Ux4Efuqi1mCB//KSOWWOaHVhE0t5XleZFmJZnVv1KlYKH2ltYJauaNiIFWTKLOM= X-Received: from pjbqo12.prod.google.com ([2002:a17:90b:3dcc:b0:2ea:3a1b:f493]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:4a50:b0:2fa:157e:c790 with SMTP id 98e67ed59e1d1-30e7d4fe8c3mr35719761a91.5.1747957956077; Thu, 22 May 2025 16:52:36 -0700 (PDT) Reply-To: Sean Christopherson Date: Thu, 22 May 2025 16:52:11 -0700 In-Reply-To: <20250522235223.3178519-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: 
Mime-Version: 1.0
References: <20250522235223.3178519-1-seanjc@google.com>
X-Mailer: git-send-email 2.49.0.1151.ga128411c76-goog
Message-ID: <20250522235223.3178519-2-seanjc@google.com>
Subject: [PATCH v3 01/13] KVM: Use a local struct to do the initial vfs_poll() on an irqfd
From: Sean Christopherson
To: "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Juergen Gross, Stefano Stabellini, Paolo Bonzini, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Shuah Khan, Marc Zyngier, Oliver Upton, Sean Christopherson
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, K Prateek Nayak, David Matlack
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Use a function-local struct for the poll_table passed to vfs_poll(), as
nothing in the vfs_poll() callchain grabs a long-term reference to the
structure, i.e. its lifetime doesn't need to be tied to the irqfd. Using
a local structure will also allow propagating failures out of the polling
callback without further polluting kvm_kernel_irqfd.

Opportunistically rename irqfd_ptable_queue_proc() to kvm_irqfd_register()
to capture what it actually does.

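For readers less familiar with the pattern, here is a rough userspace sketch of a stack-local wrapper recovered via container_of() inside a callback. It is not the kernel code: every mini_* name is a hypothetical stand-in, the container_of() macro is a simplified version without the kernel's type checking, and mini_poll() only assumes what the changelog states, namely that the poll path invokes the callback synchronously and keeps no reference to the table.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature of poll_table: it only carries the callback. */
struct mini_poll_table {
	void (*qproc)(struct mini_poll_table *pt);
};

/* Hypothetical long-lived object, standing in for struct kvm_kernel_irqfd. */
struct mini_irqfd {
	int registered;
};

/* Short-lived wrapper, analogous to struct kvm_irqfd_pt in the patch. */
struct mini_irqfd_pt {
	struct mini_irqfd *irqfd;
	struct mini_poll_table pt;
};

/* Callback: recover the wrapper from the embedded table, then reach the irqfd. */
static void mini_register(struct mini_poll_table *pt)
{
	struct mini_irqfd_pt *p = container_of(pt, struct mini_irqfd_pt, pt);

	assert(p->irqfd != NULL);
	p->irqfd->registered = 1;
}

/* Stand-in for vfs_poll(): runs the callback now and keeps no pointer to pt. */
static void mini_poll(struct mini_poll_table *pt)
{
	if (pt->qproc)
		pt->qproc(pt);
}

/* The wrapper lives on this function's stack and is dead after mini_poll(). */
static int mini_assign(struct mini_irqfd *irqfd)
{
	struct mini_irqfd_pt irqfd_pt = { .irqfd = irqfd };

	irqfd_pt.pt.qproc = mini_register;
	mini_poll(&irqfd_pt.pt);
	return irqfd->registered;
}
```

Calling mini_assign() on a zero-initialized struct mini_irqfd marks it registered without the long-lived object ever embedding the table, which is the property that lets the patch drop the pt field from kvm_kernel_irqfd.
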
Signed-off-by: Sean Christopherson Acked-by: Peter Zijlstra (Intel) Tested-by: K Prateek Nayak --- include/linux/kvm_irqfd.h | 1 - virt/kvm/eventfd.c | 26 +++++++++++++++++--------- 2 files changed, 17 insertions(+), 10 deletions(-) diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h index 8ad43692e3bb..44fd2a20b09e 100644 --- a/include/linux/kvm_irqfd.h +++ b/include/linux/kvm_irqfd.h @@ -55,7 +55,6 @@ struct kvm_kernel_irqfd { /* Used for setup/shutdown */ struct eventfd_ctx *eventfd; struct list_head list; - poll_table pt; struct work_struct shutdown; struct irq_bypass_consumer consumer; struct irq_bypass_producer *producer; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 11e5d1e3f12e..39e42b19d9f7 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -245,12 +245,17 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode,= int sync, void *key) return ret; } =20 -static void -irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh, - poll_table *pt) +struct kvm_irqfd_pt { + struct kvm_kernel_irqfd *irqfd; + poll_table pt; +}; + +static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh, + poll_table *pt) { - struct kvm_kernel_irqfd *irqfd =3D - container_of(pt, struct kvm_kernel_irqfd, pt); + struct kvm_irqfd_pt *p =3D container_of(pt, struct kvm_irqfd_pt, pt); + struct kvm_kernel_irqfd *irqfd =3D p->irqfd; + add_wait_queue_priority(wqh, &irqfd->wait); } =20 @@ -305,6 +310,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *arg= s) { struct kvm_kernel_irqfd *irqfd, *tmp; struct eventfd_ctx *eventfd =3D NULL, *resamplefd =3D NULL; + struct kvm_irqfd_pt irqfd_pt; int ret; __poll_t events; int idx; @@ -394,7 +400,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *arg= s) * a callback whenever someone signals the underlying eventfd */ init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup); - init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc); =20 spin_lock_irq(&kvm->irqfds.lock); =20 @@ -416,11 
+421,14 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *a= rgs) spin_unlock_irq(&kvm->irqfds.lock); =20 /* - * Check if there was an event already pending on the eventfd - * before we registered, and trigger it as if we didn't miss it. + * Register the irqfd with the eventfd by polling on the eventfd. If + * there was en event pending on the eventfd prior to registering, + * manually trigger IRQ injection. */ - events =3D vfs_poll(fd_file(f), &irqfd->pt); + irqfd_pt.irqfd =3D irqfd; + init_poll_funcptr(&irqfd_pt.pt, kvm_irqfd_register); =20 + events =3D vfs_poll(fd_file(f), &irqfd_pt.pt); if (events & EPOLLIN) schedule_work(&irqfd->inject); =20 --=20 2.49.0.1151.ga128411c76-goog From nobody Fri Oct 31 09:41:29 2025 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6A5BA2D1F50 for ; Thu, 22 May 2025 23:52:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957960; cv=none; b=VcOMfaR0Zkbf1SDUmyGAV0j1eW1GWwFEfpzrOuAOvvOZuxVm89Am8fgExHvhRoFW/gTCB3cb+dKcd9BfnrX1+dMLVhnWij4aV7OyjIJwXp8+QfykrRaMrOJFbfTuMW9nvYOJbsfPQTOKFR/Eis8pjoXIa8JbV4WyHS1TbPZT9bo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957960; c=relaxed/simple; bh=FDCJRuUBom6tgSithRuLUJxsTxJG1RjcxbgGSLyCb6Y=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=mpTAW4wdXJmlmV1lYXSPNm3P3+eQSY7OExfvBEAn6F2uv/sW1YI6fv7xEFgBdmS8N7c16PMooO2FiFC+8bgptTKpPaRBaFvah0wjdMJ3CY70T06Fs2Mfi7fZIOdDrpWAg1DH7zHL4GVrixxOb2A5Le8d4pnraOorJTPcoJawT5g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit 
key) header.d=google.com header.i=@google.com header.b=FO48EHXM; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="FO48EHXM" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-30ebfc28a1eso8055833a91.2 for ; Thu, 22 May 2025 16:52:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747957958; x=1748562758; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=tVNl4A6Kx8naMJGZ1E0eQITkQbAsVFllM1f24lX+kzQ=; b=FO48EHXM3g4ZCLKZEK3LbIF73cPjmm7e7Liu+jQ4qHBcU9RASxWUpo24FrXLalqjZo zNrmjloD79YSVN/3ylTp49dIRs+LCFLNOr2ctT0sR8xG6WC0EHRQE6QrV1yLul/NyJha 3DeoBzeUuezIpkgWc/kCAZMEt6p7C1Hfxg+Om1l7uw+CrpTU+WWizv/SKvJsgXwutFsr 58w7q7qULVR90RZFRLcKzqul4S8Rli0v4wUTm75SfNUqhKgISaRsaXGG+v8K3SD/MBbQ nWH48Pk5iOLXz3t13TqeOXhdd6NCgYNciT2daZq7gOjci9JB+lcAlpuHknarcHQLguu4 wNNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747957958; x=1748562758; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=tVNl4A6Kx8naMJGZ1E0eQITkQbAsVFllM1f24lX+kzQ=; b=qaV/ULPNRqBNi0t7VqUq91xLA+crycLeGmbABJA5VwG2rUGtSoUKESOikEYkmZh/G3 GSYHbMKbyxKsr9aVm3+rGKvk3pcAMuTLRY5TTn/qJ+P78E6WM+ZiqGc9+xkWrS8UI4zK PaPYa3Y9Kd4qRwEy3IlxC73iVEwfalVKeko5Q3YY9WlPxEyOawCEEowzhvAyW8A4g0ZT VGW603mrKrsEjz8+Eno7kkFpGKLQZvWmOGb72HEptW0Sd6k2p/U3mRWtxYY7xMHljpWY s6O6jCFUz31uHFqBGaJ2gPEDMcM8B6HRTOy8QMtnn1qkHTf7xIMfRtwnFp5grWmpKVb4 4oHA== X-Gm-Message-State: AOJu0YxeNKCt3IsFMYx6PNFT98NfMoWcc/POPppx1jjO0vTcqWall3cr 
EWIu7+hSDAvimPYwZGpHSh/NGXB0YFXGULoLfQxspCbOwivjsQl92nsmQlI2NbWAbyN6AY3nOPy srheCJA==
X-Google-Smtp-Source: AGHT+IE81WM2WL+8EnB/6FCP5IRyDnOFUDXq+HJraLf558ghdMXYLyDo9m5RHDzKNMgflX319IgqoMhIQGU=
X-Received: from pjg13.prod.google.com ([2002:a17:90b:3f4d:b0:2ee:4a90:3d06]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90a:e70f:b0:30a:2173:9f0b with SMTP id 98e67ed59e1d1-30e8322596bmr38028577a91.28.1747957957818; Thu, 22 May 2025 16:52:37 -0700 (PDT)
Reply-To: Sean Christopherson
Date: Thu, 22 May 2025 16:52:12 -0700
In-Reply-To: <20250522235223.3178519-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
Mime-Version: 1.0
References: <20250522235223.3178519-1-seanjc@google.com>
X-Mailer: git-send-email 2.49.0.1151.ga128411c76-goog
Message-ID: <20250522235223.3178519-3-seanjc@google.com>
Subject: [PATCH v3 02/13] KVM: Acquire SRCU lock outside of irqfds.lock during assignment
From: Sean Christopherson
To: "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Juergen Gross, Stefano Stabellini, Paolo Bonzini, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Shuah Khan, Marc Zyngier, Oliver Upton, Sean Christopherson
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, K Prateek Nayak, David Matlack
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Acquire SRCU outside of irqfds.lock so that the locking is symmetrical,
and add a comment explaining why on earth KVM holds SRCU for so long.

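To make "symmetrical" concrete, the following is a minimal userspace sketch of the resulting shape: the outer section is entered first and left last on both the success path and the duplicate-fd error path. The toy_* names and the held-flag "locks" are hypothetical stand-ins for the SRCU read-side section and irqfds.lock; they only model nesting, not real synchronization.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock that only records whether it is held (hypothetical stand-in). */
struct toy_lock {
	int held;
};

static struct toy_lock toy_srcu;        /* stands in for the kvm->irq_srcu read side */
static struct toy_lock toy_irqfds_lock; /* stands in for kvm->irqfds.lock */

static void toy_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Outer section taken first, released last, on every exit path. */
static int toy_assign(int is_duplicate)
{
	int ret = 0;

	toy_acquire(&toy_srcu);        /* outer */
	toy_acquire(&toy_irqfds_lock); /* inner */
	assert(toy_srcu.held);         /* inner is always nested in outer */

	if (is_duplicate) {
		ret = -EBUSY;
		goto fail_duplicate;
	}

	toy_release(&toy_irqfds_lock);
	/* In the patch, the vfs_poll() registration runs here, still inside SRCU. */
	toy_release(&toy_srcu);
	return 0;

fail_duplicate:
	toy_release(&toy_irqfds_lock);
	toy_release(&toy_srcu);
	return ret;
}
```

With this ordering the error label can unwind both sections in reverse order of acquisition, instead of the inner lock being dropped inline before the jump as in the code being replaced.
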
Signed-off-by: Sean Christopherson Acked-by: Peter Zijlstra (Intel) Tested-by: K Prateek Nayak --- virt/kvm/eventfd.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 39e42b19d9f7..42c02c35e542 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -401,6 +401,18 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *ar= gs) */ init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup); =20 + /* + * Set the irqfd routing and add it to KVM's list before registering + * the irqfd with the eventfd, so that the routing information is valid + * and stays valid, e.g. if there are GSI routing changes, prior to + * making the irqfd visible, i.e. before it might be signaled. + * + * Note, holding SRCU ensures a stable read of routing information, and + * also prevents irqfd_shutdown() from freeing the irqfd before it's + * fully initialized. + */ + idx =3D srcu_read_lock(&kvm->irq_srcu); + spin_lock_irq(&kvm->irqfds.lock); =20 ret =3D 0; @@ -409,11 +421,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *ar= gs) continue; /* This fd is used for another irq already. 
*/ ret =3D -EBUSY; - spin_unlock_irq(&kvm->irqfds.lock); - goto fail; + goto fail_duplicate; } =20 - idx =3D srcu_read_lock(&kvm->irq_srcu); irqfd_update(kvm, irqfd); =20 list_add_tail(&irqfd->list, &kvm->irqfds.items); @@ -449,6 +459,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *arg= s) srcu_read_unlock(&kvm->irq_srcu, idx); return 0; =20 +fail_duplicate: + spin_unlock_irq(&kvm->irqfds.lock); + srcu_read_unlock(&kvm->irq_srcu, idx); fail: if (irqfd->resampler) irqfd_resampler_shutdown(irqfd); --=20 2.49.0.1151.ga128411c76-goog From nobody Fri Oct 31 09:41:29 2025 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F2B282D1F70 for ; Thu, 22 May 2025 23:52:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957961; cv=none; b=UcTuYDa0ryDx9ZhZDFtIgRQXo9uAgGR1f42pT/bf7IkoRSJxft0jhHsn7ha7D8TONerDhLYK7ZepGtsiCyL+Q/LtX9C2DkV+VnlrC+hQCP6RB82TWBNM8dujgvOp1bYjkJJAT3uHymki3gPfssxpoID2dQh9riBzvBJX20ZDDro= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957961; c=relaxed/simple; bh=p0PPeyoesUCt5v+rFyNj8jJ5WSx3eI7L2/HQ4wMbmx8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=Py/i61MASxp0eWnWPc3ghcjRR5pTPSFYR3ETiKmuqTqrFOsSO6HW486s2pDQqbP5l7G6GpsVS9sCZBefCCNnsBK5cIRSezA0qdIAWRo2xH+ie0LmPcyuL1DpemHqTGPbqmX9mGOJb2FB+vEa9IA3EQDaVTfr/2Ubub8vCmtvr0A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=4MZpCHZ/; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: 
smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="4MZpCHZ/" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-b240fdc9c20so8799851a12.3 for ; Thu, 22 May 2025 16:52:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747957959; x=1748562759; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=xAZka5nPnl0zzhBog63136UQG1pDC/RMIYFF12/LKjA=; b=4MZpCHZ/m2GJF3yEIMe5tjLIGx5zWx9E8da5m9TtUsiRfZbbh1iZ9VeXlleZQKBgLe ErRoMouyckaat/Gjl3+C6DoRFMM86OaPErCMvNeAhkGozVWPtx+i3XKto7GbZK56x25f uDDzbA7ansslFuS5fGTA9eQd7NSI/738Wj4yoN0pknBf5H/IdKbLRRvHVSZ1MC8vnlrL CHQHai+FkWbmaV/l0WXjZpdyvEz1UThGepr3RTNDTAhI/4HBeskIILtsUrPNtY4htt9+ PWLv8pRyMZvOGdVQO8jCQlewETJ/s5yjfwokTm5C/1XWMunPIq9oFVh4kXNJ+3NchU0V pZzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747957959; x=1748562759; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=xAZka5nPnl0zzhBog63136UQG1pDC/RMIYFF12/LKjA=; b=ZJiF+74tRP2mr6oD4LxALM3nx6w2n+TDil9RFU/fnHB5xPZ6Iw39kXJ+t934c/iqiQ BeA858J8nZFRIba6J2VvM8WzohBbmKG0+DLxi73PjIaTnVz/HhcimANo6M5O/E/BpyG2 gYwXxJbNsG42M7H95p5xfze4Cg9GyYYLhs3gmTe523UE7t7ZbpfCpimBoZrbSMChqHQv Ap7TblPEmPcBMmfYDlr2YGW1TmW+MpGzGrMYwaE0dexsS5Xr0h7AgEfO0rTeukS3yKVV XihMO1n9X4L2VRQhBPOqZpgw+b0nxT4g3dozNeLJrd+H5wECOG7qepBK81CYYiD0Blqj piCw== X-Gm-Message-State: AOJu0YzWiNVJgZ59BDUDUrr/0MdP09mr4AAjFtq8EyBLixRd8M4sMe9/ Gj8CgMTPsJv1YM4GvDRc181GlJmVb9MVCXIBApXTquJEszdK2lgpkd8r2RCvDMc0UCPa6g+oNnQ OXfJmrA== X-Google-Smtp-Source: 
AGHT+IFTlML74x7TQE+fmnLc60J6Zc3gXJFbZA8aAHEiRQLh+dPZA/uq2RIsrb3gSzl5/rJlFLqbanul+U8=
X-Received: from pja13.prod.google.com ([2002:a17:90b:548d:b0:2ef:786a:1835]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:51cb:b0:2ee:fa0c:cebc with SMTP id 98e67ed59e1d1-310e96e87e5mr1351003a91.20.1747957959540; Thu, 22 May 2025 16:52:39 -0700 (PDT)
Reply-To: Sean Christopherson
Date: Thu, 22 May 2025 16:52:13 -0700
In-Reply-To: <20250522235223.3178519-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
Mime-Version: 1.0
References: <20250522235223.3178519-1-seanjc@google.com>
X-Mailer: git-send-email 2.49.0.1151.ga128411c76-goog
Message-ID: <20250522235223.3178519-4-seanjc@google.com>
Subject: [PATCH v3 03/13] KVM: Initialize irqfd waitqueue callback when adding to the queue
From: Sean Christopherson
To: "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Juergen Gross, Stefano Stabellini, Paolo Bonzini, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Shuah Khan, Marc Zyngier, Oliver Upton, Sean Christopherson
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, K Prateek Nayak, David Matlack
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Initialize the irqfd waitqueue callback immediately prior to inserting the
irqfd into the eventfd's waitqueue. Pre-initializing the state in a
completely different context is all kinds of confusing, and incorrectly
suggests that the waitqueue function needs to be initialized prior to
vfs_poll().

Signed-off-by: Sean Christopherson Acked-by: Peter Zijlstra (Intel) Tested-by: K Prateek Nayak --- virt/kvm/eventfd.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 42c02c35e542..8b9a87daa2bb 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -256,6 +256,13 @@ static void kvm_irqfd_register(struct file *file, wait= _queue_head_t *wqh, struct kvm_irqfd_pt *p =3D container_of(pt, struct kvm_irqfd_pt, pt); struct kvm_kernel_irqfd *irqfd =3D p->irqfd; =20 + /* + * Add the irqfd as a priority waiter on the eventfd, with a custom + * wake-up handler, so that KVM *and only KVM* is notified whenever the + * underlying eventfd is signaled. + */ + init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup); + add_wait_queue_priority(wqh, &irqfd->wait); } =20 @@ -395,12 +402,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *ar= gs) mutex_unlock(&kvm->irqfds.resampler_lock); } =20 - /* - * Install our own custom wake-up handling so we are notified via - * a callback whenever someone signals the underlying eventfd - */ - init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup); - /* * Set the irqfd routing and add it to KVM's list before registering * the irqfd with the eventfd, so that the routing information is valid --=20 2.49.0.1151.ga128411c76-goog From nobody Fri Oct 31 09:41:29 2025 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9128E2D29C8 for ; Thu, 22 May 2025 23:52:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957963; cv=none; 
b=pCzunnQMDQ/S+ig3UDtpZz4r7iQ/5nzeGwUAOHGSkm/R+5Ql/zOUER0Z1nKfDz1p/ha5U427SUaagNY5JdwUtNTyxaspEJZeuA1v4AjtF+EioLLOSwfxuwO6lL/qpS8lbtnrsIAE8OfKO9UTPrIbNEpJ4RekvBiaKAbKF5juLks= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957963; c=relaxed/simple; bh=nT3odiErLVUjr4YXPHw8YhTvO26ovL3HqyGLIpyXkAw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=c28EyFJpiK9V8tSER9dhcKNLwmz47TVF0d+DtP/7eoPelRmsNSxq2YMavdlvRgqJKHJVRSVZ3cBdd0RlTE/p6VkbUMB70C+hcCb7UCLmKNR+OOQEjao/okEAo3BCUAMk8PQmGY7MtTfs2zfCXdB077e//2lfp04tisxH3+QW+0A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=eovY5jLw; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="eovY5jLw" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-30e74ee960aso6798036a91.3 for ; Thu, 22 May 2025 16:52:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747957961; x=1748562761; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=f7qvTnT5s0mxP5eCWb+NaTk8UKR3m+k6rkKzLbxKznI=; b=eovY5jLwTZBzafMvxLtrWra3lxyOdBwWHD0QRUUea824sSYRnBVpT9a5kgIe1KXYdh SV2En2eE45AJ8YXaT9u4IynsOcNDTScCCYWNJKFxX38gFecfEQyS9m00d4PM2pWW55nb crsLSCcMt/JqPvMfVI8hbp5XbJPGI0kbjrdeO6QCEubTYo7NVW11mRgKH4mOrMndhpre WddrFT/YJOJMMoRsFVGtknJAHTcymBm1dcM3T7PnnbF0Df5k8qSsEqEE1E8JOCaA7OvF 
CuqriDI8iohzR3/KGKiV1v+gVkABjaq8pQPrb5HSx76xYsOV6GMh5xyiTXFgmwzTWXJe k+Cw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747957961; x=1748562761; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=f7qvTnT5s0mxP5eCWb+NaTk8UKR3m+k6rkKzLbxKznI=; b=V/S5RR4mZVw9PWBmmNVPGU4+PUHKCVpC7Y2aMYJu9TzHZbhipLIfkcJ4K/pe9fRC0j ZezCY4/cUDnEZRGbvKepuMPRTC+jDXcK0I7Szv66B5GuExycS7x1eb1K4bdhkJGkmPGi wwTNFPSI/szGRtLW44ilgqf2YglFknd7p667MHD82D1GHAHGKo5u3czIw3A9UCSQwIMF vVK3iMba9w3AW/hCa/C68TrloJDUuzgOcF7DRS26eE8+cjAmLTzTCyBmJaZTdkzaRHcF AsmHknVQIj5B9WHGkXLoPse0f1egqYLnYJc66A+uIPNDwC8KpQ+tJO8ezn78zzhDWwoF Sy0w== X-Gm-Message-State: AOJu0YwRg7msvAjNP9DCR+8i6VPVds/v4mzgtnTLgnZxKUb685tios4G b8raBdWNnulhuzhKgCpuCiUgvMroiZKmTsPmfU1xAcWhQJoDukt4siS++aFMSr9Buwf2oTe/7KA 4w7J8+w== X-Google-Smtp-Source: AGHT+IFhLIWMr3ct7tle+BqkfXynUt92dE5T1Ga0fMI6fA92WD5PFOynQ9QIYOuFaunMobUQi3Ht7v2BK94= X-Received: from pjbsz13.prod.google.com ([2002:a17:90b:2d4d:b0:30e:7d59:f3a7]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90a:dfc7:b0:301:c5cb:7b13 with SMTP id 98e67ed59e1d1-30e830c6247mr35762810a91.3.1747957961111; Thu, 22 May 2025 16:52:41 -0700 (PDT) Reply-To: Sean Christopherson Date: Thu, 22 May 2025 16:52:14 -0700 In-Reply-To: <20250522235223.3178519-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250522235223.3178519-1-seanjc@google.com> X-Mailer: git-send-email 2.49.0.1151.ga128411c76-goog Message-ID: <20250522235223.3178519-5-seanjc@google.com> Subject: [PATCH v3 04/13] KVM: Add irqfd to KVM's list via the vfs_poll() callback From: Sean Christopherson To: "K. Y. 
Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Juergen Gross , Stefano Stabellini , Paolo Bonzini , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Shuah Khan , Marc Zyngier , Oliver Upton , Sean Christopherson Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, K Prateek Nayak , David Matlack Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add the irqfd structure to KVM's list of irqfds in kvm_irqfd_register(), i.e. via the vfs_poll() callback. This will allow taking irqfds.lock across the entire registration sequence (add to waitqueue, add to list), and more importantly will allow inserting into KVM's list if and only if adding to the waitqueue succeeds (spoiler alert), without needing to juggle return codes in weird ways. Signed-off-by: Sean Christopherson Acked-by: Peter Zijlstra (Intel) Tested-by: K Prateek Nayak --- virt/kvm/eventfd.c | 102 +++++++++++++++++++++++++-------------------- 1 file changed, 57 insertions(+), 45 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 8b9a87daa2bb..99274d60335d 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -245,34 +245,14 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode,= int sync, void *key) return ret; } =20 -struct kvm_irqfd_pt { - struct kvm_kernel_irqfd *irqfd; - poll_table pt; -}; - -static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh, - poll_table *pt) -{ - struct kvm_irqfd_pt *p =3D container_of(pt, struct kvm_irqfd_pt, pt); - struct kvm_kernel_irqfd *irqfd =3D p->irqfd; - - /* - * Add the irqfd as a priority waiter on the eventfd, with a custom - * wake-up handler, so that KVM *and only KVM* is notified whenever the - * underlying eventfd is signaled. 
- */ - init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup); - - add_wait_queue_priority(wqh, &irqfd->wait); -} - -/* Must be called under irqfds.lock */ static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd) { struct kvm_kernel_irq_routing_entry *e; struct kvm_kernel_irq_routing_entry entries[KVM_NR_IRQCHIPS]; int n_entries; =20 + lockdep_assert_held(&kvm->irqfds.lock); + n_entries =3D kvm_irq_map_gsi(kvm, entries, irqfd->gsi); =20 write_seqcount_begin(&irqfd->irq_entry_sc); @@ -286,6 +266,49 @@ static void irqfd_update(struct kvm *kvm, struct kvm_k= ernel_irqfd *irqfd) write_seqcount_end(&irqfd->irq_entry_sc); } =20 +struct kvm_irqfd_pt { + struct kvm_kernel_irqfd *irqfd; + struct kvm *kvm; + poll_table pt; + int ret; +}; + +static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh, + poll_table *pt) +{ + struct kvm_irqfd_pt *p =3D container_of(pt, struct kvm_irqfd_pt, pt); + struct kvm_kernel_irqfd *irqfd =3D p->irqfd; + struct kvm_kernel_irqfd *tmp; + struct kvm *kvm =3D p->kvm; + + spin_lock_irq(&kvm->irqfds.lock); + + list_for_each_entry(tmp, &kvm->irqfds.items, list) { + if (irqfd->eventfd !=3D tmp->eventfd) + continue; + /* This fd is used for another irq already. */ + p->ret =3D -EBUSY; + spin_unlock_irq(&kvm->irqfds.lock); + return; + } + + irqfd_update(kvm, irqfd); + + list_add_tail(&irqfd->list, &kvm->irqfds.items); + + spin_unlock_irq(&kvm->irqfds.lock); + + /* + * Add the irqfd as a priority waiter on the eventfd, with a custom + * wake-up handler, so that KVM *and only KVM* is notified whenever the + * underlying eventfd is signaled. 
+ */ + init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup); + + add_wait_queue_priority(wqh, &irqfd->wait); + p->ret =3D 0; +} + #if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS) void __attribute__((weak)) kvm_arch_irq_bypass_stop( struct irq_bypass_consumer *cons) @@ -315,7 +338,7 @@ bool __attribute__((weak)) kvm_arch_irqfd_route_changed( static int kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) { - struct kvm_kernel_irqfd *irqfd, *tmp; + struct kvm_kernel_irqfd *irqfd; struct eventfd_ctx *eventfd =3D NULL, *resamplefd =3D NULL; struct kvm_irqfd_pt irqfd_pt; int ret; @@ -414,32 +437,22 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *a= rgs) */ idx =3D srcu_read_lock(&kvm->irq_srcu); =20 - spin_lock_irq(&kvm->irqfds.lock); - - ret =3D 0; - list_for_each_entry(tmp, &kvm->irqfds.items, list) { - if (irqfd->eventfd !=3D tmp->eventfd) - continue; - /* This fd is used for another irq already. */ - ret =3D -EBUSY; - goto fail_duplicate; - } - - irqfd_update(kvm, irqfd); - - list_add_tail(&irqfd->list, &kvm->irqfds.items); - - spin_unlock_irq(&kvm->irqfds.lock); - /* - * Register the irqfd with the eventfd by polling on the eventfd. If - * there was en event pending on the eventfd prior to registering, - * manually trigger IRQ injection. + * Register the irqfd with the eventfd by polling on the eventfd, and + * simultaneously and the irqfd to KVM's list. If there was en event + * pending on the eventfd prior to registering, manually trigger IRQ + * injection. 
*/ irqfd_pt.irqfd =3D irqfd; + irqfd_pt.kvm =3D kvm; init_poll_funcptr(&irqfd_pt.pt, kvm_irqfd_register); =20 events =3D vfs_poll(fd_file(f), &irqfd_pt.pt); + + ret =3D irqfd_pt.ret; + if (ret) + goto fail_poll; + if (events & EPOLLIN) schedule_work(&irqfd->inject); =20 @@ -460,8 +473,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *arg= s) srcu_read_unlock(&kvm->irq_srcu, idx); return 0; =20 -fail_duplicate: - spin_unlock_irq(&kvm->irqfds.lock); +fail_poll: srcu_read_unlock(&kvm->irq_srcu, idx); fail: if (irqfd->resampler) --=20 2.49.0.1151.ga128411c76-goog From nobody Fri Oct 31 09:41:29 2025 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3B062D29A9 for ; Thu, 22 May 2025 23:52:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957965; cv=none; b=uqoPgsmdkG6jR5hv90Cn7U0T9fPR2b2EgWQmXQ1/YFu8TVPRP6CperKVMuKr2PdAcgm8XodQXPgWDxErS3+j20p4JFXb1xyPRpTAH0U38PcvRBJRfUBz8BzE1zwhkJ4YY0TPyWrwJjS0M7gakkXchs1DIfTOqphMlOoiHqgVusM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957965; c=relaxed/simple; bh=JpleTdHHK7C1xQPDlVA5bM4aXnpZtQxgRshM++nWF7s=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=NtzSvLXLRskL58wcgEVt0UPhTtU2fvYysC/YnYUiMOXfMh37pWI3FvmlhuGStE4unr4lfHQFYuNxF/MNRfxtAlX8uwm0Rx3nmo69awwnF+x7TqkPuv19kxUeGr3hymFCapiuOxcDQMzrCpTTDqef6KmdQdHDFVunEc9FsahlvFM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=2PcJ5KCd; arc=none smtp.client-ip=209.85.214.201 
Date: Thu, 22 May 2025 16:52:15 -0700
In-Reply-To: <20250522235223.3178519-1-seanjc@google.com>
Message-ID: <20250522235223.3178519-6-seanjc@google.com>
Subject: [PATCH v3 05/13] KVM: Add irqfd to eventfd's waitqueue while holding irqfds.lock
From: Sean Christopherson
To: "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Juergen Gross,
 Stefano Stabellini, Paolo Bonzini, Ingo Molnar, Peter Zijlstra, Juri Lelli,
 Vincent Guittot, Shuah Khan, Marc Zyngier, Oliver Upton, Sean Christopherson
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
 xen-devel@lists.xenproject.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, K Prateek Nayak, David Matlack

Add an irqfd to its target eventfd's waitqueue while holding irqfds.lock,
which is mildly terrifying but functionally safe. irqfds.lock is taken
inside the waitqueue's lock, but if and only if the eventfd is being
released, i.e. that path is mutually exclusive with registration as KVM
holds a reference to the eventfd (and obviously must do so to avoid UAF).

This will allow using the eventfd's waitqueue to enforce KVM's requirement
that eventfd is assigned to at most one irqfd, without introducing races.

Signed-off-by: Sean Christopherson
Acked-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
---
 virt/kvm/eventfd.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 99274d60335d..04877b297267 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -204,6 +204,11 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 	int ret = 0;
 
 	if (flags & EPOLLIN) {
+		/*
+		 * WARNING: Do NOT take irqfds.lock in any path except EPOLLHUP,
+		 * as KVM holds irqfds.lock when registering the irqfd with the
+		 * eventfd.
+		 */
 		u64 cnt;
 		eventfd_ctx_do_read(irqfd->eventfd, &cnt);
 
@@ -225,6 +230,11 @@ irqfd_wakeup(wait_queue_entry_t *wait, unsigned mode, int sync, void *key)
 		/* The eventfd is closing, detach from KVM */
 		unsigned long iflags;
 
+		/*
+		 * Taking irqfds.lock is safe here, as KVM holds a reference to
+		 * the eventfd when registering the irqfd, i.e. this path can't
+		 * be reached while kvm_irqfd_add() is running.
+		 */
 		spin_lock_irqsave(&kvm->irqfds.lock, iflags);
 
 		/*
@@ -296,16 +306,21 @@ static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh,
 
 	list_add_tail(&irqfd->list, &kvm->irqfds.items);
 
-	spin_unlock_irq(&kvm->irqfds.lock);
-
 	/*
 	 * Add the irqfd as a priority waiter on the eventfd, with a custom
 	 * wake-up handler, so that KVM *and only KVM* is notified whenever the
-	 * underlying eventfd is signaled.
+	 * underlying eventfd is signaled. Temporarily lie to lockdep about
+	 * holding irqfds.lock to avoid a false positive regarding potential
+	 * deadlock with irqfd_wakeup() (see irqfd_wakeup() for details).
 	 */
 	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
 
+	spin_release(&kvm->irqfds.lock.dep_map, _RET_IP_);
 	add_wait_queue_priority(wqh, &irqfd->wait);
+	spin_acquire(&kvm->irqfds.lock.dep_map, 0, 0, _RET_IP_);
+
+	spin_unlock_irq(&kvm->irqfds.lock);
+
 	p->ret = 0;
 }
 
-- 
2.49.0.1151.ga128411c76-goog

From nobody Fri Oct 31 09:41:29 2025
Date: Thu, 22 May 2025 16:52:16 -0700
In-Reply-To: <20250522235223.3178519-1-seanjc@google.com>
Message-ID: <20250522235223.3178519-7-seanjc@google.com>
Subject: [PATCH v3 06/13] sched/wait: Drop WQ_FLAG_EXCLUSIVE from add_wait_queue_priority()
From: Sean Christopherson
To: "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Juergen Gross,
 Stefano Stabellini, Paolo Bonzini, Ingo Molnar, Peter Zijlstra, Juri Lelli,
 Vincent Guittot, Shuah Khan, Marc Zyngier, Oliver Upton, Sean Christopherson
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
 xen-devel@lists.xenproject.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, K Prateek Nayak, David Matlack

Drop the setting of WQ_FLAG_EXCLUSIVE from add_wait_queue_priority() and
instead have callers manually add the flag prior to adding their structure
to the queue. Blindly setting WQ_FLAG_EXCLUSIVE is flawed, as the nature
of exclusive, priority waiters means that only the first waiter added will
ever receive notifications.

Pushing the flawed behavior to callers will allow fixing the problem one
hypervisor at a time (KVM added the flawed API, and then KVM's code was
copy+pasted nearly verbatim by Xen and Hyper-V), and will also allow for
adding an API that provides true exclusivity, i.e. that guarantees at most
one priority waiter is in the queue.

Opportunistically add a comment in Hyper-V to call out the mess. Xen
privcmd's irqfd_wakeup() doesn't actually operate in exclusive mode, i.e.
can be "fixed" simply by dropping WQ_FLAG_EXCLUSIVE. And KVM is primed to
switch to the aforementioned fully exclusive API, i.e. won't be carrying
the flawed code for long.

No functional change intended.

Signed-off-by: Sean Christopherson
Acked-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
---
 drivers/hv/mshv_eventfd.c | 8 ++++++++
 drivers/xen/privcmd.c     | 1 +
 kernel/sched/wait.c       | 4 ++--
 virt/kvm/eventfd.c        | 1 +
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
index 8dd22be2ca0b..b348928871c2 100644
--- a/drivers/hv/mshv_eventfd.c
+++ b/drivers/hv/mshv_eventfd.c
@@ -368,6 +368,14 @@ static void mshv_irqfd_queue_proc(struct file *file, wait_queue_head_t *wqh,
 			container_of(polltbl, struct mshv_irqfd, irqfd_polltbl);
 
 	irqfd->irqfd_wqh = wqh;
+
+	/*
+	 * TODO: Ensure there isn't already an exclusive, priority waiter, e.g.
+	 * that the irqfd isn't already bound to another partition. Only the
+	 * first exclusive waiter encountered will be notified, and
+	 * add_wait_queue_priority() doesn't enforce exclusivity.
+	 */
+	irqfd->irqfd_wait.flags |= WQ_FLAG_EXCLUSIVE;
 	add_wait_queue_priority(wqh, &irqfd->irqfd_wait);
 }
 
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 13a10f3294a8..c08ec8a7d27c 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -957,6 +957,7 @@ irqfd_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt)
 	struct privcmd_kernel_irqfd *kirqfd =
 		container_of(pt, struct privcmd_kernel_irqfd, pt);
 
+	kirqfd->wait.flags |= WQ_FLAG_EXCLUSIVE;
 	add_wait_queue_priority(wqh, &kirqfd->wait);
 }
 
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 51e38f5f4701..4ab3ab195277 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -40,7 +40,7 @@ void add_wait_queue_priority(struct wait_queue_head *wq_head, struct wait_queue_
 {
 	unsigned long flags;
 
-	wq_entry->flags |= WQ_FLAG_EXCLUSIVE | WQ_FLAG_PRIORITY;
+	wq_entry->flags |= WQ_FLAG_PRIORITY;
 	spin_lock_irqsave(&wq_head->lock, flags);
 	__add_wait_queue(wq_head, wq_entry);
 	spin_unlock_irqrestore(&wq_head->lock, flags);
@@ -64,7 +64,7 @@ EXPORT_SYMBOL(remove_wait_queue);
  * the non-exclusive tasks. Normally, exclusive tasks will be at the end of
  * the list and any non-exclusive tasks will be woken first. A priority task
  * may be at the head of the list, and can consume the event without any other
- * tasks being woken.
+ * tasks being woken if it's also an exclusive task.
  *
  * There are circumstances in which we can try to wake a task which has already
  * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 04877b297267..c7969904637a 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -316,6 +316,7 @@ static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh,
 	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
 
 	spin_release(&kvm->irqfds.lock.dep_map, _RET_IP_);
+	irqfd->wait.flags |= WQ_FLAG_EXCLUSIVE;
 	add_wait_queue_priority(wqh, &irqfd->wait);
 	spin_acquire(&kvm->irqfds.lock.dep_map, 0, 0, _RET_IP_);
 
-- 
2.49.0.1151.ga128411c76-goog

From nobody Fri Oct 31 09:41:29 2025
Date: Thu, 22 May 2025 16:52:17 -0700
In-Reply-To: <20250522235223.3178519-1-seanjc@google.com>
Message-ID: <20250522235223.3178519-8-seanjc@google.com>
Subject: [PATCH v3 07/13] xen: privcmd: Don't mark eventfd waiter as EXCLUSIVE
From: Sean Christopherson
To: "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Juergen Gross,
 Stefano Stabellini, Paolo Bonzini, Ingo Molnar, Peter Zijlstra, Juri Lelli,
 Vincent Guittot, Shuah Khan, Marc Zyngier, Oliver Upton, Sean Christopherson
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
 xen-devel@lists.xenproject.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, K Prateek Nayak, David Matlack

Don't set WQ_FLAG_EXCLUSIVE when adding an irqfd to a wait queue, as
irqfd_wakeup() unconditionally returns '0', i.e. doesn't actually operate
in exclusive mode.

Note, the use of WQ_FLAG_PRIORITY is also dubious, but that's a problem
for another day.

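As a rough illustration of why the return value matters, here is a minimal
userspace sketch (not kernel code) of the wake-up walk. It assumes, as in
__wake_up_common(), that the walk stops early only when a waiter's callback
returns non-zero, that waiter is marked exclusive, and the exclusive budget
is used up. The sim_* names and the flag values are hypothetical stand-ins
invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; NOT the kernel's WQ_FLAG_* definitions. */
#define SIM_FLAG_EXCLUSIVE 0x01
#define SIM_FLAG_PRIORITY  0x10

struct sim_waiter;
typedef int (*sim_wake_fn)(struct sim_waiter *w);

struct sim_waiter {
	unsigned int flags;
	sim_wake_fn func;
	int woken;		/* how many times func was invoked */
};

/* Callback that "consumes" the event by returning non-zero. */
int sim_wake_consume(struct sim_waiter *w)
{
	w->woken++;
	return 1;
}

/* Callback that always returns 0, like the privcmd handler described above. */
int sim_wake_passthrough(struct sim_waiter *w)
{
	w->woken++;
	return 0;
}

/*
 * Walk the queue from the head and return how many callbacks were invoked.
 * The early-exit conditions mirror the ones assumed in the lead-in.
 */
int sim_wake_up(struct sim_waiter **queue, size_t n, int nr_exclusive)
{
	int called = 0;

	assert(queue != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct sim_waiter *w = queue[i];
		int ret = w->func(w);

		called++;
		if (ret < 0)
			break;
		if (ret && (w->flags & SIM_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;
	}
	return called;
}
```

With a callback that always returns 0, the walk visits every waiter whether
or not the exclusive flag is set, so in this model the flag has no effect
for such a waiter.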
Signed-off-by: Sean Christopherson
Acked-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
---
 drivers/xen/privcmd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index c08ec8a7d27c..13a10f3294a8 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -957,7 +957,6 @@ irqfd_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt)
 	struct privcmd_kernel_irqfd *kirqfd =
 		container_of(pt, struct privcmd_kernel_irqfd, pt);
 
-	kirqfd->wait.flags |= WQ_FLAG_EXCLUSIVE;
 	add_wait_queue_priority(wqh, &kirqfd->wait);
 }
 
-- 
2.49.0.1151.ga128411c76-goog

From nobody Fri Oct 31 09:41:29 2025
Date: Thu, 22 May 2025 16:52:18 -0700
In-Reply-To: <20250522235223.3178519-1-seanjc@google.com>
Message-ID: <20250522235223.3178519-9-seanjc@google.com>
Subject: [PATCH v3 08/13] sched/wait: Add a waitqueue helper for fully exclusive priority waiters
From: Sean Christopherson
To: "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Juergen Gross,
 Stefano Stabellini, Paolo Bonzini, Ingo Molnar, Peter Zijlstra, Juri Lelli,
 Vincent Guittot, Shuah Khan, Marc Zyngier, Oliver Upton, Sean Christopherson
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
 xen-devel@lists.xenproject.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, K Prateek Nayak, David Matlack

Add a waitqueue helper to add a priority waiter that requires exclusive
wakeups, i.e. that requires that it be the _only_ priority waiter. The
API will be used by KVM to ensure that at most one of KVM's irqfds is
bound to a single eventfd (across the entire kernel).

Open code the helper instead of using __add_wait_queue() so that the
common path doesn't need to "handle" impossible failures.

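The intended semantics can be sketched in userspace as follows. This is a
simplified model under stated assumptions, not the kernel implementation:
locking is omitted, a singly linked list stands in for the kernel's
list_head, and every sim_* name and flag value is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values; NOT the kernel's WQ_FLAG_* definitions. */
#define SIM_FLAG_EXCLUSIVE 0x01
#define SIM_FLAG_PRIORITY  0x10

struct sim_entry {
	unsigned int flags;
	struct sim_entry *next;
};

struct sim_queue {
	struct sim_entry *head;
};

/* Ordinary waiter: appended at the tail of the queue. */
void sim_add_wait_queue(struct sim_queue *q, struct sim_entry *e)
{
	struct sim_entry **pp;

	assert(q && e);

	for (pp = &q->head; *pp; pp = &(*pp)->next)
		;
	e->next = NULL;
	*pp = e;
}

/*
 * Priority + exclusive waiter: inserted at the head, but only if the
 * current head is not already a priority waiter.  Priority waiters are
 * only ever inserted at the head, so checking the head is sufficient.
 */
int sim_add_priority_exclusive(struct sim_queue *q, struct sim_entry *e)
{
	assert(q && e);

	e->flags |= SIM_FLAG_EXCLUSIVE | SIM_FLAG_PRIORITY;

	if (q->head && (q->head->flags & SIM_FLAG_PRIORITY))
		return -EBUSY;

	e->next = q->head;
	q->head = e;
	return 0;
}
```

In this model a second priority-exclusive registration on the same queue
fails with -EBUSY, while ordinary waiters stay queued behind the single
priority waiter.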
Cc: K Prateek Nayak
Signed-off-by: Sean Christopherson
Acked-by: Peter Zijlstra (Intel)
Reviewed-by: K Prateek Nayak
Tested-by: K Prateek Nayak
---
 include/linux/wait.h |  2 ++
 kernel/sched/wait.c  | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 965a19809c7e..09855d819418 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -164,6 +164,8 @@ static inline bool wq_has_sleeper(struct wait_queue_head *wq_head)
 extern void add_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
 extern void add_wait_queue_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
 extern void add_wait_queue_priority(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
+extern int add_wait_queue_priority_exclusive(struct wait_queue_head *wq_head,
+					     struct wait_queue_entry *wq_entry);
 extern void remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry);
 
 static inline void __add_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 4ab3ab195277..15632c89c4f2 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -47,6 +47,24 @@ void add_wait_queue_priority(struct wait_queue_head *wq_head, struct wait_queue_
 }
 EXPORT_SYMBOL_GPL(add_wait_queue_priority);
 
+int add_wait_queue_priority_exclusive(struct wait_queue_head *wq_head,
+				      struct wait_queue_entry *wq_entry)
+{
+	struct list_head *head = &wq_head->head;
+
+	wq_entry->flags |= WQ_FLAG_EXCLUSIVE | WQ_FLAG_PRIORITY;
+
+	guard(spinlock_irqsave)(&wq_head->lock);
+
+	if (!list_empty(head) &&
+	    (list_first_entry(head, typeof(*wq_entry), entry)->flags & WQ_FLAG_PRIORITY))
+		return -EBUSY;
+
+	list_add(&wq_entry->entry, head);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(add_wait_queue_priority_exclusive);
+
 void remove_wait_queue(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry)
 {
 	unsigned long flags;
-- 
2.49.0.1151.ga128411c76-goog

From nobody Fri Oct 31 09:41:29 2025
Date: Thu, 22 May 2025 16:52:19 -0700
In-Reply-To: <20250522235223.3178519-1-seanjc@google.com>
Message-ID: <20250522235223.3178519-10-seanjc@google.com>
Subject: [PATCH v3 09/13] KVM: Disallow binding multiple irqfds to an eventfd with a priority waiter
From: Sean Christopherson
To: "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Juergen Gross,
 Stefano Stabellini, Paolo Bonzini, Ingo Molnar, Peter Zijlstra, Juri Lelli,
 Vincent Guittot, Shuah Khan, Marc Zyngier, Oliver Upton, Sean Christopherson
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
 xen-devel@lists.xenproject.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, K Prateek Nayak, David Matlack

Disallow binding an irqfd to an eventfd that already has a priority waiter,
i.e. to an eventfd that already has an attached irqfd. KVM always
operates in exclusive mode for EPOLLIN (unconditionally returns '1'),
i.e. only the first waiter will be notified.

KVM already disallows binding multiple irqfds to an eventfd in a single
VM, but doesn't guard against multiple VMs binding to an eventfd. Adding
the extra protection reduces the pain of a userspace VMM bug, e.g. if
userspace fails to de-assign before re-assigning when transferring state
for intra-host migration, then the migration will explicitly fail as
opposed to dropping IRQs on the destination VM.

Temporarily keep KVM's manual check on irqfds.items, but add a WARN, e.g.
to allow sanity checking the waitqueue enforcement.

Cc: Oliver Upton Cc: David Matlack Signed-off-by: Sean Christopherson Acked-by: Peter Zijlstra (Intel) Tested-by: K Prateek Nayak --- virt/kvm/eventfd.c | 55 +++++++++++++++++++++++++++++++--------------- 1 file changed, 37 insertions(+), 18 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index c7969904637a..7b2e1f858f6d 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -291,38 +291,57 @@ static void kvm_irqfd_register(struct file *file, wai= t_queue_head_t *wqh, struct kvm_kernel_irqfd *tmp; struct kvm *kvm =3D p->kvm; =20 + /* + * Note, irqfds.lock protects the irqfd's irq_entry, i.e. its routing, + * and irqfds.items. It does NOT protect registering with the eventfd. + */ spin_lock_irq(&kvm->irqfds.lock); =20 - list_for_each_entry(tmp, &kvm->irqfds.items, list) { - if (irqfd->eventfd !=3D tmp->eventfd) - continue; - /* This fd is used for another irq already. */ - p->ret =3D -EBUSY; - spin_unlock_irq(&kvm->irqfds.lock); - return; - } - + /* + * Initialize the routing information prior to adding the irqfd to the + * eventfd's waitqueue, as irqfd_wakeup() can be invoked as soon as the + * irqfd is registered. + */ irqfd_update(kvm, irqfd); =20 - list_add_tail(&irqfd->list, &kvm->irqfds.items); - /* * Add the irqfd as a priority waiter on the eventfd, with a custom * wake-up handler, so that KVM *and only KVM* is notified whenever the - * underlying eventfd is signaled. Temporarily lie to lockdep about - * holding irqfds.lock to avoid a false positive regarding potential - * deadlock with irqfd_wakeup() (see irqfd_wakeup() for details). + * underlying eventfd is signaled. */ init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup); =20 + /* + * Temporarily lie to lockdep about holding irqfds.lock to avoid a + * false positive regarding potential deadlock with irqfd_wakeup() + * (see irqfd_wakeup() for details). + * + * Adding to the wait queue will fail if there is already a priority + * waiter, i.e. 
if the eventfd is associated with another irqfd (in any + * VM). Note, kvm_irqfd_deassign() waits for all in-flight shutdown + * jobs to complete, i.e. ensures the irqfd has been removed from the + * eventfd's waitqueue before returning to userspace. + */ spin_release(&kvm->irqfds.lock.dep_map, _RET_IP_); - irqfd->wait.flags |=3D WQ_FLAG_EXCLUSIVE; - add_wait_queue_priority(wqh, &irqfd->wait); + p->ret =3D add_wait_queue_priority_exclusive(wqh, &irqfd->wait); spin_acquire(&kvm->irqfds.lock.dep_map, 0, 0, _RET_IP_); + if (p->ret) + goto out; =20 + list_for_each_entry(tmp, &kvm->irqfds.items, list) { + if (irqfd->eventfd !=3D tmp->eventfd) + continue; + + WARN_ON_ONCE(1); + /* This fd is used for another irq already. */ + p->ret =3D -EBUSY; + goto out; + } + + list_add_tail(&irqfd->list, &kvm->irqfds.items); + +out: spin_unlock_irq(&kvm->irqfds.lock); - - p->ret =3D 0; } =20 #if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS) --=20 2.49.0.1151.ga128411c76-goog From nobody Fri Oct 31 09:41:29 2025 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6A5A12DFA28 for ; Thu, 22 May 2025 23:52:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957973; cv=none; b=LNesjEIaZ5LwE1+Zo0OxbrHaw3PU6nYuhGQ+3h8Yqoad4Cs8xPweo/uWhyOZahk068wTK+fLXhtegKS+KcTtYnOpUNkU9Zd+uii+H8Vd8U+wVp8NObpLc03OzNDSC+iaY4wxY7LvT8buAk4Y0p2CT7eyOwd6Pi2Xd2tVFLuIbWg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957973; c=relaxed/simple; bh=P37+G4c2/Xetu+3X+HtrukOmywQ+BBmCzZW3DWLRSak=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=uPuKHxPaPvt6ai2Y3jN4RlP5gP9s8DnZCNVOlvUckzdxcD0QWSmhI0844cUzp/Y0+LJDz3TPFTyB/0duymndqKpFBmbhzGcTGU1ycAzX7TwrqvDu5du0v4t9V+T/s3eQGi4ltt61C8Vws9JHohK9jVlrvtHKBsvMK+dbLuKFvtk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=1Sekf+m9; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="1Sekf+m9" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-30ec5cc994eso4775837a91.2 for ; Thu, 22 May 2025 16:52:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747957971; x=1748562771; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=pJUovYaxFJgcXRp1TDFKS/NrFefgEp1AkGzEV1OM9Zc=; b=1Sekf+m9GkXG15cXDUMyCMD51s3BnD9WoIxFeWw5vLuRptVdU/uXyPjCz/SCAEqHHN 0AzLN1TH4qZiY4z8w5yjG32rRgpSPndz5EPnnLOQ1TMe9NRwfWSvruPPZyzVnDaTh/q5 vvvStG7IcHJAOS9zC5bU6x/90qdqPm0lgTrZhvEtK0SYsYXW3OHIvIxtsxeo+O+hwIFE rAL6r5r1par1TFEweY4yfZFor1c/oTqCCEiDoOC8+XAOiIBrl2KR9RRJExE+ABH/L5S+ +euBQR6f6p25nUNARjqefdJRJ090BsZLawko2yPLN/g8yFgVL81ok6c6HQZNOppDfGrb GQ9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747957971; x=1748562771; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=pJUovYaxFJgcXRp1TDFKS/NrFefgEp1AkGzEV1OM9Zc=; b=dl6tkI8z5xacxXh0tUPr1JEFjB3DSuw8uOZ/3mOQUiZTf/KSqf/KhQgNNu4Btbpr3u 
2mN5aWEpeQJCoFlTWl77TvU9XY2gvCb7HlKMVIV9fBcgIWp4xgE06RtNEFVs7/IUs+RR PUjFyUV90r6Jte1vCEAWTAEcJkbf8AwaVktHPpyhx5scw9lz4H02DCOp/XGxIMsu+LD3 OljOq3E6DcbClp6iC+UWUp7iO9geNWgB+U8vxHsGifZTbNdwPBECUqTTS/POM6laSFSL cYxIPeIAHX6iGKcDob1eWQ/iHcQQdeYoL+DcbMtq9vAvT2Zu8Es6e3LxEXj5pagPcXBB DStA== X-Gm-Message-State: AOJu0Yylu1xJ491omn+j6g32gn3FKlr41eVAB7ORXY4jsQnz4MBb5GPG JDxuZM7TyzRT0lOilU8dRWYUazZbJYNTwMxB8Et5u2p+aG1OAoe4fJoYWCrtyD5K2biGVuKlfyq lbrnTNQ== X-Google-Smtp-Source: AGHT+IEDndqxTWcCtq5oGwy52YECfaW+Ize1PqvGwCK7FSF7/oiShVYNO6fCIsKX/sBpVCvd/V18cyz4IiY= X-Received: from pjx8.prod.google.com ([2002:a17:90b:5688:b0:30a:31eb:ec8e]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:33c2:b0:2ee:edae:780 with SMTP id 98e67ed59e1d1-30e7d548c90mr43873310a91.15.1747957970976; Thu, 22 May 2025 16:52:50 -0700 (PDT) Reply-To: Sean Christopherson Date: Thu, 22 May 2025 16:52:20 -0700 In-Reply-To: <20250522235223.3178519-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250522235223.3178519-1-seanjc@google.com> X-Mailer: git-send-email 2.49.0.1151.ga128411c76-goog Message-ID: <20250522235223.3178519-11-seanjc@google.com> Subject: [PATCH v3 10/13] KVM: Drop sanity check that per-VM list of irqfds is unique From: Sean Christopherson To: "K. Y. 
Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Juergen Gross , Stefano Stabellini , Paolo Bonzini , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Shuah Khan , Marc Zyngier , Oliver Upton , Sean Christopherson Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, K Prateek Nayak , David Matlack Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now that the eventfd's waitqueue ensures it has at most one priority waiter, i.e. prevents KVM from binding multiple irqfds to one eventfd, drop KVM's sanity check that eventfds are unique for a single VM. Signed-off-by: Sean Christopherson Acked-by: Peter Zijlstra (Intel) Tested-by: K Prateek Nayak --- virt/kvm/eventfd.c | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 7b2e1f858f6d..d5258fd16033 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -288,7 +288,6 @@ static void kvm_irqfd_register(struct file *file, wait_= queue_head_t *wqh, { struct kvm_irqfd_pt *p =3D container_of(pt, struct kvm_irqfd_pt, pt); struct kvm_kernel_irqfd *irqfd =3D p->irqfd; - struct kvm_kernel_irqfd *tmp; struct kvm *kvm =3D p->kvm; =20 /* @@ -328,16 +327,6 @@ static void kvm_irqfd_register(struct file *file, wait= _queue_head_t *wqh, if (p->ret) goto out; =20 - list_for_each_entry(tmp, &kvm->irqfds.items, list) { - if (irqfd->eventfd !=3D tmp->eventfd) - continue; - - WARN_ON_ONCE(1); - /* This fd is used for another irq already. 
*/ - p->ret =3D -EBUSY; - goto out; - } - list_add_tail(&irqfd->list, &kvm->irqfds.items); =20 out: --=20 2.49.0.1151.ga128411c76-goog From nobody Fri Oct 31 09:41:29 2025 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3B4782DFA4C for ; Thu, 22 May 2025 23:52:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957974; cv=none; b=bQ6tVtTgQWq81D33eCEu71j6M8AVay/LovHCYXmxfShVd8fj4QNCZY+lONTGqGxGvGwJoUPARMyfrSopqg/3lgISipzI1GuTWwC0cRpd6CwxcXtooHGHLPSvlg7WK165gQM/XIlBkuLkBxH+OcQ8yWQwIREgbC2mcO3zFEX7kCY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957974; c=relaxed/simple; bh=gmLT2D1WB9vmr8rFdHxMOpHNwXh7MFDJ6xUGtYQhimw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=NLtsXv0REG5EK/fxTpLx5fkgkUyu1ZKSM6u/orLJ8/XyuWaM9lvE1vbqRQ9sE+c0vUoCuSAGJPph3nCyro7TBoeud9Ol7C+IPXlavVktsbOaAUwfMDi3lkFGF/D0u9ZZApfFoHP/oMwz55kOCRPcXk9U+nOjZP6K2r1n5MfZLQA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Gm251MBb; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Gm251MBb" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-b26db9af463so7984429a12.2 for ; Thu, 22 May 
2025 16:52:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747957972; x=1748562772; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=jjMNjs+bdNtZTb3Wdy6sSLCs56ksmcfEkIHqJR2gBFc=; b=Gm251MBbxSTPwRH8ghuckno+ronDdEVl9sRm7VO9nmeiheBfzwBds6nbGRsRwcK7yd waCOi+bmfnP5ESrxPGUNt/ILIpm4yL8O/zCKI88eggpbSfA+oP4+ormkJhme3Do7huYt w51fnUMn7VRahjSMoyFfxbtlrdSysOJjoqXf4aaAjGu9ZuhkV9ClZw4ax+NqABswOwqX FC/W5QndXARcH0lEWopnRD0yHOPHaBneQcYSyuM6VFS04Nw7i9zqLXZVwa7QO2LurjDU 1d+4C3XWtG+oy3HZFOV0l+GEhj9eyi1UvZUt7mIebsMmWRcqXumfC2Kt6/SF79hAPmrr l0UQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747957972; x=1748562772; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=jjMNjs+bdNtZTb3Wdy6sSLCs56ksmcfEkIHqJR2gBFc=; b=CEYWkdOSORCfxub7njiRDU3/SARJ1MVqlbbU0gxBrUxvONv0g9ormPZDwqgAP/pBw1 9H5UDcFwqi5L8xnTENBH6niDDfU1vKKY3Y+UJcU63gSd7PSiVz1wwY4aLRlpLpU9KrJl CUX3wV+VasLykZjVv5dy6NTKlZKAOE015lig2RZAvRkhIIkGmOsmbcFIMFMGVHzqWe7V N0lqHiiAfn4sdbKT0GYMmNLRkt9UeeG2Vhn7rTYfTH4r2dufm2ImvbLghLNiJmizP2Py PfkwSmc+swZ+Ui/nvu49HQ0qMsxiDPc47jifg1pgI50WUtkC6O/JMAuuL0IcHTjP+vIV 2pzw== X-Gm-Message-State: AOJu0YxraljQG73BEyuqObjjfQoAy6RcnIAUFhr5SY1Bm+aZHoMudC0V oS8IgE9Rf0unRFQI31r7Pn4xq8raH81sgcqk789/QyHjsd8Y5a1SfrxQ2ORUI2zepgRkPaGUQpt tYWkTcA== X-Google-Smtp-Source: AGHT+IH+y/4Vqk1KMF3LjOCFUbppIPh5ZUErAvUoiEDW2SE/2hXGiqyTLKxlZ9Xh8ZuSjxO0WF45GST49oM= X-Received: from pjbdy5.prod.google.com ([2002:a17:90b:6c5:b0:2fc:1356:bcc3]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:510f:b0:2ff:58e1:2bb1 with SMTP id 98e67ed59e1d1-310e973e510mr1311217a91.32.1747957972660; Thu, 22 May 2025 16:52:52 -0700 (PDT) Reply-To: Sean Christopherson Date: Thu, 22 May 2025 16:52:21 -0700 In-Reply-To: 
<20250522235223.3178519-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250522235223.3178519-1-seanjc@google.com> X-Mailer: git-send-email 2.49.0.1151.ga128411c76-goog Message-ID: <20250522235223.3178519-12-seanjc@google.com> Subject: [PATCH v3 11/13] KVM: selftests: Assert that eventfd() succeeds in Xen shinfo test From: Sean Christopherson To: "K. Y. Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Juergen Gross , Stefano Stabellini , Paolo Bonzini , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Shuah Khan , Marc Zyngier , Oliver Upton , Sean Christopherson Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, K Prateek Nayak , David Matlack Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Assert that eventfd() succeeds in the Xen shinfo test instead of skipping the associated testcase. While eventfd() is outside the scope of KVM, KVM unconditionally selects EVENTFD, i.e. the syscall should always succeed. 
Signed-off-by: Sean Christopherson Acked-by: Peter Zijlstra (Intel) Tested-by: K Prateek Nayak --- tools/testing/selftests/kvm/x86/xen_shinfo_test.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/kvm/x86/xen_shinfo_test.c b/tools/test= ing/selftests/kvm/x86/xen_shinfo_test.c index 287829f850f7..34d180cf4eed 100644 --- a/tools/testing/selftests/kvm/x86/xen_shinfo_test.c +++ b/tools/testing/selftests/kvm/x86/xen_shinfo_test.c @@ -548,14 +548,11 @@ int main(int argc, char *argv[]) =20 if (do_eventfd_tests) { irq_fd[0] =3D eventfd(0, 0); + TEST_ASSERT(irq_fd[0] >=3D 0, __KVM_SYSCALL_ERROR("eventfd()", irq_fd[0]= )); + irq_fd[1] =3D eventfd(0, 0); + TEST_ASSERT(irq_fd[1] >=3D 0, __KVM_SYSCALL_ERROR("eventfd()", irq_fd[1]= )); =20 - /* Unexpected, but not a KVM failure */ - if (irq_fd[0] =3D=3D -1 || irq_fd[1] =3D=3D -1) - do_evtchn_tests =3D do_eventfd_tests =3D false; - } - - if (do_eventfd_tests) { irq_routes.info.nr =3D 2; =20 irq_routes.entries[0].gsi =3D 32; --=20 2.49.0.1151.ga128411c76-goog From nobody Fri Oct 31 09:41:29 2025 Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A1932E3375 for ; Thu, 22 May 2025 23:52:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957976; cv=none; b=oHH+XW8FhGPRMgO40dUBP1zHqzlyK52U7WqwwA7O48C6UpICfq9p0kvzkDyFcqFul5cQgIrMlNAutgzeWhgxBAdIHwvDnS6OeAu9Gej4Eetobqr3j0IWc07GdkPtzt2q6iZ4U85grFvdCwD+QoZXmD+x1mnD2NqtH1mKRUUuvsA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957976; c=relaxed/simple; bh=gLdwd1sbkJEcZs1IEVaRfWUXkNSzfXvHtgjA2aYSHac=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=oVOvHNTIM4MTPk12vFJr2XnnOABRsj15nah5vwZbDPuzK2lMDJ4+2CbQghDC/1Cx8GeZKFVmSDAinxXe5OLCZzkxjz00Kvr8WG3igA2IDB13vv1B2QwKe31MVqERNwzmZZk55YQorLDBJ62Y+NDUU14VykJdEsvHmrHdnvPdvDQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=AutYh9t4; arc=none smtp.client-ip=209.85.210.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="AutYh9t4" Received: by mail-pf1-f202.google.com with SMTP id d2e1a72fcca58-742aa6581caso6155933b3a.3 for ; Thu, 22 May 2025 16:52:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747957974; x=1748562774; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=EKiGsGdj01NsitpTuPJ3Ew8X7lvRxIwEPSz2u8YI9Cs=; b=AutYh9t4YSwwTwf4IFTDPN2sw7vFZEs6x737T8za8dr+b7a6aD5NsOH9YcMyE8hACk eG1LobvQhOIphwEJSfR/hNI6GM1LFwtjsiY9CwNKn11DeIIca/pljzAAKEOZWOSsubJO hCNmzkpitI4LI1mQAIpIZmwG4xyj0sRlOPdbROlaFQtnwXfw8utMVnoeEkUE6SZf6bRC +UAolP2OFJu619tQSF6f6wzFS+GclpLKwWPigaAYwuuxfWYOL3JE4H4HOtjNQbJEF8Cu 3FsPkfU8HTYkw9ZQs1Ya66zELwozD89JniaIYsE3TQWEiAAxI38dbA7GOOTqRpGjLKio NwtA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747957974; x=1748562774; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=EKiGsGdj01NsitpTuPJ3Ew8X7lvRxIwEPSz2u8YI9Cs=; b=TSVHAu2I+OraLfqydTr200NVImt9fa6dcreCCpMEmpH+CdZN2EYgj+biLPmxc4fwca 
6nyEF73VVGy9z8ZK5GoT2zefERry2fKePSbq6EJ6Jd4lfyOQiCMh8XNBLnQPP729gUbB Hy+xZKd0Ewa5LUa/0OfN8G8UU/5XWE1u/flGnoijK2hZr4/gN30z/ZDC5ig0/aRcLj0r zL5cRkAt4rO+GmOiMZnkwBKPEwJidIT/ZhhHA5EoyoBOq49HGftv0HhWH6BSo77pFNYr bFq8x/1NmpSILOMGKrwIvOeGB4Svewbo+NBJtV1Q5HN1qyc7p0CGg2NZ3KvnZu4QItfa gDpg== X-Gm-Message-State: AOJu0YwlUHVJJ2EYdJ3TKfjgjepwwvH3Mx5K2rhaJ+HwyVC5hpwpdOcd 0YViJih4GMovdnwZ6YwEdc+dn9emFNa0vSuSVHJ9T1CpYgu6gTrkUzgnmYZy0LkTeGd7z454zxa F7C3fOA== X-Google-Smtp-Source: AGHT+IFfHuwPoraxH7fHOxcJKjCs5i19crB1g3gqYt+qVLkDFB/zR2OG4BEQkHLANHuLxCZWJd/rEXtT30s= X-Received: from pga16.prod.google.com ([2002:a05:6a02:4f90:b0:af2:3385:de87]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a20:d527:b0:206:aa42:8e7c with SMTP id adf61e73a8af0-2170cc715a0mr39501029637.18.1747957974292; Thu, 22 May 2025 16:52:54 -0700 (PDT) Reply-To: Sean Christopherson Date: Thu, 22 May 2025 16:52:22 -0700 In-Reply-To: <20250522235223.3178519-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250522235223.3178519-1-seanjc@google.com> X-Mailer: git-send-email 2.49.0.1151.ga128411c76-goog Message-ID: <20250522235223.3178519-13-seanjc@google.com> Subject: [PATCH v3 12/13] KVM: selftests: Add utilities to create eventfds and do KVM_IRQFD From: Sean Christopherson To: "K. Y. 
Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Juergen Gross , Stefano Stabellini , Paolo Bonzini , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Shuah Khan , Marc Zyngier , Oliver Upton , Sean Christopherson Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, K Prateek Nayak , David Matlack Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add helpers to create eventfds and to (de)assign eventfds via KVM_IRQFD. Signed-off-by: Sean Christopherson Acked-by: Peter Zijlstra (Intel) Tested-by: K Prateek Nayak --- tools/testing/selftests/kvm/arm64/vgic_irq.c | 12 ++---- .../testing/selftests/kvm/include/kvm_util.h | 40 +++++++++++++++++++ .../selftests/kvm/x86/xen_shinfo_test.c | 18 ++------- 3 files changed, 47 insertions(+), 23 deletions(-) diff --git a/tools/testing/selftests/kvm/arm64/vgic_irq.c b/tools/testing/s= elftests/kvm/arm64/vgic_irq.c index f4ac28d53747..a09dd423c2d7 100644 --- a/tools/testing/selftests/kvm/arm64/vgic_irq.c +++ b/tools/testing/selftests/kvm/arm64/vgic_irq.c @@ -620,18 +620,12 @@ static void kvm_routing_and_irqfd_check(struct kvm_vm= *vm, * that no actual interrupt was injected for those cases. 
*/ =20 - for (f =3D 0, i =3D intid; i < (uint64_t)intid + num; i++, f++) { - fd[f] =3D eventfd(0, 0); - TEST_ASSERT(fd[f] !=3D -1, __KVM_SYSCALL_ERROR("eventfd()", fd[f])); - } + for (f =3D 0, i =3D intid; i < (uint64_t)intid + num; i++, f++) + fd[f] =3D kvm_new_eventfd(); =20 for (f =3D 0, i =3D intid; i < (uint64_t)intid + num; i++, f++) { - struct kvm_irqfd irqfd =3D { - .fd =3D fd[f], - .gsi =3D i - MIN_SPI, - }; assert(i <=3D (uint64_t)UINT_MAX); - vm_ioctl(vm, KVM_IRQFD, &irqfd); + kvm_assign_irqfd(vm, i - MIN_SPI, fd[f]); } =20 for (f =3D 0, i =3D intid; i < (uint64_t)intid + num; i++, f++) { diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing= /selftests/kvm/include/kvm_util.h index 373912464fb4..4f7bf8f000bb 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -18,6 +18,7 @@ #include #include =20 +#include #include =20 #include "kvm_util_arch.h" @@ -496,6 +497,45 @@ static inline int vm_get_stats_fd(struct kvm_vm *vm) return fd; } =20 +static inline int __kvm_irqfd(struct kvm_vm *vm, uint32_t gsi, int eventfd, + uint32_t flags) +{ + struct kvm_irqfd irqfd =3D { + .fd =3D eventfd, + .gsi =3D gsi, + .flags =3D flags, + .resamplefd =3D -1, + }; + + return __vm_ioctl(vm, KVM_IRQFD, &irqfd); +} + +static inline void kvm_irqfd(struct kvm_vm *vm, uint32_t gsi, int eventfd, + uint32_t flags) +{ + int ret =3D __kvm_irqfd(vm, gsi, eventfd, flags); + + TEST_ASSERT_VM_VCPU_IOCTL(!ret, KVM_IRQFD, ret, vm); +} + +static inline void kvm_assign_irqfd(struct kvm_vm *vm, uint32_t gsi, int e= ventfd) +{ + kvm_irqfd(vm, gsi, eventfd, 0); +} + +static inline void kvm_deassign_irqfd(struct kvm_vm *vm, uint32_t gsi, int= eventfd) +{ + kvm_irqfd(vm, gsi, eventfd, KVM_IRQFD_FLAG_DEASSIGN); +} + +static inline int kvm_new_eventfd(void) +{ + int fd =3D eventfd(0, 0); + + TEST_ASSERT(fd >=3D 0, __KVM_SYSCALL_ERROR("eventfd()", fd)); + return fd; +} + static inline void read_stats_header(int 
stats_fd, struct kvm_stats_header= *header) { ssize_t ret; diff --git a/tools/testing/selftests/kvm/x86/xen_shinfo_test.c b/tools/test= ing/selftests/kvm/x86/xen_shinfo_test.c index 34d180cf4eed..23909b501ac2 100644 --- a/tools/testing/selftests/kvm/x86/xen_shinfo_test.c +++ b/tools/testing/selftests/kvm/x86/xen_shinfo_test.c @@ -547,11 +547,8 @@ int main(int argc, char *argv[]) int irq_fd[2] =3D { -1, -1 }; =20 if (do_eventfd_tests) { - irq_fd[0] =3D eventfd(0, 0); - TEST_ASSERT(irq_fd[0] >=3D 0, __KVM_SYSCALL_ERROR("eventfd()", irq_fd[0]= )); - - irq_fd[1] =3D eventfd(0, 0); - TEST_ASSERT(irq_fd[1] >=3D 0, __KVM_SYSCALL_ERROR("eventfd()", irq_fd[1]= )); + irq_fd[0] =3D kvm_new_eventfd(); + irq_fd[1] =3D kvm_new_eventfd(); =20 irq_routes.info.nr =3D 2; =20 @@ -569,15 +566,8 @@ int main(int argc, char *argv[]) =20 vm_ioctl(vm, KVM_SET_GSI_ROUTING, &irq_routes.info); =20 - struct kvm_irqfd ifd =3D { }; - - ifd.fd =3D irq_fd[0]; - ifd.gsi =3D 32; - vm_ioctl(vm, KVM_IRQFD, &ifd); - - ifd.fd =3D irq_fd[1]; - ifd.gsi =3D 33; - vm_ioctl(vm, KVM_IRQFD, &ifd); + kvm_assign_irqfd(vm, 32, irq_fd[0]); + kvm_assign_irqfd(vm, 33, irq_fd[1]); =20 struct sigaction sa =3D { }; sa.sa_handler =3D handle_alrm; --=20 2.49.0.1151.ga128411c76-goog From nobody Fri Oct 31 09:41:29 2025 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 64ABB2E339E for ; Thu, 22 May 2025 23:52:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747957978; cv=none; b=kc+TEeIa2Nbb8dp9qLJmR7v8eJymFShkiLUroKAMIjjiLLj8qFjY0NGDQLXYcyPiYQc8/36xd+IMoGgZwNl1McSo0iosbyWVyexMrLglt/n9flYZq7WC92YlEyj+CyaFdgPHZUAoefoQuqmgDxSMSUJ2j6zEXbyQwM/7vlg0ydE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1747957978; c=relaxed/simple; bh=D5aAIRODPQ9yo1mu4fSx9H1qWYbevAdHUxF+P7XNjwY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=mYfraOQc9hob+HOmvIsxP3N6UPvj99HX30v+oCLu1mMKToTEV+RhIUTaAA6JlYpQ3SDrpx85QICxiLB5UzWFBcI3BXdPGzKZOapXWhVuZ6BCxAVPeTyX6oYk3snzca8JIgi/SXA/0qXaybabtcEgKbnFvlMk2lqV9YhTxOiFUuQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=TkymV+Fy; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="TkymV+Fy" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-310c5c2c38cso1493348a91.1 for ; Thu, 22 May 2025 16:52:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747957976; x=1748562776; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=iNuUWcLjakGfkNNM+8DkBjBlwJhpM767OVzB+eUiNIQ=; b=TkymV+Fy4kBbHciiKl2zobYUUwCyg23vkLg1m7wGHJNb1p5DTQynbi109/vLiHpqe2 9SxPjPu9aQw4oyv31ZeEVZ5yqoYWDzrw6O5RF4ZgRuuRDpS+uH3ATwQler7CFU9+85FD 3kTThm5O/qChG+Zy5XOZW4m56gdX8/+a/boStAfKSTmCOhOaXay5F7YIQ7VGjBLfrNLo /Je+uVrmTTIWM9i5nJXW6b07TTP3027qsuuyoBpLtOFkcyuslkv34NTnCCdvwnlKr/5A xMCSZDjcTM9jYgWj+ftwdQXU78CDsAKS0cT+veT/nPyMp6ZWh+b0AhsxP/qxOEZKaoVn FJzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747957976; x=1748562776; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to 
:date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=iNuUWcLjakGfkNNM+8DkBjBlwJhpM767OVzB+eUiNIQ=; b=A8BCEAIItNJ/yt8JZvW0JYYF0dFK5NbBykI0s0nWJqtDJV6uoh732LTSfrgKFgbpdA NNBuQHFwm9dWoZfDcIgOJWNCrEayEOxa0ARlBR9XLpqIRAM9z3rvr8e+nREh9IYAbHUZ GcEmnpHONhIjT2w7OfkCjeetmXwkWHwuOC24LAWrBmmTMHjx9W9trcdkoD6HSoQDS9RT 1poXZJLz2W9kny2nW9/7EDV39+JZPwaTYwecTYOZnFqmQoCWWvTB7RIK8D73Mp3cJ0om W9dATJvaGNS1lGoxNH+6tmNFZxco63cy06rk0erQDXvFaNCXSxyVa5rRx007BiaDbkbK 6OhQ== X-Gm-Message-State: AOJu0Yx/Dm4dWySna5YwbZW8SLiLC/1xdJCGAoFW8FXkgb2Mx5WcXrai nchyQmZCexULWbsP1DyO8RcWpkmzDwnSz9sIaB1jc1XdczgkNM2XdNqpFgd+D7giWUrfoR6h4eu LA5/pBw== X-Google-Smtp-Source: AGHT+IEoOvxAnrgVMqS3TC3RLkVh5nLtBQqPosDJG4xPj/XHWemBkiBPXcS2x1YfUhLo+1j7jd9m06w7XHM= X-Received: from pjbpm5.prod.google.com ([2002:a17:90b:3c45:b0:30a:9720:ea33]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90a:dfc7:b0:30c:5479:c92e with SMTP id 98e67ed59e1d1-30e830c7988mr39961520a91.4.1747957975748; Thu, 22 May 2025 16:52:55 -0700 (PDT) Reply-To: Sean Christopherson Date: Thu, 22 May 2025 16:52:23 -0700 In-Reply-To: <20250522235223.3178519-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20250522235223.3178519-1-seanjc@google.com> X-Mailer: git-send-email 2.49.0.1151.ga128411c76-goog Message-ID: <20250522235223.3178519-14-seanjc@google.com> Subject: [PATCH v3 13/13] KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements From: Sean Christopherson To: "K. Y. 
Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Juergen Gross , Stefano Stabellini , Paolo Bonzini , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Shuah Khan , Marc Zyngier , Oliver Upton , Sean Christopherson Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, xen-devel@lists.xenproject.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, K Prateek Nayak , David Matlack Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add a selftest to verify that eventfd+irqfd bindings are globally unique, i.e. that KVM doesn't allow multiple irqfds to bind to a single eventfd, even across VMs. Signed-off-by: Sean Christopherson Acked-by: Peter Zijlstra (Intel) Tested-by: K Prateek Nayak --- tools/testing/selftests/kvm/Makefile.kvm | 1 + tools/testing/selftests/kvm/irqfd_test.c | 130 +++++++++++++++++++++++ 2 files changed, 131 insertions(+) create mode 100644 tools/testing/selftests/kvm/irqfd_test.c diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selft= ests/kvm/Makefile.kvm index f62b0a5aba35..318adf3ef6b6 100644 --- a/tools/testing/selftests/kvm/Makefile.kvm +++ b/tools/testing/selftests/kvm/Makefile.kvm @@ -54,6 +54,7 @@ TEST_PROGS_x86 +=3D x86/nx_huge_pages_test.sh TEST_GEN_PROGS_COMMON =3D demand_paging_test TEST_GEN_PROGS_COMMON +=3D dirty_log_test TEST_GEN_PROGS_COMMON +=3D guest_print_test +TEST_GEN_PROGS_COMMON +=3D irqfd_test TEST_GEN_PROGS_COMMON +=3D kvm_binary_stats_test TEST_GEN_PROGS_COMMON +=3D kvm_create_max_vcpus TEST_GEN_PROGS_COMMON +=3D kvm_page_table_test diff --git a/tools/testing/selftests/kvm/irqfd_test.c b/tools/testing/selft= ests/kvm/irqfd_test.c new file mode 100644 index 000000000000..286f2b15fde6 --- /dev/null +++ b/tools/testing/selftests/kvm/irqfd_test.c @@ -0,0 +1,130 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include +#include +#include +#include 
+#include + +#include "kvm_util.h" + +static struct kvm_vm *vm1; +static struct kvm_vm *vm2; +static int __eventfd; +static bool done; + +/* + * KVM de-assigns based on eventfd *and* GSI, but requires unique eventfds= when + * assigning (the API isn't symmetrical). Abuse the oddity and use a per-= task + * GSI base to avoid false failures due to cross-task de-assign, i.e. so t= hat + * the secondary doesn't de-assign the primary's eventfd and cause assign = to + * unexpectedly succeed on the primary. + */ +#define GSI_BASE_PRIMARY 0x20 +#define GSI_BASE_SECONDARY 0x30 + +static void juggle_eventfd_secondary(struct kvm_vm *vm, int eventfd) +{ + int r, i; + + /* + * The secondary task can encounter EBADF since the primary can close + * the eventfd at any time. And because the primary can recreate the + * eventfd, at the same fd in the file table, the secondary can also + * encounter "unexpected" success, e.g. if the close+recreate happens + * between the first and second assignments. The secondary's role is + * mostly to antagonize KVM, not to detect bugs. + */ + for (i =3D 0; i < 2; i++) { + r =3D __kvm_irqfd(vm, GSI_BASE_SECONDARY, eventfd, 0); + TEST_ASSERT(!r || errno =3D=3D EBUSY || errno =3D=3D EBADF, + "Wanted success, EBUSY, or EBADF, r =3D %d, errno =3D %d", + r, errno); + + /* De-assign should succeed unless the eventfd was closed. */ + r =3D __kvm_irqfd(vm, GSI_BASE_SECONDARY + i, eventfd, KVM_IRQFD_FLAG_DE= ASSIGN); + TEST_ASSERT(!r || errno =3D=3D EBADF, + "De-assign should succeed unless the fd was closed"); + } +} + +static void *secondary_irqfd_juggler(void *ign) +{ + while (!READ_ONCE(done)) { + juggle_eventfd_secondary(vm1, READ_ONCE(__eventfd)); + juggle_eventfd_secondary(vm2, READ_ONCE(__eventfd)); + } + + return NULL; +} + +static void juggle_eventfd_primary(struct kvm_vm *vm, int eventfd) +{ + int r1, r2; + + /* + * At least one of the assigns should fail.
KVM disallows assigning a + * single eventfd to multiple GSIs (or VMs), so it's possible that both + * assignments can fail, too. + */ + r1 =3D __kvm_irqfd(vm, GSI_BASE_PRIMARY, eventfd, 0); + TEST_ASSERT(!r1 || errno =3D=3D EBUSY, + "Wanted success or EBUSY, r =3D %d, errno =3D %d", r1, errno); + + r2 =3D __kvm_irqfd(vm, GSI_BASE_PRIMARY + 1, eventfd, 0); + TEST_ASSERT(r1 || (r2 && errno =3D=3D EBUSY), + "Wanted failure (EBUSY), r1 =3D %d, r2 =3D %d, errno =3D %d", + r1, r2, errno); + + /* + * De-assign should always succeed, even if the corresponding assign + * failed. + */ + kvm_irqfd(vm, GSI_BASE_PRIMARY, eventfd, KVM_IRQFD_FLAG_DEASSIGN); + kvm_irqfd(vm, GSI_BASE_PRIMARY + 1, eventfd, KVM_IRQFD_FLAG_DEASSIGN); +} + +int main(int argc, char *argv[]) +{ + pthread_t racing_thread; + int r, i; + + /* Create "full" VMs, as KVM_IRQFD requires an in-kernel IRQ chip. */ + vm1 =3D vm_create(1); + vm2 =3D vm_create(1); + + WRITE_ONCE(__eventfd, kvm_new_eventfd()); + + kvm_irqfd(vm1, 10, __eventfd, 0); + + r =3D __kvm_irqfd(vm1, 11, __eventfd, 0); + TEST_ASSERT(r && errno =3D=3D EBUSY, + "Wanted EBUSY, r =3D %d, errno =3D %d", r, errno); + + r =3D __kvm_irqfd(vm2, 12, __eventfd, 0); + TEST_ASSERT(r && errno =3D=3D EBUSY, + "Wanted EBUSY, r =3D %d, errno =3D %d", r, errno); + + kvm_irqfd(vm1, 11, READ_ONCE(__eventfd), KVM_IRQFD_FLAG_DEASSIGN); + kvm_irqfd(vm1, 12, READ_ONCE(__eventfd), KVM_IRQFD_FLAG_DEASSIGN); + kvm_irqfd(vm1, 13, READ_ONCE(__eventfd), KVM_IRQFD_FLAG_DEASSIGN); + kvm_irqfd(vm1, 14, READ_ONCE(__eventfd), KVM_IRQFD_FLAG_DEASSIGN); + kvm_irqfd(vm1, 10, READ_ONCE(__eventfd), KVM_IRQFD_FLAG_DEASSIGN); + + close(__eventfd); + + pthread_create(&racing_thread, NULL, secondary_irqfd_juggler, vm2); + + for (i =3D 0; i < 10000; i++) { + WRITE_ONCE(__eventfd, kvm_new_eventfd()); + + juggle_eventfd_primary(vm1, __eventfd); + juggle_eventfd_primary(vm2, __eventfd); + close(__eventfd); + } + + WRITE_ONCE(done, true); + pthread_join(racing_thread, NULL); +} --=20 
2.49.0.1151.ga128411c76-goog
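
Illustrative note, not part of the patch: the comment in
juggle_eventfd_secondary() relies on a closed eventfd being recreated at
the same slot in the file table. A minimal userspace sketch of that
behavior follows, assuming a single-threaded process with no other fd
activity between the close and the recreate. The helper name
eventfd_number_is_reused() is hypothetical and does not exist in the
selftests; only eventfd(2) and close(2) are real interfaces here.

```c
#include <assert.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only (not a KVM selftest API).
 * Returns 1 if the fd number of a closed eventfd is handed out again by
 * the next eventfd() call, 0 if not, and -1 if eventfd() itself fails.
 */
int eventfd_number_is_reused(void)
{
	int first, second, reused;

	first = eventfd(0, 0);
	if (first < 0)
		return -1;

	/* Free the slot, as the primary task does after each iteration. */
	close(first);

	/* The kernel allocates the lowest free fd number, i.e. the same slot. */
	second = eventfd(0, 0);
	if (second < 0)
		return -1;

	reused = (first == second);
	close(second);
	return reused;
}
```

In the selftest the secondary thread reads __eventfd while the primary
closes and recreates it, so a stale fd number can refer either to nothing
(EBADF) or to a brand-new eventfd (success), which is presumably why the
secondary tolerates both outcomes.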