From: Sicong Huang
To: fei1.li@intel.com
Cc: acrn-dev@lists.projectacrn.org, gregkh@linuxfoundation.org,
	linux-kernel@vger.kernel.org, Sicong Huang
Subject: [PATCH v2 1/1] virt: acrn: Fix irqfd use-after-free during eventfd shutdown
Date: Tue, 19 May 2026 19:20:18 +0800
Message-Id: <20260519112018.2135000-2-congei42@163.com>
In-Reply-To: <20260519112018.2135000-1-congei42@163.com>
References: <20260511135737.2285411-1-congei42@163.com>
	<20260519112018.2135000-1-congei42@163.com>

acrn_irqfd_deassign() and the eventfd EPOLLHUP wakeup can race and free
the same struct hsm_irqfd:

  CPU0                                CPU1
  ----                                ----
  eventfd_release()
    wake_up_poll(EPOLLHUP)
      hsm_irqfd_wakeup()
        queue_work(&irqfd->shutdown)
                                      acrn_irqfd_deassign()
                                        hsm_irqfd_shutdown()
                                          list_del_init()
                                          eventfd_ctx_remove_wait_queue()
                                          eventfd_ctx_put()
                                          kfree(irqfd)
  hsm_irqfd_shutdown_work()
    container_of(work, ..., shutdown)
    irqfd->vm   <-- use-after-free

The deassign path freed the irqfd while a shutdown work item was already
queued by EPOLLHUP (or vice versa), so the work item could resurrect a
dangling pointer through container_of().

Switch to the lifetime model used by KVM irqfds:

  - Deassign/deinit only deactivate the irqfd: remove it from
    vm->irqfds under irqfds_lock and queue the cleanup work.

  - hsm_irqfd_shutdown_work() becomes the sole owner that unhooks the
    eventfd waitqueue entry, drops the eventfd reference and frees the
    irqfd.

  - A new HSM_IRQFD_FLAG_SHUTDOWN bit guarded by test_and_set_bit()
    ensures the cleanup work is queued at most once, no matter how many
    of {EPOLLHUP, deassign, deinit} fire concurrently. This is safe to
    call from the waitqueue callback, which runs with wqh->lock held and
    IRQs disabled and therefore cannot take irqfds_lock.

  - acrn_irqfd_deassign() flushes vm->irqfd_wq before returning so the
    eventfd is fully detached on return. acrn_irqfd_deinit() deactivates
    every irqfd, flushes the workqueue and only then destroys it, so no
    path can queue_work() onto a torn-down workqueue.

  - acrn_irqfd_assign() now installs the eventfd waitqueue entry and
    publishes the irqfd to vm->irqfds under irqfds_lock, so the irqfd is
    never visible to deassign/deinit before its waitqueue entry is in
    place, and any EPOLLHUP that fires in the assign window queues
    cleanup work that blocks on irqfds_lock until publication is done.

Signed-off-by: Sicong Huang
Reviewed-by: Fei Li
---
 drivers/virt/acrn/irqfd.c | 71 ++++++++++++++++++++++++---------------
 1 file changed, 44 insertions(+), 27 deletions(-)

diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
index acf8cd5f8f8c..feeba7eda494 100644
--- a/drivers/virt/acrn/irqfd.c
+++ b/drivers/virt/acrn/irqfd.c
@@ -16,6 +16,9 @@
 
 #include "acrn_drv.h"
 
+/* Cleanup work has been queued; set via test_and_set_bit(). */
+#define HSM_IRQFD_FLAG_SHUTDOWN	0
+
 /**
  * struct hsm_irqfd - Properties of HSM irqfd
  * @vm:		Associated VM pointer
@@ -25,6 +28,7 @@
  * @list:	Entry within &acrn_vm.irqfds of irqfds of a VM
  * @pt:		Structure for select/poll on the associated eventfd
  * @msi:	MSI data
+ * @flags:	Internal lifecycle flags (HSM_IRQFD_FLAG_*)
  */
 struct hsm_irqfd {
 	struct acrn_vm		*vm;
@@ -34,6 +38,7 @@ struct hsm_irqfd {
 	struct list_head	list;
 	poll_table		pt;
 	struct acrn_msi_entry	msi;
+	unsigned long		flags;
 };
 
 static void acrn_irqfd_inject(struct hsm_irqfd *irqfd)
@@ -44,30 +49,29 @@ static void acrn_irqfd_inject(struct hsm_irqfd *irqfd)
 			irqfd->msi.msi_data);
 }
 
-static void hsm_irqfd_shutdown(struct hsm_irqfd *irqfd)
+/* Queue the cleanup work at most once. Safe from atomic context. */
+static void hsm_irqfd_queue_shutdown(struct hsm_irqfd *irqfd)
 {
-	u64 cnt;
-
-	lockdep_assert_held(&irqfd->vm->irqfds_lock);
-
-	/* remove from wait queue */
-	list_del_init(&irqfd->list);
-	eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
-	eventfd_ctx_put(irqfd->eventfd);
-	kfree(irqfd);
+	if (!test_and_set_bit(HSM_IRQFD_FLAG_SHUTDOWN, &irqfd->flags))
+		queue_work(irqfd->vm->irqfd_wq, &irqfd->shutdown);
 }
 
+/* Sole owner of @irqfd: unhook waitqueue, drop eventfd ref, free. */
 static void hsm_irqfd_shutdown_work(struct work_struct *work)
 {
-	struct hsm_irqfd *irqfd;
-	struct acrn_vm *vm;
+	struct hsm_irqfd *irqfd = container_of(work, struct hsm_irqfd,
+					       shutdown);
+	struct acrn_vm *vm = irqfd->vm;
+	u64 cnt;
 
-	irqfd = container_of(work, struct hsm_irqfd, shutdown);
-	vm = irqfd->vm;
 	mutex_lock(&vm->irqfds_lock);
 	if (!list_empty(&irqfd->list))
-		hsm_irqfd_shutdown(irqfd);
+		list_del_init(&irqfd->list);
 	mutex_unlock(&vm->irqfds_lock);
+
+	eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
+	eventfd_ctx_put(irqfd->eventfd);
+	kfree(irqfd);
 }
 
 /* Called with wqh->lock held and interrupts disabled */
@@ -76,17 +80,16 @@ static int hsm_irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode,
 {
 	unsigned long poll_bits = (unsigned long)key;
 	struct hsm_irqfd *irqfd;
-	struct acrn_vm *vm;
 
 	irqfd = container_of(wait, struct hsm_irqfd, wait);
-	vm = irqfd->vm;
+
 	if (poll_bits & POLLIN)
 		/* An event has been signaled, inject an interrupt */
 		acrn_irqfd_inject(irqfd);
 
 	if (poll_bits & POLLHUP)
-		/* Do shutdown work in thread to hold wqh->lock */
-		queue_work(vm->irqfd_wq, &irqfd->shutdown);
+		/* Defer teardown to the cleanup work; can't sleep here. */
+		hsm_irqfd_queue_shutdown(irqfd);
 
 	return 0;
 }
@@ -142,6 +145,12 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 	init_waitqueue_func_entry(&irqfd->wait, hsm_irqfd_wakeup);
 	init_poll_funcptr(&irqfd->pt, hsm_irqfd_poll_func);
 
+	/*
+	 * Hold irqfds_lock across waitqueue install and list_add so the
+	 * irqfd is not visible to deassign/deinit before its waitqueue
+	 * entry is in place, and any racing EPOLLHUP cleanup work blocks
+	 * on irqfds_lock until publication completes.
+	 */
 	mutex_lock(&vm->irqfds_lock);
 	list_for_each_entry(tmp, &vm->irqfds, list) {
 		if (irqfd->eventfd != tmp->eventfd)
@@ -150,14 +159,12 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
 		mutex_unlock(&vm->irqfds_lock);
 		goto fail;
 	}
-	list_add_tail(&irqfd->list, &vm->irqfds);
-	mutex_unlock(&vm->irqfds_lock);
 
-	/* Check the pending event in this stage */
 	events = vfs_poll(fd_file(f), &irqfd->pt);
-
+	list_add_tail(&irqfd->list, &vm->irqfds);
 	if (events & EPOLLIN)
 		acrn_irqfd_inject(irqfd);
+	mutex_unlock(&vm->irqfds_lock);
 
 	return 0;
 fail:
@@ -180,13 +187,17 @@ static int acrn_irqfd_deassign(struct acrn_vm *vm,
 	mutex_lock(&vm->irqfds_lock);
 	list_for_each_entry_safe(irqfd, tmp, &vm->irqfds, list) {
 		if (irqfd->eventfd == eventfd) {
-			hsm_irqfd_shutdown(irqfd);
+			list_del_init(&irqfd->list);
+			hsm_irqfd_queue_shutdown(irqfd);
 			break;
 		}
 	}
 	mutex_unlock(&vm->irqfds_lock);
 	eventfd_ctx_put(eventfd);
 
+	/* Wait for cleanup work to finish so the eventfd is fully detached. */
+	flush_workqueue(vm->irqfd_wq);
+
 	return 0;
 }
 
@@ -219,9 +230,15 @@ void acrn_irqfd_deinit(struct acrn_vm *vm)
 	struct hsm_irqfd *irqfd, *next;
 
 	dev_dbg(acrn_dev.this_device, "VM %u irqfd deinit.\n", vm->vmid);
-	destroy_workqueue(vm->irqfd_wq);
+
 	mutex_lock(&vm->irqfds_lock);
-	list_for_each_entry_safe(irqfd, next, &vm->irqfds, list)
-		hsm_irqfd_shutdown(irqfd);
+	list_for_each_entry_safe(irqfd, next, &vm->irqfds, list) {
+		list_del_init(&irqfd->list);
+		hsm_irqfd_queue_shutdown(irqfd);
+	}
 	mutex_unlock(&vm->irqfds_lock);
+
+	/* Drain all cleanup work before tearing the workqueue down. */
+	flush_workqueue(vm->irqfd_wq);
+	destroy_workqueue(vm->irqfd_wq);
 }
-- 
2.34.1
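
Note (illustration only, not part of the patch): the "queued at most
once" guarantee described in the changelog can be modelled in userspace.
The sketch below is a rough analogue under stated assumptions: it uses
C11 atomic_fetch_or() in place of the kernel's test_and_set_bit(), a
plain counter in place of queue_work(), and every model_* name and
MODEL_FLAG_SHUTDOWN is hypothetical and does not exist in the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for HSM_IRQFD_FLAG_SHUTDOWN. */
#define MODEL_FLAG_SHUTDOWN 0

/* Hypothetical userspace model of struct hsm_irqfd; not the driver type. */
struct model_irqfd {
	atomic_ulong flags;	/* lifecycle bits */
	int queued;		/* how many times cleanup was "queued" */
};

/* Userspace analogue of test_and_set_bit(): set the bit, return old value. */
static bool model_test_and_set_bit(unsigned int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Only the caller that flips the bit from 0 to 1 queues the cleanup. */
static bool model_queue_shutdown(struct model_irqfd *irqfd)
{
	if (model_test_and_set_bit(MODEL_FLAG_SHUTDOWN, &irqfd->flags))
		return false;	/* cleanup was already queued by another path */

	irqfd->queued++;	/* stands in for queue_work() */
	return true;
}

/* Fire the three triggers (EPOLLHUP, deassign, deinit) back to back. */
static int model_fire_all_triggers(void)
{
	struct model_irqfd irqfd;

	atomic_init(&irqfd.flags, 0);
	irqfd.queued = 0;

	model_queue_shutdown(&irqfd);	/* EPOLLHUP wakeup path */
	model_queue_shutdown(&irqfd);	/* deassign path */
	model_queue_shutdown(&irqfd);	/* deinit path */

	assert(irqfd.queued <= 1);
	return irqfd.queued;
}
```

In this model only the first caller wins, whichever path it is. The
sketch runs the triggers sequentially and says nothing about the
workqueue flush ordering or the irqfds_lock publication rule, which the
patch handles separately.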