From: Jens Axboe
To: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, peterz@infradead.org, Jens Axboe
Subject: [PATCH 7/7] io_uring: add futex waitv
Date: Tue, 11 Jul 2023 18:47:05 -0600
Message-Id: <20230712004705.316157-8-axboe@kernel.dk>
In-Reply-To: <20230712004705.316157-1-axboe@kernel.dk>
References: <20230712004705.316157-1-axboe@kernel.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

Needs a bit of splitting and a few hunks should go further back (like the
wake handler typedef).

WIP, adds IORING_OP_FUTEX_WAITV - pass in an array of futex addresses, and
wait on all of them until one of them triggers.

Signed-off-by: Jens Axboe
---
 include/uapi/linux/io_uring.h |   1 +
 io_uring/futex.c              | 165 +++++++++++++++++++++++++++++++---
 io_uring/futex.h              |   2 +
 io_uring/opdef.c              |  11 +++
 4 files changed, 169 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 3bd2d765f593..420f38675769 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -238,6 +238,7 @@ enum io_uring_op {
 	IORING_OP_SENDMSG_ZC,
 	IORING_OP_FUTEX_WAIT,
 	IORING_OP_FUTEX_WAKE,
+	IORING_OP_FUTEX_WAITV,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/futex.c b/io_uring/futex.c
index ff0f6b394756..b22120545d31 100644
--- a/io_uring/futex.c
+++ b/io_uring/futex.c
@@ -14,11 +14,16 @@
 
 struct io_futex {
 	struct file *file;
-	u32 __user *uaddr;
+	union {
+		u32 __user *uaddr;
+		struct futex_waitv __user *uwaitv;
+	};
 	int futex_op;
 	unsigned int futex_val;
 	unsigned int futex_flags;
 	unsigned int futex_mask;
+	unsigned int futex_nr;
+	unsigned long futexv_owned;
 };
 
 struct io_futex_data {
@@ -45,6 +50,13 @@ void io_futex_cache_free(struct io_ring_ctx *ctx)
 	io_alloc_cache_free(&ctx->futex_cache, io_futex_cache_entry_free);
 }
 
+static void __io_futex_complete(struct io_kiocb *req, struct io_tw_state *ts)
+{
+	req->async_data = NULL;
+	hlist_del_init(&req->hash_node);
+	io_req_task_complete(req, ts);
+}
+
 static void io_futex_complete(struct io_kiocb *req, struct io_tw_state *ts)
 {
 	struct io_futex_data *ifd = req->async_data;
@@ -53,22 +65,59 @@ static void io_futex_complete(struct io_kiocb *req, struct io_tw_state *ts)
 	io_tw_lock(ctx, ts);
 	if (!io_alloc_cache_put(&ctx->futex_cache, &ifd->cache))
 		kfree(ifd);
-	req->async_data = NULL;
-	hlist_del_init(&req->hash_node);
-	io_req_task_complete(req, ts);
+	__io_futex_complete(req, ts);
 }
 
-static bool __io_futex_cancel(struct io_ring_ctx *ctx, struct io_kiocb *req)
+static void io_futexv_complete(struct io_kiocb *req, struct io_tw_state *ts)
 {
-	struct io_futex_data *ifd = req->async_data;
+	struct io_futex *iof = io_kiocb_to_cmd(req, struct io_futex);
+	struct futex_vector *futexv = req->async_data;
+	struct io_ring_ctx *ctx = req->ctx;
+	int res = 0;
 
-	/* futex wake already done or in progress */
-	if (!futex_unqueue(&ifd->q))
+	io_tw_lock(ctx, ts);
+
+	res = futex_unqueue_multiple(futexv, iof->futex_nr);
+	if (res != -1)
+		io_req_set_res(req, res, 0);
+
+	kfree(req->async_data);
+	req->flags &= ~REQ_F_ASYNC_DATA;
+	__io_futex_complete(req, ts);
+}
+
+static bool io_futexv_claimed(struct io_futex *iof)
+{
+	return test_bit(0, &iof->futexv_owned);
+}
+
+static bool io_futexv_claim(struct io_futex *iof)
+{
+	if (test_bit(0, &iof->futexv_owned) ||
+	    test_and_set_bit(0, &iof->futexv_owned))
 		return false;
+	return true;
+}
+
+static bool __io_futex_cancel(struct io_ring_ctx *ctx, struct io_kiocb *req)
+{
+	/* futex wake already done or in progress */
+	if (req->opcode == IORING_OP_FUTEX_WAIT) {
+		struct io_futex_data *ifd = req->async_data;
+
+		if (!futex_unqueue(&ifd->q))
+			return false;
+		req->io_task_work.func = io_futex_complete;
+	} else {
+		struct io_futex *iof = io_kiocb_to_cmd(req, struct io_futex);
+
+		if (!io_futexv_claim(iof))
+			return false;
+		req->io_task_work.func = io_futexv_complete;
+	}
 
 	hlist_del_init(&req->hash_node);
 	io_req_set_res(req, -ECANCELED, 0);
-	req->io_task_work.func = io_futex_complete;
 	io_req_task_work_add(req);
 	return true;
 }
@@ -124,7 +173,7 @@ int io_futex_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_futex *iof = io_kiocb_to_cmd(req, struct io_futex);
 
-	if (unlikely(sqe->addr2 || sqe->buf_index || sqe->addr3))
+	if (unlikely(sqe->buf_index || sqe->addr3))
 		return -EINVAL;
 
 	iof->futex_op = READ_ONCE(sqe->fd);
@@ -135,6 +184,53 @@ int io_futex_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (iof->futex_flags & FUTEX_CMD_MASK)
 		return -EINVAL;
 
+	iof->futexv_owned = 0;
+	return 0;
+}
+
+static void io_futex_wakev_fn(struct wake_q_head *wake_q, struct futex_q *q)
+{
+	struct io_kiocb *req = q->wake_data;
+	struct io_futex *iof = io_kiocb_to_cmd(req, struct io_futex);
+
+	if (!io_futexv_claim(iof))
+		return;
+
+	__futex_unqueue(q);
+	smp_store_release(&q->lock_ptr, NULL);
+
+	io_req_set_res(req, 0, 0);
+	req->io_task_work.func = io_futexv_complete;
+	io_req_task_work_add(req);
+}
+
+int io_futexv_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_futex *iof = io_kiocb_to_cmd(req, struct io_futex);
+	struct futex_vector *futexv;
+	int ret;
+
+	ret = io_futex_prep(req, sqe);
+	if (ret)
+		return ret;
+
+	iof->futex_nr = READ_ONCE(sqe->off);
+	if (!iof->futex_nr || iof->futex_nr > FUTEX_WAITV_MAX)
+		return -EINVAL;
+
+	futexv = kcalloc(iof->futex_nr, sizeof(*futexv), GFP_KERNEL);
+	if (!futexv)
+		return -ENOMEM;
+
+	ret = futex_parse_waitv(futexv, iof->uwaitv, iof->futex_nr,
+				io_futex_wakev_fn, req);
+	if (ret) {
+		kfree(futexv);
+		return ret;
+	}
+
+	req->flags |= REQ_F_ASYNC_DATA;
+	req->async_data = futexv;
 	return 0;
 }
 
@@ -162,6 +258,55 @@ static struct io_futex_data *io_alloc_ifd(struct io_ring_ctx *ctx)
 	return kmalloc(sizeof(struct io_futex_data), GFP_NOWAIT);
 }
 
+int io_futex_waitv(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_futex *iof = io_kiocb_to_cmd(req, struct io_futex);
+	struct futex_vector *futexv = req->async_data;
+	struct io_ring_ctx *ctx = req->ctx;
+	int ret, woken = -1;
+
+	io_ring_submit_lock(ctx, issue_flags);
+
+	ret = futex_wait_multiple_setup(futexv, iof->futex_nr, &woken);
+
+	/*
+	 * The above call leaves us potentially non-running. This is fine
+	 * for the sync syscall as it'll be blocking unless we already got
+	 * one of the futexes woken, but it obviously won't work for an async
+	 * invocation. Mark us runnable again.
+	 */
+	__set_current_state(TASK_RUNNING);
+
+	/*
+	 * We got woken while setting up, let that side do the completion
+	 */
+	if (io_futexv_claimed(iof)) {
+skip:
+		io_ring_submit_unlock(ctx, issue_flags);
+		return IOU_ISSUE_SKIP_COMPLETE;
+	}
+
+	/*
+	 * 0 return means that we successfully set up the waiters, and that
+	 * nobody triggered a wakeup while we were doing so. < 0 or 1 return
+	 * is either an error or we got a wakeup while setting up.
+	 */
+	if (!ret) {
+		hlist_add_head(&req->hash_node, &ctx->futex_list);
+		goto skip;
+	}
+
+	io_ring_submit_unlock(ctx, issue_flags);
+	if (ret < 0)
+		req_set_fail(req);
+	else if (woken != -1)
+		ret = woken;
+	io_req_set_res(req, ret, 0);
+	kfree(futexv);
+	req->flags &= ~REQ_F_ASYNC_DATA;
+	return IOU_OK;
+}
+
 int io_futex_wait(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_futex *iof = io_kiocb_to_cmd(req, struct io_futex);
diff --git a/io_uring/futex.h b/io_uring/futex.h
index ddc9e0d73c52..7828e27e4184 100644
--- a/io_uring/futex.h
+++ b/io_uring/futex.h
@@ -3,7 +3,9 @@
 #include "cancel.h"
 
 int io_futex_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_futexv_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_futex_wait(struct io_kiocb *req, unsigned int issue_flags);
+int io_futex_waitv(struct io_kiocb *req, unsigned int issue_flags);
 int io_futex_wake(struct io_kiocb *req, unsigned int issue_flags);
 
 #if defined(CONFIG_FUTEX)
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index c9f23c21a031..2034acfe10d0 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -443,6 +443,14 @@ const struct io_issue_def io_issue_defs[] = {
 		.issue = io_futex_wake,
 #else
 		.prep = io_eopnotsupp_prep,
+#endif
+	},
+	[IORING_OP_FUTEX_WAITV] = {
+#if defined(CONFIG_FUTEX)
+		.prep = io_futexv_prep,
+		.issue = io_futex_waitv,
+#else
+		.prep = io_eopnotsupp_prep,
 #endif
 	},
 };
@@ -670,6 +678,9 @@ const struct io_cold_def io_cold_defs[] = {
 	[IORING_OP_FUTEX_WAKE] = {
 		.name = "FUTEX_WAKE",
 	},
+	[IORING_OP_FUTEX_WAITV] = {
+		.name = "FUTEX_WAITV",
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)
-- 
2.40.1
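
A note on the claim bit used in the patch above: io_futexv_claim() is what
lets exactly one of the wake callback and the cancel path take ownership of
a waitv request. The following is a minimal userspace sketch of that
pattern using C11 atomics. It is not kernel code and not part of the patch;
struct claim_word and the claim_*() helpers are hypothetical names made up
for illustration, standing in for iof->futexv_owned, io_futexv_claimed()
and io_futexv_claim().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the futexv_owned word in struct io_futex. */
struct claim_word {
	atomic_ulong owned;
};

void claim_init(struct claim_word *cw)
{
	atomic_init(&cw->owned, 0UL);
}

/* Counterpart of io_futexv_claimed(): has bit 0 been set yet? */
bool claim_taken(struct claim_word *cw)
{
	return (atomic_load(&cw->owned) & 1UL) != 0;
}

/*
 * Counterpart of io_futexv_claim(): a plain load first, so that late
 * callers skip the read-modify-write, then an atomic fetch-or. Only the
 * caller that flips bit 0 from 0 to 1 gets true.
 */
bool claim_try(struct claim_word *cw)
{
	if (atomic_load(&cw->owned) & 1UL)
		return false;
	return (atomic_fetch_or(&cw->owned, 1UL) & 1UL) == 0;
}

/*
 * Single-threaded walk through the ordering where the wake side gets
 * there first and a cancel arrives later. Returns the number of winners.
 */
int claim_demo(void)
{
	struct claim_word cw;
	int winners = 0;

	claim_init(&cw);
	assert(!claim_taken(&cw));

	if (claim_try(&cw))	/* wake callback */
		winners++;
	if (claim_try(&cw))	/* cancel path, loses */
		winners++;

	assert(claim_taken(&cw));
	return winners;
}
```

Whichever side loses the claim simply backs off, which is why both
io_futex_wakev_fn() and the waitv branch of __io_futex_cancel() return
early when io_futexv_claim() fails.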