From nobody Wed Jun 17 03:57:17 2026 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFDC2346E55; Wed, 22 Apr 2026 11:29:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.145.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776857362; cv=none; b=b5xV4CGIVJDcqfawXCUK/I1CDEaVxAlHqcFUaghkja46FcOewhrxd+MKT3TAMgn9LmssekPTVtoZreGmoSOjcP1mQWaNEb/AVN6gxwRETRWI4oYKCEb5LxQlX9WX019xtemRpikQCs4WDOcxrhZyKdoIPXYpuuKe+I3AsizfWKY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776857362; c=relaxed/simple; bh=L/CCGxjelU6Txzv7CCUM65il82JaXXFecSyF45RCBds=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=hfFYM3reiuFUarczyPBv9mw/mjUuNPjPisEK0U3WN7utU6HWbSuGoP2jyHAlHe7Y4gB1fgby69TzHVanCYdhGdFe3eaizH4ASM56kT2Ex1YVItMlnL2RbDt7SXjze1egpJUKaHvikNzVjYXJE44jIlbZay0UtS1p1oLh/xn6WSA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=UM1Zouaj; arc=none smtp.client-ip=67.231.145.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="UM1Zouaj" Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 63LIe1in098496; Wed, 22 Apr 2026 04:29:12 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to 
:message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=5y0psTD6dFRi3Ku/PRitCdFv6ppKOmOvN+Pgt7QkhVw=; b=UM1Zouajk/G/ Rf/2PIjQNXbBPR9J6MwfEI1/FZ8o3AYSy3iaywa0PGMHfcJbfUFdvYOSVWufrx50 d5FZ0R4Cg1RdmJ7JsDbzrDdigOtzXUObecEb8W8WHh8iVAEpgUmjwo1ulmK6qhWv T9gQ/yJSfpQWzK5wc+1aCa3fbeigETgnyoJR/wcN/Z3kaNpQWX3eXGPMILEdvzKK kBcWFY/0jAVcHziTish48ox4q3cayRBfZWYiAo+27qy24izd0bXeNtRmoegSdsCc v90KTEnjaKQqZBn/EmZtifX3ZBlHHFNZrPIJOKWRQvZ9RFmNvxP8S1HxO0JitKNB q3NXLqIsdQ== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4dpeq84p4p-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Wed, 22 Apr 2026 04:29:11 -0700 (PDT) Received: from localhost (2620:10d:c0a8:1c::1b) by mail.thefacebook.com (2620:10d:c0a9:6f::8fd4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.37; Wed, 22 Apr 2026 11:29:09 +0000 From: =?UTF-8?q?Cl=C3=A9ment=20L=C3=A9ger?= To: , Pavel Begunkov , "Jens Axboe" CC: , , , , "David S. 
Miller" , Eric Dumazet , "Jakub Kicinski" , Paolo Abeni , Simon Horman , Jonathan Corbet , Shuah Khan , Vishwanath Seshagiri , "Vishwanath Seshagiri" Subject: [PATCH 1/5] io_uring/zcrx: notify user when out of buffers Date: Wed, 22 Apr 2026 04:25:12 -0700 Message-ID: <20260422112522.3316660-2-cleger@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422112522.3316660-1-cleger@meta.com> References: <20260422112522.3316660-1-cleger@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: wmxlRId0whWjnPbwkrRVbJLeS6ZWv84o X-Authority-Analysis: v=2.4 cv=X4pi7mTe c=1 sm=1 tr=0 ts=69e8b107 cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=A5OVakUREuEA:10 a=M51BFTxLslgA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=tpM8CJlwf7uhpglF1g9U:22 a=pGLkceISAAAA:8 a=VabnemYjAAAA:8 a=JHpVwyOmdC1rrBdYOEsA:9 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNDIyMDExMCBTYWx0ZWRfX0QN2pJgCf7Ix 6kDzsK0sufNWYxxPfKfHdyaJEpwK5rsNtZ48pwOyKoudMm9+JJEDZnAeZjhcEEqRzeSYiwxsNVo CSarDKBwuNb/3jdsIVp19lbIB3ijFXstxsRsKGFqTBhHu2JU4gHtHIQgWmTKZGukuM7lf+it+JR J8+kgZjiNroxAtu549nxNrmY74QsDWRuvCBOS5hicuudx3V6Mevx+urc9jQ4aNq8JXBRipuK3Z8 l1/m7I9ah0aDlNFVb111wXQEDQAnfvhVw84+pxcUyUQOYd9EJ5AoFV2wS/rZy1TIRUp05vSi2xh 5Q42JSxCR2WIGHrkMhnrYACC35JaK2VUhpvzk2n/y/So3dK5eUrcIWgp4FWZBu6IuooncquckQm IbuDJgRo8NoPro/5EvPZXdYGV4gkTie8b3y30Y+afcQsxLP1Lo0klhjw4XJh9LfClnvKTkV+KGp e1we6K4ZKFNuTf5N6/A== X-Proofpoint-ORIG-GUID: wmxlRId0whWjnPbwkrRVbJLeS6ZWv84o X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-04-22_01,2026-04-21_02,2025-10-01_01 Content-Type: text/plain; charset="utf-8" From: Pavel Begunkov There are currently no easy ways for the user to know if zcrx is out of buffers and page pool fails to allocate. 
Add uapi for zcrx to communicate it back. It's implemented as a separate CQE, which for now is posted to the creator ctx. To use it, user space passes an instance of struct zcrx_notification_desc on registration, which tells the kernel the user_data for the resulting CQEs and which event types are expected / allowed. When an allowed event happens, zcrx posts a CQE containing the specified user_data, with the lower bits of cqe->res set to the event mask. Before the kernel can post another notification of a given type, the user needs to acknowledge that it processed the previous one by issuing IORING_REGISTER_ZCRX_CTRL with ZCRX_CTRL_ARM_NOTIFICATION. The only notification type this patch implements so far is ZCRX_NOTIF_NO_BUFFERS. The next commit adds copy fallback signaling. Co-developed-by: Vishwanath Seshagiri Signed-off-by: Vishwanath Seshagiri Signed-off-by: Pavel Begunkov --- include/uapi/linux/io_uring/zcrx.h | 22 ++++++- io_uring/zcrx.c | 98 +++++++++++++++++++++++++++++- io_uring/zcrx.h | 11 +++- 3 files changed, 128 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/io_uring/zcrx.h b/include/uapi/linux/io_uri= ng/zcrx.h index 5ce02c7a6096..b8596d7d47b6 100644 --- a/include/uapi/linux/io_uring/zcrx.h +++ b/include/uapi/linux/io_uring/zcrx.h @@ -65,6 +65,18 @@ enum zcrx_features { * value in struct io_uring_zcrx_ifq_reg::rx_buf_len. 
*/ ZCRX_FEATURE_RX_PAGE_SIZE =3D 1 << 0, + ZCRX_FEATURE_NOTIFICATION =3D 1 << 1, +}; + +enum zcrx_notification_type { + ZCRX_NOTIF_NO_BUFFERS =3D 1 << 0, +}; + +struct zcrx_notification_desc { + __u64 user_data; + __u32 type_mask; + __u32 __resv1; + __u64 __resv2[10]; }; =20 /* @@ -82,12 +94,14 @@ struct io_uring_zcrx_ifq_reg { struct io_uring_zcrx_offsets offsets; __u32 zcrx_id; __u32 rx_buf_len; - __u64 __resv[3]; + __u64 notif_desc; /* see struct zcrx_notification_desc */ + __u64 __resv[2]; }; =20 enum zcrx_ctrl_op { ZCRX_CTRL_FLUSH_RQ, ZCRX_CTRL_EXPORT, + ZCRX_CTRL_ARM_NOTIFICATION, =20 __ZCRX_CTRL_LAST, }; @@ -101,6 +115,11 @@ struct zcrx_ctrl_export { __u32 __resv1[11]; }; =20 +struct zcrx_ctrl_arm_notif { + __u32 type_mask; + __u32 __resv[11]; +}; + struct zcrx_ctrl { __u32 zcrx_id; __u32 op; /* see enum zcrx_ctrl_op */ @@ -109,6 +128,7 @@ struct zcrx_ctrl { union { struct zcrx_ctrl_export zc_export; struct zcrx_ctrl_flush_rq zc_flush; + struct zcrx_ctrl_arm_notif zc_arm_notif; }; }; =20 diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 9a83d7eb4210..35ca28cb6583 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -44,6 +44,16 @@ static inline struct io_zcrx_area *io_zcrx_iov_to_area(c= onst struct net_iov *nio return container_of(owner, struct io_zcrx_area, nia); } =20 +static bool zcrx_set_ring_ctx(struct io_zcrx_ifq *zcrx, struct io_ring_ctx= *ctx) +{ + guard(spinlock_bh)(&zcrx->ctx_lock); + if (zcrx->master_ctx) + return false; + percpu_ref_get(&ctx->refs); + zcrx->master_ctx =3D ctx; + return true; +} + static inline struct page *io_zcrx_iov_page(const struct net_iov *niov) { struct io_zcrx_area *area =3D io_zcrx_iov_to_area(niov); @@ -531,6 +541,7 @@ static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_= ring_ctx *ctx) =20 ifq->if_rxq =3D -1; spin_lock_init(&ifq->rq.lock); + spin_lock_init(&ifq->ctx_lock); mutex_init(&ifq->pp_lock); refcount_set(&ifq->refs, 1); refcount_set(&ifq->user_refs, 1); @@ -585,6 +596,11 @@ static void 
io_zcrx_ifq_free(struct io_zcrx_ifq *ifq) if (ifq->dev) put_device(ifq->dev); =20 + scoped_guard(spinlock_bh, &ifq->ctx_lock) { + if (ifq->master_ctx) + percpu_ref_put(&ifq->master_ctx->refs); + } + io_free_rbuf_ring(ifq); mutex_destroy(&ifq->pp_lock); kfree(ifq); @@ -738,6 +754,8 @@ static int import_zcrx(struct io_ring_ctx *ctx, return -EINVAL; if (reg->if_rxq || reg->rq_entries || reg->area_ptr || reg->region_ptr) return -EINVAL; + if (reg->notif_desc) + return -EINVAL; if (reg->flags & ~ZCRX_REG_IMPORT) return -EINVAL; =20 @@ -826,6 +844,7 @@ static int zcrx_register_netdev(struct io_zcrx_ifq *ifq, int io_register_zcrx(struct io_ring_ctx *ctx, struct io_uring_zcrx_ifq_reg __user *arg) { + struct zcrx_notification_desc notif; struct io_uring_zcrx_area_reg area; struct io_uring_zcrx_ifq_reg reg; struct io_uring_region_desc rd; @@ -869,10 +888,22 @@ int io_register_zcrx(struct io_ring_ctx *ctx, if (copy_from_user(&area, u64_to_user_ptr(reg.area_ptr), sizeof(area))) return -EFAULT; =20 + memset(&notif, 0, sizeof(notif)); + if (reg.notif_desc && copy_from_user(&notif, u64_to_user_ptr(reg.notif_de= sc), + sizeof(notif))) + return -EFAULT; + if (notif.type_mask & ~ZCRX_NOTIF_TYPE_MASK) + return -EINVAL; + if (notif.__resv1 || !mem_is_zero(&notif.__resv2, sizeof(notif.__resv2))) + return -EINVAL; + ifq =3D io_zcrx_ifq_alloc(ctx); if (!ifq) return -ENOMEM; =20 + ifq->notif_data =3D notif.user_data; + ifq->allowed_notif_mask =3D notif.type_mask; + if (ctx->user) { get_uid(ctx->user); ifq->user =3D ctx->user; @@ -923,6 +954,9 @@ int io_register_zcrx(struct io_ring_ctx *ctx, ret =3D -EFAULT; goto err; } + + if (notif.type_mask) + zcrx_set_ring_ctx(ifq, ctx); return 0; err: scoped_guard(mutex, &ctx->mmap_lock) @@ -1089,6 +1123,46 @@ static unsigned io_zcrx_refill_slow(struct page_pool= *pp, struct io_zcrx_ifq *if return allocated; } =20 +static void zcrx_notif_tw(struct io_tw_req tw_req, io_tw_token_t tw) +{ + struct io_kiocb *req =3D tw_req.req; + struct io_ring_ctx *ctx =3D 
req->ctx; + + io_post_aux_cqe(ctx, req->cqe.user_data, req->cqe.res, 0); + percpu_ref_put(&ctx->refs); + kfree_rcu(req, rcu_head); +} + +static void zcrx_send_notif(struct io_zcrx_ifq *ifq, u32 type_mask) +{ + gfp_t gfp =3D GFP_ATOMIC | __GFP_NOWARN | __GFP_ZERO; + struct io_kiocb *req; + + if (!(type_mask & ifq->allowed_notif_mask)) + return; + + guard(spinlock_bh)(&ifq->ctx_lock); + if (!ifq->master_ctx) + return; + if (type_mask & ifq->fired_notifs) + return; + + req =3D kmem_cache_alloc(req_cachep, gfp); + if (unlikely(!req)) + return; + + ifq->fired_notifs |=3D type_mask; + + req->opcode =3D IORING_OP_NOP; + req->cqe.user_data =3D ifq->notif_data; + req->cqe.res =3D type_mask; + req->ctx =3D ifq->master_ctx; + percpu_ref_get(&req->ctx->refs); + req->tctx =3D NULL; + req->io_task_work.func =3D zcrx_notif_tw; + io_req_task_work_add(req); +} + static netmem_ref io_pp_zc_alloc_netmems(struct page_pool *pp, gfp_t gfp) { struct io_zcrx_ifq *ifq =3D io_pp_to_ifq(pp); @@ -1105,8 +1179,10 @@ static netmem_ref io_pp_zc_alloc_netmems(struct page= _pool *pp, gfp_t gfp) goto out_return; =20 allocated =3D io_zcrx_refill_slow(pp, ifq, netmems, to_alloc); - if (!allocated) + if (!allocated) { + zcrx_send_notif(ifq, ZCRX_NOTIF_NO_BUFFERS); return 0; + } out_return: zcrx_sync_for_device(pp, ifq, netmems, allocated); allocated--; @@ -1255,12 +1331,30 @@ static int zcrx_flush_rq(struct io_ring_ctx *ctx, s= truct io_zcrx_ifq *zcrx, return 0; } =20 +static int zcrx_arm_notif(struct io_ring_ctx *ctx, struct io_zcrx_ifq *zcr= x, + struct zcrx_ctrl *ctrl) +{ + const struct zcrx_ctrl_arm_notif *an =3D &ctrl->zc_arm_notif; + + if (an->type_mask & ~ZCRX_NOTIF_TYPE_MASK) + return -EINVAL; + if (!mem_is_zero(&an->__resv, sizeof(an->__resv))) + return -EINVAL; + + guard(spinlock_bh)(&zcrx->ctx_lock); + if (an->type_mask & ~zcrx->fired_notifs) + return -EINVAL; + zcrx->fired_notifs &=3D ~an->type_mask; + return 0; +} + int io_zcrx_ctrl(struct io_ring_ctx *ctx, void __user *arg, unsigned 
nr_ar= gs) { struct zcrx_ctrl ctrl; struct io_zcrx_ifq *zcrx; =20 BUILD_BUG_ON(sizeof(ctrl.zc_export) !=3D sizeof(ctrl.zc_flush)); + BUILD_BUG_ON(sizeof(ctrl.zc_export) !=3D sizeof(ctrl.zc_arm_notif)); =20 if (nr_args) return -EINVAL; @@ -1278,6 +1372,8 @@ int io_zcrx_ctrl(struct io_ring_ctx *ctx, void __user= *arg, unsigned nr_args) return zcrx_flush_rq(ctx, zcrx, &ctrl); case ZCRX_CTRL_EXPORT: return zcrx_export(ctx, zcrx, &ctrl, arg); + case ZCRX_CTRL_ARM_NOTIFICATION: + return zcrx_arm_notif(ctx, zcrx, &ctrl); } =20 return -EOPNOTSUPP; diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index 75e0a4e6ef6e..3ddebed06d57 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -9,7 +9,9 @@ #include =20 #define ZCRX_SUPPORTED_REG_FLAGS (ZCRX_REG_IMPORT | ZCRX_REG_NODEV) -#define ZCRX_FEATURES (ZCRX_FEATURE_RX_PAGE_SIZE) +#define ZCRX_FEATURES (ZCRX_FEATURE_RX_PAGE_SIZE |\ + ZCRX_FEATURE_NOTIFICATION) +#define ZCRX_NOTIF_TYPE_MASK (ZCRX_NOTIF_NO_BUFFERS) =20 struct io_zcrx_mem { unsigned long size; @@ -72,6 +74,13 @@ struct io_zcrx_ifq { */ struct mutex pp_lock; struct io_mapped_region rq_region; + + /* Locks the access to notification context data */ + spinlock_t ctx_lock; + struct io_ring_ctx *master_ctx; + u32 allowed_notif_mask; + u32 fired_notifs; + u64 notif_data; }; =20 #if defined(CONFIG_IO_URING_ZCRX) --=20 2.52.0 From nobody Wed Jun 17 03:57:17 2026 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A07693CF67A; Wed, 22 Apr 2026 11:29:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776857372; cv=none; 
b=fs0Wl/MdPMZRxYYhncGRM4sZjYB5N2FL/bEUkAs08wo7w0COWIrsjxJMnpdFroxIqFoKRSip5OjLZYBErSTl23Cf62E57GjluMG7nz4hwHuqPorNDyHJOrW1K0EkPgrrwaEj/qNuaELRIOXKbo0AwcnxyilBW2NMa+AAeP7b9RY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776857372; c=relaxed/simple; bh=8Yu9AhycblC0jgrvSU9P5Iijhr7hjgqaVjh7IVp+nBM=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=WofRxT+tw8QrwWIuHt/yzMVfXlhXhe16AFD2gvSXa3Yu57qG/sbe1I0w3MsbTh5fFwr5ZuEO8cVPusBzL5WzBzplKk+BoAMEH+TIJFDY8A+S4oPknWHqRl/3wZpCdPP1gkhCp4aIiUvdd7vM5blR7dLuOzk8KUTbNmLjR/6/DWs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=Tg6eEF7F; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="Tg6eEF7F" Received: from pps.filterd (m0528006.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 63LIbdCc460608; Wed, 22 Apr 2026 04:29:21 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=PI2w7VIQwHzslKjhFhmPmWwQ5wHNLy5oscyWVJBbvz0=; b=Tg6eEF7Fo4by aHUzi4rNFxTUwdJcP6SZTtAqPt243jDRSc3GEt/VfhRsoWAoYvogio0y+L2vmP1M 4tcLKen1zQ6dGaxT9uUqMzN8aJh3dT8SSMW0cg0OEM7pLWptRb3HU7IMtFtp6m72 FK46de5CNOEMACtJ8/la8GT1q7ZLri9Ao54WjY9WD2oQAPKicO8HAddJsM70efAH 62cLr8qZbia5wAc1HWtPn2glPIK+VxkZw3siud1Hs+0DiZM/OBQv489qo3uAACNX zWt9FITZzArQrAinQf4wid6FUk9H12whXG+doYSErlok/uS2u+1oZ1FNjnxRw1CJ 881ZOtj2vg== Received: from maileast.thefacebook.com 
([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4dpep9mnf5-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Wed, 22 Apr 2026 04:29:20 -0700 (PDT) Received: from localhost (2620:10d:c0a8:fe::f072) by mail.thefacebook.com (2620:10d:c0a9:6f::8fd4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.37; Wed, 22 Apr 2026 11:29:19 +0000 From: =?UTF-8?q?Cl=C3=A9ment=20L=C3=A9ger?= To: , Pavel Begunkov , "Jens Axboe" CC: =?UTF-8?q?Cl=C3=A9ment=20L=C3=A9ger?= , , , , , "David S. Miller" , Eric Dumazet , "Jakub Kicinski" , Paolo Abeni , Simon Horman , Jonathan Corbet , Shuah Khan , Vishwanath Seshagiri Subject: [PATCH 2/5] io_uring/zcrx: notify user on frag copy fallback Date: Wed, 22 Apr 2026 04:25:13 -0700 Message-ID: <20260422112522.3316660-3-cleger@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422112522.3316660-1-cleger@meta.com> References: <20260422112522.3316660-1-cleger@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Authority-Analysis: v=2.4 cv=VYjH+lp9 c=1 sm=1 tr=0 ts=69e8b110 cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=IkcTkHD0fZMA:10 a=A5OVakUREuEA:10 a=M51BFTxLslgA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=kkcUborcUVj0H7zxAXTl:22 a=VabnemYjAAAA:8 a=wIfzXL8Z3gsHLL3D1c4A:9 a=3ZKOabzyN94A:10 a=QEXdDO2ut3YA:10 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-GUID: 894fdtc0NQTca4-eDW-25f5S5VO3Vesp X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNDIyMDExMCBTYWx0ZWRfXygdFLHtkTFFe nsQMSPe8ubQ3ATPe8V/1/28fIibUTc8uvVg8/QC8Dr67TGIsAU2JXorbAHxtCTB1Az7zFgtosLI GpCbmEny2lwFNvQZwbdZDetlJnIRwy0HARDKZLBrUmsU3jicSdGygnPphWrACOxi0+KnBZouIH2 XYKwDIMLTHPhjDdarPbWp7H2zbGobkZjtqgyrgrg4x1Px+LY4D9YaBT8Eou7+4Y5/yLLmlNqR2h 
8SkHQTJNVZtTb+JYZ2MN6qJa9390691leaEc1/G2js+k/NmtTWll/+Uo54n4ovporRXVgkPDXvh BUPsBkdiFzwipht+PWlXHlK93YWVEB0GKfJ3sSMQe/UTwmPr3t8VEMkPXIAGrIXdc0NPg/sXWi/ UjTxksJAPLiVoueHyrlhcsONgyQzR3Pb4IK3N5ObzINZnzXwIUnEMDcqGm5gOt6YcDQQdpjRp8A Qruf4VgD0GvekbULTKQ== X-Proofpoint-ORIG-GUID: 894fdtc0NQTca4-eDW-25f5S5VO3Vesp X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-04-22_01,2026-04-21_02,2025-10-01_01 Add a ZCRX_NOTIF_COPY notification type to signal userspace when a received fragment could not be delivered using zero-copy and was instead copied into a buffer. Signed-off-by: Cl=C3=A9ment L=C3=A9ger --- include/uapi/linux/io_uring/zcrx.h | 1 + io_uring/zcrx.c | 7 ++++++- io_uring/zcrx.h | 3 ++- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/io_uring/zcrx.h b/include/uapi/linux/io_uri= ng/zcrx.h index b8596d7d47b6..e0c0079626c8 100644 --- a/include/uapi/linux/io_uring/zcrx.h +++ b/include/uapi/linux/io_uring/zcrx.h @@ -70,6 +70,7 @@ enum zcrx_features { =20 enum zcrx_notification_type { ZCRX_NOTIF_NO_BUFFERS =3D 1 << 0, + ZCRX_NOTIF_COPY =3D 1 << 1 }; =20 struct zcrx_notification_desc { diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 35ca28cb6583..732e585aa13a 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -1510,8 +1510,13 @@ static int io_zcrx_copy_frag(struct io_kiocb *req, s= truct io_zcrx_ifq *ifq, const skb_frag_t *frag, int off, int len) { struct page *page =3D skb_frag_page(frag); + int ret; + + ret =3D io_zcrx_copy_chunk(req, ifq, page, off + skb_frag_off(frag), len); + if (ret > 0) + zcrx_send_notif(ifq, ZCRX_NOTIF_COPY); =20 - return io_zcrx_copy_chunk(req, ifq, page, off + skb_frag_off(frag), len); + return ret; } =20 static int io_zcrx_recv_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq, diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index 3ddebed06d57..1bd63adaa711 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -11,7 +11,8 @@ 
#define ZCRX_SUPPORTED_REG_FLAGS (ZCRX_REG_IMPORT | ZCRX_REG_NODEV) #define ZCRX_FEATURES (ZCRX_FEATURE_RX_PAGE_SIZE |\ ZCRX_FEATURE_NOTIFICATION) -#define ZCRX_NOTIF_TYPE_MASK (ZCRX_NOTIF_NO_BUFFERS) +#define ZCRX_NOTIF_TYPE_MASK (ZCRX_NOTIF_NO_BUFFERS |\ + ZCRX_NOTIF_COPY) =20 struct io_zcrx_mem { unsigned long size; --=20 2.52.0 From nobody Wed Jun 17 03:57:17 2026 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1BEA53D090B; Wed, 22 Apr 2026 11:29:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.145.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776857400; cv=none; b=gLMc7l09/Vi9oalpoH2sPW9Fsa6uqiyCywETuYznxF5PZXQAlb+TBWrTZVMics9biHBQ7s3F17NnJxnxPnjgRBIFeTWMWTiqUVUgY0TM1OaI3NKscdCt+X/8Ox0MDjmgusBIHZDS1njxa+9A+ipnLjZXegJEh6qDEM73VMtIL6Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776857400; c=relaxed/simple; bh=GimcXQ5Vr7CVHkQ95G4KcAipFyJMm6dpkfIxF3FoFeo=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=pcclBv/HAVmHMVUYMYdurZFJLXwEEGH/9Kj9WSX/c4dSjeUXPUjFIsfRkpLksaPpFgKS0Ntw9Kl/wubo1t0esG5Vy1xwRE35YGKg+EBdmztvSJg/wDmhAIlYB3PnTL9/1C6DvBRtF7Q/HKPXiQaSy/z9s8Z6hCfBfK0ChRuDkq8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=Y3FHiGy5; arc=none smtp.client-ip=67.231.145.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com 
header.i=@meta.com header.b="Y3FHiGy5" Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 63MAsJdR2107212; Wed, 22 Apr 2026 04:29:50 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=+1dg+cv4+2vyQ5Kuqxeb+I3OzBBIDzcSrfPOrCeqLDU=; b=Y3FHiGy5Sd9v /iOTQC0BFMT2bAtMyxMXaexQGEhKcM2oO/hQkq4zXL2LQsB+IYUdNJftJgFLErM0 S3N9fVi6FSAgdBp7NJg8y2LCUzqz0M3sfKpryG2+y2eH+c9v8d0aGMEZfWQK8eW2 wOU+Okk7FuOlHAsemHpYcZzxI1cCGp12iMhjbBXpqq79fsbT+Y8KXDaTuYk3lITv chOp2Eak70xNKe8OtLFY1khngdIPZo/Ir6AeGXz1Ec0F78GHAu3cguhzAArQ456O QG/EWwuUtdaFnxdAJSIVm92rlAfMXHL/JstqnW4RYd5NaDMo5ChgtRTPCjqFP/xe ovfrcGwWEg== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4dpepgvmmf-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Wed, 22 Apr 2026 04:29:50 -0700 (PDT) Received: from localhost (2620:10d:c0a8:1b::2d) by mail.thefacebook.com (2620:10d:c0a9:6f::237c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.37; Wed, 22 Apr 2026 11:29:35 +0000 From: =?UTF-8?q?Cl=C3=A9ment=20L=C3=A9ger?= To: , Pavel Begunkov , "Jens Axboe" CC: =?UTF-8?q?Cl=C3=A9ment=20L=C3=A9ger?= , , , , , "David S. 
Miller" , Eric Dumazet , "Jakub Kicinski" , Paolo Abeni , Simon Horman , Jonathan Corbet , Shuah Khan , Vishwanath Seshagiri Subject: [PATCH 3/5] io_uring/zcrx: add shared-memory notification statistics Date: Wed, 22 Apr 2026 04:25:14 -0700 Message-ID: <20260422112522.3316660-4-cleger@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422112522.3316660-1-cleger@meta.com> References: <20260422112522.3316660-1-cleger@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNDIyMDExMCBTYWx0ZWRfXx6qk8u92gex7 iVrQBfaC1TY62gDMMFA7kgOubJmwrzcQxbjWz+nQPpiVGf7/hBgInTr158z1WLjquvfVfNYNpQ+ AHl7t9lRSnglmiUaRhAkBWaeW0T5Igf/2WZqTJepFxnFKSCdQEzD0QkY23ki1mTA/iNogvMhQZW GZKDbmnbl63jl652XGeLk2jsHr8+LSHUaeA6OlpGuWXh47jhpaniPhBaQmfSXOK6O9Li71Lv/bn Sy3xszo9yrvYPVjYO28dGXqhai3enD29K+uGAKHtDoYVZW5Dg29eZ/rffEHUXWxQyAvvt+5I3Up qe05QnjKbP3bN61S1UrbD/nNOZ0wozglbPoy0hO5KAs5KpBNAPde4uj6S0j4bxFgcrgXzbFb7jf iH6yAiaLrf/TN9NLlN9VtvirjhHzxAqUPvyeihbWxS9R9BByTscwHhCndE0VMYqSRj3Xm44CSK8 HrBCogqw3t7siWw8jpA== X-Proofpoint-GUID: 9m11s-4Q0WIabXoPclC7bbh7V70kZMkH X-Proofpoint-ORIG-GUID: 9m11s-4Q0WIabXoPclC7bbh7V70kZMkH X-Authority-Analysis: v=2.4 cv=B8SJFutM c=1 sm=1 tr=0 ts=69e8b12e cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=IkcTkHD0fZMA:10 a=A5OVakUREuEA:10 a=M51BFTxLslgA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=8elwO82fXORLTBIkMd32:22 a=VabnemYjAAAA:8 a=tUABAHM7GXuONXiLVFsA:9 a=3ZKOabzyN94A:10 a=QEXdDO2ut3YA:10 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-04-22_01,2026-04-21_02,2025-10-01_01 Add support for an optional stats struct embedded in the refill queue region, allowing userspace to monitor copy-fallback and no-buffers events 
in real time. Userspace queries the stats struct size and alignment via IO_URING_QUERY_ZCRX_NOTIF (notif_stats_size / notif_stats_off_alignment), then provides a stats_offset in zcrx_notification_desc pointing to a location within the refill queue region. The kernel updates the stats counters in place with READ_ONCE()/WRITE_ONCE() on every copy-fallback event. Signed-off-by: Cl=C3=A9ment L=C3=A9ger --- include/uapi/linux/io_uring/query.h | 12 +++++++ include/uapi/linux/io_uring/zcrx.h | 15 +++++++-- io_uring/query.c | 14 ++++++++ io_uring/zcrx.c | 50 +++++++++++++++++++++++++++-- io_uring/zcrx.h | 1 + 5 files changed, 88 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/io_uring/query.h b/include/uapi/linux/io_ur= ing/query.h index 95500759cc13..738c35c7d05c 100644 --- a/include/uapi/linux/io_uring/query.h +++ b/include/uapi/linux/io_uring/query.h @@ -23,6 +23,7 @@ enum { IO_URING_QUERY_OPCODES =3D 0, IO_URING_QUERY_ZCRX =3D 1, IO_URING_QUERY_SCQ =3D 2, + IO_URING_QUERY_ZCRX_NOTIF =3D 3, =20 __IO_URING_QUERY_MAX, }; @@ -62,6 +63,17 @@ struct io_uring_query_zcrx { __u64 __resv2; }; =20 +struct io_uring_query_zcrx_notif { + /* Bitmask of supported ZCRX_NOTIF_* flags*/ + __u32 notif_flags; + /* Size of io_uring_zcrx_notif_stats */ + __u32 notif_stats_size; + /* Required alignment for the stats struct within the region (ie stats_of= fset) */ + __u32 notif_stats_off_alignment; + __u32 resv1; + __u64 __resv2[10]; +}; + struct io_uring_query_scq { /* The SQ/CQ rings header size */ __u64 hdr_size; diff --git a/include/uapi/linux/io_uring/zcrx.h b/include/uapi/linux/io_uri= ng/zcrx.h index e0c0079626c8..ae9bbca3004c 100644 --- a/include/uapi/linux/io_uring/zcrx.h +++ b/include/uapi/linux/io_uring/zcrx.h @@ -73,11 +73,22 @@ enum zcrx_notification_type { ZCRX_NOTIF_COPY =3D 1 << 1 }; =20 +enum zcrx_notification_desc_flags { + /* If set, stats_offset holds a valid offset to a notif_stats struct */ + ZCRX_NOTIF_DESC_FLAG_STATS =3D 1 << 0, +}; + +struct 
io_uring_zcrx_notif_stats { + __u64 copy_count; /* cumulative copy-fallback CQEs */ + __u64 copy_bytes; /* cumulative bytes copied */ +}; + struct zcrx_notification_desc { __u64 user_data; __u32 type_mask; - __u32 __resv1; - __u64 __resv2[10]; + __u32 flags; /* see enum zcrx_notification_desc_flags */ + __u64 stats_offset; /* offset from the beginning of refill ring region fo= r stats */ + __u64 __resv2[9]; }; =20 /* diff --git a/io_uring/query.c b/io_uring/query.c index c1704d088374..3591106e139d 100644 --- a/io_uring/query.c +++ b/io_uring/query.c @@ -9,6 +9,7 @@ union io_query_data { struct io_uring_query_opcode opcodes; struct io_uring_query_zcrx zcrx; + struct io_uring_query_zcrx_notif zcrx_notif; struct io_uring_query_scq scq; }; =20 @@ -44,6 +45,16 @@ static ssize_t io_query_zcrx(union io_query_data *data) return sizeof(*e); } =20 +static ssize_t io_query_zcrx_notif(union io_query_data *data) +{ + struct io_uring_query_zcrx_notif *e =3D &data->zcrx_notif; + + e->notif_flags =3D ZCRX_NOTIF_TYPE_MASK; + e->notif_stats_size =3D sizeof(struct io_uring_zcrx_notif_stats); + e->notif_stats_off_alignment =3D __alignof__(struct io_uring_zcrx_notif_s= tats); + return sizeof(*e); +} + static ssize_t io_query_scq(union io_query_data *data) { struct io_uring_query_scq *e =3D &data->scq; @@ -83,6 +94,9 @@ static int io_handle_query_entry(union io_query_data *dat= a, void __user *uhdr, case IO_URING_QUERY_ZCRX: ret =3D io_query_zcrx(data); break; + case IO_URING_QUERY_ZCRX_NOTIF: + ret =3D io_query_zcrx_notif(data); + break; case IO_URING_QUERY_SCQ: ret =3D io_query_scq(data); break; diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 732e585aa13a..c61f94fb14c3 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -414,6 +414,7 @@ static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq) io_free_region(ifq->user, &ifq->rq_region); ifq->rq.ring =3D NULL; ifq->rq.rqes =3D NULL; + ifq->notif_stats =3D NULL; } =20 static void io_zcrx_free_area(struct io_zcrx_ifq *ifq, @@ -841,6 
+842,33 @@ static int zcrx_register_netdev(struct io_zcrx_ifq *if= q, return ret; } =20 +static int zcrx_validate_notif_stats(struct io_zcrx_ifq *ifq, + const struct io_uring_zcrx_ifq_reg *reg, + const struct zcrx_notification_desc *notif) +{ + size_t stats_off =3D notif->stats_offset; + size_t used, end; + + used =3D reg->offsets.rqes + + sizeof(struct io_uring_zcrx_rqe) * reg->rq_entries; + + if (!IS_ALIGNED(stats_off, __alignof__(struct io_uring_zcrx_notif_stats))) + return -EINVAL; + if (stats_off < used) + return -ERANGE; + if (check_add_overflow(stats_off, + sizeof(struct io_uring_zcrx_notif_stats), + &end)) + return -ERANGE; + if (end > io_region_size(&ifq->rq_region)) + return -ERANGE; + + ifq->notif_stats =3D io_region_get_ptr(&ifq->rq_region) + stats_off; + memset(ifq->notif_stats, 0, sizeof(*ifq->notif_stats)); + + return 0; +} + int io_register_zcrx(struct io_ring_ctx *ctx, struct io_uring_zcrx_ifq_reg __user *arg) { @@ -894,7 +922,9 @@ int io_register_zcrx(struct io_ring_ctx *ctx, return -EFAULT; if (notif.type_mask & ~ZCRX_NOTIF_TYPE_MASK) return -EINVAL; - if (notif.__resv1 || !mem_is_zero(&notif.__resv2, sizeof(notif.__resv2))) + if (notif.flags & ~ZCRX_NOTIF_DESC_FLAG_STATS) + return -EINVAL; + if (!mem_is_zero(&notif.__resv2, sizeof(notif.__resv2))) return -EINVAL; =20 ifq =3D io_zcrx_ifq_alloc(ctx); @@ -925,6 +955,12 @@ int io_register_zcrx(struct io_ring_ctx *ctx, if (ret) goto err; =20 + if (notif.flags & ZCRX_NOTIF_DESC_FLAG_STATS) { + ret =3D zcrx_validate_notif_stats(ifq, &reg, &notif); + if (ret) + goto err; + } + ifq->kern_readable =3D !(area.flags & IORING_ZCRX_AREA_DMABUF); =20 if (!(reg.flags & ZCRX_REG_NODEV)) { @@ -1133,6 +1169,11 @@ static void zcrx_notif_tw(struct io_tw_req tw_req, i= o_tw_token_t tw) kfree_rcu(req, rcu_head); } =20 +static void zcrx_stat_add(__u64 *p, s64 v) +{ + WRITE_ONCE(*p, READ_ONCE(*p) + v); +} + static void zcrx_send_notif(struct io_zcrx_ifq *ifq, u32 type_mask) { gfp_t gfp =3D GFP_ATOMIC | __GFP_NOWARN | __GFP_ZERO; @@ 
-1513,8 +1554,13 @@ static int io_zcrx_copy_frag(struct io_kiocb *req, s= truct io_zcrx_ifq *ifq, int ret; =20 ret =3D io_zcrx_copy_chunk(req, ifq, page, off + skb_frag_off(frag), len); - if (ret > 0) + if (ret > 0) { + if (ifq->notif_stats) { + zcrx_stat_add(&ifq->notif_stats->copy_count, 1); + zcrx_stat_add(&ifq->notif_stats->copy_bytes, ret); + } zcrx_send_notif(ifq, ZCRX_NOTIF_COPY); + } =20 return ret; } diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index 1bd63adaa711..0dcf486ff530 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -82,6 +82,7 @@ struct io_zcrx_ifq { u32 allowed_notif_mask; u32 fired_notifs; u64 notif_data; + struct io_uring_zcrx_notif_stats *notif_stats; }; =20 #if defined(CONFIG_IO_URING_ZCRX) --=20 2.52.0 From nobody Wed Jun 17 03:57:17 2026 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9BDF53D093B; Wed, 22 Apr 2026 11:30:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.145.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776857402; cv=none; b=Y/qaVMrERzOEyIFYA1DJxTCmEMjABiFbJGihLFo+rEW2aMdPdzlK+2wzGdQDCmYTWkTHqECwzK/uyEHP/m9nXKD9/TD8vXXrbXx8WaINXo8amh68QU8xAk7utgEkTYs7iLCnvVnoVjJxLKmJaHorx9x0S3ohuKPgUJC1sPWY1Yk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776857402; c=relaxed/simple; bh=83JslrFf9334MEp6KqDh4ZOAascOu9xrBr4UIjBgfEw=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=nYI+cnEED0MWP5OfcCAByxWlrFoDDZCb22fLBUa2La6l7lc4jKo6eqO9FTlgqjLDmn/zox4vHRlFNTv6eeWdqOTpExgAQS3AFwr5NYqcB5pNt0eHocUU+xsHY0L8v00Z9Itdult1TX1gAhzjO3iDW3rJnMNn5zlzruTMupNP/Cg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass 
smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=sIXq/2qF; arc=none smtp.client-ip=67.231.145.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="sIXq/2qF" Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 63MAsJdU2107212; Wed, 22 Apr 2026 04:29:53 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=092DUjtk9S7riuWAenbtkpfmunugAPl99EMBelOHirw=; b=sIXq/2qFp0v7 pGM6pew3fMAhHRNrag/T5eHY9YSO51vmTuM4Qzz2CTz3/+C7ooGSb5wXHUihwjTz QDnGQPEo0HV5ZNQJAIg6B31p2XyUa2cy8Utf84sOuJVsLJhTpFfsvHtfeYazxTOF OtjLDEjbNMDi7nAD72X60U9OvI9PhJCZSwKfK11vZtc5Igrp1tWJqxnIlBHAdt1M nGL+kp1lHVCp3nqI/s9/Kfowf+rZ2nCLnHnvanY7AQFgDoCU5ZU/t+gLf3P/ohFz Ejh7k3kF3LYr2GSzyw2g5Q+yh/JR8UHwHTapS15ve80+Yj4+gu7624nukjohhAUT F+S6BFkFlQ== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4dpepgvmmf-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Wed, 22 Apr 2026 04:29:51 -0700 (PDT) Received: from localhost (2620:10d:c0a8:1b::30) by mail.thefacebook.com (2620:10d:c0a9:6f::237c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.37; Wed, 22 Apr 2026 11:29:44 +0000 From: =?UTF-8?q?Cl=C3=A9ment=20L=C3=A9ger?= To: , Pavel Begunkov , "Jens Axboe" CC: =?UTF-8?q?Cl=C3=A9ment=20L=C3=A9ger?= , , , , , "David S. 
Miller" , Eric Dumazet , "Jakub Kicinski" , Paolo Abeni , Simon Horman , Jonathan Corbet , Shuah Khan , Vishwanath Seshagiri Subject: [PATCH 4/5] Documentation: networking: document zcrx notifications and statistics Date: Wed, 22 Apr 2026 04:25:15 -0700 Message-ID: <20260422112522.3316660-5-cleger@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422112522.3316660-1-cleger@meta.com> References: <20260422112522.3316660-1-cleger@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNDIyMDExMCBTYWx0ZWRfX+FWo+bobXiwM aRbPa1gF05euoviqe6WbFWGell+TWnqk29jHK3NsRsUtBTKQT/EfufsDzcMce+xVmXOJb5k7lhd jGgF1MH3Wz+kK/dT0jvGyWYpAHy+Xl8mdxpf9lC2zTMaoQOyorOuwvybcnHbucizDPzwBzV+lXz 6db9IGWkNApoMAStvSvZph3IhJWJZjDBCl0UgcqF5QFxTQqLUFaNH+M2eqMcMteJ/uuWUwOSmTf /l9EmQ9IGQ8XhrWFuc8rRo62gtgHKDD8VVbIE4tO7kbdg3jwNX3UpL2cZ7GzCAJseIsFcrsQsvm ADXYwZx02wdUu1Tc6BidZQxsAKzmrZSP6+7QPZR+iNYi44Uy6Nvwyv39T+rXQTf9rd3DutT8yPv v2uekyS5HfnuenXKbkaTP9StzaA7lrjB5vm2G6S1y8XuALvdz11brdWVJ/ZBIsQBomH0JPUw0+1 sNi7STYCCp+o+SBQBxg== X-Proofpoint-GUID: l_mqzUbBP5GDCXF3nSkDXAkA4t64ClT- X-Proofpoint-ORIG-GUID: l_mqzUbBP5GDCXF3nSkDXAkA4t64ClT- X-Authority-Analysis: v=2.4 cv=B8SJFutM c=1 sm=1 tr=0 ts=69e8b12f cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=IkcTkHD0fZMA:10 a=A5OVakUREuEA:10 a=M51BFTxLslgA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=8elwO82fXORLTBIkMd32:22 a=VabnemYjAAAA:8 a=8Z81dfnnkPiAomAV-AkA:9 a=3ZKOabzyN94A:10 a=QEXdDO2ut3YA:10 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-04-22_01,2026-04-21_02,2025-10-01_01 Document the zcrx notification system and shared-memory statistics that were introduced to let userspace monitor zero-copy receive 
health. The notification section covers the two notification types (ZCRX_NOTIF_NO_BUFFERS, ZCRX_NOTIF_COPY), registration via zcrx_notification_desc, and the fire-once / re-arm mechanism via ZCRX_CTRL_ARM_NOTIFICATION. The statistics section covers the optional shared-memory io_uring_zcrx_notif_stats structure placed in the refill ring region, including how to query its layout via IO_URING_QUERY_ZCRX_NOTIF. Signed-off-by: Cl=C3=A9ment L=C3=A9ger --- Documentation/networking/iou-zcrx.rst | 106 ++++++++++++++++++++++++++ 1 file changed, 106 insertions(+) diff --git a/Documentation/networking/iou-zcrx.rst b/Documentation/networki= ng/iou-zcrx.rst index 7f3f4b2e6cf2..b17205fe55aa 100644 --- a/Documentation/networking/iou-zcrx.rst +++ b/Documentation/networking/iou-zcrx.rst @@ -196,6 +196,112 @@ Return buffers back to the kernel to be used again:: rqe->len =3D cqe->res; IO_URING_WRITE_ONCE(*refill_ring.ktail, ++refill_ring.rq_tail); =20 +Notifications +------------- + +When zero-copy receive encounters conditions that affect performance or +functionality, the kernel can notify userspace via dedicated CQE notificat= ions. +The application must register a notification descriptor during +``IORING_REGISTER_ZCRX_IFQ`` to receive them. + +Supported features can be detected by checking for ``ZCRX_FEATURE_NOTIFICA= TION`` +in the features bitmask returned by ``IO_URING_QUERY_ZCRX``. + +**Notification types** + +``ZCRX_NOTIF_NO_BUFFERS`` + Fired when the page pool fails to allocate because the zcrx buffer area = is + exhausted. + +``ZCRX_NOTIF_COPY`` + Fired when a received fragment could not be delivered zero-copy and was + instead copied into a buffer. 
+ +**Registering notifications** + +Allocate and fill a ``struct zcrx_notification_desc``:: + + struct zcrx_notification_desc notif =3D { + .user_data =3D MY_NOTIF_USER_DATA, + .type_mask =3D ZCRX_NOTIF_NO_BUFFERS | ZCRX_NOTIF_COPY, + }; + + reg.notif_desc =3D (__u64)(unsigned long)&notif; + +``user_data`` is the value that will appear in the notification CQE's +``user_data`` field. ``type_mask`` selects which notification types the +application wants to receive. + +When a registered event occurs, the kernel posts a CQE with the specified +``user_data`` and ``cqe->res`` set to a bitmask of the triggered notificat= ion +types. + +**Rate limiting** + +Each notification type fires once until the application explicitly re-arms= it. +To re-arm, issue ``IORING_REGISTER_ZCRX_CTRL`` with +``ZCRX_CTRL_ARM_NOTIFICATION``:: + + struct zcrx_ctrl ctrl =3D { + .zcrx_id =3D zcrx_id, + .op =3D ZCRX_CTRL_ARM_NOTIFICATION, + .zc_arm_notif =3D { + .type_mask =3D ZCRX_NOTIF_NO_BUFFERS | ZCRX_NOTIF_COPY, + }, + }; + + io_uring_register(ring_fd, IORING_REGISTER_ZCRX_CTRL, &ctrl, 0); + +Only notification types that have previously fired can be re-armed. + +Notification statistics +----------------------- + +In addition to CQE-based notifications, the kernel can maintain a shared-m= emory +statistics structure that is updated on every relevant event. All stats are +updated regardless of which notification flags were registered. + +The statistics structure layout and alignment requirements can be queried = via +``IO_URING_QUERY_ZCRX_NOTIF``. The application must query the structure si= ze +and alignment requirements so that it allocates enough memory for the regi= on +to fit both the refill ring and the stats structure.
+ +To enable statistics, place the stats structure after the refill ring entr= ies +within the same mapped region, and set the ``ZCRX_NOTIF_DESC_FLAG_STATS`` = flag +in the notification descriptor:: + + /* Compute offset for the stats struct (after refill ring entries) */ + size_t stats_offset =3D ring_size; + ring_size +=3D ALIGN_UP(sizeof(struct io_uring_zcrx_notif_stats), PAGE_S= IZE); + + /* Map the region with the extra space */ + ring_ptr =3D mmap(NULL, ring_size, PROT_READ | PROT_WRITE, + MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); + + struct zcrx_notification_desc notif =3D { + .user_data =3D MY_NOTIF_USER_DATA, + .type_mask =3D ZCRX_NOTIF_COPY, + .flags =3D ZCRX_NOTIF_DESC_FLAG_STATS, + .stats_offset =3D stats_offset, + }; + +The ``stats_offset`` must satisfy the alignment reported by +``notif_stats_off_alignment`` and must point to a location within the mapp= ed +region that does not overlap with the refill ring header or entries. + +Applications can read the stat counters at any time:: + + volatile struct io_uring_zcrx_notif_stats *stats =3D + (void *)((char *)ring_ptr + stats_offset); + + printf("copy fallbacks: %llu (%llu bytes)\n", + IO_URING_READ_ONCE(stats->copy_count), + IO_URING_READ_ONCE(stats->copy_bytes)); + +``copy_count`` is incremented each time a fragment is copied instead of be= ing +delivered via zero-copy. ``copy_bytes`` accumulates the total number of by= tes +copied.
+ Area chunking ------------- =20 --=20 2.52.0 From nobody Wed Jun 17 03:57:17 2026 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B4F5534D39B; Wed, 22 Apr 2026 11:30:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=67.231.153.30 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776857424; cv=none; b=jNty3oaGvDY9bPs36XeE8qdy0lJv5Rx4yWujf71a0+tO9PFF98fnATOH8DW5OylmKvCswY1ul6v6gF6/xS+IhA/LA44H7oSGdj1UO9qb48Lbq+qko3/o3/Vd8fDndYesr68oD+lfSKpdxgadHjViTWExX0DarLWj4VJc83XISxA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776857424; c=relaxed/simple; bh=jlWcXL5qFGJMeIaua8rhnaRefnti71TwgpOwqoEITQ4=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=om6htKz18ojuEYay3XjngJxxhWNBRwo2doJ4roxYqmjORngeEa7QUESLIpgz1LHZRXtUyzpLh2P6AJOkB7FirJCYMFQw0VEd2OyvigZ2Zuutao2vQS71dxu5bjwiWBMzUW9/lnw4YlXRuEs16btrkUCOG9Jc+cwaLWzf7pWv6wM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com; spf=pass smtp.mailfrom=meta.com; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b=FeyrDftL; arc=none smtp.client-ip=67.231.153.30 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=meta.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=meta.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=meta.com header.i=@meta.com header.b="FeyrDftL" Received: from pps.filterd (m0528005.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.18.1.11/8.18.1.11) with ESMTP id 63LIkchr624972; Wed, 22 Apr 2026 04:30:14 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=meta.com; h=cc 
:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to; s=s2048-2025-q2; bh=3QXOcOD0wueZJZFrD8NqKljSbt85OuvLWwPFpEKe4XE=; b=FeyrDftLa3Q9 osIc0JT7SENKwrdPLssv0+Vude44VQwTGf0E1U9H9FWRJ34+pwnWuei0Id/dIou7 WmtUZQzK6NJtvEFeCzheBa4Lf6JrvYteLxCgs9rnvH5SweItBih78rezgVhDzWnF vIetDuIcVVqGYtbW9jBZBcA2gHrXJDYpLBlRZw37uWe73gHXDMitgCKrhG9cy26l u9FS5pDuX1buIeGHl7Xie6ykJsNAMgug/Bz5Yj74PMNmGWP2DBASGkvp9EqlLQiq bHXBBo4tPW+2vjI0IXMGosKDJAlJS3vif8BUd1B7I4xd2lP7HRn5FIWKdgPa8o/y rVxnT3EKmg== Received: from maileast.thefacebook.com ([163.114.135.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 4dpepa4rd4-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT); Wed, 22 Apr 2026 04:30:14 -0700 (PDT) Received: from localhost (2620:10d:c0a8:1b::2d) by mail.thefacebook.com (2620:10d:c0a9:6f::237c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.37; Wed, 22 Apr 2026 11:30:00 +0000 From: =?UTF-8?q?Cl=C3=A9ment=20L=C3=A9ger?= To: , Pavel Begunkov , "Jens Axboe" CC: =?UTF-8?q?Cl=C3=A9ment=20L=C3=A9ger?= , , , , , "David S. 
Miller" , Eric Dumazet , "Jakub Kicinski" , Paolo Abeni , Simon Horman , Jonathan Corbet , Shuah Khan , Vishwanath Seshagiri Subject: [PATCH 5/5] selftests: iou-zcrx: add notification and stats test for zcrx Date: Wed, 22 Apr 2026 04:25:16 -0700 Message-ID: <20260422112522.3316660-6-cleger@meta.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260422112522.3316660-1-cleger@meta.com> References: <20260422112522.3316660-1-cleger@meta.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Proofpoint-ORIG-GUID: qJ21zn8NynDKXKjQrPJkjScQLJgSedUH X-Authority-Analysis: v=2.4 cv=KMxqylFo c=1 sm=1 tr=0 ts=69e8b146 cx=c_pps a=MfjaFnPeirRr97d5FC5oHw==:117 a=MfjaFnPeirRr97d5FC5oHw==:17 a=IkcTkHD0fZMA:10 a=A5OVakUREuEA:10 a=M51BFTxLslgA:10 a=VkNPw1HP01LnGYTKEx00:22 a=7x6HtfJdh03M6CCDgxCd:22 a=jCddH8ec0KUNCymVuxII:22 a=VabnemYjAAAA:8 a=hM4E8hCzYzPdoCAyIpQA:9 a=3ZKOabzyN94A:10 a=QEXdDO2ut3YA:10 a=gKebqoRLp9LExxC7YDUY:22 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwNDIyMDExMCBTYWx0ZWRfX3AT1DPJviY6c km2XnOLcwJxoS6NfMdyonLJAjPrFt2P2x1/5QAMqeDeY/3DO8Zfqu5gElbhqPWMNfu/Qcm0wo7j 42nV9U39wKxfv8xK0kaDo4FUoBmfwL5L/pboA/g91zB+99Nd6r1DlpjChmoVrJG+j0Y9G/I8jeO N1iRfBcKNXv43g+Oj9M0hn7HpHubWm81Ebe19vIj3C7OYlihfEyBnUegVEe0N7mzLZhX+Er55fd ErCrDOOqqhLVXR0Pj2M6VUF362rdJGARd9G514rXBLEzex18IzhV/XAMCAc0ZFVUoRpAVFXIy7t j2VOsWj5fyp6xDm0rRfbDi03oMeZJlqAljmU6EcurgZnsE1gFEbXGH9iB3hRc9mBMo9fTPdjeL3 yX5VXfXfloHJncqRfXHeN9qJdA2gNCd4OoqOtSDw1mlo6BIow917cdaLcfcoVlWW/J2EIcwbW3s CtQbFqjwc5TmnFO6zxQ== X-Proofpoint-GUID: qJ21zn8NynDKXKjQrPJkjScQLJgSedUH X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1143,Hydra:6.1.51,FMLib:17.12.100.49 definitions=2026-04-22_01,2026-04-21_02,2025-10-01_01 Add a selftest to verify that ZCRX notification are properly delivered to userspace and that the shared-memory notification stats 
(copy_count, copy_bytes) are correctly incremented when zero-copy RX falls back to copying or when it runs out of buffers. The test registers a notification descriptor during IORING_REGISTER_ZCRX_IFQ with a stats region placed after the refill queue entries. A new -n flag verifies that the copy fallback is triggered, and the -b/-a flags allow checking for the out-of-buffer notification. To reliably trigger copy fallback, the Python test uses a new single_no_flow() setup variant that configures tcp-data-split and RSS but without an ethtool flow rule. Without flow steering, traffic arrives on non-zcrx queues as regular pages, forcing the kernel copy-fallback path in io_zcrx_copy_frag(). The out-of-buffer notification is verified by using a smaller receive area and by not recycling the buffers, so that the kernel runs out of buffers quickly. Signed-off-by: Cl=C3=A9ment L=C3=A9ger --- .../selftests/drivers/net/hw/iou-zcrx.c | 112 ++++++++++++++++-- .../selftests/drivers/net/hw/iou-zcrx.py | 49 +++++++- 2 files changed, 149 insertions(+), 12 deletions(-) diff --git a/tools/testing/selftests/drivers/net/hw/iou-zcrx.c b/tools/test= ing/selftests/drivers/net/hw/iou-zcrx.c index 240d13dbc54e..3c95e6460c24 100644 --- a/tools/testing/selftests/drivers/net/hw/iou-zcrx.c +++ b/tools/testing/selftests/drivers/net/hw/iou-zcrx.c @@ -52,7 +52,27 @@ struct t_io_uring_zcrx_ifq_reg { struct io_uring_zcrx_offsets offsets; __u32 zcrx_id; __u32 rx_buf_len; - __u64 __resv[3]; + __u64 notif_desc; + __u64 __resv[2]; +}; + +#define ZCRX_NOTIF_NO_BUFFERS (1 << 0) +#define ZCRX_NOTIF_COPY (1 << 1) +#define ZCRX_NOTIF_DESC_FLAG_STATS (1 << 0) + +#define NOTIF_USER_DATA 3 + +struct t_zcrx_notification_desc { + __u64 user_data; + __u32 type_mask; + __u32 flags; + __u64 stats_offset; + __u64 __resv2[9]; +}; + +struct t_io_uring_zcrx_notif_stats { + __u64 copy_count; + __u64 copy_bytes; }; =20 static long page_size; @@ -84,7 +104,10 @@ static int cfg_oneshot_recvs; static int cfg_send_size =3D SEND_SIZE;
static struct sockaddr_in6 cfg_addr; static unsigned int cfg_rx_buf_len; +static size_t cfg_area_size; static bool cfg_dry_run; +static bool cfg_copy_fallback; +static bool cfg_no_buffers; =20 static char *payload; static void *area_ptr; @@ -95,6 +118,8 @@ static unsigned long area_token; static int connfd; static bool stop; static size_t received; +static unsigned int notif_received_mask; +static size_t notif_stats_offset; =20 static unsigned long gettimeofday_ms(void) { @@ -142,6 +167,7 @@ static void setup_zcrx(struct io_uring *ring) { unsigned int ifindex; unsigned int rq_entries =3D 4096; + size_t area_size =3D cfg_area_size ? cfg_area_size : AREA_SIZE; int ret; =20 ifindex =3D if_nametoindex(cfg_ifname); @@ -150,7 +176,7 @@ static void setup_zcrx(struct io_uring *ring) =20 if (cfg_rx_buf_len && cfg_rx_buf_len !=3D page_size) { area_ptr =3D mmap(NULL, - AREA_SIZE, + area_size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB | MAP_HUGE_2MB, @@ -162,7 +188,7 @@ static void setup_zcrx(struct io_uring *ring) } } else { area_ptr =3D mmap(NULL, - AREA_SIZE, + area_size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, 0, @@ -172,6 +198,12 @@ static void setup_zcrx(struct io_uring *ring) } =20 ring_size =3D get_refill_ring_size(rq_entries); + + if (cfg_copy_fallback) { + notif_stats_offset =3D ring_size; + ring_size +=3D ALIGN_UP(sizeof(struct t_io_uring_zcrx_notif_stats), page= _size); + } + ring_ptr =3D mmap(NULL, ring_size, PROT_READ | PROT_WRITE, @@ -187,10 +219,11 @@ static void setup_zcrx(struct io_uring *ring) =20 struct io_uring_zcrx_area_reg area_reg =3D { .addr =3D (__u64)(unsigned long)area_ptr, - .len =3D AREA_SIZE, + .len =3D area_size, .flags =3D 0, }; =20 + struct t_zcrx_notification_desc notif_desc; struct t_io_uring_zcrx_ifq_reg reg =3D { .if_idx =3D ifindex, .if_rxq =3D cfg_queue_id, @@ -200,11 +233,32 @@ static void setup_zcrx(struct io_uring *ring) .rx_buf_len =3D cfg_rx_buf_len, }; =20 + if (cfg_copy_fallback || 
cfg_no_buffers) { + __u32 type_mask =3D 0; + + if (cfg_copy_fallback) + type_mask =3D ZCRX_NOTIF_COPY; + if (cfg_no_buffers) + type_mask =3D ZCRX_NOTIF_NO_BUFFERS; + + memset(&notif_desc, 0, sizeof(notif_desc)); + notif_desc.user_data =3D NOTIF_USER_DATA; + notif_desc.type_mask =3D type_mask; + if (cfg_copy_fallback) { + notif_desc.flags =3D ZCRX_NOTIF_DESC_FLAG_STATS; + notif_desc.stats_offset =3D notif_stats_offset; + } + reg.notif_desc =3D (__u64)(unsigned long)&notif_desc; + } + ret =3D io_uring_register_ifq(ring, (void *)&reg); if (cfg_rx_buf_len && (ret =3D=3D -EINVAL || ret =3D=3D -EOPNOTSUPP || ret =3D=3D -ERANGE)) { printf("Large chunks are not supported %i\n", ret); exit(SKIP_CODE); + } else if ((cfg_copy_fallback || cfg_no_buffers) && ret =3D=3D -EINVAL) { + printf("Notifications not supported %i\n", ret); + exit(SKIP_CODE); } else if (ret) { error(1, 0, "io_uring_register_ifq(): %d", ret); } @@ -304,10 +358,13 @@ static void process_recvzc(struct io_uring *ring, str= uct io_uring_cqe *cqe) } received +=3D n; =20 - rqe =3D &rq_ring.rqes[(rq_ring.rq_tail & rq_mask)]; - rqe->off =3D (rcqe->off & ~IORING_ZCRX_AREA_MASK) | area_token; - rqe->len =3D cqe->res; - io_uring_smp_store_release(rq_ring.ktail, ++rq_ring.rq_tail); + /* Skip ring refill so that we run out of buffers quickly */ + if (!cfg_no_buffers) { + rqe =3D &rq_ring.rqes[(rq_ring.rq_tail & rq_mask)]; + rqe->off =3D (rcqe->off & ~IORING_ZCRX_AREA_MASK) | area_token; + rqe->len =3D cqe->res; + io_uring_smp_store_release(rq_ring.ktail, ++rq_ring.rq_tail); + } } =20 static void server_loop(struct io_uring *ring) @@ -324,8 +381,15 @@ static void server_loop(struct io_uring *ring) process_accept(ring, cqe); else if (cqe->user_data =3D=3D 2) process_recvzc(ring, cqe); - else + else if ((cfg_copy_fallback || cfg_no_buffers) && + cqe->user_data =3D=3D NOTIF_USER_DATA) { + notif_received_mask |=3D cqe->res; + if (cfg_no_buffers && + (cqe->res & ZCRX_NOTIF_NO_BUFFERS)) + stop =3D true; + } else { error(1, 0, "unknown
cqe"); + } count++; } io_uring_cq_advance(ring, count); @@ -374,6 +438,23 @@ static void run_server(void) =20 if (!stop) error(1, 0, "test failed\n"); + + if (cfg_copy_fallback) { + struct t_io_uring_zcrx_notif_stats *stats =3D + (void *)((char *)ring_ptr + notif_stats_offset); + + if (!(notif_received_mask & ZCRX_NOTIF_COPY)) + error(1, 0, "expected copy fallback notification"); + if (!IO_URING_READ_ONCE(stats->copy_count)) + error(1, 0, "expected copy_count > 0"); + if (!IO_URING_READ_ONCE(stats->copy_bytes)) + error(1, 0, "expected copy_bytes > 0"); + } + + if (cfg_no_buffers) { + if (!(notif_received_mask & ZCRX_NOTIF_NO_BUFFERS)) + error(1, 0, "expected no-buffers notification"); + } } =20 static void run_client(void) @@ -425,7 +506,7 @@ static void parse_opts(int argc, char **argv) usage(argv[0]); cfg_payload_len =3D max_payload_len; =20 - while ((c =3D getopt(argc, argv, "sch:p:l:i:q:o:z:x:d")) !=3D -1) { + while ((c =3D getopt(argc, argv, "sch:p:l:i:q:o:z:x:a:dnb")) !=3D -1) { switch (c) { case 's': if (cfg_client) @@ -466,8 +547,19 @@ static void parse_opts(int argc, char **argv) case 'd': cfg_dry_run =3D true; break; + case 'n': + cfg_copy_fallback =3D true; + break; + case 'b': + cfg_no_buffers =3D true; + break; + case 'a': + cfg_area_size =3D strtoul(optarg, NULL, 0) * page_size; + break; } } + if (cfg_copy_fallback && cfg_no_buffers) + error(1, 0, "Pass one of -n or -b"); =20 if (cfg_server && addr) error(1, 0, "Receiver cannot have -h specified"); diff --git a/tools/testing/selftests/drivers/net/hw/iou-zcrx.py b/tools/tes= ting/selftests/drivers/net/hw/iou-zcrx.py index e81724cb5542..f7f1cbff5959 100755 --- a/tools/testing/selftests/drivers/net/hw/iou-zcrx.py +++ b/tools/testing/selftests/drivers/net/hw/iou-zcrx.py @@ -41,7 +41,9 @@ def set_flow_rule_rss(cfg, rss_ctx_id): return int(values) =20 =20 -def single(cfg): +def single_no_flow(cfg): + """Like single() but without a flow rule.""" + channels =3D cfg.ethnl.channels_get({'header': {'dev-index': 
cfg.ifind= ex}}) channels =3D channels['combined-count'] if channels < 2: @@ -65,6 +67,9 @@ def single(cfg): ethtool(f"-X {cfg.ifname} equal {cfg.target}") defer(ethtool, f"-X {cfg.ifname} default") =20 +def single(cfg): + single_no_flow(cfg) + flow_rule_id =3D set_flow_rule(cfg) defer(ethtool, f"-N {cfg.ifname} delete {flow_rule_id}") =20 @@ -130,6 +135,26 @@ def test_zcrx_oneshot(cfg, setup) -> None: cmd(tx_cmd, host=3Dcfg.remote) =20 =20 +@ksft_variants([ + KsftNamedVariant("single", single_no_flow), +]) +def test_zcrx_notif(cfg, setup) -> None: + """Test zcrx copy fallback notification. + + Omits the flow rule so traffic arrives on non-zcrx queues as regular + pages, forcing the kernel copy-fallback path. Asserts that the + ZCRX_NOTIF_COPY notification CQE is delivered.""" + + cfg.require_ipver('6') + + setup(cfg) + rx_cmd =3D f"{cfg.bin_local} -s -p {cfg.port} -i {cfg.ifname} -q {cfg.= target} -n" + tx_cmd =3D f"{cfg.bin_remote} -c -h {cfg.addr_v['6']} -p {cfg.port} -l= 12840" + with bkg(rx_cmd, exit_wait=3DTrue): + wait_port_listen(cfg.port, proto=3D"tcp") + cmd(tx_cmd, host=3Dcfg.remote) + + def test_zcrx_large_chunks(cfg) -> None: """Test zcrx with large buffer chunks.""" =20 @@ -157,6 +182,25 @@ def test_zcrx_large_chunks(cfg) -> None: cmd(tx_cmd, host=3Dcfg.remote) =20 =20 +@ksft_variants([ + KsftNamedVariant("single", single), +]) +def test_zcrx_notif_no_buffers(cfg, setup) -> None: + """Test zcrx out-of-buffer notification. 
+ + Skips buffer refill so the pool is quickly exhausted, triggering + a ZCRX_NOTIF_NO_BUFFERS notification CQE.""" + + cfg.require_ipver('6') + + setup(cfg) + rx_cmd =3D f"{cfg.bin_local} -s -p {cfg.port} -i {cfg.ifname} -q {cfg.= target} -b -a 64" + tx_cmd =3D f"{cfg.bin_remote} -c -h {cfg.addr_v['6']} -p {cfg.port} -l= 12840" + with bkg(rx_cmd, exit_wait=3DTrue): + wait_port_listen(cfg.port, proto=3D"tcp") + cmd(tx_cmd, host=3Dcfg.remote, fail=3DFalse) + + def main() -> None: with NetDrvEpEnv(__file__) as cfg: cfg.bin_local =3D path.abspath(path.dirname(__file__) + "/../../..= /drivers/net/hw/iou-zcrx") @@ -166,7 +210,8 @@ def main() -> None: cfg.netnl =3D NetdevFamily() cfg.port =3D rand_port() ksft_run(globs=3Dglobals(), cases=3D[test_zcrx, test_zcrx_oneshot, - test_zcrx_large_chunks], args=3D(c= fg, )) + test_zcrx_large_chunks, test_zcrx_= notif, + test_zcrx_notif_no_buffers], args= =3D(cfg, )) ksft_exit() =20 =20 --=20 2.52.0