From nobody Fri Jun 12 11:37:43 2026
Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 12B9F410D0C;
	Fri, 15 May 2026 10:29:02 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=82.195.75.108
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1778840944; cv=none;
 b=Gz+9EXeH3Sj6vjRP8h5KvR6FDgeStGDJl9qWdJWCogZmNCib8SnzBbHSrkUnYXBYz4Cynf7+PNZsHc2lidMOWMtU6NzTTGtFGqEEJ2H6IR54i9mLkHmQOxXDAMwbli0FFByELWE51qD2mFpS70Z5MxoQtHsuFHMw9Qgz/6nzrR0=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1778840944; c=relaxed/simple;
	bh=MPURX6JFlAquZBbR7yyzF8vZZq340uZMJDshW8LZBPY=;
	h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References:
	 In-Reply-To:To:Cc;
 b=ZXUR5peJEXEK8cqkCI2qedqlsHTXuQ+Fm96k8riltmP5Qv151y5yYL2sdbfcnzXWYHcHsXydW59dkimuqviRDVETHiSR+gv0XM9FYmbq672kqpmNjSpG1vAx3BvXZ2p1zDs2Qz8pKjuXFxKovHjy+mhXNUKh8OSMcJ27teEzfY8=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=debian.org;
 spf=pass smtp.mailfrom=debian.org;
 dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org
 header.b=g2tNiYs7; arc=none smtp.client-ip=82.195.75.108
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=debian.org
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=debian.org
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org
 header.b="g2tNiYs7"
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org;
	s=smtpauto.stravinsky; h=X-Debian-User:Cc:To:In-Reply-To:References:
	Message-Id:Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date:
	From:Reply-To:Content-ID:Content-Description;
	bh=rCPR2AgJvJjAbhXX3W+kZLQf9geRDBUDTHySUo4oNXE=; b=g2tNiYs7eN8jSN2OF+QImOBBua
	rP3j29vXHwRTfiXoJatZ/Vn1aG4RvBW+IoJyL6a935TQx4G1o4Xv++mjo5IpnZbDMqHX4yozXGuU4
	rvGZL9yobnZRY7sm9Y0uRt/waBVACyL/L5ahl2TDtuQalx/2Vnn2MOUztj/pf8WftUuymcm0L7UnK
	fIwuMOq/6nSV3dBApm8mc5VPikmJu7xcz9uuHbRUerWtzg7LgmjVLPiRtxqr8Zor93epI14DJaWo9
	X6V9Z/waiOOL6NGW3vFWkMggE0KDycGtdNyVC19ebPRWR2fkSTzGfUv3QpIQA5b/JSw4Kry7eZsoC
	CZmRUa4A==;
Received: from authenticated user
	by stravinsky.debian.org with esmtpsa
 (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256)
	(Exim 4.96)
	(envelope-from <leitao@debian.org>)
	id 1wNpmt-004fTo-1o;
	Fri, 15 May 2026 10:28:59 +0000
From: Breno Leitao <leitao@debian.org>
Date: Fri, 15 May 2026 03:28:44 -0700
Subject: [PATCH 1/2] fs/pipe: bulk pre-allocate pages outside pipe->mutex
 in anon_pipe_write
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260515-fix_pipe-v1-1-b14c840c7555@debian.org>
References: <20260515-fix_pipe-v1-0-b14c840c7555@debian.org>
In-Reply-To: <20260515-fix_pipe-v1-0-b14c840c7555@debian.org>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
 Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
 Shuah Khan <shuah@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Breno Leitao <leitao@debian.org>,
 usama.arif@linux.dev, kernel-team@meta.com
X-Mailer: b4 0.16-dev-d5d98
X-Developer-Signature: v=1; a=openpgp-sha256; l=4832; i=leitao@debian.org;
 h=from:subject:message-id; bh=MPURX6JFlAquZBbR7yyzF8vZZq340uZMJDshW8LZBPY=;
 b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBqBvVimeYTLLrcByW/6F4+ao69Ef/k3DDp0EI+z
 YIsm1d7hD+JAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCagb1YgAKCRA1o5Of/Hh3
 bUCuEACOYF3vXZNvkAYTQvVN4r/lZHxlgA+/Q3Eg06G1UqOHH61r+sCRFGDd8zRGjOLMZWhdv40
 c+0rsmOgU6b8z2hk5W/MXnZeVLkpUIjnFc1x+ZX7XanDFqJK0Jrp6W7UYqLiZ4iE4ffOtzK6MZy
 5hng4c9kh4NhDvywihwQWR4R1aLUlFIjoQLeGtVHNYHMIOP8IotODEKTD2kTL46mKywcym65WI4
 qIeudgEjPGLa4XxAWG0jYsSLEOZ0vaVL5+4x/Hn5+RdvtzUrnBBsC41VbZ2PFM2eF9pP170yK9w
 Ipx/WDPXCWjiA2KrY6AeTb1xnQdtlC/ntrocV1rQyvsK9sS/OC505eM8P1LqwqedcXdP6AYoqLg
 RlwZLrZjl3qt7V3pPb/WPXp10z7ECaH9kJY3NmehT1fz/EJNNJdkKvgNQt5sck5eLejRaWavFgf
 c34b2MiTW1CxwUd9Obh6BH0jAcsuhJy2+A355QNmhpS3dg8+mEiGPnl2R2Ci/Vx+BmABD20MyFN
 NQfzd1u9cDpkvCrAY8YLofdRuN3LPPh/Au2dw2Ih7cks/RcVyGD/VtSOLqJDvF4RvamARjK90zb
 TxqxbhTc5IUqoYPk0kHfEhLXuKxxNGsuv00rFWkbP0dw78tAHhkXcSJEfR0EjMd0HDbNJx9yiz9
 RwX22iPH2yrnzlA==
X-Developer-Key: i=leitao@debian.org; a=openpgp;
 fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D
X-Debian-User: leitao

anon_pipe_write() takes pipe->mutex and then, from the per-iteration
anon_pipe_get_page() helper, used to call alloc_page(GFP_HIGHUSER |
__GFP_ACCOUNT) once per page while still holding it. That allocation
can sleep doing direct reclaim and/or runs memcg charging, which extends
the critical section and stalls a concurrent reader on the very same
mutex.

Bulk pre-allocate DIV_ROUND_UP(total_len, PAGE_SIZE) pages (up to
PIPE_PREALLOC_MAX (8)) pages outside the mutex when total_len >=3D
PAGE_SIZE, using alloc_pages_bulk().  (Under memcg, alloc_pages_bulk()
with __GFP_ACCOUNT might return less pages than requested, but, this is
still a win, given some pages allocation is moved outside of the
lock).

Pass the array into anon_pipe_get_page(), which now consumes from
tmp_page[] first, then from the prealloc array, and only as a last
resort falls back to alloc_page() under the mutex (reached only for
writes larger than 8 pages, where the prealloc cap is exhausted).

Doing this in one bulk call before the lock keeps the fast path's
mutex held for a single, write-bounded critical section -- no extra
mutex_unlock/_lock cycles -- so it avoids the per-page lock-handoff
overhead that a per-page drop-and-retake design would introduce, while
still moving the typical multi-page allocation off the critical
section. Unused prealloc pages are pushed to the pipe's tmp_page[]
cache (or freed) before unlock, so a subsequent write to the same
pipe gets a hot cached page rather than paying for an alloc again.

Sub-PAGE_SIZE writes are unchanged: the merge path handles them
without ever needing a fresh page, so it is not worth speculatively
allocating for them.

This can improve the pipe throughput up to 48% and reduce the latency in
33%.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 fs/pipe.c | 40 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 9841648c9cf3e..7a1517d15107a 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -111,7 +111,11 @@ void pipe_double_lock(struct pipe_inode_info *pipe1,
 	pipe_lock(pipe2);
 }
=20
-static struct page *anon_pipe_get_page(struct pipe_inode_info *pipe)
+#define PIPE_PREALLOC_MAX 8
+
+static struct page *anon_pipe_get_page(struct pipe_inode_info *pipe,
+				       struct page **prealloc,
+				       unsigned int *prealloc_n)
 {
 	for (int i =3D 0; i < ARRAY_SIZE(pipe->tmp_page); i++) {
 		if (pipe->tmp_page[i]) {
@@ -121,6 +125,14 @@ static struct page *anon_pipe_get_page(struct pipe_ino=
de_info *pipe)
 		}
 	}
=20
+	if (*prealloc_n) {
+		unsigned int idx =3D --(*prealloc_n);
+		struct page *page =3D prealloc[idx];
+
+		prealloc[idx] =3D NULL;
+		return page;
+	}
+
 	return alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT);
 }
=20
@@ -438,6 +450,8 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *fr=
om)
 	ssize_t chars;
 	bool was_empty =3D false;
 	bool wake_next_writer =3D false;
+	struct page *prealloc[PIPE_PREALLOC_MAX] =3D { NULL };
+	unsigned int prealloc_n =3D 0;
=20
 	/*
 	 * Reject writing to watch queue pipes before the point where we lock
@@ -455,6 +469,26 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *f=
rom)
 	if (unlikely(total_len =3D=3D 0))
 		return 0;
=20
+	/*
+	 * Bulk pre-allocate up to PIPE_PREALLOC_MAX pages outside pipe->mutex
+	 * for writes that span at least one full page. alloc_page() with
+	 * GFP_HIGHUSER may sleep doing reclaim and runs memcg charging, so
+	 * doing it under the mutex extends the critical section and stalls
+	 * the reader. The merge path handles sub-PAGE_SIZE writes without
+	 * needing a fresh page; for writes larger than PIPE_PREALLOC_MAX
+	 * pages, anon_pipe_get_page() falls back to a single alloc_page()
+	 * under the mutex for the remainder. Unused prealloc pages are
+	 * returned to the pipe's tmp_page[] cache (or freed) before unlock.
+	 */
+	if (total_len >=3D PAGE_SIZE) {
+		unsigned int want =3D min_t(unsigned int,
+					  DIV_ROUND_UP(total_len, PAGE_SIZE),
+					  PIPE_PREALLOC_MAX);
+
+		prealloc_n =3D alloc_pages_bulk(GFP_HIGHUSER | __GFP_ACCOUNT,
+					      want, prealloc);
+	}
+
 	mutex_lock(&pipe->mutex);
=20
 	if (!pipe->readers) {
@@ -512,7 +546,7 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *fr=
om)
 			struct page *page;
 			int copied;
=20
-			page =3D anon_pipe_get_page(pipe);
+			page =3D anon_pipe_get_page(pipe, prealloc, &prealloc_n);
 			if (unlikely(!page)) {
 				if (!ret)
 					ret =3D -ENOMEM;
@@ -576,6 +610,8 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *fr=
om)
 		wake_next_writer =3D true;
 	}
 out:
+	while (prealloc_n)
+		anon_pipe_put_page(pipe, prealloc[--prealloc_n]);
 	if (pipe_is_full(pipe))
 		wake_next_writer =3D false;
 	mutex_unlock(&pipe->mutex);

--=20
2.53.0-Meta
From nobody Fri Jun 12 11:37:43 2026
Received: from stravinsky.debian.org (stravinsky.debian.org [82.195.75.108])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4344444CF34;
	Fri, 15 May 2026 10:29:06 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=82.195.75.108
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1778840948; cv=none;
 b=qir24mP0IlQ7+4s4ntt2c4tTIyzXnPLsSTj6NmEwWentMCT53DQGeWBbjLkmr+FYgrZpm4k9dyYxRXU9BsuLLPCqgyJRJvSjarJbCEKKBDMzTHqVF28/2h6W8+7U32AezJlvwkNN+mOiLG6YqipGBN1ifwp/MQiIEjLIIv/1lXU=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1778840948; c=relaxed/simple;
	bh=m4i7PFdLlUnqCYqxMfa+Q5xPbTjCIMkbcSCwB3xnLgI=;
	h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References:
	 In-Reply-To:To:Cc;
 b=KAvdfRlXr0WpTBrmHvWXB7dFmy5OFp0k2wQ3vh5330lAxO+PcIdwwaybtI18GRDlqz5gklCcP2zvHqL21EwpNQonNenxOgYmBKr52TEhOXSxqFeo+tyOiwQtus1eM1G5gzElOYoBHvb4rrXsiEpEGkkT3hlSer7+4yRzZ+JP8qc=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=debian.org;
 spf=pass smtp.mailfrom=debian.org;
 dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org
 header.b=RUAEB/1f; arc=none smtp.client-ip=82.195.75.108
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=debian.org
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=debian.org
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=debian.org header.i=@debian.org
 header.b="RUAEB/1f"
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=debian.org;
	s=smtpauto.stravinsky; h=X-Debian-User:Cc:To:In-Reply-To:References:
	Message-Id:Content-Transfer-Encoding:Content-Type:MIME-Version:Subject:Date:
	From:Reply-To:Content-ID:Content-Description;
	bh=g6M0G7xa3gkyrDAsFQTV6ZevphthtWis3w+qpXkc1IE=; b=RUAEB/1fZjMSkc2KRP+ThU1sG9
	uuAQvbsY5m11jJThtmok+wCoXXGAPxAEmkLhUU8qCzfKiOaI0rXfhC4+kAg+IeTwYtfPs9r6dPAZp
	jL08HfwnvkPs8MNkm3jZsWOFjbXoZm3yePSqROgQRuB32dekBL/N8ZeMR/d6vXfjD4qXYV6J3xoGf
	Xrk28yWTygy3dnx2Ep+0qHuv6mRXRKIhO9AtOMzWEJR3cAgek7pV3j0e+HMuyfTVsHXSy+lURC3V9
	3QLzDW3czqPGXJzNZ6SaOJKh1e5lfeL8jCxNyV7CNU/znD91X4DWf5tUfXHb6oTveYtkX1r02Pbes
	gHmXyKRg==;
Received: from authenticated user
	by stravinsky.debian.org with esmtpsa
 (TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256)
	(Exim 4.96)
	(envelope-from <leitao@debian.org>)
	id 1wNpmx-004fTy-1W;
	Fri, 15 May 2026 10:29:03 +0000
From: Breno Leitao <leitao@debian.org>
Date: Fri, 15 May 2026 03:28:45 -0700
Subject: [PATCH 2/2] selftests/pipe: add pipe_bench microbenchmark
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260515-fix_pipe-v1-2-b14c840c7555@debian.org>
References: <20260515-fix_pipe-v1-0-b14c840c7555@debian.org>
In-Reply-To: <20260515-fix_pipe-v1-0-b14c840c7555@debian.org>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
 Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
 Shuah Khan <shuah@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Breno Leitao <leitao@debian.org>,
 usama.arif@linux.dev, kernel-team@meta.com
X-Mailer: b4 0.16-dev-d5d98
X-Developer-Signature: v=1; a=openpgp-sha256; l=12094; i=leitao@debian.org;
 h=from:subject:message-id; bh=m4i7PFdLlUnqCYqxMfa+Q5xPbTjCIMkbcSCwB3xnLgI=;
 b=owEBbQKS/ZANAwAIATWjk5/8eHdtAcsmYgBqBvVi7rSMMc7Rtye391f+acYxLNVd4aoYM1gVw
 lDiO8C+HY2JAjMEAAEIAB0WIQSshTmm6PRnAspKQ5s1o5Of/Hh3bQUCagb1YgAKCRA1o5Of/Hh3
 bYTNEACZLi7GN6mwm0UMF2ed5AwXhzuIMGmB+ylEzE6y3kpGEKnp+XPTyXxP3dBOozSWrWL4y+n
 JVJWDmtyUrqbZlCCG4Ykr2Ms8PA59ZNvVoaznb+frMkekOPguUHTJtOdsNicBsuZPgoFigVRQM6
 4sSwhlEkxcigLQeohtB4zktlYxb7qCQQBBt0/4Lx3StOHDm8ErIwo0Slk9IaeMR30o09DfxO1pF
 SMPDPLRqhbJQHGaJ3mE0RILRWT1rufE/ackS9kJs7/qGwQhJ3hVyCKeBn+npf+Fru4sJKLMmfp+
 S5tq877Xsm2wRJ7ic/1UIFqNLAwysYU/DBbM+OGIrv8oi/prDMfmXWZFsJFkhlarzScvjazixwr
 pneDXe9ss3adctUFN35/z1xidrhTKXAIzyd6PbKpjS8Iz6pinEJkSfeV0aQROsM8Q5KlQdK13vU
 1QG8qlTKL20rAYFD1zadgtq7RkTOPzpkkbpdDctHH3Qer3lsh8u5ISvSC70HpIPQxcSQD5gkUfM
 kHNFiUDh9lXA7QOd8UtBGkp9TywTpl0C/6SN8xdPykwXjqQr+u+wyrOrlzRBhXy+H44CVvhPeRS
 KyI5PRcGKFLrd6QraQd6lOrd8RmPgDWExkwXD7VccuMYnuQPZdMHha31dAijTSZNNqiDetS7mkK
 wTLf4R6jlsQy2lw==
X-Developer-Key: i=leitao@debian.org; a=openpgp;
 fpr=AC8539A6E8F46702CA4A439B35A3939FFC78776D
X-Debian-User: leitao

Add a small selftest that stresses pipe->mutex contention by spawning N
writer threads that hammer a single pipe with multi-page writes, plus M
reader threads that drain. Each writer records its own write() latency
samples into a log2-bucketed histogram; main aggregates and prints
total writes, throughput, average and percentile (p50/p99/p99.9)
latencies, and the maximum observed latency.

This was used to validate "fs/pipe: bulk pre-allocate pages outside
pipe->mutex in anon_pipe_write". By default it sweeps over
writers=3D{1,2,5} x readers=3D{1,5,10} using 64KB writes for 3s on a 1 MB
pipe (~27s total); -w/-r switch to a single configuration and -s/-d/-p
tune msgsize/duration/pipe size. Output is one-line-per-metric with a
"---" separator between configurations so two runs (e.g. baseline vs
patched) can be diffed directly.

Pass --memory-pressure to fork stress-ng (--vm 4 --vm-bytes 80%
--vm-method all) for the duration of the run, so alloc_page() in
anon_pipe_write() routinely hits direct reclaim. The flag fails
fast if stress-ng is not on $PATH.

The program exits 0 on a clean run and reports its results to stdout,
so it integrates with the kselftest framework via TEST_GEN_PROGS.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 tools/testing/selftests/Makefile          |   1 +
 tools/testing/selftests/pipe/.gitignore   |   1 +
 tools/testing/selftests/pipe/Makefile     |   9 +
 tools/testing/selftests/pipe/pipe_bench.c | 351 ++++++++++++++++++++++++++=
++++
 4 files changed, 362 insertions(+)

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Mak=
efile
index 6e59b8f63e416..bcd9db9d292ca 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -91,6 +91,7 @@ TARGETS +=3D pcie_bwctrl
 TARGETS +=3D perf_events
 TARGETS +=3D pidfd
 TARGETS +=3D pid_namespace
+TARGETS +=3D pipe
 TARGETS +=3D power_supply
 TARGETS +=3D powerpc
 TARGETS +=3D prctl
diff --git a/tools/testing/selftests/pipe/.gitignore b/tools/testing/selfte=
sts/pipe/.gitignore
new file mode 100644
index 0000000000000..20b549361a152
--- /dev/null
+++ b/tools/testing/selftests/pipe/.gitignore
@@ -0,0 +1 @@
+pipe_bench
diff --git a/tools/testing/selftests/pipe/Makefile b/tools/testing/selftest=
s/pipe/Makefile
new file mode 100644
index 0000000000000..1810c680117b3
--- /dev/null
+++ b/tools/testing/selftests/pipe/Makefile
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2026 Meta Platforms, Inc. and affiliates
+# Copyright (c) 2026 Breno Leitao <leitao@debian.org>
+
+CFLAGS +=3D -O2 -Wall -Wextra -pthread
+
+TEST_GEN_PROGS :=3D pipe_bench
+
+include ../lib.mk
diff --git a/tools/testing/selftests/pipe/pipe_bench.c b/tools/testing/self=
tests/pipe/pipe_bench.c
new file mode 100644
index 0000000000000..4b4ee6c8c0ced
--- /dev/null
+++ b/tools/testing/selftests/pipe/pipe_bench.c
@@ -0,0 +1,351 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pipe_bench - exercise pipe->mutex contention under concurrent writers.
+ *
+ * N writer threads hammer a single pipe with multi-page writes; M reader
+ * threads drain it. Each writer records its own write() latency histogram.
+ * Multi-page writes (msgsize >=3D PAGE_SIZE) force the loop in
+ * anon_pipe_write() to call alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT) under
+ * pipe->mutex, which is the critical section the patch shrinks.
+ *
+ * By default the benchmark sweeps writers in {1, 2, 5} x readers in
+ * {1, 5, 10} and prints one block per configuration so two runs (e.g.
+ * baseline vs patched) can be diffed directly. Pass -w and -r to run a
+ * single configuration instead. Pass --memory-pressure to spawn stress-ng
+ * alongside the sweep so the per-page alloc_page() path under pipe->mutex
+ * has to dip into reclaim.
+ *
+ * Copyright (c) 2026 Meta Platforms, Inc. and affiliates
+ * Copyright (c) 2026 Breno Leitao <leitao@debian.org>
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <pthread.h>
+#include <signal.h>
+#include <stdatomic.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/wait.h>
+#include <time.h>
+#include <unistd.h>
+
+#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))
+#define HIST_BUCKETS	32
+
+static size_t g_msgsize =3D 16 * 4096;
+static int g_duration =3D 3;
+static int g_pipe_size =3D 1024 * 1024;
+static int g_memory_pressure;
+
+static atomic_int g_stop;
+static int g_pipe[2];
+
+struct wstats {
+	uint64_t writes;
+	uint64_t bytes;
+	uint64_t lat_sum_ns;
+	uint64_t lat_max_ns;
+	uint64_t lat_hist[HIST_BUCKETS];
+};
+
+static inline uint64_t now_ns(void)
+{
+	struct timespec ts;
+
+	clock_gettime(CLOCK_MONOTONIC, &ts);
+	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
+}
+
+static inline int log2_bucket(uint64_t v)
+{
+	int b =3D 0;
+
+	if (!v)
+		return 0;
+	while (v >>=3D 1)
+		b++;
+	return b < HIST_BUCKETS ? b : HIST_BUCKETS - 1;
+}
+
+static void *writer(void *arg)
+{
+	struct wstats *s =3D arg;
+	char *buf =3D aligned_alloc(4096, g_msgsize);
+
+	if (!buf)
+		return NULL;
+	memset(buf, 0xAA, g_msgsize);
+
+	while (!atomic_load_explicit(&g_stop, memory_order_relaxed)) {
+		uint64_t t0 =3D now_ns();
+		ssize_t n =3D write(g_pipe[1], buf, g_msgsize);
+		uint64_t dt =3D now_ns() - t0;
+
+		if (n > 0) {
+			s->writes++;
+			s->bytes +=3D n;
+			s->lat_sum_ns +=3D dt;
+			if (dt > s->lat_max_ns)
+				s->lat_max_ns =3D dt;
+			s->lat_hist[log2_bucket(dt)]++;
+		} else if (n < 0 && errno =3D=3D EPIPE) {
+			break;
+		}
+	}
+	free(buf);
+	return NULL;
+}
+
+static void *reader(void *arg)
+{
+	char *buf =3D aligned_alloc(4096, g_msgsize);
+
+	(void)arg;
+	if (!buf)
+		return NULL;
+	/*
+	 * Drain until EOF (write end closed by main). g_stop is not checked
+	 * here on purpose: writers may be blocked in write() with the pipe
+	 * full when g_stop is set, so the reader must keep draining until
+	 * main closes the write end.
+	 */
+	for (;;) {
+		ssize_t n =3D read(g_pipe[0], buf, g_msgsize);
+
+		if (n <=3D 0)
+			break;
+	}
+	free(buf);
+	return NULL;
+}
+
+static void summarize(struct wstats *all, int nw, int nr)
+{
+	uint64_t total_writes =3D 0, total_bytes =3D 0, total_lat =3D 0;
+	uint64_t max_lat =3D 0;
+	uint64_t agg[HIST_BUCKETS] =3D {0};
+	uint64_t p50_target, p99_target, p999_target;
+	uint64_t cum =3D 0, p50 =3D 0, p99 =3D 0, p999 =3D 0;
+	uint64_t avg_ns;
+	double sec;
+
+	for (int i =3D 0; i < nw; i++) {
+		total_writes +=3D all[i].writes;
+		total_bytes +=3D all[i].bytes;
+		total_lat +=3D all[i].lat_sum_ns;
+		if (all[i].lat_max_ns > max_lat)
+			max_lat =3D all[i].lat_max_ns;
+		for (int b =3D 0; b < HIST_BUCKETS; b++)
+			agg[b] +=3D all[i].lat_hist[b];
+	}
+
+	p50_target =3D total_writes * 50 / 100;
+	p99_target =3D total_writes * 99 / 100;
+	p999_target =3D total_writes * 999 / 1000;
+
+	for (int b =3D 0; b < HIST_BUCKETS; b++) {
+		cum +=3D agg[b];
+		if (!p50 && cum >=3D p50_target)
+			p50 =3D 1ULL << b;
+		if (!p99 && cum >=3D p99_target)
+			p99 =3D 1ULL << b;
+		if (!p999 && cum >=3D p999_target)
+			p999 =3D 1ULL << b;
+	}
+
+	sec =3D g_duration;
+	avg_ns =3D total_writes ? total_lat / total_writes : 0;
+
+	printf("config: writers=3D%d readers=3D%d msgsize=3D%zu duration=3D%d pip=
e_size=3D%d memory_pressure=3D%s\n",
+	       nw, nr, g_msgsize, g_duration, g_pipe_size,
+	       g_memory_pressure ? "yes" : "no");
+	printf("writes: total=3D%llu rate=3D%.0f/s\n",
+	       (unsigned long long)total_writes, total_writes / sec);
+	printf("throughput_MBps: %.2f\n",
+	       (total_bytes / sec) / (1024.0 * 1024.0));
+	printf("lat_avg_ns: %llu\n", (unsigned long long)avg_ns);
+	printf("lat_p50_ns_upper: %llu\n", (unsigned long long)p50);
+	printf("lat_p99_ns_upper: %llu\n", (unsigned long long)p99);
+	printf("lat_p999_ns_upper: %llu\n", (unsigned long long)p999);
+	printf("lat_max_ns: %llu\n", (unsigned long long)max_lat);
+}
+
+static pid_t spawn_stress_ng(void)
+{
+	pid_t pid =3D fork();
+
+	if (pid < 0) {
+		perror("fork");
+		return -1;
+	}
+	if (pid =3D=3D 0) {
+		execlp("stress-ng", "stress-ng",
+		       "--vm", "4", "--vm-bytes", "80%",
+		       "--vm-method", "all",
+		       (char *)NULL);
+		fprintf(stderr, "exec stress-ng failed: %s\n",
+			strerror(errno));
+		_exit(127);
+	}
+	/* Give stress-ng a moment to map its VM regions before measuring. */
+	sleep(1);
+	return pid;
+}
+
+static void kill_stress_ng(pid_t pid)
+{
+	int status;
+
+	if (pid <=3D 0)
+		return;
+	kill(pid, SIGTERM);
+	for (int i =3D 0; i < 20; i++) {
+		if (waitpid(pid, &status, WNOHANG) > 0)
+			return;
+		usleep(100 * 1000);
+	}
+	kill(pid, SIGKILL);
+	waitpid(pid, &status, 0);
+}
+
+static int run_one(int nw, int nr)
+{
+	pthread_t *wt, *rt;
+	struct wstats *ws;
+
+	atomic_store(&g_stop, 0);
+
+	if (pipe(g_pipe) < 0) {
+		perror("pipe");
+		return -1;
+	}
+	if (fcntl(g_pipe[1], F_SETPIPE_SZ, g_pipe_size) < 0)
+		perror("F_SETPIPE_SZ (continuing)");
+
+	wt =3D calloc(nw, sizeof(*wt));
+	rt =3D calloc(nr, sizeof(*rt));
+	ws =3D calloc(nw, sizeof(*ws));
+
+	if (!wt || !rt || !ws) {
+		fprintf(stderr, "alloc failed\n");
+		free(wt);
+		free(rt);
+		free(ws);
+		close(g_pipe[0]);
+		close(g_pipe[1]);
+		return -1;
+	}
+
+	for (int i =3D 0; i < nr; i++)
+		pthread_create(&rt[i], NULL, reader, NULL);
+	for (int i =3D 0; i < nw; i++)
+		pthread_create(&wt[i], NULL, writer, &ws[i]);
+
+	sleep(g_duration);
+	atomic_store(&g_stop, 1);
+
+	/*
+	 * Close write end first so any writer blocked in write() gets EPIPE
+	 * and exits, and so the readers see EOF after draining.
+	 */
+	close(g_pipe[1]);
+	for (int i =3D 0; i < nw; i++)
+		pthread_join(wt[i], NULL);
+	for (int i =3D 0; i < nr; i++)
+		pthread_join(rt[i], NULL);
+	close(g_pipe[0]);
+
+	summarize(ws, nw, nr);
+	fflush(stdout);
+
+	free(wt);
+	free(rt);
+	free(ws);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	static const int writers_sweep[] =3D {1, 2, 5};
+	static const int readers_sweep[] =3D {1, 5, 10};
+	static const struct option long_opts[] =3D {
+		{"memory-pressure", no_argument, NULL, 'M'},
+		{0, 0, 0, 0},
+	};
+	int writers_override =3D 0, readers_override =3D 0;
+	pid_t stress_pid =3D -1;
+	int rc =3D 0, opt;
+
+	while ((opt =3D getopt_long(argc, argv, "w:r:s:d:p:",
+				  long_opts, NULL)) !=3D -1) {
+		switch (opt) {
+		case 'w':
+			writers_override =3D atoi(optarg);
+			break;
+		case 'r':
+			readers_override =3D atoi(optarg);
+			break;
+		case 's':
+			g_msgsize =3D atol(optarg);
+			break;
+		case 'd':
+			g_duration =3D atoi(optarg);
+			break;
+		case 'p':
+			g_pipe_size =3D atoi(optarg);
+			break;
+		case 'M':
+			g_memory_pressure =3D 1;
+			break;
+		default:
+			fprintf(stderr,
+				"usage: %s [-w writers] [-r readers] [-s msgsize] [-d secs] [-p pipe_s=
ize] [--memory-pressure]\n"
+				"  default: sweep writers=3D{1,2,5} x readers=3D{1,5,10}\n"
+				"  --memory-pressure: spawn stress-ng (--vm 4 --vm-bytes 80%% --vm-met=
hod all) for the run\n",
+				argv[0]);
+			return 1;
+		}
+	}
+
+	signal(SIGPIPE, SIG_IGN);
+	setvbuf(stdout, NULL, _IOLBF, 0);
+	setvbuf(stderr, NULL, _IOLBF, 0);
+
+	fprintf(stderr, "pid=3D%d\n", getpid());
+	fflush(stderr);
+
+	if (g_memory_pressure) {
+		stress_pid =3D spawn_stress_ng();
+		if (stress_pid < 0) {
+			fprintf(stderr,
+				"memory_pressure requested but stress-ng could not be spawned\n");
+			return 1;
+		}
+	}
+
+	if (writers_override > 0 || readers_override > 0) {
+		int nw =3D writers_override > 0 ? writers_override : 1;
+		int nr =3D readers_override > 0 ? readers_override : 1;
+
+		rc =3D run_one(nw, nr) < 0 ? 1 : 0;
+		goto out;
+	}
+
+	for (size_t i =3D 0; i < ARRAY_SIZE(writers_sweep); i++) {
+		for (size_t j =3D 0; j < ARRAY_SIZE(readers_sweep); j++) {
+			printf("---\n");
+			if (run_one(writers_sweep[i], readers_sweep[j]) < 0) {
+				rc =3D 1;
+				goto out;
+			}
+		}
+	}
+out:
+	kill_stress_ng(stress_pid);
+	return rc;
+}

--=20
2.53.0-Meta