From nobody Fri Jun 19 08:59:21 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A34A3318ED2; Sun, 26 Apr 2026 11:56:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777204604; cv=none; b=nhBsnEqUVXH0Ay0TPr2ArtpYLQ4yJ9ZE+VPMSX84EHP1Sei49MOi1uB2aAm/FKXVaS28LHdq5NEnqn8fCupnSQHIXjuIlq6Oe6F7FBCt52raxUWOZHRlBjkH6QK8pHV+xJ01txsApQwRZfg1iroAxZMapxici526BTEs7OQA6R0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777204604; c=relaxed/simple; bh=lShvPUZtkzEbs9UjOJnKGezjl/aC+9FCH6gTzp0HT98=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=SHxJoFSe4/IZV8YRa1BeIeYZmuZNg8k1icjYLebpDFbsgj6+/Xk/ofgETKsVg/8UJ5LMbce1D9Im4BrZygl46j3DSf0ER1/dqF1HCnfCb4e2p+EE1/WLX0Zi71d7bdYz2QsvzIICEFX4Zdfr/51DpPu9xecHLEHUI4YH5v7sKGU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CpQWXeX6; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CpQWXeX6" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CCE65C2BCC7; Sun, 26 Apr 2026 11:56:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777204604; bh=lShvPUZtkzEbs9UjOJnKGezjl/aC+9FCH6gTzp0HT98=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=CpQWXeX6Lr0IU6cU5AKbX7juxAPZ2vfsbItLZDeWUgH3Z4eJIwsM6Gc4nsVK841XD RyBWlxKyKlvt89D8DbRvb9wNjA7sy9BlqhoYEWEA699T1gXPxnXdWEY9HshDvScyLr /wcTEK+Eqoazk77ZQa+RFa6yhYnubfoJ9aDQ8A1CEOgvEuBg5vSqvzkTr4BIXJr/2W 
2EcFxwBNH/w0f2VWxqpcBbDmhGn/xXCyJ8YrAWa5106JL7z7pyPw4ht1vBqq2DDWA/ H2DaxEwm7Og21ULSsJFM+aMPJ8XakxjLJdvLs6wOINSuWraZnuSRoEdxsb9uDZOCsn vZdWnb5DHBXbw== From: Jeff Layton Date: Sun, 26 Apr 2026 07:56:07 -0400 Subject: [PATCH v3 1/4] mm: add NR_DONTCACHE_DIRTY node page counter Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260426-dontcache-v3-1-79eb37da9547@kernel.org> References: <20260426-dontcache-v3-0-79eb37da9547@kernel.org> In-Reply-To: <20260426-dontcache-v3-0-79eb37da9547@kernel.org> To: Alexander Viro , Christian Brauner , Jan Kara , "Matthew Wilcox (Oracle)" , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Mike Snitzer , Jens Axboe , Ritesh Harjani , Christoph Hellwig , Kairui Song , Qi Zheng , Shakeel Butt , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Chuck Lever Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Jeff Layton X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=3971; i=jlayton@kernel.org; h=from:subject:message-id; bh=lShvPUZtkzEbs9UjOJnKGezjl/aC+9FCH6gTzp0HT98=; b=owEBbQKS/ZANAwAKAQAOaEEZVoIVAcsmYgBp7f1zf9CTlq6JPeG4iiERHYH/s7/r/mpN6W0Zy r/eX9lzvNWJAjMEAAEKAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCae39cwAKCRAADmhBGVaC FcMUEADGRbm+5W/V/7ipFOu8wKZLLjr66aeLjuTR64OiEWQt/7Mf8tX1jqzZeYEGG7citRX3Ha6 maC5/LmG+ZFi1yXJ6yPJydpPjVKgo1zC5iLxQSeXNs3k+zaBtKkeOTaFF+WgRUu9G/Uf2amnkhP 0K3mbty3oitNjI9JoLjVd/r8haA/dNB9PNIJxPfyqrwXYvVcqEyik8JUda4+9VuuRrPsKray2ie xwlWxU696drZ0K1I3QaevnUL+sVlWI4sAjBPvDheanRzu8uRtIliQ5Go2e2SV9AtYZn72zkm5Qg 28Ux5aoKw11unzRxBMixl4xS5b9V2481YKdsQApM3D8qnQkJa/if5XthtDh77SP4fyqn70x1N93 
GeeePY3fjn1XJVeyuntR4qgH/perFKxwC49yMpE4zjkW3FQZ7//kOFyJdhVCj7I0Vwntp7Gmslk bSNrxyaPW5Q0LFb45FcG61hDtT2ySjpw4L956modzI7heHQ+z6gAZ1MWTPvDxoJrVzXmcRMsZWV PZpkUcv9JUf968U/7dU8Tzqi6rAzIrJaSJ/E92quPKyck15bPAIDeEmGn5955nQ4elCIrl4/axE STdCKtcr1dcAg/WdZ18pY6dwBaoEQw83xXyZcoKbc00oGqqhKk/DRkkm9UU9czBD6OJ998hY8TT /Gc26l2XZkJL9tQ== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Add a per-node page counter that tracks the number of dirty pages with the dropbehind flag set (i.e., pages dirtied via RWF_DONTCACHE writes). Increment the counter alongside NR_FILE_DIRTY in folio_account_dirtied() when the folio has the dropbehind flag set, and decrement it in folio_clear_dirty_for_io(), folio_account_cleaned(), and when a non-DONTCACHE access clears the dropbehind flag on a dirty folio. The counter is visible via /proc/vmstat as "nr_dontcache_dirty" and will be used by the writeback flusher to determine how many pages to write back when expediting writeback for IOCB_DONTCACHE writes, without flushing the entire BDI's dirty pages. 
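
The accounting rules above can be modelled in a few lines of userspace C. This is only a sketch to make the increment/decrement pairing easier to check: the toy_* names and structs are hypothetical stand-ins, not kernel code, and the model assumes a single thread and a single node. It shows that the dontcache counter moves together with the dirty counter for dropbehind folios, and that a non-DONTCACHE lookup on a dirty folio drops it from the dontcache count while leaving the dirty count alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for a folio and the per-node counters (illustrative only). */
struct toy_folio {
	long nr_pages;
	bool dirty;
	bool dropbehind;
};

struct toy_node_stats {
	long nr_file_dirty;		/* models NR_FILE_DIRTY */
	long nr_dontcache_dirty;	/* models NR_DONTCACHE_DIRTY */
};

/* Models folio_account_dirtied(): a dropbehind folio bumps both counters. */
void toy_account_dirtied(struct toy_node_stats *st, struct toy_folio *f)
{
	if (f->dirty)
		return;
	f->dirty = true;
	st->nr_file_dirty += f->nr_pages;
	if (f->dropbehind)
		st->nr_dontcache_dirty += f->nr_pages;
}

/* Models folio_clear_dirty_for_io() and folio_account_cleaned(). */
void toy_clear_dirty(struct toy_node_stats *st, struct toy_folio *f)
{
	if (!f->dirty)
		return;
	f->dirty = false;
	st->nr_file_dirty -= f->nr_pages;
	if (f->dropbehind)
		st->nr_dontcache_dirty -= f->nr_pages;
}

/* Models a non-DONTCACHE lookup that clears the dropbehind flag. */
void toy_cached_lookup(struct toy_node_stats *st, struct toy_folio *f)
{
	if (!f->dropbehind)
		return;
	/* A dirty folio leaves the dontcache count but stays dirty. */
	if (f->dirty)
		st->nr_dontcache_dirty -= f->nr_pages;
	f->dropbehind = false;
}
```

The order inside toy_cached_lookup() mirrors the filemap.c hunk: the counter is adjusted before the flag is cleared, so the later clean path sees no dropbehind flag and does not decrement a second time.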
Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Jeff Layton --- include/linux/mmzone.h | 1 + mm/filemap.c | 6 +++++- mm/page-writeback.c | 7 +++++++ mm/vmstat.c | 1 + 4 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9adb2ad21da5..ed9cc61c7627 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -259,6 +259,7 @@ enum node_stat_item { only modified from process context */ NR_FILE_PAGES, NR_FILE_DIRTY, + NR_DONTCACHE_DIRTY, NR_WRITEBACK, NR_SHMEM, /* shmem pages (included tmpfs/GEM pages) */ NR_SHMEM_THPS, diff --git a/mm/filemap.c b/mm/filemap.c index 4e636647100c..45089fde5150 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2052,8 +2052,12 @@ struct folio *__filemap_get_folio_mpol(struct addres= s_space *mapping, if (!folio) return ERR_PTR(-ENOENT); /* not an uncached lookup, clear uncached if set */ - if (folio_test_dropbehind(folio) && !(fgp_flags & FGP_DONTCACHE)) + if (folio_test_dropbehind(folio) && !(fgp_flags & FGP_DONTCACHE)) { + if (folio_test_dirty(folio)) + lruvec_stat_mod_folio(folio, NR_DONTCACHE_DIRTY, + -folio_nr_pages(folio)); folio_clear_dropbehind(folio); + } return folio; } EXPORT_SYMBOL(__filemap_get_folio_mpol); diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 88cd53d4ba09..e1df93fb3e3b 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2630,6 +2630,8 @@ static void folio_account_dirtied(struct folio *folio, wb =3D inode_to_wb(inode); =20 lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, nr); + if (folio_test_dropbehind(folio)) + lruvec_stat_mod_folio(folio, NR_DONTCACHE_DIRTY, nr); __zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr); __node_stat_mod_folio(folio, NR_DIRTIED, nr); wb_stat_mod(wb, WB_RECLAIMABLE, nr); @@ -2651,6 +2653,8 @@ void folio_account_cleaned(struct folio *folio, struc= t bdi_writeback *wb) long nr =3D folio_nr_pages(folio); =20 lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, -nr); + if (folio_test_dropbehind(folio)) + 
lruvec_stat_mod_folio(folio, NR_DONTCACHE_DIRTY, -nr); zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr); wb_stat_mod(wb, WB_RECLAIMABLE, -nr); task_io_account_cancelled_write(nr * PAGE_SIZE); @@ -2920,6 +2924,9 @@ bool folio_clear_dirty_for_io(struct folio *folio) if (folio_test_clear_dirty(folio)) { long nr =3D folio_nr_pages(folio); lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, -nr); + if (folio_test_dropbehind(folio)) + lruvec_stat_mod_folio(folio, + NR_DONTCACHE_DIRTY, -nr); zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr); wb_stat_mod(wb, WB_RECLAIMABLE, -nr); ret =3D true; diff --git a/mm/vmstat.c b/mm/vmstat.c index f534972f517d..c3e5dfadb9a5 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1240,6 +1240,7 @@ const char * const vmstat_text[] =3D { [I(NR_FILE_MAPPED)] =3D "nr_mapped", [I(NR_FILE_PAGES)] =3D "nr_file_pages", [I(NR_FILE_DIRTY)] =3D "nr_dirty", + [I(NR_DONTCACHE_DIRTY)] =3D "nr_dontcache_dirty", [I(NR_WRITEBACK)] =3D "nr_writeback", [I(NR_SHMEM)] =3D "nr_shmem", [I(NR_SHMEM_THPS)] =3D "nr_shmem_hugepages", --=20 2.53.0 From nobody Fri Jun 19 08:59:21 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1E3B5318ED2; Sun, 26 Apr 2026 11:56:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777204608; cv=none; b=BqTZQLatXozPiX+3WVsO/hoUHEjW/g/O9WUg9uN0UJYqV3DOLgQbpWg0TiZM3i2ns7gduOPohNcl7z6xYlUeoKVTNQJlLdGZQrVxvvNSxOVC110UFk7EK7tIj1WoV+cOBVChqhlj9tpd9/7AQvRHXbkN0SihBa+B9YY56igQHos= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777204608; c=relaxed/simple; bh=5hBl0zioJo5bmMYBsa/ayjneMPX9D/E0U5nhPvmz3Ag=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: 
In-Reply-To:To:Cc; b=LKSWdOQx8aEF6Yfu0iGn/EpY3Bb2SVtJPzbYnfuY4T33xUlNW8EBtyAjJbNobXjZIb7+rxCsaT47ng3WLeWjktnd3ifRDb0mwaq1bxo5vvm7CvH6ae+NZPpPlTa7fsKgqOlnDO9xTVXRm9tOPdQiuwySMEnrZTFegVSTe+S0Mtk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ooZJKXZq; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ooZJKXZq" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8F856C2BCAF; Sun, 26 Apr 2026 11:56:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777204607; bh=5hBl0zioJo5bmMYBsa/ayjneMPX9D/E0U5nhPvmz3Ag=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=ooZJKXZqm0RN4As5t7yp7UYg1yu3tEkFQjZEmM8nPsoeCGMVy3rDmDeNBVajbkgLf QIADhlIEZ6EfFzFvo6gMAPZHKezwp2U/rjmNbqqeJIIvhqo4H1V9CcK1oonXxlNv4P JRsyGCXXyCOosabi8sWpWNGuwS2iXNsAEkvPeyyGFWC1QBGd1kBojbIu67BdtUKzAq yesLLXGUVTgk6mi2deep1Kj3q84eWZPsY+Xo4ypp6ZRYB9zposMCp8muPo5r/PPWEX 1QdLj8DBu/7gBbGWedtD8wAUzcPuChtBD60r/1mChVCWdhJBIkjpellVjR2eQFRvWH 8OR/xmZVr/sLA== From: Jeff Layton Date: Sun, 26 Apr 2026 07:56:08 -0400 Subject: [PATCH v3 2/4] mm: kick writeback flusher for IOCB_DONTCACHE with targeted dirty tracking Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260426-dontcache-v3-2-79eb37da9547@kernel.org> References: <20260426-dontcache-v3-0-79eb37da9547@kernel.org> In-Reply-To: <20260426-dontcache-v3-0-79eb37da9547@kernel.org> To: Alexander Viro , Christian Brauner , Jan Kara , "Matthew Wilcox (Oracle)" , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. 
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Mike Snitzer , Jens Axboe , Ritesh Harjani , Christoph Hellwig , Kairui Song , Qi Zheng , Shakeel Butt , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Chuck Lever Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Jeff Layton X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=9483; i=jlayton@kernel.org; h=from:subject:message-id; bh=5hBl0zioJo5bmMYBsa/ayjneMPX9D/E0U5nhPvmz3Ag=; b=owEBbQKS/ZANAwAKAQAOaEEZVoIVAcsmYgBp7f1zmVJGTA+Tu2f1JtkgZ4Wu4QF3JzIvd1NAx QeDTBATFNWJAjMEAAEKAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCae39cwAKCRAADmhBGVaC FeF3D/9yCQENYERa2fOF8sEnGIr+VfDwMDVMKgp6/vpNwbtri6dShe5ipAoG1nNJw2Ec8fYuiFg Imjx0pFCtOHJuShyO4+5vUueRxuIHyxkF7MIEO+nn0GapxZzXdLryLM28vxnSvFOP5UWUcrlRVT X1XJg64SsFTYWrf6TqLLZGJih2OjGZ/f8sQUYW6+piUA97X4h7ya4/FUr8MWv12fANionDuGBRQ l/zALxRhRgT+Xa9GJ/cjP4oVmx3ZXXIRbDUoW5qJX5cjxx0cEi/xg0fAh3Yb6qT7iOWKAuqXoN5 4dm9IXRaDvssYpbwMUHfSVKcxoiIhSGRULtO0bazlUA6uAxK8HVyzEBXB/DQkHqL2uVMT4E+63z 2Tlyh0wn6NUVzaxu6D39C1BO8EZCIefUUGMI2sZHZj/UzEkTAe6hpjph3RTTvsIn3HygosnhqQP eGSjsFc55/57rjQzR2EQwitPA+Bcl+P1uZnYKrUKKPlfKzqbXEbml4VfiQTynzJ95mmDSr09RVy 942UbUjLOXIyjmqzjx3vNiSw138EfZYebQTmL/LKeXIRRheDBE3xi6tG1NnFIUm+gYE38ZvUAka aHK2QJipOHP63BECxbbGddNccTr1e1Hx7Nw75TaIYoHQjgoTm4k/S6I9EjhwjgyyB16UO6L0Bea wuMVICRdIU/Oh4g== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 The IOCB_DONTCACHE writeback path in generic_write_sync() calls filemap_flush_range() on every write, submitting writeback inline in the writer's context. 
Perf lock contention profiling shows the performance problem is not
lock contention but the writeback submission work itself: walking the
page tree and submitting I/O blocks the writer for milliseconds,
inflating p99.9 latency from 23ms (buffered) to 93ms (dontcache).

Replace the inline filemap_flush_range() call with a flusher kick that
drains dirty pages in the background. This moves writeback submission
completely off the writer's hot path.

To avoid flushing unrelated buffered dirty data, add a dedicated
WB_start_dontcache bit and wb_check_start_dontcache() handler that uses
the new NR_DONTCACHE_DIRTY counter to determine how many pages to write
back. The flusher writes back that many pages from the oldest dirty
inodes (not restricted to dontcache-specific inodes). This helps
preserve I/O batching while limiting the scope of expedited writeback.

Like WB_start_all, the WB_start_dontcache bit coalesces multiple
DONTCACHE writes into a single flusher wakeup without per-write
allocations.

Also add WB_REASON_DONTCACHE as a new writeback reason for tracing
visibility, and target the correct cgroup writeback domain via
unlocked_inode_to_wb_begin().
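
The coalescing behaviour described above can be sketched in userspace with C11 atomics. This is a minimal model under stated assumptions, not the kernel implementation: the toy_* names are hypothetical, atomic_fetch_or() stands in for test_and_set_bit(), the wakeup is just a counter, and the demo is meant to be driven from one thread. It shows that repeated kicks while the bit is set cause only one wakeup, and that a new kick is honoured once the flusher has cleared the bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_START_DONTCACHE (1u << 0)	/* stands in for WB_start_dontcache */

struct toy_wb {
	atomic_uint state;
	bool has_dirty_io;	/* models wb_has_dirty_io() */
	int wakeups;		/* flusher wakeups (single-threaded demo) */
};

void toy_wb_init(struct toy_wb *wb, bool has_dirty_io)
{
	atomic_init(&wb->state, 0u);
	wb->has_dirty_io = has_dirty_io;
	wb->wakeups = 0;
}

/* Writer side: returns true only if this call woke the flusher. */
bool toy_kick(struct toy_wb *wb)
{
	if (!wb->has_dirty_io)
		return false;
	/* Cheap read first, so the common case avoids an atomic write. */
	if (atomic_load(&wb->state) & TOY_START_DONTCACHE)
		return false;
	/* Atomic set that reports the old value: only one caller wins. */
	if (atomic_fetch_or(&wb->state, TOY_START_DONTCACHE) &
	    TOY_START_DONTCACHE)
		return false;
	wb->wakeups++;
	return true;
}

/* Flusher side: service a pending request, then clear the bit. */
long toy_flusher_run(struct toy_wb *wb, long dontcache_dirty_pages)
{
	if (!(atomic_load(&wb->state) & TOY_START_DONTCACHE))
		return 0;
	/* A real flusher would write back up to this many pages here. */
	atomic_fetch_and(&wb->state, ~TOY_START_DONTCACHE);
	return dontcache_dirty_pages;
}
```

Because the request is a single bit rather than a queued work item, no memory is allocated per write; the cost of that choice is that the flusher has to size the work itself, which is what the NR_DONTCACHE_DIRTY counter is for.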

dontcache-bench results on dual-socket Xeon Gold 6138 (80 CPUs, 256 GB
RAM, Samsung MZ1LB1T9HALS 1.7 TB NVMe, local XFS, io_uring, file size
~503 GB, compared to a v6.19-ish baseline):

Single-client sequential write (MB/s):
                      baseline     patched     change
  buffered              1449.8      1440.1      -0.7%
  dontcache             1347.9      1461.5      +8.4%
  direct                1450.0      1440.1      -0.7%

Single-client sequential write latency (us):
                      baseline     patched     change
  dontcache p50         3031.0     10551.3    +248.1%
  dontcache p99        74973.2     21626.9     -71.2%
  dontcache p99.9      85459.0     23199.7     -72.9%

Single-client random write (MB/s):
                      baseline     patched     change
  dontcache              284.2       295.4      +3.9%

Single-client random write p99.9 latency (us):
                      baseline     patched     change
  dontcache             2277.4       872.4     -61.7%

Multi-writer aggregate throughput (MB/s):
                      baseline     patched     change
  buffered              1619.5      1611.2      -0.5%
  dontcache             1281.1      1629.4     +27.2%
  direct                1545.4      1609.4      +4.1%

Mixed-mode noisy neighbor (dontcache writer + buffered readers):
                      baseline     patched     change
  writer (MB/s)         1297.6      1471.1     +13.4%
  readers avg (MB/s)     855.0       462.4     -45.9%

nfsd-io-bench results on same hardware (XFS on NVMe, NFSv3 via fio NFS
engine with libnfs, 1024 NFSD threads, pool_mode=pernode, file size
~502 GB, compared to v6.19-ish baseline):

Single-client sequential write (MB/s):
                      baseline     patched     change
  buffered              4844.2      4653.4      -3.9%
  dontcache             3028.3      3723.1     +22.9%
  direct                 957.6       987.8      +3.2%

Single-client sequential write p99.9 latency (us):
                      baseline     patched     change
  dontcache           759169.0    175112.2     -76.9%

Single-client random write (MB/s):
                      baseline     patched     change
  dontcache              590.0      1561.0    +164.6%

Multi-writer aggregate throughput (MB/s):
                      baseline     patched     change
  buffered              9636.3      9422.9      -2.2%
  dontcache             1894.9      9442.6    +398.3%
  direct                 809.6       975.1     +20.4%

Noisy neighbor (dontcache writer + random readers):
                      baseline     patched     change
  writer (MB/s)         1854.5      4063.6    +119.1%
  readers avg (MB/s)     131.2       101.6     -22.5%

The NFS results show even larger improvements than the local benchmarks.
Multi-writer dontcache throughput improves nearly 5x, matching buffered I/O. Dirty page footprint drops 85-95% in sequential workloads vs. buffered. Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Jeff Layton --- fs/fs-writeback.c | 60 ++++++++++++++++++++++++++++++++++++= ++++ include/linux/backing-dev-defs.h | 2 ++ include/linux/fs.h | 6 ++-- include/trace/events/writeback.h | 3 +- 4 files changed, 66 insertions(+), 5 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index a65694cbfe68..377767db48f7 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1334,6 +1334,18 @@ static void wb_start_writeback(struct bdi_writeback = *wb, enum wb_reason reason) wb_wakeup(wb); } =20 +static void wb_start_dontcache_writeback(struct bdi_writeback *wb) +{ + if (!wb_has_dirty_io(wb)) + return; + + if (test_bit(WB_start_dontcache, &wb->state) || + test_and_set_bit(WB_start_dontcache, &wb->state)) + return; + + wb_wakeup(wb); +} + /** * wb_start_background_writeback - start background writeback * @wb: bdi_writback to write from @@ -2373,6 +2385,28 @@ static long wb_check_start_all(struct bdi_writeback = *wb) return nr_pages; } =20 +static long wb_check_start_dontcache(struct bdi_writeback *wb) +{ + long nr_pages; + + if (!test_bit(WB_start_dontcache, &wb->state)) + return 0; + + nr_pages =3D global_node_page_state(NR_DONTCACHE_DIRTY); + if (nr_pages) { + struct wb_writeback_work work =3D { + .nr_pages =3D wb_split_bdi_pages(wb, nr_pages), + .sync_mode =3D WB_SYNC_NONE, + .range_cyclic =3D 1, + .reason =3D WB_REASON_DONTCACHE, + }; + + nr_pages =3D wb_writeback(wb, &work); + } + + clear_bit(WB_start_dontcache, &wb->state); + return nr_pages; +} =20 /* * Retrieve work items and do the writeback they describe @@ -2394,6 +2428,11 @@ static long wb_do_writeback(struct bdi_writeback *wb) */ wrote +=3D wb_check_start_all(wb); =20 + /* + * Check for dontcache writeback request + */ + wrote +=3D wb_check_start_dontcache(wb); + /* * Check for periodic writeback, 
kupdated() style */ @@ -2468,6 +2507,27 @@ void wakeup_flusher_threads_bdi(struct backing_dev_i= nfo *bdi, rcu_read_unlock(); } =20 +/** + * filemap_dontcache_kick_writeback - kick flusher for IOCB_DONTCACHE writ= es + * @mapping: address_space that was just written to + * + * Kick the writeback flusher thread to expedite writeback of dontcache + * dirty pages. Uses a dedicated WB_start_dontcache bit so that only + * pages tracked by NR_DONTCACHE_DIRTY are written back, rather than + * flushing the entire BDI's dirty pages. + */ +void filemap_dontcache_kick_writeback(struct address_space *mapping) +{ + struct inode *inode =3D mapping->host; + struct bdi_writeback *wb; + struct wb_lock_cookie cookie =3D {}; + + wb =3D unlocked_inode_to_wb_begin(inode, &cookie); + wb_start_dontcache_writeback(wb); + unlocked_inode_to_wb_end(inode, &cookie); +} +EXPORT_SYMBOL_GPL(filemap_dontcache_kick_writeback); + /* * Wakeup the flusher threads to start writeback of all currently dirty pa= ges */ diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-d= efs.h index a06b93446d10..74f8a9977f5d 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -26,6 +26,7 @@ enum wb_state { WB_writeback_running, /* Writeback is in progress */ WB_has_dirty_io, /* Dirty inodes on ->b_{dirty|io|more_io} */ WB_start_all, /* nr_pages =3D=3D 0 (all) work pending */ + WB_start_dontcache, /* dontcache writeback pending */ }; =20 enum wb_stat_item { @@ -55,6 +56,7 @@ enum wb_reason { */ WB_REASON_FORKER_THREAD, WB_REASON_FOREIGN_FLUSH, + WB_REASON_DONTCACHE, =20 WB_REASON_MAX, }; diff --git a/include/linux/fs.h b/include/linux/fs.h index 11559c513dfb..df72b42a9e9b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2624,6 +2624,7 @@ extern int __must_check file_write_and_wait_range(str= uct file *file, loff_t start, loff_t end); int filemap_flush_range(struct address_space *mapping, loff_t start, loff_t end); +void 
filemap_dontcache_kick_writeback(struct address_space *mapping); =20 static inline int file_write_and_wait(struct file *file) { @@ -2657,10 +2658,7 @@ static inline ssize_t generic_write_sync(struct kioc= b *iocb, ssize_t count) if (ret) return ret; } else if (iocb->ki_flags & IOCB_DONTCACHE) { - struct address_space *mapping =3D iocb->ki_filp->f_mapping; - - filemap_flush_range(mapping, iocb->ki_pos - count, - iocb->ki_pos - 1); + filemap_dontcache_kick_writeback(iocb->ki_filp->f_mapping); } =20 return count; diff --git a/include/trace/events/writeback.h b/include/trace/events/writeb= ack.h index bdac0d685a98..13ee076ccd16 100644 --- a/include/trace/events/writeback.h +++ b/include/trace/events/writeback.h @@ -44,7 +44,8 @@ EM( WB_REASON_PERIODIC, "periodic") \ EM( WB_REASON_FS_FREE_SPACE, "fs_free_space") \ EM( WB_REASON_FORKER_THREAD, "forker_thread") \ - EMe(WB_REASON_FOREIGN_FLUSH, "foreign_flush") + EM( WB_REASON_FOREIGN_FLUSH, "foreign_flush") \ + EMe(WB_REASON_DONTCACHE, "dontcache") =20 WB_WORK_REASON =20 --=20 2.53.0 From nobody Fri Jun 19 08:59:21 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0005D359A91; Sun, 26 Apr 2026 11:56:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777204611; cv=none; b=tAi6PtjV0b3i0YiRB16XAFrSnPjMoqodxgBtOV8s0TvSvIK3ORaP3k11OJQ8voEMGNK8vVIgLEyOgX2cThLs60eRr3gWvfbez3fpcxu8t3ZnXzjpD1Wiugh2gJNh0qFRgbl3nj6rKAzGNqBegepdGGS9lb+Cr72aID1gPe5buWA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777204611; c=relaxed/simple; bh=3rGbHnW89sprv4o9TvoVNiVPni8sqxyGFa1Ef5gfoCk=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; 
b=AfLVDHJKDL7o9C+ODh+3GaKqZp1xUSg0JzFH81p2oK+ZGRN9EZCHMSonIhIqzrh5AglzB9ppTSgezCSAzmjIeuvceDYfVn2hmqgWw8Z9CV1GuEwc4OucWhGePtbPfStNxspmCBObXtZykXuPPvtHSQaUWlJRs9Ys3bOpTmAEfMI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=nvVhaqRp; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="nvVhaqRp" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A7AADC2BCB7; Sun, 26 Apr 2026 11:56:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777204610; bh=3rGbHnW89sprv4o9TvoVNiVPni8sqxyGFa1Ef5gfoCk=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=nvVhaqRpNv8KbPv8XSlgCQdXLlGPTmLNSyp6Tx9vlke3hTB18Cw08R3dEigEAFeXN BqPVxt+R4/4kbY1daPDIe9CGya5OLPPxTugyLcnYj2UDSUzqmbD9oCycVdntBFzt/4 mdmGbOqjvkiQE/aI7rRfkud5Dyz/GLd/rseL9q0zAmITfimbEJwV2PJ9FUwpNWE2Bi 4jFR1y3NkagvGaVK8FkIHjvG/QbSe6cPyGSNnYpnqoQDblF/JWTIL7MyKLXeFeaUW9 2Vie3vnvHDwk/eJeArus3xpY1MCBLlOuohCeM4VC7SpOo6UXHKRsrR5elw+k8yl5ap 2gi/fWlbLkH/Q== From: Jeff Layton Date: Sun, 26 Apr 2026 07:56:09 -0400 Subject: [PATCH v3 3/4] testing: add nfsd-io-bench NFS server benchmark suite Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260426-dontcache-v3-3-79eb37da9547@kernel.org> References: <20260426-dontcache-v3-0-79eb37da9547@kernel.org> In-Reply-To: <20260426-dontcache-v3-0-79eb37da9547@kernel.org> To: Alexander Viro , Christian Brauner , Jan Kara , "Matthew Wilcox (Oracle)" , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. 
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Mike Snitzer , Jens Axboe , Ritesh Harjani , Christoph Hellwig , Kairui Song , Qi Zheng , Shakeel Butt , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Chuck Lever Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Jeff Layton X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=33014; i=jlayton@kernel.org; h=from:subject:message-id; bh=3rGbHnW89sprv4o9TvoVNiVPni8sqxyGFa1Ef5gfoCk=; b=owEBbQKS/ZANAwAKAQAOaEEZVoIVAcsmYgBp7f1zFOL9bSLqiMYXDMPopEHuw5kPJyQBYc1p+ 6aCOH1TgeuJAjMEAAEKAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCae39cwAKCRAADmhBGVaC FbWrD/4wt4kFBK2ABjBkKshLr4dbvxMZRX7Ql3I/7lYdui9XbufsnD2w+iHf8k+i2xbmFJFjR5I m7FVNCIFCLJ+xMWtojK2uaTnrlSBxDMXLh48J09/rkcy7Pywd3ypKdrWIjHgMARXY3/W3SowSAx YCF1+/IVGP2UCOYNEg2+8gUT3V0WFPLcTqsgBk8lTgE/lw6TsmLaEweLXrs0C3UYKtS1wyyX3po SRNk99UqtQxEi4/7SoBflv9JWZDSx47THEnAcNQ+ev//nMwgTkJZGRsKgRbfahSQg4pvlQi/DSR NibZGhLfPMSgjQ/RQtB0in10/FrWSBjzS/iQ81wiHlpL94/w+KFIk90JlPdxlMAWrX829PhJqTJ h8GHX8mNvD6geoOLOqhQHgz9zZdSJUI3ALz0i0SsAxMHmorEfy2qck379U2Pe489vmKXsDGdk/7 inuoCTuBcKa4B6reNB4urQdYmwc+bTjJLp55sTueBZLiczuli/Scij55nWuQmk9YX4iazAhDhYN bpZ3RdE6U8aNnfrv0R/MUGyOCWXs7Jb84PxuUkdQ+mT164VEaxZVPK34dwvGVCHEMxQh1IkOKOd cqVCGGz4SHihTzg6x+0dyFq4+Y9ovBdqwSNrlu/EGVyWxvTzgVlE0ieNCnRIm9RJL8Sn/nEzhy0 MwgN0SFgpy+l+yg== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Add a benchmark suite for testing NFSD I/O mode performance using fio with the libnfs backend against an NFS server on localhost. Tests buffered, dontcache, and direct I/O modes via NFSD debugfs controls. 
Includes: - fio job files for sequential/random read/write, multi-writer, noisy-neighbor, and latency-sensitive reader workloads - run-benchmarks.sh: orchestrates test matrix with mode switching - parse-results.sh: extracts metrics from fio JSON output - setup-server.sh: configures NFS export for testing Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Jeff Layton --- .../testing/nfsd-io-bench/fio-jobs/lat-reader.fio | 15 + .../testing/nfsd-io-bench/fio-jobs/multi-write.fio | 14 + .../nfsd-io-bench/fio-jobs/noisy-writer.fio | 14 + tools/testing/nfsd-io-bench/fio-jobs/rand-read.fio | 15 + .../testing/nfsd-io-bench/fio-jobs/rand-write.fio | 15 + tools/testing/nfsd-io-bench/fio-jobs/seq-read.fio | 14 + tools/testing/nfsd-io-bench/fio-jobs/seq-write.fio | 14 + .../testing/nfsd-io-bench/scripts/parse-results.sh | 238 +++++++++ .../nfsd-io-bench/scripts/run-benchmarks.sh | 591 +++++++++++++++++= ++++ .../testing/nfsd-io-bench/scripts/setup-server.sh | 94 ++++ 10 files changed, 1024 insertions(+) diff --git a/tools/testing/nfsd-io-bench/fio-jobs/lat-reader.fio b/tools/te= sting/nfsd-io-bench/fio-jobs/lat-reader.fio new file mode 100644 index 000000000000..61af37e8b860 --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/lat-reader.fio @@ -0,0 +1,15 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D4k +numjobs=3D16 +runtime=3D300 +time_based=3D1 +group_reporting=3D1 +rw=3Drandread +log_avg_msec=3D1000 +write_bw_log=3Dlatreader +write_lat_log=3Dlatreader + +[lat_reader] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/multi-write.fio b/tools/t= esting/nfsd-io-bench/fio-jobs/multi-write.fio new file mode 100644 index 000000000000..16b792aecabb --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/multi-write.fio @@ -0,0 +1,14 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D1M +numjobs=3D16 +time_based=3D0 +group_reporting=3D1 +rw=3Dwrite +log_avg_msec=3D1000 +write_bw_log=3Dmultiwrite 
+write_lat_log=3Dmultiwrite + +[writer] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/noisy-writer.fio b/tools/= testing/nfsd-io-bench/fio-jobs/noisy-writer.fio new file mode 100644 index 000000000000..615154a7737e --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/noisy-writer.fio @@ -0,0 +1,14 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D1M +numjobs=3D16 +time_based=3D0 +group_reporting=3D1 +rw=3Dwrite +log_avg_msec=3D1000 +write_bw_log=3Dnoisywriter +write_lat_log=3Dnoisywriter + +[bulk_writer] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/rand-read.fio b/tools/tes= ting/nfsd-io-bench/fio-jobs/rand-read.fio new file mode 100644 index 000000000000..501bae7416a8 --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/rand-read.fio @@ -0,0 +1,15 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D4k +numjobs=3D16 +runtime=3D300 +time_based=3D1 +group_reporting=3D1 +rw=3Drandread +log_avg_msec=3D1000 +write_bw_log=3Drandread +write_lat_log=3Drandread + +[randread] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/rand-write.fio b/tools/te= sting/nfsd-io-bench/fio-jobs/rand-write.fio new file mode 100644 index 000000000000..d891d04197ae --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/rand-write.fio @@ -0,0 +1,15 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D64k +numjobs=3D16 +runtime=3D300 +time_based=3D1 +group_reporting=3D1 +rw=3Drandwrite +log_avg_msec=3D1000 +write_bw_log=3Drandwrite +write_lat_log=3Drandwrite + +[randwrite] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/seq-read.fio b/tools/test= ing/nfsd-io-bench/fio-jobs/seq-read.fio new file mode 100644 index 000000000000..6e24ab355026 --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/seq-read.fio @@ -0,0 +1,14 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D1M +numjobs=3D16 +time_based=3D0 +group_reporting=3D1 +rw=3Dread +log_avg_msec=3D1000 
+write_bw_log=3Dseqread +write_lat_log=3Dseqread + +[seqread] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/seq-write.fio b/tools/tes= ting/nfsd-io-bench/fio-jobs/seq-write.fio new file mode 100644 index 000000000000..260858e345f5 --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/seq-write.fio @@ -0,0 +1,14 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D1M +numjobs=3D16 +time_based=3D0 +group_reporting=3D1 +rw=3Dwrite +log_avg_msec=3D1000 +write_bw_log=3Dseqwrite +write_lat_log=3Dseqwrite + +[seqwrite] diff --git a/tools/testing/nfsd-io-bench/scripts/parse-results.sh b/tools/t= esting/nfsd-io-bench/scripts/parse-results.sh new file mode 100755 index 000000000000..0427d411db04 --- /dev/null +++ b/tools/testing/nfsd-io-bench/scripts/parse-results.sh @@ -0,0 +1,238 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Parse fio JSON output and generate comparison tables. +# +# Usage: ./parse-results.sh + +set -euo pipefail + +if [ $# -lt 1 ]; then + echo "Usage: $0 " + exit 1 +fi + +RESULTS_DIR=3D"$1" + +if ! command -v jq &>/dev/null; then + echo "ERROR: jq is required" + exit 1 +fi + +# Extract metrics from a single fio JSON result +extract_metrics() { + local json_file=3D$1 + local rw_type=3D$2 # read or write + + if [ ! -f "$json_file" ]; then + echo "N/A N/A N/A N/A N/A N/A" + return + fi + + jq -r --arg rw "$rw_type" ' + .jobs[0][$rw] as $d | + [ + (($d.bw // 0) / 1024 | . * 10 | round / 10), # MB/s + ($d.iops // 0), # IOPS + ((($d.clat_ns.mean // 0) / 1000) | . * 10 | round / 10), # avg lat us + (($d.clat_ns.percentile["50.000000"] // 0) / 1000), # p50 us + (($d.clat_ns.percentile["99.000000"] // 0) / 1000), # p99 us + (($d.clat_ns.percentile["99.900000"] // 0) / 1000) # p99.9 us + ] | @tsv + ' "$json_file" 2>/dev/null || echo "N/A N/A N/A N/A N/A N/A" +} + +# Extract server CPU from vmstat log (average sys%) +extract_cpu() { + local vmstat_log=3D$1 + if [ ! 
-f "$vmstat_log" ]; then + echo "N/A" + return + fi + # vmstat columns: us sy id wa st =E2=80=94 skip header lines + awk 'NR>2 {sum+=3D$14; n++} END {if(n>0) printf "%.1f", sum/n; else print= "N/A"}' \ + "$vmstat_log" 2>/dev/null || echo "N/A" +} + +# Extract peak dirty pages from meminfo log +extract_peak_dirty() { + local meminfo_log=3D$1 + if [ ! -f "$meminfo_log" ]; then + echo "N/A" + return + fi + grep "^Dirty:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || e= cho "N/A" +} + +# Extract peak cached from meminfo log +extract_peak_cached() { + local meminfo_log=3D$1 + if [ ! -f "$meminfo_log" ]; then + echo "N/A" + return + fi + grep "^Cached:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || = echo "N/A" +} + +print_separator() { + printf '%*s\n' 120 '' | tr ' ' '-' +} + +######################################################################## +# Deliverable 1: Single-client results +######################################################################## +echo "" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo " Deliverable 1: Single-Client fio Benchmarks" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo "" + +for workload in seq-write rand-write seq-read rand-read; do + case $workload in + seq-write|rand-write) rw_type=3D"write" ;; + seq-read|rand-read) rw_type=3D"read" ;; + esac + + echo "--- $workload ---" + printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \ + "Mode" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" "Sys CPU%= " "PeakDirty(kB)" "PeakCache(kB)" + print_separator + + for mode in buffered dontcache direct; do + dir=3D"${RESULTS_DIR}/${workload}/${mode}" + json_file=3D$(find 
"$dir" -name '*.json' -not -name 'client*' 2>/dev/nul= l | head -1 || true) + if [ -z "$json_file" ]; then + printf "%-16s %10s\n" "$mode" "(no data)" + continue + fi + + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "$rw_type")" + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + cached=3D$(extract_peak_cached "${dir}/meminfo.log") + + printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \ + "$mode" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" \ + "$cpu" "${dirty:-N/A}" "${cached:-N/A}" + done + echo "" +done + +######################################################################## +# Deliverable 2: Multi-client results +######################################################################## +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo " Deliverable 2: Noisy-Neighbor Benchmarks" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo "" + +# Scenario A: Multiple writers +echo "--- Scenario A: Multiple Writers ---" +for mode in buffered dontcache direct; do + dir=3D"${RESULTS_DIR}/multi-write/${mode}" + if [ ! 
-d "$dir" ]; then + continue + fi + + echo " Mode: $mode" + printf " %-10s %10s %10s %10s %10s %10s %10s\n" \ + "Client" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + total_bw=3D0 + count=3D0 + for json_file in "${dir}"/client*.json; do + [ -f "$json_file" ] || continue + client=3D$(basename "$json_file" .json) + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "write")" + printf " %-10s %10s %10s %10s %10s %10s %10s\n" \ + "$client" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + total_bw=3D$(echo "$total_bw + ${mbps:-0}" | bc 2>/dev/null || echo "$to= tal_bw") + count=3D$(( count + 1 )) + done + + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Aggregate BW: %s MB/s | Sys CPU: %s%% | Peak Dirty: %s kB\n" \ + "$total_bw" "$cpu" "${dirty:-N/A}" + echo "" +done + +# Scenario C: Noisy neighbor +echo "--- Scenario C: Noisy Writer + Latency-Sensitive Readers ---" +for mode in buffered dontcache direct; do + dir=3D"${RESULTS_DIR}/noisy-neighbor/${mode}" + if [ ! 
-d "$dir" ]; then + continue + fi + + echo " Mode: $mode" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + # Writer + if [ -f "${dir}/noisy_writer.json" ]; then + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "${dir}/noisy_writer.json" "write")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + fi + + # Readers + for json_file in "${dir}"/reader*.json; do + [ -f "$json_file" ] || continue + reader=3D$(basename "$json_file" .json) + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "read")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + done + + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}" + echo "" +done + +# Scenario D: Mixed-mode noisy neighbor +echo "--- Scenario D: Mixed-Mode Noisy Writer + Readers ---" +for dir in "${RESULTS_DIR}"/noisy-neighbor-mixed/*/; do + [ -d "$dir" ] || continue + label=3D$(basename "$dir") + + echo " Mode: $label" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + # Writer + if [ -f "${dir}/noisy_writer.json" ]; then + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "${dir}/noisy_writer.json" "write")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + fi + + # Readers + for json_file in "${dir}"/reader*.json; do + [ -f "$json_file" ] || continue + reader=3D$(basename "$json_file" .json) + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "read")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + done + + 
cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}" + echo "" +done + +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo " System Info" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +if [ -f "${RESULTS_DIR}/sysinfo.txt" ]; then + head -6 "${RESULTS_DIR}/sysinfo.txt" +fi +echo "" diff --git a/tools/testing/nfsd-io-bench/scripts/run-benchmarks.sh b/tools/= testing/nfsd-io-bench/scripts/run-benchmarks.sh new file mode 100755 index 000000000000..2b0cf6e79dff --- /dev/null +++ b/tools/testing/nfsd-io-bench/scripts/run-benchmarks.sh @@ -0,0 +1,591 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# NFS server I/O mode benchmark suite +# +# Runs fio with the NFS ioengine against an NFS server on localhost, +# testing buffered, dontcache, and direct I/O modes. 
+# +# Usage: ./run-benchmarks.sh [OPTIONS] +# +# Options: +# -e EXPORT_PATH Server export path (default: /export) +# -s SIZE fio file size, should be >=3D 2x RAM (default: auto-d= etect) +# -r RESULTS_DIR Where to store results (default: ./results) +# -n NFS_VER NFS version: 3 or 4 (default: 3) +# -j FIO_JOBS_DIR Path to fio job files (default: ../fio-jobs) +# -d Dry run: print commands without executing +# -h Show this help + +set -euo pipefail + +# Defaults +EXPORT_PATH=3D"/export" +SIZE=3D"" +RESULTS_DIR=3D"./results" +NFS_VER=3D3 +SCRIPT_DIR=3D"$(cd "$(dirname "$0")" && pwd)" +FIO_JOBS_DIR=3D"${SCRIPT_DIR}/../fio-jobs" +DRY_RUN=3D0 +MODES=3D"0 1 2" +PERF_LOCK=3D0 + +DEBUGFS_BASE=3D"/sys/kernel/debug/nfsd" +IO_CACHE_READ=3D"${DEBUGFS_BASE}/io_cache_read" +IO_CACHE_WRITE=3D"${DEBUGFS_BASE}/io_cache_write" +DISABLE_SPLICE=3D"${DEBUGFS_BASE}/disable-splice-read" + +usage() { + echo "Usage: $0 [OPTIONS]" + echo " -e EXPORT_PATH Server export path (default: /export)" + echo " -s SIZE fio file size (default: 2x RAM)" + echo " -r RESULTS_DIR Results directory (default: ./results)" + echo " -n NFS_VER NFS version: 3 or 4 (default: 3)" + echo " -j FIO_JOBS_DIR Path to fio job files" + echo " -D Dontcache only (skip buffered and direct tests)" + echo " -p Profile kernel lock contention with perf lock" + echo " -d Dry run" + echo " -h Help" + exit 1 +} + +while getopts "e:s:r:n:j:Dpdh" opt; do + case $opt in + e) EXPORT_PATH=3D"$OPTARG" ;; + s) SIZE=3D"$OPTARG" ;; + r) RESULTS_DIR=3D"$OPTARG" ;; + n) NFS_VER=3D"$OPTARG" ;; + j) FIO_JOBS_DIR=3D"$OPTARG" ;; + D) MODES=3D"1" ;; + p) PERF_LOCK=3D1 ;; + d) DRY_RUN=3D1 ;; + h) usage ;; + *) usage ;; + esac +done + +# Auto-detect size: 2x total RAM +if [ -z "$SIZE" ]; then + MEM_KB=3D$(awk '/MemTotal/ {print $2}' /proc/meminfo) + MEM_GB=3D$(( MEM_KB / 1024 / 1024 )) + SIZE=3D"$(( MEM_GB * 2 ))G" + echo "Auto-detected RAM: ${MEM_GB}G, using file size: ${SIZE}" +fi + + +log() { + echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" +} + 
+run_cmd() { + if [ "$DRY_RUN" -eq 1 ]; then + echo " [DRY RUN] $*" + else + "$@" + fi +} + +# Preflight checks +preflight() { + log "=3D=3D=3D Preflight checks =3D=3D=3D" + + if ! command -v fio &>/dev/null; then + echo "ERROR: fio not found in PATH" + exit 1 + fi + + # Check fio has nfs ioengine + if ! fio --enghelp=3Dnfs &>/dev/null; then + echo "ERROR: fio does not have the nfs ioengine (needs libnfs)" + exit 1 + fi + + # Check debugfs knobs exist + for knob in "$IO_CACHE_READ" "$IO_CACHE_WRITE" "$DISABLE_SPLICE"; do + if [ ! -f "$knob" ]; then + echo "ERROR: $knob not found. Is the kernel new enough?" + exit 1 + fi + done + + # Check NFS server is exporting + if ! showmount -e localhost 2>/dev/null | grep -q "$EXPORT_PATH"; then + echo "WARNING: $EXPORT_PATH not in showmount output, proceeding anyway" + fi + + # Print system info + echo "Kernel: $(uname -r)" + echo "RAM: $(awk '/MemTotal/ {printf "%.1f GB", $2/1024/1024}' /pr= oc/meminfo)" + echo "Export: $EXPORT_PATH" + echo "NFS ver: $NFS_VER" + echo "File size: $SIZE" + echo "Results: $RESULTS_DIR" + echo "" +} + +# Set server I/O mode via debugfs +set_io_mode() { + local cache_write=3D$1 + local cache_read=3D$2 + local splice_off=3D$3 + + log "Setting io_cache_write=3D$cache_write io_cache_read=3D$cache_read di= sable-splice-read=3D$splice_off" + run_cmd bash -c "echo $cache_write > $IO_CACHE_WRITE" + run_cmd bash -c "echo $cache_read > $IO_CACHE_READ" + run_cmd bash -c "echo $splice_off > $DISABLE_SPLICE" +} + +# Drop page cache on server +drop_caches() { + log "Dropping page cache" + run_cmd bash -c "sync && echo 3 > /proc/sys/vm/drop_caches" + sleep 1 +} + +# Start background server monitoring +start_monitors() { + local outdir=3D$1 + + log "Starting server monitors in $outdir" + run_cmd vmstat 1 > "${outdir}/vmstat.log" 2>&1 & + VMSTAT_PID=3D$! + + run_cmd iostat -x 1 > "${outdir}/iostat.log" 2>&1 & + IOSTAT_PID=3D$! 
+ + # Sample /proc/meminfo every second + (while true; do + echo "=3D=3D=3D $(date '+%s') =3D=3D=3D" + cat /proc/meminfo + sleep 1 + done) > "${outdir}/meminfo.log" 2>&1 & + MEMINFO_PID=3D$! +} + +# Stop background monitors +stop_monitors() { + log "Stopping monitors" + kill "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true + wait "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true +} + +# perf lock profiling =E2=80=94 uses BPF-based live contention tracing +PERF_LOCK_PID=3D"" + +start_perf_lock() { + local outdir=3D$1 + + if [ "$PERF_LOCK" -ne 1 ]; then + return + fi + + log "Starting perf lock contention tracing" + perf lock contention -a -b --max-stack 8 \ + > "${outdir}/perf-lock-contention.txt" 2>&1 & + PERF_LOCK_PID=3D$! +} + +stop_perf_lock() { + local outdir=3D$1 + + if [ -z "$PERF_LOCK_PID" ]; then + return + fi + + log "Stopping perf lock contention tracing" + kill -TERM "$PERF_LOCK_PID" 2>/dev/null || true + wait "$PERF_LOCK_PID" 2>/dev/null || true + PERF_LOCK_PID=3D"" +} + +# Run a single fio benchmark. +# nfs_url is set in the job files; we pass --filename and --size on +# the command line to vary the target file and data volume per run. +# Pass "keep" as 5th arg to preserve the test file after the run. 
+run_fio() { + local job_file=3D$1 + local outdir=3D$2 + local filename=3D$3 + local fio_size=3D${4:-$SIZE} + local keep=3D${5:-} + + local job_name + job_name=3D$(basename "$job_file" .fio) + + log "Running fio job: $job_name -> $outdir (file=3D$filename size=3D$fio_= size)" + mkdir -p "$outdir" + + drop_caches + start_monitors "$outdir" + # Skip perf lock profiling for precreate/setup runs + [ "$keep" !=3D "keep" ] && start_perf_lock "$outdir" + + run_cmd fio "$job_file" \ + --output-format=3Djson \ + --output=3D"${outdir}/${job_name}.json" \ + --filename=3D"$filename" \ + --size=3D"$fio_size" + + [ "$keep" !=3D "keep" ] && stop_perf_lock "$outdir" + stop_monitors + + log "Finished: $job_name" + + # Clean up test file to free disk space unless told to keep it + if [ "$keep" !=3D "keep" ]; then + cleanup_test_files "$filename" + fi +} + +# Remove test files from the export to free disk space +cleanup_test_files() { + local filename + for filename in "$@"; do + local filepath=3D"${EXPORT_PATH}/${filename}" + log "Cleaning up: $filepath" + run_cmd rm -f "$filepath" + done +} + +# Ensure parent directories exist under the export for a given filename +ensure_export_dirs() { + local filename + for filename in "$@"; do + local dirpath=3D"${EXPORT_PATH}/$(dirname "$filename")" + if [ "$dirpath" !=3D "${EXPORT_PATH}/." ] && [ ! 
-d "$dirpath" ]; then + log "Creating directory: $dirpath" + run_cmd mkdir -p "$dirpath" + fi + done +} + +# Mode name from numeric value +mode_name() { + case $1 in + 0) echo "buffered" ;; + 1) echo "dontcache" ;; + 2) echo "direct" ;; + esac +} + +######################################################################## +# Deliverable 1: Single-client fio benchmarks +######################################################################## +run_deliverable1() { + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + log "Deliverable 1: Single-client fio benchmarks" + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + + # Write test matrix: + # mode 0 (buffered): splice on (default) + # mode 1 (dontcache): splice off (required) + # mode 2 (direct): splice off (required) + + # Sequential write + for wmode in $MODES; do + local mname + mname=3D$(mode_name $wmode) + local splice_off=3D0 + [ "$wmode" -ne 0 ] && splice_off=3D1 + + drop_caches + set_io_mode "$wmode" 0 "$splice_off" + run_fio "${FIO_JOBS_DIR}/seq-write.fio" \ + "${RESULTS_DIR}/seq-write/${mname}" \ + "seq-write_testfile" + done + + # Random write + for wmode in $MODES; do + local mname + mname=3D$(mode_name $wmode) + local splice_off=3D0 + [ "$wmode" -ne 0 ] && splice_off=3D1 + + drop_caches + set_io_mode "$wmode" 0 "$splice_off" + run_fio "${FIO_JOBS_DIR}/rand-write.fio" \ + "${RESULTS_DIR}/rand-write/${mname}" \ + "rand-write_testfile" + done + + # Sequential read =E2=80=94 vary read mode, write stays buffered + # Pre-create the file for reading + log "Pre-creating sequential read test file" + set_io_mode 0 0 0 + run_fio "${FIO_JOBS_DIR}/seq-write.fio" \ + "${RESULTS_DIR}/seq-read/precreate" \ + "seq-read_testfile" "$SIZE" "keep" + + # shellcheck disable=3DSC2086 + local last_mode + last_mode=3D$(echo $MODES | awk '{print $NF}') + + for 
rmode in $MODES; do + local mname + mname=3D$(mode_name $rmode) + local splice_off=3D0 + [ "$rmode" -ne 0 ] && splice_off=3D1 + # Keep file for subsequent modes; clean up after last + local keep=3D"keep" + [ "$rmode" =3D "$last_mode" ] && keep=3D"" + + drop_caches + set_io_mode 0 "$rmode" "$splice_off" + run_fio "${FIO_JOBS_DIR}/seq-read.fio" \ + "${RESULTS_DIR}/seq-read/${mname}" \ + "seq-read_testfile" "$SIZE" "$keep" + done + + # Random read =E2=80=94 vary read mode, write stays buffered + # Pre-create the file for reading + log "Pre-creating random read test file" + set_io_mode 0 0 0 + run_fio "${FIO_JOBS_DIR}/seq-write.fio" \ + "${RESULTS_DIR}/rand-read/precreate" \ + "rand-read_testfile" "$SIZE" "keep" + + for rmode in $MODES; do + local mname + mname=3D$(mode_name $rmode) + local splice_off=3D0 + [ "$rmode" -ne 0 ] && splice_off=3D1 + # Keep file for subsequent modes; clean up after last + local keep=3D"keep" + [ "$rmode" =3D "$last_mode" ] && keep=3D"" + + drop_caches + set_io_mode 0 "$rmode" "$splice_off" + run_fio "${FIO_JOBS_DIR}/rand-read.fio" \ + "${RESULTS_DIR}/rand-read/${mname}" \ + "rand-read_testfile" "$SIZE" "$keep" + done +} + +######################################################################## +# Deliverable 2: Multi-client (simulated with multiple fio jobs) +######################################################################## +run_deliverable2() { + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + log "Deliverable 2: Noisy-neighbor benchmarks" + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + + local num_clients=3D4 + local client_size + local mem_kb + mem_kb=3D$(awk '/MemTotal/ {print $2}' /proc/meminfo) + # Each client gets RAM/num_clients, so the combined total is roughly equal to RAM + client_size=3D"$(( mem_kb / 1024 / num_clients ))M" + + # Scenario A: Multiple writers + for mode in 
$MODES; do + local mname + mname=3D$(mode_name $mode) + local splice_off=3D0 + [ "$mode" -ne 0 ] && splice_off=3D1 + local outdir=3D"${RESULTS_DIR}/multi-write/${mname}" + mkdir -p "$outdir" + + set_io_mode "$mode" "$mode" "$splice_off" + drop_caches + + # Ensure client directories exist on export + for i in $(seq 1 $num_clients); do + ensure_export_dirs "client${i}/testfile" + done + + start_monitors "$outdir" + start_perf_lock "$outdir" + + # Launch N parallel fio writers + local pids=3D() + for i in $(seq 1 $num_clients); do + run_cmd fio "${FIO_JOBS_DIR}/multi-write.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/client${i}.json" \ + --filename=3D"client${i}/testfile" \ + --size=3D"$client_size" & + pids+=3D($!) + done + + # Wait for all + local rc=3D0 + for pid in "${pids[@]}"; do + wait "$pid" || rc=3D$? + done + + stop_perf_lock "$outdir" + stop_monitors + [ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero" + + # Clean up test files + for i in $(seq 1 $num_clients); do + cleanup_test_files "client${i}/testfile" + done + done + + # Scenario C: Noisy writer + latency-sensitive readers + for mode in $MODES; do + local mname + mname=3D$(mode_name $mode) + local splice_off=3D0 + [ "$mode" -ne 0 ] && splice_off=3D1 + local outdir=3D"${RESULTS_DIR}/noisy-neighbor/${mname}" + mkdir -p "$outdir" + + set_io_mode "$mode" "$mode" "$splice_off" + drop_caches + + # Pre-create read files for latency readers + for i in $(seq 1 $(( num_clients - 1 ))); do + ensure_export_dirs "reader${i}/readfile" + log "Pre-creating read file for reader $i" + run_fio "${FIO_JOBS_DIR}/multi-write.fio" \ + "${outdir}/precreate_reader${i}" \ + "reader${i}/readfile" \ + "512M" "keep" + done + drop_caches + ensure_export_dirs "bulk/testfile" + start_monitors "$outdir" + start_perf_lock "$outdir" + + # Noisy writer + run_cmd fio "${FIO_JOBS_DIR}/noisy-writer.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/noisy_writer.json" \ + --filename=3D"bulk/testfile" \ + 
--size=3D"$SIZE" & + local writer_pid=3D$! + + # Latency-sensitive readers + local reader_pids=3D() + for i in $(seq 1 $(( num_clients - 1 ))); do + run_cmd fio "${FIO_JOBS_DIR}/lat-reader.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/reader${i}.json" \ + --filename=3D"reader${i}/readfile" \ + --size=3D"512M" & + reader_pids+=3D($!) + done + + local rc=3D0 + wait "$writer_pid" || rc=3D$? + for pid in "${reader_pids[@]}"; do + wait "$pid" || rc=3D$? + done + + stop_perf_lock "$outdir" + stop_monitors + [ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero" + + # Clean up test files + cleanup_test_files "bulk/testfile" + for i in $(seq 1 $(( num_clients - 1 ))); do + cleanup_test_files "reader${i}/readfile" + done + done + # Scenario D: Mixed-mode noisy neighbor + # Test write/read mode combinations where the writer uses a + # cache-friendly mode and readers use buffered reads to benefit + # from warm cache. + local mixed_modes=3D( + # write_mode read_mode label + "1 0 dontcache-w_buffered-r" + ) + + for combo in "${mixed_modes[@]}"; do + local wmode rmode label + read -r wmode rmode label <<< "$combo" + local splice_off=3D0 + [ "$wmode" -ne 0 ] && splice_off=3D1 + local outdir=3D"${RESULTS_DIR}/noisy-neighbor-mixed/${label}" + mkdir -p "$outdir" + + set_io_mode "$wmode" "$rmode" "$splice_off" + drop_caches + + # Pre-create read files for latency readers + for i in $(seq 1 $(( num_clients - 1 ))); do + ensure_export_dirs "reader${i}/readfile" + log "Pre-creating read file for reader $i" + run_fio "${FIO_JOBS_DIR}/multi-write.fio" \ + "${outdir}/precreate_reader${i}" \ + "reader${i}/readfile" \ + "512M" "keep" + done + drop_caches + ensure_export_dirs "bulk/testfile" + start_monitors "$outdir" + start_perf_lock "$outdir" + + # Noisy writer + run_cmd fio "${FIO_JOBS_DIR}/noisy-writer.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/noisy_writer.json" \ + --filename=3D"bulk/testfile" \ + --size=3D"$SIZE" & + local writer_pid=3D$! 
+ + # Latency-sensitive readers + local reader_pids=3D() + for i in $(seq 1 $(( num_clients - 1 ))); do + run_cmd fio "${FIO_JOBS_DIR}/lat-reader.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/reader${i}.json" \ + --filename=3D"reader${i}/readfile" \ + --size=3D"512M" & + reader_pids+=3D($!) + done + + local rc=3D0 + wait "$writer_pid" || rc=3D$? + for pid in "${reader_pids[@]}"; do + wait "$pid" || rc=3D$? + done + + stop_perf_lock "$outdir" + stop_monitors + [ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero" + + # Clean up test files + cleanup_test_files "bulk/testfile" + for i in $(seq 1 $(( num_clients - 1 ))); do + cleanup_test_files "reader${i}/readfile" + done + done +} + +######################################################################## +# Main +######################################################################## +preflight + +TIMESTAMP=3D$(date '+%Y%m%d-%H%M%S') +RESULTS_DIR=3D"${RESULTS_DIR}/${TIMESTAMP}" +mkdir -p "$RESULTS_DIR" + +# Save system info +{ + echo "Timestamp: $TIMESTAMP" + echo "Kernel: $(uname -r)" + echo "Hostname: $(hostname)" + echo "NFS version: $NFS_VER" + echo "File size: $SIZE" + echo "Export: $EXPORT_PATH" + cat /proc/meminfo +} > "${RESULTS_DIR}/sysinfo.txt" + +log "Results will be saved to: $RESULTS_DIR" + +run_deliverable1 +run_deliverable2 + +# Reset to defaults +set_io_mode 0 0 0 + +log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +log "All benchmarks complete." 
+log "Results in: $RESULTS_DIR" +log "Run: scripts/parse-results.sh $RESULTS_DIR" +log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" diff --git a/tools/testing/nfsd-io-bench/scripts/setup-server.sh b/tools/te= sting/nfsd-io-bench/scripts/setup-server.sh new file mode 100755 index 000000000000..0efdd74a705e --- /dev/null +++ b/tools/testing/nfsd-io-bench/scripts/setup-server.sh @@ -0,0 +1,94 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# One-time setup script for the NFS test server. +# Run this once before running benchmarks. +# +# Usage: sudo ./setup-server.sh [EXPORT_PATH] + +set -euo pipefail + +EXPORT_PATH=3D"${1:-/export}" +FSTYPE=3D"ext4" + +log() { + echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" +} + +if [ "$(id -u)" -ne 0 ]; then + echo "ERROR: must run as root" + exit 1 +fi + +# Check for required tools +for cmd in fio exportfs showmount jq; do + if ! command -v "$cmd" &>/dev/null; then + echo "WARNING: $cmd not found, attempting install" + dnf install -y "$cmd" 2>/dev/null || \ + apt-get install -y "$cmd" 2>/dev/null || \ + echo "ERROR: cannot install $cmd, please install manually" + fi +done + +# Check fio has nfs ioengine +if ! fio --enghelp=3Dnfs &>/dev/null; then + echo "ERROR: fio nfs ioengine not available." + echo "You may need to install fio with libnfs support." + echo "Try: dnf install fio libnfs-devel (or build fio from source with -= -enable-nfs)" + exit 1 +fi + +# Create export directory if needed +if [ ! -d "$EXPORT_PATH" ]; then + log "Creating export directory: $EXPORT_PATH" + mkdir -p "$EXPORT_PATH" +fi + +# Create subdirectories for multi-client tests +for i in 1 2 3 4; do + mkdir -p "${EXPORT_PATH}/client${i}" + mkdir -p "${EXPORT_PATH}/reader${i}" +done +mkdir -p "${EXPORT_PATH}/bulk" + +# Check if already exported +if ! exportfs -s 2>/dev/null | grep -q "$EXPORT_PATH"; then + log "Adding NFS export for $EXPORT_PATH" + if ! 
grep -q "$EXPORT_PATH" /etc/exports 2>/dev/null; then + echo "${EXPORT_PATH} 127.0.0.1/32(rw,sync,no_root_squash,no_subtree_chec= k)" >> /etc/exports + fi + exportfs -ra +fi + +# Ensure NFS server is running +if ! systemctl is-active --quiet nfs-server 2>/dev/null; then + log "Starting NFS server" + systemctl start nfs-server +fi + +# Verify export +log "Current exports:" +showmount -e localhost + +# Check debugfs knobs +log "Checking debugfs knobs:" +DEBUGFS_BASE=3D"/sys/kernel/debug/nfsd" +for knob in io_cache_read io_cache_write disable-splice-read; do + if [ -f "${DEBUGFS_BASE}/${knob}" ]; then + echo " ${knob} =3D $(cat "${DEBUGFS_BASE}/${knob}")" + else + echo " ${knob}: NOT FOUND (kernel may be too old)" + fi +done + +# Print system summary +echo "" +log "=3D=3D=3D System Summary =3D=3D=3D" +echo "Kernel: $(uname -r)" +echo "RAM: $(awk '/MemTotal/ {printf "%.1f GB", $2/1024/1024}' /pr= oc/meminfo)" +echo "Export: $EXPORT_PATH" +echo "Filesystem: $(df -T "$EXPORT_PATH" | awk 'NR=3D=3D2 {print $2}')" +echo "Disk: $(df -h "$EXPORT_PATH" | awk 'NR=3D=3D2 {print $2, "tot= al,", $4, "free"}')" +echo "" +log "Setup complete. 
Run benchmarks with:" +echo " sudo ./scripts/run-benchmarks.sh -e $EXPORT_PATH" --=20 2.53.0 From nobody Fri Jun 19 08:59:21 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1003535E95E; Sun, 26 Apr 2026 11:56:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777204614; cv=none; b=mUty/ULyjCye2UD6pi4Q7E+cPHoYoKojf4muaoU4hEEqJGjKdRpAOIX+dG5TuKBKAqwdQQSa52AlcJSl/Yg4+kk7GSKTV2YFwkUKRYruZo+/POqBDmUQ9LFu1I65OETsFNmPlp0D/L9UHR5qvB0wIAcT5I8rU9XUtmXYeme2yBg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777204614; c=relaxed/simple; bh=t/R0Lqby4vE+4rsuqjribBP4UQlS9qH9tnTihBcaNTk=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=BqA7ywrsi6S304dNqn4iHiAaqsXqiwyd9enSNnVDKpBujTiOBi0hyqrIpALA4SwUpBizfNA4OTxDMRwx/2FYX5bpzTA36jEUKUHmrB0sOP/7GRD7NAbVgMSO507lTN9q5fWyyi4cH7FjhgKNzm0OQeVKmsraNcO22wMyy80ojFY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CC/YcSoD; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CC/YcSoD" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D4117C2BCB6; Sun, 26 Apr 2026 11:56:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777204613; bh=t/R0Lqby4vE+4rsuqjribBP4UQlS9qH9tnTihBcaNTk=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=CC/YcSoDBgO7YOw6BcnnYzyAZtNOxzfHE4E9/G+4rW16ARliUxxBsvxPj000g4eQJ CW2uWFVaZ28PcNuNOoltyy6EBZwyGStqX5RXMfU9w6MHSYV0uamw3lwKqxjM30kl2i 
jRnRC7ONUON6BAEmMFQqYJyMew4Q2gTGhc/aSyr69H8hKchUZeYxJLwPZHV+5CGUvy GzUaLm+VrQHq+I/kZBxHD8d5rm5cS2ju1VpIAdZ6yoSb0hvPq3vAks+uSu83RUm4gm SqNp8sSBj3O7/Z9pEH/hw9Lj/t7aWf26PNzf87Kpnn9bCwkbVzHJXBdpZ7ByshA+l2 wrtdpgSTuFBtQ== From: Jeff Layton Date: Sun, 26 Apr 2026 07:56:10 -0400 Subject: [PATCH v3 4/4] testing: add dontcache-bench local filesystem benchmark suite Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260426-dontcache-v3-4-79eb37da9547@kernel.org> References: <20260426-dontcache-v3-0-79eb37da9547@kernel.org> In-Reply-To: <20260426-dontcache-v3-0-79eb37da9547@kernel.org> To: Alexander Viro , Christian Brauner , Jan Kara , "Matthew Wilcox (Oracle)" , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Mike Snitzer , Jens Axboe , Ritesh Harjani , Christoph Hellwig , Kairui Song , Qi Zheng , Shakeel Butt , Barry Song , Axel Rasmussen , Yuanchu Xie , Wei Xu , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Chuck Lever Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org, Jeff Layton X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=28555; i=jlayton@kernel.org; h=from:subject:message-id; bh=t/R0Lqby4vE+4rsuqjribBP4UQlS9qH9tnTihBcaNTk=; b=owEBbQKS/ZANAwAKAQAOaEEZVoIVAcsmYgBp7f102l7Yh4nSLrER4g/6pBLdkja4EDvtu3fKJ pXCVH8/RK6JAjMEAAEKAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCae39dAAKCRAADmhBGVaC FX7KD/9PRhQbbIAwPAsF7EPip+54p2ogFRLNciGY2ob3lHYxLUQ642jZ+rVbRdTvsZsnILwifvb 66IfHoSorFbeqEDkokKtG5fgW3HH4oms4ZVnQRKgs5LOc1rZTXvJdD2k9TnMbt42WWX5cnBtkRq EB720xJD+T8mqbl9J0M+i19FgFlIId4Ev8CmyYpbUgv6ywDpZYNna8kWbB0lSlc3/qlYU/5I3ps WT4RDCaCWSVg6AfRQg/pTzrV+g6/I7Eponq/k5QHPeZHbL/LUNK1Tz9iP3zHGBLkPF+vuBaIrxP 
ujey+Rk+7W3FNWEgjDZZSjYdtP7b5n7No5WRXdIhxS3/4MYe4uG0QdbBsFC48MEj/q1Emmm/iYs p+TCFi1bqukjwsIdUdaMljydfEjMq0iXimszI7Cwj4wneyfxZKRm6XK5+SDHBSQ52JZnE3Y+LZz vGJD1cWtkTQCdAB9K6KfRmCsdn8tIgpCYfvGIZPJuvrKglrpvrbIsRhgF0od2+14qSdgGwkEtD5 Gfo9NCvmjJtcy+NFY4nS70CADywPPGJ7x/CQbluvT47keNqsHhw+ZzHif+zHJ6Mmwrnw/0j0PGO HN+qKU+RlV5O05bkBbPN7EDfZs01kAhAjE0qOuSX1ONzHMX/sqjofNZtNMJZhilnkQXMuaRZTiJ PPiWK8v0vmXURWw== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Add a benchmark suite for testing IOCB_DONTCACHE on local filesystems via fio's io_uring engine with the RWF_DONTCACHE flag. The suite mirrors the nfsd-io-bench test matrix but uses io_uring with the "uncached" fio option instead of NFSD debugfs mode switching: - uncached=3D0: standard buffered I/O - uncached=3D1: RWF_DONTCACHE - Mode 2 uses O_DIRECT via fio's --direct=3D1 Includes fio job files, run-benchmarks.sh, and parse-results.sh. Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Jeff Layton --- .../dontcache-bench/fio-jobs/lat-reader.fio | 12 + .../dontcache-bench/fio-jobs/multi-write.fio | 9 + .../dontcache-bench/fio-jobs/noisy-writer.fio | 12 + .../testing/dontcache-bench/fio-jobs/rand-read.fio | 13 + .../dontcache-bench/fio-jobs/rand-write.fio | 13 + .../testing/dontcache-bench/fio-jobs/seq-read.fio | 13 + .../testing/dontcache-bench/fio-jobs/seq-write.fio | 13 + .../dontcache-bench/scripts/parse-results.sh | 238 +++++++++ .../dontcache-bench/scripts/run-benchmarks.sh | 562 +++++++++++++++++= ++++ 9 files changed, 885 insertions(+) diff --git a/tools/testing/dontcache-bench/fio-jobs/lat-reader.fio b/tools/= testing/dontcache-bench/fio-jobs/lat-reader.fio new file mode 100644 index 000000000000..e221e7aedec9 --- /dev/null +++ b/tools/testing/dontcache-bench/fio-jobs/lat-reader.fio @@ -0,0 +1,12 @@ +[global] +ioengine=3Dio_uring +direct=3D0 +bs=3D4k +numjobs=3D1 +time_based=3D0 +rw=3Dread +log_avg_msec=3D1000 +write_bw_log=3Dlatreader +write_lat_log=3Dlatreader + 
+[latreader]
diff --git a/tools/testing/dontcache-bench/fio-jobs/multi-write.fio b/tools/testing/dontcache-bench/fio-jobs/multi-write.fio
new file mode 100644
index 000000000000..8fc0770f5860
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/multi-write.fio
@@ -0,0 +1,9 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=1M
+numjobs=1
+time_based=0
+rw=write
+
+[multiwrite]
diff --git a/tools/testing/dontcache-bench/fio-jobs/noisy-writer.fio b/tools/testing/dontcache-bench/fio-jobs/noisy-writer.fio
new file mode 100644
index 000000000000..4524eebd4642
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/noisy-writer.fio
@@ -0,0 +1,12 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=1M
+numjobs=1
+time_based=0
+rw=write
+log_avg_msec=1000
+write_bw_log=noisywriter
+write_lat_log=noisywriter
+
+[noisywriter]
diff --git a/tools/testing/dontcache-bench/fio-jobs/rand-read.fio b/tools/testing/dontcache-bench/fio-jobs/rand-read.fio
new file mode 100644
index 000000000000..e281fa82b86a
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/rand-read.fio
@@ -0,0 +1,13 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=4k
+numjobs=1
+iodepth=16
+time_based=0
+rw=randread
+log_avg_msec=1000
+write_bw_log=randread
+write_lat_log=randread
+
+[randread]
diff --git a/tools/testing/dontcache-bench/fio-jobs/rand-write.fio b/tools/testing/dontcache-bench/fio-jobs/rand-write.fio
new file mode 100644
index 000000000000..cf53bc6f14b9
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/rand-write.fio
@@ -0,0 +1,13 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=4k
+numjobs=1
+iodepth=16
+time_based=0
+rw=randwrite
+log_avg_msec=1000
+write_bw_log=randwrite
+write_lat_log=randwrite
+
+[randwrite]
diff --git a/tools/testing/dontcache-bench/fio-jobs/seq-read.fio b/tools/testing/dontcache-bench/fio-jobs/seq-read.fio
new file mode 100644
index 000000000000..ef87921465a7
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/seq-read.fio
@@ -0,0 +1,13 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=1M
+numjobs=1
+iodepth=16
+time_based=0
+rw=read
+log_avg_msec=1000
+write_bw_log=seqread
+write_lat_log=seqread
+
+[seqread]
diff --git a/tools/testing/dontcache-bench/fio-jobs/seq-write.fio b/tools/testing/dontcache-bench/fio-jobs/seq-write.fio
new file mode 100644
index 000000000000..da3082f9b391
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/seq-write.fio
@@ -0,0 +1,13 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=1M
+numjobs=1
+iodepth=16
+time_based=0
+rw=write
+log_avg_msec=1000
+write_bw_log=seqwrite
+write_lat_log=seqwrite
+
+[seqwrite]
diff --git a/tools/testing/dontcache-bench/scripts/parse-results.sh b/tools/testing/dontcache-bench/scripts/parse-results.sh
new file mode 100755
index 000000000000..0427d411db04
--- /dev/null
+++ b/tools/testing/dontcache-bench/scripts/parse-results.sh
@@ -0,0 +1,238 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Parse fio JSON output and generate comparison tables.
+#
+# Usage: ./parse-results.sh
+
+set -euo pipefail
+
+if [ $# -lt 1 ]; then
+	echo "Usage: $0 "
+	exit 1
+fi
+
+RESULTS_DIR="$1"
+
+if ! command -v jq &>/dev/null; then
+	echo "ERROR: jq is required"
+	exit 1
+fi
+
+# Extract metrics from a single fio JSON result
+extract_metrics() {
+	local json_file=$1
+	local rw_type=$2  # read or write
+
+	if [ ! -f "$json_file" ]; then
+		echo "N/A N/A N/A N/A N/A N/A"
+		return
+	fi
+
+	jq -r --arg rw "$rw_type" '
+		.jobs[0][$rw] as $d |
+		[
+			(($d.bw // 0) / 1024 | . * 10 | round / 10),  # MB/s
+			($d.iops // 0),  # IOPS
+			((($d.clat_ns.mean // 0) / 1000) | . * 10 | round / 10),  # avg lat us
+			(($d.clat_ns.percentile["50.000000"] // 0) / 1000),  # p50 us
+			(($d.clat_ns.percentile["99.000000"] // 0) / 1000),  # p99 us
+			(($d.clat_ns.percentile["99.900000"] // 0) / 1000)  # p99.9 us
+		] | @tsv
+	' "$json_file" 2>/dev/null || echo "N/A N/A N/A N/A N/A N/A"
+}
+
+# Extract server CPU from vmstat log (average sys%)
+extract_cpu() {
+	local vmstat_log=$1
+	if [ ! -f "$vmstat_log" ]; then
+		echo "N/A"
+		return
+	fi
+	# vmstat columns: us sy id wa st - skip header lines
+	awk 'NR>2 {sum+=$14; n++} END {if(n>0) printf "%.1f", sum/n; else print "N/A"}' \
+		"$vmstat_log" 2>/dev/null || echo "N/A"
+}
+
+# Extract peak dirty pages from meminfo log
+extract_peak_dirty() {
+	local meminfo_log=$1
+	if [ ! -f "$meminfo_log" ]; then
+		echo "N/A"
+		return
+	fi
+	grep "^Dirty:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || echo "N/A"
+}
+
+# Extract peak cached from meminfo log
+extract_peak_cached() {
+	local meminfo_log=$1
+	if [ ! -f "$meminfo_log" ]; then
+		echo "N/A"
+		return
+	fi
+	grep "^Cached:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || echo "N/A"
+}
+
+print_separator() {
+	printf '%*s\n' 120 '' | tr ' ' '-'
+}
+
+########################################################################
+# Deliverable 1: Single-client results
+########################################################################
+echo ""
+echo "=================================================================="
+echo " Deliverable 1: Single-Client fio Benchmarks"
+echo "=================================================================="
+echo ""
+
+for workload in seq-write rand-write seq-read rand-read; do
+	case $workload in
+	seq-write|rand-write) rw_type="write" ;;
+	seq-read|rand-read) rw_type="read" ;;
+	esac
+
+	echo "--- $workload ---"
+	printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \
+		"Mode" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" "Sys CPU%" "PeakDirty(kB)" "PeakCache(kB)"
+	print_separator
+
+	for mode in buffered dontcache direct; do
+		dir="${RESULTS_DIR}/${workload}/${mode}"
+		json_file=$(find "$dir" -name '*.json' -not -name 'client*' 2>/dev/null | head -1 || true)
+		if [ -z "$json_file" ]; then
+			printf "%-16s %10s\n" "$mode" "(no data)"
+			continue
+		fi
+
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "$json_file" "$rw_type")"
+		cpu=$(extract_cpu "${dir}/vmstat.log")
+		dirty=$(extract_peak_dirty "${dir}/meminfo.log")
+		cached=$(extract_peak_cached "${dir}/meminfo.log")
+
+		printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \
+			"$mode" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" \
+			"$cpu" "${dirty:-N/A}" "${cached:-N/A}"
+	done
+	echo ""
+done
+
+########################################################################
+# Deliverable 2: Multi-client results
+########################################################################
+echo "=================================================================="
+echo " Deliverable 2: Noisy-Neighbor Benchmarks"
+echo "=================================================================="
+echo ""
+
+# Scenario A: Multiple writers
+echo "--- Scenario A: Multiple Writers ---"
+for mode in buffered dontcache direct; do
+	dir="${RESULTS_DIR}/multi-write/${mode}"
+	if [ ! -d "$dir" ]; then
+		continue
+	fi
+
+	echo " Mode: $mode"
+	printf " %-10s %10s %10s %10s %10s %10s %10s\n" \
+		"Client" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)"
+
+	total_bw=0
+	count=0
+	for json_file in "${dir}"/client*.json; do
+		[ -f "$json_file" ] || continue
+		client=$(basename "$json_file" .json)
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "$json_file" "write")"
+		printf " %-10s %10s %10s %10s %10s %10s %10s\n" \
+			"$client" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999"
+		total_bw=$(echo "$total_bw + ${mbps:-0}" | bc 2>/dev/null || echo "$total_bw")
+		count=$(( count + 1 ))
+	done
+
+	cpu=$(extract_cpu "${dir}/vmstat.log")
+	dirty=$(extract_peak_dirty "${dir}/meminfo.log")
+	printf " Aggregate BW: %s MB/s | Sys CPU: %s%% | Peak Dirty: %s kB\n" \
+		"$total_bw" "$cpu" "${dirty:-N/A}"
+	echo ""
+done
+
+# Scenario C: Noisy neighbor
+echo "--- Scenario C: Noisy Writer + Latency-Sensitive Readers ---"
+for mode in buffered dontcache direct; do
+	dir="${RESULTS_DIR}/noisy-neighbor/${mode}"
+	if [ ! -d "$dir" ]; then
+		continue
+	fi
+
+	echo " Mode: $mode"
+	printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+		"Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)"
+
+	# Writer
+	if [ -f "${dir}/noisy_writer.json" ]; then
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "${dir}/noisy_writer.json" "write")"
+		printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+			"Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999"
+	fi
+
+	# Readers
+	for json_file in "${dir}"/reader*.json; do
+		[ -f "$json_file" ] || continue
+		reader=$(basename "$json_file" .json)
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "$json_file" "read")"
+		printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+			"$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999"
+	done
+
+	cpu=$(extract_cpu "${dir}/vmstat.log")
+	dirty=$(extract_peak_dirty "${dir}/meminfo.log")
+	printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}"
+	echo ""
+done
+
+# Scenario D: Mixed-mode noisy neighbor
+echo "--- Scenario D: Mixed-Mode Noisy Writer + Readers ---"
+for dir in "${RESULTS_DIR}"/noisy-neighbor-mixed/*/; do
+	[ -d "$dir" ] || continue
+	label=$(basename "$dir")
+
+	echo " Mode: $label"
+	printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+		"Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)"
+
+	# Writer
+	if [ -f "${dir}/noisy_writer.json" ]; then
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "${dir}/noisy_writer.json" "write")"
+		printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+			"Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999"
+	fi
+
+	# Readers
+	for json_file in "${dir}"/reader*.json; do
+		[ -f "$json_file" ] || continue
+		reader=$(basename "$json_file" .json)
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "$json_file" "read")"
+		printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+			"$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999"
+	done
+
+	cpu=$(extract_cpu "${dir}/vmstat.log")
+	dirty=$(extract_peak_dirty "${dir}/meminfo.log")
+	printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}"
+	echo ""
+done
+
+echo "=================================================================="
+echo " System Info"
+echo "=================================================================="
+if [ -f "${RESULTS_DIR}/sysinfo.txt" ]; then
+	head -6 "${RESULTS_DIR}/sysinfo.txt"
+fi
+echo ""
diff --git a/tools/testing/dontcache-bench/scripts/run-benchmarks.sh b/tools/testing/dontcache-bench/scripts/run-benchmarks.sh
new file mode 100755
index 000000000000..11bf400ef092
--- /dev/null
+++ b/tools/testing/dontcache-bench/scripts/run-benchmarks.sh
@@ -0,0 +1,562 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Local filesystem I/O mode benchmark suite.
+#
+# Runs the same test matrix as run-benchmarks.sh but on a local filesystem
+# using fio's io_uring engine with the RWF_DONTCACHE flag instead of NFSD's
+# debugfs mode knobs.
+#
+# Usage: ./run-benchmarks.sh [options]
+#   -t Test directory (must be on a filesystem supporting FOP_DONTCACHE)
+#   -s File size (default: auto-sized to exceed RAM)
+#   -f Path to fio binary (default: fio in PATH)
+#   -o Output directory for results (default: ./results/)
+#   -d Dry run (print commands without executing)
+
+set -euo pipefail
+
+# Defaults
+TEST_DIR=""
+SIZE=""
+FIO_BIN="fio"
+RESULTS_DIR=""
+DRY_RUN=0
+MODES="0 1 2"
+PERF_LOCK=0
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+FIO_JOBS_DIR="${SCRIPT_DIR}/../fio-jobs"
+
+usage() {
+	echo "Usage: $0 -t [-s ] [-f ] [-o ] [-D] [-p] [-d]"
+	echo ""
+	echo " -t Test directory (required, must support RWF_DONTCACHE)"
+	echo " -s File size (default: 2x RAM)"
+	echo " -f Path to fio binary (default: fio)"
+	echo " -o Output directory (default: ./results/)"
+	echo " -D Dontcache only (skip buffered and direct tests)"
+	echo " -p Profile kernel lock contention with perf lock"
+	echo " -d Dry run"
+	exit 1
+}
+
+while getopts "t:s:f:o:Dpdh" opt; do
+	case $opt in
+	t) TEST_DIR="$OPTARG" ;;
+	s) SIZE="$OPTARG" ;;
+	f) FIO_BIN="$OPTARG" ;;
+	o) RESULTS_DIR="$OPTARG" ;;
+	D) MODES="1" ;;
+	p) PERF_LOCK=1 ;;
+	d) DRY_RUN=1 ;;
+	h) usage ;;
+	*) usage ;;
+	esac
+done
+
+if [ -z "$TEST_DIR" ]; then
+	echo "ERROR: -t is required"
+	usage
+fi
+
+# Auto-size to 2x RAM if not specified
+if [ -z "$SIZE" ]; then
+	mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
+	SIZE="$(( mem_kb * 2 / 1024 ))M"
+fi
+
+if [ -z "$RESULTS_DIR" ]; then
+	RESULTS_DIR="./results/local-$(date +%Y%m%d-%H%M%S)"
+fi
+
+mkdir -p "$RESULTS_DIR"
+
+log() {
+	echo "[$(date '+%H:%M:%S')] $*"
+}
+
+run_cmd() {
+	if [ "$DRY_RUN" -eq 1 ]; then
+		echo " [DRY RUN] $*"
+	else
+		"$@"
+	fi
+}
+
+# I/O mode definitions:
+#   buffered: direct=0, uncached=0
+#   dontcache: direct=0, uncached=1
+#   direct: direct=1, uncached=0
+#
+# Mode name from numeric value
+mode_name() {
+	case $1 in
+	0) echo "buffered" ;;
+	1) echo "dontcache" ;;
+	2) echo "direct" ;;
+	esac
+}
+
+# Return fio command-line flags for a given mode.
+# "direct" is a standard fio option and works on the command line.
+# "uncached" is an io_uring engine option that must be in the job file,
+# so we inject it via make_job_file() below.
+mode_fio_args() {
+	case $1 in
+	0) echo "--direct=0" ;; # buffered
+	1) echo "--direct=0" ;; # dontcache
+	2) echo "--direct=1" ;; # direct
+	esac
+}
+
+# Return the uncached= value for a given mode.
+mode_uncached() {
+	case $1 in
+	0) echo "0" ;;
+	1) echo "1" ;;
+	2) echo "0" ;;
+	esac
+}
+
+# Create a temporary job file with uncached=N injected into [global].
+# For uncached=0 (buffered/direct), return the original file unchanged.
+make_job_file() {
+	local job_file=$1
+	local uncached=$2
+
+	if [ "$uncached" -eq 0 ]; then
+		echo "$job_file"
+		return
+	fi
+
+	local tmp
+	tmp=$(mktemp)
+	sed "/^\[global\]/a uncached=${uncached}" "$job_file" > "$tmp"
+	echo "$tmp"
+}
+
+drop_caches() {
+	run_cmd bash -c "sync && echo 3 > /proc/sys/vm/drop_caches"
+}
+
+# perf lock profiling - uses BPF-based live contention tracing
+PERF_LOCK_PID=""
+
+start_perf_lock() {
+	local outdir=$1
+
+	if [ "$PERF_LOCK" -ne 1 ]; then
+		return
+	fi
+
+	log "Starting perf lock contention tracing"
+	perf lock contention -a -b --max-stack 8 \
+		> "${outdir}/perf-lock-contention.txt" 2>&1 &
+	PERF_LOCK_PID=$!
+}
+
+stop_perf_lock() {
+	local outdir=$1
+
+	if [ -z "$PERF_LOCK_PID" ]; then
+		return
+	fi
+
+	log "Stopping perf lock contention tracing"
+	kill -TERM "$PERF_LOCK_PID" 2>/dev/null || true
+	wait "$PERF_LOCK_PID" 2>/dev/null || true
+	PERF_LOCK_PID=""
+}
+
+# Background monitors
+VMSTAT_PID=""
+IOSTAT_PID=""
+MEMINFO_PID=""
+
+start_monitors() {
+	local outdir=$1
+	log "Starting monitors in $outdir"
+	run_cmd vmstat 1 > "${outdir}/vmstat.log" 2>&1 &
+	VMSTAT_PID=$!
+	run_cmd iostat -x 1 > "${outdir}/iostat.log" 2>&1 &
+	IOSTAT_PID=$!
+	(while true; do
+		echo "=== $(date '+%s') ==="
+		cat /proc/meminfo
+		sleep 1
+	done) > "${outdir}/meminfo.log" 2>&1 &
+	MEMINFO_PID=$!
+}
+
+stop_monitors() {
+	log "Stopping monitors"
+	kill "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true
+	wait "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true
+}
+
+cleanup_test_files() {
+	local filepath="${TEST_DIR}/$1"
+	log "Cleaning up $filepath"
+	run_cmd rm -f "$filepath"
+}
+
+# Run a single fio benchmark
+run_fio() {
+	local job_file=$1
+	local outdir=$2
+	local filename=$3
+	local fio_size=${4:-$SIZE}
+	local keep=${5:-}
+	local extra_args=${6:-}
+	local uncached=${7:-0}
+
+	# Inject uncached=N into the job file if needed
+	local actual_job
+	actual_job=$(make_job_file "$job_file" "$uncached")
+
+	local job_name
+	job_name=$(basename "$job_file" .fio)
+
+	log "Running fio job: $job_name -> $outdir (file=${TEST_DIR}/$filename size=$fio_size)"
+	mkdir -p "$outdir"
+
+	drop_caches
+	start_monitors "$outdir"
+	# Skip perf lock profiling for precreate/setup runs
+	[ "$keep" != "keep" ] && start_perf_lock "$outdir"
+
+	# shellcheck disable=SC2086
+	run_cmd "$FIO_BIN" "$actual_job" \
+		--output-format=json \
+		--output="${outdir}/${job_name}.json" \
+		--filename="${TEST_DIR}/$filename" \
+		--size="$fio_size" \
+		$extra_args
+
+	[ "$keep" != "keep" ] && stop_perf_lock "$outdir"
+	stop_monitors
+	log "Finished: $job_name"
+
+	# Clean up temp job file if one was created
+	[ "$actual_job" != "$job_file" ] && rm -f "$actual_job"
+
+	if [ "$keep" != "keep" ]; then
+		cleanup_test_files "$filename"
+	fi
+}
+
+########################################################################
+# Preflight
+########################################################################
+preflight() {
+	log "=== Preflight checks ==="
+
+	if ! command -v "$FIO_BIN" &>/dev/null; then
+		echo "ERROR: fio not found at $FIO_BIN"
+		exit 1
+	fi
+
+	if [ ! -d "$TEST_DIR" ]; then
+		echo "ERROR: Test directory $TEST_DIR does not exist"
+		exit 1
+	fi
+
+	# Quick check that RWF_DONTCACHE works on this filesystem
+	local testfile="${TEST_DIR}/.dontcache_test"
+	if ! "$FIO_BIN" --name=test --ioengine=io_uring --rw=write \
+		--bs=4k --size=4k --direct=0 --uncached=1 \
+		--filename="$testfile" 2>/dev/null; then
+		echo "WARNING: RWF_DONTCACHE may not be supported on $TEST_DIR"
+		echo " (filesystem must support FOP_DONTCACHE)"
+	fi
+	rm -f "$testfile"
+
+	log "Test directory: $TEST_DIR"
+	log "File size: $SIZE"
+	log "fio binary: $FIO_BIN"
+	log "Results: $RESULTS_DIR"
+
+	# Record system info
+	{
+		echo "Timestamp: $(date +%Y%m%d-%H%M%S)"
+		echo "Kernel: $(uname -r)"
+		echo "Hostname: $(hostname)"
+		echo "Filesystem: $(df -T "$TEST_DIR" | tail -1 | awk '{print $2}')"
+		echo "File size: $SIZE"
+		echo "Test dir: $TEST_DIR"
+	} > "${RESULTS_DIR}/sysinfo.txt"
+}
+
+########################################################################
+# Deliverable 1: Single-client benchmarks
+########################################################################
+run_deliverable1() {
+	log "=========================================="
+	log "Deliverable 1: Single-client benchmarks"
+	log "=========================================="
+
+	# Sequential write
+	for mode in $MODES; do
+		local mname
+		mname=$(mode_name $mode)
+		local fio_args
+		fio_args=$(mode_fio_args $mode)
+
+		drop_caches
+		run_fio "${FIO_JOBS_DIR}/seq-write.fio" \
+			"${RESULTS_DIR}/seq-write/${mname}" \
+			"seq-write_testfile" "$SIZE" "" "$fio_args" \
+			"$(mode_uncached $mode)"
+	done
+
+	# Random write
+	for mode in $MODES; do
+		local mname
+		mname=$(mode_name $mode)
+		local fio_args
+		fio_args=$(mode_fio_args $mode)
+
+		drop_caches
+		run_fio "${FIO_JOBS_DIR}/rand-write.fio" \
+			"${RESULTS_DIR}/rand-write/${mname}" \
+			"rand-write_testfile" "$SIZE" "" "$fio_args" \
+			"$(mode_uncached $mode)"
+	done
+
+	# Sequential read - pre-create file, then read with each mode
+	log "Pre-creating sequential read test file"
+	run_fio "${FIO_JOBS_DIR}/seq-write.fio" \
+		"${RESULTS_DIR}/seq-read/precreate" \
+		"seq-read_testfile" "$SIZE" "keep"
+
+	for rmode in $MODES; do
+		local mname
+		mname=$(mode_name $rmode)
+		local fio_args
+		fio_args=$(mode_fio_args $rmode)
+		local keep="keep"
+		[ "$rmode" -eq 2 ] && keep=""
+
+		drop_caches
+		run_fio "${FIO_JOBS_DIR}/seq-read.fio" \
+			"${RESULTS_DIR}/seq-read/${mname}" \
+			"seq-read_testfile" "$SIZE" "$keep" "$fio_args" \
+			"$(mode_uncached $rmode)"
+	done
+
+	# Random read - pre-create file, then read with each mode
+	log "Pre-creating random read test file"
+	run_fio "${FIO_JOBS_DIR}/seq-write.fio" \
+		"${RESULTS_DIR}/rand-read/precreate" \
+		"rand-read_testfile" "$SIZE" "keep"
+
+	for rmode in $MODES; do
+		local mname
+		mname=$(mode_name $rmode)
+		local fio_args
+		fio_args=$(mode_fio_args $rmode)
+		local keep="keep"
+		[ "$rmode" -eq 2 ] && keep=""
+
+		drop_caches
+		run_fio "${FIO_JOBS_DIR}/rand-read.fio" \
+			"${RESULTS_DIR}/rand-read/${mname}" \
+			"rand-read_testfile" "$SIZE" "$keep" "$fio_args" \
+			"$(mode_uncached $rmode)"
+	done
+}
+
+########################################################################
+# Deliverable 2: Multi-client tests
+########################################################################
+run_deliverable2() {
+	log "=========================================="
+	log "Deliverable 2: Noisy-neighbor benchmarks"
+	log "=========================================="
+
+	local num_clients=4
+	local client_size
+	local mem_kb
+	mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
+	client_size="$(( mem_kb / 1024 / num_clients ))M"
+
+	# Scenario A: Multiple writers
+	for mode in $MODES; do
+		local mname
+		mname=$(mode_name $mode)
+		local fio_args
+		fio_args=$(mode_fio_args $mode)
+		local uncached
+		uncached=$(mode_uncached $mode)
+		local actual_job
+		actual_job=$(make_job_file "${FIO_JOBS_DIR}/multi-write.fio" "$uncached")
+		local outdir="${RESULTS_DIR}/multi-write/${mname}"
+		mkdir -p "$outdir"
+
+		drop_caches
+		start_monitors "$outdir"
+		start_perf_lock "$outdir"
+
+		local pids=()
+		for i in $(seq 1 $num_clients); do
+			# shellcheck disable=SC2086
+			run_cmd "$FIO_BIN" "$actual_job" \
+				--output-format=json \
+				--output="${outdir}/client${i}.json" \
+				--filename="${TEST_DIR}/client${i}_testfile" \
+				--size="$client_size" \
+				$fio_args &
+			pids+=($!)
+		done
+
+		local rc=0
+		for pid in "${pids[@]}"; do
+			wait "$pid" || rc=$?
+		done
+
+		stop_perf_lock "$outdir"
+		stop_monitors
+		[ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero"
+
+		[ "$actual_job" != "${FIO_JOBS_DIR}/multi-write.fio" ] && rm -f "$actual_job"
+		for i in $(seq 1 $num_clients); do
+			cleanup_test_files "client${i}_testfile"
+		done
+	done
+
+	# Scenario C: Noisy writer + latency-sensitive readers
+	for mode in $MODES; do
+		local mname
+		mname=$(mode_name $mode)
+		local fio_args
+		fio_args=$(mode_fio_args $mode)
+		local uncached
+		uncached=$(mode_uncached $mode)
+		local writer_job
+		writer_job=$(make_job_file "${FIO_JOBS_DIR}/noisy-writer.fio" "$uncached")
+		local reader_job
+		reader_job=$(make_job_file "${FIO_JOBS_DIR}/lat-reader.fio" "$uncached")
+		local outdir="${RESULTS_DIR}/noisy-neighbor/${mname}"
+		mkdir -p "$outdir"
+
+		# Pre-create read files
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			log "Pre-creating read file for reader $i"
+			run_fio "${FIO_JOBS_DIR}/multi-write.fio" \
+				"${outdir}/precreate_reader${i}" \
+				"reader${i}_readfile" \
+				"512M" "keep"
+		done
+		drop_caches
+		start_monitors "$outdir"
+		start_perf_lock "$outdir"
+
+		# Noisy writer
+		# shellcheck disable=SC2086
+		run_cmd "$FIO_BIN" "$writer_job" \
+			--output-format=json \
+			--output="${outdir}/noisy_writer.json" \
+			--filename="${TEST_DIR}/bulk_testfile" \
+			--size="$SIZE" \
+			$fio_args &
+		local writer_pid=$!
+
+		# Latency-sensitive readers
+		local reader_pids=()
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			# shellcheck disable=SC2086
+			run_cmd "$FIO_BIN" "$reader_job" \
+				--output-format=json \
+				--output="${outdir}/reader${i}.json" \
+				--filename="${TEST_DIR}/reader${i}_readfile" \
+				--size="512M" \
+				$fio_args &
+			reader_pids+=($!)
+		done
+
+		local rc=0
+		wait "$writer_pid" || rc=$?
+		for pid in "${reader_pids[@]}"; do
+			wait "$pid" || rc=$?
+		done
+
+		stop_perf_lock "$outdir"
+		stop_monitors
+		[ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero"
+
+		[ "$writer_job" != "${FIO_JOBS_DIR}/noisy-writer.fio" ] && rm -f "$writer_job"
+		[ "$reader_job" != "${FIO_JOBS_DIR}/lat-reader.fio" ] && rm -f "$reader_job"
+		cleanup_test_files "bulk_testfile"
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			cleanup_test_files "reader${i}_readfile"
+		done
+	done
+
+	# Scenario D: Mixed-mode noisy neighbor
+	# dontcache writes + buffered reads
+	local outdir="${RESULTS_DIR}/noisy-neighbor-mixed/dontcache-w_buffered-r"
+	mkdir -p "$outdir"
+	local writer_job
+	writer_job=$(make_job_file "${FIO_JOBS_DIR}/noisy-writer.fio" 1)
+
+	for i in $(seq 1 $(( num_clients - 1 ))); do
+		log "Pre-creating read file for reader $i"
+		run_fio "${FIO_JOBS_DIR}/multi-write.fio" \
+			"${outdir}/precreate_reader${i}" \
+			"reader${i}_readfile" \
+			"512M" "keep"
+	done
+	drop_caches
+	start_monitors "$outdir"
+	start_perf_lock "$outdir"
+
+	# Writer with dontcache
+	run_cmd "$FIO_BIN" "$writer_job" \
+		--output-format=json \
+		--output="${outdir}/noisy_writer.json" \
+		--filename="${TEST_DIR}/bulk_testfile" \
+		--size="$SIZE" \
+		--direct=0 &
+	local writer_pid=$!
+
+	# Readers with buffered (no uncached flag)
+	local reader_pids=()
+	for i in $(seq 1 $(( num_clients - 1 ))); do
+		run_cmd "$FIO_BIN" "${FIO_JOBS_DIR}/lat-reader.fio" \
+			--output-format=json \
+			--output="${outdir}/reader${i}.json" \
+			--filename="${TEST_DIR}/reader${i}_readfile" \
+			--size="512M" \
+			--direct=0 &
+		reader_pids+=($!)
+	done
+
+	local rc=0
+	wait "$writer_pid" || rc=$?
+	for pid in "${reader_pids[@]}"; do
+		wait "$pid" || rc=$?
+	done
+
+	stop_perf_lock "$outdir"
+	stop_monitors
+	[ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero"
+
+	[ "$writer_job" != "${FIO_JOBS_DIR}/noisy-writer.fio" ] && rm -f "$writer_job"
+	cleanup_test_files "bulk_testfile"
+	for i in $(seq 1 $(( num_clients - 1 ))); do
+		cleanup_test_files "reader${i}_readfile"
+	done
+}
+
+########################################################################
+# Main
+########################################################################
+preflight
+run_deliverable1
+run_deliverable2
+
+log "=========================================="
+log "All benchmarks complete."
+log "Results in: $RESULTS_DIR"
+log "Parse with: scripts/parse-results.sh $RESULTS_DIR"
+log "=========================================="
-- 
2.53.0
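
Note for reviewers: the job-file injection done by make_job_file() in run-benchmarks.sh can be tried in isolation, without fio or a DONTCACHE-capable filesystem. The following is only a minimal sketch, assuming GNU sed (the one-line `a text` form is a GNU extension); `inject_uncached` and the temporary files are hypothetical names used for illustration and are not part of the patch.

```shell
#!/bin/bash
# Sketch: insert "uncached=N" directly after the [global] header of a fio
# job file, mirroring what make_job_file() does in the patch.
# inject_uncached is a hypothetical helper name, not part of the patch.
set -euo pipefail

inject_uncached() {
	local job_file=$1
	local uncached=$2
	local tmp
	tmp=$(mktemp)
	# GNU sed: append one line after each line starting with "[global]"
	sed "/^\[global\]/a uncached=${uncached}" "$job_file" > "$tmp"
	echo "$tmp"
}

# Throwaway job file standing in for fio-jobs/seq-write.fio
src=$(mktemp)
printf '[global]\nioengine=io_uring\ndirect=0\n\n[seqwrite]\n' > "$src"

out=$(inject_uncached "$src" 1)
cat "$out"
```

Because the option lands in [global], it applies to every job section in the file, which is presumably why the script rewrites the job file rather than passing the option after the job file on the fio command line.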