From nobody Sun Jun 14 07:34:34 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 321F03659FB; Fri, 1 May 2026 09:50:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777629020; cv=none; b=ndU3P2BCwvKH1Xg2+Pnr0JG9EIvK7WrurNtLNdT0pC4jZgGq1Uql7omknCC82Dkw2xPrrrFfGIfkH2gyofORnOZB2+Jh2VLpl+zEluoMwvIxtiH41+VyAeac+JCi87TLy8HhkF+yTkVT2amXMi7qPkZwpie/6cwUTnvWEyGC32M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777629020; c=relaxed/simple; bh=4CYXeOlFdNJ/l03z2utNHrOYCz+YwEmVX7CEA57VKaQ=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=VN8S7tuKcjb91wcocshIxSZ6YNPnJIw5A+MQ8tWLze5itQWY/O5pgGr6CURJKOQQV2PiWKPElelhvcUv3izu/Ei0s7gwdroKmSmbfqfbg+wBqyLCdeiFzzTsUgSy6DPxcpYCTxsVRDQ4KMh357HGQqugwMTWW02aLRm8t+hjHNs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=kKVnoSQr; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="kKVnoSQr" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D515DC2BCB7; Fri, 1 May 2026 09:50:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777629019; bh=4CYXeOlFdNJ/l03z2utNHrOYCz+YwEmVX7CEA57VKaQ=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=kKVnoSQrWJeDILj7+mIfDLX2WD/0fJg2ViILcSv46jpdeO9Vx4w9kZ3AW5tXAdSOT VqjrDTlg+ZM8yUOrQeTKFdDtbqtAJbkuE25QQv9n4mh+DLkOSnWP9ltsCfvQ0iKTrO aPUX4tQALCljkv8adxFtd6xicjVU+UKZSKcCcE0D9gnA67ltC8miBdOYnuiLS+UiGz 
6qSbqes/CKa+6PETChdfQtDAvz+VZELHp/F9gTEozl9Cm7a5HqbsZYKT+QD6ecbhug 9IhKebdVDSvcBDKqPWOAAbQcqA+7duHlwUNjdhWLAWges5mS9/vKTAjVypZTmtO6u1 vYYq+7Mcfoc6g== From: Jeff Layton Date: Fri, 01 May 2026 10:49:35 +0100 Subject: [PATCH v4 1/4] mm: track DONTCACHE dirty pages per bdi_writeback Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260501-dontcache-v4-1-5d5e6dc71cb3@kernel.org> References: <20260501-dontcache-v4-0-5d5e6dc71cb3@kernel.org> In-Reply-To: <20260501-dontcache-v4-0-5d5e6dc71cb3@kernel.org> To: Alexander Viro , Christian Brauner , Jan Kara , "Matthew Wilcox (Oracle)" , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Mike Snitzer , Jens Axboe , Ritesh Harjani , Chuck Lever Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=3683; i=jlayton@kernel.org; h=from:subject:message-id; bh=4CYXeOlFdNJ/l03z2utNHrOYCz+YwEmVX7CEA57VKaQ=; b=owEBbQKS/ZANAwAKAQAOaEEZVoIVAcsmYgBp9HdPPFn55EVKdK2kssKhcP5f4apt0ClJopsC7 t1wRkqHf96JAjMEAAEKAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCafR3TwAKCRAADmhBGVaC FeqCD/0Zqep3sm7p/GXonKUD3Hrr+qlUGhzPqP4bv5R18zbFwInLPz9G7ypQwV1i3w8OE/BfKcb zYYwJQi7Oke6qUeEkVur0wj0z0ZP+G5rUK/HlhNVYzG+XQgomTnznPcMEBtbtwS7VYL+0anFLQf IG4wpFtM959wNowJe0wgV79AwhezsYimbD2BZzSiqk9vSibo+7jkRChHIpQCK9flDS10GJoSFO6 hPRiLSO4y8Gc6HE77vD+Cu6JinCmzhQ1Amy9Z7b94vzcyi4wjgDdYKlHtUMI/9zQP5Z5lWV9QXX Kk5sfuv1g8KQ885ZXKDkq91bWtTZCJdHzPDPu0L5KI3w7jIWZh+39y+C/XIYHlxJ9AMFcVYwpo+ 4BFipwwcEpXuO064LT9CkbFddWSfPR83RDoV2H4N5Inylh8sqVp7LdnRT2H9TmDbEDWAaEdJoff G5HgvlhpoPevt7C9nFP9h8Kz4Gnz4hsF0xaeyMLzJIF01liLM5VNPr32utprTE75JCxrX5Vag4e 
kYevA9gwOe40xAjnOZmeH24qW0Aeesuc21gzO4DM1+fNikEjzHkQ609t6spwJrMp168x/jOx47w iLuinj8ZUYiBP2sxFQYgStBcRqVP3l1wlQtoeU3oOyMR5UX9vi3vHlqXjSTy0Ex+VbbX9AS5XUX STkQPWt0m0Q1q0Q== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Add a per-wb WB_DONTCACHE_DIRTY counter that tracks the number of dirty pages with the dropbehind flag set (i.e., pages dirtied via RWF_DONTCACHE writes). Increment the counter alongside WB_RECLAIMABLE in folio_account_dirtied() when the folio has the dropbehind flag set, and decrement it in folio_clear_dirty_for_io() and folio_account_cleaned(). Also decrement it when a non-DONTCACHE lookup clears the dropbehind flag on a dirty folio in __filemap_get_folio_mpol(), using proper writeback domain locking. The counter will be used by the writeback flusher to determine how many pages to write back when expediting writeback for IOCB_DONTCACHE writes, without flushing the entire BDI's dirty pages. Suggested-by: Jan Kara Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Jeff Layton Reviewed-by: Jan Kara --- include/linux/backing-dev-defs.h | 1 + mm/filemap.c | 13 ++++++++++++- mm/page-writeback.c | 6 ++++++ 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-d= efs.h index a06b93446d10..cb660dd37286 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -33,6 +33,7 @@ enum wb_stat_item { WB_WRITEBACK, WB_DIRTIED, WB_WRITTEN, + WB_DONTCACHE_DIRTY, NR_WB_STAT_ITEMS }; =20 diff --git a/mm/filemap.c b/mm/filemap.c index 4e636647100c..1c9c0d5f495f 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2052,8 +2052,19 @@ struct folio *__filemap_get_folio_mpol(struct addres= s_space *mapping, if (!folio) return ERR_PTR(-ENOENT); /* not an uncached lookup, clear uncached if set */ - if (folio_test_dropbehind(folio) && !(fgp_flags & FGP_DONTCACHE)) + if (folio_test_dropbehind(folio) && !(fgp_flags & FGP_DONTCACHE)) { + if 
(folio_test_dirty(folio)) { + struct inode *inode =3D mapping->host; + struct bdi_writeback *wb; + struct wb_lock_cookie cookie =3D {}; + + wb =3D unlocked_inode_to_wb_begin(inode, &cookie); + wb_stat_mod(wb, WB_DONTCACHE_DIRTY, + -folio_nr_pages(folio)); + unlocked_inode_to_wb_end(inode, &cookie); + } folio_clear_dropbehind(folio); + } return folio; } EXPORT_SYMBOL(__filemap_get_folio_mpol); diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 88cd53d4ba09..8e520717d1f6 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2630,6 +2630,8 @@ static void folio_account_dirtied(struct folio *folio, wb =3D inode_to_wb(inode); =20 lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, nr); + if (folio_test_dropbehind(folio)) + wb_stat_mod(wb, WB_DONTCACHE_DIRTY, nr); __zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr); __node_stat_mod_folio(folio, NR_DIRTIED, nr); wb_stat_mod(wb, WB_RECLAIMABLE, nr); @@ -2651,6 +2653,8 @@ void folio_account_cleaned(struct folio *folio, struc= t bdi_writeback *wb) long nr =3D folio_nr_pages(folio); =20 lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, -nr); + if (folio_test_dropbehind(folio)) + wb_stat_mod(wb, WB_DONTCACHE_DIRTY, -nr); zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr); wb_stat_mod(wb, WB_RECLAIMABLE, -nr); task_io_account_cancelled_write(nr * PAGE_SIZE); @@ -2920,6 +2924,8 @@ bool folio_clear_dirty_for_io(struct folio *folio) if (folio_test_clear_dirty(folio)) { long nr =3D folio_nr_pages(folio); lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, -nr); + if (folio_test_dropbehind(folio)) + wb_stat_mod(wb, WB_DONTCACHE_DIRTY, -nr); zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr); wb_stat_mod(wb, WB_RECLAIMABLE, -nr); ret =3D true; --=20 2.54.0 From nobody Sun Jun 14 07:34:34 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) 
with ESMTPS id 07218350D7D; Fri, 1 May 2026 09:50:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777629025; cv=none; b=lJ/BNo7Vl/edWeVg8/VIOg1it9/3qNl5ozRU/GeGrG09IcoYK1ItyGixpcasToyroXtDEDz1bWI8DYmubIoFke5Rwmpg8LhcgBhuaF/DJvoKzD87HTyRHed2mS0BjugzFaWeYPoAxZbQ19ivUbhtdufyuZFIEeikeKep5SCeo9A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777629025; c=relaxed/simple; bh=yQTt3liEqaBCXR2rlPWa8sEqJC3iU4De0vNfeKvDjTo=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=VzVqXK9GeHU3bUVzr2mCnPSWLw1+7Lvxz+JEkfbe4LL0ts75MnLebXLYe31gdpBl7i/iAz272329IjlMqbmmvxoThsa/gRev/bj7x5PzgOU4bAnZwBfzp9NFCzrVlAdcNiudUjQJGBbSOcaNwV+l0WY+u7lP6/pYFZStcjwjLcE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=G+nS8Pjb; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="G+nS8Pjb" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 44FFFC2BCB4; Fri, 1 May 2026 09:50:20 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777629024; bh=yQTt3liEqaBCXR2rlPWa8sEqJC3iU4De0vNfeKvDjTo=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=G+nS8Pjbnmo9t/IZ57mzwBlpCRQmSurm/m1WdYyt4lnMR1H1j+7xVvyCTXLeeLbuZ cLRKLgklu+q86CKFnx3bd4R5BFOWCXb0JIOI6MAKLU1CW+//W8nAUSZT9VJy8lb6zQ ShcnX/VMJ0dzb1Cx5NHJ4LgRUIYeYT0r/amFEA17PzUtVhBaOTXKzIhBFeBUlvb9oY RF9n+TJ687D/9O7xYFGvTtcccjbeljDpZbT8XA0MW8W0N5vQ5F44W8UugIXIZbM6iR 4HqcP2jqt06D+oJ22nJNuHbbkcPqASO7nZtZ6TJQOb6D/l2aBUtHdtxmonuqfD+kqe EwnQOWE5UDTxw== From: Jeff Layton Date: Fri, 01 May 2026 10:49:36 +0100 Subject: [PATCH v4 2/4] mm: kick writeback flusher for IOCB_DONTCACHE with targeted dirty tracking Precedence: bulk 
X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260501-dontcache-v4-2-5d5e6dc71cb3@kernel.org> References: <20260501-dontcache-v4-0-5d5e6dc71cb3@kernel.org> In-Reply-To: <20260501-dontcache-v4-0-5d5e6dc71cb3@kernel.org> To: Alexander Viro , Christian Brauner , Jan Kara , "Matthew Wilcox (Oracle)" , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Mike Snitzer , Jens Axboe , Ritesh Harjani , Chuck Lever Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=8810; i=jlayton@kernel.org; h=from:subject:message-id; bh=yQTt3liEqaBCXR2rlPWa8sEqJC3iU4De0vNfeKvDjTo=; b=owEBbQKS/ZANAwAKAQAOaEEZVoIVAcsmYgBp9HdQEPnjarLJF089nZsNrHY1wCEnRipW9HnmX Ak9A1pY4MyJAjMEAAEKAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCafR3UAAKCRAADmhBGVaC FX3TD/9FjtZLOba+69gJe2qy6mDT7cmra2LtxLO/aVBzPzXDnPDrSmcdfM9UEygP/7hJX8zOsKH LzpeSPO1qYPIrhzqKiCv5pEtFoiPjHQFiJkl5Q4nkV5W1VFQ+ixBabI0TilzTu7Rkh0YqlObh8Q NUybJwTdLdmGzoiaZjucNOT0G92X6i+08nnVAAndvKOSNzsEIimkSOO5TnZBhoQl0SN4SSxCXiG pLtGCL6+s69HOVeJs/RKgiB6oaGuIzJXgsWFCU6wlD1/l1vXI+X5RlC8KYnH/ZeW5FK3Bb/Tczi KtLMSdDcJtB+SeNaiOmNQIU+T8+wLOG7YO1XSKi2itGHz2aLH9bsU8qFZz6DKTg7bwRI9TAc197 TNkU1dBpvTKgLw4RqjWlbAsI0awNASKAXoV8D5Ik7iDlWZR8Grn6o8ApebcrnkU867kqV/GUtlN B4IEZEqkcQXwkvUHQtMNaoRByWkl1nOFjfOrncCJhwxyKkIj1zCd+P7WI6tehiJocXZyJBchJ+L LQQlTPWUH4Dz6ASuI08fX9p3hupvNnwtaMhD4UEIwvOoMtQWIzyaA2zcTY2blc0j/xoWxFG04rM W8xJqJD59HoP3WZ9AwC+iHWNHKBASOEizv0rBSozfXzfWnjnuNZxhQrn69YJ4FX5/x5qYwmoYdk ACBMGMVwOGQLDuA== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 The IOCB_DONTCACHE writeback path in generic_write_sync() calls filemap_flush_range() on every 
write, submitting writeback inline in the writer's context. Perf lock
contention profiling shows the performance problem is not lock
contention but the writeback submission work itself: walking the page
tree and submitting I/O blocks the writer for milliseconds, inflating
p99.9 latency from 23 ms (buffered) to 93 ms (dontcache).

Replace the inline filemap_flush_range() call with a flusher kick that
drains dirty pages in the background. This moves writeback submission
completely off the writer's hot path.

To avoid flushing unrelated buffered dirty data, add a dedicated
WB_start_dontcache bit and wb_check_start_dontcache() handler that uses
the per-wb WB_DONTCACHE_DIRTY counter to determine how many pages to
write back. The flusher writes back that many pages from the oldest
dirty inodes (not restricted to dontcache-specific inodes). This helps
preserve I/O batching while limiting the scope of expedited writeback.

Like WB_start_all, the WB_start_dontcache bit coalesces multiple
DONTCACHE writes into a single flusher wakeup without per-write
allocations.

Also add WB_REASON_DONTCACHE as a new writeback reason for tracing
visibility, and target the correct cgroup writeback domain via
unlocked_inode_to_wb_begin().

dontcache-bench results (same host, T6F_SKL_1920GBF, 251 GiB RAM,
xfs on NVMe, fio io_uring):

Buffered and direct I/O paths are unaffected by this patchset. All
improvements are confined to the dontcache path:

Single-stream throughput (MB/s):
                                Before    After    Change
  seq-write/dontcache              298      897     +201%
  rand-write/dontcache             131      236      +80%

Tail latency improvements (seq-write/dontcache):
  p99:       135,266 us ->  23,986 us  (-82%)
  p99.9:   8,925,479 us ->  28,443 us  (-99.7%)

Multi-writer (4 jobs, sequential write):
                                Before    After    Change
  dontcache aggregate (MB/s)     2,529    4,532      +79%
  dontcache p99 (us)             8,553    1,002      -88%
  dontcache p99.9 (us)         109,314    1,057      -99%

Dontcache multi-writer throughput now matches buffered
(4,532 vs 4,616 MB/s).

32-file write (Axboe test):
                                Before    After    Change
  dontcache aggregate (MB/s)     1,548    3,499     +126%
  dontcache p99 (us)            10,170      602      -94%
  Peak dirty pages (MB)          1,837      213      -88%

Dontcache now reaches 81% of buffered throughput (was 35%).

Competing writers (dontcache vs buffered, separate files):
                                Before    After
  buffered writer                  868      433 MB/s
  dontcache writer                 415      433 MB/s
  Aggregate                      1,284      866 MB/s

Previously the buffered writer starved the dontcache writer 2:1. With
per-bdi_writeback tracking, both writers now receive equal bandwidth.
The aggregate matches the buffered-vs-buffered baseline (863 MB/s),
indicating fair sharing regardless of I/O mode.

The dontcache writer's p99.9 latency collapsed from 119 ms to 33 ms
(-73%), eliminating the severe periodic stalls seen in the baseline.
Both writers now share identical latency profiles, matching the
buffered-vs-buffered pattern.

The per-bdi_writeback dirty tracking dramatically reduces peak dirty
pages in dontcache workloads, with the 32-file test dropping from
1.8 GB to 213 MB. Dontcache sequential write throughput triples and
multi-writer throughput reaches parity with buffered I/O, with tail
latencies collapsing by 1-2 orders of magnitude.

Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Jeff Layton Reviewed-by: Jan Kara Reviewed-by: Jens Axboe --- fs/fs-writeback.c | 60 ++++++++++++++++++++++++++++++++++++= ++++ include/linux/backing-dev-defs.h | 2 ++ include/linux/fs.h | 6 ++-- include/trace/events/writeback.h | 3 +- 4 files changed, 66 insertions(+), 5 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index a65694cbfe68..b06a51fb5d6c 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1334,6 +1334,18 @@ static void wb_start_writeback(struct bdi_writeback = *wb, enum wb_reason reason) wb_wakeup(wb); } =20 +static void wb_start_dontcache_writeback(struct bdi_writeback *wb) +{ + if (!wb_has_dirty_io(wb)) + return; + + if (test_bit(WB_start_dontcache, &wb->state) || + test_and_set_bit(WB_start_dontcache, &wb->state)) + return; + + wb_wakeup(wb); +} + /** * wb_start_background_writeback - start background writeback * @wb: bdi_writback to write from @@ -2373,6 +2385,28 @@ static long wb_check_start_all(struct bdi_writeback = *wb) return nr_pages; } =20 +static long wb_check_start_dontcache(struct bdi_writeback *wb) +{ + long nr_pages; + + if (!test_bit(WB_start_dontcache, &wb->state)) + return 0; + + nr_pages =3D wb_stat(wb, WB_DONTCACHE_DIRTY); + if (nr_pages) { + struct wb_writeback_work work =3D { + .nr_pages =3D nr_pages, + .sync_mode =3D WB_SYNC_NONE, + .range_cyclic =3D 1, + .reason =3D WB_REASON_DONTCACHE, + }; + + nr_pages =3D wb_writeback(wb, &work); + } + + clear_bit(WB_start_dontcache, &wb->state); + return nr_pages; +} =20 /* * Retrieve work items and do the writeback they describe @@ -2394,6 +2428,11 @@ static long wb_do_writeback(struct bdi_writeback *wb) */ wrote +=3D wb_check_start_all(wb); =20 + /* + * Check for dontcache writeback request + */ + wrote +=3D wb_check_start_dontcache(wb); + /* * Check for periodic writeback, kupdated() style */ @@ -2468,6 +2507,27 @@ void wakeup_flusher_threads_bdi(struct backing_dev_i= nfo *bdi, rcu_read_unlock(); } =20 +/** + * 
filemap_dontcache_kick_writeback - kick flusher for IOCB_DONTCACHE writ= es + * @mapping: address_space that was just written to + * + * Kick the writeback flusher thread to expedite writeback of dontcache + * dirty pages. Uses a dedicated WB_start_dontcache bit so that only + * pages tracked by WB_DONTCACHE_DIRTY are written back, rather than + * flushing the entire BDI's dirty pages. + */ +void filemap_dontcache_kick_writeback(struct address_space *mapping) +{ + struct inode *inode =3D mapping->host; + struct bdi_writeback *wb; + struct wb_lock_cookie cookie =3D {}; + + wb =3D unlocked_inode_to_wb_begin(inode, &cookie); + wb_start_dontcache_writeback(wb); + unlocked_inode_to_wb_end(inode, &cookie); +} +EXPORT_SYMBOL_GPL(filemap_dontcache_kick_writeback); + /* * Wakeup the flusher threads to start writeback of all currently dirty pa= ges */ diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-d= efs.h index cb660dd37286..4f1084937315 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -26,6 +26,7 @@ enum wb_state { WB_writeback_running, /* Writeback is in progress */ WB_has_dirty_io, /* Dirty inodes on ->b_{dirty|io|more_io} */ WB_start_all, /* nr_pages =3D=3D 0 (all) work pending */ + WB_start_dontcache, /* dontcache writeback pending */ }; =20 enum wb_stat_item { @@ -56,6 +57,7 @@ enum wb_reason { */ WB_REASON_FORKER_THREAD, WB_REASON_FOREIGN_FLUSH, + WB_REASON_DONTCACHE, =20 WB_REASON_MAX, }; diff --git a/include/linux/fs.h b/include/linux/fs.h index 11559c513dfb..df72b42a9e9b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2624,6 +2624,7 @@ extern int __must_check file_write_and_wait_range(str= uct file *file, loff_t start, loff_t end); int filemap_flush_range(struct address_space *mapping, loff_t start, loff_t end); +void filemap_dontcache_kick_writeback(struct address_space *mapping); =20 static inline int file_write_and_wait(struct file *file) { @@ -2657,10 +2658,7 @@ static inline 
ssize_t generic_write_sync(struct kioc= b *iocb, ssize_t count) if (ret) return ret; } else if (iocb->ki_flags & IOCB_DONTCACHE) { - struct address_space *mapping =3D iocb->ki_filp->f_mapping; - - filemap_flush_range(mapping, iocb->ki_pos - count, - iocb->ki_pos - 1); + filemap_dontcache_kick_writeback(iocb->ki_filp->f_mapping); } =20 return count; diff --git a/include/trace/events/writeback.h b/include/trace/events/writeb= ack.h index bdac0d685a98..13ee076ccd16 100644 --- a/include/trace/events/writeback.h +++ b/include/trace/events/writeback.h @@ -44,7 +44,8 @@ EM( WB_REASON_PERIODIC, "periodic") \ EM( WB_REASON_FS_FREE_SPACE, "fs_free_space") \ EM( WB_REASON_FORKER_THREAD, "forker_thread") \ - EMe(WB_REASON_FOREIGN_FLUSH, "foreign_flush") + EM( WB_REASON_FOREIGN_FLUSH, "foreign_flush") \ + EMe(WB_REASON_DONTCACHE, "dontcache") =20 WB_WORK_REASON =20 --=20 2.54.0 From nobody Sun Jun 14 07:34:34 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C30BB350D7D; Fri, 1 May 2026 09:50:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777629029; cv=none; b=Ty7sEnggF1UKaGMbTbJcpCoG2TG8+f+fwvkp1qdIJXeOwTDYoRbeP5op35MIatWYIu6OG1aAGSaKmWSK29NHxLlz0ZybWf2BqEOrGpxn2KSuIx2Kvj0nXRsje6PXW2I+vZQSD4xpPrWi9LKciIS7qSUAXLcRR4A2+si/Zrvvf7I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777629029; c=relaxed/simple; bh=jV7a7LUyoH0f0RoHJHLT9uBGUlICsHxyGyx+o9bJTtI=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=lRXMrWZscJjIq4f6gARk3elSCM8pNCPSgjfiILCDNfzNwUaNEPAY9VMexToSB/KG+0DUaaAwbXG4+eqnDjtoSDHSHho4gOcPO6Zotjm/8quDXKmlzU9Ax09JXOrlqEO8UoBiaytLj+tdQbU6NGcQS1Of1GfPqCCQf76qDKr9n+k= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=uh+nidCJ; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="uh+nidCJ" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0F54DC2BCC4; Fri, 1 May 2026 09:50:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777629029; bh=jV7a7LUyoH0f0RoHJHLT9uBGUlICsHxyGyx+o9bJTtI=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=uh+nidCJi/sOd4lQsUVDRkIgU5I4IIgTqEKJTNknwQ57jyjkX81jvT6cUsWpVJgsH Ri4FzT0Zt5RQwK108lXIMEDULtfWwzucLFpatxPrWvYsPf5Cip+wCY8zifV8XQ4EWU 9OEEcweP8P8cnyRAbeMoZdUlIV98d2QvlFECXcWnhERkO/v98Zjsj2LG4ZWqwzWV+T dqTAiudeQS9bKAVjloRIU8fag38PyUjDza+1nFxLlEI6gxojPavbrT7Ar/HTH2ZOOO 3GlIbE8JGJPg0VdgD1ckb+lUefCR7bmFKoDH8TxZ/WQtDCnLrbaCXB6Q9+B8aylY5U Oo53erTpWg8vA== From: Jeff Layton Date: Fri, 01 May 2026 10:49:37 +0100 Subject: [PATCH v4 3/4] testing: add nfsd-io-bench NFS server benchmark suite Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260501-dontcache-v4-3-5d5e6dc71cb3@kernel.org> References: <20260501-dontcache-v4-0-5d5e6dc71cb3@kernel.org> In-Reply-To: <20260501-dontcache-v4-0-5d5e6dc71cb3@kernel.org> To: Alexander Viro , Christian Brauner , Jan Kara , "Matthew Wilcox (Oracle)" , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. 
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Mike Snitzer , Jens Axboe , Ritesh Harjani , Chuck Lever Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=33014; i=jlayton@kernel.org; h=from:subject:message-id; bh=jV7a7LUyoH0f0RoHJHLT9uBGUlICsHxyGyx+o9bJTtI=; b=owEBbQKS/ZANAwAKAQAOaEEZVoIVAcsmYgBp9HdQrX893ocMsgI4tjk9bLvYImOOhtJN1N8qj tWdxezood2JAjMEAAEKAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCafR3UAAKCRAADmhBGVaC FciTD/0US5otTUnfEjSdm+PshItR1GMelJuQQvtnpjs/WLidrm8x0uMaly/Q5GzRuJ+8nDFOfYz acDGJl1ICciKe1LfjsuHvVhEGZB+8oqgem6ZF8JoArYPz31e4jTNfH41Nk+/q3MdJtzj4rum7K6 tk5EkTpW9KPZvWGqhT06+3JACHJ2/q+cyCh4MVfGJW+JN+6VTC7cHaBige1hnojfxk+Mtcjhd8r 1A1IDSjq/8X3YIUiW11LjY87FKmMNFRzHVHmKwrgM0GO2ueiiNGasGaETCPizuWeUfZ8y/wC+bO SnDWsMYHTmno7vL8s74d0Cz4lbeyBmaxGQ2B2/f93Pf1nXUotWyauLJKdoXH0cSkKpgMDiWMois jSRwbaPaR1LiyBOS7u6kJ+q9GKs/n7IDzi8NC847BFHgUHLxxlo2IV6kJmKVfqjDQYQtCLH/4jX a3MeHjPh4BcUOLrN8hft0Cw+fYCfTPDAdLz/Y+gtEaukVvQdlIb1xFpfjt2ttQddAJc2gI1K9S+ 8FOh1RozedY7Ru72iWQsJ3Ov7j+4ghZ5X6vrHRL2GBQz6TyLy/C7zZJQeBodscwTAo72hZMP1FB 3FAKKuFoA2R6PqK2mMQIQtWNwsli8hFARYU5sX/Cw7laezJZMfAlOCQZfkWZf1jsPBhFToPNMpM g5qFWI1wqix2Rsw== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Add a benchmark suite for testing NFSD I/O mode performance using fio with the libnfs backend against an NFS server on localhost. Tests buffered, dontcache, and direct I/O modes via NFSD debugfs controls. 
Includes: - fio job files for sequential/random read/write, multi-writer, noisy-neighbor, and latency-sensitive reader workloads - run-benchmarks.sh: orchestrates test matrix with mode switching - parse-results.sh: extracts metrics from fio JSON output - setup-server.sh: configures NFS export for testing Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Jeff Layton --- .../testing/nfsd-io-bench/fio-jobs/lat-reader.fio | 15 + .../testing/nfsd-io-bench/fio-jobs/multi-write.fio | 14 + .../nfsd-io-bench/fio-jobs/noisy-writer.fio | 14 + tools/testing/nfsd-io-bench/fio-jobs/rand-read.fio | 15 + .../testing/nfsd-io-bench/fio-jobs/rand-write.fio | 15 + tools/testing/nfsd-io-bench/fio-jobs/seq-read.fio | 14 + tools/testing/nfsd-io-bench/fio-jobs/seq-write.fio | 14 + .../testing/nfsd-io-bench/scripts/parse-results.sh | 238 +++++++++ .../nfsd-io-bench/scripts/run-benchmarks.sh | 591 +++++++++++++++++= ++++ .../testing/nfsd-io-bench/scripts/setup-server.sh | 94 ++++ 10 files changed, 1024 insertions(+) diff --git a/tools/testing/nfsd-io-bench/fio-jobs/lat-reader.fio b/tools/te= sting/nfsd-io-bench/fio-jobs/lat-reader.fio new file mode 100644 index 000000000000..61af37e8b860 --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/lat-reader.fio @@ -0,0 +1,15 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D4k +numjobs=3D16 +runtime=3D300 +time_based=3D1 +group_reporting=3D1 +rw=3Drandread +log_avg_msec=3D1000 +write_bw_log=3Dlatreader +write_lat_log=3Dlatreader + +[lat_reader] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/multi-write.fio b/tools/t= esting/nfsd-io-bench/fio-jobs/multi-write.fio new file mode 100644 index 000000000000..16b792aecabb --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/multi-write.fio @@ -0,0 +1,14 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D1M +numjobs=3D16 +time_based=3D0 +group_reporting=3D1 +rw=3Dwrite +log_avg_msec=3D1000 +write_bw_log=3Dmultiwrite 
+write_lat_log=3Dmultiwrite + +[writer] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/noisy-writer.fio b/tools/= testing/nfsd-io-bench/fio-jobs/noisy-writer.fio new file mode 100644 index 000000000000..615154a7737e --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/noisy-writer.fio @@ -0,0 +1,14 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D1M +numjobs=3D16 +time_based=3D0 +group_reporting=3D1 +rw=3Dwrite +log_avg_msec=3D1000 +write_bw_log=3Dnoisywriter +write_lat_log=3Dnoisywriter + +[bulk_writer] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/rand-read.fio b/tools/tes= ting/nfsd-io-bench/fio-jobs/rand-read.fio new file mode 100644 index 000000000000..501bae7416a8 --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/rand-read.fio @@ -0,0 +1,15 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D4k +numjobs=3D16 +runtime=3D300 +time_based=3D1 +group_reporting=3D1 +rw=3Drandread +log_avg_msec=3D1000 +write_bw_log=3Drandread +write_lat_log=3Drandread + +[randread] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/rand-write.fio b/tools/te= sting/nfsd-io-bench/fio-jobs/rand-write.fio new file mode 100644 index 000000000000..d891d04197ae --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/rand-write.fio @@ -0,0 +1,15 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D64k +numjobs=3D16 +runtime=3D300 +time_based=3D1 +group_reporting=3D1 +rw=3Drandwrite +log_avg_msec=3D1000 +write_bw_log=3Drandwrite +write_lat_log=3Drandwrite + +[randwrite] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/seq-read.fio b/tools/test= ing/nfsd-io-bench/fio-jobs/seq-read.fio new file mode 100644 index 000000000000..6e24ab355026 --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/seq-read.fio @@ -0,0 +1,14 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D1M +numjobs=3D16 +time_based=3D0 +group_reporting=3D1 +rw=3Dread +log_avg_msec=3D1000 
+write_bw_log=3Dseqread +write_lat_log=3Dseqread + +[seqread] diff --git a/tools/testing/nfsd-io-bench/fio-jobs/seq-write.fio b/tools/tes= ting/nfsd-io-bench/fio-jobs/seq-write.fio new file mode 100644 index 000000000000..260858e345f5 --- /dev/null +++ b/tools/testing/nfsd-io-bench/fio-jobs/seq-write.fio @@ -0,0 +1,14 @@ +[global] +ioengine=3Dnfs +nfs_url=3Dnfs://localhost/export +direct=3D0 +bs=3D1M +numjobs=3D16 +time_based=3D0 +group_reporting=3D1 +rw=3Dwrite +log_avg_msec=3D1000 +write_bw_log=3Dseqwrite +write_lat_log=3Dseqwrite + +[seqwrite] diff --git a/tools/testing/nfsd-io-bench/scripts/parse-results.sh b/tools/t= esting/nfsd-io-bench/scripts/parse-results.sh new file mode 100755 index 000000000000..0427d411db04 --- /dev/null +++ b/tools/testing/nfsd-io-bench/scripts/parse-results.sh @@ -0,0 +1,238 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Parse fio JSON output and generate comparison tables. +# +# Usage: ./parse-results.sh + +set -euo pipefail + +if [ $# -lt 1 ]; then + echo "Usage: $0 " + exit 1 +fi + +RESULTS_DIR=3D"$1" + +if ! command -v jq &>/dev/null; then + echo "ERROR: jq is required" + exit 1 +fi + +# Extract metrics from a single fio JSON result +extract_metrics() { + local json_file=3D$1 + local rw_type=3D$2 # read or write + + if [ ! -f "$json_file" ]; then + echo "N/A N/A N/A N/A N/A N/A" + return + fi + + jq -r --arg rw "$rw_type" ' + .jobs[0][$rw] as $d | + [ + (($d.bw // 0) / 1024 | . * 10 | round / 10), # MB/s + ($d.iops // 0), # IOPS + ((($d.clat_ns.mean // 0) / 1000) | . * 10 | round / 10), # avg lat us + (($d.clat_ns.percentile["50.000000"] // 0) / 1000), # p50 us + (($d.clat_ns.percentile["99.000000"] // 0) / 1000), # p99 us + (($d.clat_ns.percentile["99.900000"] // 0) / 1000) # p99.9 us + ] | @tsv + ' "$json_file" 2>/dev/null || echo "N/A N/A N/A N/A N/A N/A" +} + +# Extract server CPU from vmstat log (average sys%) +extract_cpu() { + local vmstat_log=3D$1 + if [ ! 
-f "$vmstat_log" ]; then + echo "N/A" + return + fi + # vmstat columns: us sy id wa st =E2=80=94 skip header lines + awk 'NR>2 {sum+=3D$14; n++} END {if(n>0) printf "%.1f", sum/n; else print= "N/A"}' \ + "$vmstat_log" 2>/dev/null || echo "N/A" +} + +# Extract peak dirty pages from meminfo log +extract_peak_dirty() { + local meminfo_log=3D$1 + if [ ! -f "$meminfo_log" ]; then + echo "N/A" + return + fi + grep "^Dirty:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || e= cho "N/A" +} + +# Extract peak cached from meminfo log +extract_peak_cached() { + local meminfo_log=3D$1 + if [ ! -f "$meminfo_log" ]; then + echo "N/A" + return + fi + grep "^Cached:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || = echo "N/A" +} + +print_separator() { + printf '%*s\n' 120 '' | tr ' ' '-' +} + +######################################################################## +# Deliverable 1: Single-client results +######################################################################## +echo "" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo " Deliverable 1: Single-Client fio Benchmarks" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo "" + +for workload in seq-write rand-write seq-read rand-read; do + case $workload in + seq-write|rand-write) rw_type=3D"write" ;; + seq-read|rand-read) rw_type=3D"read" ;; + esac + + echo "--- $workload ---" + printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \ + "Mode" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" "Sys CPU%= " "PeakDirty(kB)" "PeakCache(kB)" + print_separator + + for mode in buffered dontcache direct; do + dir=3D"${RESULTS_DIR}/${workload}/${mode}" + json_file=3D$(find 
"$dir" -name '*.json' -not -name 'client*' 2>/dev/nul= l | head -1 || true) + if [ -z "$json_file" ]; then + printf "%-16s %10s\n" "$mode" "(no data)" + continue + fi + + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "$rw_type")" + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + cached=3D$(extract_peak_cached "${dir}/meminfo.log") + + printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \ + "$mode" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" \ + "$cpu" "${dirty:-N/A}" "${cached:-N/A}" + done + echo "" +done + +######################################################################## +# Deliverable 2: Multi-client results +######################################################################## +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo " Deliverable 2: Noisy-Neighbor Benchmarks" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo "" + +# Scenario A: Multiple writers +echo "--- Scenario A: Multiple Writers ---" +for mode in buffered dontcache direct; do + dir=3D"${RESULTS_DIR}/multi-write/${mode}" + if [ ! 
-d "$dir" ]; then + continue + fi + + echo " Mode: $mode" + printf " %-10s %10s %10s %10s %10s %10s %10s\n" \ + "Client" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + total_bw=3D0 + count=3D0 + for json_file in "${dir}"/client*.json; do + [ -f "$json_file" ] || continue + client=3D$(basename "$json_file" .json) + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "write")" + printf " %-10s %10s %10s %10s %10s %10s %10s\n" \ + "$client" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + total_bw=3D$(echo "$total_bw + ${mbps:-0}" | bc 2>/dev/null || echo "$to= tal_bw") + count=3D$(( count + 1 )) + done + + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Aggregate BW: %s MB/s | Sys CPU: %s%% | Peak Dirty: %s kB\n" \ + "$total_bw" "$cpu" "${dirty:-N/A}" + echo "" +done + +# Scenario C: Noisy neighbor +echo "--- Scenario C: Noisy Writer + Latency-Sensitive Readers ---" +for mode in buffered dontcache direct; do + dir=3D"${RESULTS_DIR}/noisy-neighbor/${mode}" + if [ ! 
-d "$dir" ]; then + continue + fi + + echo " Mode: $mode" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + # Writer + if [ -f "${dir}/noisy_writer.json" ]; then + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "${dir}/noisy_writer.json" "write")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + fi + + # Readers + for json_file in "${dir}"/reader*.json; do + [ -f "$json_file" ] || continue + reader=3D$(basename "$json_file" .json) + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "read")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + done + + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}" + echo "" +done + +# Scenario D: Mixed-mode noisy neighbor +echo "--- Scenario D: Mixed-Mode Noisy Writer + Readers ---" +for dir in "${RESULTS_DIR}"/noisy-neighbor-mixed/*/; do + [ -d "$dir" ] || continue + label=3D$(basename "$dir") + + echo " Mode: $label" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + # Writer + if [ -f "${dir}/noisy_writer.json" ]; then + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "${dir}/noisy_writer.json" "write")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + fi + + # Readers + for json_file in "${dir}"/reader*.json; do + [ -f "$json_file" ] || continue + reader=3D$(basename "$json_file" .json) + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "read")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + done + + 
cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}" + echo "" +done + +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo " System Info" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +if [ -f "${RESULTS_DIR}/sysinfo.txt" ]; then + head -6 "${RESULTS_DIR}/sysinfo.txt" +fi +echo "" diff --git a/tools/testing/nfsd-io-bench/scripts/run-benchmarks.sh b/tools/= testing/nfsd-io-bench/scripts/run-benchmarks.sh new file mode 100755 index 000000000000..2b0cf6e79dff --- /dev/null +++ b/tools/testing/nfsd-io-bench/scripts/run-benchmarks.sh @@ -0,0 +1,591 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# NFS server I/O mode benchmark suite +# +# Runs fio with the NFS ioengine against an NFS server on localhost, +# testing buffered, dontcache, and direct I/O modes. 
+# +# Usage: ./run-benchmarks.sh [OPTIONS] +# +# Options: +# -e EXPORT_PATH Server export path (default: /export) +# -s SIZE fio file size, should be >=3D 2x RAM (default: auto-d= etect) +# -r RESULTS_DIR Where to store results (default: ./results) +# -n NFS_VER NFS version: 3 or 4 (default: 3) +# -j FIO_JOBS_DIR Path to fio job files (default: ../fio-jobs) +# -d Dry run: print commands without executing +# -h Show this help + +set -euo pipefail + +# Defaults +EXPORT_PATH=3D"/export" +SIZE=3D"" +RESULTS_DIR=3D"./results" +NFS_VER=3D3 +SCRIPT_DIR=3D"$(cd "$(dirname "$0")" && pwd)" +FIO_JOBS_DIR=3D"${SCRIPT_DIR}/../fio-jobs" +DRY_RUN=3D0 +MODES=3D"0 1 2" +PERF_LOCK=3D0 + +DEBUGFS_BASE=3D"/sys/kernel/debug/nfsd" +IO_CACHE_READ=3D"${DEBUGFS_BASE}/io_cache_read" +IO_CACHE_WRITE=3D"${DEBUGFS_BASE}/io_cache_write" +DISABLE_SPLICE=3D"${DEBUGFS_BASE}/disable-splice-read" + +usage() { + echo "Usage: $0 [OPTIONS]" + echo " -e EXPORT_PATH Server export path (default: /export)" + echo " -s SIZE fio file size (default: 2x RAM)" + echo " -r RESULTS_DIR Results directory (default: ./results)" + echo " -n NFS_VER NFS version: 3 or 4 (default: 3)" + echo " -j FIO_JOBS_DIR Path to fio job files" + echo " -D Dontcache only (skip buffered and direct tests)" + echo " -p Profile kernel lock contention with perf lock" + echo " -d Dry run" + echo " -h Help" + exit 1 +} + +while getopts "e:s:r:n:j:Dpdh" opt; do + case $opt in + e) EXPORT_PATH=3D"$OPTARG" ;; + s) SIZE=3D"$OPTARG" ;; + r) RESULTS_DIR=3D"$OPTARG" ;; + n) NFS_VER=3D"$OPTARG" ;; + j) FIO_JOBS_DIR=3D"$OPTARG" ;; + D) MODES=3D"1" ;; + p) PERF_LOCK=3D1 ;; + d) DRY_RUN=3D1 ;; + h) usage ;; + *) usage ;; + esac +done + +# Auto-detect size: 2x total RAM +if [ -z "$SIZE" ]; then + MEM_KB=3D$(awk '/MemTotal/ {print $2}' /proc/meminfo) + MEM_GB=3D$(( MEM_KB / 1024 / 1024 )) + SIZE=3D"$(( MEM_GB * 2 ))G" + echo "Auto-detected RAM: ${MEM_GB}G, using file size: ${SIZE}" +fi + + +log() { + echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" +} + 
+run_cmd() { + if [ "$DRY_RUN" -eq 1 ]; then + echo " [DRY RUN] $*" + else + "$@" + fi +} + +# Preflight checks +preflight() { + log "=3D=3D=3D Preflight checks =3D=3D=3D" + + if ! command -v fio &>/dev/null; then + echo "ERROR: fio not found in PATH" + exit 1 + fi + + # Check fio has nfs ioengine + if ! fio --enghelp=3Dnfs &>/dev/null; then + echo "ERROR: fio does not have the nfs ioengine (needs libnfs)" + exit 1 + fi + + # Check debugfs knobs exist + for knob in "$IO_CACHE_READ" "$IO_CACHE_WRITE" "$DISABLE_SPLICE"; do + if [ ! -f "$knob" ]; then + echo "ERROR: $knob not found. Is the kernel new enough?" + exit 1 + fi + done + + # Check NFS server is exporting + if ! showmount -e localhost 2>/dev/null | grep -q "$EXPORT_PATH"; then + echo "WARNING: $EXPORT_PATH not in showmount output, proceeding anyway" + fi + + # Print system info + echo "Kernel: $(uname -r)" + echo "RAM: $(awk '/MemTotal/ {printf "%.1f GB", $2/1024/1024}' /pr= oc/meminfo)" + echo "Export: $EXPORT_PATH" + echo "NFS ver: $NFS_VER" + echo "File size: $SIZE" + echo "Results: $RESULTS_DIR" + echo "" +} + +# Set server I/O mode via debugfs +set_io_mode() { + local cache_write=3D$1 + local cache_read=3D$2 + local splice_off=3D$3 + + log "Setting io_cache_write=3D$cache_write io_cache_read=3D$cache_read di= sable-splice-read=3D$splice_off" + run_cmd bash -c "echo $cache_write > $IO_CACHE_WRITE" + run_cmd bash -c "echo $cache_read > $IO_CACHE_READ" + run_cmd bash -c "echo $splice_off > $DISABLE_SPLICE" +} + +# Drop page cache on server +drop_caches() { + log "Dropping page cache" + run_cmd bash -c "sync && echo 3 > /proc/sys/vm/drop_caches" + sleep 1 +} + +# Start background server monitoring +start_monitors() { + local outdir=3D$1 + + log "Starting server monitors in $outdir" + run_cmd vmstat 1 > "${outdir}/vmstat.log" 2>&1 & + VMSTAT_PID=3D$! + + run_cmd iostat -x 1 > "${outdir}/iostat.log" 2>&1 & + IOSTAT_PID=3D$! 
+ + # Sample /proc/meminfo every second + (while true; do + echo "=3D=3D=3D $(date '+%s') =3D=3D=3D" + cat /proc/meminfo + sleep 1 + done) > "${outdir}/meminfo.log" 2>&1 & + MEMINFO_PID=3D$! +} + +# Stop background monitors +stop_monitors() { + log "Stopping monitors" + kill "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true + wait "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true +} + +# perf lock profiling =E2=80=94 uses BPF-based live contention tracing +PERF_LOCK_PID=3D"" + +start_perf_lock() { + local outdir=3D$1 + + if [ "$PERF_LOCK" -ne 1 ]; then + return + fi + + log "Starting perf lock contention tracing" + perf lock contention -a -b --max-stack 8 \ + > "${outdir}/perf-lock-contention.txt" 2>&1 & + PERF_LOCK_PID=3D$! +} + +stop_perf_lock() { + local outdir=3D$1 + + if [ -z "$PERF_LOCK_PID" ]; then + return + fi + + log "Stopping perf lock contention tracing" + kill -TERM "$PERF_LOCK_PID" 2>/dev/null || true + wait "$PERF_LOCK_PID" 2>/dev/null || true + PERF_LOCK_PID=3D"" +} + +# Run a single fio benchmark. +# nfs_url is set in the job files; we pass --filename and --size on +# the command line to vary the target file and data volume per run. +# Pass "keep" as 5th arg to preserve the test file after the run. 
+run_fio() { + local job_file=3D$1 + local outdir=3D$2 + local filename=3D$3 + local fio_size=3D${4:-$SIZE} + local keep=3D${5:-} + + local job_name + job_name=3D$(basename "$job_file" .fio) + + log "Running fio job: $job_name -> $outdir (file=3D$filename size=3D$fio_= size)" + mkdir -p "$outdir" + + drop_caches + start_monitors "$outdir" + # Skip perf lock profiling for precreate/setup runs + [ "$keep" !=3D "keep" ] && start_perf_lock "$outdir" + + run_cmd fio "$job_file" \ + --output-format=3Djson \ + --output=3D"${outdir}/${job_name}.json" \ + --filename=3D"$filename" \ + --size=3D"$fio_size" + + [ "$keep" !=3D "keep" ] && stop_perf_lock "$outdir" + stop_monitors + + log "Finished: $job_name" + + # Clean up test file to free disk space unless told to keep it + if [ "$keep" !=3D "keep" ]; then + cleanup_test_files "$filename" + fi +} + +# Remove test files from the export to free disk space +cleanup_test_files() { + local filename + for filename in "$@"; do + local filepath=3D"${EXPORT_PATH}/${filename}" + log "Cleaning up: $filepath" + run_cmd rm -f "$filepath" + done +} + +# Ensure parent directories exist under the export for a given filename +ensure_export_dirs() { + local filename + for filename in "$@"; do + local dirpath=3D"${EXPORT_PATH}/$(dirname "$filename")" + if [ "$dirpath" !=3D "${EXPORT_PATH}/." ] && [ ! 
-d "$dirpath" ]; then + log "Creating directory: $dirpath" + run_cmd mkdir -p "$dirpath" + fi + done +} + +# Mode name from numeric value +mode_name() { + case $1 in + 0) echo "buffered" ;; + 1) echo "dontcache" ;; + 2) echo "direct" ;; + esac +} + +######################################################################## +# Deliverable 1: Single-client fio benchmarks +######################################################################## +run_deliverable1() { + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + log "Deliverable 1: Single-client fio benchmarks" + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + + # Write test matrix: + # mode 0 (buffered): splice on (default) + # mode 1 (dontcache): splice off (required) + # mode 2 (direct): splice off (required) + + # Sequential write + for wmode in $MODES; do + local mname + mname=3D$(mode_name $wmode) + local splice_off=3D0 + [ "$wmode" -ne 0 ] && splice_off=3D1 + + drop_caches + set_io_mode "$wmode" 0 "$splice_off" + run_fio "${FIO_JOBS_DIR}/seq-write.fio" \ + "${RESULTS_DIR}/seq-write/${mname}" \ + "seq-write_testfile" + done + + # Random write + for wmode in $MODES; do + local mname + mname=3D$(mode_name $wmode) + local splice_off=3D0 + [ "$wmode" -ne 0 ] && splice_off=3D1 + + drop_caches + set_io_mode "$wmode" 0 "$splice_off" + run_fio "${FIO_JOBS_DIR}/rand-write.fio" \ + "${RESULTS_DIR}/rand-write/${mname}" \ + "rand-write_testfile" + done + + # Sequential read =E2=80=94 vary read mode, write stays buffered + # Pre-create the file for reading + log "Pre-creating sequential read test file" + set_io_mode 0 0 0 + run_fio "${FIO_JOBS_DIR}/seq-write.fio" \ + "${RESULTS_DIR}/seq-read/precreate" \ + "seq-read_testfile" "$SIZE" "keep" + + # shellcheck disable=3DSC2086 + local last_mode + last_mode=3D$(echo $MODES | awk '{print $NF}') + + for 
rmode in $MODES; do + local mname + mname=3D$(mode_name $rmode) + local splice_off=3D0 + [ "$rmode" -ne 0 ] && splice_off=3D1 + # Keep file for subsequent modes; clean up after last + local keep=3D"keep" + [ "$rmode" =3D "$last_mode" ] && keep=3D"" + + drop_caches + set_io_mode 0 "$rmode" "$splice_off" + run_fio "${FIO_JOBS_DIR}/seq-read.fio" \ + "${RESULTS_DIR}/seq-read/${mname}" \ + "seq-read_testfile" "$SIZE" "$keep" + done + + # Random read =E2=80=94 vary read mode, write stays buffered + # Pre-create the file for reading + log "Pre-creating random read test file" + set_io_mode 0 0 0 + run_fio "${FIO_JOBS_DIR}/seq-write.fio" \ + "${RESULTS_DIR}/rand-read/precreate" \ + "rand-read_testfile" "$SIZE" "keep" + + for rmode in $MODES; do + local mname + mname=3D$(mode_name $rmode) + local splice_off=3D0 + [ "$rmode" -ne 0 ] && splice_off=3D1 + # Keep file for subsequent modes; clean up after last + local keep=3D"keep" + [ "$rmode" =3D "$last_mode" ] && keep=3D"" + + drop_caches + set_io_mode 0 "$rmode" "$splice_off" + run_fio "${FIO_JOBS_DIR}/rand-read.fio" \ + "${RESULTS_DIR}/rand-read/${mname}" \ + "rand-read_testfile" "$SIZE" "$keep" + done +} + +######################################################################## +# Deliverable 2: Multi-client (simulated with multiple fio jobs) +######################################################################## +run_deliverable2() { + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + log "Deliverable 2: Noisy-neighbor benchmarks" + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + + local num_clients=3D4 + local client_size + local mem_kb + mem_kb=3D$(awk '/MemTotal/ {print $2}' /proc/meminfo) + # Each client gets RAM/num_clients so total > RAM + client_size=3D"$(( mem_kb / 1024 / num_clients ))M" + + # Scenario A: Multiple writers + for mode in 
$MODES; do + local mname + mname=3D$(mode_name $mode) + local splice_off=3D0 + [ "$mode" -ne 0 ] && splice_off=3D1 + local outdir=3D"${RESULTS_DIR}/multi-write/${mname}" + mkdir -p "$outdir" + + set_io_mode "$mode" "$mode" "$splice_off" + drop_caches + + # Ensure client directories exist on export + for i in $(seq 1 $num_clients); do + ensure_export_dirs "client${i}/testfile" + done + + start_monitors "$outdir" + start_perf_lock "$outdir" + + # Launch N parallel fio writers + local pids=3D() + for i in $(seq 1 $num_clients); do + run_cmd fio "${FIO_JOBS_DIR}/multi-write.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/client${i}.json" \ + --filename=3D"client${i}/testfile" \ + --size=3D"$client_size" & + pids+=3D($!) + done + + # Wait for all + local rc=3D0 + for pid in "${pids[@]}"; do + wait "$pid" || rc=3D$? + done + + stop_perf_lock "$outdir" + stop_monitors + [ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero" + + # Clean up test files + for i in $(seq 1 $num_clients); do + cleanup_test_files "client${i}/testfile" + done + done + + # Scenario C: Noisy writer + latency-sensitive readers + for mode in $MODES; do + local mname + mname=3D$(mode_name $mode) + local splice_off=3D0 + [ "$mode" -ne 0 ] && splice_off=3D1 + local outdir=3D"${RESULTS_DIR}/noisy-neighbor/${mname}" + mkdir -p "$outdir" + + set_io_mode "$mode" "$mode" "$splice_off" + drop_caches + + # Pre-create read files for latency readers + for i in $(seq 1 $(( num_clients - 1 ))); do + ensure_export_dirs "reader${i}/readfile" + log "Pre-creating read file for reader $i" + run_fio "${FIO_JOBS_DIR}/multi-write.fio" \ + "${outdir}/precreate_reader${i}" \ + "reader${i}/readfile" \ + "512M" "keep" + done + drop_caches + ensure_export_dirs "bulk/testfile" + start_monitors "$outdir" + start_perf_lock "$outdir" + + # Noisy writer + run_cmd fio "${FIO_JOBS_DIR}/noisy-writer.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/noisy_writer.json" \ + --filename=3D"bulk/testfile" \ + 
--size=3D"$SIZE" & + local writer_pid=3D$! + + # Latency-sensitive readers + local reader_pids=3D() + for i in $(seq 1 $(( num_clients - 1 ))); do + run_cmd fio "${FIO_JOBS_DIR}/lat-reader.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/reader${i}.json" \ + --filename=3D"reader${i}/readfile" \ + --size=3D"512M" & + reader_pids+=3D($!) + done + + local rc=3D0 + wait "$writer_pid" || rc=3D$? + for pid in "${reader_pids[@]}"; do + wait "$pid" || rc=3D$? + done + + stop_perf_lock "$outdir" + stop_monitors + [ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero" + + # Clean up test files + cleanup_test_files "bulk/testfile" + for i in $(seq 1 $(( num_clients - 1 ))); do + cleanup_test_files "reader${i}/readfile" + done + done + # Scenario D: Mixed-mode noisy neighbor + # Test write/read mode combinations where the writer uses a + # cache-friendly mode and readers use buffered reads to benefit + # from warm cache. + local mixed_modes=3D( + # write_mode read_mode label + "1 0 dontcache-w_buffered-r" + ) + + for combo in "${mixed_modes[@]}"; do + local wmode rmode label + read -r wmode rmode label <<< "$combo" + local splice_off=3D0 + [ "$wmode" -ne 0 ] && splice_off=3D1 + local outdir=3D"${RESULTS_DIR}/noisy-neighbor-mixed/${label}" + mkdir -p "$outdir" + + set_io_mode "$wmode" "$rmode" "$splice_off" + drop_caches + + # Pre-create read files for latency readers + for i in $(seq 1 $(( num_clients - 1 ))); do + ensure_export_dirs "reader${i}/readfile" + log "Pre-creating read file for reader $i" + run_fio "${FIO_JOBS_DIR}/multi-write.fio" \ + "${outdir}/precreate_reader${i}" \ + "reader${i}/readfile" \ + "512M" "keep" + done + drop_caches + ensure_export_dirs "bulk/testfile" + start_monitors "$outdir" + start_perf_lock "$outdir" + + # Noisy writer + run_cmd fio "${FIO_JOBS_DIR}/noisy-writer.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/noisy_writer.json" \ + --filename=3D"bulk/testfile" \ + --size=3D"$SIZE" & + local writer_pid=3D$! 
+ + # Latency-sensitive readers + local reader_pids=3D() + for i in $(seq 1 $(( num_clients - 1 ))); do + run_cmd fio "${FIO_JOBS_DIR}/lat-reader.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/reader${i}.json" \ + --filename=3D"reader${i}/readfile" \ + --size=3D"512M" & + reader_pids+=3D($!) + done + + local rc=3D0 + wait "$writer_pid" || rc=3D$? + for pid in "${reader_pids[@]}"; do + wait "$pid" || rc=3D$? + done + + stop_perf_lock "$outdir" + stop_monitors + [ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero" + + # Clean up test files + cleanup_test_files "bulk/testfile" + for i in $(seq 1 $(( num_clients - 1 ))); do + cleanup_test_files "reader${i}/readfile" + done + done +} + +######################################################################## +# Main +######################################################################## +preflight + +TIMESTAMP=3D$(date '+%Y%m%d-%H%M%S') +RESULTS_DIR=3D"${RESULTS_DIR}/${TIMESTAMP}" +mkdir -p "$RESULTS_DIR" + +# Save system info +{ + echo "Timestamp: $TIMESTAMP" + echo "Kernel: $(uname -r)" + echo "Hostname: $(hostname)" + echo "NFS version: $NFS_VER" + echo "File size: $SIZE" + echo "Export: $EXPORT_PATH" + cat /proc/meminfo +} > "${RESULTS_DIR}/sysinfo.txt" + +log "Results will be saved to: $RESULTS_DIR" + +run_deliverable1 +run_deliverable2 + +# Reset to defaults +set_io_mode 0 0 0 + +log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +log "All benchmarks complete." 
+log "Results in: $RESULTS_DIR" +log "Run: scripts/parse-results.sh $RESULTS_DIR" +log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" diff --git a/tools/testing/nfsd-io-bench/scripts/setup-server.sh b/tools/te= sting/nfsd-io-bench/scripts/setup-server.sh new file mode 100755 index 000000000000..0efdd74a705e --- /dev/null +++ b/tools/testing/nfsd-io-bench/scripts/setup-server.sh @@ -0,0 +1,94 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# One-time setup script for the NFS test server. +# Run this once before running benchmarks. +# +# Usage: sudo ./setup-server.sh [EXPORT_PATH] + +set -euo pipefail + +EXPORT_PATH=3D"${1:-/export}" +FSTYPE=3D"ext4" + +log() { + echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" +} + +if [ "$(id -u)" -ne 0 ]; then + echo "ERROR: must run as root" + exit 1 +fi + +# Check for required tools +for cmd in fio exportfs showmount jq; do + if ! command -v "$cmd" &>/dev/null; then + echo "WARNING: $cmd not found, attempting install" + dnf install -y "$cmd" 2>/dev/null || \ + apt-get install -y "$cmd" 2>/dev/null || \ + echo "ERROR: cannot install $cmd, please install manually" + fi +done + +# Check fio has nfs ioengine +if ! fio --enghelp=3Dnfs &>/dev/null; then + echo "ERROR: fio nfs ioengine not available." + echo "You may need to install fio with libnfs support." + echo "Try: dnf install fio libnfs-devel (or build fio from source with -= -enable-nfs)" + exit 1 +fi + +# Create export directory if needed +if [ ! -d "$EXPORT_PATH" ]; then + log "Creating export directory: $EXPORT_PATH" + mkdir -p "$EXPORT_PATH" +fi + +# Create subdirectories for multi-client tests +for i in 1 2 3 4; do + mkdir -p "${EXPORT_PATH}/client${i}" + mkdir -p "${EXPORT_PATH}/reader${i}" +done +mkdir -p "${EXPORT_PATH}/bulk" + +# Check if already exported +if ! exportfs -s 2>/dev/null | grep -q "$EXPORT_PATH"; then + log "Adding NFS export for $EXPORT_PATH" + if ! 
grep -q "$EXPORT_PATH" /etc/exports 2>/dev/null; then + echo "${EXPORT_PATH} 127.0.0.1/32(rw,sync,no_root_squash,no_subtree_chec= k)" >> /etc/exports + fi + exportfs -ra +fi + +# Ensure NFS server is running +if ! systemctl is-active --quiet nfs-server 2>/dev/null; then + log "Starting NFS server" + systemctl start nfs-server +fi + +# Verify export +log "Current exports:" +showmount -e localhost + +# Check debugfs knobs +log "Checking debugfs knobs:" +DEBUGFS_BASE=3D"/sys/kernel/debug/nfsd" +for knob in io_cache_read io_cache_write disable-splice-read; do + if [ -f "${DEBUGFS_BASE}/${knob}" ]; then + echo " ${knob} =3D $(cat "${DEBUGFS_BASE}/${knob}")" + else + echo " ${knob}: NOT FOUND (kernel may be too old)" + fi +done + +# Print system summary +echo "" +log "=3D=3D=3D System Summary =3D=3D=3D" +echo "Kernel: $(uname -r)" +echo "RAM: $(awk '/MemTotal/ {printf "%.1f GB", $2/1024/1024}' /pr= oc/meminfo)" +echo "Export: $EXPORT_PATH" +echo "Filesystem: $(df -T "$EXPORT_PATH" | awk 'NR=3D=3D2 {print $2}')" +echo "Disk: $(df -h "$EXPORT_PATH" | awk 'NR=3D=3D2 {print $2, "tot= al,", $4, "free"}')" +echo "" +log "Setup complete. 
Run benchmarks with:" +echo " sudo ./scripts/run-benchmarks.sh -e $EXPORT_PATH" --=20 2.54.0 From nobody Sun Jun 14 07:34:34 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CE525381B17; Fri, 1 May 2026 09:50:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777629033; cv=none; b=ZvAuZZnQZBroHXQoRVyePv2QrecoGAYEERvbh+VrVpkbQsBByyoDluwECESVRzKUutpcmNdRdL4pZpfAh7xMcBMuQCPZkqKrfkcf3qZZJOOFOY0nMzktsa92u2OS/e4qIXHDx7JMtv1DXI2D6JqKOeIdouwUEAisLBFsrsqBHAw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777629033; c=relaxed/simple; bh=hUHYNCWIhSe+/Qh/K+BvAkMBd2YFsTCdG8HEoxi1nKc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=J9T7aPtHNJa0GJlgwnAsO9+/dCnnbVA7rTNJ3udOCQCUXpGZi1ZxMInqPbJgJny/guhGqpN0l0yf/y7KTlnrYXFNcIlD300ShBCYrmBSeEHkeRhFB3Zue8ucJZGZvLm10S22ULYyyfqLXvGmGfFQx3N9co44b9BlUk6F+7jJ7CI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=KFr0k+75; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="KFr0k+75" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D9062C2BCC6; Fri, 1 May 2026 09:50:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777629033; bh=hUHYNCWIhSe+/Qh/K+BvAkMBd2YFsTCdG8HEoxi1nKc=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=KFr0k+75l3BFWhtSlVbO2I3cFTGv5p0dYo4MMDPn8eYBsBvduwWX9gCoZDsDEL+/s yk0UN5i3MONRX4tN+vjpmb+HbeQ0DOC5sfEpxhfyqL2ysD6GvUvyYYI79K/9P24vWK 
EKKHSXnJtBP3IMwc+FWdvvOI25anLnd5mjK8G7JHxxoyrGq52dlnim+3gOkaaYBH2m RtjQxOSZHqosjgOaK8mRHtGAXUaGjgIVgbxckisBCF5TphHHBFdp5cmhmPSbG3uWfG AKlbIIVoOozNz7ULwMVTyrYP7BdlYEoHsFyiIN609bd1cbNOr6gD79HfH+AHId5R7x 8VHAYLlXknj0w== From: Jeff Layton Date: Fri, 01 May 2026 10:49:38 +0100 Subject: [PATCH v4 4/4] testing: add dontcache-bench local filesystem benchmark suite Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20260501-dontcache-v4-4-5d5e6dc71cb3@kernel.org> References: <20260501-dontcache-v4-0-5d5e6dc71cb3@kernel.org> In-Reply-To: <20260501-dontcache-v4-0-5d5e6dc71cb3@kernel.org> To: Alexander Viro , Christian Brauner , Jan Kara , "Matthew Wilcox (Oracle)" , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Mike Snitzer , Jens Axboe , Ritesh Harjani , Chuck Lever Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=36514; i=jlayton@kernel.org; h=from:subject:message-id; bh=hUHYNCWIhSe+/Qh/K+BvAkMBd2YFsTCdG8HEoxi1nKc=; b=owEBbQKS/ZANAwAKAQAOaEEZVoIVAcsmYgBp9HdRJAO6PBfMG3nSIH53z1iYHC1PfmzCF6wYD JjAVTpM5bSJAjMEAAEKAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCafR3UQAKCRAADmhBGVaC FWzzD/9njwPC3Frl8e3NHB04t4oA55Q5AHAntkIn7NWKuUpEvu5DeJvwpOU59quxVLOjXtDtRwG Encfl48/cJ9bYVfbUy55OnZ/Kgr2uFd4lmp6ugVHC/WkWEwX48GG8VQ0nQ1rq2cNZiC2D/l0g3c rPvhYhxipDO3sCfz04etWRCj/LdRazZic/xT0RMoLwZuGhGFQMZ201/oeD7Q+d7r0O8x62jMQLA P2h2byXwAQqLdrHjTuTWJ6FZM7O/KI8grPw//RzlbjLv6Lvu3n8gQvGAgAXVoDdQ0PvsL7SmvtV ux9xjgzua3zKMZNd3FxNkfaDYDckwBllCIgjWuzH/7qYIWj8eKNihGcSSfYaQjnb/bLUbkD5h3C EfLSoe+WeSI2S+YLeuRRnO11yrQh3sif3BS/4d/4l90pIzuG+UDGWlH4J8RVXLfRaSKjFz8meNz 
ntqqnmONaJ/SftPTyniXoUpyutDg2M+dYM7upIFJO/voR65WRAl7j1rNqxnr8PEdhQYoO0D/rVD EBCiRBexU2etL9+f+Tj+jpx2xfiPn6vRFpdA56Wv7Zo9loDz8vRbAfm/wkXSNJrYc9rXzQMLfc+ m0+Fhj3/cSJb5QI4Dok1nWGdKmxabtqhZeZ2Bh87eyFbkVOHboojktPh9ERP/qXet91YfOcySiE m1XiAkOAmdoJwUg==
X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215

Add a benchmark suite for testing IOCB_DONTCACHE on local filesystems
via fio's io_uring engine with the RWF_DONTCACHE flag.

The suite mirrors the nfsd-io-bench test matrix but uses io_uring with
the "uncached" fio option instead of NFSD debugfs mode switching:

  - uncached=0: standard buffered I/O
  - uncached=1: RWF_DONTCACHE
  - Mode 2 uses O_DIRECT via fio's --direct=1

Additionally, this benchmark includes:

  - a benchmark for competing buffered vs. dontcache writers to
    different files on the same backing device.
  - a benchmark mirroring Jens Axboe's original RWF_UNCACHED write test:
    32 concurrent writers with 64K block size, time-based (300s), with
    per-second bandwidth logging

Includes fio job files, run-benchmarks.sh, and parse-results.sh.

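As a rough illustration of the mode mapping described above, the three
modes could be translated into fio command-line options along these
lines. This is only a sketch: the helper name mode_to_fio_flags is
hypothetical and is not taken from this patch; only the "uncached" and
"direct" option names come from the description.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of this patch): translate
# the numeric mode of the test matrix into fio io_uring engine options.
mode_to_fio_flags() {
    case "$1" in
        0) echo "--direct=0 --uncached=0" ;;  # standard buffered I/O
        1) echo "--direct=0 --uncached=1" ;;  # RWF_DONTCACHE
        2) echo "--direct=1" ;;               # O_DIRECT
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print the fio invocation that each mode would correspond to.
for mode in 0 1 2; do
    echo "mode $mode: fio --ioengine=io_uring $(mode_to_fio_flags "$mode") <job file>"
done
```

Keeping the mapping in one function means the same job file can be
reused for every mode, with only the command-line options changing.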
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton
---
 .../dontcache-bench/fio-jobs/axboe-write.fio       |  14 +
 .../dontcache-bench/fio-jobs/lat-reader.fio        |  12 +
 .../dontcache-bench/fio-jobs/multi-write.fio       |  11 +
 .../dontcache-bench/fio-jobs/noisy-writer.fio      |  12 +
 .../testing/dontcache-bench/fio-jobs/rand-read.fio |  13 +
 .../dontcache-bench/fio-jobs/rand-write.fio        |  13 +
 .../testing/dontcache-bench/fio-jobs/seq-read.fio  |  13 +
 .../testing/dontcache-bench/fio-jobs/seq-write.fio |  13 +
 .../dontcache-bench/scripts/parse-results.sh       | 346 +++++++++++
 .../dontcache-bench/scripts/run-benchmarks.sh      | 643 +++++++++++++++++++++
 10 files changed, 1090 insertions(+)

diff --git a/tools/testing/dontcache-bench/fio-jobs/axboe-write.fio b/tools/testing/dontcache-bench/fio-jobs/axboe-write.fio
new file mode 100644
index 000000000000..7cabcb740f0d
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/axboe-write.fio
@@ -0,0 +1,14 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=64k
+numjobs=32
+time_based
+runtime=300
+rw=write
+group_reporting=0
+filename_format=$jobname.$jobnum
+log_avg_msec=1000
+write_bw_log=axboe-write
+
+[axboe-write]
diff --git a/tools/testing/dontcache-bench/fio-jobs/lat-reader.fio b/tools/testing/dontcache-bench/fio-jobs/lat-reader.fio
new file mode 100644
index 000000000000..e221e7aedec9
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/lat-reader.fio
@@ -0,0 +1,12 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=4k
+numjobs=1
+time_based=0
+rw=read
+log_avg_msec=1000
+write_bw_log=latreader
+write_lat_log=latreader
+
+[latreader]
diff --git a/tools/testing/dontcache-bench/fio-jobs/multi-write.fio b/tools/testing/dontcache-bench/fio-jobs/multi-write.fio
new file mode 100644
index 000000000000..c9cd11ec40fd
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/multi-write.fio
@@ -0,0 +1,11 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=1M
+numjobs=4
+time_based=0
+rw=write
+group_reporting=0
+filename_format=$jobname.$jobnum
+
+[multiwrite]
diff --git a/tools/testing/dontcache-bench/fio-jobs/noisy-writer.fio b/tools/testing/dontcache-bench/fio-jobs/noisy-writer.fio
new file mode 100644
index 000000000000..4524eebd4642
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/noisy-writer.fio
@@ -0,0 +1,12 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=1M
+numjobs=1
+time_based=0
+rw=write
+log_avg_msec=1000
+write_bw_log=noisywriter
+write_lat_log=noisywriter
+
+[noisywriter]
diff --git a/tools/testing/dontcache-bench/fio-jobs/rand-read.fio b/tools/testing/dontcache-bench/fio-jobs/rand-read.fio
new file mode 100644
index 000000000000..e281fa82b86a
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/rand-read.fio
@@ -0,0 +1,13 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=4k
+numjobs=1
+iodepth=16
+time_based=0
+rw=randread
+log_avg_msec=1000
+write_bw_log=randread
+write_lat_log=randread
+
+[randread]
diff --git a/tools/testing/dontcache-bench/fio-jobs/rand-write.fio b/tools/testing/dontcache-bench/fio-jobs/rand-write.fio
new file mode 100644
index 000000000000..cf53bc6f14b9
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/rand-write.fio
@@ -0,0 +1,13 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=4k
+numjobs=1
+iodepth=16
+time_based=0
+rw=randwrite
+log_avg_msec=1000
+write_bw_log=randwrite
+write_lat_log=randwrite
+
+[randwrite]
diff --git a/tools/testing/dontcache-bench/fio-jobs/seq-read.fio b/tools/testing/dontcache-bench/fio-jobs/seq-read.fio
new file mode 100644
index 000000000000..ef87921465a7
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/seq-read.fio
@@ -0,0 +1,13 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=1M
+numjobs=1
+iodepth=16
+time_based=0
+rw=read
+log_avg_msec=1000
+write_bw_log=seqread
+write_lat_log=seqread
+
+[seqread]
diff --git a/tools/testing/dontcache-bench/fio-jobs/seq-write.fio b/tools/t=
esting/dontcache-bench/fio-jobs/seq-write.fio new file mode 100644 index 000000000000..da3082f9b391 --- /dev/null +++ b/tools/testing/dontcache-bench/fio-jobs/seq-write.fio @@ -0,0 +1,13 @@ +[global] +ioengine=3Dio_uring +direct=3D0 +bs=3D1M +numjobs=3D1 +iodepth=3D16 +time_based=3D0 +rw=3Dwrite +log_avg_msec=3D1000 +write_bw_log=3Dseqwrite +write_lat_log=3Dseqwrite + +[seqwrite] diff --git a/tools/testing/dontcache-bench/scripts/parse-results.sh b/tools= /testing/dontcache-bench/scripts/parse-results.sh new file mode 100755 index 000000000000..ba43a039153f --- /dev/null +++ b/tools/testing/dontcache-bench/scripts/parse-results.sh @@ -0,0 +1,346 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Parse fio JSON output and generate comparison tables. +# +# Usage: ./parse-results.sh + +set -euo pipefail + +if [ $# -lt 1 ]; then + echo "Usage: $0 " + exit 1 +fi + +RESULTS_DIR=3D"$1" + +if ! command -v jq &>/dev/null; then + echo "ERROR: jq is required" + exit 1 +fi + +# Extract metrics from a single fio JSON result +extract_metrics() { + local json_file=3D$1 + local rw_type=3D$2 # read or write + + if [ ! -f "$json_file" ]; then + echo "N/A N/A N/A N/A N/A N/A" + return + fi + + jq -r --arg rw "$rw_type" ' + .jobs[0][$rw] as $d | + [ + (($d.bw // 0) / 1024 | . * 10 | round / 10), # MB/s + ($d.iops // 0), # IOPS + ((($d.clat_ns.mean // 0) / 1000) | . * 10 | round / 10), # avg lat us + (($d.clat_ns.percentile["50.000000"] // 0) / 1000), # p50 us + (($d.clat_ns.percentile["99.000000"] // 0) / 1000), # p99 us + (($d.clat_ns.percentile["99.900000"] // 0) / 1000) # p99.9 us + ] | @tsv + ' "$json_file" 2>/dev/null || echo "N/A N/A N/A N/A N/A N/A" +} + +# Extract server CPU from vmstat log (average sys%) +extract_cpu() { + local vmstat_log=3D$1 + if [ ! 
-f "$vmstat_log" ]; then + echo "N/A" + return + fi + # vmstat columns: us sy id wa st =E2=80=94 skip header lines + awk 'NR>2 {sum+=3D$14; n++} END {if(n>0) printf "%.1f", sum/n; else print= "N/A"}' \ + "$vmstat_log" 2>/dev/null || echo "N/A" +} + +# Extract peak dirty pages from meminfo log +extract_peak_dirty() { + local meminfo_log=3D$1 + if [ ! -f "$meminfo_log" ]; then + echo "N/A" + return + fi + grep "^Dirty:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || e= cho "N/A" +} + +# Extract peak cached from meminfo log +extract_peak_cached() { + local meminfo_log=3D$1 + if [ ! -f "$meminfo_log" ]; then + echo "N/A" + return + fi + grep "^Cached:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || = echo "N/A" +} + +print_separator() { + printf '%*s\n' 120 '' | tr ' ' '-' +} + +######################################################################## +# Deliverable 1: Single-client results +######################################################################## +echo "" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo " Deliverable 1: Single-Client fio Benchmarks" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo "" + +for workload in seq-write rand-write seq-read rand-read; do + case $workload in + seq-write|rand-write) rw_type=3D"write" ;; + seq-read|rand-read) rw_type=3D"read" ;; + esac + + echo "--- $workload ---" + printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \ + "Mode" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" "Sys CPU%= " "PeakDirty(kB)" "PeakCache(kB)" + print_separator + + for mode in buffered dontcache direct; do + dir=3D"${RESULTS_DIR}/${workload}/${mode}" + json_file=3D$(find 
"$dir" -name '*.json' -not -name 'client*' 2>/dev/nul= l | head -1 || true) + if [ -z "$json_file" ]; then + printf "%-16s %10s\n" "$mode" "(no data)" + continue + fi + + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "$rw_type")" + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + cached=3D$(extract_peak_cached "${dir}/meminfo.log") + + printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \ + "$mode" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" \ + "$cpu" "${dirty:-N/A}" "${cached:-N/A}" + done + echo "" +done + +######################################################################## +# Deliverable 2: Multi-client results +######################################################################## +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo " Deliverable 2: Noisy-Neighbor Benchmarks" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo "" + +# Scenario A: Multiple writers +echo "--- Scenario A: Multiple Writers ---" +for mode in buffered dontcache direct; do + dir=3D"${RESULTS_DIR}/multi-write/${mode}" + if [ ! -d "$dir" ]; then + continue + fi + + json_file=3D$(find "$dir" -name '*.json' 2>/dev/null | head -1 || true) + if [ -z "$json_file" ] || [ ! -f "$json_file" ]; then + echo " Mode: $mode (no data)" + continue + fi + + echo " Mode: $mode" + printf " %-10s %10s %10s %10s %10s %10s %10s\n" \ + "Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + # Parse per-job stats from the single fio JSON output + jq -r '.jobs[] | + [ + .jobname, + ((.write.bw // 0) / 1024 | . 
* 10 | round / 10), + (.write.iops // 0), + (((.write.clat_ns.mean // 0) / 1000) | . * 10 | round / 10), + ((.write.clat_ns.percentile["50.000000"] // 0) / 1000), + ((.write.clat_ns.percentile["99.000000"] // 0) / 1000), + ((.write.clat_ns.percentile["99.900000"] // 0) / 1000) + ] | @tsv + ' "$json_file" 2>/dev/null | while IFS=3D$'\t' read -r name mbps iops avg= _lat p50 p99 p999; do + printf " %-10s %10s %10s %10s %10s %10s %10s\n" \ + "$name" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + done + + # Aggregate bandwidth + total_bw=3D$(jq '[.jobs[].write.bw // 0] | add / 1024 | . * 10 | round / = 10' \ + "$json_file" 2>/dev/null || echo "N/A") + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Aggregate BW: %s MB/s | Sys CPU: %s%% | Peak Dirty: %s kB\n" \ + "$total_bw" "$cpu" "${dirty:-N/A}" + echo "" +done + +# Scenario C: Noisy neighbor +echo "--- Scenario C: Noisy Writer + Latency-Sensitive Readers ---" +for mode in buffered dontcache direct; do + dir=3D"${RESULTS_DIR}/noisy-neighbor/${mode}" + if [ ! 
-d "$dir" ]; then + continue + fi + + echo " Mode: $mode" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + # Writer + if [ -f "${dir}/noisy_writer.json" ]; then + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "${dir}/noisy_writer.json" "write")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + fi + + # Readers + for json_file in "${dir}"/reader*.json; do + [ -f "$json_file" ] || continue + reader=3D$(basename "$json_file" .json) + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "read")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + done + + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}" + echo "" +done + +# Scenario D: Mixed-mode noisy neighbor +echo "--- Scenario D: Mixed-Mode Noisy Writer + Readers ---" +for dir in "${RESULTS_DIR}"/noisy-neighbor-mixed/*/; do + [ -d "$dir" ] || continue + label=3D$(basename "$dir") + + echo " Mode: $label" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + # Writer + if [ -f "${dir}/noisy_writer.json" ]; then + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "${dir}/noisy_writer.json" "write")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + fi + + # Readers + for json_file in "${dir}"/reader*.json; do + [ -f "$json_file" ] || continue + reader=3D$(basename "$json_file" .json) + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "read")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + done + + 
cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}" + echo "" +done + +# Scenario E: Competing writers +echo "--- Scenario E: Competing Writers (Separate Files) ---" +for dir in "${RESULTS_DIR}"/competing-writers/*/; do + [ -d "$dir" ] || continue + label=3D$(basename "$dir") + + echo " Mode: $label" + printf " %-20s %10s %10s %10s %10s %10s %10s\n" \ + "Writer" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + total_bw=3D0 + for json_file in "${dir}"/writer*.json; do + [ -f "$json_file" ] || continue + writer=3D$(basename "$json_file" .json) + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "write")" + printf " %-20s %10s %10s %10s %10s %10s %10s\n" \ + "$writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + total_bw=3D$(echo "$total_bw + ${mbps:-0}" | bc 2>/dev/null || echo "$to= tal_bw") + done + + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Aggregate BW: %s MB/s | Sys CPU: %s%% | Peak Dirty: %s kB\n" \ + "$total_bw" "$cpu" "${dirty:-N/A}" + echo "" +done + +######################################################################## +# Deliverable 3: Axboe 32-file write benchmark +######################################################################## +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo " Deliverable 3: 32-File Write (Axboe Test)" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo "" + +for mode in buffered dontcache direct; do + dir=3D"${RESULTS_DIR}/axboe-write/${mode}" + if [ ! 
-d "$dir" ]; then + continue + fi + + json_file=3D$(find "$dir" -name '*.json' 2>/dev/null | head -1 || true) + if [ -z "$json_file" ] || [ ! -f "$json_file" ]; then + echo "--- $mode: (no data) ---" + continue + fi + + echo "--- $mode ---" + + # Aggregate stats across all 32 jobs + agg_bw=3D$(jq '[.jobs[].write.bw // 0] | add / 1024 | . * 10 | round / 10= ' \ + "$json_file" 2>/dev/null || echo "N/A") + agg_iops=3D$(jq '[.jobs[].write.iops // 0] | add | round' \ + "$json_file" 2>/dev/null || echo "N/A") + + # Average latency across jobs + avg_lat=3D$(jq '[.jobs[].write.clat_ns.mean // 0] | (add / length / 1000)= | + . * 10 | round / 10' "$json_file" 2>/dev/null || echo "N/A") + avg_p50=3D$(jq '[.jobs[].write.clat_ns.percentile["50.000000"] // 0] | + (add / length / 1000) | round' "$json_file" 2>/dev/null || echo "N/A") + avg_p99=3D$(jq '[.jobs[].write.clat_ns.percentile["99.000000"] // 0] | + (add / length / 1000) | round' "$json_file" 2>/dev/null || echo "N/A") + avg_p999=3D$(jq '[.jobs[].write.clat_ns.percentile["99.900000"] // 0] | + (add / length / 1000) | round' "$json_file" 2>/dev/null || echo "N/A") + + printf " Aggregate BW: %s MB/s | IOPS: %s\n" "$agg_bw" "$agg_iops" + printf " Avg Latency: %s us | p50: %s us | p99: %s us | p99.9: %s us\n" \ + "$avg_lat" "$avg_p50" "$avg_p99" "$avg_p999" + + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + cached=3D$(extract_peak_cached "${dir}/meminfo.log") + printf " Sys CPU: %s%% | Peak Dirty: %s kB | Peak Cached: %s kB\n" \ + "$cpu" "${dirty:-N/A}" "${cached:-N/A}" + + # Per-second bandwidth from fio bw log (shows the page-cache cliff) + bw_log=3D$(find "$dir" -name '*_bw.*.log' 2>/dev/null | head -1 || true) + if [ -n "$bw_log" ] && [ -f "$bw_log" ]; then + echo " Per-second aggregate BW (MB/s):" + # fio bw logs: msec, bw_kB, rw, bs =E2=80=94 aggregate across all job lo= gs + for logfile in "${dir}"/*_bw.*.log; do + [ -f "$logfile" ] || continue + cat "$logfile" + done | 
awk -F',' '{ + sec =3D int($1 / 1000) + 1 + bw[sec] +=3D $2 + } END { + n =3D asorti(bw, sorted, "@ind_num_asc") + for (i =3D 1; i <=3D n; i++) + printf " %2ds: %.0f MB/s\n", sorted[i], bw[sorted[i]] / 1024 + }' + fi + echo "" +done + +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo " System Info" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +if [ -f "${RESULTS_DIR}/sysinfo.txt" ]; then + head -6 "${RESULTS_DIR}/sysinfo.txt" +fi +echo "" diff --git a/tools/testing/dontcache-bench/scripts/run-benchmarks.sh b/tool= s/testing/dontcache-bench/scripts/run-benchmarks.sh new file mode 100755 index 000000000000..e7278567e1a5 --- /dev/null +++ b/tools/testing/dontcache-bench/scripts/run-benchmarks.sh @@ -0,0 +1,643 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Local filesystem I/O mode benchmark suite. +# +# Runs the same test matrix as run-benchmarks.sh but on a local filesystem +# using fio's io_uring engine with the RWF_DONTCACHE flag instead of NFSD's +# debugfs mode knobs. 
+# +# Usage: ./run-benchmarks.sh [options] +# -t Test directory (must be on a filesystem supporting FOP_DON= TCACHE) +# -s File size (default: auto-sized to exceed RAM) +# -f Path to fio binary (default: fio in PATH) +# -o Output directory for results (default: ./results/) +# -d Dry run (print commands without executing) + +set -euo pipefail + +# Defaults +TEST_DIR=3D"" +SIZE=3D"" +FIO_BIN=3D"fio" +RESULTS_DIR=3D"" +DRY_RUN=3D0 +MODES=3D"0 1 2" +PERF_LOCK=3D0 +SCRIPT_DIR=3D"$(cd "$(dirname "$0")" && pwd)" +FIO_JOBS_DIR=3D"${SCRIPT_DIR}/../fio-jobs" + +usage() { + echo "Usage: $0 -t [-s ] [-f ] [-o ] [-D] [-p] [-d]" + echo "" + echo " -t Test directory (required, must support RWF_DONTCACHE)" + echo " -s File size (default: 2x RAM)" + echo " -f Path to fio binary (default: fio)" + echo " -o Output directory (default: ./results/)" + echo " -D Dontcache only (skip buffered and direct tests)" + echo " -p Profile kernel lock contention with perf lock" + echo " -d Dry run" + exit 1 +} + +while getopts "t:s:f:o:Dpdh" opt; do + case $opt in + t) TEST_DIR=3D"$OPTARG" ;; + s) SIZE=3D"$OPTARG" ;; + f) FIO_BIN=3D"$OPTARG" ;; + o) RESULTS_DIR=3D"$OPTARG" ;; + D) MODES=3D"1" ;; + p) PERF_LOCK=3D1 ;; + d) DRY_RUN=3D1 ;; + h) usage ;; + *) usage ;; + esac +done + +if [ -z "$TEST_DIR" ]; then + echo "ERROR: -t is required" + usage +fi + +# Auto-size to 2x RAM if not specified +if [ -z "$SIZE" ]; then + mem_kb=3D$(awk '/MemTotal/ {print $2}' /proc/meminfo) + SIZE=3D"$(( mem_kb * 2 / 1024 ))M" +fi + +if [ -z "$RESULTS_DIR" ]; then + RESULTS_DIR=3D"./results/local-$(date +%Y%m%d-%H%M%S)" +fi + +mkdir -p "$RESULTS_DIR" + +log() { + echo "[$(date '+%H:%M:%S')] $*" +} + +run_cmd() { + if [ "$DRY_RUN" -eq 1 ]; then + echo " [DRY RUN] $*" + else + "$@" + fi +} + +# I/O mode definitions: +# buffered: direct=3D0, uncached=3D0 +# dontcache: direct=3D0, uncached=3D1 +# direct: direct=3D1, uncached=3D0 +# +# Mode name from numeric value +mode_name() { + case $1 in + 0) echo "buffered" ;; + 
1) echo "dontcache" ;; + 2) echo "direct" ;; + esac +} + +# Return fio command-line flags for a given mode. +# "direct" is a standard fio option and works on the command line. +# "uncached" is an io_uring engine option that must be in the job file, +# so we inject it via make_job_file() below. +mode_fio_args() { + case $1 in + 0) echo "--direct=3D0" ;; # buffered + 1) echo "--direct=3D0" ;; # dontcache + 2) echo "--direct=3D1" ;; # direct + esac +} + +# Return the uncached=3D value for a given mode. +mode_uncached() { + case $1 in + 0) echo "0" ;; + 1) echo "1" ;; + 2) echo "0" ;; + esac +} + +# Create a temporary job file with uncached=3DN injected into [global]. +# For uncached=3D0 (buffered/direct), return the original file unchanged. +make_job_file() { + local job_file=3D$1 + local uncached=3D$2 + + if [ "$uncached" -eq 0 ]; then + echo "$job_file" + return + fi + + local tmp + tmp=3D$(mktemp) + sed "/^\[global\]/a uncached=3D${uncached}" "$job_file" > "$tmp" + echo "$tmp" +} + +drop_caches() { + run_cmd bash -c "sync && echo 3 > /proc/sys/vm/drop_caches" +} + +# perf lock profiling =E2=80=94 uses BPF-based live contention tracing +PERF_LOCK_PID=3D"" + +start_perf_lock() { + local outdir=3D$1 + + if [ "$PERF_LOCK" -ne 1 ]; then + return + fi + + log "Starting perf lock contention tracing" + perf lock contention -a -b --max-stack 8 \ + > "${outdir}/perf-lock-contention.txt" 2>&1 & + PERF_LOCK_PID=3D$! +} + +stop_perf_lock() { + local outdir=3D$1 + + if [ -z "$PERF_LOCK_PID" ]; then + return + fi + + log "Stopping perf lock contention tracing" + kill -TERM "$PERF_LOCK_PID" 2>/dev/null || true + wait "$PERF_LOCK_PID" 2>/dev/null || true + PERF_LOCK_PID=3D"" +} + +# Background monitors +VMSTAT_PID=3D"" +IOSTAT_PID=3D"" +MEMINFO_PID=3D"" + +start_monitors() { + local outdir=3D$1 + log "Starting monitors in $outdir" + run_cmd vmstat 1 > "${outdir}/vmstat.log" 2>&1 & + VMSTAT_PID=3D$! + run_cmd iostat -x 1 > "${outdir}/iostat.log" 2>&1 & + IOSTAT_PID=3D$! 
+ (while true; do + echo "=3D=3D=3D $(date '+%s') =3D=3D=3D" + cat /proc/meminfo + sleep 1 + done) > "${outdir}/meminfo.log" 2>&1 & + MEMINFO_PID=3D$! +} + +stop_monitors() { + log "Stopping monitors" + kill "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true + wait "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true +} + +cleanup_test_files() { + local filepath=3D"${TEST_DIR}/$1" + log "Cleaning up $filepath" + run_cmd rm -f "$filepath" +} + +# Run a single fio benchmark +run_fio() { + local job_file=3D$1 + local outdir=3D$2 + local filename=3D$3 + local fio_size=3D${4:-$SIZE} + local keep=3D${5:-} + local extra_args=3D${6:-} + local uncached=3D${7:-0} + + # Inject uncached=3DN into the job file if needed + local actual_job + actual_job=3D$(make_job_file "$job_file" "$uncached") + + local job_name + job_name=3D$(basename "$job_file" .fio) + + log "Running fio job: $job_name -> $outdir (file=3D${TEST_DIR}/$filename = size=3D$fio_size)" + mkdir -p "$outdir" + + drop_caches + start_monitors "$outdir" + # Skip perf lock profiling for precreate/setup runs + [ "$keep" !=3D "keep" ] && start_perf_lock "$outdir" + + # shellcheck disable=3DSC2086 + run_cmd "$FIO_BIN" "$actual_job" \ + --output-format=3Djson \ + --output=3D"${outdir}/${job_name}.json" \ + --filename=3D"${TEST_DIR}/$filename" \ + --size=3D"$fio_size" \ + $extra_args + + [ "$keep" !=3D "keep" ] && stop_perf_lock "$outdir" + stop_monitors + log "Finished: $job_name" + + # Clean up temp job file if one was created + [ "$actual_job" !=3D "$job_file" ] && rm -f "$actual_job" + + if [ "$keep" !=3D "keep" ]; then + cleanup_test_files "$filename" + fi +} + +######################################################################## +# Preflight +######################################################################## +preflight() { + log "=3D=3D=3D Preflight checks =3D=3D=3D" + + if ! command -v "$FIO_BIN" &>/dev/null; then + echo "ERROR: fio not found at $FIO_BIN" + exit 1 + fi + + if [ ! 
-d "$TEST_DIR" ]; then + echo "ERROR: Test directory $TEST_DIR does not exist" + exit 1 + fi + + # Quick check that RWF_DONTCACHE works on this filesystem + local testfile=3D"${TEST_DIR}/.dontcache_test" + if ! "$FIO_BIN" --name=3Dtest --ioengine=3Dio_uring --rw=3Dwrite \ + --bs=3D4k --size=3D4k --direct=3D0 --uncached=3D1 \ + --filename=3D"$testfile" 2>/dev/null; then + echo "WARNING: RWF_DONTCACHE may not be supported on $TEST_DIR" + echo " (filesystem must support FOP_DONTCACHE)" + fi + rm -f "$testfile" + + log "Test directory: $TEST_DIR" + log "File size: $SIZE" + log "fio binary: $FIO_BIN" + log "Results: $RESULTS_DIR" + + # Record system info + { + echo "Timestamp: $(date +%Y%m%d-%H%M%S)" + echo "Kernel: $(uname -r)" + echo "Hostname: $(hostname)" + echo "Filesystem: $(df -T "$TEST_DIR" | tail -1 | awk '{print $2}')" + echo "File size: $SIZE" + echo "Test dir: $TEST_DIR" + } > "${RESULTS_DIR}/sysinfo.txt" +} + +######################################################################## +# Deliverable 1: Single-client benchmarks +######################################################################## +run_deliverable1() { + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + log "Deliverable 1: Single-client benchmarks" + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + + # Sequential write + for mode in $MODES; do + local mname + mname=3D$(mode_name $mode) + local fio_args + fio_args=3D$(mode_fio_args $mode) + + drop_caches + run_fio "${FIO_JOBS_DIR}/seq-write.fio" \ + "${RESULTS_DIR}/seq-write/${mname}" \ + "seq-write_testfile" "$SIZE" "" "$fio_args" \ + "$(mode_uncached $mode)" + done + + # Random write + for mode in $MODES; do + local mname + mname=3D$(mode_name $mode) + local fio_args + fio_args=3D$(mode_fio_args $mode) + + drop_caches + run_fio "${FIO_JOBS_DIR}/rand-write.fio" \ + 
"${RESULTS_DIR}/rand-write/${mname}" \ + "rand-write_testfile" "$SIZE" "" "$fio_args" \ + "$(mode_uncached $mode)" + done + + # Sequential read =E2=80=94 pre-create file, then read with each mode + log "Pre-creating sequential read test file" + run_fio "${FIO_JOBS_DIR}/seq-write.fio" \ + "${RESULTS_DIR}/seq-read/precreate" \ + "seq-read_testfile" "$SIZE" "keep" + + for rmode in $MODES; do + local mname + mname=3D$(mode_name $rmode) + local fio_args + fio_args=3D$(mode_fio_args $rmode) + local keep=3D"keep" + [ "$rmode" -eq 2 ] && keep=3D"" + + drop_caches + run_fio "${FIO_JOBS_DIR}/seq-read.fio" \ + "${RESULTS_DIR}/seq-read/${mname}" \ + "seq-read_testfile" "$SIZE" "$keep" "$fio_args" \ + "$(mode_uncached $rmode)" + done + + # Random read =E2=80=94 pre-create file, then read with each mode + log "Pre-creating random read test file" + run_fio "${FIO_JOBS_DIR}/seq-write.fio" \ + "${RESULTS_DIR}/rand-read/precreate" \ + "rand-read_testfile" "$SIZE" "keep" + + for rmode in $MODES; do + local mname + mname=3D$(mode_name $rmode) + local fio_args + fio_args=3D$(mode_fio_args $rmode) + local keep=3D"keep" + [ "$rmode" -eq 2 ] && keep=3D"" + + drop_caches + run_fio "${FIO_JOBS_DIR}/rand-read.fio" \ + "${RESULTS_DIR}/rand-read/${mname}" \ + "rand-read_testfile" "$SIZE" "$keep" "$fio_args" \ + "$(mode_uncached $rmode)" + done +} + +######################################################################## +# Deliverable 2: Multi-client tests +######################################################################## +run_deliverable2() { + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + log "Deliverable 2: Noisy-neighbor benchmarks" + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + + local num_clients=3D4 + local client_size + local mem_kb + mem_kb=3D$(awk '/MemTotal/ {print $2}' /proc/meminfo) + 
client_size=3D"$(( mem_kb / 1024 / num_clients ))M" + + # Scenario A: Multiple writers + for mode in $MODES; do + local mname + mname=3D$(mode_name $mode) + local fio_args + fio_args=3D$(mode_fio_args $mode) + + drop_caches + run_fio "${FIO_JOBS_DIR}/multi-write.fio" \ + "${RESULTS_DIR}/multi-write/${mname}" \ + "multi-write_testfile" "$client_size" "" "$fio_args" \ + "$(mode_uncached $mode)" + done + + # Scenario C: Noisy writer + latency-sensitive readers + for mode in $MODES; do + local mname + mname=3D$(mode_name $mode) + local fio_args + fio_args=3D$(mode_fio_args $mode) + local uncached + uncached=3D$(mode_uncached $mode) + local writer_job + writer_job=3D$(make_job_file "${FIO_JOBS_DIR}/noisy-writer.fio" "$uncach= ed") + local reader_job + reader_job=3D$(make_job_file "${FIO_JOBS_DIR}/lat-reader.fio" "$uncached= ") + local outdir=3D"${RESULTS_DIR}/noisy-neighbor/${mname}" + mkdir -p "$outdir" + + # Pre-create read files + for i in $(seq 1 $(( num_clients - 1 ))); do + log "Pre-creating read file for reader $i" + run_fio "${FIO_JOBS_DIR}/seq-write.fio" \ + "${outdir}/precreate_reader${i}" \ + "reader${i}_readfile" \ + "512M" "keep" + done + drop_caches + start_monitors "$outdir" + start_perf_lock "$outdir" + + # Noisy writer + # shellcheck disable=3DSC2086 + run_cmd "$FIO_BIN" "$writer_job" \ + --output-format=3Djson \ + --output=3D"${outdir}/noisy_writer.json" \ + --filename=3D"${TEST_DIR}/bulk_testfile" \ + --size=3D"$SIZE" \ + $fio_args & + local writer_pid=3D$! + + # Latency-sensitive readers + local reader_pids=3D() + for i in $(seq 1 $(( num_clients - 1 ))); do + # shellcheck disable=3DSC2086 + run_cmd "$FIO_BIN" "$reader_job" \ + --output-format=3Djson \ + --output=3D"${outdir}/reader${i}.json" \ + --filename=3D"${TEST_DIR}/reader${i}_readfile" \ + --size=3D"512M" \ + $fio_args & + reader_pids+=3D($!) + done + + local rc=3D0 + wait "$writer_pid" || rc=3D$? + for pid in "${reader_pids[@]}"; do + wait "$pid" || rc=3D$? 
+ done + + stop_perf_lock "$outdir" + stop_monitors + [ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero" + + [ "$writer_job" !=3D "${FIO_JOBS_DIR}/noisy-writer.fio" ] && rm -f "$wri= ter_job" + [ "$reader_job" !=3D "${FIO_JOBS_DIR}/lat-reader.fio" ] && rm -f "$reade= r_job" + cleanup_test_files "bulk_testfile" + for i in $(seq 1 $(( num_clients - 1 ))); do + cleanup_test_files "reader${i}_readfile" + done + done + + # Scenario D: Mixed-mode noisy neighbor + # dontcache writes + buffered reads + local outdir=3D"${RESULTS_DIR}/noisy-neighbor-mixed/dontcache-w_buffered-= r" + mkdir -p "$outdir" + local writer_job + writer_job=3D$(make_job_file "${FIO_JOBS_DIR}/noisy-writer.fio" 1) + + for i in $(seq 1 $(( num_clients - 1 ))); do + log "Pre-creating read file for reader $i" + run_fio "${FIO_JOBS_DIR}/seq-write.fio" \ + "${outdir}/precreate_reader${i}" \ + "reader${i}_readfile" \ + "512M" "keep" + done + drop_caches + start_monitors "$outdir" + start_perf_lock "$outdir" + + # Writer with dontcache + run_cmd "$FIO_BIN" "$writer_job" \ + --output-format=3Djson \ + --output=3D"${outdir}/noisy_writer.json" \ + --filename=3D"${TEST_DIR}/bulk_testfile" \ + --size=3D"$SIZE" \ + --direct=3D0 & + local writer_pid=3D$! + + # Readers with buffered (no uncached flag) + local reader_pids=3D() + for i in $(seq 1 $(( num_clients - 1 ))); do + run_cmd "$FIO_BIN" "${FIO_JOBS_DIR}/lat-reader.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/reader${i}.json" \ + --filename=3D"${TEST_DIR}/reader${i}_readfile" \ + --size=3D"512M" \ + --direct=3D0 & + reader_pids+=3D($!) + done + + local rc=3D0 + wait "$writer_pid" || rc=3D$? + for pid in "${reader_pids[@]}"; do + wait "$pid" || rc=3D$? 
+ done + + stop_perf_lock "$outdir" + stop_monitors + [ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero" + + [ "$writer_job" !=3D "${FIO_JOBS_DIR}/noisy-writer.fio" ] && rm -f "$writ= er_job" + cleanup_test_files "bulk_testfile" + for i in $(seq 1 $(( num_clients - 1 ))); do + cleanup_test_files "reader${i}_readfile" + done + + # Scenario E: Competing writers (dontcache vs buffered on separate files) + # This tests whether the dontcache flusher kick interferes with a + # normal buffered writer sharing the same backing device. + log "--- Scenario E: Competing writers (separate files) ---" + + # Sub-scenario: dontcache writer vs buffered writer + local outdir=3D"${RESULTS_DIR}/competing-writers/dontcache-vs-buffered" + mkdir -p "$outdir" + local dc_writer_job + dc_writer_job=3D$(make_job_file "${FIO_JOBS_DIR}/noisy-writer.fio" 1) + + drop_caches + start_monitors "$outdir" + start_perf_lock "$outdir" + + # Writer A: dontcache + run_cmd "$FIO_BIN" "$dc_writer_job" \ + --output-format=3Djson \ + --output=3D"${outdir}/writer_dontcache.json" \ + --filename=3D"${TEST_DIR}/writer_a_testfile" \ + --size=3D"$SIZE" \ + --direct=3D0 & + local writer_a_pid=3D$! + + # Writer B: buffered + run_cmd "$FIO_BIN" "${FIO_JOBS_DIR}/noisy-writer.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/writer_buffered.json" \ + --filename=3D"${TEST_DIR}/writer_b_testfile" \ + --size=3D"$SIZE" \ + --direct=3D0 & + local writer_b_pid=3D$! + + local rc=3D0 + wait "$writer_a_pid" || rc=3D$? + wait "$writer_b_pid" || rc=3D$? 
+ + stop_perf_lock "$outdir" + stop_monitors + [ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero" + + [ "$dc_writer_job" !=3D "${FIO_JOBS_DIR}/noisy-writer.fio" ] && rm -f "$d= c_writer_job" + cleanup_test_files "writer_a_testfile" + cleanup_test_files "writer_b_testfile" + + # Sub-scenario: buffered writer vs buffered writer (baseline) + outdir=3D"${RESULTS_DIR}/competing-writers/buffered-vs-buffered" + mkdir -p "$outdir" + + drop_caches + start_monitors "$outdir" + start_perf_lock "$outdir" + + # Writer A: buffered + run_cmd "$FIO_BIN" "${FIO_JOBS_DIR}/noisy-writer.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/writer_a.json" \ + --filename=3D"${TEST_DIR}/writer_a_testfile" \ + --size=3D"$SIZE" \ + --direct=3D0 & + writer_a_pid=3D$! + + # Writer B: buffered + run_cmd "$FIO_BIN" "${FIO_JOBS_DIR}/noisy-writer.fio" \ + --output-format=3Djson \ + --output=3D"${outdir}/writer_b.json" \ + --filename=3D"${TEST_DIR}/writer_b_testfile" \ + --size=3D"$SIZE" \ + --direct=3D0 & + writer_b_pid=3D$! + + rc=3D0 + wait "$writer_a_pid" || rc=3D$? + wait "$writer_b_pid" || rc=3D$? 
+ + stop_perf_lock "$outdir" + stop_monitors + [ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero" + + cleanup_test_files "writer_a_testfile" + cleanup_test_files "writer_b_testfile" +} + +######################################################################## +# Deliverable 3: Axboe 32-file write benchmark +######################################################################## +run_deliverable3() { + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + log "Deliverable 3: 32-file write (Axboe test)" + log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" + + # Per-file size: 2x RAM / 32 so total written exceeds RAM + local per_file_size + local mem_kb + mem_kb=3D$(awk '/MemTotal/ {print $2}' /proc/meminfo) + per_file_size=3D"$(( mem_kb * 2 / 1024 / 32 ))M" + + for mode in $MODES; do + local mname + mname=3D$(mode_name $mode) + local fio_args + fio_args=3D$(mode_fio_args $mode) + + drop_caches + run_fio "${FIO_JOBS_DIR}/axboe-write.fio" \ + "${RESULTS_DIR}/axboe-write/${mname}" \ + "axboe-write_testfile" "$per_file_size" "" "$fio_args" \ + "$(mode_uncached $mode)" + done +} + +######################################################################## +# Main +######################################################################## +preflight +run_deliverable1 +run_deliverable2 +run_deliverable3 + +log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +log "All benchmarks complete." +log "Results in: $RESULTS_DIR" +log "Parse with: scripts/parse-results.sh $RESULTS_DIR" +log "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" --=20 2.54.0