From: Jeff Layton
Date: Wed, 01 Apr 2026 15:10:58 -0400
Subject: [PATCH 1/4] mm: fix IOCB_DONTCACHE write performance with rate-limited writeback
To: Alexander Viro, Christian Brauner, Jan Kara, "Matthew Wilcox (Oracle)", Andrew Morton, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Mike Snitzer, Chuck Lever
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton

IOCB_DONTCACHE calls filemap_flush_range() with nr_to_write=LONG_MAX on
every write, which flushes all dirty pages in the written range.

Under concurrent writers this creates severe serialization on the
writeback submission path, causing throughput to collapse to ~47% of
buffered I/O with multi-second tail latency. Even single-client
sequential writes suffer: on a 512GB file with 256GB RAM, the aggressive
flushing triggers dirty throttling that limits throughput to 575 MB/s
vs 1442 MB/s with rate-limited writeback.

Replace the filemap_flush_range() call in generic_write_sync() with a
new filemap_dontcache_writeback_range() that uses two rate-limiting
mechanisms:

1. Skip-if-busy: check mapping_tagged(PAGECACHE_TAG_WRITEBACK) before
   flushing. If writeback is already in progress on the mapping, skip
   the flush entirely. This eliminates writeback submission contention
   between concurrent writers.

2. Proportional cap: when flushing does occur, cap nr_to_write to the
   number of pages just written. This prevents any single write from
   triggering a large flush that would starve concurrent readers.

Both mechanisms are necessary: skip-if-busy alone causes I/O bursts
when the tag clears (reader p99.9 spikes 83x); proportional cap alone
still serializes on xarray locks regardless of submission size.

Pages touched under IOCB_DONTCACHE continue to be marked for eviction
(dropbehind), so page cache usage remains bounded. Ranges skipped by
the busy check are eventually flushed by background writeback or by
the next writer to find the tag clear.
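The two mechanisms reduce to a small decision about how many pages a writer hands to writeback. The following is a minimal userspace sketch of that decision, not the kernel code: sketch_mapping, dontcache_flush_budget() and the fixed 4 KiB page size are hypothetical stand-ins for the address_space, filemap_dontcache_writeback_range() and PAGE_SIZE, and the boolean field models mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK). It only computes the nr_to_write budget; no I/O is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Assumption: 4 KiB pages. The kernel uses PAGE_SIZE and PAGE_SHIFT instead. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1L << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for the address_space state the patch consults. */
struct sketch_mapping {
	/* Models mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK). */
	bool writeback_tagged;
};

/*
 * Number of pages a dontcache writer would ask writeback to submit.
 * 0 means skip-if-busy: writeback is already in flight on this mapping.
 * Otherwise the budget is the proportional cap: the bytes just written,
 * rounded up to whole pages so that a partial page still counts as one.
 */
static long dontcache_flush_budget(const struct sketch_mapping *m,
				   ssize_t nr_written)
{
	assert(nr_written >= 0);

	if (m->writeback_tagged)
		return 0;

	return (nr_written + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```

With these definitions, a 1 MiB write on an idle mapping gets a budget of 256 pages, a 100-byte write still gets 1 page because of the round-up, and any write on a mapping whose writeback tag is set gets 0, i.e. the flush is skipped.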

Signed-off-by: Jeff Layton
---
 include/linux/fs.h |  7 +++++--
 mm/filemap.c       | 29 +++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8b3dd145b25ec12b00ac1df17a952d9116b88047..53e9cca1b50a946a1276c49902294c3ae0ab3500 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2610,6 +2610,8 @@ extern int __must_check file_write_and_wait_range(struct file *file,
 						loff_t start, loff_t end);
 int filemap_flush_range(struct address_space *mapping, loff_t start,
 		loff_t end);
+int filemap_dontcache_writeback_range(struct address_space *mapping,
+		loff_t start, loff_t end, ssize_t nr_written);
 
 static inline int file_write_and_wait(struct file *file)
 {
@@ -2645,8 +2647,9 @@ static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
 	} else if (iocb->ki_flags & IOCB_DONTCACHE) {
 		struct address_space *mapping = iocb->ki_filp->f_mapping;
 
-		filemap_flush_range(mapping, iocb->ki_pos - count,
-				iocb->ki_pos - 1);
+		filemap_dontcache_writeback_range(mapping,
+				iocb->ki_pos - count,
+				iocb->ki_pos - 1, count);
 	}
 
 	return count;
diff --git a/mm/filemap.c b/mm/filemap.c
index 406cef06b684a84a1e0c27d8267e95f32282ffdc..af2024b736bef74571cc22ab7e3cde2c8e872efe 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -437,6 +437,35 @@ int filemap_flush_range(struct address_space *mapping, loff_t start,
 }
 EXPORT_SYMBOL_GPL(filemap_flush_range);
 
+/**
+ * filemap_dontcache_writeback_range - rate-limited writeback for dontcache I/O
+ * @mapping: target address_space
+ * @start: byte offset to start writeback
+ * @end: last byte offset (inclusive) for writeback
+ * @nr_written: number of bytes just written by the caller
+ *
+ * Rate-limited writeback for IOCB_DONTCACHE writes. Skips the flush
+ * entirely if writeback is already in progress on the mapping (skip-if-busy),
+ * and when flushing, caps nr_to_write to the number of pages just written
+ * (proportional cap). Together these avoid writeback contention between
+ * concurrent writers and prevent I/O bursts that starve readers.
+ *
+ * Return: %0 on success, negative error code otherwise.
+ */
+int filemap_dontcache_writeback_range(struct address_space *mapping,
+		loff_t start, loff_t end, ssize_t nr_written)
+{
+	long nr;
+
+	if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+		return 0;
+
+	nr = (nr_written + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	return filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
+			WB_REASON_BACKGROUND);
+}
+EXPORT_SYMBOL_GPL(filemap_dontcache_writeback_range);
+
 /**
  * filemap_flush - mostly a non-blocking flush
  * @mapping: target address_space
-- 
2.53.0

From: Jeff Layton
Date: Wed, 01 Apr 2026 15:10:59 -0400
Subject: [PATCH 2/4] mm: add atomic flush guard for IOCB_DONTCACHE writeback

When the PAGECACHE_TAG_WRITEBACK tag clears after a round of writeback
completes, all concurrent IOCB_DONTCACHE writers see the tag clear
simultaneously and submit proportional flushes at once. This
thundering herd causes p99.9 tail latency spikes.

Add an AS_DONTCACHE_FLUSHING flag to the address_space and use
test_and_set_bit() to ensure at most one IOCB_DONTCACHE writer flushes
at a time. Other writers that find the bit set skip their flush
entirely. The bit is cleared when the flush completes.
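The guard described above reduces to a test-and-set/clear pair around the submission. As a hedged userspace sketch: a C11 atomic_flag stands in for the AS_DONTCACHE_FLUSHING bit, and a page counter stands in for filemap_writeback(). The names sketch_mapping, sketch_submit() and sketch_guarded_flush() are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical per-mapping state. The patch uses the AS_DONTCACHE_FLUSHING
 * bit in mapping->flags; a C11 atomic_flag plays that role here.
 */
struct sketch_mapping {
	atomic_flag flushing;
	long pages_submitted;	/* only touched by the writer holding the flag */
};

#define SKETCH_MAPPING_INIT { ATOMIC_FLAG_INIT, 0 }

/* Stand-in for filemap_writeback(): records the requested page count. */
static void sketch_submit(struct sketch_mapping *m, long nr)
{
	assert(nr > 0);
	m->pages_submitted += nr;
}

/*
 * Returns true if this writer issued the flush, false if it skipped
 * because another writer already held the guard.
 */
static bool sketch_guarded_flush(struct sketch_mapping *m, long nr)
{
	/* Like test_and_set_bit(): set the flag, get back its previous value. */
	if (atomic_flag_test_and_set(&m->flushing))
		return false;

	sketch_submit(m, nr);

	/* Like clear_bit(): reopen the guard once submission returns. */
	atomic_flag_clear(&m->flushing);
	return true;
}
```

A writer that loses the race returns immediately instead of waiting, which mirrors the patch's choice to skip rather than block; the skipped range is left for background writeback or a later writer.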

Together with the existing skip-if-busy check on PAGECACHE_TAG_WRITEBACK
(which provides temporal rate limiting by skipping flushes while prior
writeback is still draining), this creates a two-level guard: the
writeback tag paces flush frequency to match device speed, while the
atomic flag prevents the thundering herd at tag-clear transitions.

Additionally, add a dirty pressure escape hatch: when the global count
of dirty file pages exceeds 75% of the dirty threshold (derived from
vm.dirty_ratio or vm.dirty_bytes), bypass the WRITEBACK tag skip and
attempt to flush anyway. Under heavy multi-writer load, the
skip-if-busy check can cause dirty pages to accumulate (most writers
skip because writeback is always in progress), eventually triggering
balance_dirty_pages() throttling with severe tail latency. By forcing
extra flushes when dirty pressure is high, dontcache writers help drain
dirty pages before the throttle threshold is hit.

Signed-off-by: Jeff Layton
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 36 +++++++++++++++++++++++++++++-------
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9d9850d37185418349b89e6efe420..e71bf75f6c22d0da5330c17c6e525cb12d254dfe 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -210,6 +210,7 @@ enum mapping_flags {
 	AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
 	AS_KERNEL_FILE = 10,	/* mapping for a fake kernel file that shouldn't
 				   account usage to user cgroups */
+	AS_DONTCACHE_FLUSHING = 11,	/* dontcache writeback in progress */
 	/* Bits 16-25 are used for FOLIO_ORDER */
 	AS_FOLIO_ORDER_BITS = 5,
 	AS_FOLIO_ORDER_MIN = 16,
diff --git a/mm/filemap.c b/mm/filemap.c
index af2024b736bef74571cc22ab7e3cde2c8e872efe..1b5577bd4eda8ad8ee182e58acd50d99f0a8f9f5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -444,11 +444,21 @@ EXPORT_SYMBOL_GPL(filemap_flush_range);
  * @end: last byte offset (inclusive) for writeback
  * @nr_written: number of bytes just written by the caller
  *
- * Rate-limited writeback for IOCB_DONTCACHE writes. Skips the flush
- * entirely if writeback is already in progress on the mapping (skip-if-busy),
- * and when flushing, caps nr_to_write to the number of pages just written
- * (proportional cap). Together these avoid writeback contention between
- * concurrent writers and prevent I/O bursts that starve readers.
+ * Rate-limited writeback for IOCB_DONTCACHE writes. Uses three guards to
+ * avoid writeback contention between concurrent writers:
+ *
+ * 1. Skip-if-busy: if writeback is already in progress on the mapping
+ *    (PAGECACHE_TAG_WRITEBACK set), skip the flush, unless dirty pages
+ *    are approaching the dirty_ratio threshold, in which case flush anyway
+ *    to help drain before balance_dirty_pages() throttles all writers.
+ *
+ * 2. Atomic flush guard: use test_and_set_bit(AS_DONTCACHE_FLUSHING) so
+ *    that at most one dontcache writer flushes at a time, preventing a
+ *    thundering herd when the writeback tag clears and multiple writers
+ *    try to flush simultaneously.
+ *
+ * 3. Proportional cap: cap nr_to_write to the number of pages just written,
+ *    preventing any single flush from starving concurrent readers.
  *
  * Return: %0 on success, negative error code otherwise.
  */
@@ -456,13 +466,25 @@ int filemap_dontcache_writeback_range(struct address_space *mapping,
 		loff_t start, loff_t end, ssize_t nr_written)
 {
 	long nr;
+	int ret;
+
+	if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK)) {
+		unsigned long thresh, bg_thresh, dirty;
 
-	if (mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
+		global_dirty_limits(&bg_thresh, &thresh);
+		dirty = global_node_page_state(NR_FILE_DIRTY);
+		if (dirty < thresh * 3 / 4)
+			return 0;
+	}
+
+	if (test_and_set_bit(AS_DONTCACHE_FLUSHING, &mapping->flags))
 		return 0;
 
 	nr = (nr_written + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	return filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
+	ret = filemap_writeback(mapping, start, end, WB_SYNC_NONE, &nr,
 			WB_REASON_BACKGROUND);
+	clear_bit(AS_DONTCACHE_FLUSHING, &mapping->flags);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(filemap_dontcache_writeback_range);
 
-- 
2.53.0

From: Jeff Layton
Date: Wed, 01 Apr 2026 15:11:00 -0400
Subject: [PATCH 3/4] testing: add nfsd-io-bench NFS server benchmark suite

Add a benchmark suite for testing NFSD I/O mode performance using fio
with the libnfs backend against an NFS server on localhost. It tests
the buffered, dontcache, and direct I/O modes, switching between them
via the NFSD debugfs controls.

Includes:
- fio job files for sequential/random read/write, multi-writer,
  noisy-neighbor, and latency-sensitive reader workloads
- run-benchmarks.sh: orchestrates test matrix with mode switching
- parse-results.sh: extracts metrics from fio JSON output
- setup-server.sh: configures NFS export for testing

Signed-off-by: Jeff Layton
---
 .../testing/nfsd-io-bench/fio-jobs/lat-reader.fio  |  15 +
 .../testing/nfsd-io-bench/fio-jobs/multi-write.fio |  14 +
 .../nfsd-io-bench/fio-jobs/noisy-writer.fio        |  14 +
 tools/testing/nfsd-io-bench/fio-jobs/rand-read.fio |  15 +
 .../testing/nfsd-io-bench/fio-jobs/rand-write.fio  |  15 +
 tools/testing/nfsd-io-bench/fio-jobs/seq-read.fio  |  14 +
 tools/testing/nfsd-io-bench/fio-jobs/seq-write.fio |  14 +
 .../testing/nfsd-io-bench/scripts/parse-results.sh | 238 +++++++++
 .../nfsd-io-bench/scripts/run-benchmarks.sh        | 543 +++++++++++++++++++++
 .../testing/nfsd-io-bench/scripts/setup-server.sh  |  94 ++++
 10 files changed, 976 insertions(+)

diff --git a/tools/testing/nfsd-io-bench/fio-jobs/lat-reader.fio b/tools/testing/nfsd-io-bench/fio-jobs/lat-reader.fio
new file mode 100644
index 0000000000000000000000000000000000000000..61af37e8b860bc3aa8b64e0a6e68f7eb60ae2740
--- /dev/null
+++ b/tools/testing/nfsd-io-bench/fio-jobs/lat-reader.fio
@@ -0,0 +1,15 @@
+[global]
+ioengine=nfs
+nfs_url=nfs://localhost/export
+direct=0
+bs=4k
+numjobs=16
+runtime=300
+time_based=1
+group_reporting=1
+rw=randread
+log_avg_msec=1000
+write_bw_log=latreader
+write_lat_log=latreader
+
+[lat_reader]
diff --git a/tools/testing/nfsd-io-bench/fio-jobs/multi-write.fio b/tools/testing/nfsd-io-bench/fio-jobs/multi-write.fio
new file mode 100644
index 0000000000000000000000000000000000000000..16b792aecabbdfb4abb0c432593344352ed22ff6
--- /dev/null
+++ b/tools/testing/nfsd-io-bench/fio-jobs/multi-write.fio
@@ -0,0 +1,14 @@
+[global]
+ioengine=nfs
+nfs_url=nfs://localhost/export
+direct=0
+bs=1M
+numjobs=16
+time_based=0
+group_reporting=1
+rw=write
+log_avg_msec=1000
+write_bw_log=multiwrite
+write_lat_log=multiwrite
+
+[writer]
diff --git a/tools/testing/nfsd-io-bench/fio-jobs/noisy-writer.fio b/tools/testing/nfsd-io-bench/fio-jobs/noisy-writer.fio
new file mode 100644
index 0000000000000000000000000000000000000000..615154a7737e84308bcf4891dd27e87aec43fea7
--- /dev/null
+++ b/tools/testing/nfsd-io-bench/fio-jobs/noisy-writer.fio
@@ -0,0 +1,14 @@
+[global]
+ioengine=nfs
+nfs_url=nfs://localhost/export
+direct=0
+bs=1M
+numjobs=16
+time_based=0
+group_reporting=1
+rw=write
+log_avg_msec=1000
+write_bw_log=noisywriter
+write_lat_log=noisywriter
+
+[bulk_writer]
diff --git a/tools/testing/nfsd-io-bench/fio-jobs/rand-read.fio b/tools/testing/nfsd-io-bench/fio-jobs/rand-read.fio
new file mode 100644
index 0000000000000000000000000000000000000000..501bae7416a8ba514e4166469e60c89e48a5fc20
--- /dev/null
+++ b/tools/testing/nfsd-io-bench/fio-jobs/rand-read.fio
@@ -0,0 +1,15 @@
+[global]
+ioengine=nfs
+nfs_url=nfs://localhost/export
+direct=0
+bs=4k
+numjobs=16
+runtime=300
+time_based=1
+group_reporting=1
+rw=randread
+log_avg_msec=1000
+write_bw_log=randread
+write_lat_log=randread
+
+[randread]
diff --git a/tools/testing/nfsd-io-bench/fio-jobs/rand-write.fio b/tools/testing/nfsd-io-bench/fio-jobs/rand-write.fio
new file mode 100644
index 0000000000000000000000000000000000000000..d891d04197aead906895031a9ab0ecdc86a85d58
--- /dev/null
+++ b/tools/testing/nfsd-io-bench/fio-jobs/rand-write.fio
@@ -0,0 +1,15 @@
+[global]
+ioengine=nfs
+nfs_url=nfs://localhost/export
+direct=0
+bs=64k
+numjobs=16
+runtime=300
+time_based=1
+group_reporting=1
+rw=randwrite
+log_avg_msec=1000
+write_bw_log=randwrite
+write_lat_log=randwrite
+
+[randwrite]
diff --git a/tools/testing/nfsd-io-bench/fio-jobs/seq-read.fio b/tools/testing/nfsd-io-bench/fio-jobs/seq-read.fio
new file mode 100644
index 0000000000000000000000000000000000000000..6e24ab355026a243fac47ace8c6da7967550cf9a
--- /dev/null
+++ b/tools/testing/nfsd-io-bench/fio-jobs/seq-read.fio
@@ -0,0 +1,14 @@
+[global]
+ioengine=nfs
+nfs_url=nfs://localhost/export
+direct=0
+bs=1M
+numjobs=16
+time_based=0
+group_reporting=1
+rw=read
+log_avg_msec=1000
+write_bw_log=seqread
+write_lat_log=seqread
+
+[seqread]
diff --git a/tools/testing/nfsd-io-bench/fio-jobs/seq-write.fio b/tools/testing/nfsd-io-bench/fio-jobs/seq-write.fio
new file mode 100644
index 0000000000000000000000000000000000000000..260858e345f5aaea239a7904089c5111aa350ccb
--- /dev/null
+++ b/tools/testing/nfsd-io-bench/fio-jobs/seq-write.fio
@@ -0,0 +1,14 @@
+[global]
+ioengine=nfs
+nfs_url=nfs://localhost/export
+direct=0
+bs=1M
+numjobs=16
+time_based=0
+group_reporting=1
+rw=write
+log_avg_msec=1000
+write_bw_log=seqwrite
+write_lat_log=seqwrite
+
+[seqwrite]
diff --git a/tools/testing/nfsd-io-bench/scripts/parse-results.sh b/tools/testing/nfsd-io-bench/scripts/parse-results.sh
new file mode 100755
index 0000000000000000000000000000000000000000..0427d411db04903a5d9506751695d9452b011e6a
--- /dev/null
+++ b/tools/testing/nfsd-io-bench/scripts/parse-results.sh
@@ -0,0 +1,238 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Parse fio JSON output and generate comparison tables.
+#
+# Usage: ./parse-results.sh 
+
+set -euo pipefail
+
+if [ $# -lt 1 ]; then
+	echo "Usage: $0 "
+	exit 1
+fi
+
+RESULTS_DIR="$1"
+
+if ! command -v jq &>/dev/null; then
+	echo "ERROR: jq is required"
+	exit 1
+fi
+
+# Extract metrics from a single fio JSON result
+extract_metrics() {
+	local json_file=$1
+	local rw_type=$2 # read or write
+
+	if [ ! -f "$json_file" ]; then
+		echo "N/A N/A N/A N/A N/A N/A"
+		return
+	fi
+
+	jq -r --arg rw "$rw_type" '
+		.jobs[0][$rw] as $d |
+		[
+			(($d.bw // 0) / 1024 | . * 10 | round / 10), # MB/s
+			($d.iops // 0), # IOPS
+			((($d.clat_ns.mean // 0) / 1000) | . * 10 | round / 10), # avg lat us
+			(($d.clat_ns.percentile["50.000000"] // 0) / 1000), # p50 us
+			(($d.clat_ns.percentile["99.000000"] // 0) / 1000), # p99 us
+			(($d.clat_ns.percentile["99.900000"] // 0) / 1000) # p99.9 us
+		] | @tsv
+	' "$json_file" 2>/dev/null || echo "N/A N/A N/A N/A N/A N/A"
+}
+
+# Extract server CPU from vmstat log (average sys%)
+extract_cpu() {
+	local vmstat_log=$1
+	if [ ! -f "$vmstat_log" ]; then
+		echo "N/A"
+		return
+	fi
+	# "sy" is field 14 of default vmstat output; NR>2 skips the two header lines
+	awk 'NR>2 {sum+=$14; n++} END {if(n>0) printf "%.1f", sum/n; else print "N/A"}' \
+		"$vmstat_log" 2>/dev/null || echo "N/A"
+}
+
+# Extract peak dirty pages from meminfo log
+extract_peak_dirty() {
+	local meminfo_log=$1
+	if [ ! -f "$meminfo_log" ]; then
+		echo "N/A"
+		return
+	fi
+	grep "^Dirty:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || echo "N/A"
+}
+
+# Extract peak cached from meminfo log
+extract_peak_cached() {
+	local meminfo_log=$1
+	if [ ! -f "$meminfo_log" ]; then
+		echo "N/A"
+		return
+	fi
+	grep "^Cached:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || echo "N/A"
+}
+
+print_separator() {
+	printf '%*s\n' 120 '' | tr ' ' '-'
+}
+
+########################################################################
+# Deliverable 1: Single-client results
+########################################################################
+echo ""
+echo "================================================================"
+echo " Deliverable 1: Single-Client fio Benchmarks"
+echo "================================================================"
+echo ""
+
+for workload in seq-write rand-write seq-read rand-read; do
+	case $workload in
+		seq-write|rand-write) rw_type="write" ;;
+		seq-read|rand-read) rw_type="read" ;;
+	esac
+
+	echo "--- $workload ---"
+	printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \
+		"Mode" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" "Sys CPU%" "PeakDirty(kB)" "PeakCache(kB)"
+	print_separator
+
+	for mode in buffered dontcache direct; do
+		dir="${RESULTS_DIR}/${workload}/${mode}"
+		json_file=$(find "$dir" -name '*.json' -not -name 'client*' 2>/dev/null | head -1 || true)
+		if [ -z "$json_file" ]; then
+			printf "%-16s %10s\n" "$mode" "(no data)"
+			continue
+		fi
+
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "$json_file" "$rw_type")"
+		cpu=$(extract_cpu "${dir}/vmstat.log")
+		dirty=$(extract_peak_dirty "${dir}/meminfo.log")
+		cached=$(extract_peak_cached "${dir}/meminfo.log")
+
+		printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \
+			"$mode" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" \
+			"$cpu" "${dirty:-N/A}" "${cached:-N/A}"
+	done
+	echo ""
+done
+
+########################################################################
+# Deliverable 2: Multi-client results
+########################################################################
+echo "================================================================"
+echo " Deliverable 2: Noisy-Neighbor Benchmarks"
+echo "================================================================"
+echo ""
+
+# Scenario A: Multiple writers
+echo "--- Scenario A: Multiple Writers ---"
+for mode in buffered dontcache direct; do
+	dir="${RESULTS_DIR}/multi-write/${mode}"
+	if [ ! -d "$dir" ]; then
+		continue
+	fi
+
+	echo " Mode: $mode"
+	printf " %-10s %10s %10s %10s %10s %10s %10s\n" \
+		"Client" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)"
+
+	total_bw=0
+	count=0
+	for json_file in "${dir}"/client*.json; do
+		[ -f "$json_file" ] || continue
+		client=$(basename "$json_file" .json)
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "$json_file" "write")"
+		printf " %-10s %10s %10s %10s %10s %10s %10s\n" \
+			"$client" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999"
+		total_bw=$(echo "$total_bw + ${mbps:-0}" | bc 2>/dev/null || echo "$total_bw")
+		count=$(( count + 1 ))
+	done
+
+	cpu=$(extract_cpu "${dir}/vmstat.log")
+	dirty=$(extract_peak_dirty "${dir}/meminfo.log")
+	printf " Aggregate BW: %s MB/s | Sys CPU: %s%% | Peak Dirty: %s kB\n" \
+		"$total_bw" "$cpu" "${dirty:-N/A}"
+	echo ""
+done
+
+# Scenario C: Noisy neighbor
+echo "--- Scenario C: Noisy Writer + Latency-Sensitive Readers ---"
+for mode in buffered dontcache direct; do
+	dir="${RESULTS_DIR}/noisy-neighbor/${mode}"
+	if [ ! -d "$dir" ]; then
+		continue
+	fi
+
+	echo " Mode: $mode"
+	printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+		"Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)"
+
+	# Writer
+	if [ -f "${dir}/noisy_writer.json" ]; then
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "${dir}/noisy_writer.json" "write")"
+		printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+			"Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999"
+	fi
+
+	# Readers
+	for json_file in "${dir}"/reader*.json; do
+		[ -f "$json_file" ] || continue
+		reader=$(basename "$json_file" .json)
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "$json_file" "read")"
+		printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+			"$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999"
+	done
+
+	cpu=$(extract_cpu "${dir}/vmstat.log")
+	dirty=$(extract_peak_dirty "${dir}/meminfo.log")
+	printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}"
+	echo ""
+done
+
+# Scenario D: Mixed-mode noisy neighbor
+echo "--- Scenario D: Mixed-Mode Noisy Writer + Readers ---"
+for dir in "${RESULTS_DIR}"/noisy-neighbor-mixed/*/; do
+	[ -d "$dir" ] || continue
+	label=$(basename "$dir")
+
+	echo " Mode: $label"
+	printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+		"Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)"
+
+	# Writer
+	if [ -f "${dir}/noisy_writer.json" ]; then
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "${dir}/noisy_writer.json" "write")"
+		printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+			"Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999"
+	fi
+
+	# Readers
+	for json_file in "${dir}"/reader*.json; do
+		[ -f "$json_file" ] || continue
+		reader=$(basename "$json_file" .json)
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "$json_file" "read")"
+		printf " %-14s %10s %10s %10s %10s %10s %10s\n" \
+			"$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999"
+	done
+
+	cpu=$(extract_cpu "${dir}/vmstat.log")
+	dirty=$(extract_peak_dirty "${dir}/meminfo.log")
+	printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}"
+	echo ""
+done
+
+echo "================================================================"
+echo " System Info"
+echo "================================================================"
+if [ -f "${RESULTS_DIR}/sysinfo.txt" ]; then
+	head -6 "${RESULTS_DIR}/sysinfo.txt"
+fi
+echo ""
diff --git a/tools/testing/nfsd-io-bench/scripts/run-benchmarks.sh b/tools/testing/nfsd-io-bench/scripts/run-benchmarks.sh
new file mode 100755
index 0000000000000000000000000000000000000000..4b15900cc20f762955e121ccad985f8f47cb1007
--- /dev/null
+++ b/tools/testing/nfsd-io-bench/scripts/run-benchmarks.sh
@@ -0,0 +1,543 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# NFS server I/O mode benchmark suite
+#
+# Runs fio with the NFS ioengine against an NFS server on localhost,
+# testing buffered, dontcache, and direct I/O modes.
+#
+# Usage: ./run-benchmarks.sh [OPTIONS]
+#
+# Options:
+#   -e EXPORT_PATH   Server export path (default: /export)
+#   -s SIZE          fio file size, should be >= 2x RAM (default: auto-detect)
+#   -r RESULTS_DIR   Where to store results (default: ./results)
+#   -n NFS_VER       NFS version: 3 or 4 (default: 3)
+#   -j FIO_JOBS_DIR  Path to fio job files (default: ../fio-jobs)
+#   -d               Dry run: print commands without executing
+#   -h               Show this help
+
+set -euo pipefail
+
+# Defaults
+EXPORT_PATH="/export"
+SIZE=""
+RESULTS_DIR="./results"
+NFS_VER=3
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+FIO_JOBS_DIR="${SCRIPT_DIR}/../fio-jobs"
+DRY_RUN=0
+
+DEBUGFS_BASE="/sys/kernel/debug/nfsd"
+IO_CACHE_READ="${DEBUGFS_BASE}/io_cache_read"
+IO_CACHE_WRITE="${DEBUGFS_BASE}/io_cache_write"
+DISABLE_SPLICE="${DEBUGFS_BASE}/disable-splice-read"
+
+usage() {
+	echo "Usage: $0 [OPTIONS]"
+	echo " -e EXPORT_PATH Server export path (default: /export)"
+	echo " -s SIZE fio file size (default: 2x RAM)"
+	echo " -r RESULTS_DIR Results directory (default: ./results)"
+	echo " -n NFS_VER NFS version: 3 or 4 (default: 3)"
+	echo " -j FIO_JOBS_DIR Path to fio job files"
+	echo " -d Dry run"
+	echo " -h Help"
+	exit 1
+}
+
+while getopts "e:s:r:n:j:dh" opt; do
+	case $opt in
+	e) EXPORT_PATH="$OPTARG" ;;
+	s) SIZE="$OPTARG" ;;
+	r) RESULTS_DIR="$OPTARG" ;;
+	n) NFS_VER="$OPTARG" ;;
+	j) FIO_JOBS_DIR="$OPTARG" ;;
+	d) DRY_RUN=1 ;;
+	h) usage ;;
+	*) usage ;;
+	esac
+done
+
+# Auto-detect size: 2x total RAM
+if [ -z "$SIZE" ]; then
+	MEM_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
+	MEM_GB=$(( MEM_KB / 1024 / 1024 ))
+	SIZE="$(( MEM_GB * 2 ))G"
+	echo "Auto-detected RAM: ${MEM_GB}G, using file size: ${SIZE}"
+fi
+
+
+log() {
+	echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
+}
+
+run_cmd() {
+	if [ "$DRY_RUN" -eq 1 ]; then
+		echo " [DRY RUN] $*"
+	else
+		"$@"
+	fi
+}
+
+# Preflight checks
+preflight() {
+	log "=== Preflight checks ==="
+
+	if ! command -v fio &>/dev/null; then
+		echo "ERROR: fio not found in PATH"
+		exit 1
+	fi
+
+	# Check fio has nfs ioengine
+	if ! fio --enghelp=nfs &>/dev/null; then
+		echo "ERROR: fio does not have the nfs ioengine (needs libnfs)"
+		exit 1
+	fi
+
+	# Check debugfs knobs exist
+	for knob in "$IO_CACHE_READ" "$IO_CACHE_WRITE" "$DISABLE_SPLICE"; do
+		if [ ! -f "$knob" ]; then
+			echo "ERROR: $knob not found. Is the kernel new enough?"
+			exit 1
+		fi
+	done
+
+	# Check NFS server is exporting
+	if ! showmount -e localhost 2>/dev/null | grep -q "$EXPORT_PATH"; then
+		echo "WARNING: $EXPORT_PATH not in showmount output, proceeding anyway"
+	fi
+
+	# Print system info
+	echo "Kernel: $(uname -r)"
+	echo "RAM: $(awk '/MemTotal/ {printf "%.1f GB", $2/1024/1024}' /proc/meminfo)"
+	echo "Export: $EXPORT_PATH"
+	echo "NFS ver: $NFS_VER"
+	echo "File size: $SIZE"
+	echo "Results: $RESULTS_DIR"
+	echo ""
+}
+
+# Set server I/O mode via debugfs
+set_io_mode() {
+	local cache_write=$1
+	local cache_read=$2
+	local splice_off=$3
+
+	log "Setting io_cache_write=$cache_write io_cache_read=$cache_read disable-splice-read=$splice_off"
+	run_cmd bash -c "echo $cache_write > $IO_CACHE_WRITE"
+	run_cmd bash -c "echo $cache_read > $IO_CACHE_READ"
+	run_cmd bash -c "echo $splice_off > $DISABLE_SPLICE"
+}
+
+# Drop page cache on server
+drop_caches() {
+	log "Dropping page cache"
+	run_cmd bash -c "sync && echo 3 > /proc/sys/vm/drop_caches"
+	sleep 1
+}
+
+# Start background server monitoring
+start_monitors() {
+	local outdir=$1
+
+	log "Starting server monitors in $outdir"
+	run_cmd vmstat 1 > "${outdir}/vmstat.log" 2>&1 &
+	VMSTAT_PID=$!
+
+	run_cmd iostat -x 1 > "${outdir}/iostat.log" 2>&1 &
+	IOSTAT_PID=$!
+
+	# Sample /proc/meminfo every second
+	(while true; do
+		echo "=== $(date '+%s') ==="
+		cat /proc/meminfo
+		sleep 1
+	done) > "${outdir}/meminfo.log" 2>&1 &
+	MEMINFO_PID=$!
+}
+
+# Stop background monitors
+stop_monitors() {
+	log "Stopping monitors"
+	kill "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true
+	wait "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true
+}
+
+# Run a single fio benchmark.
+# nfs_url is set in the job files; we pass --filename and --size on
+# the command line to vary the target file and data volume per run.
+# Pass "keep" as 5th arg to preserve the test file after the run.
+run_fio() {
+	local job_file=$1
+	local outdir=$2
+	local filename=$3
+	local fio_size=${4:-$SIZE}
+	local keep=${5:-}
+
+	local job_name
+	job_name=$(basename "$job_file" .fio)
+
+	log "Running fio job: $job_name -> $outdir (file=$filename size=$fio_size)"
+	mkdir -p "$outdir"
+
+	drop_caches
+	start_monitors "$outdir"
+
+	run_cmd fio "$job_file" \
+		--output-format=json \
+		--output="${outdir}/${job_name}.json" \
+		--filename="$filename" \
+		--size="$fio_size"
+
+	stop_monitors
+
+	log "Finished: $job_name"
+
+	# Clean up test file to free disk space unless told to keep it
+	if [ "$keep" != "keep" ]; then
+		cleanup_test_files "$filename"
+	fi
+}
+
+# Remove test files from the export to free disk space
+cleanup_test_files() {
+	local filename
+	for filename in "$@"; do
+		local filepath="${EXPORT_PATH}/${filename}"
+		log "Cleaning up: $filepath"
+		run_cmd rm -f "$filepath"
+	done
+}
+
+# Ensure parent directories exist under the export for a given filename
+ensure_export_dirs() {
+	local filename
+	for filename in "$@"; do
+		local dirpath="${EXPORT_PATH}/$(dirname "$filename")"
+		if [ "$dirpath" != "${EXPORT_PATH}/." ] && [ ! -d "$dirpath" ]; then
+			log "Creating directory: $dirpath"
+			run_cmd mkdir -p "$dirpath"
+		fi
+	done
+}
+
+# Mode name from numeric value
+mode_name() {
+	case $1 in
+	0) echo "buffered" ;;
+	1) echo "dontcache" ;;
+	2) echo "direct" ;;
+	esac
+}
+
+########################################################################
+# Deliverable 1: Single-client fio benchmarks
+########################################################################
+run_deliverable1() {
+	log "=========================================="
+	log "Deliverable 1: Single-client fio benchmarks"
+	log "=========================================="
+
+	# Write test matrix:
+	#   mode 0 (buffered):  splice on (default)
+	#   mode 1 (dontcache): splice off (required)
+	#   mode 2 (direct):    splice off (required)
+
+	# Sequential write
+	for wmode in 0 1 2; do
+		local mname
+		mname=$(mode_name $wmode)
+		local splice_off=0
+		[ "$wmode" -ne 0 ] && splice_off=1
+
+		drop_caches
+		set_io_mode "$wmode" 0 "$splice_off"
+		run_fio "${FIO_JOBS_DIR}/seq-write.fio" \
+			"${RESULTS_DIR}/seq-write/${mname}" \
+			"seq-write_testfile"
+	done
+
+	# Random write
+	for wmode in 0 1 2; do
+		local mname
+		mname=$(mode_name $wmode)
+		local splice_off=0
+		[ "$wmode" -ne 0 ] && splice_off=1
+
+		drop_caches
+		set_io_mode "$wmode" 0 "$splice_off"
+		run_fio "${FIO_JOBS_DIR}/rand-write.fio" \
+			"${RESULTS_DIR}/rand-write/${mname}" \
+			"rand-write_testfile"
+	done
+
+	# Sequential read: vary the read mode; writes stay buffered
+	# Pre-create the file for reading
+	log "Pre-creating sequential read test file"
+	set_io_mode 0 0 0
+	run_fio "${FIO_JOBS_DIR}/seq-write.fio" \
+		"${RESULTS_DIR}/seq-read/precreate" \
+		"seq-read_testfile" "$SIZE" "keep"
+
+	for rmode in 0 1 2; do
+		local mname
+		mname=$(mode_name $rmode)
+		local splice_off=0
+		[ "$rmode" -ne 0 ] && splice_off=1
+		# Keep file for subsequent modes; clean up after last
+		local keep="keep"
+		[ "$rmode" -eq 2 ] && keep=""
+
+		drop_caches
+		set_io_mode 0 "$rmode" "$splice_off"
+		run_fio "${FIO_JOBS_DIR}/seq-read.fio" \
+			"${RESULTS_DIR}/seq-read/${mname}" \
+			"seq-read_testfile" "$SIZE" "$keep"
+	done
+
+	# Random read: vary the read mode; writes stay buffered
+	# Pre-create the file for reading
+	log "Pre-creating random read test file"
+	set_io_mode 0 0 0
+	run_fio "${FIO_JOBS_DIR}/seq-write.fio" \
+		"${RESULTS_DIR}/rand-read/precreate" \
+		"rand-read_testfile" "$SIZE" "keep"
+
+	for rmode in 0 1 2; do
+		local mname
+		mname=$(mode_name $rmode)
+		local splice_off=0
+		[ "$rmode" -ne 0 ] && splice_off=1
+		# Keep file for subsequent modes; clean up after last
+		local keep="keep"
+		[ "$rmode" -eq 2 ] && keep=""
+
+		drop_caches
+		set_io_mode 0 "$rmode" "$splice_off"
+		run_fio "${FIO_JOBS_DIR}/rand-read.fio" \
+			"${RESULTS_DIR}/rand-read/${mname}" \
+			"rand-read_testfile" "$SIZE" "$keep"
+	done
+}
+
+########################################################################
+# Deliverable 2: Multi-client (simulated with multiple fio jobs)
+########################################################################
+run_deliverable2() {
+	log "=========================================="
+	log "Deliverable 2: Noisy-neighbor benchmarks"
+	log "=========================================="
+
+	local num_clients=4
+	local client_size
+	local mem_kb
+	mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
+	# Each client writes RAM/num_clients, so the aggregate roughly equals total RAM
+	client_size="$(( mem_kb / 1024 / num_clients ))M"
+
+	# Scenario A: Multiple writers
+	for mode in 0 1 2; do
+		local mname
+		mname=$(mode_name $mode)
+		local splice_off=0
+		[ "$mode" -ne 0 ] && splice_off=1
+		local outdir="${RESULTS_DIR}/multi-write/${mname}"
+		mkdir -p "$outdir"
+
+		set_io_mode "$mode" "$mode" "$splice_off"
+		drop_caches
+
+		# Ensure client directories exist on export
+		for i in $(seq 1 $num_clients); do
+			ensure_export_dirs "client${i}/testfile"
+		done
+
+		start_monitors "$outdir"
+
+		# Launch N parallel fio writers
+		local pids=()
+		for i in $(seq 1 $num_clients); do
+			run_cmd fio "${FIO_JOBS_DIR}/multi-write.fio" \
+				--output-format=json \
+				--output="${outdir}/client${i}.json" \
+				--filename="client${i}/testfile" \
+				--size="$client_size" &
+			pids+=($!)
+		done
+
+		# Wait for all
+		local rc=0
+		for pid in "${pids[@]}"; do
+			wait "$pid" || rc=$?
+		done
+
+		stop_monitors
+		[ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero"
+
+		# Clean up test files
+		for i in $(seq 1 $num_clients); do
+			cleanup_test_files "client${i}/testfile"
+		done
+	done
+
+	# Scenario C: Noisy writer + latency-sensitive readers
+	for mode in 0 1 2; do
+		local mname
+		mname=$(mode_name $mode)
+		local splice_off=0
+		[ "$mode" -ne 0 ] && splice_off=1
+		local outdir="${RESULTS_DIR}/noisy-neighbor/${mname}"
+		mkdir -p "$outdir"
+
+		set_io_mode "$mode" "$mode" "$splice_off"
+		drop_caches
+
+		# Pre-create read files for latency readers
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			ensure_export_dirs "reader${i}/readfile"
+			log "Pre-creating read file for reader $i"
+			run_fio "${FIO_JOBS_DIR}/multi-write.fio" \
+				"${outdir}/precreate_reader${i}" \
+				"reader${i}/readfile" \
+				"512M" "keep"
+		done
+		drop_caches
+		ensure_export_dirs "bulk/testfile"
+		start_monitors "$outdir"
+
+		# Noisy writer
+		run_cmd fio "${FIO_JOBS_DIR}/noisy-writer.fio" \
+			--output-format=json \
+			--output="${outdir}/noisy_writer.json" \
+			--filename="bulk/testfile" \
+			--size="$SIZE" &
+		local writer_pid=$!
+
+		# Latency-sensitive readers
+		local reader_pids=()
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			run_cmd fio "${FIO_JOBS_DIR}/lat-reader.fio" \
+				--output-format=json \
+				--output="${outdir}/reader${i}.json" \
+				--filename="reader${i}/readfile" \
+				--size="512M" &
+			reader_pids+=($!)
+		done
+
+		local rc=0
+		wait "$writer_pid" || rc=$?
+		for pid in "${reader_pids[@]}"; do
+			wait "$pid" || rc=$?
+		done
+
+		stop_monitors
+		[ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero"
+
+		# Clean up test files
+		cleanup_test_files "bulk/testfile"
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			cleanup_test_files "reader${i}/readfile"
+		done
+	done
+	# Scenario D: Mixed-mode noisy neighbor
+	# Test write/read mode combinations where the writer uses a mode
+	# that avoids filling the page cache (dontcache) and the readers
+	# use buffered reads so they can benefit from a warm cache.
+	local mixed_modes=(
+		# write_mode read_mode label
+		"1 0 dontcache-w_buffered-r"
+	)
+
+	for combo in "${mixed_modes[@]}"; do
+		local wmode rmode label
+		read -r wmode rmode label <<< "$combo"
+		local splice_off=0
+		[ "$wmode" -ne 0 ] && splice_off=1
+		local outdir="${RESULTS_DIR}/noisy-neighbor-mixed/${label}"
+		mkdir -p "$outdir"
+
+		set_io_mode "$wmode" "$rmode" "$splice_off"
+		drop_caches
+
+		# Pre-create read files for latency readers
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			ensure_export_dirs "reader${i}/readfile"
+			log "Pre-creating read file for reader $i"
+			run_fio "${FIO_JOBS_DIR}/multi-write.fio" \
+				"${outdir}/precreate_reader${i}" \
+				"reader${i}/readfile" \
+				"512M" "keep"
+		done
+		drop_caches
+		ensure_export_dirs "bulk/testfile"
+		start_monitors "$outdir"
+
+		# Noisy writer
+		run_cmd fio "${FIO_JOBS_DIR}/noisy-writer.fio" \
+			--output-format=json \
+			--output="${outdir}/noisy_writer.json" \
+			--filename="bulk/testfile" \
+			--size="$SIZE" &
+		local writer_pid=$!
+
+		# Latency-sensitive readers
+		local reader_pids=()
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			run_cmd fio "${FIO_JOBS_DIR}/lat-reader.fio" \
+				--output-format=json \
+				--output="${outdir}/reader${i}.json" \
+				--filename="reader${i}/readfile" \
+				--size="512M" &
+			reader_pids+=($!)
+		done
+
+		local rc=0
+		wait "$writer_pid" || rc=$?
+		for pid in "${reader_pids[@]}"; do
+			wait "$pid" || rc=$?
+		done
+
+		stop_monitors
+		[ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero"
+
+		# Clean up test files
+		cleanup_test_files "bulk/testfile"
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			cleanup_test_files "reader${i}/readfile"
+		done
+	done
+}
+
+########################################################################
+# Main
+########################################################################
+preflight
+
+TIMESTAMP=$(date '+%Y%m%d-%H%M%S')
+RESULTS_DIR="${RESULTS_DIR}/${TIMESTAMP}"
+mkdir -p "$RESULTS_DIR"
+
+# Save system info
+{
+	echo "Timestamp: $TIMESTAMP"
+	echo "Kernel: $(uname -r)"
+	echo "Hostname: $(hostname)"
+	echo "NFS version: $NFS_VER"
+	echo "File size: $SIZE"
+	echo "Export: $EXPORT_PATH"
+	cat /proc/meminfo
+} > "${RESULTS_DIR}/sysinfo.txt"
+
+log "Results will be saved to: $RESULTS_DIR"
+
+run_deliverable1
+run_deliverable2
+
+# Reset to defaults
+set_io_mode 0 0 0
+
+log "=========================================="
+log "All benchmarks complete."
+log "Results in: $RESULTS_DIR"
+log "Run: scripts/parse-results.sh $RESULTS_DIR"
+log "=========================================="
diff --git a/tools/testing/nfsd-io-bench/scripts/setup-server.sh b/tools/testing/nfsd-io-bench/scripts/setup-server.sh
new file mode 100755
index 0000000000000000000000000000000000000000..0efdd74a705e35b040dd8a64b88e91bac4fa7510
--- /dev/null
+++ b/tools/testing/nfsd-io-bench/scripts/setup-server.sh
@@ -0,0 +1,94 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# One-time setup script for the NFS test server.
+# Run this once before running benchmarks.
+#
+# Usage: sudo ./setup-server.sh [EXPORT_PATH]
+
+set -euo pipefail
+
+EXPORT_PATH="${1:-/export}"
+FSTYPE="ext4"
+
+log() {
+	echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
+}
+
+if [ "$(id -u)" -ne 0 ]; then
+	echo "ERROR: must run as root"
+	exit 1
+fi
+
+# Check for required tools
+for cmd in fio exportfs showmount jq; do
+	if ! command -v "$cmd" &>/dev/null; then
+		echo "WARNING: $cmd not found, attempting install"
+		dnf install -y "$cmd" 2>/dev/null || \
+			apt-get install -y "$cmd" 2>/dev/null || \
+			echo "ERROR: cannot install $cmd, please install manually"
+	fi
+done
+
+# Check fio has nfs ioengine
+if ! fio --enghelp=nfs &>/dev/null; then
+	echo "ERROR: fio nfs ioengine not available."
+	echo "You may need to install fio with libnfs support."
+	echo "Try: dnf install fio libnfs-devel (or build fio from source with --enable-nfs)"
+	exit 1
+fi
+
+# Create export directory if needed
+if [ ! -d "$EXPORT_PATH" ]; then
+	log "Creating export directory: $EXPORT_PATH"
+	mkdir -p "$EXPORT_PATH"
+fi
+
+# Create subdirectories for multi-client tests
+for i in 1 2 3 4; do
+	mkdir -p "${EXPORT_PATH}/client${i}"
+	mkdir -p "${EXPORT_PATH}/reader${i}"
+done
+mkdir -p "${EXPORT_PATH}/bulk"
+
+# Check if already exported
+if ! exportfs -s 2>/dev/null | grep -q "$EXPORT_PATH"; then
+	log "Adding NFS export for $EXPORT_PATH"
+	if ! grep -q "$EXPORT_PATH" /etc/exports 2>/dev/null; then
+		echo "${EXPORT_PATH} 127.0.0.1/32(rw,sync,no_root_squash,no_subtree_check)" >> /etc/exports
+	fi
+	exportfs -ra
+fi
+
+# Ensure NFS server is running
+if ! systemctl is-active --quiet nfs-server 2>/dev/null; then
+	log "Starting NFS server"
+	systemctl start nfs-server
+fi
+
+# Verify export
+log "Current exports:"
+showmount -e localhost
+
+# Check debugfs knobs
+log "Checking debugfs knobs:"
+DEBUGFS_BASE="/sys/kernel/debug/nfsd"
+for knob in io_cache_read io_cache_write disable-splice-read; do
+	if [ -f "${DEBUGFS_BASE}/${knob}" ]; then
+		echo " ${knob} = $(cat "${DEBUGFS_BASE}/${knob}")"
+	else
+		echo " ${knob}: NOT FOUND (kernel may be too old)"
+	fi
+done
+
+# Print system summary
+echo ""
+log "=== System Summary ==="
+echo "Kernel: $(uname -r)"
+echo "RAM: $(awk '/MemTotal/ {printf "%.1f GB", $2/1024/1024}' /proc/meminfo)"
+echo "Export: $EXPORT_PATH"
+echo "Filesystem: $(df -T "$EXPORT_PATH" | awk 'NR==2 {print $2}')"
+echo "Disk: $(df -h "$EXPORT_PATH" | awk 'NR==2 {print $2, "total,", $4, "free"}')"
+echo ""
+log "Setup complete. Run benchmarks with:"
+echo " sudo ./scripts/run-benchmarks.sh -e $EXPORT_PATH"
-- 
2.53.0

From nobody Sun Jun 14 08:15:44 2026
From: Jeff Layton
Date: Wed, 01 Apr 2026 15:11:01 -0400
Subject: [PATCH 4/4] testing: add dontcache-bench local filesystem benchmark suite
Message-Id: <20260401-dontcache-v1-4-1f5746fab47a@kernel.org>
References: <20260401-dontcache-v1-0-1f5746fab47a@kernel.org>
In-Reply-To: <20260401-dontcache-v1-0-1f5746fab47a@kernel.org>
To: Alexander Viro , Christian Brauner , Jan Kara , "Matthew Wilcox (Oracle)" , Andrew Morton , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Mike Snitzer , Chuck Lever
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton

Add a benchmark suite for testing IOCB_DONTCACHE on local filesystems
via fio's io_uring engine with the RWF_DONTCACHE flag.

The suite mirrors the nfsd-io-bench test matrix but uses io_uring with
the "uncached" fio option instead of NFSD debugfs mode switching:

  - uncached=0: standard buffered I/O
  - uncached=1: RWF_DONTCACHE
  - Mode 2 uses O_DIRECT via fio's --direct=1

Includes fio job files, run-benchmarks.sh, and parse-results.sh.

Signed-off-by: Jeff Layton
---
 .../dontcache-bench/fio-jobs/lat-reader.fio         |  12 +
 .../dontcache-bench/fio-jobs/multi-write.fio        |   9 +
 .../dontcache-bench/fio-jobs/noisy-writer.fio       |  12 +
 tools/testing/dontcache-bench/fio-jobs/rand-read.fio |  13 +
 .../dontcache-bench/fio-jobs/rand-write.fio         |  13 +
 tools/testing/dontcache-bench/fio-jobs/seq-read.fio  |  13 +
 tools/testing/dontcache-bench/fio-jobs/seq-write.fio |  13 +
 .../dontcache-bench/scripts/parse-results.sh        | 238 ++++++++++
 .../dontcache-bench/scripts/run-benchmarks.sh       | 518 +++++++++++++++++++++
 9 files changed, 841 insertions(+)

diff --git a/tools/testing/dontcache-bench/fio-jobs/lat-reader.fio b/tools/testing/dontcache-bench/fio-jobs/lat-reader.fio
new file mode 100644
index 0000000000000000000000000000000000000000..e221e7aedec9d20953898d19dc44beb0588a2d6e
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/lat-reader.fio
@@ -0,0 +1,12 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=4k
+numjobs=1
+time_based=0
+rw=read
+log_avg_msec=1000
+write_bw_log=latreader
+write_lat_log=latreader
+
+[latreader]
diff --git a/tools/testing/dontcache-bench/fio-jobs/multi-write.fio b/tools/testing/dontcache-bench/fio-jobs/multi-write.fio
new file mode 100644
index 0000000000000000000000000000000000000000..8fc0770f5860667249bef3553b9d9624eb0e2213
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/multi-write.fio
@@ -0,0 +1,9 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=1M
+numjobs=1
+time_based=0
+rw=write
+
+[multiwrite]
diff --git a/tools/testing/dontcache-bench/fio-jobs/noisy-writer.fio b/tools/testing/dontcache-bench/fio-jobs/noisy-writer.fio
new file mode 100644
index 0000000000000000000000000000000000000000..4524eebd4642f292e0a6093319fc573b79820ff8
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/noisy-writer.fio
@@ -0,0 +1,12 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=1M
+numjobs=1
+time_based=0
+rw=write
+log_avg_msec=1000
+write_bw_log=noisywriter
+write_lat_log=noisywriter
+
+[noisywriter]
diff --git a/tools/testing/dontcache-bench/fio-jobs/rand-read.fio b/tools/testing/dontcache-bench/fio-jobs/rand-read.fio
new file mode 100644
index 0000000000000000000000000000000000000000..e281fa82b86ad12ca4b2dc4fd082d62415dd967a
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/rand-read.fio
@@ -0,0 +1,13 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=4k
+numjobs=1
+iodepth=16
+time_based=0
+rw=randread
+log_avg_msec=1000
+write_bw_log=randread
+write_lat_log=randread
+
+[randread]
diff --git a/tools/testing/dontcache-bench/fio-jobs/rand-write.fio b/tools/testing/dontcache-bench/fio-jobs/rand-write.fio
new file mode 100644
index 0000000000000000000000000000000000000000..cf53bc6f14b9e131793cdcdd4c431ec4e0b79dba
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/rand-write.fio
@@ -0,0 +1,13 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=4k
+numjobs=1
+iodepth=16
+time_based=0
+rw=randwrite
+log_avg_msec=1000
+write_bw_log=randwrite
+write_lat_log=randwrite
+
+[randwrite]
diff --git a/tools/testing/dontcache-bench/fio-jobs/seq-read.fio b/tools/testing/dontcache-bench/fio-jobs/seq-read.fio
new file mode 100644
index 0000000000000000000000000000000000000000..ef87921465a7d8221dda0c6d01c0d4be14806703
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/seq-read.fio
@@ -0,0 +1,13 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=1M
+numjobs=1
+iodepth=16
+time_based=0
+rw=read
+log_avg_msec=1000
+write_bw_log=seqread
+write_lat_log=seqread
+
+[seqread]
diff --git a/tools/testing/dontcache-bench/fio-jobs/seq-write.fio b/tools/testing/dontcache-bench/fio-jobs/seq-write.fio
new file mode 100644
index 0000000000000000000000000000000000000000..da3082f9b391e1112eb25756136e5b7f27d6b5e2
--- /dev/null
+++ b/tools/testing/dontcache-bench/fio-jobs/seq-write.fio
@@ -0,0 +1,13 @@
+[global]
+ioengine=io_uring
+direct=0
+bs=1M
+numjobs=1
+iodepth=16
+time_based=0
+rw=write
+log_avg_msec=1000
+write_bw_log=seqwrite
+write_lat_log=seqwrite
+
+[seqwrite]
diff --git a/tools/testing/dontcache-bench/scripts/parse-results.sh b/tools/testing/dontcache-bench/scripts/parse-results.sh
new file mode 100755
index 0000000000000000000000000000000000000000..0427d411db04903a5d9506751695d9452b011e6a
--- /dev/null
+++ b/tools/testing/dontcache-bench/scripts/parse-results.sh
@@ -0,0 +1,238 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Parse fio JSON output and generate comparison tables.
+#
+# Usage: ./parse-results.sh 
+
+set -euo pipefail
+
+if [ $# -lt 1 ]; then
+	echo "Usage: $0 "
+	exit 1
+fi
+
+RESULTS_DIR="$1"
+
+if ! command -v jq &>/dev/null; then
+	echo "ERROR: jq is required"
+	exit 1
+fi
+
+# Extract metrics from a single fio JSON result
+extract_metrics() {
+	local json_file=$1
+	local rw_type=$2 # read or write
+
+	if [ ! -f "$json_file" ]; then
+		echo "N/A N/A N/A N/A N/A N/A"
+		return
+	fi
+
+	jq -r --arg rw "$rw_type" '
+		.jobs[0][$rw] as $d |
+		[
+			(($d.bw // 0) / 1024 | . * 10 | round / 10),              # MB/s
+			($d.iops // 0),                                           # IOPS
+			((($d.clat_ns.mean // 0) / 1000) | . * 10 | round / 10),  # avg lat us
+			(($d.clat_ns.percentile["50.000000"] // 0) / 1000),       # p50 us
+			(($d.clat_ns.percentile["99.000000"] // 0) / 1000),       # p99 us
+			(($d.clat_ns.percentile["99.900000"] // 0) / 1000)        # p99.9 us
+		] | @tsv
+	' "$json_file" 2>/dev/null || echo "N/A N/A N/A N/A N/A N/A"
+}
+
+# Extract server CPU from vmstat log (average sys%)
+extract_cpu() {
+	local vmstat_log=$1
+	if [ ! -f "$vmstat_log" ]; then
+		echo "N/A"
+		return
+	fi
+	# Skip the two vmstat header lines; field 14 is the "sy" (system CPU%) column
+	awk 'NR>2 {sum+=$14; n++} END {if(n>0) printf "%.1f", sum/n; else print "N/A"}' \
+		"$vmstat_log" 2>/dev/null || echo "N/A"
+}
+
+# Extract peak dirty pages from meminfo log
+extract_peak_dirty() {
+	local meminfo_log=$1
+	if [ ! -f "$meminfo_log" ]; then
+		echo "N/A"
+		return
+	fi
+	grep "^Dirty:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || echo "N/A"
+}
+
+# Extract peak cached from meminfo log
+extract_peak_cached() {
+	local meminfo_log=$1
+	if [ ! -f "$meminfo_log" ]; then
+		echo "N/A"
+		return
+	fi
+	grep "^Cached:" "$meminfo_log" | awk '{print $2}' | sort -n | tail -1 || echo "N/A"
+}
+
+print_separator() {
+	printf '%*s\n' 120 '' | tr ' ' '-'
+}
+
+########################################################################
+# Deliverable 1: Single-client results
+########################################################################
+echo ""
+echo "======================================================================"
+echo " Deliverable 1: Single-Client fio Benchmarks"
+echo "======================================================================"
+echo ""
+
+for workload in seq-write rand-write seq-read rand-read; do
+	case $workload in
+	seq-write|rand-write) rw_type="write" ;;
+	seq-read|rand-read) rw_type="read" ;;
+	esac
+
+	echo "--- $workload ---"
+	printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \
+		"Mode" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" "Sys CPU%" "PeakDirty(kB)" "PeakCache(kB)"
+	print_separator
+
+	for mode in buffered dontcache direct; do
+		dir="${RESULTS_DIR}/${workload}/${mode}"
+		json_file=$(find "$dir" -name '*.json' -not -name 'client*' 2>/dev/null | head -1 || true)
+		if [ -z "$json_file" ]; then
+			printf "%-16s %10s\n" "$mode" "(no data)"
+			continue
+		fi
+
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "$json_file" "$rw_type")"
+		cpu=$(extract_cpu "${dir}/vmstat.log")
+		dirty=$(extract_peak_dirty "${dir}/meminfo.log")
+		cached=$(extract_peak_cached "${dir}/meminfo.log")
+
+		printf "%-16s %10s %10s %10s %10s %10s %10s %10s %12s %12s\n" \
+			"$mode" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" \
+			"$cpu" "${dirty:-N/A}" "${cached:-N/A}"
+	done
+	echo ""
+done
+
+########################################################################
+# Deliverable 2: Multi-client results
+########################################################################
+echo "======================================================================"
+echo " Deliverable 2: Noisy-Neighbor Benchmarks"
+echo "======================================================================"
+echo ""
+
+# Scenario A: Multiple writers
+echo "--- Scenario A: Multiple Writers ---"
+for mode in buffered dontcache direct; do
+	dir="${RESULTS_DIR}/multi-write/${mode}"
+	if [ ! -d "$dir" ]; then
+		continue
+	fi
+
+	echo " Mode: $mode"
+	printf " %-10s %10s %10s %10s %10s %10s %10s\n" \
+		"Client" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)"
+
+	total_bw=0
+	count=0
+	for json_file in "${dir}"/client*.json; do
+		[ -f "$json_file" ] || continue
+		client=$(basename "$json_file" .json)
+		read -r mbps iops avg_lat p50 p99 p999 <<< \
+			"$(extract_metrics "$json_file" "write")"
+		printf " %-10s %10s %10s %10s %10s %10s %10s\n" \
+			"$client" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999"
+		total_bw=$(echo "$total_bw + ${mbps:-0}" | bc 2>/dev/null || echo "$total_bw")
+		count=$(( count + 1 ))
+	done
+
+	cpu=$(extract_cpu "${dir}/vmstat.log")
+	dirty=$(extract_peak_dirty "${dir}/meminfo.log")
+	printf " Aggregate BW: %s MB/s | Sys CPU: %s%% | Peak Dirty: %s kB\n" \
+		"$total_bw" "$cpu" "${dirty:-N/A}"
+	echo ""
+done
+
+# Scenario C: Noisy neighbor
+echo "--- Scenario C: Noisy Writer + Latency-Sensitive Readers ---"
+for mode in buffered dontcache direct; do
+	dir="${RESULTS_DIR}/noisy-neighbor/${mode}"
+	if [ !
-d "$dir" ]; then + continue + fi + + echo " Mode: $mode" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + # Writer + if [ -f "${dir}/noisy_writer.json" ]; then + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "${dir}/noisy_writer.json" "write")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + fi + + # Readers + for json_file in "${dir}"/reader*.json; do + [ -f "$json_file" ] || continue + reader=3D$(basename "$json_file" .json) + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "read")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + done + + cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}" + echo "" +done + +# Scenario D: Mixed-mode noisy neighbor +echo "--- Scenario D: Mixed-Mode Noisy Writer + Readers ---" +for dir in "${RESULTS_DIR}"/noisy-neighbor-mixed/*/; do + [ -d "$dir" ] || continue + label=3D$(basename "$dir") + + echo " Mode: $label" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Job" "MB/s" "IOPS" "Avg(us)" "p50(us)" "p99(us)" "p99.9(us)" + + # Writer + if [ -f "${dir}/noisy_writer.json" ]; then + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "${dir}/noisy_writer.json" "write")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "Bulk writer" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + fi + + # Readers + for json_file in "${dir}"/reader*.json; do + [ -f "$json_file" ] || continue + reader=3D$(basename "$json_file" .json) + read -r mbps iops avg_lat p50 p99 p999 <<< \ + "$(extract_metrics "$json_file" "read")" + printf " %-14s %10s %10s %10s %10s %10s %10s\n" \ + "$reader" "$mbps" "$iops" "$avg_lat" "$p50" "$p99" "$p999" + done + + 
cpu=3D$(extract_cpu "${dir}/vmstat.log") + dirty=3D$(extract_peak_dirty "${dir}/meminfo.log") + printf " Sys CPU: %s%% | Peak Dirty: %s kB\n" "$cpu" "${dirty:-N/A}" + echo "" +done + +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +echo " System Info" +echo "=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D" +if [ -f "${RESULTS_DIR}/sysinfo.txt" ]; then + head -6 "${RESULTS_DIR}/sysinfo.txt" +fi +echo "" diff --git a/tools/testing/dontcache-bench/scripts/run-benchmarks.sh b/tool= s/testing/dontcache-bench/scripts/run-benchmarks.sh new file mode 100755 index 0000000000000000000000000000000000000000..195d579e8eab8b7f827bb643880= 0c4933cdf236b --- /dev/null +++ b/tools/testing/dontcache-bench/scripts/run-benchmarks.sh @@ -0,0 +1,518 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Local filesystem I/O mode benchmark suite. +# +# Runs the same test matrix as run-benchmarks.sh but on a local filesystem +# using fio's io_uring engine with the RWF_DONTCACHE flag instead of NFSD's +# debugfs mode knobs. 
+#
+# Usage: ./run-benchmarks.sh [options]
+# -t Test directory (must be on a filesystem supporting FOP_DONTCACHE)
+# -s File size (default: auto-sized to exceed RAM)
+# -f Path to fio binary (default: fio in PATH)
+# -o Output directory for results (default: ./results/)
+# -d Dry run (print commands without executing)
+
+set -euo pipefail
+
+# Defaults
+TEST_DIR=""
+SIZE=""
+FIO_BIN="fio"
+RESULTS_DIR=""
+DRY_RUN=0
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+FIO_JOBS_DIR="${SCRIPT_DIR}/../fio-jobs"
+
+usage() {
+	echo "Usage: $0 -t [-s ] [-f ] [-o ] [-d]"
+	echo ""
+	echo " -t Test directory (required, must support RWF_DONTCACHE)"
+	echo " -s File size (default: 2x RAM)"
+	echo " -f Path to fio binary (default: fio)"
+	echo " -o Output directory (default: ./results/)"
+	echo " -d Dry run"
+	exit 1
+}
+
+while getopts "t:s:f:o:dh" opt; do
+	case $opt in
+		t) TEST_DIR="$OPTARG" ;;
+		s) SIZE="$OPTARG" ;;
+		f) FIO_BIN="$OPTARG" ;;
+		o) RESULTS_DIR="$OPTARG" ;;
+		d) DRY_RUN=1 ;;
+		h) usage ;;
+		*) usage ;;
+	esac
+done
+
+if [ -z "$TEST_DIR" ]; then
+	echo "ERROR: -t is required"
+	usage
+fi
+
+# Auto-size to 2x RAM if not specified
+if [ -z "$SIZE" ]; then
+	mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
+	SIZE="$(( mem_kb * 2 / 1024 ))M"
+fi
+
+if [ -z "$RESULTS_DIR" ]; then
+	RESULTS_DIR="./results/local-$(date +%Y%m%d-%H%M%S)"
+fi
+
+mkdir -p "$RESULTS_DIR"
+
+log() {
+	echo "[$(date '+%H:%M:%S')] $*"
+}
+
+run_cmd() {
+	if [ "$DRY_RUN" -eq 1 ]; then
+		echo " [DRY RUN] $*"
+	else
+		"$@"
+	fi
+}
+
+# I/O mode definitions:
+# buffered: direct=0, uncached=0
+# dontcache: direct=0, uncached=1
+# direct: direct=1, uncached=0
+#
+# Mode name from numeric value
+mode_name() {
+	case $1 in
+		0) echo "buffered" ;;
+		1) echo "dontcache" ;;
+		2) echo "direct" ;;
+	esac
+}
+
+# Return fio command-line flags for a given mode.
+# "direct" is a standard fio option and works on the command line.
+# "uncached" is an io_uring engine option that must be in the job file,
+# so we inject it via make_job_file() below.
+mode_fio_args() {
+	case $1 in
+		0) echo "--direct=0" ;;  # buffered
+		1) echo "--direct=0" ;;  # dontcache
+		2) echo "--direct=1" ;;  # direct
+	esac
+}
+
+# Return the uncached= value for a given mode.
+mode_uncached() {
+	case $1 in
+		0) echo "0" ;;
+		1) echo "1" ;;
+		2) echo "0" ;;
+	esac
+}
+
+# Create a temporary job file with uncached=N injected into [global].
+# For uncached=0 (buffered/direct), return the original file unchanged.
+make_job_file() {
+	local job_file=$1
+	local uncached=$2
+
+	if [ "$uncached" -eq 0 ]; then
+		echo "$job_file"
+		return
+	fi
+
+	local tmp
+	tmp=$(mktemp)
+	sed "/^\[global\]/a uncached=${uncached}" "$job_file" > "$tmp"
+	echo "$tmp"
+}
+
+drop_caches() {
+	run_cmd bash -c "sync && echo 3 > /proc/sys/vm/drop_caches"
+}
+
+# Background monitors
+VMSTAT_PID=""
+IOSTAT_PID=""
+MEMINFO_PID=""
+
+start_monitors() {
+	local outdir=$1
+	log "Starting monitors in $outdir"
+	run_cmd vmstat 1 > "${outdir}/vmstat.log" 2>&1 &
+	VMSTAT_PID=$!
+	run_cmd iostat -x 1 > "${outdir}/iostat.log" 2>&1 &
+	IOSTAT_PID=$!
+	(while true; do
+		echo "=== $(date '+%s') ==="
+		cat /proc/meminfo
+		sleep 1
+	done) > "${outdir}/meminfo.log" 2>&1 &
+	MEMINFO_PID=$!
+}
+
+stop_monitors() {
+	log "Stopping monitors"
+	kill "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true
+	wait "$VMSTAT_PID" "$IOSTAT_PID" "$MEMINFO_PID" 2>/dev/null || true
+}
+
+cleanup_test_files() {
+	local filepath="${TEST_DIR}/$1"
+	log "Cleaning up $filepath"
+	run_cmd rm -f "$filepath"
+}
+
+# Run a single fio benchmark
+run_fio() {
+	local job_file=$1
+	local outdir=$2
+	local filename=$3
+	local fio_size=${4:-$SIZE}
+	local keep=${5:-}
+	local extra_args=${6:-}
+	local uncached=${7:-0}
+
+	# Inject uncached=N into the job file if needed
+	local actual_job
+	actual_job=$(make_job_file "$job_file" "$uncached")
+
+	local job_name
+	job_name=$(basename "$job_file" .fio)
+
+	log "Running fio job: $job_name -> $outdir (file=${TEST_DIR}/$filename size=$fio_size)"
+	mkdir -p "$outdir"
+
+	drop_caches
+	start_monitors "$outdir"
+
+	# shellcheck disable=SC2086
+	run_cmd "$FIO_BIN" "$actual_job" \
+		--output-format=json \
+		--output="${outdir}/${job_name}.json" \
+		--filename="${TEST_DIR}/$filename" \
+		--size="$fio_size" \
+		$extra_args
+
+	stop_monitors
+	log "Finished: $job_name"
+
+	# Clean up temp job file if one was created
+	[ "$actual_job" != "$job_file" ] && rm -f "$actual_job"
+
+	if [ "$keep" != "keep" ]; then
+		cleanup_test_files "$filename"
+	fi
+}
+
+########################################################################
+# Preflight
+########################################################################
+preflight() {
+	log "=== Preflight checks ==="
+
+	if ! command -v "$FIO_BIN" &>/dev/null; then
+		echo "ERROR: fio not found at $FIO_BIN"
+		exit 1
+	fi
+
+	if [ ! -d "$TEST_DIR" ]; then
+		echo "ERROR: Test directory $TEST_DIR does not exist"
+		exit 1
+	fi
+
+	# Quick check that RWF_DONTCACHE works on this filesystem
+	local testfile="${TEST_DIR}/.dontcache_test"
+	if ! "$FIO_BIN" --name=test --ioengine=io_uring --rw=write \
+		--bs=4k --size=4k --direct=0 --uncached=1 \
+		--filename="$testfile" 2>/dev/null; then
+		echo "WARNING: RWF_DONTCACHE may not be supported on $TEST_DIR"
+		echo " (filesystem must support FOP_DONTCACHE)"
+	fi
+	rm -f "$testfile"
+
+	log "Test directory: $TEST_DIR"
+	log "File size: $SIZE"
+	log "fio binary: $FIO_BIN"
+	log "Results: $RESULTS_DIR"
+
+	# Record system info
+	{
+		echo "Timestamp: $(date +%Y%m%d-%H%M%S)"
+		echo "Kernel: $(uname -r)"
+		echo "Hostname: $(hostname)"
+		echo "Filesystem: $(df -T "$TEST_DIR" | tail -1 | awk '{print $2}')"
+		echo "File size: $SIZE"
+		echo "Test dir: $TEST_DIR"
+	} > "${RESULTS_DIR}/sysinfo.txt"
+}
+
+########################################################################
+# Deliverable 1: Single-client benchmarks
+########################################################################
+run_deliverable1() {
+	log "=========================================="
+	log "Deliverable 1: Single-client benchmarks"
+	log "=========================================="
+
+	# Sequential write
+	for mode in 0 1 2; do
+		local mname
+		mname=$(mode_name $mode)
+		local fio_args
+		fio_args=$(mode_fio_args $mode)
+
+		drop_caches
+		run_fio "${FIO_JOBS_DIR}/seq-write.fio" \
+			"${RESULTS_DIR}/seq-write/${mname}" \
+			"seq-write_testfile" "$SIZE" "" "$fio_args" \
+			"$(mode_uncached $mode)"
+	done
+
+	# Random write
+	for mode in 0 1 2; do
+		local mname
+		mname=$(mode_name $mode)
+		local fio_args
+		fio_args=$(mode_fio_args $mode)
+
+		drop_caches
+		run_fio "${FIO_JOBS_DIR}/rand-write.fio" \
+			"${RESULTS_DIR}/rand-write/${mname}" \
+			"rand-write_testfile" "$SIZE" "" "$fio_args" \
+			"$(mode_uncached $mode)"
+	done
+
+	# Sequential read: pre-create file, then read with each mode
+	log "Pre-creating sequential read test file"
+	run_fio "${FIO_JOBS_DIR}/seq-write.fio" \
+		"${RESULTS_DIR}/seq-read/precreate" \
+		"seq-read_testfile" "$SIZE" "keep"
+
+	for rmode in 0 1 2; do
+		local mname
+		mname=$(mode_name $rmode)
+		local fio_args
+		fio_args=$(mode_fio_args $rmode)
+		local keep="keep"
+		[ "$rmode" -eq 2 ] && keep=""
+
+		drop_caches
+		run_fio "${FIO_JOBS_DIR}/seq-read.fio" \
+			"${RESULTS_DIR}/seq-read/${mname}" \
+			"seq-read_testfile" "$SIZE" "$keep" "$fio_args" \
+			"$(mode_uncached $rmode)"
+	done
+
+	# Random read: pre-create file, then read with each mode
+	log "Pre-creating random read test file"
+	run_fio "${FIO_JOBS_DIR}/seq-write.fio" \
+		"${RESULTS_DIR}/rand-read/precreate" \
+		"rand-read_testfile" "$SIZE" "keep"
+
+	for rmode in 0 1 2; do
+		local mname
+		mname=$(mode_name $rmode)
+		local fio_args
+		fio_args=$(mode_fio_args $rmode)
+		local keep="keep"
+		[ "$rmode" -eq 2 ] && keep=""
+
+		drop_caches
+		run_fio "${FIO_JOBS_DIR}/rand-read.fio" \
+			"${RESULTS_DIR}/rand-read/${mname}" \
+			"rand-read_testfile" "$SIZE" "$keep" "$fio_args" \
+			"$(mode_uncached $rmode)"
+	done
+}
+
+########################################################################
+# Deliverable 2: Multi-client tests
+########################################################################
+run_deliverable2() {
+	log "=========================================="
+	log "Deliverable 2: Noisy-neighbor benchmarks"
+	log "=========================================="
+
+	local num_clients=4
+	local client_size
+	local mem_kb
+	mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
+	client_size="$(( mem_kb / 1024 / num_clients ))M"
+
+	# Scenario A: Multiple writers
+	for mode in 0 1 2; do
+		local mname
+		mname=$(mode_name $mode)
+		local fio_args
+		fio_args=$(mode_fio_args $mode)
+		local uncached
+		uncached=$(mode_uncached $mode)
+		local actual_job
+		actual_job=$(make_job_file "${FIO_JOBS_DIR}/multi-write.fio" "$uncached")
+		local outdir="${RESULTS_DIR}/multi-write/${mname}"
+		mkdir -p "$outdir"
+
+		drop_caches
+		start_monitors "$outdir"
+
+		local pids=()
+		for i in $(seq 1 $num_clients); do
+			# shellcheck disable=SC2086
+			run_cmd "$FIO_BIN" "$actual_job" \
+				--output-format=json \
+				--output="${outdir}/client${i}.json" \
+				--filename="${TEST_DIR}/client${i}_testfile" \
+				--size="$client_size" \
+				$fio_args &
+			pids+=($!)
+		done
+
+		local rc=0
+		for pid in "${pids[@]}"; do
+			wait "$pid" || rc=$?
+		done
+
+		stop_monitors
+		[ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero"
+
+		[ "$actual_job" != "${FIO_JOBS_DIR}/multi-write.fio" ] && rm -f "$actual_job"
+		for i in $(seq 1 $num_clients); do
+			cleanup_test_files "client${i}_testfile"
+		done
+	done
+
+	# Scenario C: Noisy writer + latency-sensitive readers
+	for mode in 0 1 2; do
+		local mname
+		mname=$(mode_name $mode)
+		local fio_args
+		fio_args=$(mode_fio_args $mode)
+		local uncached
+		uncached=$(mode_uncached $mode)
+		local writer_job
+		writer_job=$(make_job_file "${FIO_JOBS_DIR}/noisy-writer.fio" "$uncached")
+		local reader_job
+		reader_job=$(make_job_file "${FIO_JOBS_DIR}/lat-reader.fio" "$uncached")
+		local outdir="${RESULTS_DIR}/noisy-neighbor/${mname}"
+		mkdir -p "$outdir"
+
+		# Pre-create read files
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			log "Pre-creating read file for reader $i"
+			run_fio "${FIO_JOBS_DIR}/multi-write.fio" \
+				"${outdir}/precreate_reader${i}" \
+				"reader${i}_readfile" \
+				"512M" "keep"
+		done
+		drop_caches
+		start_monitors "$outdir"
+
+		# Noisy writer
+		# shellcheck disable=SC2086
+		run_cmd "$FIO_BIN" "$writer_job" \
+			--output-format=json \
+			--output="${outdir}/noisy_writer.json" \
+			--filename="${TEST_DIR}/bulk_testfile" \
+			--size="$SIZE" \
+			$fio_args &
+		local writer_pid=$!
+
+		# Latency-sensitive readers
+		local reader_pids=()
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			# shellcheck disable=SC2086
+			run_cmd "$FIO_BIN" "$reader_job" \
+				--output-format=json \
+				--output="${outdir}/reader${i}.json" \
+				--filename="${TEST_DIR}/reader${i}_readfile" \
+				--size="512M" \
+				$fio_args &
+			reader_pids+=($!)
+		done
+
+		local rc=0
+		wait "$writer_pid" || rc=$?
+		for pid in "${reader_pids[@]}"; do
+			wait "$pid" || rc=$?
+		done
+
+		stop_monitors
+		[ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero"
+
+		[ "$writer_job" != "${FIO_JOBS_DIR}/noisy-writer.fio" ] && rm -f "$writer_job"
+		[ "$reader_job" != "${FIO_JOBS_DIR}/lat-reader.fio" ] && rm -f "$reader_job"
+		cleanup_test_files "bulk_testfile"
+		for i in $(seq 1 $(( num_clients - 1 ))); do
+			cleanup_test_files "reader${i}_readfile"
+		done
+	done
+
+	# Scenario D: Mixed-mode noisy neighbor
+	# dontcache writes + buffered reads
+	local outdir="${RESULTS_DIR}/noisy-neighbor-mixed/dontcache-w_buffered-r"
+	mkdir -p "$outdir"
+	local writer_job
+	writer_job=$(make_job_file "${FIO_JOBS_DIR}/noisy-writer.fio" 1)
+
+	for i in $(seq 1 $(( num_clients - 1 ))); do
+		log "Pre-creating read file for reader $i"
+		run_fio "${FIO_JOBS_DIR}/multi-write.fio" \
+			"${outdir}/precreate_reader${i}" \
+			"reader${i}_readfile" \
+			"512M" "keep"
+	done
+	drop_caches
+	start_monitors "$outdir"
+
+	# Writer with dontcache
+	run_cmd "$FIO_BIN" "$writer_job" \
+		--output-format=json \
+		--output="${outdir}/noisy_writer.json" \
+		--filename="${TEST_DIR}/bulk_testfile" \
+		--size="$SIZE" \
+		--direct=0 &
+	local writer_pid=$!
+
+	# Readers with buffered (no uncached flag)
+	local reader_pids=()
+	for i in $(seq 1 $(( num_clients - 1 ))); do
+		run_cmd "$FIO_BIN" "${FIO_JOBS_DIR}/lat-reader.fio" \
+			--output-format=json \
+			--output="${outdir}/reader${i}.json" \
+			--filename="${TEST_DIR}/reader${i}_readfile" \
+			--size="512M" \
+			--direct=0 &
+		reader_pids+=($!)
+	done
+
+	local rc=0
+	wait "$writer_pid" || rc=$?
+	for pid in "${reader_pids[@]}"; do
+		wait "$pid" || rc=$?
+	done
+
+	stop_monitors
+	[ $rc -ne 0 ] && log "WARNING: some fio jobs exited non-zero"
+
+	[ "$writer_job" != "${FIO_JOBS_DIR}/noisy-writer.fio" ] && rm -f "$writer_job"
+	cleanup_test_files "bulk_testfile"
+	for i in $(seq 1 $(( num_clients - 1 ))); do
+		cleanup_test_files "reader${i}_readfile"
+	done
+}
+
+########################################################################
+# Main
+########################################################################
+preflight
+run_deliverable1
+run_deliverable2
+
+log "=========================================="
+log "All benchmarks complete."
+log "Results in: $RESULTS_DIR"
+log "Parse with: scripts/parse-results.sh $RESULTS_DIR"
+log "=========================================="

-- 
2.53.0
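make_job_file() in run-benchmarks.sh relies on sed's one-line `a` (append) command to place `uncached=N` directly under the `[global]` header, because, as the comment above mode_fio_args() notes, that io_uring option has to live in the job file rather than on the fio command line. The standalone sketch below applies the same sed expression to a throwaway job file so the result can be checked without running fio. The job-file contents are illustrative only (not one of the patch's fio-jobs files), and the sketch assumes GNU sed, since the one-line `a text` form is a GNU extension.

```bash
#!/bin/bash
# Sketch (assumes GNU sed): apply the same sed expression that
# make_job_file() uses and check where "uncached=1" ends up.
set -euo pipefail

# Illustrative job file; contents are hypothetical, not from fio-jobs/.
job=$(mktemp)
cat > "$job" <<'EOF'
[global]
ioengine=io_uring
direct=0
rw=write

[seqwrite]
EOF

# Append "uncached=1" on the line right after the [global] header.
patched=$(mktemp)
sed "/^\[global\]/a uncached=1" "$job" > "$patched"

# Line 2 of the result should be the injected option, still inside [global].
result=$(sed -n '2p' "$patched")
echo "$result"

# The option should appear exactly once in the patched file.
count=$(grep -c '^uncached=1$' "$patched")
echo "$count"

rm -f "$job" "$patched"
```

Because the option is inserted under `[global]`, it applies to every job section in the file, which is why the script only needs to patch the global section once per mode.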