From: Jaegeuk Kim
To: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-mm@kvack.org, Matthew Wilcox
Cc: Jaegeuk Kim
Subject: [PATCH 1/4] mm/readahead: fix the broken readahead for POSIX_FADV_WILLNEED
Date: Mon, 1 Dec 2025 21:01:24 +0000
Message-ID: <20251201210152.909339-2-jaegeuk@kernel.org>
In-Reply-To: <20251201210152.909339-1-jaegeuk@kernel.org>

This patch fixes the broken readahead flow for POSIX_FADV_WILLNEED. The
problem is that force_page_cache_ra() clamps nr_to_read:

	max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
	nr_to_read = min_t(unsigned long, nr_to_read, max_pages);

In other words, we can never read ahead more than max_pages, which is
typically in the 2MB to 16MB range. Since it does not make sense to set
ra->ra_pages to the entire file size, fix the chunking logic instead.
Before:

f2fs_fadvise: dev = (252,16), ino = 14, i_size = 4294967296 offset:0, len:4294967296, advise:3
page_cache_ra_unbounded: dev=252:16 ino=e index=0 nr_to_read=512 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=512 nr_to_read=512 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=1024 nr_to_read=512 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=1536 nr_to_read=512 lookahead_size=0

After:

f2fs_fadvise: dev = (252,16), ino = 14, i_size = 4294967296 offset:0, len:4294967296, advise:3
page_cache_ra_unbounded: dev=252:16 ino=e index=0 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=2048 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=4096 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=6144 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=8192 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=10240 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=12288 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=14336 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=16384 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=18432 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=20480 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=22528 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=24576 nr_to_read=2048 lookahead_size=0
...
page_cache_ra_unbounded: dev=252:16 ino=e index=1042432 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=1044480 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=1046528 nr_to_read=2048 lookahead_size=0

Cc: linux-mm@kvack.org
Cc: Matthew Wilcox (Oracle)
Signed-off-by: Jaegeuk Kim
---
 mm/readahead.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 3a4b5d58eeb6..c0db049a5b7b 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -311,7 +311,7 @@ EXPORT_SYMBOL_GPL(page_cache_ra_unbounded);
  * behaviour which would occur if page allocations are causing VM writeback.
  * We really don't want to intermingle reads and writes like that.
  */
-static void do_page_cache_ra(struct readahead_control *ractl,
+static int do_page_cache_ra(struct readahead_control *ractl,
 		unsigned long nr_to_read, unsigned long lookahead_size)
 {
 	struct inode *inode = ractl->mapping->host;
@@ -320,45 +320,42 @@ static void do_page_cache_ra(struct readahead_control *ractl,
 	pgoff_t end_index;	/* The last page we want to read */
 
 	if (isize == 0)
-		return;
+		return -EINVAL;
 
 	end_index = (isize - 1) >> PAGE_SHIFT;
 	if (index > end_index)
-		return;
+		return -EINVAL;
 	/* Don't read past the page containing the last byte of the file */
 	if (nr_to_read > end_index - index)
 		nr_to_read = end_index - index + 1;
 
 	page_cache_ra_unbounded(ractl, nr_to_read, lookahead_size);
+	return 0;
 }
 
 /*
- * Chunk the readahead into 2 megabyte units, so that we don't pin too much
- * memory at once.
+ * Chunk the readahead per the block device capacity, and read all nr_to_read.
  */
 void force_page_cache_ra(struct readahead_control *ractl,
 		unsigned long nr_to_read)
 {
 	struct address_space *mapping = ractl->mapping;
-	struct file_ra_state *ra = ractl->ra;
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
-	unsigned long max_pages;
+	unsigned long this_chunk;
 
 	if (unlikely(!mapping->a_ops->read_folio && !mapping->a_ops->readahead))
 		return;
 
 	/*
-	 * If the request exceeds the readahead window, allow the read to
-	 * be up to the optimal hardware IO size
+	 * Consider the optimal hardware IO size for the readahead chunk.
 	 */
-	max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
-	nr_to_read = min_t(unsigned long, nr_to_read, max_pages);
+	this_chunk = max_t(unsigned long, bdi->io_pages, ractl->ra->ra_pages);
+
 	while (nr_to_read) {
-		unsigned long this_chunk = (2 * 1024 * 1024) / PAGE_SIZE;
+		this_chunk = min_t(unsigned long, this_chunk, nr_to_read);
 
-		if (this_chunk > nr_to_read)
-			this_chunk = nr_to_read;
-		do_page_cache_ra(ractl, this_chunk, 0);
+		if (do_page_cache_ra(ractl, this_chunk, 0))
+			break;
 
 		nr_to_read -= this_chunk;
 	}
}
-- 
2.52.0.107.ga0afd4fd5b-goog
From: Jaegeuk Kim
To: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-mm@kvack.org, Matthew Wilcox
Cc: Jaegeuk Kim
Subject: [PATCH 2/4] mm/readahead: use page_cache_sync_ra for POSIX_FADV_WILLNEED
Date: Mon, 1 Dec 2025 21:01:25 +0000
Message-ID: <20251201210152.909339-3-jaegeuk@kernel.org>
In-Reply-To: <20251201210152.909339-1-jaegeuk@kernel.org>

This patch replaces
page_cache_ra_unbounded() with page_cache_sync_ra() in
fadvise(POSIX_FADV_WILLNEED) to support large folios.

Before:

f2fs_fadvise: dev = (252,16), ino = 14, i_size = 4294967296 offset:0, len:4294967296, advise:3
page_cache_ra_unbounded: dev=252:16 ino=e index=0 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=2048 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=4096 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=6144 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=8192 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=10240 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=12288 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=14336 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=16384 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=18432 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=20480 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=22528 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=24576 nr_to_read=2048 lookahead_size=0
...
page_cache_ra_unbounded: dev=252:16 ino=e index=1042432 nr_to_read=2048 lookahead_size=0

This is all zero-order page allocation.
After (order=0 by default):

f2fs_fadvise: dev = (252,16), ino = 14, i_size = 4294967296 offset:0, len:4294967296, advise:3
page_cache_sync_ra: dev=252:16 ino=e index=0 req_count=2048 order=0 size=0 async_size=0 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=0 order=0 size=2048 async_size=1024 ra_pages=2048
page_cache_sync_ra: dev=252:16 ino=e index=2048 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=2048 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=4096 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=4096 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=6144 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=6144 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=8192 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=8192 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=10240 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=10240 nr_to_read=2048 lookahead_size=0
...
page_cache_ra_unbounded: dev=252:16 ino=e index=1042432 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=1044480 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=1044480 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=1046528 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=1046528 nr_to_read=2048 lookahead_size=0

Cc: linux-mm@kvack.org
Cc: Matthew Wilcox (Oracle)
Signed-off-by: Jaegeuk Kim
---
 mm/readahead.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index c0db049a5b7b..5beaf7803554 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -340,6 +340,7 @@ void force_page_cache_ra(struct readahead_control *ractl,
 		unsigned long nr_to_read)
 {
 	struct address_space *mapping = ractl->mapping;
+	struct inode *inode = mapping->host;
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
 	unsigned long this_chunk;
 
@@ -352,11 +353,19 @@ void force_page_cache_ra(struct readahead_control *ractl,
 	this_chunk = max_t(unsigned long, bdi->io_pages, ractl->ra->ra_pages);
 
 	while (nr_to_read) {
-		this_chunk = min_t(unsigned long, this_chunk, nr_to_read);
+		unsigned long index = readahead_index(ractl);
+		pgoff_t end_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
 
-		if (do_page_cache_ra(ractl, this_chunk, 0))
+		if (index > end_index)
 			break;
 
+		if (nr_to_read > end_index - index)
+			nr_to_read = end_index - index + 1;
+
+		this_chunk = min_t(unsigned long, this_chunk, nr_to_read);
+
+		page_cache_sync_ra(ractl, this_chunk);
+
 		nr_to_read -= this_chunk;
 	}
 }
@@ -573,7 +582,7 @@ void page_cache_sync_ra(struct readahead_control *ractl,
 
 	/* be dumb */
 	if (do_forced_ra) {
-		force_page_cache_ra(ractl, req_count);
+		do_page_cache_ra(ractl, req_count, 0);
 		return;
 	}
 
From: Jaegeuk Kim
To: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-mm@kvack.org, Matthew Wilcox
Cc: Jaegeuk Kim
Subject: [PATCH 3/4] mm/readahead: add a_ops->ra_folio_order to get a desired folio order
Date: Mon, 1 Dec 2025 21:01:26 +0000
Message-ID: <20251201210152.909339-4-jaegeuk@kernel.org>
In-Reply-To: <20251201210152.909339-1-jaegeuk@kernel.org>

This patch introduces a new address_space operation, a_ops->ra_folio_order(),
which proposes a new folio order based on the adjusted order for
page_cache_sync_ra(). Hence, each filesystem can set the desired minimum
order of folio allocation when requesting fadvise(POSIX_FADV_WILLNEED).

Cc: linux-mm@kvack.org
Cc: Matthew Wilcox (Oracle)
Signed-off-by: Jaegeuk Kim
---
 include/linux/fs.h      |  4 ++++
 include/linux/pagemap.h | 12 ++++++++++++
 mm/readahead.c          |  6 ++++--
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index c895146c1444..ddab68b7e03b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -472,6 +472,10 @@ struct address_space_operations {
 	void (*is_dirty_writeback) (struct folio *, bool *dirty, bool *wb);
 	int (*error_remove_folio)(struct address_space *, struct folio *);
 
+	/* Min folio order to allocate pages. */
+	unsigned int (*ra_folio_order)(struct address_space *mapping,
+			unsigned int order);
+
 	/* swapfile support */
 	int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
 			sector_t *span);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 09b581c1d878..e1fe07477220 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -476,6 +476,18 @@ mapping_min_folio_order(const struct address_space *mapping)
 	return (mapping->flags & AS_FOLIO_ORDER_MIN_MASK) >> AS_FOLIO_ORDER_MIN;
 }
 
+static inline unsigned int
+mapping_ra_folio_order(struct address_space *mapping, unsigned int order)
+{
+	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+		return 0;
+
+	if (!mapping->a_ops->ra_folio_order)
+		return order;
+
+	return mapping->a_ops->ra_folio_order(mapping, order);
+}
+
 static inline unsigned long
 mapping_min_folio_nrpages(const struct address_space *mapping)
 {
diff --git a/mm/readahead.c b/mm/readahead.c
index 5beaf7803554..8c7d08af6e00 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -592,8 +592,10 @@ void page_cache_sync_ra(struct readahead_control *ractl,
 	 * A start of file, oversized read, or sequential cache miss:
 	 * trivial case: (index - prev_index) == 1
 	 * unaligned reads: (index - prev_index) == 0
+	 * if filesystem sets high-order allocation
 	 */
-	if (!index || req_count > max_pages || index - prev_index <= 1UL) {
+	if (!index || req_count > max_pages || index - prev_index <= 1UL ||
+	    mapping_ra_folio_order(ractl->mapping, 0)) {
 		ra->start = index;
 		ra->size = get_init_ra_size(req_count, max_pages);
 		ra->async_size = ra->size > req_count ?
				ra->size - req_count :
@@ -627,7 +629,7 @@ void page_cache_sync_ra(struct readahead_control *ractl,
 	ra->size = min(contig_count + req_count, max_pages);
 	ra->async_size = 1;
 readit:
-	ra->order = 0;
+	ra->order = mapping_ra_folio_order(ractl->mapping, 0);
 	ractl->_index = ra->start;
 	page_cache_ra_order(ractl, ra);
 }
From: Jaegeuk Kim
To: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-mm@kvack.org, Matthew Wilcox
Cc: Jaegeuk Kim
Subject: [PATCH 4/4] f2fs: attach a_ops->ra_folio_order to allocate large folios for readahead
Date: Mon, 1 Dec 2025 21:01:27 +0000
Message-ID: <20251201210152.909339-5-jaegeuk@kernel.org>
In-Reply-To: <20251201210152.909339-1-jaegeuk@kernel.org>

This patch adds a sysfs entry, ra_folio_order, to change the folio order
used for readahead. Given ra_folio_order=9, we can see page_cache_ra_order()
getting order=9 when we submit readahead(), as shown below.
==== folio_order=0 ====

f2fs_fadvise: dev = (252,16), ino = 14, i_size = 4294967296 offset:0, len:4294967296, advise:3
page_cache_sync_ra: dev=252:16 ino=e index=0 req_count=2048 order=0 size=0 async_size=0 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=0 order=0 size=2048 async_size=1024 ra_pages=2048
page_cache_sync_ra: dev=252:16 ino=e index=2048 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=2048 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=4096 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=4096 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=6144 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=6144 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=8192 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=8192 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=10240 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=10240 nr_to_read=2048 lookahead_size=0
...
page_cache_ra_unbounded: dev=252:16 ino=e index=1042432 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=1044480 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=1044480 nr_to_read=2048 lookahead_size=0
page_cache_sync_ra: dev=252:16 ino=e index=1046528 req_count=2048 order=0 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_unbounded: dev=252:16 ino=e index=1046528 nr_to_read=2048 lookahead_size=0

==== folio_order=9 ====

f2fs_fadvise: dev = (252,16), ino = 14, i_size = 4294967296 offset:0, len:4294967296, advise:3
page_cache_sync_ra: dev=252:16 ino=e index=0 req_count=2048 order=0 size=0 async_size=0 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=0 order=9 size=2048 async_size=1024 ra_pages=2048
page_cache_sync_ra: dev=252:16 ino=e index=2048 req_count=2048 order=9 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=2048 order=9 size=2048 async_size=1024 ra_pages=2048
page_cache_sync_ra: dev=252:16 ino=e index=4096 req_count=2048 order=9 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=4096 order=9 size=2048 async_size=1024 ra_pages=2048
page_cache_sync_ra: dev=252:16 ino=e index=6144 req_count=2048 order=9 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=6144 order=9 size=2048 async_size=1024 ra_pages=2048
page_cache_sync_ra: dev=252:16 ino=e index=8192 req_count=2048 order=9 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=8192 order=9 size=2048 async_size=1024 ra_pages=2048
page_cache_sync_ra: dev=252:16 ino=e index=10240 req_count=2048 order=9 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=10240 order=9 size=2048 async_size=1024 ra_pages=2048
page_cache_sync_ra: dev=252:16 ino=e index=12288 req_count=2048 order=9 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
...
page_cache_sync_ra: dev=252:16 ino=e index=1040384 req_count=2048 order=9 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=1040384 order=9 size=2048 async_size=1024 ra_pages=2048
page_cache_sync_ra: dev=252:16 ino=e index=1042432 req_count=2048 order=9 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=1042432 order=9 size=2048 async_size=1024 ra_pages=2048
page_cache_sync_ra: dev=252:16 ino=e index=1044480 req_count=2048 order=9 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=1044480 order=9 size=2048 async_size=1024 ra_pages=2048
page_cache_sync_ra: dev=252:16 ino=e index=1046528 req_count=2048 order=9 size=2048 async_size=1024 ra_pages=2048 mmap_miss=0 prev_pos=-1
page_cache_ra_order: dev=252:16 ino=e index=1046528 order=9 size=2048 async_size=1024 ra_pages=2048

Cc: linux-mm@kvack.org
Cc: Matthew Wilcox (Oracle)
Signed-off-by: Jaegeuk Kim
---
 fs/f2fs/data.c  | 9 +++++++++
 fs/f2fs/f2fs.h  | 3 +++
 fs/f2fs/super.c | 1 +
 fs/f2fs/sysfs.c | 9 +++++++++
 4 files changed, 22 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 7a4f0f2d60cf..addef5a1fdb1 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3995,6 +3995,14 @@ static bool f2fs_dirty_data_folio(struct address_space *mapping,
 	return false;
 }
 
+static unsigned int f2fs_ra_folio_order(struct address_space *mapping,
+		unsigned int order)
+{
+	if (!mapping_large_folio_support(mapping))
+		return order;
+
+	return max(order, F2FS_M_SB(mapping)->ra_folio_order);
+}
 
 static sector_t f2fs_bmap_compress(struct inode *inode, sector_t block)
 {
@@ -4313,6 +4321,7 @@ const struct address_space_operations f2fs_dblock_aops = {
 	.dirty_folio	= f2fs_dirty_data_folio,
 	.migrate_folio	= filemap_migrate_folio,
 	.invalidate_folio	= f2fs_invalidate_folio,
+	.ra_folio_order	= f2fs_ra_folio_order,
 	.release_folio	= f2fs_release_folio,
 	.bmap		= f2fs_bmap,
 	.swap_activate	= f2fs_swap_activate,
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index d7600979218e..06f90d510a01 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1932,6 +1932,9 @@ struct f2fs_sb_info {
 	/* carve out reserved_blocks from total blocks */
 	bool carve_out;
 
+	/* enable large folio for readahead */
+	unsigned int ra_folio_order;
+
 #ifdef CONFIG_F2FS_FS_COMPRESSION
 	struct kmem_cache *page_array_slab;	/* page array entry */
 	unsigned int page_array_slab_size;	/* default page array slab size */
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index ccb477086444..bae02ca96c1f 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4287,6 +4287,7 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
 			NAT_ENTRY_PER_BLOCK));
 	sbi->allocate_section_hint = le32_to_cpu(raw_super->section_count);
 	sbi->allocate_section_policy = ALLOCATE_FORWARD_NOHINT;
+	sbi->ra_folio_order = 0;
 	F2FS_ROOT_INO(sbi) = le32_to_cpu(raw_super->root_ino);
 	F2FS_NODE_INO(sbi) = le32_to_cpu(raw_super->node_ino);
 	F2FS_META_INO(sbi) = le32_to_cpu(raw_super->meta_ino);
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index c42f4f979d13..2537a25986a6 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -906,6 +906,13 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
 		return count;
 	}
 
+	if (!strcmp(a->attr.name, "ra_folio_order")) {
+		if (t < 0 || t > MAX_PAGECACHE_ORDER)
+			return -EINVAL;
+		sbi->ra_folio_order = t;
+		return count;
+	}
+
 	*ui = (unsigned int)t;
 
 	return count;
@@ -1180,6 +1187,7 @@ F2FS_SBI_GENERAL_RW_ATTR(migration_window_granularity);
 F2FS_SBI_GENERAL_RW_ATTR(dir_level);
 F2FS_SBI_GENERAL_RW_ATTR(allocate_section_hint);
 F2FS_SBI_GENERAL_RW_ATTR(allocate_section_policy);
+F2FS_SBI_GENERAL_RW_ATTR(ra_folio_order);
 #ifdef CONFIG_F2FS_IOSTAT
 F2FS_SBI_GENERAL_RW_ATTR(iostat_enable);
 F2FS_SBI_GENERAL_RW_ATTR(iostat_period_ms);
@@ -1422,6 +1430,7 @@ static struct attribute *f2fs_attrs[] = {
 	ATTR_LIST(reserved_pin_section),
 	ATTR_LIST(allocate_section_hint),
 	ATTR_LIST(allocate_section_policy),
+	ATTR_LIST(ra_folio_order),
 	NULL,
 };
 ATTRIBUTE_GROUPS(f2fs);