From: Jaegeuk Kim
To: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-mm@kvack.org, Matthew Wilcox
Cc: Jaegeuk Kim
Subject: [PATCH 1/4] mm/readahead: fix the broken readahead for POSIX_FADV_WILLNEED
Date: Mon, 1 Dec 2025 21:01:24 +0000
Message-ID: <20251201210152.909339-2-jaegeuk@kernel.org>
X-Mailer: git-send-email 2.52.0.107.ga0afd4fd5b-goog
In-Reply-To: <20251201210152.909339-1-jaegeuk@kernel.org>
References: <20251201210152.909339-1-jaegeuk@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

This patch fixes the broken readahead flow for POSIX_FADV_WILLNEED, where
force_page_cache_ra() truncates the requested nr_to_read with the code
below:

	max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
	nr_to_read = min_t(unsigned long, nr_to_read, max_pages);

In other words, a single request can never read ahead more than max_pages,
which typically falls in the 2MB to 16MB range. Note that it does not make
sense to set ra->ra_pages to the entire file size; instead, fix the logic
so the whole requested range is read in device-sized chunks.
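For context, the truncation is visible from userspace with nothing more
than a posix_fadvise() call over a large file. Below is a minimal
reproducer sketch (not part of the patch) matching the traces that follow;
the default file path is a placeholder, and the advise:3 value in the f2fs
trace corresponds to POSIX_FADV_WILLNEED:

	/*
	 * Minimal reproducer sketch: ask the kernel to read a whole file
	 * ahead of time. Before this patch, each call is silently clamped
	 * to max(bdi->io_pages, ra->ra_pages) pages.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/stat.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		/* placeholder path, pass your own test file instead */
		const char *path = argc > 1 ? argv[1] : "/mnt/f2fs/testfile";
		struct stat st;
		int fd, err;

		fd = open(path, O_RDONLY);
		if (fd < 0 || fstat(fd, &st) < 0) {
			perror(path);
			return 1;
		}

		/* advise:3 in the trace below is POSIX_FADV_WILLNEED */
		err = posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);
		if (err)
			fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

		close(fd);
		return 0;
	}
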
Before:
f2fs_fadvise: dev = (252,16), ino = 14, i_size = 4294967296 offset:0, len:4294967296, advise:3
page_cache_ra_unbounded: dev=252:16 ino=e index=0 nr_to_read=512 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=512 nr_to_read=512 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=1024 nr_to_read=512 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=1536 nr_to_read=512 lookahead_size=0

After:
f2fs_fadvise: dev = (252,16), ino = 14, i_size = 4294967296 offset:0, len:4294967296, advise:3
page_cache_ra_unbounded: dev=252:16 ino=e index=0 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=2048 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=4096 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=6144 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=8192 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=10240 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=12288 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=14336 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=16384 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=18432 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=20480 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=22528 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=24576 nr_to_read=2048 lookahead_size=0
...
page_cache_ra_unbounded: dev=252:16 ino=e index=1042432 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=1044480 nr_to_read=2048 lookahead_size=0
page_cache_ra_unbounded: dev=252:16 ino=e index=1046528 nr_to_read=2048 lookahead_size=0

Cc: linux-mm@kvack.org
Cc: Matthew Wilcox (Oracle)
Signed-off-by: Jaegeuk Kim
---
 mm/readahead.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/mm/readahead.c b/mm/readahead.c
index 3a4b5d58eeb6..c0db049a5b7b 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -311,7 +311,7 @@ EXPORT_SYMBOL_GPL(page_cache_ra_unbounded);
  * behaviour which would occur if page allocations are causing VM writeback.
  * We really don't want to intermingle reads and writes like that.
  */
-static void do_page_cache_ra(struct readahead_control *ractl,
+static int do_page_cache_ra(struct readahead_control *ractl,
 		unsigned long nr_to_read, unsigned long lookahead_size)
 {
 	struct inode *inode = ractl->mapping->host;
@@ -320,45 +320,42 @@ static void do_page_cache_ra(struct readahead_control *ractl,
 	pgoff_t end_index;	/* The last page we want to read */
 
 	if (isize == 0)
-		return;
+		return -EINVAL;
 
 	end_index = (isize - 1) >> PAGE_SHIFT;
 	if (index > end_index)
-		return;
+		return -EINVAL;
 	/* Don't read past the page containing the last byte of the file */
 	if (nr_to_read > end_index - index)
 		nr_to_read = end_index - index + 1;
 
 	page_cache_ra_unbounded(ractl, nr_to_read, lookahead_size);
+	return 0;
 }
 
 /*
- * Chunk the readahead into 2 megabyte units, so that we don't pin too much
- * memory at once.
+ * Chunk the readahead per the block device capacity, and read all nr_to_read.
  */
 void force_page_cache_ra(struct readahead_control *ractl,
 		unsigned long nr_to_read)
 {
 	struct address_space *mapping = ractl->mapping;
-	struct file_ra_state *ra = ractl->ra;
 	struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
-	unsigned long max_pages;
+	unsigned long this_chunk;
 
 	if (unlikely(!mapping->a_ops->read_folio && !mapping->a_ops->readahead))
 		return;
 
 	/*
-	 * If the request exceeds the readahead window, allow the read to
-	 * be up to the optimal hardware IO size
+	 * Consider the optimal hardware IO size for the readahead chunk.
 	 */
-	max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
-	nr_to_read = min_t(unsigned long, nr_to_read, max_pages);
+	this_chunk = max_t(unsigned long, bdi->io_pages, ractl->ra->ra_pages);
+
 	while (nr_to_read) {
-		unsigned long this_chunk = (2 * 1024 * 1024) / PAGE_SIZE;
+		this_chunk = min_t(unsigned long, this_chunk, nr_to_read);
 
-		if (this_chunk > nr_to_read)
-			this_chunk = nr_to_read;
-		do_page_cache_ra(ractl, this_chunk, 0);
+		if (do_page_cache_ra(ractl, this_chunk, 0))
+			break;
 
 		nr_to_read -= this_chunk;
 	}
-- 
2.52.0.107.ga0afd4fd5b-goog
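
As a sanity check of the new loop arithmetic, here is a standalone
userspace model, not kernel code: max_t/min_t and do_page_cache_ra() are
replaced with plain C, and io_pages = 2048 / ra_pages = 512 are values
inferred from the traces above rather than taken from the patch. With a
4GiB file it prints the same index/nr_to_read sequence as the "After"
trace, ending at index=1046528:

	/*
	 * Userspace model of the fixed force_page_cache_ra() loop: issue
	 * max(io_pages, ra_pages)-sized chunks until the whole request
	 * has been consumed.
	 */
	#include <stdio.h>

	#define PAGE_SIZE	4096UL

	int main(void)
	{
		unsigned long io_pages = 2048, ra_pages = 512;	/* inferred */
		unsigned long nr_to_read = (4UL << 30) / PAGE_SIZE;	/* 4GiB */
		unsigned long index = 0;
		unsigned long this_chunk = io_pages > ra_pages ? io_pages : ra_pages;

		while (nr_to_read) {
			if (this_chunk > nr_to_read)
				this_chunk = nr_to_read;
			printf("index=%lu nr_to_read=%lu\n", index, this_chunk);
			index += this_chunk;
			nr_to_read -= this_chunk;
		}
		return 0;
	}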