[v1] mm: limit filemap_fault readahead to VMA boundaries

[PATCH] mm: limit filemap_fault readahead to VMA boundaries

Posted by Frederick Mayle 1 month, 3 weeks ago

When a file mapping covers a strict subset of a file, an access to the
mapping can trigger readahead of file pages outside the mapped region.
Readahead is meant to prefetch pages likely to be accessed soon, but
these pages aren't reachable through the same mapping, so it's fair to
say we don't have a good indicator they'll be accessed soon. Take an ELF file
for example: An access to the end of a program's read-only segment isn't
a sign that nearby file contents will be accessed next (they are likely
to be mapped discontiguously, or not at all). The pressure from loading
these pages into the cache can evict more useful pages.

To improve the behavior, make three changes:

* Introduce a new readahead_control option, max_index, as a hard limit
  on the readahead. The existing file_ra_state->size can't be used as a
  limit; it is more of a hint and can be increased by various
  heuristics.
* Set readahead_control->max_index to the end of the VMA in all of the
  readahead paths that can be triggered from a fault on a file mapping
  (both "sync" and "async" readahead).
* Limit the read-around range start to the VMA's start.
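The effect of these changes on the mmap read-around window can be modelled in userspace. This is a simplified sketch, not the kernel code: the names `ra_window` and `clamp_fault_window` are hypothetical, page indices are plain `unsigned long` values standing in for `pgoff_t`, and only the read-around case (a window of `ra_pages` centred on the fault, e.g. 32 pages for 128K readahead with 4 KiB pages) is modelled.

```c
#include <assert.h>

/*
 * Userspace model of the fault-time read-around window, not kernel code.
 * Page indices are unsigned long, standing in for pgoff_t.
 */
struct ra_window {
	unsigned long start;	/* first file page to read */
	unsigned long last;	/* last file page to read, inclusive */
};

/*
 * fault_pgoff: file page index of the faulting address
 * ra_pages:    readahead size in pages (e.g. 32 for 128K with 4K pages)
 * vm_pgoff:    file page index at which the VMA starts
 * vma_npages:  pages spanned by the VMA, as vma_pages() would report
 * file_pages:  pages in the file, i.e. last index + 1
 */
static struct ra_window clamp_fault_window(unsigned long fault_pgoff,
					   unsigned long ra_pages,
					   unsigned long vm_pgoff,
					   unsigned long vma_npages,
					   unsigned long file_pages)
{
	/* max_index: last file page the VMA maps (the new hard limit). */
	unsigned long max_index = vm_pgoff + vma_npages - 1;
	unsigned long end_index = file_pages - 1;
	struct ra_window w;

	assert(ra_pages > 0 && vma_npages > 0 && file_pages > 0);

	/* Read-around: centre the window on the fault, not below page 0... */
	w.start = fault_pgoff > ra_pages / 2 ? fault_pgoff - ra_pages / 2 : 0;
	/* ...and, with the patch, not below the start of the VMA. */
	if (w.start < vm_pgoff)
		w.start = vm_pgoff;

	/* End: min(last page of the file, max_index). */
	if (end_index > max_index)
		end_index = max_index;
	w.last = w.start + ra_pages - 1;
	if (w.last > end_index)
		w.last = end_index;
	return w;	/* last < start would mean there is nothing to read */
}
```

For example, a fault at file page 105 in a VMA covering pages 100..119 of a 1000-page file, with `ra_pages = 32`, yields the window 100..119; without the two clamps the read-around would cover pages 89..120.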

Note that these changes only affect readahead triggered in the context
of a fault; they do not affect readahead triggered by read syscalls. If
a user mixes the two types of accesses, the behavior is expected to be
the following: if a fault causes readahead and places a PG_readahead
marker and then a read(2) syscall hits the PG_readahead marker, the
resulting async readahead *will not* be limited to the VMA end.
Conversely, if a read(2) syscall places a PG_readahead marker and then a
fault hits the marker, the async readahead *will* be limited to the VMA
end.
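The asymmetry follows from where the limit lives: in the readahead_control built by whichever path hits the PG_readahead marker. A minimal model, assuming a simplified struct that carries only the new field; `ractl_model`, `syscall_ractl`, `fault_ractl` and `async_ra_last` are hypothetical names, and the real async window sizing is left out.

```c
#include <limits.h>

/* Model of readahead_control reduced to the field this patch adds. */
struct ractl_model {
	unsigned long max_index;	/* readahead never reads past this */
};

/* read(2) path: like DEFINE_READAHEAD's default, no VMA limit. */
static struct ractl_model syscall_ractl(void)
{
	struct ractl_model r = { .max_index = ULONG_MAX };
	return r;
}

/* Fault path: limit to the last file page covered by the VMA. */
static struct ractl_model fault_ractl(unsigned long vm_pgoff,
				      unsigned long vma_npages)
{
	struct ractl_model r = { .max_index = vm_pgoff + vma_npages - 1 };
	return r;
}

/*
 * Last page an async readahead of `size` pages starting at `start` may
 * read, given the file's last page. The caller that hit the marker
 * supplies `r`, so the limit follows the trigger, not the marker.
 */
static unsigned long async_ra_last(struct ractl_model r, unsigned long start,
				   unsigned long size, unsigned long file_last)
{
	unsigned long limit = file_last < r.max_index ? file_last : r.max_index;
	unsigned long last = start + size - 1;

	return last < limit ? last : limit;
}
```

With a VMA covering pages 0..63 of a 1000-page file, an async readahead of 32 pages starting at page 56 stops at page 63 when a fault hits the marker, but runs to page 87 when read(2) hits it.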

There is an edge case that the above motivation glosses over: A single
file mapping might be backed by multiple VMAs. For example, a whole file
could be mapped RW, then part of the mapping made RO using mprotect.
This patch would hurt performance of a sequential read of such a
mapping, the degree depending on how fragmented the VMAs are. A usage
pattern like that is likely rare and already suffering from sub-optimal
performance because, e.g., the fragmented VMAs limit the fault-around,
so each VMA boundary in a sequential read would cause a minor fault.
Still, this would make it worse. See a previous discussion of this topic
at [1].
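The multi-VMA case can be reproduced from userspace. This sketch is an illustration only (the temporary file location, its three-page size, and the helper names are assumptions): it maps a file read-write with MAP_SHARED, makes the middle page read-only with mprotect(), and counts the entries in /proc/self/maps that overlap the mapping. Because a VMA carries a single set of protection flags, the one file mapping ends up as three VMAs, and with this patch fault-triggered readahead would stop at each boundary.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the VMAs listed in /proc/self/maps that overlap [lo, hi). */
static int count_vmas_in_range(unsigned long lo, unsigned long hi)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[4352];
	int n = 0;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		unsigned long start, end;

		if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
		    start < hi && end > lo)
			n++;
	}
	fclose(f);
	return n;
}

/*
 * Map a 3-page temporary file RW, then make the middle page RO.
 * Returns the number of VMAs backing the mapping, or -1 on error.
 */
static int split_mapping_demo(void)
{
	long ps = sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/vma-split-XXXXXX";	/* assumed writable */
	int fd = mkstemp(path);
	char *p;
	int n;

	if (fd < 0)
		return -1;
	unlink(path);	/* the file lives on while fd/mapping exist */
	if (ftruncate(fd, 3 * ps) != 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, 3 * ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping holds its own file reference */
	if (p == MAP_FAILED)
		return -1;

	/* Splits the single mapping into RW | RO | RW. */
	if (mprotect(p + ps, ps, PROT_READ) != 0) {
		munmap(p, 3 * ps);
		return -1;
	}
	n = count_vmas_in_range((unsigned long)p, (unsigned long)p + 3 * ps);
	munmap(p, 3 * ps);
	return n;
}
```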

Tested by mapping and reading a small subset of a large file, then using
the cachestat syscall to verify the number of cached pages didn't exceed
the mapping size.
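The test program itself was not posted. A hypothetical reconstruction of the procedure described above, under these assumptions: the file is created in /tmp (a disk-backed filesystem is needed for a meaningful result, since tmpfs pages are not read from storage), the uapi structs are mirrored locally as `cs_range`/`cs_stat` in case the installed headers predate cachestat (added in Linux 6.5), and `__NR_cachestat` falls back to 451, its number in the generic syscall table. Per the description above, a patched kernel is expected to report an `nr_cache` no larger than `map_len`.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_cachestat
#define __NR_cachestat 451	/* assumption: generic syscall table */
#endif

/* Local mirrors of uapi struct cachestat_range / struct cachestat. */
struct cs_range { uint64_t off, len; };
struct cs_stat {
	uint64_t nr_cache, nr_dirty, nr_writeback;
	uint64_t nr_evicted, nr_recently_evicted;
};

/*
 * Write a file_pages-page file, try to drop it from the page cache, map
 * only pages [map_off, map_off + map_len), fault each mapped page in,
 * then return nr_cache for the whole file (-1 with errno on failure).
 */
static long cached_after_partial_map(long file_pages, long map_off,
				     long map_len)
{
	long ps = sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/ra-test-XXXXXX";	/* hypothetical location */
	struct cs_range range = { 0, (uint64_t)file_pages * ps };
	struct cs_stat cs;
	volatile char sink = 0;
	char *buf = NULL, *p;
	long i, ret = -1;
	int saved, fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	buf = calloc(1, ps);
	if (!buf)
		goto out;
	for (i = 0; i < file_pages; i++)
		if (write(fd, buf, ps) != ps)
			goto out;
	/* Best effort: write back, then drop the now-clean pages. */
	fsync(fd);
	posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

	p = mmap(NULL, map_len * ps, PROT_READ, MAP_SHARED, fd, map_off * ps);
	if (p == MAP_FAILED)
		goto out;
	for (i = 0; i < map_len; i++)
		sink += p[i * ps];	/* read fault on each mapped page */
	munmap(p, map_len * ps);

	if (syscall(__NR_cachestat, fd, &range, &cs, 0) == 0)
		ret = (long)cs.nr_cache;
out:
	saved = errno;	/* keep the failing call's errno for the caller */
	free(buf);
	close(fd);
	errno = saved;
	return ret;
}
```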

In practical scenarios, the effect depends on the specific file and
usage. Sometimes there is no effect at all, but, for some ELF files in
Android, we see ~20% fewer pages pulled into the cache.

A comprehensive performance evaluation hasn't been done, but, in
addition to the anecdotal memory savings mentioned above, a benchmark
was run with fio 3.38, showing neutral-looking results:

    /data/local/tmp/fio --version

    fio --name=mmap_test --ioengine=mmap --rw=read --bs=4k \
        --offset=1G --size=1G --filesize=3G --numjobs=1 \
        --filename=testfile.bin

        Before: 4366.6 MiB/s (avg of 3459, 4592, 4613, 4697, 4472)
        After:  4444.0 MiB/s (avg of 4633, 4655, 4511, 4571, 3850)
                +1.7%

    Same, with --ioengine=mmap --rw=randread

        Before: 445.6 MiB/s  (avg of 446, 447, 442, 452, 441)
        After:  447.0 MiB/s  (avg of 447, 446, 446, 451, 445)
                +0.3%

    Same, with --ioengine=psync --rw=read

        Before: 3086.6 MiB/s (avg of 3122, 3094, 3066, 3094, 3057)
        After:  3084.6 MiB/s (avg of 3039, 3103, 3103, 3084, 3094)
                -0.06%

    Same, with --ioengine=psync --rw=randread

        Before: 2226.4 MiB/s (avg of 2256, 2183, 2207, 2265, 2221)
        After:  2231.4 MiB/s (avg of 2236, 2241, 2236, 2193, 2251)
                +0.2%
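As a cross-check on the numbers above, the sketch below recomputes summary statistics from the five runs of a benchmark. It reports the mean, as the text does, and adds the median as an outlier-robust comparison that the original does not report; the helper names are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Arithmetic mean of n > 0 samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Median of n > 0 samples; returns 0.0 if allocation fails. */
static double median(const double *v, size_t n)
{
	double *tmp = malloc(n * sizeof(*tmp));
	double m;

	if (!tmp)
		return 0.0;
	memcpy(tmp, v, n * sizeof(*tmp));
	qsort(tmp, n, sizeof(*tmp), cmp_double);
	m = (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
	free(tmp);
	return m;
}

/* Relative change from before to after, in percent. */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For the mmap sequential-read runs, the means differ by about +1.77%, while the medians (4592 vs. 4571 MiB/s) differ by about -0.46%. Without the slowest run in each set (3459 and 3850 MiB/s), the means are 4593.5 and 4592.5 MiB/s, which fits reading the results as neutral.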

[1] https://lore.kernel.org/all/ivnv2crd3et76p2nx7oszuqhzzah756oecn5yuykzqfkqzoygw@yvnlkhjjssoz/

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: android-mm@google.com
Cc: kernel-team@android.com
Signed-off-by: Frederick Mayle <fmayle@google.com>
---
 include/linux/pagemap.h | 2 ++
 mm/filemap.c            | 4 ++++
 mm/readahead.c          | 5 ++++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ec442af3f886..cc628050bc5e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1366,6 +1366,7 @@ struct readahead_control {
 	bool dropbehind;
 	bool _workingset;
 	unsigned long _pflags;
+	unsigned long max_index; /* limit readahead to i<=max_index */
 };
 
 #define DEFINE_READAHEAD(ractl, f, r, m, i)				\
@@ -1374,6 +1375,7 @@ struct readahead_control {
 		.mapping = m,						\
 		.ra = r,						\
 		._index = i,						\
+		.max_index = ULONG_MAX,					\
 	}
 
 #define VM_READAHEAD_PAGES	(SZ_128K / PAGE_SIZE)
diff --git a/mm/filemap.c b/mm/filemap.c
index 4e636647100c..d2f6bef12f58 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3314,6 +3314,8 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	bool force_thp_readahead = false;
 	unsigned short mmap_miss;
 
+	ractl.max_index = vmf->vma->vm_pgoff + vma_pages(vmf->vma) - 1;
+
 	/* Use the readahead code, even if readahead is disabled */
 	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
 	    (vm_flags & VM_HUGEPAGE) && HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER)
@@ -3396,6 +3398,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 		 * mmap read-around
 		 */
 		ra->start = max_t(long, 0, vmf->pgoff - ra->ra_pages / 2);
+		ra->start = max(ra->start, vmf->vma->vm_pgoff);
 		ra->size = ra->ra_pages;
 		ra->async_size = ra->ra_pages / 4;
 		ra->order = 0;
@@ -3438,6 +3441,7 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
 	}
 
 	if (folio_test_readahead(folio)) {
+		ractl.max_index = vmf->vma->vm_pgoff + vma_pages(vmf->vma) - 1;
 		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
 		page_cache_async_ra(&ractl, folio, ra->ra_pages);
 	}
diff --git a/mm/readahead.c b/mm/readahead.c
index 7b05082c89ea..95a424b2f3a3 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -324,6 +324,8 @@ static void do_page_cache_ra(struct readahead_control *ractl,
 		return;
 
 	end_index = (isize - 1) >> PAGE_SHIFT;
+	if (end_index > ractl->max_index)
+		end_index = ractl->max_index;
 	if (index > end_index)
 		return;
 	/* Don't read past the page containing the last byte of the file */
@@ -471,7 +473,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	pgoff_t start = readahead_index(ractl);
 	pgoff_t index = start;
 	unsigned int min_order = mapping_min_folio_order(mapping);
-	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
+	pgoff_t limit = min_t(pgoff_t, (i_size_read(mapping->host) - 1) >> PAGE_SHIFT,
+				       ractl->max_index);
 	pgoff_t mark = index + ra->size - ra->async_size;
 	unsigned int nofs;
 	int err = 0;

base-commit: db2a1695b2b6feb071b47b72e61d0359bf1524bf
-- 
2.54.0.rc1.555.g9c883467ad-goog

Re: [PATCH] mm: limit filemap_fault readahead to VMA boundaries

Posted by Pedro Falcato 1 month, 3 weeks ago

On Tue, Apr 21, 2026 at 05:56:07PM -0700, Frederick Mayle wrote:
> In practical scenarios, the effect depends on the specific file and
> usage. Sometimes there is no effect at all, but, for some ELF files in
> Android, we see ~20% fewer pages pulled into the cache.

Didn't Android have a gigantically modified RA window? Could this be why
you're seeing such large effects? Or is this no longer the case?

-- 
Pedro

Re: [PATCH] mm: limit filemap_fault readahead to VMA boundaries

Posted by Frederick Mayle 1 month, 3 weeks ago

On Wed, Apr 22, 2026 at 6:32 AM Pedro Falcato <pfalcato@suse.de> wrote:
>
> On Tue, Apr 21, 2026 at 05:56:07PM -0700, Frederick Mayle wrote:
> > In practical scenarios, the effect depends on the specific file and
> > usage. Sometimes there is no effect at all, but, for some ELF files in
> > Android, we see ~20% fewer pages pulled into the cache.
>
> Didn't Android have a gigantically modified RA window? Could this be why
> you're seeing such large effects? Or is this no longer the case?

On the device I used to test this, the relevant storage device had a readahead
size of 128 KiB. In general, it can be configured by device OEMs, so perhaps some
devices in the ecosystem have giant RA windows.

Android binaries can have a lot of padding and sometimes unrelated APK sections
adjacent in the same file (see https://lwn.net/Articles/1016860/). Those may be
the source of the effect, but I haven't verified.

AppImage files have some similarities: each one is an ELF file that includes a
mountable filesystem. Here is a quick test I ran on Debian with the latest
neovim AppImage (./cachestat is a binary that simply invokes the cachestat
syscall on a file):

    echo 3 >/proc/sys/vm/drop_caches && \
        ./cachestat ~/nvim-linux-x86_64.appimage && \
        ~/nvim-linux-x86_64.appimage -es >/dev/null && \
        ./cachestat ~/nvim-linux-x86_64.appimage

This patch reduces nr_cache from 4134 to 2131.


Re: [PATCH] mm: limit filemap_fault readahead to VMA boundaries

Posted by Matthew Wilcox 1 month, 3 weeks ago

On Tue, Apr 21, 2026 at 05:56:07PM -0700, Frederick Mayle wrote:
> When a file mapping covers a strict subset of a file, an access to the
> mapping can trigger readahead of file pages outside the mapped region.
> Readahead is meant to prefetch pages likely to be accessed soon, but
> these pages aren't accessible via the same means, so it's fair to say we
> don't have a good indicator they'll be accessed soon. Take an ELF file
> for example: An access to the end of a program's read-only segment isn't
> a sign that nearby file contents will be accessed next (they are likely
> to be mapped discontiguously, or not at all). The pressure from loading
> these pages into the cache can evict more useful pages.

The problem is that we might (for example) use mprotect() to mark a
portion of the file as being unmodifiable, but nevertheless still want
to prefetch through it (since it will be read, just not written).  I'm
sure this solves your problem, but I'm not sure it covers all use cases.

Re: [PATCH] mm: limit filemap_fault readahead to VMA boundaries

Posted by Jan Kara 1 month, 3 weeks ago

On Wed 22-04-26 13:30:47, Matthew Wilcox wrote:
> The problem is that we might (for example) use mprotect() to mark a
> portion of the file as being unmodifiable, but nevertheless still want
> to prefetch through it (since it will be read, just not written).  I'm
> sure this solves your problem, but I'm not sure it covers all use cases.

Well, I'm not sure whether all the usecases are covered either but is what
you describe above something you'd expect people to commonly do? In general
sequential reading through mmap seems to be already relatively rare...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Re: [PATCH] mm: limit filemap_fault readahead to VMA boundaries

Posted by Frederick Mayle 1 month, 3 weeks ago

On Wed, Apr 22, 2026 at 5:56 AM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 22-04-26 13:30:47, Matthew Wilcox wrote:
> > The problem is that we might (for example) use mprotect() to mark a
> > portion of the file as being unmodifiable, but nevertheless still want
> > to prefetch through it (since it will be read, just not written).  I'm
> > sure this solves your problem, but I'm not sure it covers all use cases.
>
> Well, I'm not sure whether all the usecases are covered either but is what
> you describe above something you'd expect people to commonly do? In general
> sequential reading through mmap seems to be already relatively rare...

I'm curious whether anyone knows of a use case like this. I'd think that a user
making heavy use of mprotect would be advanced enough to be a step away from
(or a few steps beyond) manually triggering readahead with madvise or
similar.
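For a user who does know the access pattern, the manual route could look like the sketch below: a sequential scan over a read-only mapping that issues madvise(MADV_WILLNEED) one window ahead of the reader. madvise() walks every VMA in the given range, so the hint also crosses boundaries created by mprotect(). The function name, the window policy, and ignoring madvise() errors (it is only a hint) are assumptions of this sketch.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map `path` read-only and sum its bytes sequentially, asking the kernel
 * via madvise(MADV_WILLNEED) to read the next `window` bytes before the
 * scan reaches them. `window` must be a non-zero multiple of the page
 * size so every advised address stays page aligned.
 * Returns 0 on success (sum in *sum), -1 on error.
 */
static int scan_with_prefetch(const char *path, size_t window,
			      unsigned long *sum)
{
	long ps = sysconf(_SC_PAGESIZE);
	struct stat st;
	unsigned char *p;
	size_t size, off;
	int fd;

	if (ps <= 0 || window == 0 || window % (size_t)ps != 0)
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) != 0 || st.st_size <= 0) {
		close(fd);
		return -1;
	}
	size = (size_t)st.st_size;
	p = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping holds its own reference to the file */
	if (p == MAP_FAILED)
		return -1;

	/* Hints only: a failed madvise() does not affect the result. */
	(void)madvise(p, size < window ? size : window, MADV_WILLNEED);
	*sum = 0;
	for (off = 0; off < size; off++) {
		if (off % window == 0 && off + window < size) {
			/* Entering a new window: prefetch the one after it. */
			size_t next = off + window;
			size_t len = size - next < window ? size - next : window;

			(void)madvise(p + next, len, MADV_WILLNEED);
		}
		*sum += p[off];
	}
	munmap(p, size);
	return 0;
}
```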

In recent history, commit 38b0ece6d763 ("mm/filemap: allow arch to request
folio size for exec memory") disabled readahead for VM_EXEC mappings and there
doesn't seem to have been pushback yet. Of course, non-exec mappings almost
certainly have more varied uses.

Re: [PATCH] mm: limit filemap_fault readahead to VMA boundaries

Posted by Jan Kara 1 month, 3 weeks ago

On Tue 21-04-26 17:56:07, Frederick Mayle wrote:
> Signed-off-by: Frederick Mayle <fmayle@google.com>

Looks good to me. Thanks! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
