This patch introduces a new POSIX_FADV_MLOCK which 1) invalidates the range of
cached pages, 2) sets the mapping as inaccessible, 3) POSIX_FADV_WILLNEED loads
pages directly to the inaccessible mapping.
The inaccessible pages will be invalidated by evict_inode or explicit munlock().
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
include/uapi/linux/fadvise.h | 2 ++
mm/fadvise.c | 14 ++++++++++++++
2 files changed, 16 insertions(+)
diff --git a/include/uapi/linux/fadvise.h b/include/uapi/linux/fadvise.h
index 0862b87434c2..06018688b99b 100644
--- a/include/uapi/linux/fadvise.h
+++ b/include/uapi/linux/fadvise.h
@@ -19,4 +19,6 @@
#define POSIX_FADV_NOREUSE 5 /* Data will be accessed once. */
#endif

+#define POSIX_FADV_MLOCK 8 /* Load pages into inaccessible map. */
+
#endif /* FADVISE_H_INCLUDED */
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 588fe76c5a14..849b151d2024 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -56,6 +56,7 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
case POSIX_FADV_WILLNEED:
case POSIX_FADV_NOREUSE:
case POSIX_FADV_DONTNEED:
+ case POSIX_FADV_MLOCK:
/* no bad return value, but ignore advice */
break;
default:
@@ -93,6 +94,19 @@ int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
file->f_mode &= ~FMODE_RANDOM;
spin_unlock(&file->f_lock);
break;
+ case POSIX_FADV_MLOCK:
+ /* Remove the cached pages. */
+ if (!mapping_unevictable(mapping)) {
+ invalidate_inode_pages2_range(mapping,
+ offset >> PAGE_SHIFT,
+ (offset + len - 1) >> PAGE_SHIFT);
+
+ /* set the mapping is unevictable */
+ filemap_invalidate_lock(mapping);
+ mapping_set_inaccessible(mapping);
+ filemap_invalidate_unlock(mapping);
+ }
+ fallthrough;
case POSIX_FADV_WILLNEED:
/* First and last PARTIAL page! */
start_index = offset >> PAGE_SHIFT;
--
2.52.0.487.g5c8c507ade-goog
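For illustration only, a userspace caller of the proposed advice might look like the sketch below. It assumes the value 8 from the uapi hunk above (no released kernel header defines POSIX_FADV_MLOCK), and the helper name `fadvise_mlock_file()` is hypothetical. On kernels without this patch, generic_fadvise() rejects the unknown advice with EINVAL, so the sketch falls back to POSIX_FADV_WILLNEED, which only starts readahead and locks nothing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Proposed value from the patch; not present in released uapi headers. */
#ifndef POSIX_FADV_MLOCK
#define POSIX_FADV_MLOCK 8
#endif

/*
 * Hypothetical helper: apply the proposed POSIX_FADV_MLOCK to a whole file.
 * If the kernel rejects it with EINVAL (unpatched kernel), fall back to
 * POSIX_FADV_WILLNEED, which only triggers readahead.
 * Returns 0 or an errno value; *used_mlock reports which advice succeeded.
 */
int fadvise_mlock_file(const char *path, int *used_mlock)
{
	struct stat st;
	int err;
	int fd = open(path, O_RDONLY | O_CLOEXEC);

	*used_mlock = 0;
	if (fd < 0)
		return errno;
	if (fstat(fd, &st) < 0) {
		err = errno;
		close(fd);
		return err;
	}

	/* posix_fadvise() returns the error number instead of setting errno. */
	err = posix_fadvise(fd, 0, st.st_size, POSIX_FADV_MLOCK);
	if (err == 0)
		*used_mlock = 1;
	else if (err == EINVAL)
		err = posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);

	close(fd);
	return err;
}
```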
On Fri, Nov 21, 2025 at 03:27:18AM +0000, Jaegeuk Kim wrote:
> This patch introduces a new POSIX_FADV_MLOCK which 1) invalidates the range of
> cached pages, 2) sets the mapping as inaccessible, 3) POSIX_FADV_WILLNEED loads
> pages directly to the inaccessible mapping.

... what?

This seems like something which is completely different from mlock().
So it needs a different name.

But I don't understand the point of this, whatever it's called. Need
more information.
On 11/21, Matthew Wilcox wrote:
> On Fri, Nov 21, 2025 at 03:27:18AM +0000, Jaegeuk Kim wrote:
> > This patch introduces a new POSIX_FADV_MLOCK which 1) invalidates the range of
> > cached pages, 2) sets the mapping as inaccessible, 3) POSIX_FADV_WILLNEED loads
> > pages directly to the inaccessible mapping.
>
> ... what?
>
> This seems like something which is completely different from mlock().
> So it needs a different name.
>
> But I don't understand the point of this, whatever it's called. Need
> more information.

So, the sequence that I'd like to optimize is mmap(MAP_POPULATE) followed
by mlock(). For example, mmap() takes 1 second to load 4GB data, and mlock()
takes 330ms additionally in order to migrate all the pages into inaccessible
map, IIUC.

So, I'm thinking of combining the two operations into a single fadvise() call,
under whatever advice name fits. Does that make sense?
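The as-is sequence described above can be sketched as follows, timing the MAP_POPULATE fault-in and the extra mlock() pass separately. This is a minimal sketch assuming a read-only MAP_SHARED mapping of a whole, non-empty file; `map_populate_then_mlock()` is a hypothetical helper, and mlock() is subject to RLIMIT_MEMLOCK for unprivileged callers. It does not reproduce the 1 s / 330 ms figures quoted above; those depend on the device, file size and kernel.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

static double elapsed_ms(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) * 1e3 + (b->tv_nsec - a->tv_nsec) / 1e6;
}

/*
 * Hypothetical helper: mmap(MAP_POPULATE) a whole file read-only, then
 * mlock() it, timing both steps with CLOCK_MONOTONIC. On success the
 * caller owns *addr/*len (release with munmap()). Returns 0 or an errno
 * value; unprivileged callers typically get ENOMEM or EPERM from mlock()
 * when the range exceeds RLIMIT_MEMLOCK.
 */
int map_populate_then_mlock(const char *path, void **addr, size_t *len,
			    double *populate_ms, double *mlock_ms)
{
	struct timespec t0, t1, t2;
	struct stat st;
	void *p;
	int err = 0;
	int fd = open(path, O_RDONLY | O_CLOEXEC);

	if (fd < 0)
		return errno;
	if (fstat(fd, &st) < 0) {
		err = errno;
		goto out_close;
	}
	if (st.st_size == 0) {
		err = EINVAL;	/* mmap() rejects a zero length */
		goto out_close;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	/* MAP_POPULATE prefaults the page tables, reading the file in. */
	p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED | MAP_POPULATE, fd, 0);
	if (p == MAP_FAILED) {
		err = errno;
		goto out_close;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	/* Second pass over the same, already-populated range to lock it. */
	if (mlock(p, st.st_size) < 0) {
		err = errno;
		munmap(p, st.st_size);
		goto out_close;
	}
	clock_gettime(CLOCK_MONOTONIC, &t2);

	*addr = p;
	*len = st.st_size;
	*populate_ms = elapsed_ms(&t0, &t1);
	*mlock_ms = elapsed_ms(&t1, &t2);
out_close:
	close(fd);	/* an established mapping stays valid after close() */
	return err;
}
```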
On Fri, Nov 21, 2025 at 04:46:14AM +0000, Jaegeuk Kim wrote:
> On 11/21, Matthew Wilcox wrote:
> > This seems like something which is completely different from mlock().
> > So it needs a different name.
> >
> > But I don't understand the point of this, whatever it's called. Need
> > more information.
>
> So, the sequence that I'd like to optimize is mmap(MAP_POPULATE) followed
> by mlock(). For example, mmap() takes 1 second to load 4GB data, and mlock()
> takes 330ms additionally in order to migrate all the pages into inaccessible
> map, IIUC.

Oh, so the MLOCK part is right, but the inaccessible() part is wrong.
Inaccessible is special weird guest_memfd crap that has all kinds of
side-effects that you don't want.

Wouldn't you get the same effect by calling mlock2(MLOCK_ONFAULT) and
then calling readahead() for the desired range?
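The suggestion above can be sketched like this. The helper name `map_lock_onfault_readahead()` is hypothetical, and the sketch assumes glibc 2.27 or later for the mlock2() wrapper. Note what the combination does: MLOCK_ONFAULT locks pages already resident in the mapping and marks the rest of the range to be locked when it is faulted in, while readahead() only fills the page cache without mapping anything, so the remaining pages become locked on first access through the mapping.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the whole file, mark the mapping MLOCK_ONFAULT,
 * then fill the page cache with readahead(2). On success the caller owns
 * *addr/*len and releases it with munmap(). Returns 0 or an errno value.
 */
int map_lock_onfault_readahead(const char *path, void **addr, size_t *len)
{
	struct stat st;
	void *p;
	int err = 0;
	int fd = open(path, O_RDONLY | O_CLOEXEC);

	if (fd < 0)
		return errno;
	if (fstat(fd, &st) < 0) {
		err = errno;
		goto out_close;
	}
	if (st.st_size == 0) {
		err = EINVAL;	/* mmap() rejects a zero length */
		goto out_close;
	}

	p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		err = errno;
		goto out_close;
	}
	/*
	 * mlock2(MLOCK_ONFAULT) does not populate anything; readahead() reads
	 * the file into the page cache but does not map it into this VMA.
	 */
	if (mlock2(p, st.st_size, MLOCK_ONFAULT) < 0 ||
	    readahead(fd, 0, st.st_size) < 0) {
		err = errno;
		munmap(p, st.st_size);	/* unmapping also drops the lock */
		goto out_close;
	}

	*addr = p;
	*len = st.st_size;
out_close:
	close(fd);
	return err;
}
```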
On 11/21, Matthew Wilcox wrote:
> Oh, so the MLOCK part is right, but the inaccessible() part is wrong.
> Inaccessible is special weird guest_memfd crap that has all kinds of
> side-effects that you don't want.
>
> Wouldn't you get the same effect by calling mlock2(MLOCK_ONFAULT) and
> then calling readahead() for the desired range?

Oh, thank you. Let me try.
On 11/21, Jaegeuk Kim wrote:
> On 11/21, Matthew Wilcox wrote:
> > Wouldn't you get the same effect by calling mlock2(MLOCK_ONFAULT) and
> > then calling readahead() for the desired range?
>
> Oh, thank you. Let me try.

After checking the code and experiment, I don't think that gives what we need.
That flag skips populate_vma_page_range only, but we need to allocate pages
in the inaccessible mapping and fill the pages afterwards.
On Fri, Nov 21, 2025 at 07:52:02PM +0000, Jaegeuk Kim wrote:
> After checking the code and experiment, I don't think that gives what we need.
> That flag skips populate_vma_page_range only, but we need to allocate pages
> in the inaccessible mapping and fill the pages afterwards.

Then either I don't understand what you're trying to do, or you don't
understand what the inaccessible mapping is for. Is this just for
speeding up mlock() as you suggested earlier, or are you genuinely
trying to do something with the inaccessible mapping?
On 11/21, Matthew Wilcox wrote:
> Then either I don't understand what you're trying to do, or you don't
> understand what the inaccessible mapping is for. Is this just for
> speeding up mlock() as you suggested earlier, or are you genuinely
> trying to do something with the inaccessible mapping?

The latter. I'd like to propose a new read flow with the inaccessible mapping.

As-Is:
mmap() -> fadvise(fd, POSIX_FADV_WILLNEED) -> mlock()

1. fadvise() proposal
mmap() -> fadvise(fd, POSIX_FADV_MLOCK)
: all the pages will be loaded into inaccessible page cache directly

2. mlock2() proposal
mmap() -> mlock2(MLOCK_ONFAULT) -> madvise(MADV_POPULATE_READ)

If you mean #2, I need to find whether we can get the space for madvise, since
we have only fd when reading the pages. And, also I need to find a way to handle
the folio order directly instead of starting from 0 in madvise() path.
Let me think about it.
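Proposal #2 can be sketched as a helper that operates on an existing mapping. The name `lock_onfault_populate_read()` is hypothetical; the sketch assumes glibc 2.27+ for mlock2() and Linux 5.14+ for MADV_POPULATE_READ, whose uapi value 22 is defined locally in case the libc headers predate it. MADV_POPULATE_READ prefaults the range as if it were read, so on an MLOCK_ONFAULT mapping each faulted page should also be locked. Unlike proposal #1, it needs the mapped address rather than just the fd, which is the limitation raised above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22	/* Linux >= 5.14 uapi value */
#endif

/*
 * Hypothetical helper for proposal #2: mark [addr, addr + len) as
 * lock-on-fault, then prefault it readably so every page is faulted in
 * (and therefore locked) in one call. On failure the lock is dropped
 * again with munlock(). Returns 0 or an errno value; pre-5.14 kernels
 * reject MADV_POPULATE_READ with EINVAL.
 */
int lock_onfault_populate_read(void *addr, size_t len)
{
	int err;

	if (mlock2(addr, len, MLOCK_ONFAULT) < 0)
		return errno;
	if (madvise(addr, len, MADV_POPULATE_READ) < 0) {
		err = errno;
		munlock(addr, len);
		return err;
	}
	return 0;
}
```

With a file mapping, the caller would mmap() the fd first and pass the returned address and length.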
On Fri, Nov 21, 2025 at 09:32:12PM +0000, Jaegeuk Kim wrote:
> On 11/21, Matthew Wilcox wrote:
> > Then either I don't understand what you're trying to do, or you don't
> > understand what the inaccessible mapping is for. Is this just for
> > speeding up mlock() as you suggested earlier, or are you genuinely
> > trying to do something with the inaccessible mapping?
>
> The latter. I'd like to propose a new read flow with the inaccessible mapping.

You REALLY REALLY REALLY need to explain what you're doing because this
all sounds completely bogus. The inaccessible mapping is something
special that guest_memfd does. But here you are talking about it like
it's some kind of normal filesystem thing.

So, from the top. What are you trying to accomplish? Starting from
"We have application A. It wants to ..."

> As-Is:
> mmap() -> fadvise(fd, POSIX_FADV_WILLNEED) -> mlock()
>
> 1. fadvise() proposal
> mmap() -> fadvise(fd, POSIX_FADV_MLOCK)
> : all the pages will be loaded into inaccessible page cache directly
>
> 2. mlock2() proposal
> mmap() -> mlock2(MLOCK_ONFAULT) -> madvise(MADV_POPULATE_READ)
>
> If you mean #2, I need to find whether we can get the space for madvise, since
> we have only fd when reading the pages. And, also I need to find a way to handle
> the folio order directly instead of starting from 0 in madvise() path.
> Let me think about it.