[PATCH v2 3/5] mm: introduce VM_FAULT_UFFD_MINOR fault reason

Mike Rapoport posted 5 patches 6 days, 5 hours ago
There is a newer version of this series
[PATCH v2 3/5] mm: introduce VM_FAULT_UFFD_MINOR fault reason
Posted by Mike Rapoport 6 days, 5 hours ago
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

When a VMA is registered with userfaulfd in minor mode, its ->fault()
method should check if a folio exists in the page cache and if yes
->fault() should call handle_userfault(VM_UFFD_MISSING).

Instead of calling handle_userfault() directly from a specific ->fault()
implementation introduce new fault reason VM_FAULT_UFFD_MINOR that will
notify the core page fault handler that it should call
handle_userfaultfd(VM_UFFD_MISSING) to complete a page fault.

Replace a call to handle_userfault(VM_UFFD_MISSING) in shmem and use the
new VM_FAULT_UFFD_MINOR there instead.

For configurations that don't enable CONFIG_USERFAULTFD,
VM_FAULT_UFFD_MINOR is set to 0.

Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 include/linux/mm_types.h | 10 +++++++++-
 mm/memory.c              |  2 ++
 mm/shmem.c               |  2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 90e5790c318f..df71b057111b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1523,6 +1523,8 @@ typedef __bitwise unsigned int vm_fault_t;
  *				fsync() to complete (for synchronous page faults
  *				in DAX)
  * @VM_FAULT_COMPLETED:		->fault completed, meanwhile mmap lock released
+ * @VM_FAULT_UFFD_MINOR:	->fault did not modify page tables and needs
+ *				handle_userfault(VM_UFFD_MINOR) to complete
  * @VM_FAULT_HINDEX_MASK:	mask HINDEX value
  *
  */
@@ -1540,6 +1542,11 @@ enum vm_fault_reason {
 	VM_FAULT_DONE_COW       = (__force vm_fault_t)0x001000,
 	VM_FAULT_NEEDDSYNC      = (__force vm_fault_t)0x002000,
 	VM_FAULT_COMPLETED      = (__force vm_fault_t)0x004000,
+#ifdef CONFIG_USERFAULTFD
+	VM_FAULT_UFFD_MINOR	= (__force vm_fault_t)0x008000,
+#else
+	VM_FAULT_UFFD_MINOR	= (__force vm_fault_t)0x000000,
+#endif
 	VM_FAULT_HINDEX_MASK    = (__force vm_fault_t)0x0f0000,
 };
 
@@ -1564,7 +1571,8 @@ enum vm_fault_reason {
 	{ VM_FAULT_FALLBACK,            "FALLBACK" },	\
 	{ VM_FAULT_DONE_COW,            "DONE_COW" },	\
 	{ VM_FAULT_NEEDDSYNC,           "NEEDDSYNC" },	\
-	{ VM_FAULT_COMPLETED,           "COMPLETED" }
+	{ VM_FAULT_COMPLETED,           "COMPLETED" },	\
+	{ VM_FAULT_UFFD_MINOR,		"UFFD_MINOR" },	\
 
 struct vm_special_mapping {
 	const char *name;	/* The name, e.g. "[vdso]". */
diff --git a/mm/memory.c b/mm/memory.c
index b59ae7ce42eb..94acbac8cefb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5279,6 +5279,8 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
 	}
 
 	ret = vma->vm_ops->fault(vmf);
+	if (unlikely(ret & VM_FAULT_UFFD_MINOR))
+		return handle_userfault(vmf, VM_UFFD_MINOR);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
 			    VM_FAULT_DONE_COW)))
 		return ret;
diff --git a/mm/shmem.c b/mm/shmem.c
index e16c7c8c3e1e..a9a31c0b5979 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2461,7 +2461,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	if (folio && vma && userfaultfd_minor(vma)) {
 		if (!xa_is_value(folio))
 			folio_put(folio);
-		*fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
+		*fault_type = VM_FAULT_UFFD_MINOR;
 		return 0;
 	}
 
-- 
2.50.1
Re: [PATCH v2 3/5] mm: introduce VM_FAULT_UFFD_MINOR fault reason
Posted by Nikita Kalyazin 5 days, 7 hours ago

On 25/11/2025 18:38, Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> 
> When a VMA is registered with userfaulfd in minor mode, its ->fault()
> method should check if a folio exists in the page cache and if yes
> ->fault() should call handle_userfault(VM_UFFD_MISSING).
> 
> Instead of calling handle_userfault() directly from a specific ->fault()
> implementation introduce new fault reason VM_FAULT_UFFD_MINOR that will
> notify the core page fault handler that it should call
> handle_userfaultfd(VM_UFFD_MISSING) to complete a page fault.
> 
> Replace a call to handle_userfault(VM_UFFD_MISSING) in shmem and use the
> new VM_FAULT_UFFD_MINOR there instead.
> 
> For configurations that don't enable CONFIG_USERFAULTFD,
> VM_FAULT_UFFD_MINOR is set to 0.
> 
> Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
>   include/linux/mm_types.h | 10 +++++++++-
>   mm/memory.c              |  2 ++
>   mm/shmem.c               |  2 +-
>   3 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 90e5790c318f..df71b057111b 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1523,6 +1523,8 @@ typedef __bitwise unsigned int vm_fault_t;
>    *                             fsync() to complete (for synchronous page faults
>    *                             in DAX)
>    * @VM_FAULT_COMPLETED:                ->fault completed, meanwhile mmap lock released
> + * @VM_FAULT_UFFD_MINOR:       ->fault did not modify page tables and needs
> + *                             handle_userfault(VM_UFFD_MINOR) to complete
>    * @VM_FAULT_HINDEX_MASK:      mask HINDEX value
>    *
>    */
> @@ -1540,6 +1542,11 @@ enum vm_fault_reason {
>          VM_FAULT_DONE_COW       = (__force vm_fault_t)0x001000,
>          VM_FAULT_NEEDDSYNC      = (__force vm_fault_t)0x002000,
>          VM_FAULT_COMPLETED      = (__force vm_fault_t)0x004000,
> +#ifdef CONFIG_USERFAULTFD
> +       VM_FAULT_UFFD_MINOR     = (__force vm_fault_t)0x008000,
> +#else
> +       VM_FAULT_UFFD_MINOR     = (__force vm_fault_t)0x000000,
> +#endif
>          VM_FAULT_HINDEX_MASK    = (__force vm_fault_t)0x0f0000,
>   };
> 
> @@ -1564,7 +1571,8 @@ enum vm_fault_reason {
>          { VM_FAULT_FALLBACK,            "FALLBACK" },   \
>          { VM_FAULT_DONE_COW,            "DONE_COW" },   \
>          { VM_FAULT_NEEDDSYNC,           "NEEDDSYNC" },  \
> -       { VM_FAULT_COMPLETED,           "COMPLETED" }
> +       { VM_FAULT_COMPLETED,           "COMPLETED" },  \
> +       { VM_FAULT_UFFD_MINOR,          "UFFD_MINOR" }, \

It looks like we have to keep the last element comma-less, otherwise I'm 
seeing compile errors somewhere in fs/dax.c.

> 
>   struct vm_special_mapping {
>          const char *name;       /* The name, e.g. "[vdso]". */
> diff --git a/mm/memory.c b/mm/memory.c
> index b59ae7ce42eb..94acbac8cefb 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5279,6 +5279,8 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>          }
> 
>          ret = vma->vm_ops->fault(vmf);
> +       if (unlikely(ret & VM_FAULT_UFFD_MINOR))
> +               return handle_userfault(vmf, VM_UFFD_MINOR);
>          if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
>                              VM_FAULT_DONE_COW)))
>                  return ret;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index e16c7c8c3e1e..a9a31c0b5979 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2461,7 +2461,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>          if (folio && vma && userfaultfd_minor(vma)) {
>                  if (!xa_is_value(folio))
>                          folio_put(folio);
> -               *fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
> +               *fault_type = VM_FAULT_UFFD_MINOR;
>                  return 0;
>          }
> 
> --
> 2.50.1
>
Re: [PATCH v2 3/5] mm: introduce VM_FAULT_UFFD_MINOR fault reason
Posted by Liam R. Howlett 5 days, 8 hours ago
* Mike Rapoport <rppt@kernel.org> [251125 13:39]:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> 
> When a VMA is registered with userfaulfd in minor mode, its ->fault()
> method should check if a folio exists in the page cache and if yes
> ->fault() should call handle_userfault(VM_UFFD_MISSING).
> 
> Instead of calling handle_userfault() directly from a specific ->fault()
> implementation introduce new fault reason VM_FAULT_UFFD_MINOR that will
> notify the core page fault handler that it should call
> handle_userfaultfd(VM_UFFD_MISSING) to complete a page fault.
> 
> Replace a call to handle_userfault(VM_UFFD_MISSING) in shmem and use the
> new VM_FAULT_UFFD_MINOR there instead.
> 
> For configurations that don't enable CONFIG_USERFAULTFD,
> VM_FAULT_UFFD_MINOR is set to 0.
> 
> Suggested-by: David Hildenbrand (Red Hat) <david@kernel.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Same nit as David, but the rest looks good.

> ---
>  include/linux/mm_types.h | 10 +++++++++-
>  mm/memory.c              |  2 ++
>  mm/shmem.c               |  2 +-
>  3 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 90e5790c318f..df71b057111b 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1523,6 +1523,8 @@ typedef __bitwise unsigned int vm_fault_t;
>   *				fsync() to complete (for synchronous page faults
>   *				in DAX)
>   * @VM_FAULT_COMPLETED:		->fault completed, meanwhile mmap lock released
> + * @VM_FAULT_UFFD_MINOR:	->fault did not modify page tables and needs
> + *				handle_userfault(VM_UFFD_MINOR) to complete
>   * @VM_FAULT_HINDEX_MASK:	mask HINDEX value
>   *
>   */
> @@ -1540,6 +1542,11 @@ enum vm_fault_reason {
>  	VM_FAULT_DONE_COW       = (__force vm_fault_t)0x001000,
>  	VM_FAULT_NEEDDSYNC      = (__force vm_fault_t)0x002000,
>  	VM_FAULT_COMPLETED      = (__force vm_fault_t)0x004000,
> +#ifdef CONFIG_USERFAULTFD
> +	VM_FAULT_UFFD_MINOR	= (__force vm_fault_t)0x008000,
> +#else
> +	VM_FAULT_UFFD_MINOR	= (__force vm_fault_t)0x000000,
> +#endif
>  	VM_FAULT_HINDEX_MASK    = (__force vm_fault_t)0x0f0000,
>  };
>  
> @@ -1564,7 +1571,8 @@ enum vm_fault_reason {
>  	{ VM_FAULT_FALLBACK,            "FALLBACK" },	\
>  	{ VM_FAULT_DONE_COW,            "DONE_COW" },	\
>  	{ VM_FAULT_NEEDDSYNC,           "NEEDDSYNC" },	\
> -	{ VM_FAULT_COMPLETED,           "COMPLETED" }
> +	{ VM_FAULT_COMPLETED,           "COMPLETED" },	\
> +	{ VM_FAULT_UFFD_MINOR,		"UFFD_MINOR" },	\
>  
>  struct vm_special_mapping {
>  	const char *name;	/* The name, e.g. "[vdso]". */
> diff --git a/mm/memory.c b/mm/memory.c
> index b59ae7ce42eb..94acbac8cefb 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5279,6 +5279,8 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>  	}
>  
>  	ret = vma->vm_ops->fault(vmf);
> +	if (unlikely(ret & VM_FAULT_UFFD_MINOR))
> +		return handle_userfault(vmf, VM_UFFD_MINOR);
>  	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
>  			    VM_FAULT_DONE_COW)))

I have the same concern as David here with adding instructions to the
faults that are not UFFD_MINOR.. I suspect the compiler will remove the
statement completely when UFFD is disabled (and thus ret & 0 in the
check), but it might be worth looking at this closer in the case where
uffd is enabled?  It won't be as clean looking but might make the
assembly better.

>  		return ret;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index e16c7c8c3e1e..a9a31c0b5979 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2461,7 +2461,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
>  	if (folio && vma && userfaultfd_minor(vma)) {
>  		if (!xa_is_value(folio))
>  			folio_put(folio);
> -		*fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
> +		*fault_type = VM_FAULT_UFFD_MINOR;
>  		return 0;
>  	}
>  
> -- 
> 2.50.1
>
Re: [PATCH v2 3/5] mm: introduce VM_FAULT_UFFD_MINOR fault reason
Posted by David Hildenbrand (Red Hat) 5 days, 13 hours ago
> @@ -1564,7 +1571,8 @@ enum vm_fault_reason {
>   	{ VM_FAULT_FALLBACK,            "FALLBACK" },	\
>   	{ VM_FAULT_DONE_COW,            "DONE_COW" },	\
>   	{ VM_FAULT_NEEDDSYNC,           "NEEDDSYNC" },	\
> -	{ VM_FAULT_COMPLETED,           "COMPLETED" }
> +	{ VM_FAULT_COMPLETED,           "COMPLETED" },	\
> +	{ VM_FAULT_UFFD_MINOR,		"UFFD_MINOR" },	\
>   
>   struct vm_special_mapping {
>   	const char *name;	/* The name, e.g. "[vdso]". */
> diff --git a/mm/memory.c b/mm/memory.c
> index b59ae7ce42eb..94acbac8cefb 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5279,6 +5279,8 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>   	}
>   
>   	ret = vma->vm_ops->fault(vmf);
> +	if (unlikely(ret & VM_FAULT_UFFD_MINOR))
> +		return handle_userfault(vmf, VM_UFFD_MINOR);
>   	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
>   			    VM_FAULT_DONE_COW)))

If we want to reduce the overhead on the fast path, we can simply do

if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
		    VM_FAULT_DONE_COW | VM_FAULT_UFFD_MINOR))) {
	if (unlikely(ret & VM_FAULT_UFFD_MINOR))
		return handle_userfault(vmf, VM_UFFD_MINOR);
	return ret;
}

Maybe the compiler already does that to improve the likely case.

LGTM

-- 
Cheers

David
Re: [PATCH v2 3/5] mm: introduce VM_FAULT_UFFD_MINOR fault reason
Posted by Peter Xu 6 days, 4 hours ago
Hi, Mike,

On Tue, Nov 25, 2025 at 08:38:38PM +0200, Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> 
> When a VMA is registered with userfaulfd in minor mode, its ->fault()
> method should check if a folio exists in the page cache and if yes
> ->fault() should call handle_userfault(VM_UFFD_MISSING).

s/MISSING/MINOR/

> 
> Instead of calling handle_userfault() directly from a specific ->fault()
> implementation introduce new fault reason VM_FAULT_UFFD_MINOR that will
> notify the core page fault handler that it should call
> handle_userfaultfd(VM_UFFD_MISSING) to complete a page fault.

Same.

> 
> Replace a call to handle_userfault(VM_UFFD_MISSING) in shmem and use the

Same.

> new VM_FAULT_UFFD_MINOR there instead.

Personally I'd keep the fault path as simple as possible, because that's
the more frequently used path (rather than when userfaultfd is armed). I
also see it slightly a pity that even with flags introduced, it only solves
the MINOR problem, not MISSING.

If it's me, I'd simply export handle_userfault()..  I confess I still don't
know why exporting it is a problem, but maybe I missed something.

Only my two cents.  Feel free to go with whatever way you prefer.

Thanks,

-- 
Peter Xu
Re: [PATCH v2 3/5] mm: introduce VM_FAULT_UFFD_MINOR fault reason
Posted by Mike Rapoport 4 days, 12 hours ago
On Tue, Nov 25, 2025 at 02:21:16PM -0500, Peter Xu wrote:
> Hi, Mike,
> 
> On Tue, Nov 25, 2025 at 08:38:38PM +0200, Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > 
> > When a VMA is registered with userfaulfd in minor mode, its ->fault()
> > method should check if a folio exists in the page cache and if yes
> > ->fault() should call handle_userfault(VM_UFFD_MISSING).
> 
> s/MISSING/MINOR/

Thanks, fixed. 

> > new VM_FAULT_UFFD_MINOR there instead.
> 
> Personally I'd keep the fault path as simple as possible, because that's
> the more frequently used path (rather than when userfaultfd is armed). I
> also see it slightly a pity that even with flags introduced, it only solves
> the MINOR problem, not MISSING.

With David's suggestion the likely path remains unchanged.

As for MISSING, let's take it baby steps. We have enough space in
vm_fault_reason for UFFD_MISSING if we'd want to pull handle_userfault()
from shmem and hugetlb.
 
> If it's me, I'd simply export handle_userfault()..  I confess I still don't
> know why exporting it is a problem, but maybe I missed something.

It's not only about export, it's also about not requiring ->fault()
methods for pte-mapped memory call handle_userfault().

> Only my two cents.  Feel free to go with whatever way you prefer.
> 
> Thanks,
> 
> -- 
> Peter Xu
> 

-- 
Sincerely yours,
Mike.
Re: [PATCH v2 3/5] mm: introduce VM_FAULT_UFFD_MINOR fault reason
Posted by Peter Xu 4 days, 9 hours ago
On Thu, Nov 27, 2025 at 01:18:10PM +0200, Mike Rapoport wrote:
> On Tue, Nov 25, 2025 at 02:21:16PM -0500, Peter Xu wrote:
> > Hi, Mike,
> > 
> > On Tue, Nov 25, 2025 at 08:38:38PM +0200, Mike Rapoport wrote:
> > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > > 
> > > When a VMA is registered with userfaulfd in minor mode, its ->fault()
> > > method should check if a folio exists in the page cache and if yes
> > > ->fault() should call handle_userfault(VM_UFFD_MISSING).
> > 
> > s/MISSING/MINOR/
> 
> Thanks, fixed. 
> 
> > > new VM_FAULT_UFFD_MINOR there instead.
> > 
> > Personally I'd keep the fault path as simple as possible, because that's
> > the more frequently used path (rather than when userfaultfd is armed). I
> > also see it slightly a pity that even with flags introduced, it only solves
> > the MINOR problem, not MISSING.
> 
> With David's suggestion the likely path remains unchanged.

It is not about the likely, it's about introducing flags into core path
that makes the core path harder to follow, when it's not strictly required.

Meanwhile, personally I'm also not sure if we should have "unlikely" here..
My gut feeling is in reality we will only have two major use cases:

  (a) when userfaultfd minor isn't in the picture

  (b) when userfaultfd minor registered and actively being used (e.g. in a
      postcopy process)

Then without likely, IIUC the hardware should optimize path selected hence
both a+b performs almost equally well.

My guessing is after adding unlikely, (a) works well, but (b) works badly.
We may need to measure it, IIUC it's part of the reason why we sometimes do
not encourage "likely/unlikely".  But that's only my guess, some numbers
would be more helpful.

One thing we can try is if we add "unlikely" then compare a sequential
MINOR fault trapping on shmem and measure the time it takes, we need to
better make sure we don't regress perf there.  I wonder if James / Axel
would care about it - QEMU doesn't yet support minor, but will soon, and we
will also prefer better perf since the start.

> 
> As for MISSING, let's take it baby steps. We have enough space in
> vm_fault_reason for UFFD_MISSING if we'd want to pull handle_userfault()
> from shmem and hugetlb.

Yep.

>  
> > If it's me, I'd simply export handle_userfault()..  I confess I still don't
> > know why exporting it is a problem, but maybe I missed something.
> 
> It's not only about export, it's also about not requiring ->fault()
> methods for pte-mapped memory call handle_userfault().

I also don't see it a problem.. as what shmem used to do.  Maybe it's a
personal preference?  If so, I don't have a strong opinion.

Just to mention, if we want, I think we have at least one more option to do
the same thing, but without even introducing a new flag to ->fault()
retval.

That is, when we have get_folio() around, we can essentially do two faults
in sequence, one lighter then the real one, only for minor vmas, something
like (I didn't think deeper, so only a rough idea shown):

__do_fault():
  if (uffd_minor(vma)) {
    ...
    folio = vma->get_folio(...);
    if (folio)
       return handle_userfault(vmf, VM_UFFD_MINOR);
    // fallthrough, which imply a cache miss
  }
  ret = vma->vm_ops->fault(vmf);
  ...

The risk of above is also perf-wise, but it's another angle where it might
slow down page cache miss case where MINOR is registered only (hence, when
cache missing we'll need to call both get_folio() and fault() now).
However that's likely a less critical case than the unlikely, and I'm also
guessing due to the shared code of get_folio() / fault(), codes will be
preheated and it may not be measureable even if we write it like that.

Then maybe we can avoid this new flag completely but also achieve the same
goal.

Thanks,

-- 
Peter Xu
Re: [PATCH v2 3/5] mm: introduce VM_FAULT_UFFD_MINOR fault reason
Posted by Mike Rapoport 1 day, 12 hours ago
On Thu, Nov 27, 2025 at 09:10:56AM -0500, Peter Xu wrote:
> On Thu, Nov 27, 2025 at 01:18:10PM +0200, Mike Rapoport wrote:
> > On Tue, Nov 25, 2025 at 02:21:16PM -0500, Peter Xu wrote:
> > > Hi, Mike,
> > > 
> > > On Tue, Nov 25, 2025 at 08:38:38PM +0200, Mike Rapoport wrote:
> > > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > > > 
> > > > When a VMA is registered with userfaulfd in minor mode, its ->fault()
> > > > method should check if a folio exists in the page cache and if yes
> > > > ->fault() should call handle_userfault(VM_UFFD_MISSING).
> > > 
> > > s/MISSING/MINOR/
> > 
> > Thanks, fixed. 
> > 
> > > > new VM_FAULT_UFFD_MINOR there instead.
> > > 
> > > Personally I'd keep the fault path as simple as possible, because that's
> > > the more frequently used path (rather than when userfaultfd is armed). I
> > > also see it slightly a pity that even with flags introduced, it only solves
> > > the MINOR problem, not MISSING.
> > 
> > With David's suggestion the likely path remains unchanged.
> 
> It is not about the likely, it's about introducing flags into core path
> that makes the core path harder to follow, when it's not strictly required.
 
	ret = vma->vm_ops->fault(vmf);
	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY |
			    VM_FAULT_DONE_COW | VM_FAULT_UFFD_MINOR))) {
		if (ret & VM_FAULT_UFFD_MINOR)
			return handle_userfault(vmf, VM_UFFD_MINOR);
		return ret;
	}

isn't hard to follow and it's cleaner than adding EXPORT_SYMBOL that is not
strictly required.

> Meanwhile, personally I'm also not sure if we should have "unlikely" here..
> My gut feeling is in reality we will only have two major use cases:
> 
>   (a) when userfaultfd minor isn't in the picture
> 
>   (b) when userfaultfd minor registered and actively being used (e.g. in a
>       postcopy process)
> 
> Then without likely, IIUC the hardware should optimize path selected hence
> both a+b performs almost equally well.

unlikely() adds a branch that hardware will predict correctly if
UFFD_MINOR is actively used.

But even misspredicted branch is nothing compared to putting a task on a
wait queue and waiting for userspace to react to the fault notification
before handle_userfault() returns the control to the fault handler.
 
> Just to mention, if we want, I think we have at least one more option to do
> the same thing, but without even introducing a new flag to ->fault()
> retval.
> 
> That is, when we have get_folio() around, we can essentially do two faults
> in sequence, one lighter then the real one, only for minor vmas, something
> like (I didn't think deeper, so only a rough idea shown):
> 
> __do_fault():
>   if (uffd_minor(vma)) {
>     ...
>     folio = vma->get_folio(...);
>     if (folio)
>        return handle_userfault(vmf, VM_UFFD_MINOR);
>     // fallthrough, which imply a cache miss
>   }
>   ret = vma->vm_ops->fault(vmf);

That's something to consider for the future, especially if we'd be able to
pull out MISSING handling as well from ->fault() handlers.

> Thanks,
> -- 
> Peter Xu

-- 
Sincerely yours,
Mike.