In mfill_atomic_hugetlb(), linear_page_index() is used to calculate the
page index for hugetlb_fault_mutex_hash(). However, linear_page_index()
returns the index in PAGE_SIZE units, while hugetlb_fault_mutex_hash()
expects the index in huge page units. This mismatch means that different
addresses within the same huge page can produce different hash values,
leading to the use of different mutexes for the same huge page. This can
cause races between faulting threads, which can corrupt the reservation
map and trigger the BUG_ON in resv_map_release().
Fix this by introducing hugetlb_linear_page_index(), which returns the
page index in huge page granularity, and using it in place of
linear_page_index().
Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
Reported-by: syzbot+f525fd79634858f478e7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f525fd79634858f478e7
Cc: stable@vger.kernel.org
Signed-off-by: Jianhui Zhou <jianhuizzzzz@gmail.com>
---
v4:
- Introduce hugetlb_linear_page_index() instead of exposing
vma_hugecache_offset(); call hstate_vma() internally to simplify
the API (David Hildenbrand)
v3:
- Fix Fixes tag to a08c7193e4f1 (Hugh Dickins)
v2:
- Remove unnecessary !CONFIG_HUGETLB_PAGE stub for vma_hugecache_offset()
(Peter Xu, SeongJae Park)
include/linux/hugetlb.h | 17 +++++++++++++++++
mm/userfaultfd.c | 2 +-
2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 65910437be1c..67d4f0924646 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -796,6 +796,23 @@ static inline unsigned huge_page_shift(struct hstate *h)
return h->order + PAGE_SHIFT;
}
+/**
+ * hugetlb_linear_page_index() - linear_page_index() but in hugetlb
+ * page size granularity.
+ * @vma: the hugetlb VMA
+ * @address: the virtual address within the VMA
+ *
+ * Return: the page offset within the mapping in huge page units.
+ */
+static inline pgoff_t hugetlb_linear_page_index(struct vm_area_struct *vma,
+ unsigned long address)
+{
+ struct hstate *h = hstate_vma(vma);
+
+ return ((address - vma->vm_start) >> huge_page_shift(h)) +
+ (vma->vm_pgoff >> huge_page_order(h));
+}
+
static inline bool order_is_gigantic(unsigned int order)
{
return order > MAX_PAGE_ORDER;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 927086bb4a3c..5590989e18c7 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -573,7 +573,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
* in the case of shared pmds. fault mutex prevents
* races with other faulting threads.
*/
- idx = linear_page_index(dst_vma, dst_addr);
+ idx = hugetlb_linear_page_index(dst_vma, dst_addr);
mapping = dst_vma->vm_file->f_mapping;
hash = hugetlb_fault_mutex_hash(mapping, idx);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
--
2.43.0
On 3/10/2026 4:05 AM, Jianhui Zhou wrote:
> In mfill_atomic_hugetlb(), linear_page_index() is used to calculate the
> page index for hugetlb_fault_mutex_hash(). However, linear_page_index()
> returns the index in PAGE_SIZE units, while hugetlb_fault_mutex_hash()
> expects the index in huge page units. This mismatch means that different
> addresses within the same huge page can produce different hash values,
> leading to the use of different mutexes for the same huge page. This can
> cause races between faulting threads, which can corrupt the reservation
> map and trigger the BUG_ON in resv_map_release().
>
> Fix this by introducing hugetlb_linear_page_index(), which returns the
> page index in huge page granularity, and using it in place of
> linear_page_index().
>
> Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
> Reported-by: syzbot+f525fd79634858f478e7@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=f525fd79634858f478e7
> Cc: stable@vger.kernel.org
> Signed-off-by: Jianhui Zhou <jianhuizzzzz@gmail.com>
> ---
> v4:
> - Introduce hugetlb_linear_page_index() instead of exposing
> vma_hugecache_offset(); call hstate_vma() internally to simplify
> the API (David Hildenbrand)
>
> v3:
> - Fix Fixes tag to a08c7193e4f1 (Hugh Dickins)
>
> v2:
> - Remove unnecessary !CONFIG_HUGETLB_PAGE stub for vma_hugecache_offset()
> (Peter Xu, SeongJae Park)
>
> include/linux/hugetlb.h | 17 +++++++++++++++++
> mm/userfaultfd.c | 2 +-
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 65910437be1c..67d4f0924646 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -796,6 +796,23 @@ static inline unsigned huge_page_shift(struct hstate *h)
> return h->order + PAGE_SHIFT;
> }
>
> +/**
> + * hugetlb_linear_page_index() - linear_page_index() but in hugetlb
> + * page size granularity.
> + * @vma: the hugetlb VMA
> + * @address: the virtual address within the VMA
> + *
> + * Return: the page offset within the mapping in huge page units.
> + */
> +static inline pgoff_t hugetlb_linear_page_index(struct vm_area_struct *vma,
> + unsigned long address)
> +{
> + struct hstate *h = hstate_vma(vma);
> +
> + return ((address - vma->vm_start) >> huge_page_shift(h)) +
> + (vma->vm_pgoff >> huge_page_order(h));
> +}
> +
> static inline bool order_is_gigantic(unsigned int order)
> {
> return order > MAX_PAGE_ORDER;
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 927086bb4a3c..5590989e18c7 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -573,7 +573,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
> * in the case of shared pmds. fault mutex prevents
> * races with other faulting threads.
> */
> - idx = linear_page_index(dst_vma, dst_addr);
> + idx = hugetlb_linear_page_index(dst_vma, dst_addr);
Just wondering whether making the shift explicit here instead of
introducing another hugetlb helper might be sufficient?
idx >>= huge_page_order(hstate_vma(vma));
I mean huge_page_order() is already explicitly called in several places
outside hugetlb.
> mapping = dst_vma->vm_file->f_mapping;
> hash = hugetlb_fault_mutex_hash(mapping, idx);
> mutex_lock(&hugetlb_fault_mutex_table[hash]);
thanks,
-jane
On Tue, Mar 10, 2026 at 12:47:07PM -0700, jane.chu@oracle.com wrote:
> Just wondering whether making the shift explicit here instead of
> introducing another hugetlb helper might be sufficient?
>
> idx >>= huge_page_order(hstate_vma(vma));

That would work for hugetlb VMAs since both (address - vm_start) and
vm_pgoff are guaranteed to be huge page aligned. However, David
suggested introducing hugetlb_linear_page_index() to provide a cleaner
API that mirrors linear_page_index(), so I kept this approach.

Thanks for the review!
On Wed, 11 Mar 2026 18:54:26 +0800 Jianhui Zhou <jianhuizzzzz@gmail.com> wrote:
> On Tue, Mar 10, 2026 at 12:47:07PM -0700, jane.chu@oracle.com wrote:
> > Just wondering whether making the shift explicit here instead of
> > introducing another hugetlb helper might be sufficient?
> >
> > idx >>= huge_page_order(hstate_vma(vma));
>
> That would work for hugetlb VMAs since both (address - vm_start) and
> vm_pgoff are guaranteed to be huge page aligned. However, David
> suggested introducing hugetlb_linear_page_index() to provide a cleaner
> API that mirrors linear_page_index(), so I kept this approach.
>
Thanks.
Would anyone like to review this cc:stable patch for us?
From: Jianhui Zhou <jianhuizzzzz@gmail.com>
Subject: mm/userfaultfd: fix hugetlb fault mutex hash calculation
Date: Tue, 10 Mar 2026 19:05:26 +0800
In mfill_atomic_hugetlb(), linear_page_index() is used to calculate the
page index for hugetlb_fault_mutex_hash(). However, linear_page_index()
returns the index in PAGE_SIZE units, while hugetlb_fault_mutex_hash()
expects the index in huge page units. This mismatch means that different
addresses within the same huge page can produce different hash values,
leading to the use of different mutexes for the same huge page. This can
cause races between faulting threads, which can corrupt the reservation
map and trigger the BUG_ON in resv_map_release().
Fix this by introducing hugetlb_linear_page_index(), which returns the
page index in huge page granularity, and using it in place of
linear_page_index().
Link: https://lkml.kernel.org/r/20260310110526.335749-1-jianhuizzzzz@gmail.com
Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
Signed-off-by: Jianhui Zhou <jianhuizzzzz@gmail.com>
Reported-by: syzbot+f525fd79634858f478e7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f525fd79634858f478e7
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: JonasZhou <JonasZhou@zhaoxin.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/hugetlb.h | 17 +++++++++++++++++
mm/userfaultfd.c | 2 +-
2 files changed, 18 insertions(+), 1 deletion(-)
--- a/include/linux/hugetlb.h~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
+++ a/include/linux/hugetlb.h
@@ -796,6 +796,23 @@ static inline unsigned huge_page_shift(s
return h->order + PAGE_SHIFT;
}
+/**
+ * hugetlb_linear_page_index() - linear_page_index() but in hugetlb
+ * page size granularity.
+ * @vma: the hugetlb VMA
+ * @address: the virtual address within the VMA
+ *
+ * Return: the page offset within the mapping in huge page units.
+ */
+static inline pgoff_t hugetlb_linear_page_index(struct vm_area_struct *vma,
+ unsigned long address)
+{
+ struct hstate *h = hstate_vma(vma);
+
+ return ((address - vma->vm_start) >> huge_page_shift(h)) +
+ (vma->vm_pgoff >> huge_page_order(h));
+}
+
static inline bool order_is_gigantic(unsigned int order)
{
return order > MAX_PAGE_ORDER;
--- a/mm/userfaultfd.c~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
+++ a/mm/userfaultfd.c
@@ -573,7 +573,7 @@ retry:
* in the case of shared pmds. fault mutex prevents
* races with other faulting threads.
*/
- idx = linear_page_index(dst_vma, dst_addr);
+ idx = hugetlb_linear_page_index(dst_vma, dst_addr);
mapping = dst_vma->vm_file->f_mapping;
hash = hugetlb_fault_mutex_hash(mapping, idx);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
_
On Tue, Mar 24, 2026 at 05:03:11PM -0700, Andrew Morton wrote:
> On Wed, 11 Mar 2026 18:54:26 +0800 Jianhui Zhou <jianhuizzzzz@gmail.com> wrote:
>
> > On Tue, Mar 10, 2026 at 12:47:07PM -0700, jane.chu@oracle.com wrote:
> > > Just wondering whether making the shift explicit here instead of
> > > introducing another hugetlb helper might be sufficient?
> > >
> > > idx >>= huge_page_order(hstate_vma(vma));
> >
> > That would work for hugetlb VMAs since both (address - vm_start) and
> > vm_pgoff are guaranteed to be huge page aligned. However, David
> > suggested introducing hugetlb_linear_page_index() to provide a cleaner
> > API that mirrors linear_page_index(), so I kept this approach.
> >
>
> Thanks.
>
> Would anyone like to review this cc:stable patch for us?
>
>
> From: Jianhui Zhou <jianhuizzzzz@gmail.com>
> Subject: mm/userfaultfd: fix hugetlb fault mutex hash calculation
> Date: Tue, 10 Mar 2026 19:05:26 +0800
>
> In mfill_atomic_hugetlb(), linear_page_index() is used to calculate the
> page index for hugetlb_fault_mutex_hash(). However, linear_page_index()
> returns the index in PAGE_SIZE units, while hugetlb_fault_mutex_hash()
> expects the index in huge page units. This mismatch means that different
> addresses within the same huge page can produce different hash values,
> leading to the use of different mutexes for the same huge page. This can
> cause races between faulting threads, which can corrupt the reservation
> map and trigger the BUG_ON in resv_map_release().
>
> Fix this by introducing hugetlb_linear_page_index(), which returns the
> page index in huge page granularity, and using it in place of
> linear_page_index().
>
> Link: https://lkml.kernel.org/r/20260310110526.335749-1-jianhuizzzzz@gmail.com
> Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
> Signed-off-by: Jianhui Zhou <jianhuizzzzz@gmail.com>
> Reported-by: syzbot+f525fd79634858f478e7@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=f525fd79634858f478e7
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: JonasZhou <JonasZhou@zhaoxin.com>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Looks fine from uffd perspective, and simple enough for stable@.
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
>
> include/linux/hugetlb.h | 17 +++++++++++++++++
> mm/userfaultfd.c | 2 +-
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> --- a/include/linux/hugetlb.h~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
> +++ a/include/linux/hugetlb.h
> @@ -796,6 +796,23 @@ static inline unsigned huge_page_shift(s
> return h->order + PAGE_SHIFT;
> }
>
> +/**
> + * hugetlb_linear_page_index() - linear_page_index() but in hugetlb
> + * page size granularity.
> + * @vma: the hugetlb VMA
> + * @address: the virtual address within the VMA
> + *
> + * Return: the page offset within the mapping in huge page units.
> + */
> +static inline pgoff_t hugetlb_linear_page_index(struct vm_area_struct *vma,
> + unsigned long address)
> +{
> + struct hstate *h = hstate_vma(vma);
> +
> + return ((address - vma->vm_start) >> huge_page_shift(h)) +
> + (vma->vm_pgoff >> huge_page_order(h));
> +}
> +
> static inline bool order_is_gigantic(unsigned int order)
> {
> return order > MAX_PAGE_ORDER;
> --- a/mm/userfaultfd.c~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
> +++ a/mm/userfaultfd.c
> @@ -573,7 +573,7 @@ retry:
> * in the case of shared pmds. fault mutex prevents
> * races with other faulting threads.
> */
> - idx = linear_page_index(dst_vma, dst_addr);
> + idx = hugetlb_linear_page_index(dst_vma, dst_addr);
> mapping = dst_vma->vm_file->f_mapping;
> hash = hugetlb_fault_mutex_hash(mapping, idx);
> mutex_lock(&hugetlb_fault_mutex_table[hash]);
> _
>
--
Sincerely yours,
Mike.
On 3/25/26 01:03, Andrew Morton wrote:
> On Wed, 11 Mar 2026 18:54:26 +0800 Jianhui Zhou <jianhuizzzzz@gmail.com> wrote:
>
>> On Tue, Mar 10, 2026 at 12:47:07PM -0700, jane.chu@oracle.com wrote:
>>> Just wondering whether making the shift explicit here instead of
>>> introducing another hugetlb helper might be sufficient?
>>>
>>> idx >>= huge_page_order(hstate_vma(vma));
>>
>> That would work for hugetlb VMAs since both (address - vm_start) and
>> vm_pgoff are guaranteed to be huge page aligned. However, David
>> suggested introducing hugetlb_linear_page_index() to provide a cleaner
>> API that mirrors linear_page_index(), so I kept this approach.
>>
>
> Thanks.
>
> Would anyone like to review this cc:stable patch for us?
I would hope the hugetlb+userfaultfd submaintainers could have a
detailed look! Moving them to "To:"
One of the issues why this doesn't get more attention might be posting a
new revision as a reply to an old revision, which is an anti-pattern :)
>
>
> From: Jianhui Zhou <jianhuizzzzz@gmail.com>
> Subject: mm/userfaultfd: fix hugetlb fault mutex hash calculation
> Date: Tue, 10 Mar 2026 19:05:26 +0800
>
> In mfill_atomic_hugetlb(), linear_page_index() is used to calculate the
> page index for hugetlb_fault_mutex_hash(). However, linear_page_index()
> returns the index in PAGE_SIZE units, while hugetlb_fault_mutex_hash()
> expects the index in huge page units. This mismatch means that different
> addresses within the same huge page can produce different hash values,
> leading to the use of different mutexes for the same huge page. This can
> cause races between faulting threads, which can corrupt the reservation
> map and trigger the BUG_ON in resv_map_release().
>
> Fix this by introducing hugetlb_linear_page_index(), which returns the
> page index in huge page granularity, and using it in place of
> linear_page_index().
>
> Link: https://lkml.kernel.org/r/20260310110526.335749-1-jianhuizzzzz@gmail.com
> Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
> Signed-off-by: Jianhui Zhou <jianhuizzzzz@gmail.com>
> Reported-by: syzbot+f525fd79634858f478e7@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=f525fd79634858f478e7
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: JonasZhou <JonasZhou@zhaoxin.com>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
> include/linux/hugetlb.h | 17 +++++++++++++++++
> mm/userfaultfd.c | 2 +-
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> --- a/include/linux/hugetlb.h~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
> +++ a/include/linux/hugetlb.h
> @@ -796,6 +796,23 @@ static inline unsigned huge_page_shift(s
> return h->order + PAGE_SHIFT;
> }
>
> +/**
> + * hugetlb_linear_page_index() - linear_page_index() but in hugetlb
> + * page size granularity.
> + * @vma: the hugetlb VMA
> + * @address: the virtual address within the VMA
> + *
> + * Return: the page offset within the mapping in huge page units.
> + */
> +static inline pgoff_t hugetlb_linear_page_index(struct vm_area_struct *vma,
> + unsigned long address)
> +{
> + struct hstate *h = hstate_vma(vma);
> +
> + return ((address - vma->vm_start) >> huge_page_shift(h)) +
> + (vma->vm_pgoff >> huge_page_order(h));
> +}
> +
> static inline bool order_is_gigantic(unsigned int order)
> {
> return order > MAX_PAGE_ORDER;
> --- a/mm/userfaultfd.c~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
> +++ a/mm/userfaultfd.c
> @@ -573,7 +573,7 @@ retry:
> * in the case of shared pmds. fault mutex prevents
> * races with other faulting threads.
> */
> - idx = linear_page_index(dst_vma, dst_addr);
> + idx = hugetlb_linear_page_index(dst_vma, dst_addr);
> mapping = dst_vma->vm_file->f_mapping;
> hash = hugetlb_fault_mutex_hash(mapping, idx);
> mutex_lock(&hugetlb_fault_mutex_table[hash]);
> _
>
Let's take a look at other hugetlb_fault_mutex_hash() users:
* remove_inode_hugepages: uses folio->index >> huge_page_order(h)
-> hugetlb granularity
* hugetlbfs_fallocate(): start/index is in hugetlb granularity
-> hugetlb granularity
* memfd_alloc_folio(): idx >>= huge_page_order(h);
-> hugetlb granularity
* hugetlb_wp(): uses vma_hugecache_offset()
-> hugetlb granularity
* hugetlb_handle_userfault(): uses vmf->pgoff, which hugetlb_fault()
sets to vma_hugecache_offset()
-> hugetlb granularity
* hugetlb_no_page(): similarly uses vmf->pgoff
-> hugetlb granularity
* hugetlb_fault(): similarly uses vmf->pgoff
-> hugetlb granularity
So this change here looks good to me
Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
But it raises the question:
(1) should be convert all that to just operate on the ordinary index,
such that we don't even need hugetlb_linear_page_index()? That would be
an addon patch.
(2) Alternatively, could we replace all users of vma_hugecache_offset()
by the much cleaner hugetlb_linear_page_index() ?
In general, I think we should look into having idx/vmf->pgoff being
consistent with the remainder of MM, converting all code in hugetlb to
do that.
Any takers?
--
Cheers,
David
Hi, David,
On 3/25/2026 1:49 AM, David Hildenbrand (Arm) wrote:
[..]
>>
>> --- a/include/linux/hugetlb.h~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
>> +++ a/include/linux/hugetlb.h
>> @@ -796,6 +796,23 @@ static inline unsigned huge_page_shift(s
>> return h->order + PAGE_SHIFT;
>> }
>>
>> +/**
>> + * hugetlb_linear_page_index() - linear_page_index() but in hugetlb
>> + * page size granularity.
>> + * @vma: the hugetlb VMA
>> + * @address: the virtual address within the VMA
>> + *
>> + * Return: the page offset within the mapping in huge page units.
>> + */
>> +static inline pgoff_t hugetlb_linear_page_index(struct vm_area_struct *vma,
>> + unsigned long address)
>> +{
>> + struct hstate *h = hstate_vma(vma);
>> +
>> + return ((address - vma->vm_start) >> huge_page_shift(h)) +
>> + (vma->vm_pgoff >> huge_page_order(h));
>> +}
>> +
>> static inline bool order_is_gigantic(unsigned int order)
>> {
>> return order > MAX_PAGE_ORDER;
>> --- a/mm/userfaultfd.c~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
>> +++ a/mm/userfaultfd.c
>> @@ -573,7 +573,7 @@ retry:
>> * in the case of shared pmds. fault mutex prevents
>> * races with other faulting threads.
>> */
>> - idx = linear_page_index(dst_vma, dst_addr);
>> + idx = hugetlb_linear_page_index(dst_vma, dst_addr);
>> mapping = dst_vma->vm_file->f_mapping;
>> hash = hugetlb_fault_mutex_hash(mapping, idx);
>> mutex_lock(&hugetlb_fault_mutex_table[hash]);
>> _
>>
>
> Let's take a look at other hugetlb_fault_mutex_hash() users:
>
> * remove_inode_hugepages: uses folio->index >> huge_page_order(h)
> -> hugetlb granularity
> * hugetlbfs_fallocate(): start/index is in hugetlb granularity
> -> hugetlb granularity
> * memfd_alloc_folio(): idx >>= huge_page_order(h);
> -> hugetlb granularity
> * hugetlb_wp(): uses vma_hugecache_offset()
> -> hugetlb granularity
> * hugetlb_handle_userfault(): uses vmf->pgoff, which hugetlb_fault()
> sets to vma_hugecache_offset()
> -> hugetlb granularity
> * hugetlb_no_page(): similarly uses vmf->pgoff
> -> hugetlb granularity
> * hugetlb_fault(): similarly uses vmf->pgoff
> -> hugetlb granularity
>
> So this change here looks good to me
>
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
>
>
> But it raises the question:
>
(1) should we convert all that to just operate on the ordinary index,
such that we don't even need hugetlb_linear_page_index()? That would be
an add-on patch.
>
Do you mean to convert all callers of hugetlb_linear_page_index() and
vma_hugecache_offset() to use index and huge_page_order(h)?
May I add: to improve readability, rename the huge-page-granularity
'idx' to huge_idx or hidx?
> (2) Alternatively, could we replace all users of vma_hugecache_offset()
> by the much cleaner hugetlb_linear_page_index() ?
>
The difference between the two helpers is the hstate_vma() call in the
latter, which is about 5 pointer dereferences; not sure of any
performance implication though. At minimum, we could have
hugetlb_linear_page_index(vma, addr)
-> __hugetlb_linear_page_index(h, vma, addr)
basically renaming vma_hugecache_offset().
> In general, I think we should look into having idx/vmf->pgoff being
> consistent with the remainder of MM, converting all code in hugetlb to
> do that.
>
> Any takers?
>
I'd be happy to, just to make sure I understand the proposal clearly.
thanks!
-jane
On 3/26/26 00:46, jane.chu@oracle.com wrote:
> Hi, David,
>
> On 3/25/2026 1:49 AM, David Hildenbrand (Arm) wrote:
> [..]
[...]
>>
>> But it raises the question:
>>
>> (1) should we convert all that to just operate on the ordinary index,
>> such that we don't even need hugetlb_linear_page_index()? That would be
>> an add-on patch.
>>
>
> Do you mean to convert all callers of hugetlb_linear_page_index() and
> vma_hugecache_offset() to use index and huge_page_order(h)?
> May I add: to improve readability, rename the huge-page-granularity
> 'idx' to huge_idx or hidx?
What I meant is that we change all hugetlb code to use the ordinary idx.
It's a bigger rework.
For example, we'd be getting rid of filemap_lock_hugetlb_folio()
completely and simply use filemap_lock_folio():
@@ -657,10 +657,9 @@ static void hugetlbfs_zero_partial_page(struct hstate *h,
                                         loff_t start,
                                         loff_t end)
 {
-       pgoff_t idx = start >> huge_page_shift(h);
        struct folio *folio;

-       folio = filemap_lock_hugetlb_folio(h, mapping, idx);
+       folio = filemap_lock_folio(mapping, start >> PAGE_SHIFT);
        if (IS_ERR(folio))
                return;
Other parts are more tricky, as we have to make sure that we get
an idx that points at the start of the folio.
Likely such a conversion could be done incrementally. But it's a bit of work.
We'd be getting rid of some more hugetlb special casing.
An alternative is passing in an address into hugetlb_linear_page_index(),
just letting it do the calculation itself (it can get the hstate from the mapping).
>
>> (2) Alternatively, could we replace all users of vma_hugecache_offset()
>> by the much cleaner hugetlb_linear_page_index() ?
>>
>
> The difference between the two helpers is hstate_vma() in the latter
> that is about 5 pointer de-references, not sure of any performance
> implication though.
hstate_vma() is really just hstate_file(vma->vm_file); hstate_file() is
hstate_inode(file_inode(f)), and hstate_inode() is
HUGETLBFS_SB(i->i_sb)->hstate. So some pointer chasing.
hard to believe that this would matter in any of this code :)
> At minimum, we could have
> hugetlb_linear_page_index(vma, addr)
> -> __hugetlb_linear_page_index(h, vma, addr)
> basically renaming vma_hugecache_offset().
I would only do that if it's really required for performance.
--
Cheers,
David
On Wed, Mar 25, 2026 at 09:49:09AM +0100, David Hildenbrand (Arm) wrote:
> On 3/25/26 01:03, Andrew Morton wrote:
> > On Wed, 11 Mar 2026 18:54:26 +0800 Jianhui Zhou <jianhuizzzzz@gmail.com> wrote:
> >
> > > On Tue, Mar 10, 2026 at 12:47:07PM -0700, jane.chu@oracle.com wrote:
> > > > Just wondering whether making the shift explicit here instead of
> > > > introducing another hugetlb helper might be sufficient?
> > > >
> > > > idx >>= huge_page_order(hstate_vma(vma));
> > >
> > > That would work for hugetlb VMAs since both (address - vm_start) and
> > > vm_pgoff are guaranteed to be huge page aligned. However, David
> > > suggested introducing hugetlb_linear_page_index() to provide a cleaner
> > > API that mirrors linear_page_index(), so I kept this approach.
> >
> > Thanks.
> >
> > Would anyone like to review this cc:stable patch for us?
>
> I would hope the hugetlb+userfaultfd submaintainers could have a
> detailed look! Moving them to "To:"

Wouldn't help much with something deeply buried in a thread :)

> One of the issues why this doesn't get more attention might be posting a
> new revision as a reply to an old revision, which is an anti-pattern :)

Indeed.

--
Sincerely yours,
Mike.
On Tue, 24 Mar 2026 17:03:11 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 11 Mar 2026 18:54:26 +0800 Jianhui Zhou <jianhuizzzzz@gmail.com> wrote:
>
> > On Tue, Mar 10, 2026 at 12:47:07PM -0700, jane.chu@oracle.com wrote:
> > > Just wondering whether making the shift explicit here instead of
> > > introducing another hugetlb helper might be sufficient?
> > >
> > > idx >>= huge_page_order(hstate_vma(vma));
> >
> > That would work for hugetlb VMAs since both (address - vm_start) and
> > vm_pgoff are guaranteed to be huge page aligned. However, David
> > suggested introducing hugetlb_linear_page_index() to provide a cleaner
> > API that mirrors linear_page_index(), so I kept this approach.
> >
>
> Thanks.
>
> Would anyone like to review this cc:stable patch for us?
>
>
> From: Jianhui Zhou <jianhuizzzzz@gmail.com>
> Subject: mm/userfaultfd: fix hugetlb fault mutex hash calculation
> Date: Tue, 10 Mar 2026 19:05:26 +0800
>
> In mfill_atomic_hugetlb(), linear_page_index() is used to calculate the
> page index for hugetlb_fault_mutex_hash(). However, linear_page_index()
> returns the index in PAGE_SIZE units, while hugetlb_fault_mutex_hash()
> expects the index in huge page units. This mismatch means that different
> addresses within the same huge page can produce different hash values,
> leading to the use of different mutexes for the same huge page. This can
> cause races between faulting threads, which can corrupt the reservation
> map and trigger the BUG_ON in resv_map_release().
>
> Fix this by introducing hugetlb_linear_page_index(), which returns the
> page index in huge page granularity, and using it in place of
> linear_page_index().
>
> Link: https://lkml.kernel.org/r/20260310110526.335749-1-jianhuizzzzz@gmail.com
> Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c")
> Signed-off-by: Jianhui Zhou <jianhuizzzzz@gmail.com>
> Reported-by: syzbot+f525fd79634858f478e7@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=f525fd79634858f478e7
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: JonasZhou <JonasZhou@zhaoxin.com>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: SeongJae Park <sj@kernel.org>
> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
I added trivial comments below, but looks good to me.
Acked-by: SeongJae Park <sj@kernel.org>
> ---
>
> include/linux/hugetlb.h | 17 +++++++++++++++++
> mm/userfaultfd.c | 2 +-
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> --- a/include/linux/hugetlb.h~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
> +++ a/include/linux/hugetlb.h
> @@ -796,6 +796,23 @@ static inline unsigned huge_page_shift(s
> return h->order + PAGE_SHIFT;
> }
>
> +/**
> + * hugetlb_linear_page_index() - linear_page_index() but in hugetlb
> + * page size granularity.
> + * @vma: the hugetlb VMA
> + * @address: the virtual address within the VMA
> + *
> + * Return: the page offset within the mapping in huge page units.
> + */
> +static inline pgoff_t hugetlb_linear_page_index(struct vm_area_struct *vma,
> + unsigned long address)
> +{
> + struct hstate *h = hstate_vma(vma);
> +
> + return ((address - vma->vm_start) >> huge_page_shift(h)) +
> + (vma->vm_pgoff >> huge_page_order(h));
Nit. The outermost parentheses feel odd to me.
> +}
> +
> static inline bool order_is_gigantic(unsigned int order)
> {
> return order > MAX_PAGE_ORDER;
> --- a/mm/userfaultfd.c~mm-userfaultfd-fix-hugetlb-fault-mutex-hash-calculation
> +++ a/mm/userfaultfd.c
> @@ -573,7 +573,7 @@ retry:
> * in the case of shared pmds. fault mutex prevents
> * races with other faulting threads.
> */
> - idx = linear_page_index(dst_vma, dst_addr);
> + idx = hugetlb_linear_page_index(dst_vma, dst_addr);
> mapping = dst_vma->vm_file->f_mapping;
> hash = hugetlb_fault_mutex_hash(mapping, idx);
> mutex_lock(&hugetlb_fault_mutex_table[hash]);
Seems userfaultfd.c is the only caller of the new helper function. Why
don't you define the function in userfaultfd.c?
Thanks,
SJ
On Tue, Mar 25, 2026 at 01:06:00AM +0000, SeongJae Park wrote:
> Seems userfaultfd.c is the only caller of the new helper function. Why
> don't you define the function in userfaultfd.c?

I kept hugetlb_linear_page_index() in include/linux/hugetlb.h because
this is hugetlb-specific logic, not userfaultfd-specific logic. The goal
was simply to avoid open-coding the hugetlb index conversion outside
hugetlb code and to make the unit change explicit at the call site.
On 3/25/26 07:07, Jianhui Zhou wrote:
> On Tue, Mar 25, 2026 at 01:06:00AM +0000, SeongJae Park wrote:
> > Seems userfaultfd.c is the only caller of the new helper function. Why
> > don't you define the function in userfaultfd.c?
>
> I kept hugetlb_linear_page_index() in include/linux/hugetlb.h because
> this is hugetlb-specific logic, not userfaultfd-specific logic.

Yes, and see my comment about either removing it entirely again next, or
actually also using it in hugetlb.c.

--
Cheers,

David
On Wed, Mar 25, 2026 at 09:49:54AM +0100, David Hildenbrand (Arm) wrote:
> On 3/25/26 07:07, Jianhui Zhou wrote:
> > On Tue, Mar 25, 2026 at 01:06:00AM +0000, SeongJae Park wrote:
> > > Seems userfaultfd.c is the only caller of the new helper function.
> > > Why don't you define the function in userfaultfd.c?
> >
> > I kept hugetlb_linear_page_index() in include/linux/hugetlb.h because
> > this is hugetlb-specific logic, not userfaultfd-specific logic.
>
> Yes, and see my comment about either removing it entirely again next, or
> actually also using it in hugetlb.c.

I think it's better to move a large piece of mfill_atomic_hugetlb() to
hugetlb.c and get rid of the helper then. For now, keep it simple for
easier backporting.

> --
> Cheers,
>
> David

--
Sincerely yours,
Mike.