[PATCH v2 2/2] mm: madvise: no-op for MADV_NOHUGEPAGE if THP is disabled

Ignacio Moreno Gonzalez via B4 Relay posted 2 patches 4 months, 3 weeks ago
[PATCH v2 2/2] mm: madvise: no-op for MADV_NOHUGEPAGE if THP is disabled
Posted by Ignacio Moreno Gonzalez via B4 Relay 4 months, 3 weeks ago
From: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com>

VM_NOHUGEPAGE is a no-op in a kernel without THP. So it makes no sense
to return an error when calling madvise() with MADV_NOHUGEPAGE.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com>
---
 include/linux/huge_mm.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e893d546a49f464f7586db639fe216231f03651a..cdb991f9be918182f94003394cf793654a080224 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -7,6 +7,10 @@
 #include <linux/fs.h> /* only for vma_is_dax() */
 #include <linux/kobject.h>
 
+#ifndef CONFIG_TRANSPARENT_HUGEPAGE
+#include <uapi/asm-generic/mman-common.h>
+#endif
+
 vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
@@ -598,6 +602,8 @@ static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma,
 static inline int hugepage_madvise(struct vm_area_struct *vma,
 				   unsigned long *vm_flags, int advice)
 {
+	if (advice == MADV_NOHUGEPAGE)
+		return 0;
 	return -EINVAL;
 }
 

-- 
2.39.5
Re: [PATCH v2 2/2] mm: madvise: no-op for MADV_NOHUGEPAGE if THP is disabled
Posted by Andrew Morton 4 months, 3 weeks ago
On Tue, 06 May 2025 15:44:33 +0200 Ignacio Moreno Gonzalez via B4 Relay <devnull+Ignacio.MorenoGonzalez.kuka.com@kernel.org> wrote:

> From: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com>
> 
> VM_NOHUGEPAGE is a no-op in a kernel without THP. So it makes no sense
> to return an error when calling madvise() with MADV_NOHUGEPAGE.

The patch looks rather odd.

> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -7,6 +7,10 @@
>  #include <linux/fs.h> /* only for vma_is_dax() */
>  #include <linux/kobject.h>
>  
> +#ifndef CONFIG_TRANSPARENT_HUGEPAGE
> +#include <uapi/asm-generic/mman-common.h>
> +#endif

Why is the #ifndef here?

This is the only file under include/linux which directly includes
something from uapi/asm-generic.  Indicates that we're doing something
wrong.

If this hunk is truly the correct approach then I think we need a
comment here fully explaining what is going on.   Because it looks odd!

>  vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
>  int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
> @@ -598,6 +602,8 @@ static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma,
>  static inline int hugepage_madvise(struct vm_area_struct *vma,
>  				   unsigned long *vm_flags, int advice)
>  {
> +	if (advice == MADV_NOHUGEPAGE)
> +		return 0;

Also a comment here which explains why we're doing this?

>  	return -EINVAL;
>  }
>  
> 
> -- 
> 2.39.5
>
Re: [PATCH v2 2/2] mm: madvise: no-op for MADV_NOHUGEPAGE if THP is disabled
Posted by Ignacio Moreno Gonzalez 4 months, 3 weeks ago
This patch resulted from the discussion on [0]. The issue I described there consisted of two aspects:

  - CRIU saw a memory mapping containing the "nh" flag on a !THP kernel. This was addressed by my submission.
  - CRIU tried to set the "nh" flag with madvise(), which resulted on -EINVAL.

The issue is actually solved with the patch I submitted. But it was also discussed that calling madvise() with VM_NOHUGEPAGE should be a no-op and should not fail. That's why this patch was created.

On 5/7/2025 1:40 AM, Andrew Morton wrote:
>> +#ifndef CONFIG_TRANSPARENT_HUGEPAGE
>> +#include <uapi/asm-generic/mman-common.h>
>> +#endif
> 
> Why is the #ifndef here?

Because this is only included for the definition of VM_NOHUGEPAGE, which is only needed for the !THP case. 
> This is the only file under include/linux which directly includes
> something from uapi/asm-generic.  Indicates that we're doing something
> wrong.

If we want to differentiate the behavior of hugepage_madvise() depending on the advice, then we need the definitions for the different advices... Maybe this is not the correct place to do this? We could also do that differentiation in madvise.c. 

> If this hunk is truly the correct approach then I think we need a
> comment here fully explaining what is going on.   Because it looks odd!
To me, differentiating the behaviour of madvise() for MADV_HUGEPAGE and MADV_NOHUGEPAGE seems ok, and that was also the consensus on [0]. However, I lack the expertise to determine if this is the best place in the code to implement this change. Perhaps Lorenzo or Matthew can provide feedback here.


[0]: https://lore.kernel.org/linux-mm/20250502-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled-v1-1-113cc634cd51@kuka.com/
Re: [PATCH v2 2/2] mm: madvise: no-op for MADV_NOHUGEPAGE if THP is disabled
Posted by Yang Shi 4 months, 3 weeks ago

On 5/7/25 4:44 AM, Ignacio Moreno Gonzalez wrote:
> This patch resulted from the discussion on [0]. The issue I described there consisted of two aspects:
>
>    - CRIU saw a memory mapping containing the "nh" flag on a !THP kernel. This was addressed by my submission.
>    - CRIU tried to set the "nh" flag with madvise(), which resulted on -EINVAL.
>
> The issue is actually solved with the patch I submitted. But it was also discussed that calling madvise() with VM_NOHUGEPAGE should be a no-op and should not fail. That's why this patch was created.
>
> On 5/7/2025 1:40 AM, Andrew Morton wrote:
>>> +#ifndef CONFIG_TRANSPARENT_HUGEPAGE
>>> +#include <uapi/asm-generic/mman-common.h>
>>> +#endif
>> Why is the #ifndef here?
> Because this is only included for the definition of VM_NOHUGEPAGE, which is only needed for the !THP case.

Can you just simply include <linux/mman.h> ?

Thanks,
Yang

>> This is the only file under include/linux which directly includes
>> something from uapi/asm-generic.  Indicates that we're doing something
>> wrong.
> If we want to differentiate the behavior of hugepage_madvise() depending on the advice, then we need the definitions for the different advices... Maybe this is not the correct place to do this? We could also do that differentiation in madvise.c.
>
>> If this hunk is truly the correct approach then I think we need a
>> comment here fully explaining what is going on.   Because it looks odd!
> To me, differentiating the behaviour of madvise() for MADV_HUGEPAGE and MADV_NOHUGEPAGE seems ok, and that was also the consensus on [0]. However, I lack the expertise to determine if this is the best place in the code to implement this change. Perhaps Lorenzo or Matthew can provide feedback here.
>
>
> [0]: https://lore.kernel.org/linux-mm/20250502-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled-v1-1-113cc634cd51@kuka.com/
Re: [PATCH v2 2/2] mm: madvise: no-op for MADV_NOHUGEPAGE if THP is disabled
Posted by Ignacio Moreno Gonzalez 4 months, 3 weeks ago
On 5/7/2025 5:57 PM, Yang Shi wrote:

> Can you just simply include <linux/mman.h> ?
This is what I tried first, but then it won't compile due to 'undeclared MADV_NOHUGEPAGE'
Re: [PATCH v2 2/2] mm: madvise: no-op for MADV_NOHUGEPAGE if THP is disabled
Posted by Yang Shi 4 months, 3 weeks ago

On 5/7/25 9:11 AM, Ignacio Moreno Gonzalez wrote:
> On 5/7/2025 5:57 PM, Yang Shi wrote:
>
>> Can you just simply include <linux/mman.h> ?
> This is what I tried first, but then it won't compile due to 'undeclared MADV_NOHUGEPAGE'

OK, I ran into some compilation errors, but it is not due to 'undeclared 
MADV_NOHUGEPAGE'. The compiler reports some inline functions are not 
declared. It may be because some circular dependency because 
linux/mman.h also includes linux/mm.h. But I didn't have too much time 
investigating it.

The below patch works for me:

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 93e509b6c00e..750e17e552a0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -475,6 +475,8 @@ bool unmap_huge_pmd_locked(struct vm_area_struct 
*vma, unsigned long addr,

  #else /* CONFIG_TRANSPARENT_HUGEPAGE */

+#include <uapi/asm/mman.h>
+
  static inline bool folio_test_pmd_mappable(struct folio *folio)
  {
         return false;
@@ -558,6 +560,9 @@ static inline bool unmap_huge_pmd_locked(struct 
vm_area_struct *vma,
  static inline int hugepage_madvise(struct vm_area_struct *vma,
                                    unsigned long *vm_flags, int advice)
  {
+       if (advice == MADV_NOHUGEPAGE)
+               return 0;
+
         return -EINVAL;
  }


This should be slightly better than yours?

Thanks,
Yang


Re: [PATCH v2 2/2] mm: madvise: no-op for MADV_NOHUGEPAGE if THP is disabled
Posted by Liam R. Howlett 4 months, 3 weeks ago
* Ignacio Moreno Gonzalez via B4 Relay <devnull+Ignacio.MorenoGonzalez.kuka.com@kernel.org> [250506 09:44]:
> From: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com>
> 
> VM_NOHUGEPAGE is a no-op in a kernel without THP. So it makes no sense
> to return an error when calling madvise() with MADV_NOHUGEPAGE.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
> Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com>

Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>

> ---
>  include/linux/huge_mm.h | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index e893d546a49f464f7586db639fe216231f03651a..cdb991f9be918182f94003394cf793654a080224 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -7,6 +7,10 @@
>  #include <linux/fs.h> /* only for vma_is_dax() */
>  #include <linux/kobject.h>
>  
> +#ifndef CONFIG_TRANSPARENT_HUGEPAGE
> +#include <uapi/asm-generic/mman-common.h>
> +#endif
> +
>  vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
>  int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
> @@ -598,6 +602,8 @@ static inline bool unmap_huge_pmd_locked(struct vm_area_struct *vma,
>  static inline int hugepage_madvise(struct vm_area_struct *vma,
>  				   unsigned long *vm_flags, int advice)
>  {
> +	if (advice == MADV_NOHUGEPAGE)
> +		return 0;
>  	return -EINVAL;
>  }
>  
> 
> -- 
> 2.39.5
> 
>