[RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1

Posted by Zi Yan 1 month, 3 weeks ago
Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
init_on_free=1 boot options") forces allocated pages to be cleared in
post_alloc_hook() when init_on_alloc=1.

For non-PMD folios, if the arch does not define
vma_alloc_zeroed_movable_folio(), the default implementation clears the
page returned from the buddy allocator again, so the page is cleared
twice. Fix it by passing __GFP_ZERO instead to avoid double page clearing.
At the moment, s390, arm64, x86, alpha and m68k are not impacted since
they define their own vma_alloc_zeroed_movable_folio().

For PMD folios, folio_zero_user() is called to clear the folio again.
Fix it by calling folio_zero_user() only if init_on_alloc is not set.
All architectures are impacted.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/highmem.h | 14 ++------------
 mm/huge_memory.c        |  4 +++-
 2 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 930a591b9b61..4b15224842e1 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
  * Return: A folio containing one allocated and zeroed page or NULL if
  * we are out of memory.
  */
-static inline
-struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
-				   unsigned long vaddr)
-{
-	struct folio *folio;
-
-	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
-	if (folio)
-		clear_user_highpage(&folio->page, vaddr);
-
-	return folio;
-}
+#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
+	vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
 #endif
 
 static inline void clear_highpage(struct page *page)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a7b05f4c2a5e..ff746151896f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1177,7 +1177,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 		goto release;
 	}
 
-	folio_zero_user(folio, vmf->address);
+	if (!static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
+				&init_on_alloc))
+		folio_zero_user(folio, vmf->address);
 	/*
 	 * The memory barrier inside __folio_mark_uptodate makes sure that
 	 * folio_zero_user writes become visible before the set_pmd_at()
-- 
2.45.2
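
The double clear described in the commit message can be modelled in
userspace. This is only a sketch: every model_* name is hypothetical, a
static buffer stands in for a page, and the allocator model assumes only
what the commit message states, namely that the allocator clears the page
when init_on_alloc=1 or when a zeroing flag is passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE	4096u
#define MODEL_GFP_ZERO	0x1u	/* hypothetical stand-in for __GFP_ZERO */

static bool model_init_on_alloc;	/* stand-in for init_on_alloc=1 */
static unsigned int model_clear_count;	/* how often the page was cleared */
static unsigned char model_page[MODEL_PAGE_SIZE];

static void model_clear_page(unsigned char *page)
{
	memset(page, 0, MODEL_PAGE_SIZE);
	model_clear_count++;
}

/* Allocator model: one clear if init_on_alloc is on or zeroing is requested. */
static unsigned char *model_alloc_page(unsigned int gfp)
{
	if (model_init_on_alloc || (gfp & MODEL_GFP_ZERO))
		model_clear_page(model_page);
	return model_page;
}

/* Old default: allocate without the zero flag, then clear explicitly. */
static unsigned char *model_alloc_zeroed_old(void)
{
	unsigned char *page = model_alloc_page(0);

	if (page)
		model_clear_page(page);
	return page;
}

/* Patched default: the allocator performs the only clear. */
static unsigned char *model_alloc_zeroed_new(void)
{
	return model_alloc_page(MODEL_GFP_ZERO);
}
```

With model_init_on_alloc set, the old path clears the page twice and the
patched path clears it once. With it unset, both paths clear exactly once.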
Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
Posted by Huang, Ying 1 month, 2 weeks ago
Zi Yan <ziy@nvidia.com> writes:

[snip]

> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 930a591b9b61..4b15224842e1 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
>   * Return: A folio containing one allocated and zeroed page or NULL if
>   * we are out of memory.
>   */
> -static inline
> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> -				   unsigned long vaddr)
> -{
> -	struct folio *folio;
> -
> -	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
> -	if (folio)
> -		clear_user_highpage(&folio->page, vaddr);
> -
> -	return folio;
> -}
> +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
> +	vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)

Although it is just one line, I would still prefer an inline function
over a macro here.  Not a strong opinion.
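
For illustration, the inline-function form could look roughly like the
sketch below. It is a userspace mock, not kernel code: the types, the flag
values and the vma_alloc_folio() body are stand-ins invented so that the
example compiles on its own. Only the shape of
vma_alloc_zeroed_movable_folio() is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for kernel types and flags; the values are hypothetical. */
struct vm_area_struct { int unused; };
struct folio { unsigned char data[4096]; };
typedef unsigned int gfp_t;
#define GFP_HIGHUSER_MOVABLE	((gfp_t)0x1)
#define __GFP_ZERO		((gfp_t)0x2)

/* Mock allocator: returns zeroed memory only when __GFP_ZERO is passed. */
static struct folio *vma_alloc_folio(gfp_t gfp, int order,
				     struct vm_area_struct *vma,
				     unsigned long addr, bool hugepage)
{
	(void)order;
	(void)vma;
	(void)addr;
	(void)hugepage;

	if (gfp & __GFP_ZERO)
		return calloc(1, sizeof(struct folio));
	return malloc(sizeof(struct folio));
}

/* Same call as the macro in the patch, with type-checked arguments. */
static inline struct folio *
vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma, unsigned long vaddr)
{
	return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma,
			       vaddr, false);
}
```

Unlike the macro, the function form makes the compiler check the argument
and return types at every call site.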

>  #endif

[snip]

--
Best Regards,
Huang, Ying
Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
Posted by David Hildenbrand 1 month, 2 weeks ago
On 07.10.24 20:23, Zi Yan wrote:
> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
> init_on_free=1 boot options") forces allocated pages to be cleared in
> post_alloc_hook() when init_on_alloc=1.
> 
> For non-PMD folios, if the arch does not define
> vma_alloc_zeroed_movable_folio(), the default implementation clears the
> page returned from the buddy allocator again, so the page is cleared
> twice. Fix it by passing __GFP_ZERO instead to avoid double page clearing.
> At the moment, s390, arm64, x86, alpha and m68k are not impacted since
> they define their own vma_alloc_zeroed_movable_folio().
> 
> For PMD folios, folio_zero_user() is called to clear the folio again.
> Fix it by calling folio_zero_user() only if init_on_alloc is not set.
> All architectures are impacted.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   include/linux/highmem.h | 14 ++------------
>   mm/huge_memory.c        |  4 +++-
>   2 files changed, 5 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 930a591b9b61..4b15224842e1 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
>    * Return: A folio containing one allocated and zeroed page or NULL if
>    * we are out of memory.
>    */
> -static inline
> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> -				   unsigned long vaddr)
> -{
> -	struct folio *folio;
> -
> -	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
> -	if (folio)
> -		clear_user_highpage(&folio->page, vaddr);
> -
> -	return folio;
> -}
> +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
> +	vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
>   #endif
>   
>   static inline void clear_highpage(struct page *page)
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index a7b05f4c2a5e..ff746151896f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1177,7 +1177,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>   		goto release;
>   	}
>   
> -	folio_zero_user(folio, vmf->address);
> +	if (!static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> +				&init_on_alloc))
> +		folio_zero_user(folio, vmf->address);
>   	/*
>   	 * The memory barrier inside __folio_mark_uptodate makes sure that
>   	 * folio_zero_user writes become visible before the set_pmd_at()

I remember we discussed that in the past and that we do *not* want to 
sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.

Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There 
is the slight chance that we zero-out when we're not going to use the 
allocated folio, but ... that can happen either way even with the 
current code?


-- 
Cheers,

David / dhildenb
Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
Posted by Zi Yan 1 month, 2 weeks ago
On 8 Oct 2024, at 4:26, David Hildenbrand wrote:

> On 07.10.24 20:23, Zi Yan wrote:
>> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
>> init_on_free=1 boot options") forces allocated pages to be cleared in
>> post_alloc_hook() when init_on_alloc=1.
>>
>> For non-PMD folios, if the arch does not define
>> vma_alloc_zeroed_movable_folio(), the default implementation clears the
>> page returned from the buddy allocator again, so the page is cleared
>> twice. Fix it by passing __GFP_ZERO instead to avoid double page clearing.
>> At the moment, s390, arm64, x86, alpha and m68k are not impacted since
>> they define their own vma_alloc_zeroed_movable_folio().
>>
>> For PMD folios, folio_zero_user() is called to clear the folio again.
>> Fix it by calling folio_zero_user() only if init_on_alloc is not set.
>> All architectures are impacted.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>   include/linux/highmem.h | 14 ++------------
>>   mm/huge_memory.c        |  4 +++-
>>   2 files changed, 5 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> index 930a591b9b61..4b15224842e1 100644
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
>>    * Return: A folio containing one allocated and zeroed page or NULL if
>>    * we are out of memory.
>>    */
>> -static inline
>> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
>> -				   unsigned long vaddr)
>> -{
>> -	struct folio *folio;
>> -
>> -	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
>> -	if (folio)
>> -		clear_user_highpage(&folio->page, vaddr);
>> -
>> -	return folio;
>> -}
>> +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
>> +	vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
>>   #endif
>>
>>   static inline void clear_highpage(struct page *page)
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index a7b05f4c2a5e..ff746151896f 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1177,7 +1177,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>>   		goto release;
>>   	}
>>
>> -	folio_zero_user(folio, vmf->address);
>> +	if (!static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
>> +				&init_on_alloc))
>> +		folio_zero_user(folio, vmf->address);
>>   	/*
>>   	 * The memory barrier inside __folio_mark_uptodate makes sure that
>>   	 * folio_zero_user writes become visible before the set_pmd_at()
>
> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>
> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?

I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
folio_zero_user() uses vmf->address to improve cache performance by changing
subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
optimization. To keep it, vmf->address will need to be passed to allocation
code. Maybe that is acceptable?
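
To make the ordering concrete, here is a simplified userspace model of
"clear the target sub-page last". It is not the kernel's exact traversal:
the sketch_* names and the 4 KiB subpage size are assumptions, and the
walk simply takes whichever end of the folio is farther from the target
until only the target is left.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SUBPAGE_SIZE 4096u	/* assumed base page size */

/*
 * Zero nr subpages of buf so that the subpage at index target is cleared
 * last and the others are cleared farthest-from-target first. If order is
 * non-NULL, it receives the subpage indices in the order they were cleared.
 */
static void sketch_zero_target_last(unsigned char *buf, size_t nr,
				    size_t target, size_t *order)
{
	size_t lo = 0, hi, n = 0, idx;

	assert(nr > 0 && target < nr);
	hi = nr - 1;

	/* Walk inwards from both ends, always taking the farther end. */
	while (lo < target || hi > target) {
		if (target - lo >= hi - target)
			idx = lo++;
		else
			idx = hi--;
		memset(buf + idx * SKETCH_SUBPAGE_SIZE, 0, SKETCH_SUBPAGE_SIZE);
		if (order)
			order[n++] = idx;
	}

	/* The faulting subpage goes last, so its cache lines are the freshest. */
	memset(buf + target * SKETCH_SUBPAGE_SIZE, 0, SKETCH_SUBPAGE_SIZE);
	if (order)
		order[n] = target;
}
```

For 8 subpages and target index 2, the clearing order is 7, 6, 5, 0, 4,
1, 3, 2. A plain __GFP_ZERO clear in the allocator has no fault address to
build such an order from, which is the optimization that would be lost.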

Best Regards,
Yan, Zi
Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
Posted by Vlastimil Babka 1 month, 2 weeks ago
On 10/8/24 13:52, Zi Yan wrote:
> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
> 
>>
>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>
>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
> 
> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but

Create some nice inline wrapper for the test and it will look less ugly? :)

> folio_zero_user() uses vmf->address to improve cache performance by changing
> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
> sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
> optimization. To keep it, vmf->address will need to be passed to allocation
> code. Maybe that is acceptable?

I'd rather not change the page allocation code for this...

> Best Regards,
> Yan, Zi

Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
Posted by David Hildenbrand 1 month, 2 weeks ago
On 08.10.24 14:57, Vlastimil Babka wrote:
> On 10/8/24 13:52, Zi Yan wrote:
>> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>>
>>>
>>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>>
>>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
>>
>> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
> 
> Create some nice inline wrapper for the test and it will look less ugly? :)
> 
>> folio_zero_user() uses vmf->address to improve cache performance by changing
>> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
>> sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
>> optimization. To keep it, vmf->address will need to be passed to allocation
>> code. Maybe that is acceptable?
> 
> I'd rather not change the page allocation code for this...

Although I'm curious if that optimization from 2017 is still valuable :)

-- 
Cheers,

David / dhildenb

Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
Posted by Zi Yan 1 month, 2 weeks ago
On 8 Oct 2024, at 9:06, David Hildenbrand wrote:

> On 08.10.24 14:57, Vlastimil Babka wrote:
>> On 10/8/24 13:52, Zi Yan wrote:
>>> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>>>
>>>>
>>>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>>>
>>>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
>>>
>>> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
>>
>> Create some nice inline wrapper for the test and it will look less ugly? :)

Something like this?

static inline bool alloc_zeroed(void)
{
	return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
			&init_on_alloc);
}
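
With such a wrapper, the call site from the patch might read as in the
sketch below. This is a userspace mock under stated assumptions: a plain
bool stands in for the init_on_alloc static key, and the sketch_* names
are hypothetical stand-ins for folio_zero_user() and the fault path.

```c
#include <assert.h>
#include <stdbool.h>

static bool sketch_init_on_alloc;	/* stands in for the static key */
static unsigned int sketch_zero_calls;	/* counts explicit clears */

/* Wrapper from the mail, with the static key replaced by a plain bool. */
static inline bool alloc_zeroed(void)
{
	return sketch_init_on_alloc;
}

/* Stand-in for folio_zero_user(); only counts how often it is called. */
static void sketch_folio_zero_user(void)
{
	sketch_zero_calls++;
}

/* Fault-path fragment: skip the clear if the allocator already did it. */
static void sketch_fault_path(void)
{
	if (!alloc_zeroed())
		sketch_folio_zero_user();
}
```

The condition stays in one place, so callers do not need to know which
config symbol or static key is behind it.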


I missed another folio_zero_user() caller in alloc_anon_folio() for mTHP.
So both PMD THP and mTHP are zeroed twice on all architectures.

Adding Ryan for mTHP.

>>
>>> folio_zero_user() uses vmf->address to improve cache performance by changing
>>> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
>>> sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
>>> optimization. To keep it, vmf->address will need to be passed to allocation
>>> code. Maybe that is acceptable?
>>
>> I'd rather not change the page allocation code for this...
>
> Although I'm curious if that optimization from 2017 is still valuable :)

Maybe Ying can give some insight on this.


Do we need some general guidance on who is responsible for zeroing allocated
folios? Should people use GFP_ZERO instead of zeroing folios themselves where
possible?


Best Regards,
Yan, Zi
Re: [RFC PATCH] mm: avoid clearing user movable page twice with init_on_alloc=1
Posted by Huang, Ying 1 month, 2 weeks ago
Zi Yan <ziy@nvidia.com> writes:

> On 8 Oct 2024, at 9:06, David Hildenbrand wrote:
>
>> On 08.10.24 14:57, Vlastimil Babka wrote:
>>> On 10/8/24 13:52, Zi Yan wrote:
>>>> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>>>>
>>>>>
>>>>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>>>>
>>>>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
>>>>
>>>> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
>>>
>>> Create some nice inline wrapper for the test and it will look less ugly? :)
>
> something like?
>
> static inline bool alloc_zeroed(void)
> {
> 	return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> 			&init_on_alloc);
> }
>
>
> I missed another folio_zero_user() caller in alloc_anon_folio() for mTHP.
> So both PMD THP and mTHP are zeroed twice for all arch.
>
> Adding Ryan for mTHP.
>
>>>
>>>> folio_zero_user() uses vmf->address to improve cache performance by changing
>>>> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
>>>> sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
>>>> optimization. To keep it, vmf->address will need to be passed to allocation
>>>> code. Maybe that is acceptable?
>>>
>>> I'd rather not change the page allocation code for this...
>>
>> Although I'm curious if that optimization from 2017 is still valuable :)
>
> Maybe Ying can give some insight on this.

I guess the optimization still applies now.  Although the size of the
per-core (thread) last level cache has increased, it is still quite common
for it to be smaller than the size of a THP.  And since L1/L2 are
significantly smaller still, the optimization increases the likelihood
that the accessed cache line is in L1/L2/LLC.

--
Best Regards,
Huang, Ying