[PATCH] mm: Inline vma_needs_copy

Yunshui Jiang posted 1 patch 3 months, 3 weeks ago
mm/memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] mm: Inline vma_needs_copy
Posted by Yunshui Jiang 3 months, 3 weeks ago
From: jiangyunshui <jiangyunshui@kylinos.cn>

Since commit bcd51a3c679d ("hugetlb: lazy page table copies
in fork()"), the logic about judging whether to copy
page table inside func copy_page_range has been extracted
into a separate func vma_needs_copy. While this change
improves code readability, it also incurs more function call
overhead, especially where fork() were frequently called.

Inline func vma_needs_copy to optimize the copy_page_range
performance. Given that func vma_needs_copy is only called
by copy_page_range, inlining it would not cause unacceptable
code bloat.

Testing was done with the byte-unixbench spawn benchmark
(which frequently calls fork). I measured 1.7% improvement
on x86 and 1.8% improvement on arm64.

Signed-off-by: jiangyunshui <jiangyunshui@kylinos.cn>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 8eba595056fe..d15b07f96ab1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1337,7 +1337,7 @@ copy_p4d_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
  * false when we can speed up fork() by allowing lazy page faults later until
  * when the child accesses the memory range.
  */
-static bool
+static __always_inline bool
 vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
 	/*
-- 
2.47.1
Re: [PATCH] mm: Inline vma_needs_copy
Posted by Lorenzo Stoakes 3 months, 3 weeks ago
On Wed, Jun 18, 2025 at 09:42:09AM +0800, Yunshui Jiang wrote:
> From: jiangyunshui <jiangyunshui@kylinos.cn>
>
> Since commit bcd51a3c679d ("hugetlb: lazy page table copies
> in fork()"), the logic about judging whether to copy
> page table inside func copy_page_range has been extracted
> into a separate func vma_needs_copy. While this change
> improves code readability, it also incurs more function call
> overhead, especially where fork() were frequently called.
>
> Inline func vma_needs_copy to optimize the copy_page_range
> performance. Given that func vma_needs_copy is only called
> by copy_page_range, inlining it would not cause unacceptable
> code bloat.
>
> Testing was done with the byte-unixbench spawn benchmark
> (which frequently calls fork). I measured 1.7% improvement
> on x86 and 1.8% improvement on arm64.

As per others you are going to need to provide details of your compiler
setup because modern compilers are inlining this already.

if it's ye olde compiler I'm not sure this is justified...

>
> Signed-off-by: jiangyunshui <jiangyunshui@kylinos.cn>
> ---
>  mm/memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 8eba595056fe..d15b07f96ab1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1337,7 +1337,7 @@ copy_p4d_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
>   * false when we can speed up fork() by allowing lazy page faults later until
>   * when the child accesses the memory range.
>   */
> -static bool
> +static __always_inline bool
>  vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
>  {

This needs to live in vma.h probably... todo++...

>  	/*
> --
> 2.47.1
>
Re: [PATCH] mm: Inline vma_needs_copy
Posted by Vlastimil Babka 3 months, 3 weeks ago
On 6/18/25 3:42 AM, Yunshui Jiang wrote:
> From: jiangyunshui <jiangyunshui@kylinos.cn>
> 
> Since commit bcd51a3c679d ("hugetlb: lazy page table copies
> in fork()"), the logic about judging whether to copy
> page table inside func copy_page_range has been extracted
> into a separate func vma_needs_copy. While this change
> improves code readability, it also incurs more function call
> overhead, especially where fork() were frequently called.
> 
> Inline func vma_needs_copy to optimize the copy_page_range
> performance. Given that func vma_needs_copy is only called
> by copy_page_range, inlining it would not cause unacceptable
> code bloat.

I'm surprised the compiler doesn't inline it already, if there's a
single caller. In fact, mine (gcc-14.3 on x86) already does.

So I wonder to which extent should we force override wrong compiler
heuristics? Maybe just inline instead of __always_inline would be OK? Is
that enough of a hint for your compiler?

> Testing was done with the byte-unixbench spawn benchmark
> (which frequently calls fork). I measured 1.7% improvement
> on x86 and 1.8% improvement on arm64.
> 
> Signed-off-by: jiangyunshui <jiangyunshui@kylinos.cn>
> ---
>  mm/memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 8eba595056fe..d15b07f96ab1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1337,7 +1337,7 @@ copy_p4d_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
>   * false when we can speed up fork() by allowing lazy page faults later until
>   * when the child accesses the memory range.
>   */
> -static bool
> +static __always_inline bool
>  vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
>  {
>  	/*
Re: [PATCH] mm: Inline vma_needs_copy
Posted by David Hildenbrand 3 months, 3 weeks ago
On 18.06.25 11:39, Vlastimil Babka wrote:
> On 6/18/25 3:42 AM, Yunshui Jiang wrote:
>> From: jiangyunshui <jiangyunshui@kylinos.cn>
>>
>> Since commit bcd51a3c679d ("hugetlb: lazy page table copies
>> in fork()"), the logic about judging whether to copy
>> page table inside func copy_page_range has been extracted
>> into a separate func vma_needs_copy. While this change
>> improves code readability, it also incurs more function call
>> overhead, especially where fork() were frequently called.
>>
>> Inline func vma_needs_copy to optimize the copy_page_range
>> performance. Given that func vma_needs_copy is only called
>> by copy_page_range, inlining it would not cause unacceptable
>> code bloat.
> 
> I'm surprised the compiler doesn't inline it already, if there's a
> single caller. In fact, mine (gcc-14.3 on x86) already does.

Same here

	gcc (GCC) 15.1.1 20250521 (Red Hat 15.1.1-2)
> 
> So I wonder to which extent should we force override wrong compiler
> heuristics?

I think only where we (a) really know better (b) it's crucial for 
performance and (c) modern compilers don't do the right thing.

> Maybe just inline instead of __always_inline would be OK? Is
> that enough of a hint for your compiler?

I would do the same thing (inline as a hint), but some people apparently 
recently argued that newer compiler mostly ignore it either way. At 
least that's what I recall.

-- 
Cheers,

David / dhildenb
Re: [PATCH] mm: Inline vma_needs_copy
Posted by Liam R. Howlett 3 months, 3 weeks ago
* David Hildenbrand <david@redhat.com> [250618 06:03]:
> On 18.06.25 11:39, Vlastimil Babka wrote:
> > On 6/18/25 3:42 AM, Yunshui Jiang wrote:
> > > From: jiangyunshui <jiangyunshui@kylinos.cn>
> > > 
> > > Since commit bcd51a3c679d ("hugetlb: lazy page table copies
> > > in fork()"), the logic about judging whether to copy
> > > page table inside func copy_page_range has been extracted
> > > into a separate func vma_needs_copy. While this change
> > > improves code readability, it also incurs more function call
> > > overhead, especially where fork() were frequently called.
> > > 
> > > Inline func vma_needs_copy to optimize the copy_page_range
> > > performance. Given that func vma_needs_copy is only called
> > > by copy_page_range, inlining it would not cause unacceptable
> > > code bloat.
> > 
> > I'm surprised the compiler doesn't inline it already, if there's a
> > single caller. In fact, mine (gcc-14.3 on x86) already does.
> 
> Same here
> 
> 	gcc (GCC) 15.1.1 20250521 (Red Hat 15.1.1-2)
> > 
> > So I wonder to which extent should we force override wrong compiler
> > heuristics?
> 
> I think only where we (a) really know better (b) it's crucial for
> performance and (c) modern compilers don't do the right thing.
> 
> > Maybe just inline instead of __always_inline would be OK? Is
> > that enough of a hint for your compiler?
> 
> I would do the same thing (inline as a hint), but some people apparently
> recently argued that newer compiler mostly ignore it either way. At least
> that's what I recall.

I have seen inline ignored, but not with a single caller.  Perhaps
godbolt could find a reason this is necessary, otherwise we should just
leave it, IMO.

Cheers,
Liam
Re: [PATCH] mm: Inline vma_needs_copy
Posted by Yunshui Jiang 3 months, 3 weeks ago
Initially, I added explicit inline hints in the code precisely because some
newer compilers might ignore such suggestions.
 

Based on everyone's feedback, I rechecked my compiler configurations

—ARM64 uses GCC 12.3, while x86 uses GCC 12.4. By analyzing the

disassembly of vmlinux via objdump, I confirmed that vma_needs_copy

is indeed inlined in both cases. In this case the score difference in spawn

might just be due to fluctuations.

 

Leave it unchanged might be better. If we later encounter issues where

older compilers fail to inline single-caller cases or multiple calls are

incurred due to code changes, we could re-discuss  whether manual

intervention is needed. 

 

Best regards,
Yunshui Jiang