[PATCH next v2 0/2] THP COW support for private executable file mmap

Zhang Qilong posted 2 patches 1 month, 1 week ago
include/linux/huge_mm.h |  1 +
mm/huge_memory.c        | 91 +++++++++++++++++++++++++++++++++++++++++
mm/memory.c             | 15 +++++++
3 files changed, 107 insertions(+)
[PATCH next v2 0/2] THP COW support for private executable file mmap
Posted by Zhang Qilong 1 month, 1 week ago
This patch series implements THP COW for private executable file mmap.
It is mainly designed to improve the performance of hotpatch programs, and
reuses the 'vma->vm_flags' hints to determine whether to trigger the exec
THP COW.

The MySQL (Ver 8.0.25) test results on AMD are as follows:

-------------------------------------------------------------------
                 | Exec mmap Rss(kB)  | Measured tpmC (NewOrders) |
-----------------|--------------------|---------------------------|
 base(page COW)  |       32868        |        339686             |
-----------------|--------------------|---------------------------|
 exec THP COW    |       43516        |        371324             |
-------------------------------------------------------------------

MySQL with exec THP COW consumes an additional 10648 kB of memory
but achieves a 9.3% performance improvement in the hotpatch scenario.
Additionally, another internal program of ours achieves approximately a 5%
performance improvement as well.

As a result, using exec THP COW will consume additional memory. The
additional memory consumption may be negligible for the current system.
It's necessary to balance the memory consumption with the performance
impact.

v2:
- Add MySQL and internal program test results

Zhang Qilong (2):
  mm/huge_memory: Implementation of THP COW for executable file mmap
  mm/huge_memory: Use per-VMA hugepage flag hints for exec THP COW

 include/linux/huge_mm.h |  1 +
 mm/huge_memory.c        | 91 +++++++++++++++++++++++++++++++++++++++++
 mm/memory.c             | 15 +++++++
 3 files changed, 107 insertions(+)

-- 
2.43.0
Re: [PATCH next v2 0/2] THP COW support for private executable file mmap
Posted by David Hildenbrand (Red Hat) 1 month, 1 week ago
On 12/26/25 11:03, Zhang Qilong wrote:
> This patch series implements THP COW for private executable file mmap.
> It is mainly designed to improve the performance of hotpatch programs, and
> reuses the 'vma->vm_flags' hints to determine whether to trigger the exec
> THP COW.
> 
> The MySQL (Ver 8.0.25) test results on AMD are as follows:
> 
> -------------------------------------------------------------------
>                  | Exec mmap Rss(kB)  | Measured tpmC (NewOrders) |
> -----------------|--------------------|---------------------------|
>  base(page COW)  |       32868        |        339686             |
> -----------------|--------------------|---------------------------|
>  exec THP COW    |       43516        |        371324             |
> -------------------------------------------------------------------
> 
> MySQL with exec THP COW consumes an additional 10648 kB of memory
> but achieves a 9.3% performance improvement in the hotpatch scenario.
> Additionally, another internal program of ours achieves approximately a 5%
> performance improvement as well.
> 
> As a result, using exec THP COW will consume additional memory. The
> additional memory consumption may be negligible for the current system.
> It's necessary to balance the memory consumption with the performance
> impact.

I agree with Willy that "negligible" is the wrong word. Assume you're 
using uprobes and end up firing up the same executable in many 
processes. Each process will suddenly consume 2M vs. 4k just for 
installing a single uprobe. Of course, VM_HUGEPAGE mitigates this.
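
To put rough numbers on the uprobe case, here is a minimal sketch. It assumes 4 KiB base pages and 2 MiB PMD-sized THPs (as on x86-64); the process count is hypothetical and does not come from the patch series.

```python
# Memory cost of one COW fault per process, triggered by installing a
# single uprobe: page COW vs. exec THP COW.
# Assumptions (not from the series): 4 KiB base pages, 2 MiB PMD-sized
# THPs, and a hypothetical count of 1000 processes running the same binary.
BASE_PAGE_KIB = 4
THP_KIB = 2 * 1024
NUM_PROCESSES = 1000  # hypothetical


def cow_cost_mib(page_kib: int, processes: int) -> float:
    """Total anon memory in MiB if every process COWs one page of this size."""
    return page_kib * processes / 1024


page_cow_mib = cow_cost_mib(BASE_PAGE_KIB, NUM_PROCESSES)
thp_cow_mib = cow_cost_mib(THP_KIB, NUM_PROCESSES)

print(f"page COW: {page_cow_mib:.1f} MiB, exec THP COW: {thp_cow_mib:.1f} MiB")
print(f"per-process ratio: {THP_KIB // BASE_PAGE_KIB}x")
```

The 512x per-process ratio is independent of the process count; only the absolute totals scale with it.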

But really, this is the first time that we are using large anon folios 
in MAP_PRIVATE file mappings IIRC.

Take a look at kernel/events/uprobes.c:__uprobe_write(), which I 
prepared to deal with large folios.

But the removal logic for zapping pages when removing uprobes will not 
be able to reclaim the memory in case we over-allocated memory during 
the COW fault. We'll be zapping a single PTE only and *not* restoring 
the original file THP PMD.

Zapping more is rather complicated (doable, but complicated), and I'm 
not particularly keen about adding that complexity there.

Long story short: this is the first time we allocate anon THPs in such 
areas and I wouldn't be surprised if there are more problems lurking 
somewhere.

-- 
Cheers

David
Re: [PATCH next v2 0/2] THP COW support for private executable file mmap
Posted by Matthew Wilcox 1 month, 1 week ago
On Fri, Dec 26, 2025 at 06:03:35PM +0800, Zhang Qilong wrote:
> The MySQL (Ver 8.0.25) test results on AMD are as follows:
> 
> -------------------------------------------------------------------
>                  | Exec mmap Rss(kB)  | Measured tpmC (NewOrders) |
> -----------------|--------------------|---------------------------|
>  base(page COW)  |       32868        |        339686             |
> -----------------|--------------------|---------------------------|
>  exec THP COW    |       43516        |        371324             |
> -------------------------------------------------------------------
> 
> MySQL with exec THP COW consumes an additional 10648 kB of memory
> but achieves a 9.3% performance improvement in the hotpatch scenario.
> Additionally, another internal program of ours achieves approximately a 5%
> performance improvement as well.
> 
> As a result, using exec THP COW will consume additional memory. The
> additional memory consumption may be negligible for the current system.
> It's necessary to balance the memory consumption with the performance
> impact.

I mean ... you say "negligible", I say "32% extra".  9% performance
gain is certainly nothing to sneer at (and is consistent with measured
performance gains from using large folios for, eg, kernel compiles).
But wow, that's a lot of extra memory.  My feeling is that we shouldn't
add this functionality, but I'd welcome other opinions.
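
Both percentages follow directly from the table in the cover letter. A quick check using only the posted figures (the variable names are illustrative):

```python
# Figures copied from the cover letter's MySQL 8.0.25 table.
rss_base_kb, rss_thp_kb = 32868, 43516   # Exec mmap Rss (kB)
tpmc_base, tpmc_thp = 339686, 371324     # Measured tpmC (NewOrders)

extra_kb = rss_thp_kb - rss_base_kb
extra_mem_pct = 100 * extra_kb / rss_base_kb
perf_gain_pct = 100 * (tpmc_thp - tpmc_base) / tpmc_base

print(f"extra exec Rss: {extra_kb} kB ({extra_mem_pct:.1f}%)")
print(f"tpmC gain: {perf_gain_pct:.1f}%")
```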
Re: [PATCH next v2 0/2] THP COW support for private executable file mmap
Posted by David Hildenbrand (Red Hat) 1 month, 1 week ago
On 12/28/25 04:42, Matthew Wilcox wrote:
> On Fri, Dec 26, 2025 at 06:03:35PM +0800, Zhang Qilong wrote:
>> The MySQL (Ver 8.0.25) test results on AMD are as follows:
>>
>> -------------------------------------------------------------------
>>                  | Exec mmap Rss(kB)  | Measured tpmC (NewOrders) |
>> -----------------|--------------------|---------------------------|
>>  base(page COW)  |       32868        |        339686             |
>> -----------------|--------------------|---------------------------|
>>  exec THP COW    |       43516        |        371324             |
>> -------------------------------------------------------------------
>>
>> MySQL with exec THP COW consumes an additional 10648 kB of memory
>> but achieves a 9.3% performance improvement in the hotpatch scenario.
>> Additionally, another internal program of ours achieves approximately a 5%
>> performance improvement as well.
>>
>> As a result, using exec THP COW will consume additional memory. The
>> additional memory consumption may be negligible for the current system.
>> It's necessary to balance the memory consumption with the performance
>> impact.
> 
> I mean ... you say "negligible", I say "32% extra".  9% performance
> gain is certainly nothing to sneer at (and is consistent with measured
> performance gains from using large folios for, eg, kernel compiles).
> But wow, that's a lot of extra memory.  My feeling is that we shouldn't
> add this functionality, but I'd welcome other opinions.

Also, I wonder whether there aren't other approaches for such code 
patching where user space is able to create THPs more effectively? 
Handling creation of a patched file version etc in user space.

E.g., I'd assume that a single "patched" version (with a single THP) for 
multiple program instances could be beneficial over one patched version 
per program instance.

Which type of code patching does hotpatch perform?

-- 
Cheers

David