[RFC PATCH 0/6] Introduce Copy-On-Write to Page Table

Chih-En Lin posted 6 patches 3 years, 11 months ago
There is a newer version of this series
When creating a user process, the kernel usually uses the Copy-On-Write
(COW) mechanism to save memory and the time cost of copying. COW defers
the work of copying private memory and shares it across the processes
as read-only. If either process wants to write to this memory, it takes
a page fault and copies the shared pages, so the process gets its own
private memory right there; this is called breaking COW.

Presently this technique is only applied to the mapped memory itself;
the entire page table still needs to be copied from the parent. This
can cost a lot of time and memory when the parent already has many page
tables allocated. For example, here is the memory state for forking a
process that maps 1 GB of memory.

	    mmap before fork         mmap after fork
MemTotal:       32746776 kB             32746776 kB
MemFree:        31468152 kB             31463244 kB
AnonPages:       1073836 kB              1073628 kB
Mapped:            39520 kB                39992 kB
PageTables:         3356 kB                 5432 kB

This series introduces Copy-On-Write for the page table itself. It
only implements COW at the PTE level, and it is based on the paper
On-Demand Fork [1]. Summary of the paper's implementation:

- Only implements COW for anonymous mappings.
- Only does COW on a PTE table whose range is entirely covered by a
  single VMA.
- Uses a reference count to control the COW PTE table's lifetime.
  Decrease the counter when breaking COW or dereferencing the COW PTE
  table; when the counter drops to zero, free the PTE table.
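The reference-count rule in the last bullet can be sketched as follows (the struct and helper names here are illustrative, not the symbols added by this series):

```c
#include <assert.h>

/* Illustrative model of the COW PTE table lifetime: each process
 * sharing the table holds one reference; the table is freed when
 * the last reference is dropped. */
struct cow_pte_table {
    int refcount;   /* processes currently sharing this PTE table */
    int freed;      /* stand-in for the table having been freed */
};

/* fork(): the child starts sharing the parent's PTE table. */
static void cow_pte_get(struct cow_pte_table *t)
{
    t->refcount++;
}

/* Break COW, or drop a reference on unmap/exit: free on last put. */
static void cow_pte_put(struct cow_pte_table *t)
{
    assert(t->refcount > 0);
    if (--t->refcount == 0)
        t->freed = 1;   /* stand-in for freeing the PTE table page */
}
```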

The paper is based on v5.6, and this patch is for v5.18-rc6. This
patch also has some differences from the paper's version. To reduce
the work of duplicating page tables, I relaxed the restrictions on the
COW page table: excluding brk and shared memory, it does COW on all
the PTE tables. When the reference count is one, we reuse the table
while breaking COW. To handle the page table state of the process, it
adds an ownership to the COW PTE table: the address of the PMD entry
is used as the owner of the PTE table, so the COW PTE table's state
can be maintained in the owner's RSS and pgtable_bytes.

If we did the COW of a PTE table each time we touch a PMD entry, the
reference count of the COW PTE table could not be preserved correctly.
Since the address range of a VMA may cover only part of a PTE table,
the copying function uses the VMAs to traverse the page table, so it
may increase the reference count of the same COW PTE table multiple
times in one COW page table fork, while generically it should only
increase once as the child references it. To solve this problem,
before doing the COW it checks whether the destination PMD entry
already exists and whether the reference count of the source PTE table
is already more than one.
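One way to picture that check (the names below are purely illustrative, not the series' actual copy path): when the per-VMA copy loop revisits a PMD slot whose child entry already points at the shared PTE table, the reference count must not be bumped again.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: several VMAs may fall into the same PTE table, so the
 * per-VMA copy loop can visit one PMD slot repeatedly in one fork.
 * The child's reference is counted only when its PMD entry is first
 * populated. */
struct pmd_slot { void *pte_table; };

static int share_pte_table(struct pmd_slot *dst, void *src_table,
                           int refcount)
{
    if (dst->pte_table == src_table)
        return refcount;        /* already shared: do not count twice */
    dst->pte_table = src_table; /* child's PMD entry now set */
    return refcount + 1;        /* one reference per child, exactly once */
}
```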

Here is the memory state for forking a process that maps 1 GB of
memory with this patch applied.

            mmap before fork         mmap after fork
MemTotal:       32746776 kB             32746776 kB
MemFree:        31471324 kB             31468888 kB
AnonPages:       1073628 kB              1073660 kB
Mapped:            39264 kB                39504 kB
PageTables:         3304 kB                 3396 kB

TODO list:
- Handle swap.
- Rewrite the TLB flush for zapping the COW PTE table.
- Experiment with COW on the entire page table. (Now just for the PTE level.)
- Bug in some cases from copy_pte_range()::vm_normal_page()::print_bad_pte().
- Bug: bad RSS counter when doing COW PTE table forking multiple times.

[1] https://dl.acm.org/doi/10.1145/3447786.3456258

This patch is based on v5.18-rc6.

---

Chih-En Lin (6):
  mm: Add a new mm flag for Copy-On-Write PTE table
  mm: clone3: Add CLONE_COW_PGTABLE flag
  mm, pgtable: Add ownership for the PTE table
  mm: Add COW PTE fallback function
  mm, pgtable: Add the reference counter for COW PTE
  mm: Expand Copy-On-Write to PTE table

 include/linux/mm.h             |   2 +
 include/linux/mm_types.h       |   2 +
 include/linux/pgtable.h        |  44 +++++
 include/linux/sched/coredump.h |   5 +-
 include/uapi/linux/sched.h     |   1 +
 kernel/fork.c                  |   6 +-
 mm/memory.c                    | 329 ++++++++++++++++++++++++++++++---
 mm/mmap.c                      |   4 +
 mm/mremap.c                    |   5 +
 9 files changed, 373 insertions(+), 25 deletions(-)

-- 
2.36.1
Re: [External] [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Posted by Qi Zheng 3 years, 11 months ago

On 2022/5/20 2:31 AM, Chih-En Lin wrote:
> When creating a user process, the kernel usually uses the Copy-On-Write
> (COW) mechanism to save memory and the time cost of copying. COW defers
> the work of copying private memory and shares it across the processes
> as read-only. If either process wants to write to this memory, it takes
> a page fault and copies the shared pages, so the process gets its own
> private memory right there; this is called breaking COW.
> 
> Presently this technique is only applied to the mapped memory itself;
> the entire page table still needs to be copied from the parent. This
> can cost a lot of time and memory when the parent already has many page
> tables allocated. For example, here is the memory state for forking a
> process that maps 1 GB of memory.
> 
> 	    mmap before fork         mmap after fork
> MemTotal:       32746776 kB             32746776 kB
> MemFree:        31468152 kB             31463244 kB
> AnonPages:       1073836 kB              1073628 kB
> Mapped:            39520 kB                39992 kB
> PageTables:         3356 kB                 5432 kB
> 
> This series introduces Copy-On-Write for the page table itself. It
> only implements COW at the PTE level, and it is based on the paper
> On-Demand Fork [1]. Summary of the paper's implementation:
> 
> - Only implements COW for anonymous mappings.
> - Only does COW on a PTE table whose range is entirely covered by a
>    single VMA.
> - Uses a reference count to control the COW PTE table's lifetime.
>    Decrease the counter when breaking COW or dereferencing the COW PTE
>    table; when the counter drops to zero, free the PTE table.
> 

Hi,

To reduce empty user PTE tables, I also introduced a reference
count (pte_ref) for user PTE tables in my patches [1][2]. It is used
to track the usage of each user PTE table.

The following will hold a pte_ref:
  - A !pte_none() entry, such as a regular page table entry that maps
    physical pages, or a swap entry, or a migration entry, etc.
  - A visitor to the PTE page table entries, such as a page table walker.
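A toy model of the pte_ref rule above (illustrative names, not the code from the linked series): every !pte_none() entry and every active visitor contributes one reference, and the PTE table page becomes freeable when the count drops to zero.

```c
#include <assert.h>

/* Each mapped (!pte_none()) entry and each visitor (e.g. a page
 * table walker) holds one pte_ref on the PTE table page. */
struct pte_table_page { int pte_ref; };

static void pte_ref_get(struct pte_table_page *p)
{
    p->pte_ref++;
}

/* Returns 1 when the last reference is dropped, i.e. when the empty
 * PTE table page can be freed. */
static int pte_ref_put(struct pte_table_page *p)
{
    assert(p->pte_ref > 0);
    return --p->pte_ref == 0;
}
```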

With COW PTE, a new holder (the process using the COW PTE) is added.

It's funny; it leads me to see more meaning in pte_ref.

Thanks,
Qi

[1] [RFC PATCH 00/18] Try to free user PTE page table pages
     link: 
https://lore.kernel.org/lkml/20220429133552.33768-1-zhengqi.arch@bytedance.com/
     (percpu_ref version)

[2] [PATCH v3 00/15] Free user PTE page table pages
     link: 
https://lore.kernel.org/lkml/20211110105428.32458-1-zhengqi.arch@bytedance.com/
     (atomic count version)

-- 
Thanks,
Qi
Re: [External] [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Posted by Chih-En Lin 3 years, 11 months ago
On Sat, May 21, 2022 at 04:59:19PM +0800, Qi Zheng wrote:
> Hi,
> 
> To reduce empty user PTE tables, I also introduced a reference
> count (pte_ref) for user PTE tables in my patches [1][2]. It is used
> to track the usage of each user PTE table.
> 
> The following will hold a pte_ref:
>  - A !pte_none() entry, such as a regular page table entry that maps
>    physical pages, or a swap entry, or a migration entry, etc.
>  - A visitor to the PTE page table entries, such as a page table walker.
> 
> With COW PTE, a new holder (the process using the COW PTE) is added.
> 
> It's funny; it leads me to see more meaning in pte_ref.
> 
> Thanks,
> Qi
> 
> [1] [RFC PATCH 00/18] Try to free user PTE page table pages
>     link: https://lore.kernel.org/lkml/20220429133552.33768-1-zhengqi.arch@bytedance.com/
>     (percpu_ref version)
> 
> [2] [PATCH v3 00/15] Free user PTE page table pages
>     link: https://lore.kernel.org/lkml/20211110105428.32458-1-zhengqi.arch@bytedance.com/
>     (atomic count version)
> 
> -- 
> Thanks,
> Qi

Hi,

I saw your patches a few months ago.
Actually, my independent study at school is on tracing the page table,
and one of the topics is your patches. Your pte_ref was really helpful.
It's great to see you have more ideas for pte_ref.

Thanks.
Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Posted by David Hildenbrand 3 years, 11 months ago
On 19.05.22 20:31, Chih-En Lin wrote:
> When creating a user process, the kernel usually uses the Copy-On-Write
> (COW) mechanism to save memory and the time cost of copying. COW defers
> the work of copying private memory and shares it across the processes
> as read-only. If either process wants to write to this memory, it takes
> a page fault and copies the shared pages, so the process gets its own
> private memory right there; this is called breaking COW.

Yes. Lately we've been dealing with advanced COW+GUP pinnings (which
resulted in PageAnonExclusive, which should hit upstream soon), and
hearing about COW of page tables (and wondering how it will interact
with the mapcount, refcount, PageAnonExclusive of anonymous pages) makes
me feel a bit uneasy :)

> 
> Presently this technique is only applied to the mapped memory itself;
> the entire page table still needs to be copied from the parent. This
> can cost a lot of time and memory when the parent already has many page
> tables allocated. For example, here is the memory state for forking a
> process that maps 1 GB of memory.
> 
> 	    mmap before fork         mmap after fork
> MemTotal:       32746776 kB             32746776 kB
> MemFree:        31468152 kB             31463244 kB
> AnonPages:       1073836 kB              1073628 kB
> Mapped:            39520 kB                39992 kB
> PageTables:         3356 kB                 5432 kB


I'm missing the most important point: why do we care and why should we
care to make our COW/fork implementation even more complicated?

Yes, we might save some page tables and we might reduce the fork() time,
however, which specific workload really benefits from this and why do we
really care about that workload? Without even hearing about an example
user in this cover letter (unless I missed it), I naturally wonder about
relevance in practice.

I assume it really only matters if we fork() relatively large processes,
like databases for snapshotting. However, fork() is already a pretty
severe performance hit due to COW, and there are alternatives getting
developed as a replacement for such use cases (e.g., uffd-wp).

I'm also missing a performance evaluation: I'd expect some simple
workloads that use fork() might be even slower after fork() with this
change.

(I don't have time to read the paper, I'd expect an independent summary
in the cover letter)


I have tons of questions regarding rmap, accounting, GUP, page table
walkers, OOM situations in page walkers, but at this point I am not
(yet) convinced that the added complexity is really worth it. So I'd
appreciate some additional information.



[...]

> TODO list:
> - Handle the swap

Scary if that's not easy to handle :/

> - Rewrite the TLB flush for zapping the COW PTE table.
> - Experiment COW to the entire page table. (Now just for PTE level)
> - Bug in some case from copy_pte_range()::vm_normal_page()::print_bad_pte().
> - Bug of Bad RSS counter in multiple times COW PTE table forking.



-- 
Thanks,

David / dhildenb
Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Posted by Matthew Wilcox 3 years, 11 months ago
On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
> I'm missing the most important point: why do we care and why should we
> care to make our COW/fork implementation even more complicated?
> 
> Yes, we might save some page tables and we might reduce the fork() time,
> however, which specific workload really benefits from this and why do we
> really care about that workload? Without even hearing about an example
> user in this cover letter (unless I missed it), I naturally wonder about
> relevance in practice.

As I get older (and crankier), I get less convinced that fork() is
really the right solution for implementing system().  I feel that a
better model is to create a process with zero threads, but have an fd
to it.  Then manipulate the child process through its fd (eg mmap
ld.so, open new fds in that process's fdtable, etc).  Closing the fd
launches a new thread in the process (ensuring nobody has an fd to a
running process, particularly one which is setuid).
Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Posted by Andy Lutomirski 3 years, 11 months ago
On 5/21/22 13:12, Matthew Wilcox wrote:
> On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
>> I'm missing the most important point: why do we care and why should we
>> care to make our COW/fork implementation even more complicated?
>>
>> Yes, we might save some page tables and we might reduce the fork() time,
>> however, which specific workload really benefits from this and why do we
>> really care about that workload? Without even hearing about an example
>> user in this cover letter (unless I missed it), I naturally wonder about
>> relevance in practice.
> 
> As I get older (and crankier), I get less convinced that fork() is
> really the right solution for implementing system().  I feel that a
> better model is to create a process with zero threads, but have an fd
> to it.  Then manipulate the child process through its fd (eg mmap
> ld.so, open new fds in that process's fdtable, etc).  Closing the fd
> launches a new thread in the process (ensuring nobody has an fd to a
> running process, particularly one which is setuid).

Heh, I learned serious programming on Windows, and I thought fork() was 
entertaining, cool, and a bad idea when I first learned about it.  (I 
admit I did think the fact that POSIX fork and exec had many fewer 
arguments than CreateProcess was a good thing.)  Don't even get me 
started on setuid -- if I had my way, distros would set NO_NEW_PRIVS on 
boot for the entire system.

I can see a rather different use for this type of shared-pagetable 
technology, though: monstrous MAP_SHARED mappings.  For database and 
some VM users, multiple processes will map the same file.  If there was 
a way to ensure appropriate alignment (or at least encourage it) and a 
way to handle mappings that don't cover the whole file, then having 
multiple mappings share the same page tables could be a decent 
efficiency gain.  This doesn't even need COW -- it's "just" pagetable 
sharing.

It's probably a pipe dream, but I like to imagine that the bookkeeping 
that would enable this would also enable a much less ad-hoc concept of 
who owns which pagetable page.  Then things like x86's KPTI LDT mappings 
would be less disgusting under the hood.

Android would probably like a similar feature for MAP_ANONYMOUS or that 
could otherwise enable Zygote to share paging structures (ideally 
without fork(), although that's my dream, not necessarily Android's). 
This is more complex, since COW is involved.  Also possibly less 
valuable -- possibly the entire benefit and then some would be achieved 
by using huge pages for Zygote and arranging for CoWing one normal-size 
page out of a hugepage COW mapping to only COW the one page.

--Andy
Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Posted by Matthew Wilcox 3 years, 11 months ago
On Sat, May 21, 2022 at 03:19:24PM -0700, Andy Lutomirski wrote:
> I can see a rather different use for this type of shared-pagetable
> technology, though: monstrous MAP_SHARED mappings.  For database and some VM
> users, multiple processes will map the same file.  If there was a way to
> ensure appropriate alignment (or at least encourage it) and a way to handle
> mappings that don't cover the whole file, then having multiple mappings
> share the same page tables could be a decent efficiency gain.  This doesn't
> even need COW -- it's "just" pagetable sharing.

The mshare proposal did not get a warm reception at LSFMM ;-(

The conceptual model doesn't seem to work for the MM developers who were
in the room.  "Fear" was the most-used word.  Not sure how we're going
to get to a model of sharing page tables that doesn't scare people.
Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Posted by Andy Lutomirski 3 years, 11 months ago

On Sat, May 21, 2022, at 5:31 PM, Matthew Wilcox wrote:
> On Sat, May 21, 2022 at 03:19:24PM -0700, Andy Lutomirski wrote:
>> I can see a rather different use for this type of shared-pagetable
>> technology, though: monstrous MAP_SHARED mappings.  For database and some VM
>> users, multiple processes will map the same file.  If there was a way to
>> ensure appropriate alignment (or at least encourage it) and a way to handle
>> mappings that don't cover the whole file, then having multiple mappings
>> share the same page tables could be a decent efficiency gain.  This doesn't
>> even need COW -- it's "just" pagetable sharing.
>
> The mshare proposal did not get a warm reception at LSFMM ;-(
>
> The conceptual model doesn't seem to work for the MM developers who were
> in the room.  "Fear" was the most-used word.  Not sure how we're going
> to get to a model of sharing page tables that doesn't scare people.

FWIW, I didn’t like mshare.  mshare was weird: it seemed to have one mm own some page tables and other mms share them.  I’m talking about having a *file* own page tables and mms map them.  This seems less fear-inducing to me.  Circular dependencies are impossible, mmap calls don’t need to propagate, etc.

It would still be quite a change, though.
Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Posted by Matthew Wilcox 3 years, 11 months ago
On Sun, May 22, 2022 at 08:20:05AM -0700, Andy Lutomirski wrote:
> On Sat, May 21, 2022, at 5:31 PM, Matthew Wilcox wrote:
> > On Sat, May 21, 2022 at 03:19:24PM -0700, Andy Lutomirski wrote:
> >> I can see a rather different use for this type of shared-pagetable
> >> technology, though: monstrous MAP_SHARED mappings.  For database and some VM
> >> users, multiple processes will map the same file.  If there was a way to
> >> ensure appropriate alignment (or at least encourage it) and a way to handle
> >> mappings that don't cover the whole file, then having multiple mappings
> >> share the same page tables could be a decent efficiency gain.  This doesn't
> >> even need COW -- it's "just" pagetable sharing.
> >
> > The mshare proposal did not get a warm reception at LSFMM ;-(
> >
> > The conceptual model doesn't seem to work for the MM developers who were
> > in the room.  "Fear" was the most-used word.  Not sure how we're going
> > to get to a model of sharing page tables that doesn't scare people.
> 
> FWIW, I didn’t like mshare.  mshare was weird: it seemed to have
> one mm own some page tables and other mms share them.  I’m talking
> about having a *file* own page tables and mms map them.  This seems less
> fear-inducing to me.  Circular dependencies are impossible, mmap calls
> don’t need to propagate, etc.

OK, so that doesn't work for our use case.  We need an object to own page
tables that can be shared between different (co-operating) processes.
Because we need the property that calling mprotect() changes the
protection in all processes at the same time.

Obviously we want that object to be referenced by a file descriptor, and
it can also have a name.  That object doesn't have to be an mm_struct.
Maybe that would be enough of a change to remove the fear.
Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Posted by David Hildenbrand 3 years, 11 months ago
On 21.05.22 22:12, Matthew Wilcox wrote:
> On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
>> I'm missing the most important point: why do we care and why should we
>> care to make our COW/fork implementation even more complicated?
>>
>> Yes, we might save some page tables and we might reduce the fork() time,
>> however, which specific workload really benefits from this and why do we
>> really care about that workload? Without even hearing about an example
>> user in this cover letter (unless I missed it), I naturally wonder about
>> relevance in practice.
> 
> As I get older (and crankier), I get less convinced that fork() is
> really the right solution for implementing system().

Heh, I couldn't agree more. IMHO, fork() is mostly a blast from the
past. There *are* still a lot of users, and there are a couple of sane
use cases.

Consequently, I am not convinced that it is something to optimize for,
especially if it adds additional complexity. For the use case of
snapshotting, we have better mechanisms nowadays (uffd-wp) that avoid
messing with copying address spaces.

Calling fork()/system() from a big, performance-sensitive process is
usually a bad idea.

Note: there is a (for me) interesting paper about this topic from 2019
("A fork() in the road") [1], although it might be a bit biased coming
from Microsoft research :). It comes to a similar conclusion regarding
fork and how it should or shouldn't dictate our OS design.

[1] https://www.microsoft.com/en-us/research/publication/a-fork-in-the-road/

-- 
Thanks,

David / dhildenb
Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Posted by Chih-En Lin 3 years, 11 months ago
On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
> On 19.05.22 20:31, Chih-En Lin wrote:
> > When creating a user process, the kernel usually uses the Copy-On-Write
> > (COW) mechanism to save memory and the time cost of copying. COW defers
> > the work of copying private memory and shares it across the processes
> > as read-only. If either process wants to write to this memory, it takes
> > a page fault and copies the shared pages, so the process gets its own
> > private memory right there; this is called breaking COW.
> 
> Yes. Lately we've been dealing with advanced COW+GUP pinnings (which
> resulted in PageAnonExclusive, which should hit upstream soon), and
> hearing about COW of page tables (and wondering how it will interact
> with the mapcount, refcount, PageAnonExclusive of anonymous pages) makes
> me feel a bit uneasy :)

I saw this patch series and know how complicated handling COW of
the physical page is [1][2][3][4]. So the COW page table tends to
restrict the sharing to the page table only. This means any
modification of a physical page will trigger breaking COW of the page
table.

The present implementation will only update the physical page
information in the RSS of the owner process of the COW PTE. Generally,
the owner is the parent process. And the state of the page, like
refcount and mapcount, will not change under the COW page table.

But if any situation leads to the COW page table needing to consider
the state of the physical page, it might be fretful. ;-)

> > 
> > Presently this technique is only applied to the mapped memory itself;
> > the entire page table still needs to be copied from the parent. This
> > can cost a lot of time and memory when the parent already has many page
> > tables allocated. For example, here is the memory state for forking a
> > process that maps 1 GB of memory.
> > 
> > 	    mmap before fork         mmap after fork
> > MemTotal:       32746776 kB             32746776 kB
> > MemFree:        31468152 kB             31463244 kB
> > AnonPages:       1073836 kB              1073628 kB
> > Mapped:            39520 kB                39992 kB
> > PageTables:         3356 kB                 5432 kB
> 
> 
> I'm missing the most important point: why do we care and why should we
> care to make our COW/fork implementation even more complicated?
> 
> Yes, we might save some page tables and we might reduce the fork() time,
> however, which specific workload really benefits from this and why do we
> really care about that workload? Without even hearing about an example
> user in this cover letter (unless I missed it), I naturally wonder about
> relevance in practice.
> 
> I assume it really only matters if we fork() relatively large processes,
> like databases for snapshotting. However, fork() is already a pretty
> severe performance hit due to COW, and there are alternatives getting
> developed as a replacement for such use cases (e.g., uffd-wp).
> 
> I'm also missing a performance evaluation: I'd expect some simple
> workloads that use fork() might be even slower after fork() with this
> change.
> 

The paper includes a list of benchmarks of the time cost of On-Demand
Fork. For example, on Redis, the mean time of fork() when taking the
snapshot: default fork() took 7.40 ms; On-Demand Fork (COW PTE tables)
took 0.12 ms. But some other cases, like the response latency
distribution of the Apache HTTP Server, do not benefit significantly
from On-Demand Fork.

For the COW page table from this patch, I also used perf to analyze
the time cost, but it does not look different from the default fork().

Here is the report; mmap-sfork is the COW page table version:

 Performance counter stats for './mmap-fork' (100 runs):

            373.92 msec task-clock                #    0.992 CPUs utilized            ( +-  0.09% )
                 1      context-switches          #    2.656 /sec                     ( +-  6.03% )
                 0      cpu-migrations            #    0.000 /sec
               881      page-faults               #    2.340 K/sec                    ( +-  0.02% )
     1,860,460,792      cycles                    #    4.941 GHz                      ( +-  0.08% )
     1,451,024,912      instructions              #    0.78  insn per cycle           ( +-  0.00% )
       310,129,843      branches                  #  823.559 M/sec                    ( +-  0.01% )
         1,552,469      branch-misses             #    0.50% of all branches          ( +-  0.38% )

          0.377007 +- 0.000480 seconds time elapsed  ( +-  0.13% )

 Performance counter stats for './mmap-sfork' (100 runs):

            373.04 msec task-clock                #    0.992 CPUs utilized            ( +-  0.10% )
                 1      context-switches          #    2.660 /sec                     ( +-  6.58% )
                 0      cpu-migrations            #    0.000 /sec
               877      page-faults               #    2.333 K/sec                    ( +-  0.08% )
     1,851,843,683      cycles                    #    4.926 GHz                      ( +-  0.08% )
     1,451,763,414      instructions              #    0.78  insn per cycle           ( +-  0.00% )
       310,270,268      branches                  #  825.352 M/sec                    ( +-  0.01% )
         1,649,486      branch-misses             #    0.53% of all branches          ( +-  0.49% )

          0.376095 +- 0.000478 seconds time elapsed  ( +-  0.13% )

So, COW of the page table may reduce the time of forking, but it does
so by transferring the copy work to later operations that modify the
physical pages.

> (I don't have time to read the paper, I'd expect an independent summary
> in the cover letter)

Sure, I will add more performance evaluations and descriptions in the
next version.

> I have tons of questions regarding rmap, accounting, GUP, page table
> walkers, OOM situations in page walkers, but at this point I am not
> (yet) convinced that the added complexity is really worth it. So I'd
> appreciate some additional information.

It seems like I have a lot of work to do. ;-)

> 
> [...]
> 
> > TODO list:
> > - Handle the swap
> 
> Scary if that's not easy to handle :/

;-)

> -- 
> Thanks,
> 
> David / dhildenb
>

Thanks!

[1] https://lore.kernel.org/all/20220131162940.210846-1-david@redhat.com/T/
[2] https://lore.kernel.org/linux-mm/20220315104741.63071-2-david@redhat.com/T/
[3] https://lore.kernel.org/linux-mm/51afa7a7-15c5-8769-78db-ed2d134792f4@redhat.com/T/
[4] https://lore.kernel.org/all/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@redhat.com/
Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
Posted by David Hildenbrand 3 years, 11 months ago
On 21.05.22 20:50, Chih-En Lin wrote:
> On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
>> On 19.05.22 20:31, Chih-En Lin wrote:
>>> When creating a user process, the kernel usually uses the Copy-On-Write
>>> (COW) mechanism to save memory and the time cost of copying. COW defers
>>> the work of copying private memory and shares it across the processes
>>> as read-only. If either process wants to write to this memory, it takes
>>> a page fault and copies the shared pages, so the process gets its own
>>> private memory right there; this is called breaking COW.
>>
>> Yes. Lately we've been dealing with advanced COW+GUP pinnings (which
>> resulted in PageAnonExclusive, which should hit upstream soon), and
>> hearing about COW of page tables (and wondering how it will interact
>> with the mapcount, refcount, PageAnonExclusive of anonymous pages) makes
>> me feel a bit uneasy :)
> 
> I saw this patch series and know how complicated handling COW of
> the physical page is [1][2][3][4]. So the COW page table tends to
> restrict the sharing to the page table only. This means any
> modification of a physical page will trigger breaking COW of the page
> table.
> 
> The present implementation will only update the physical page
> information in the RSS of the owner process of the COW PTE. Generally,
> the owner is the parent process. And the state of the page, like
> refcount and mapcount, will not change under the COW page table.
> 
> But if any situation leads to the COW page table needing to consider
> the state of the physical page, it might be fretful. ;-)

I haven't looked into the details of how GUP deals with these COW page
tables. But I suspect there might be problems with page pinning:
skipping copy_present_page() even for R/O pages is usually problematic
with R/O pinnings of pages. I might be just wrong.

> 
>>>
>>> Presently this technique is only applied to the mapped memory itself;
>>> the entire page table still needs to be copied from the parent. This
>>> can cost a lot of time and memory when the parent already has many page
>>> tables allocated. For example, here is the memory state for forking a
>>> process that maps 1 GB of memory.
>>>
>>> 	    mmap before fork         mmap after fork
>>> MemTotal:       32746776 kB             32746776 kB
>>> MemFree:        31468152 kB             31463244 kB
>>> AnonPages:       1073836 kB              1073628 kB
>>> Mapped:            39520 kB                39992 kB
>>> PageTables:         3356 kB                 5432 kB
>>
>>
>> I'm missing the most important point: why do we care and why should we
>> care to make our COW/fork implementation even more complicated?
>>
>> Yes, we might save some page tables and we might reduce the fork() time,
>> however, which specific workload really benefits from this and why do we
>> really care about that workload? Without even hearing about an example
>> user in this cover letter (unless I missed it), I naturally wonder about
>> relevance in practice.
>>
>> I assume it really only matters if we fork() relatively large processes,
>> like databases for snapshotting. However, fork() is already a pretty
>> severe performance hit due to COW, and there are alternatives getting
>> developed as a replacement for such use cases (e.g., uffd-wp).
>>
>> I'm also missing a performance evaluation: I'd expect some simple
>> workloads that use fork() might be even slower after fork() with this
>> change.
>>
> 
> The paper includes a list of benchmarks of the time cost of On-Demand
> Fork. For example, on Redis, the mean time of fork() when taking the
> snapshot: default fork() took 7.40 ms; On-Demand Fork (COW PTE tables)
> took 0.12 ms. But some other cases, like the response latency
> distribution of the Apache HTTP Server, do not benefit significantly
> from On-Demand Fork.

Thanks. I expected that snapshotting would pop up and be one of the most
prominent users that could benefit. However, for that specific use case
I am convinced that uffd-wp is the better choice and fork() is just the
old way of doing it, having nothing better at hand. QEMU already
implements snapshotting of VMs that way and I remember that redis also
intended to implement support for uffd-wp. Not sure what happened with
that and if there is anything missing to make it work.

> 
> For the COW page table from this patch, I also used perf to analyze
> the time cost, but it does not look different from the default fork().

Interesting, thanks for sharing.

> 
> Here is the report; mmap-sfork is the COW page table version:
> 
>  Performance counter stats for './mmap-fork' (100 runs):
> 
>             373.92 msec task-clock                #    0.992 CPUs utilized            ( +-  0.09% )
>                  1      context-switches          #    2.656 /sec                     ( +-  6.03% )
>                  0      cpu-migrations            #    0.000 /sec
>                881      page-faults               #    2.340 K/sec                    ( +-  0.02% )
>      1,860,460,792      cycles                    #    4.941 GHz                      ( +-  0.08% )
>      1,451,024,912      instructions              #    0.78  insn per cycle           ( +-  0.00% )
>        310,129,843      branches                  #  823.559 M/sec                    ( +-  0.01% )
>          1,552,469      branch-misses             #    0.50% of all branches          ( +-  0.38% )
> 
>           0.377007 +- 0.000480 seconds time elapsed  ( +-  0.13% )
> 
>  Performance counter stats for './mmap-sfork' (100 runs):
> 
>             373.04 msec task-clock                #    0.992 CPUs utilized            ( +-  0.10% )
>                  1      context-switches          #    2.660 /sec                     ( +-  6.58% )
>                  0      cpu-migrations            #    0.000 /sec
>                877      page-faults               #    2.333 K/sec                    ( +-  0.08% )
>      1,851,843,683      cycles                    #    4.926 GHz                      ( +-  0.08% )
>      1,451,763,414      instructions              #    0.78  insn per cycle           ( +-  0.00% )
>        310,270,268      branches                  #  825.352 M/sec                    ( +-  0.01% )
>          1,649,486      branch-misses             #    0.53% of all branches          ( +-  0.49% )
> 
>           0.376095 +- 0.000478 seconds time elapsed  ( +-  0.13% )
> 
> So, COW of the page table may reduce the time of forking, but it does
> so by transferring the copy work to later operations that modify the
> physical pages.

Right.

> 
>> I have tons of questions regarding rmap, accounting, GUP, page table
>> walkers, OOM situations in page walkers, but at this point I am not
>> (yet) convinced that the added complexity is really worth it. So I'd
>> appreciate some additional information.
> 
> It seems like I have a lot of work to do. ;-)

Messing with page tables and COW is usually like opening a can of worms :)

-- 
Thanks,

David / dhildenb