[PATCH 1/2] arm64: mm: vmemmap populate to page level if not section aligned

Posted by Zhenhua Huang 1 year, 2 months ago
Commit 2045a3b8911b ("mm/sparse-vmemmap: generalise vmemmap_populate_hugepages()")
optimizes the vmemmap to populate at the PMD section level. However, if start
or end is not aligned to a section boundary, such as when a subsection is hot
added, populating the entire section is inefficient and wasteful. In such
cases, it is more effective to populate at page granularity.

This change also addresses misalignment issues during vmemmap_free(). When
pmd_sect() is true, the entire PMD section is cleared, even if only a
subsection is mapped. For example, if subsections pagemap1 and pagemap2 are
added sequentially and then pagemap1 is removed, vmemmap_free() will clear the
entire PMD section, even though pagemap2 is still active.

Fixes: 2045a3b8911b ("mm/sparse-vmemmap: generalise vmemmap_populate_hugepages()")
Signed-off-by: Zhenhua Huang <quic_zhenhuah@quicinc.com>
---
 arch/arm64/mm/mmu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index fe833de501f7..bfecabac14a3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1151,7 +1151,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 {
 	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
 
-	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+	!IS_ALIGNED(page_to_pfn((struct page *)start), PAGES_PER_SECTION) ||
+	!IS_ALIGNED(page_to_pfn((struct page *)end), PAGES_PER_SECTION))
 		return vmemmap_populate_basepages(start, end, node, altmap);
 	else
 		return vmemmap_populate_hugepages(start, end, node, altmap);
-- 
2.25.1
Re: [PATCH 1/2] arm64: mm: vmemmap populate to page level if not section aligned
Posted by Catalin Marinas 1 year, 2 months ago
On Thu, Nov 21, 2024 at 03:12:55PM +0800, Zhenhua Huang wrote:
> Commit 2045a3b8911b ("mm/sparse-vmemmap: generalise vmemmap_populate_hugepages()")
> optimizes the vmemmap to populate at the PMD section level.

Wasn't the above commit just a non-functional change making the code
generic? If there was a functional change, it needs to be spelt out. It
also implies that the code prior to the above commit needs fixing.

> However, if start
> or end is not aligned to a section boundary, such as when a subsection is hot
> added, populating the entire section is inefficient and wasteful. In such
> cases, it is more effective to populate at page granularity.

Do you have any numbers to show how inefficient it is? We trade some
memory for less TLB pressure by using huge pages for vmemmap.

> This change also addresses misalignment issues during vmemmap_free(). When
> pmd_sect() is true, the entire PMD section is cleared, even if only a
> subsection is mapped. For example, if subsections pagemap1 and pagemap2 are
> added sequentially and then pagemap1 is removed, vmemmap_free() will clear the
> entire PMD section, even though pagemap2 is still active.

What do you mean by a PMD section? The whole PAGE_SIZE *
PAGES_PER_SECTION range or a single pmd entry? I couldn't see how the
former happens in the core code but I only looked briefly. If it's just
a pmd entry, I think it's fair to require a 2MB alignment of hotplugged
memory ranges.

-- 
Catalin
Re: [PATCH 1/2] arm64: mm: vmemmap populate to page level if not section aligned
Posted by Zhenhua Huang 1 year, 2 months ago
Thanks Catalin for review!

On 2024/12/7 1:13, Catalin Marinas wrote:
> On Thu, Nov 21, 2024 at 03:12:55PM +0800, Zhenhua Huang wrote:
>> Commit 2045a3b8911b ("mm/sparse-vmemmap: generalise vmemmap_populate_hugepages()")
>> optimizes the vmemmap to populate at the PMD section level.
> 
> Wasn't the above commit just a non-functional change making the code
> generic? If there was a functional change, it needs to be spelt out. It
> also implies that the code prior to the above commit needs fixing.
> 

Oh... right. I looked up your change from over a decade ago, commit
c1cc1552616d ("arm64: MMU initialisation"). However, at that time there
was no support for subsection hotplug; that was only introduced later by
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug").

>> However, if start
>> or end is not aligned to a section boundary, such as when a subsection is hot
>> added, populating the entire section is inefficient and wasteful. In such
>> cases, it is more effective to populate at page granularity.
> 
> Do you have any numbers to show how inefficient it is? We trade some
> memory for less TLB pressure by using huge pages for vmemmap.

I see, thanks. Yes, huge pages do reduce TLB pressure. My point is that
even when only a single subsection is hot-added, the current code still
populates the full 2M of backing struct page metadata, although only
2M/64 = 32K of it is actually needed.

> 
>> This change also addresses misalignment issues during vmemmap_free(). When
>> pmd_sect() is true, the entire PMD section is cleared, even if only a
>> subsection is mapped. For example, if subsections pagemap1 and pagemap2 are
>> added sequentially and then pagemap1 is removed, vmemmap_free() will clear the
>> entire PMD section, even though pagemap2 is still active.
> 
> What do you mean by a PMD section? The whole PAGE_SIZE *
> PAGES_PER_SECTION range or a single pmd entry? I couldn't see how the

I am referring to a single pmd entry, but the vmemmap it maps holds the
struct page metadata for a whole PAGE_SIZE * PAGES_PER_SECTION of
physical memory. For arm64 with 4K pages:
pmd entry (2M of struct page metadata) -> PAGE_SIZE *
PAGES_PER_SECTION (128M of physical memory)

pagemap1 (where one subsection's metadata is 2M/64 = 32K) and pagemap2
are both part of a single PMD entry. When pagemap1 is removed,
vmemmap_free() clears the entire PMD entry. IOW, the struct pages for
the whole 128M of physical memory become unusable.

> former happens in the core code but I only looked briefly. If it's just
> a pmd entry, I think it's fair to require a 2MB alignment of hotplugged
> memory ranges.

I agree that requiring 2MB alignment of hotplugged memory is fair;
commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
supports that. The issue I want to address here is the backing struct
page metadata for such ranges.
