From: David Woodhouse <dwmw@amazon.co.uk>
Currently, memmap_init initializes pfn_hole with 0 instead of
ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
page from the page at address zero to the first available page, but it
won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
won't pass.
If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
kernel is used as a library and loaded at a very high address), the
pointless iteration for pages below ARCH_PFN_OFFSET will take a very
long time, and the kernel will look stuck at boot time.
Use for_each_valid_pfn() to skip the pointless iterations.
Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
---
mm/mm_init.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 41884f2155c4..0d1a4546825c 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
unsigned long pfn;
u64 pgcnt = 0;
- for (pfn = spfn; pfn < epfn; pfn++) {
- if (!pfn_valid(pageblock_start_pfn(pfn))) {
- pfn = pageblock_end_pfn(pfn) - 1;
- continue;
- }
+ for_each_valid_pfn(pfn, spfn, epfn) {
__init_single_page(pfn_to_page(pfn), pfn, zone, node);
__SetPageReserved(pfn_to_page(pfn));
pgcnt++;
--
2.49.0
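
[Editorial aside: conceptually, the new loop is equivalent to the sketch
below, i.e. iterate every PFN in [spfn, epfn) and skip the ones that fail
pfn_valid(). The actual for_each_valid_pfn() implementations added earlier
in this series can also jump over whole invalid ranges instead of testing
each PFN, which is what avoids the enormous ARCH_PFN_OFFSET walk described
above. This sketch is not the macro's real definition.]

	for (pfn = spfn; pfn < epfn; pfn++) {
		if (!pfn_valid(pfn))
			continue;
		__init_single_page(pfn_to_page(pfn), pfn, zone, node);
		__SetPageReserved(pfn_to_page(pfn));
		pgcnt++;
	}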
On 23.04.25 15:33, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Currently, memmap_init initializes pfn_hole with 0 instead of
> ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
> page from the page at address zero to the first available page, but it
> won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
> won't pass.
>
> If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
> kernel is used as a library and loaded at a very high address), the
> pointless iteration for pages below ARCH_PFN_OFFSET will take a very
> long time, and the kernel will look stuck at boot time.
>
> Use for_each_valid_pfn() to skip the pointless iterations.
>
> Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
> Suggested-by: Mike Rapoport <rppt@kernel.org>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
> ---
> mm/mm_init.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 41884f2155c4..0d1a4546825c 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
> unsigned long pfn;
> u64 pgcnt = 0;
>
> - for (pfn = spfn; pfn < epfn; pfn++) {
> - if (!pfn_valid(pageblock_start_pfn(pfn))) {
> - pfn = pageblock_end_pfn(pfn) - 1;
> - continue;
> - }
So, if the first pfn in a pageblock is not valid, we skip the whole
pageblock ...
> + for_each_valid_pfn(pfn, spfn, epfn) {
> __init_single_page(pfn_to_page(pfn), pfn, zone, node);
> __SetPageReserved(pfn_to_page(pfn));
> pgcnt++;
but here, we would process further pfns inside such a pageblock?
--
Cheers,
David / dhildenb
On 25 April 2025 17:17:25 BST, David Hildenbrand <david@redhat.com> wrote:
>On 23.04.25 15:33, David Woodhouse wrote:
>> From: David Woodhouse <dwmw@amazon.co.uk>
>>
>> Currently, memmap_init initializes pfn_hole with 0 instead of
>> ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
>> page from the page at address zero to the first available page, but it
>> won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
>> won't pass.
>>
>> If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
>> kernel is used as a library and loaded at a very high address), the
>> pointless iteration for pages below ARCH_PFN_OFFSET will take a very
>> long time, and the kernel will look stuck at boot time.
>>
>> Use for_each_valid_pfn() to skip the pointless iterations.
>>
>> Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
>> Suggested-by: Mike Rapoport <rppt@kernel.org>
>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
>> ---
>> mm/mm_init.c | 6 +-----
>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>> index 41884f2155c4..0d1a4546825c 100644
>> --- a/mm/mm_init.c
>> +++ b/mm/mm_init.c
>> @@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
>> unsigned long pfn;
>> u64 pgcnt = 0;
>> - for (pfn = spfn; pfn < epfn; pfn++) {
>> - if (!pfn_valid(pageblock_start_pfn(pfn))) {
>> - pfn = pageblock_end_pfn(pfn) - 1;
>> - continue;
>> - }
>
>So, if the first pfn in a pageblock is not valid, we skip the whole pageblock ...
>
>> + for_each_valid_pfn(pfn, spfn, epfn) {
>> __init_single_page(pfn_to_page(pfn), pfn, zone, node);
>> __SetPageReserved(pfn_to_page(pfn));
>> pgcnt++;
>
>but here, we would process further pfns inside such a pageblock?
>
Is it not the case that either *all*, or *none*, of the PFNs within a given pageblock will be valid?
I assumed that was *why* it had that skip, as an attempt at the kind of optimisation that for_each_valid_pfn() now gives us?
On 25.04.25 21:08, David Woodhouse wrote:
> On 25 April 2025 17:17:25 BST, David Hildenbrand <david@redhat.com> wrote:
>> On 23.04.25 15:33, David Woodhouse wrote:
>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>
>>> Currently, memmap_init initializes pfn_hole with 0 instead of
>>> ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
>>> page from the page at address zero to the first available page, but it
>>> won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
>>> won't pass.
>>>
>>> If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
>>> kernel is used as a library and loaded at a very high address), the
>>> pointless iteration for pages below ARCH_PFN_OFFSET will take a very
>>> long time, and the kernel will look stuck at boot time.
>>>
>>> Use for_each_valid_pfn() to skip the pointless iterations.
>>>
>>> Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
>>> Suggested-by: Mike Rapoport <rppt@kernel.org>
>>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>>> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>>> Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
>>> ---
>>> mm/mm_init.c | 6 +-----
>>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>>
>>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>>> index 41884f2155c4..0d1a4546825c 100644
>>> --- a/mm/mm_init.c
>>> +++ b/mm/mm_init.c
>>> @@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
>>> unsigned long pfn;
>>> u64 pgcnt = 0;
>>> - for (pfn = spfn; pfn < epfn; pfn++) {
>>> - if (!pfn_valid(pageblock_start_pfn(pfn))) {
>>> - pfn = pageblock_end_pfn(pfn) - 1;
>>> - continue;
>>> - }
>>
>> So, if the first pfn in a pageblock is not valid, we skip the whole pageblock ...
>>
>>> + for_each_valid_pfn(pfn, spfn, epfn) {
>>> __init_single_page(pfn_to_page(pfn), pfn, zone, node);
>>> __SetPageReserved(pfn_to_page(pfn));
>>> pgcnt++;
>>
>> but here, we would process further pfns inside such a pageblock?
>>
>
> Is it not the case that either *all*, or *none*, of the PFNs within a given pageblock will be valid?
Hmm, good point. I was thinking about sub-sections, but all early
sections are fully valid.
(Also, at least on x86, the subsection size should match the pageblock
size; might not be the case on other architectures, like arm64 with 64K
base pages ...)
>
> I assumed that was *why* it had that skip, as an attempt at the kind of optimisation that for_each_valid_pfn() now gives us?
But it's interesting that this code didn't optimize for "if the first
pfn is valid, all the remaining ones are valid"; it would still check
pfn_valid() for each PFN.
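
(Purely illustrative, not from the series or the tree: an old-loop variant
that also avoided re-checking pfn_valid() once a pageblock's first PFN was
known to be valid might have looked roughly like this; the structure is an
assumption.)

	for (pfn = spfn; pfn < epfn; pfn++) {
		unsigned long block_end = min(pageblock_end_pfn(pfn), epfn);

		if (!pfn_valid(pageblock_start_pfn(pfn))) {
			pfn = pageblock_end_pfn(pfn) - 1;
			continue;
		}
		/* First PFN of the pageblock is valid, so the rest are too. */
		for (; pfn < block_end; pfn++) {
			__init_single_page(pfn_to_page(pfn), pfn, zone, node);
			__SetPageReserved(pfn_to_page(pfn));
			pgcnt++;
		}
		pfn--;	/* the outer loop increment moves us to block_end */
	}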
In any case, trying to figure out why Lorenzo ran into an issue ... if
it's not because of the pageblock, maybe something in for_each_valid_pfn()
with sparsemem is still shaky.
--
Cheers,
David / dhildenb
On Fri, 2025-04-25 at 22:12 +0200, David Hildenbrand wrote:
>
> In any case, trying to figure out why Lorenzo ran into an issue ... if
> it's nit because of the pageblock, maybe something in for_each_valid_pfn
> with sparsemem is still shaky.
Yep, I think this was it:
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2190,10 +2190,10 @@ static inline unsigned long next_valid_pfn(unsigned long pfn, unsigned long end_
/*
* Either every PFN within the section (or subsection for VMEMMAP) is
* valid, or none of them are. So there's no point repeating the check
- * for every PFN; only call first_valid_pfn() the first time, and when
- * crossing a (sub)section boundary (i.e. !(pfn & ~PFN_VALID_MASK)).
+ * for every PFN; only call first_valid_pfn() again when crossing a
+ * (sub)section boundary (i.e. !(pfn & ~PAGE_{SUB,}SECTION_MASK)).
*/
- if (pfn & (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
+ if (pfn & ~(IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
PAGE_SUBSECTION_MASK : PAGE_SECTION_MASK))
return pfn;
I've pushed the fixed version out to
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/for_each_valid_pfn
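
(Editorial aside on why the missing '~' matters: PAGE_SECTION_MASK and
PAGE_SUBSECTION_MASK keep the section-number bits and clear the
offset-within-(sub)section bits, so the "still inside the (sub)section we
already validated?" test needs 'pfn & ~mask'. With the plain 'pfn & mask'
test, any PFN above the first section looks "already validated" and
first_valid_pfn() is never called again. Below is a minimal userspace
sketch using illustrative constants, not the kernel's actual definitions.)

	#include <stdio.h>

	/* Illustrative: 2^15 pages per section, as on x86-64 SPARSEMEM
	 * with 4K pages (128MiB sections). */
	#define PAGES_PER_SECTION	(1UL << 15)
	#define SECTION_MASK		(~(PAGES_PER_SECTION - 1))

	int main(void)
	{
		/* Section-aligned PFN, like the 0xc0000 seen in the oops. */
		unsigned long pfn = 0xc0000;

		/* Buggy test: keeps the section-number bits, which are
		 * non-zero here, so revalidation is wrongly skipped. */
		printf("pfn & mask  = %#lx\n", pfn & SECTION_MASK);

		/* Fixed test: keeps only the in-section offset bits, which
		 * are zero at a section boundary, so first_valid_pfn() is
		 * called again as intended. */
		printf("pfn & ~mask = %#lx\n", pfn & ~SECTION_MASK);
		return 0;
	}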
On 26.04.25 01:04, David Woodhouse wrote:
> On Fri, 2025-04-25 at 22:12 +0200, David Hildenbrand wrote:
>>
>> In any case, trying to figure out why Lorenzo ran into an issue ... if
>> it's nit because of the pageblock, maybe something in for_each_valid_pfn
>> with sparsemem is still shaky.
>
> Yep, I think this was it:
>
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2190,10 +2190,10 @@ static inline unsigned long next_valid_pfn(unsigned long pfn, unsigned long end_
> /*
> * Either every PFN within the section (or subsection for VMEMMAP) is
> * valid, or none of them are. So there's no point repeating the check
> - * for every PFN; only call first_valid_pfn() the first time, and when
> - * crossing a (sub)section boundary (i.e. !(pfn & ~PFN_VALID_MASK)).
> + * for every PFN; only call first_valid_pfn() again when crossing a
> + * (sub)section boundary (i.e. !(pfn & ~PAGE_{SUB,}SECTION_MASK)).
> */
> - if (pfn & (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
> + if (pfn & ~(IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
> PAGE_SUBSECTION_MASK : PAGE_SECTION_MASK))
LGTM, although we could make this way easier to understand:
Something like:
	unsigned long pfn_mask = PAGE_SECTION_MASK;

	if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
		pfn_mask = PAGE_SUBSECTION_MASK;

	if (pfn & ~pfn_mask)
		...
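
(For reference, a sketch of how that suggestion could read in context.
Only the mask logic comes from this thread; the surrounding function body,
including the increment, the end check and the first_valid_pfn() call, is
an assumption for illustration rather than a copy of the tree.)

	static inline unsigned long next_valid_pfn(unsigned long pfn,
						   unsigned long end_pfn)
	{
		unsigned long pfn_mask = PAGE_SECTION_MASK;

		pfn++;
		if (pfn >= end_pfn)
			return end_pfn;

		if (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
			pfn_mask = PAGE_SUBSECTION_MASK;

		/*
		 * Every PFN in the (sub)section is valid or none are, so
		 * only revalidate when crossing a (sub)section boundary.
		 */
		if (pfn & ~pfn_mask)
			return pfn;

		return first_valid_pfn(pfn, end_pfn);
	}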
--
Cheers,
David / dhildenb
On 25 April 2025 21:12:49 BST, David Hildenbrand <david@redhat.com> wrote:
>On 25.04.25 21:08, David Woodhouse wrote:
>> On 25 April 2025 17:17:25 BST, David Hildenbrand <david@redhat.com> wrote:
>>> On 23.04.25 15:33, David Woodhouse wrote:
>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>>
>>>> Currently, memmap_init initializes pfn_hole with 0 instead of
>>>> ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
>>>> page from the page at address zero to the first available page, but it
>>>> won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
>>>> won't pass.
>>>>
>>>> If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
>>>> kernel is used as a library and loaded at a very high address), the
>>>> pointless iteration for pages below ARCH_PFN_OFFSET will take a very
>>>> long time, and the kernel will look stuck at boot time.
>>>>
>>>> Use for_each_valid_pfn() to skip the pointless iterations.
>>>>
>>>> Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
>>>> Suggested-by: Mike Rapoport <rppt@kernel.org>
>>>> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>>>> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>>>> Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
>>>> ---
>>>> mm/mm_init.c | 6 +-----
>>>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>>>> index 41884f2155c4..0d1a4546825c 100644
>>>> --- a/mm/mm_init.c
>>>> +++ b/mm/mm_init.c
>>>> @@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
>>>> unsigned long pfn;
>>>> u64 pgcnt = 0;
>>>> - for (pfn = spfn; pfn < epfn; pfn++) {
>>>> - if (!pfn_valid(pageblock_start_pfn(pfn))) {
>>>> - pfn = pageblock_end_pfn(pfn) - 1;
>>>> - continue;
>>>> - }
>>>
>>> So, if the first pfn in a pageblock is not valid, we skip the whole pageblock ...
>>>
>>>> + for_each_valid_pfn(pfn, spfn, epfn) {
>>>> __init_single_page(pfn_to_page(pfn), pfn, zone, node);
>>>> __SetPageReserved(pfn_to_page(pfn));
>>>> pgcnt++;
>>>
>>> but here, we would process further pfns inside such a pageblock?
>>>
>>
>> Is it not the case that either *all*, or *none*, of the PFNs within a given pageblock will be valid?
>
>Hmm, good point. I was thinking about sub-sections, but all early sections are fully valid.
>
>(Also, at least on x86, the subsection size should match the pageblock size; might not be the case on other architectures, like arm64 with 64K base pages ...)
>
>>
>> I assumed that was *why* it had that skip, as an attempt at the kind of optimisation that for_each_valid_pfn() now gives us?
>
>But it's interesting in this code that we didn't optimize for "if the first pfn is valid, all the remaining ones are valid". We would still check each PFN.
>
>In any case, trying to figure out why Lorenzo ran into an issue ... if it's nit because of the pageblock, maybe something in for_each_valid_pfn with sparsemem is still shaky.
>
A previous round of the patch series had a less aggressively optimised version of the sparsemem implementation...?
Will see if I can reproduce in the morning. A boot in QEMU worked here before I sent it out.
Andrew - can we drop this from mm-new? It's breaking it.
David, this seems to break on qemu boot for me in Andrew's mm-new branch,
bisected to this commit.
Splat on a basic x86 defconfig variant:
[ 0.029481] BUG: unable to handle page fault for address: ffffea0003000034
[ 0.029840] #PF: supervisor write access in kernel mode
[ 0.030089] #PF: error_code(0x0002) - not-present page
[ 0.030327] PGD 26ccc3067 P4D 26ccc3067 PUD 26ccc2067 PMD 0
[ 0.030599] Oops: Oops: 0002 [#1] SMP NOPTI
[ 0.030794] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc2+ #9 PREEMPT(undef)
[ 0.031177] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 0.031610] RIP: 0010:__init_single_page+0xa/0x50
__init_single_page+0xa/0x50:
arch_atomic_set at arch/x86/include/asm/atomic.h:28
(inlined by) raw_atomic_set at include/linux/atomic/atomic-arch-fallback.h:503
(inlined by) atomic_set at include/linux/atomic/atomic-instrumented.h:68
(inlined by) set_page_count at include/linux/page_ref.h:99
(inlined by) init_page_count at include/linux/page_ref.h:115
(inlined by) __init_single_page at mm/mm_init.c:586
^-- faddr2line decode
[ 0.031832] Code: ff e9 0a 06 e4 fe 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 317
[ 0.032710] RSP: 0000:ffffffff82a03da8 EFLAGS: 00010016
[ 0.032954] RAX: 0000000000000000 RBX: 00000000000c0000 RCX: 0000000000000000
[ 0.033287] RDX: 0200000000000000 RSI: 00000000000c0000 RDI: ffffea0003000000
[ 0.033614] RBP: 0000000000100000 R08: 0000000000000000 R09: ffffea0009b30000
[ 0.034186] R10: 0000000000000000 R11: 0000000000100000 R12: 0000000000000002
[ 0.034519] R13: 0000000000000000 R14: 0000000000000023 R15: 0000000003000000
[ 0.034856] FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
[ 0.035240] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.035509] CR2: ffffea0003000034 CR3: 0000000002a32000 CR4: 00000000000000b0
[ 0.035846] Call Trace:
[ 0.035961] <TASK>
[ 0.036070] ? init_unavailable_range+0x42/0xb0
for_each_valid_pfn(pfn, spfn, epfn) {
__init_single_page(pfn_to_page(pfn), pfn, zone, node); <--- this is here.
[ 0.036284] ? free_area_init+0xd70/0xe30
[ 0.036473] ? zone_sizes_init+0x44/0x50
[ 0.036657] ? setup_arch+0x9a8/0xa80
[ 0.036831] ? start_kernel+0x58/0x6c0
[ 0.037010] ? x86_64_start_reservations+0x24/0x30
[ 0.037236] ? x86_64_start_kernel+0x8c/0x90
[ 0.037439] ? common_startup_64+0x13e/0x148
[ 0.037642] </TASK>
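
[Editorial decode of the faulting address, assuming the usual x86-64
vmemmap base of 0xffffea0000000000 and a 64-byte struct page:
0xffffea0003000034 - 0xffffea0000000000 = 0x3000034, and 0x3000000 / 64 =
0xc0000, which matches RBX above. So the write is to the memmap entry for
pfn 0xc0000, a PFN in a hole for which no memmap was ever populated, i.e.
exactly what for_each_valid_pfn() should have skipped; the 0x34 offset is
consistent with the set_page_count() shown in the faddr2line decode.]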
Cheers, Lorenzo
On Wed, Apr 23, 2025 at 02:33:43PM +0100, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Currently, memmap_init initializes pfn_hole with 0 instead of
> ARCH_PFN_OFFSET. Then init_unavailable_range will start iterating each
> page from the page at address zero to the first available page, but it
> won't do anything for pages below ARCH_PFN_OFFSET because pfn_valid
> won't pass.
>
> If ARCH_PFN_OFFSET is very large (e.g., something like 2^64-2GiB if the
> kernel is used as a library and loaded at a very high address), the
> pointless iteration for pages below ARCH_PFN_OFFSET will take a very
> long time, and the kernel will look stuck at boot time.
>
> Use for_each_valid_pfn() to skip the pointless iterations.
>
> Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
> Suggested-by: Mike Rapoport <rppt@kernel.org>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Tested-by: Ruihan Li <lrh2000@pku.edu.cn>
> ---
> mm/mm_init.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 41884f2155c4..0d1a4546825c 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -845,11 +845,7 @@ static void __init init_unavailable_range(unsigned long spfn,
> unsigned long pfn;
> u64 pgcnt = 0;
>
> - for (pfn = spfn; pfn < epfn; pfn++) {
> - if (!pfn_valid(pageblock_start_pfn(pfn))) {
> - pfn = pageblock_end_pfn(pfn) - 1;
> - continue;
> - }
> + for_each_valid_pfn(pfn, spfn, epfn) {
> __init_single_page(pfn_to_page(pfn), pfn, zone, node);
> __SetPageReserved(pfn_to_page(pfn));
> pgcnt++;
> --
> 2.49.0
>
>
On Fri, 25 Apr 2025 17:11:10 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> Andrew - can we drop this from mm-new? It's breaking it.
I almost did, but David seems to have a fix.
--- a/include/linux/mmzone.h~mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix
+++ a/include/linux/mmzone.h
@@ -2190,10 +2190,10 @@ static inline unsigned long next_valid_p
/*
* Either every PFN within the section (or subsection for VMEMMAP) is
* valid, or none of them are. So there's no point repeating the check
- * for every PFN; only call first_valid_pfn() the first time, and when
- * crossing a (sub)section boundary (i.e. !(pfn & ~PFN_VALID_MASK)).
+ * for every PFN; only call first_valid_pfn() again when crossing a
+ * (sub)section boundary (i.e. !(pfn & ~PAGE_{SUB,}SECTION_MASK)).
*/
- if (pfn & (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
+ if (pfn & ~(IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
PAGE_SUBSECTION_MASK : PAGE_SECTION_MASK))
return pfn;
_
On Fri, 2025-04-25 at 16:38 -0700, Andrew Morton wrote:
> On Fri, 25 Apr 2025 17:11:10 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> > Andrew - can we drop this from mm-new? It's breaking it.
>
> I almost did, but David seems to have a fix.
>
> --- a/include/linux/mmzone.h~mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix
The symptoms only manifested when it got used in
init_unavailable_range() but that's actually a fix for the sparsemem
implementation of for_each_valid_pfn(), as David H surmised.
Please could the fix be folded into
mm-implement-for_each_valid_pfn-for-config_sparsemem.patch ?
This is what it should look like with the fix:
https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=55bebbb093
If you want to keep the fix separate, then that's the patch that it
fixes. Do you want a commit message? I'll certainly give you a proper
SoB:
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Happy to resend the fixed series if it helps; it looks like you've
already basically sorted it though?
Thanks!
> +++ a/include/linux/mmzone.h
> @@ -2190,10 +2190,10 @@ static inline unsigned long next_valid_p
> /*
> * Either every PFN within the section (or subsection for VMEMMAP) is
> * valid, or none of them are. So there's no point repeating the check
> - * for every PFN; only call first_valid_pfn() the first time, and when
> - * crossing a (sub)section boundary (i.e. !(pfn & ~PFN_VALID_MASK)).
> + * for every PFN; only call first_valid_pfn() again when crossing a
> + * (sub)section boundary (i.e. !(pfn & ~PAGE_{SUB,}SECTION_MASK)).
> */
> - if (pfn & (IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
> + if (pfn & ~(IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP) ?
> PAGE_SUBSECTION_MASK : PAGE_SECTION_MASK))
> return pfn;
>
> _
>
On Sat, 26 Apr 2025 09:30:50 +0100 David Woodhouse <dwmw2@infradead.org> wrote:

> On Fri, 2025-04-25 at 16:38 -0700, Andrew Morton wrote:
> > On Fri, 25 Apr 2025 17:11:10 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> >
> > > Andrew - can we drop this from mm-new? It's breaking it.
> >
> > I almost did, but David seems to have a fix.
> >
> > --- a/include/linux/mmzone.h~mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix
>
> The symptoms only manifested when it got used in
> init_unavailable_range() but that's actually a fix for the sparsemem
> implementation of for_each_valid_pfn(), as David H surmised.
>
> Please could the fix be folded into
> mm-implement-for_each_valid_pfn-for-config_sparsemem.patch ?

yep, that's why I named the patch file
"mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix.patch".
To tell myself to fold it into
mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range.patch.

> This is what it should look like with the fix:
> https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=55bebbb093

We're good.

> If you want to keep the fix separate, then that's the patch that it
> fixes. Do you want a commit message? I'll certainly give you a proper
> SoB:
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>

I already unauthorisedly added that so I don't get grumpygrams from
Stephen ;)

> Happy to resend the fixed series if it helps; it looks like you've
> already basically sorted it though?

Thanks, it doesn't appear necessary at this time.
This fixes the issues for me, thanks David + thanks Andrew for taking
so quick! :)

I know you're probably rebasing this Andrew so maybe not so useful, but
fwiw wrt the reported issue:

Tested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

On Sun, Apr 27, 2025 at 04:07:46PM -0700, Andrew Morton wrote:
> On Sat, 26 Apr 2025 09:30:50 +0100 David Woodhouse <dwmw2@infradead.org> wrote:
>
> > On Fri, 2025-04-25 at 16:38 -0700, Andrew Morton wrote:
> > > On Fri, 25 Apr 2025 17:11:10 +0100 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > > Andrew - can we drop this from mm-new? It's breaking it.
> > >
> > > I almost did, but David seems to have a fix.
> > >
> > > --- a/include/linux/mmzone.h~mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix
> >
> > The symptoms only manifested when it got used in
> > init_unavailable_range() but that's actually a fix for the sparsemem
> > implementation of for_each_valid_pfn(), as David H surmised.
> >
> > Please could the fix be folded into
> > mm-implement-for_each_valid_pfn-for-config_sparsemem.patch ?
>
> yep, that's why I named the patch file
> "mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range-fix.patch".
> To tell myself to fold it into
> mm-mm_init-use-for_each_valid_pfn-in-init_unavailable_range.patch.
>
> > This is what it should look like with the fix:
> > https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=55bebbb093
>
> We're good.
>
> > If you want to keep the fix separate, then that's the patch that it
> > fixes. Do you want a commit message? I'll certainly give you a proper
> > SoB:
> >
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
>
> I already unauthorisedly added that so I don't get grumpygrams from
> Stephen ;)
>
> > Happy to resend the fixed series if it helps; it looks like you've
> > already basically sorted it though?
>
> Thanks, it doesn't appear necessary at this time.