[PATCH 2/4] x86/mm: harmonize return value of phys_pte_init()

Brendan Jackman posted 4 patches 2 months, 2 weeks ago
Posted by Brendan Jackman 2 months, 2 weeks ago
In the case that they encounter pre-existing mappings, all the other
phys_*_init()s include those pre-mapped PFNs in the returned value.
Excluding those PFNs only when they are mapped at 4K seems like an
error. So make it consistent.

The other functions only include the existing mappings if the
page_size_mask would have allowed creating those mappings.
4K pages can't be disabled by page_size_mask so that condition is not
needed here; paddr_last can be assigned unconditionally before checking
for existing mappings.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 arch/x86/mm/init_64.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 9e45b371a6234b41bd7177b81b5d432341ae7214..968a5092dbd7ee3e7007fa0c769eff7d7ecb0ba3 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -492,6 +492,8 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end,
 			continue;
 		}
 
+		paddr_last = paddr_next;
+
 		/*
 		 * We will re-use the existing mapping.
 		 * Xen for example has some special requirements, like mapping
@@ -506,7 +508,6 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end,
 
 		pages++;
 		set_pte_init(pte, pfn_pte(paddr >> PAGE_SHIFT, prot), init);
-		paddr_last = (paddr & PAGE_MASK) + PAGE_SIZE;
 	}
 
 	update_page_count(PG_LEVEL_4K, pages);

-- 
2.50.1
Re: [PATCH 2/4] x86/mm: harmonize return value of phys_pte_init()
Posted by Borislav Petkov 2 weeks, 6 days ago
On Fri, Oct 03, 2025 at 04:56:42PM +0000, Brendan Jackman wrote:
> In the case that they encounter pre-existing mappings, all the other
> phys_*_init()s include those pre-mapped PFNs in the returned value.
> Excluding those PFNs only when they are mapped at 4K seems like an
> error. So make it consistent.
> 
> The other functions only include the existing mappings if the
> page_size_mask would have allowed creating those mappings.
> 4K pages can't be disabled by page_size_mask so that condition is not
> needed here; paddr_last can be assigned unconditionally before checking
> for existing mappings.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
>  arch/x86/mm/init_64.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 9e45b371a6234b41bd7177b81b5d432341ae7214..968a5092dbd7ee3e7007fa0c769eff7d7ecb0ba3 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -492,6 +492,8 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end,
>  			continue;
>  		}
>  
> +		paddr_last = paddr_next;
> +
>  		/*
>  		 * We will re-use the existing mapping.
>  		 * Xen for example has some special requirements, like mapping

I don't understand: the other phys_*_init() things do:

		if (!XXX_none())

			...

			paddr_last = paddr_next;

while you've raised the assignment above that test.

Also "seems like an error" needs a lot more poking at because if it is an
error, then its incarnation must be really nasty and subtle or it is not, and
then we don't care. And it has been that way for a while now...

But maybe I'm not seeing it from the right angle...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH 2/4] x86/mm: harmonize return value of phys_pte_init()
Posted by Brendan Jackman 2 weeks, 5 days ago
On Thu Nov 27, 2025 at 2:35 PM UTC, Borislav Petkov wrote:
> On Fri, Oct 03, 2025 at 04:56:42PM +0000, Brendan Jackman wrote:
>> In the case that they encounter pre-existing mappings, all the other
>> phys_*_init()s include those pre-mapped PFNs in the returned value.
>> Excluding those PFNs only when they are mapped at 4K seems like an
>> error. So make it consistent.
>> 
>> The other functions only include the existing mappings if the
>> page_size_mask would have allowed creating those mappings.
>> 4K pages can't be disabled by page_size_mask so that condition is not
>> needed here; paddr_last can be assigned unconditionally before checking
>> for existing mappings.
>> 
>> Signed-off-by: Brendan Jackman <jackmanb@google.com>
>> ---
>>  arch/x86/mm/init_64.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 9e45b371a6234b41bd7177b81b5d432341ae7214..968a5092dbd7ee3e7007fa0c769eff7d7ecb0ba3 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -492,6 +492,8 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end,
>>  			continue;
>>  		}
>>  
>> +		paddr_last = paddr_next;
>> +
>>  		/*
>>  		 * We will re-use the existing mapping.
>>  		 * Xen for example has some special requirements, like mapping
>
> I don't understand: the other phys_*_init() things do:
>
> 		if (!XXX_none())
>
> 			...
>
> 			paddr_last = paddr_next;
>
> while you've raised the assignment above that test.

Well they actually do this:

		if (!p*_none()) {
			if (!p*_leaf()) {
				paddr_last = ...
				continue;
			}
			if (page_size_mask & *) {
				paddr_last = ...
				continue;
			}
		}

		if (page_size_mask & *) {
			paddr_last = *
			continue;
		}

		paddr_last = *
	
That is, they update paddr_last unconditionally, whereas before this
patch phys_pte_init() skips the update in the !pte_none() case.
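
To make the difference concrete, here is a minimal user-space sketch of the
loop. It is not the kernel code: the MODEL_* constants, the present[] array
standing in for a PTE page, the model_phys_pte_init() name and the
count_existing flag are all made up for illustration, and initialising
paddr_last to paddr_end is an assumption about how the real function starts
out. With count_existing set, the assignment sits above the existing-mapping
check, as in this patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PAGE_SHIFT, PAGE_SIZE, PAGE_MASK, PTRS_PER_PTE. */
#define MODEL_PAGE_SHIFT   12
#define MODEL_PAGE_SIZE    (1UL << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK    (~(MODEL_PAGE_SIZE - 1))
#define MODEL_PTRS_PER_PTE 512

/*
 * present[i] models "slot i of the PTE page is already mapped".
 * count_existing == false: only newly created mappings advance paddr_last.
 * count_existing == true:  pre-existing mappings advance it as well.
 */
static unsigned long model_phys_pte_init(bool *present, unsigned long paddr,
					 unsigned long paddr_end,
					 bool count_existing)
{
	/* Assumption: the starting value is the end of the requested range. */
	unsigned long paddr_last = paddr_end;
	unsigned long paddr_next;
	size_t i = (paddr >> MODEL_PAGE_SHIFT) % MODEL_PTRS_PER_PTE;

	for (; i < MODEL_PTRS_PER_PTE; i++, paddr = paddr_next) {
		paddr_next = (paddr & MODEL_PAGE_MASK) + MODEL_PAGE_SIZE;

		/* Past the requested range: nothing to map. */
		if (paddr >= paddr_end)
			continue;

		/* Patched behaviour: assign before looking at the slot. */
		if (count_existing)
			paddr_last = paddr_next;

		/* Re-use an existing mapping without touching it. */
		if (present[i])
			continue;

		present[i] = true;
		paddr_last = paddr_next;
	}

	return paddr_last;
}
```

In this model, a request for four pages whose last page is already mapped
returns the end of the third page with count_existing clear, and the end of
the fourth page with it set.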

> Also "seems like an error" needs a lot more poking at because if it is an
> error, then its incarnation must be really nasty and subtle or it is not, and
> then we don't care. And it has been that way for a while now...

Before the patchset, the return value of kernel_physical_mapping_init()
means something like:

1. The last physical address that was mapped.

2. ... This includes addresses that were already mapped before the call

3. ... UNLESS that pre-existing mapping was 4K.

In patch 4/4 I'm claiming:

> The exact definition of this is pretty fiddly, but only when there is a mismatch
> between the alignment of the requested range and the page sizes allowed
> by page_size_mask, or when the range ends in a region that is not mapped
> according to e820.

Which would not be true given point 3 above. Without this
phys_pte_init() change, the return value of init_memory_mapping() is
fiddly even if you allow arbitrary page sizes and all the paddrs
you're trying to map definitely exist, because of the 4K special-case in
point 3. Instead of trying to justify why init_memory_mapping() doesn't
care even about that special-case, I just removed that special-case
because I think it was probably a bug anyway.

HOWEVER... with the wisdom of hindsight... this was a VERY obscure
and confusing way to go about writing the patchset. I apologise!

I think the right way to do this is to drop this patch (2/4) and
evaluate the remainder against the claim that init_memory_mapping()
doesn't care about the return value at all. So that would have to mean:

a. It only calls kernel_physical_mapping_init() for physical ranges that
   exist.

b. It always uses a page_size_mask that matches the alignment of the
   ranges it's passing.

c. It doesn't operate on ranges that already have mappings.
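
As a rough illustration of condition (b), the check below treats "matches the
alignment" as "both ends of the range are aligned to the largest page size the
mask allows". That reading, the MODEL_LEVEL_* values and both helper names are
assumptions made for this sketch, not something taken from init_64.c.

```c
#include <stdbool.h>

/* Hypothetical level numbering; the mask holds one bit per allowed level. */
enum model_level { MODEL_LEVEL_4K = 1, MODEL_LEVEL_2M, MODEL_LEVEL_1G };

/* Largest page size permitted by the mask; 4K is always permitted. */
static unsigned long model_largest_page_size(unsigned long page_size_mask)
{
	if (page_size_mask & (1UL << MODEL_LEVEL_1G))
		return 1UL << 30;
	if (page_size_mask & (1UL << MODEL_LEVEL_2M))
		return 1UL << 21;
	return 1UL << 12;
}

/* True if [start, end) is non-empty and aligned to that page size. */
static bool model_range_matches_mask(unsigned long start, unsigned long end,
				     unsigned long page_size_mask)
{
	unsigned long size = model_largest_page_size(page_size_mask);

	return start < end && !(start & (size - 1)) && !(end & (size - 1));
}
```

A 2M-aligned range passes with a 2M mask, while the same range extended by one
4K page does not.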

Am I making a bit more sense now...?
Re: [PATCH 2/4] x86/mm: harmonize return value of phys_pte_init()
Posted by Dave Hansen 1 week, 4 days ago
On 11/28/25 06:03, Brendan Jackman wrote:
> Before the patchset, the return value of kernel_physical_mapping_init()
> means something like:
> 
> 1. The last physical address that was mapped.
> 
> 2. ... This includes addresses that were already mapped before the call
> 
> 3. ... UNLESS that pre-existing mapping was 4K.

Yeah, the 4k thing certainly sounds like a bug. The *only* thing that
this influences is the add_pfn_range_mapped() call and it doesn't care
about 4k.

> I think the right way to do this is to drop this patch (2/4) and
> evaluate the remainder against the claim that init_memory_mapping()
> doesn't care about the return value at all. So that would have to mean:
> 
> a. It only calls kernel_physical_mapping_init() for physical ranges that
>    exist.
> 
> b. It always uses a page_size_mask that matches the alignment of the
>    ranges it's passing.
> 
> c. It doesn't operate on ranges that already have mappings.

Yeah, that makes sense to go forward with. Instead of having the code
try to cope with all that stuff that we don't think is happening
_anyway_, let's just warn on those conditions and effectively not handle
them.
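
A user-space sketch of what "warn and don't handle" could look like for
condition (c). The present[] array, the slot indices and the
model_range_is_unmapped() name are hypothetical, and the one-shot fprintf()
merely stands in for whatever warning primitive the kernel code would use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Returns true if no slot in [first, last) is already mapped.
 * On the first violation ever seen it prints a single warning; the caller
 * is expected to bail out rather than try to cope with the existing mapping.
 */
static bool model_range_is_unmapped(const bool *present, size_t first,
				    size_t last)
{
	static bool warned;

	for (size_t i = first; i < last; i++) {
		if (!present[i])
			continue;

		if (!warned) {
			fprintf(stderr, "model: slot %zu already mapped\n", i);
			warned = true;
		}
		return false;
	}

	return true;
}
```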
Re: [PATCH 2/4] x86/mm: harmonize return value of phys_pte_init()
Posted by Brendan Jackman 1 week, 3 days ago
On Fri Dec 5, 2025 at 7:29 PM UTC, Dave Hansen wrote:
> On 11/28/25 06:03, Brendan Jackman wrote:
>> Before the patchset, the return value of kernel_physical_mapping_init()
>> means something like:
>> 
>> 1. The last physical address that was mapped.
>> 
>> 2. ... This includes addresses that were already mapped before the call
>> 
>> 3. ... UNLESS that pre-existing mapping was 4K.
>
> Yeah, the 4k thing certainly sounds like a bug. The *only* thing that
> this influences is the add_pfn_range_mapped() call and it doesn't care
> about 4k.
>
>> I think the right way to do this is to drop this patch (2/4) and
>> evaluate the remainder against the claim that init_memory_mapping()
>> doesn't care about the return value at all. So that would have to mean:
>> 
>> a. It only calls kernel_physical_mapping_init() for physical ranges that
>>    exist.
>> 
>> b. It always uses a page_size_mask that matches the alignment of the
>>    ranges it's passing.
>> 
>> c. It doesn't operate on ranges that already have mappings.
>
> Yeah, that makes sense to go forward with. Instead of having the code
> try to cope with all that stuff that we don't think is happening
> _anyway_, let's just warn on those conditions and effectively not handle
> them.

I assume those conditions can arise in other cases than
init_memory_mapping(). It's just that those cases already ignore the
return value so it doesn't matter anyway.

Anyway yeah will go ahead with this approach, minus the warnings.
Probably after LPC as I am still not finished with my page_alloc
stuff (yikes!).