The comment in mmzone.h states that the alignment requirement
is the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT. However, the
pointer arithmetic (mem_map - section_nr_to_pfn()) results in
a byte offset scaled by sizeof(struct page). Thus, the actual
alignment provided by the second term is PFN_SECTION_SHIFT +
__ffs(sizeof(struct page)).
Update the compile-time check and the mmzone.h comment to
accurately reflect this mathematically guaranteed alignment by
taking the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT +
__ffs(sizeof(struct page)). This avoids the issue of the check
being overly restrictive on architectures like powerpc where
PFN_SECTION_SHIFT alone is very small (e.g., 6).
Also, remove the exhaustive per-architecture bit-width list from the
comment; such details risk going stale as architectures and configs
change, while the existing BUILD_BUG_ON already provides sufficient
compile-time verification of the constraint.
No runtime impact so far: SECTION_MAP_LAST_BIT happens to fit within
the smaller limit on all existing architectures.
Fixes: def9b71ee651 ("include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
include/linux/mmzone.h | 24 +++++++++---------------
mm/sparse.c | 3 ++-
2 files changed, 11 insertions(+), 16 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7bd0134c241c..584fa598ad75 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -2073,21 +2073,15 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
extern size_t mem_section_usage_size(void);
/*
- * We use the lower bits of the mem_map pointer to store
- * a little bit of information. The pointer is calculated
- * as mem_map - section_nr_to_pfn(pnum). The result is
- * aligned to the minimum alignment of the two values:
- * 1. All mem_map arrays are page-aligned.
- * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT
- * lowest bits. PFN_SECTION_SHIFT is arch-specific
- * (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
- * worst combination is powerpc with 256k pages,
- * which results in PFN_SECTION_SHIFT equal 6.
- * To sum it up, at least 6 bits are available on all architectures.
- * However, we can exceed 6 bits on some other architectures except
- * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
- * with the worst case of 64K pages on arm64) if we make sure the
- * exceeded bit is not applicable to powerpc.
+ * We use the lower bits of the mem_map pointer to store a little bit of
+ * information. The pointer is calculated as mem_map - section_nr_to_pfn().
+ * The result is aligned to the minimum alignment of the two values:
+ *
+ * 1. All mem_map arrays are page-aligned.
+ * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT lowest bits. Because
+ * it is subtracted from a struct page pointer, the offset is scaled by
+ * sizeof(struct page). This provides an alignment of PFN_SECTION_SHIFT +
+ * __ffs(sizeof(struct page)).
*/
enum {
SECTION_MARKED_PRESENT_BIT,
diff --git a/mm/sparse.c b/mm/sparse.c
index dfabe554adf8..c2eb36bfb86d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -269,7 +269,8 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
{
unsigned long coded_mem_map =
(unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
- BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
+ BUILD_BUG_ON(SECTION_MAP_LAST_BIT > min(PFN_SECTION_SHIFT + __ffs(sizeof(struct page)),
+ PAGE_SHIFT));
BUG_ON(coded_mem_map & ~SECTION_MAP_MASK);
return coded_mem_map;
}
--
2.20.1
On 3/31/26 13:30, Muchun Song wrote:
> The comment in mmzone.h states that the alignment requirement
> is the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT. However, the
> pointer arithmetic (mem_map - section_nr_to_pfn()) results in
> a byte offset scaled by sizeof(struct page). Thus, the actual
> alignment provided by the second term is PFN_SECTION_SHIFT +
> __ffs(sizeof(struct page)).
>
> Update the compile-time check and the mmzone.h comment to
> accurately reflect this mathematically guaranteed alignment by
> taking the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT +
> __ffs(sizeof(struct page)). This avoids the issue of the check
> being overly restrictive on architectures like powerpc where
> PFN_SECTION_SHIFT alone is very small (e.g., 6).
>
> Also, remove the exhaustive per-architecture bit-width list from the
> comment; such details risk falling out of date over time and may
> inadvertently be left un-updated, while the existing BUILD_BUG_ON
> provides sufficient compile-time verification of the constraint.
>
> No runtime impact so far: SECTION_MAP_LAST_BIT happens to fit within
> the smaller limit on all existing architectures.
>
> Fixes: def9b71ee651 ("include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
> include/linux/mmzone.h | 24 +++++++++---------------
> mm/sparse.c | 3 ++-
> 2 files changed, 11 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 7bd0134c241c..584fa598ad75 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -2073,21 +2073,15 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
> extern size_t mem_section_usage_size(void);
>
> /*
> - * We use the lower bits of the mem_map pointer to store
> - * a little bit of information. The pointer is calculated
> - * as mem_map - section_nr_to_pfn(pnum). The result is
> - * aligned to the minimum alignment of the two values:
> - * 1. All mem_map arrays are page-aligned.
> - * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT
> - * lowest bits. PFN_SECTION_SHIFT is arch-specific
> - * (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
> - * worst combination is powerpc with 256k pages,
> - * which results in PFN_SECTION_SHIFT equal 6.
> - * To sum it up, at least 6 bits are available on all architectures.
> - * However, we can exceed 6 bits on some other architectures except
> - * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
> - * with the worst case of 64K pages on arm64) if we make sure the
> - * exceeded bit is not applicable to powerpc.
> + * We use the lower bits of the mem_map pointer to store a little bit of
> + * information. The pointer is calculated as mem_map - section_nr_to_pfn().
> + * The result is aligned to the minimum alignment of the two values:
> + *
> + * 1. All mem_map arrays are page-aligned.
> + * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT lowest bits. Because
> + * it is subtracted from a struct page pointer, the offset is scaled by
> + * sizeof(struct page). This provides an alignment of PFN_SECTION_SHIFT +
> + * __ffs(sizeof(struct page)).
> */
> enum {
> SECTION_MARKED_PRESENT_BIT,
> diff --git a/mm/sparse.c b/mm/sparse.c
> index dfabe554adf8..c2eb36bfb86d 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -269,7 +269,8 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
> {
> unsigned long coded_mem_map =
> (unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
> - BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
> + BUILD_BUG_ON(SECTION_MAP_LAST_BIT > min(PFN_SECTION_SHIFT + __ffs(sizeof(struct page)),
> + PAGE_SHIFT));
If that would trigger, wouldn't the memmap of a memory section be
smaller than a single page?
Is this really something we should be concerned about? :)
--
Cheers,
David
> On Apr 1, 2026, at 04:29, David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> On 3/31/26 13:30, Muchun Song wrote:
>> The comment in mmzone.h states that the alignment requirement
>> is the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT. However, the
>> pointer arithmetic (mem_map - section_nr_to_pfn()) results in
>> a byte offset scaled by sizeof(struct page). Thus, the actual
>> alignment provided by the second term is PFN_SECTION_SHIFT +
>> __ffs(sizeof(struct page)).
>>
>> Update the compile-time check and the mmzone.h comment to
>> accurately reflect this mathematically guaranteed alignment by
>> taking the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT +
>> __ffs(sizeof(struct page)). This avoids the issue of the check
>> being overly restrictive on architectures like powerpc where
>> PFN_SECTION_SHIFT alone is very small (e.g., 6).
>>
>> Also, remove the exhaustive per-architecture bit-width list from the
>> comment; such details risk falling out of date over time and may
>> inadvertently be left un-updated, while the existing BUILD_BUG_ON
>> provides sufficient compile-time verification of the constraint.
>>
>> No runtime impact so far: SECTION_MAP_LAST_BIT happens to fit within
>> the smaller limit on all existing architectures.
>>
>> Fixes: def9b71ee651 ("include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer")
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> ---
>> include/linux/mmzone.h | 24 +++++++++---------------
>> mm/sparse.c | 3 ++-
>> 2 files changed, 11 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 7bd0134c241c..584fa598ad75 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -2073,21 +2073,15 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
>> extern size_t mem_section_usage_size(void);
>>
>> /*
>> - * We use the lower bits of the mem_map pointer to store
>> - * a little bit of information. The pointer is calculated
>> - * as mem_map - section_nr_to_pfn(pnum). The result is
>> - * aligned to the minimum alignment of the two values:
>> - * 1. All mem_map arrays are page-aligned.
>> - * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT
>> - * lowest bits. PFN_SECTION_SHIFT is arch-specific
>> - * (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
>> - * worst combination is powerpc with 256k pages,
>> - * which results in PFN_SECTION_SHIFT equal 6.
>> - * To sum it up, at least 6 bits are available on all architectures.
>> - * However, we can exceed 6 bits on some other architectures except
>> - * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
>> - * with the worst case of 64K pages on arm64) if we make sure the
>> - * exceeded bit is not applicable to powerpc.
>> + * We use the lower bits of the mem_map pointer to store a little bit of
>> + * information. The pointer is calculated as mem_map - section_nr_to_pfn().
>> + * The result is aligned to the minimum alignment of the two values:
>> + *
>> + * 1. All mem_map arrays are page-aligned.
>> + * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT lowest bits. Because
>> + * it is subtracted from a struct page pointer, the offset is scaled by
>> + * sizeof(struct page). This provides an alignment of PFN_SECTION_SHIFT +
>> + * __ffs(sizeof(struct page)).
>> */
>> enum {
>> SECTION_MARKED_PRESENT_BIT,
>> diff --git a/mm/sparse.c b/mm/sparse.c
>> index dfabe554adf8..c2eb36bfb86d 100644
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -269,7 +269,8 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
>> {
>> unsigned long coded_mem_map =
>> (unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
>> - BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
>> + BUILD_BUG_ON(SECTION_MAP_LAST_BIT > min(PFN_SECTION_SHIFT + __ffs(sizeof(struct page)),
>> + PAGE_SHIFT));
>
> If that would trigger, wouldn't the memmap of a memory section be
> smaller than a single page?
I don't think a memory section can be smaller than a page, because
PFN_SECTION_SHIFT is defined as follows:
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
Therefore, PFN_SECTION_SHIFT must be greater than PAGE_SHIFT. On powerpc,
PFN_SECTION_SHIFT is 6, PAGE_SHIFT is 18 (the worst combination).
Sorry, but I didn't understand what your concern is. Could you elaborate
a bit more?
>
> Is this really something we should be concerned about? :)
>
When we continuously increase SECTION_MAP_LAST_BIT, it may trigger issues,
because I expect to catch problems as early as possible at compile time. That
was the motivation behind my change.
Thanks.
> --
> Cheers,
>
> David
> On Apr 1, 2026, at 10:57, Muchun Song <muchun.song@linux.dev> wrote:
>
>
>
>> On Apr 1, 2026, at 04:29, David Hildenbrand (Arm) <david@kernel.org> wrote:
>>
>> On 3/31/26 13:30, Muchun Song wrote:
>>> The comment in mmzone.h states that the alignment requirement
>>> is the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT. However, the
>>> pointer arithmetic (mem_map - section_nr_to_pfn()) results in
>>> a byte offset scaled by sizeof(struct page). Thus, the actual
>>> alignment provided by the second term is PFN_SECTION_SHIFT +
>>> __ffs(sizeof(struct page)).
>>>
>>> Update the compile-time check and the mmzone.h comment to
>>> accurately reflect this mathematically guaranteed alignment by
>>> taking the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT +
>>> __ffs(sizeof(struct page)). This avoids the issue of the check
>>> being overly restrictive on architectures like powerpc where
>>> PFN_SECTION_SHIFT alone is very small (e.g., 6).
>>>
>>> Also, remove the exhaustive per-architecture bit-width list from the
>>> comment; such details risk falling out of date over time and may
>>> inadvertently be left un-updated, while the existing BUILD_BUG_ON
>>> provides sufficient compile-time verification of the constraint.
>>>
>>> No runtime impact so far: SECTION_MAP_LAST_BIT happens to fit within
>>> the smaller limit on all existing architectures.
>>>
>>> Fixes: def9b71ee651 ("include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer")
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> ---
>>> include/linux/mmzone.h | 24 +++++++++---------------
>>> mm/sparse.c | 3 ++-
>>> 2 files changed, 11 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index 7bd0134c241c..584fa598ad75 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -2073,21 +2073,15 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
>>> extern size_t mem_section_usage_size(void);
>>>
>>> /*
>>> - * We use the lower bits of the mem_map pointer to store
>>> - * a little bit of information. The pointer is calculated
>>> - * as mem_map - section_nr_to_pfn(pnum). The result is
>>> - * aligned to the minimum alignment of the two values:
>>> - * 1. All mem_map arrays are page-aligned.
>>> - * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT
>>> - * lowest bits. PFN_SECTION_SHIFT is arch-specific
>>> - * (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
>>> - * worst combination is powerpc with 256k pages,
>>> - * which results in PFN_SECTION_SHIFT equal 6.
>>> - * To sum it up, at least 6 bits are available on all architectures.
>>> - * However, we can exceed 6 bits on some other architectures except
>>> - * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
>>> - * with the worst case of 64K pages on arm64) if we make sure the
>>> - * exceeded bit is not applicable to powerpc.
>>> + * We use the lower bits of the mem_map pointer to store a little bit of
>>> + * information. The pointer is calculated as mem_map - section_nr_to_pfn().
>>> + * The result is aligned to the minimum alignment of the two values:
>>> + *
>>> + * 1. All mem_map arrays are page-aligned.
>>> + * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT lowest bits. Because
>>> + * it is subtracted from a struct page pointer, the offset is scaled by
>>> + * sizeof(struct page). This provides an alignment of PFN_SECTION_SHIFT +
>>> + * __ffs(sizeof(struct page)).
>>> */
>>> enum {
>>> SECTION_MARKED_PRESENT_BIT,
>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>> index dfabe554adf8..c2eb36bfb86d 100644
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -269,7 +269,8 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
>>> {
>>> unsigned long coded_mem_map =
>>> (unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
>>> - BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
>>> + BUILD_BUG_ON(SECTION_MAP_LAST_BIT > min(PFN_SECTION_SHIFT + __ffs(sizeof(struct page)),
>>> + PAGE_SHIFT));
>>
>> If that would trigger, wouldn't the memmap of a memory section be
>> smaller than a single page?
>
> I don't think a memory section can be smaller than a page, because
> PFN_SECTION_SHIFT is defined as follows:
>
> #define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
>
> Therefore, PFN_SECTION_SHIFT must be greater than PAGE_SHIFT. On powerpc,
> PFN_SECTION_SHIFT is 6, PAGE_SHIFT is 18 (the worst combination).
>
> Sorry, but I didn't understand what your concern is. Could you elaborate
> a bit more?
Sorry, I misread it earlier — I thought it was about the memory
section size, but it's actually about the memmap size. Let me respond
again to your question. On the powerpc architecture, when PFN_SECTION_SHIFT
is 6 and PAGE_SHIFT is 18, it does seem that the memmap of a memory
section would be smaller than a single page.
Thanks.
>
>>
>> Is this really something we should be concerned about? :)
>>
>
> When we continuously increase SECTION_MAP_LAST_BIT, it may trigger issues,
> because I expect to catch problems as early as possible at compile time. That
> was the motivation behind my change.
>
> Thanks.
>
>> --
>> Cheers,
>>
>> David
On 4/1/26 06:01, Muchun Song wrote:
>
>
>> On Apr 1, 2026, at 10:57, Muchun Song <muchun.song@linux.dev> wrote:
>>
>>
>>
>>>
>>>
>>> If that would trigger, wouldn't the memmap of a memory section be
>>> smaller than a single page?
>>
>> I don't think a memory section can be smaller than a page, because
>> PFN_SECTION_SHIFT is defined as follows:
>>
>> #define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
>>
>> Therefore, PFN_SECTION_SHIFT must be greater than PAGE_SHIFT. On powerpc,
>> PFN_SECTION_SHIFT is 6, PAGE_SHIFT is 18 (the worst combination).
>>
>> Sorry, but I didn't understand what your concern is. Could you elaborate
>> a bit more?
>
> Sorry, I misread it earlier — I thought it was about the memory
> section size, but it's actually about the memmap size. Let me respond
> again to your question. On the powerpc architecture, when PFN_SECTION_SHIFT
> is 6 and PAGE_SHIFT is 18, it does seem that the memmap of a memory
> section would be smaller than a single page.

Right, and I am saying we don't care about that and do not support it.

For example, most vmemmap code I am aware of relies on a single section
covering full pages.

So I don't think this patch here adds any value, except faking that we
might be supporting something we don't?

--
Cheers,

David
> On Apr 1, 2026, at 15:08, David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> On 4/1/26 06:01, Muchun Song wrote:
>>
>>
>>> On Apr 1, 2026, at 10:57, Muchun Song <muchun.song@linux.dev> wrote:
>>>
>>>
>>>
>>>>
>>>>
>>>> If that would trigger, wouldn't the memmap of a memory section be
>>>> smaller than a single page?
>>>
>>> I don't think a memory section can be smaller than a page, because
>>> PFN_SECTION_SHIFT is defined as follows:
>>>
>>> #define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
>>>
>>> Therefore, PFN_SECTION_SHIFT must be greater than PAGE_SHIFT. On powerpc,
>>> PFN_SECTION_SHIFT is 6, PAGE_SHIFT is 18 (the worst combination).
>>>
>>> Sorry, but I didn't understand what your concern is. Could you elaborate
>>> a bit more?
>>
>> Sorry, I misread it earlier — I thought it was about the memory
>> section size, but it's actually about the memmap size. Let me respond
>> again to your question. On the powerpc architecture, when PFN_SECTION_SHIFT
>> is 6 and PAGE_SHIFT is 18, it does seem that the memmap of a memory
>> section would be smaller than a single page.
>
> Right, and I am saying we don't care about that and do not support it.
>
> For example, most vmemmap code I am aware of relies on a single section
> covering full pages.
>
> So I don't think this patch here adds any value, except faking that we
> might be supporting something we don't?

Got it.

My original motivation was simply that when I read this part of the code,
the comment felt a bit off to me, so I just fixed it casually. But if it
ends up creating the impression that "we might be supporting something
we don't," as you said, then I'm fine with dropping this patch.

Thanks.

>
> --
> Cheers,
>
> David
On 4/1/26 09:23, Muchun Song wrote:
>
>
>> On Apr 1, 2026, at 15:08, David Hildenbrand (Arm) <david@kernel.org> wrote:
>>
>> On 4/1/26 06:01, Muchun Song wrote:
>>>
>>>
>>>
>>> Sorry, I misread it earlier — I thought it was about the memory
>>> section size, but it's actually about the memmap size. Let me respond
>>> again to your question. On the powerpc architecture, when PFN_SECTION_SHIFT
>>> is 6 and PAGE_SHIFT is 18, it does seem that the memmap of a memory
>>> section would be smaller than a single page.
>>
>> Right, and I am saying we don't care about that and do not support it.
>>
>> For example, most vmemmap code I am aware of relies on a single section
>> covering full pages.
>>
>> So I don't think this patch here adds any value, except faking that we
>> might be supporting something we don't?
>
> Got it.
>
> My original motivation was simply that when I read this part of the code,
> the comment felt a bit off to me, so I just fixed it casually. But if it
> ends up creating the impression that "we might be supporting something
> we don't," as you said, then I'm fine with dropping this patch.

Can you rework your patch to only simplify the comment, and to clarify
that we always expect a single section to cover full pages?

--
Cheers,

David
> On Apr 1, 2026, at 15:26, David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> On 4/1/26 09:23, Muchun Song wrote:
>>
>>
>>> On Apr 1, 2026, at 15:08, David Hildenbrand (Arm) <david@kernel.org> wrote:
>>>
>>> On 4/1/26 06:01, Muchun Song wrote:
>>>>
>>>>
>>>>
>>>> Sorry, I misread it earlier — I thought it was about the memory
>>>> section size, but it's actually about the memmap size. Let me respond
>>>> again to your question. On the powerpc architecture, when PFN_SECTION_SHIFT
>>>> is 6 and PAGE_SHIFT is 18, it does seem that the memmap of a memory
>>>> section would be smaller than a single page.
>>>
>>> Right, and I am saying we don't care about that and do not support it.
>>>
>>> For example, most vmemmap code I am aware of relies on a single section
>>> covering full pages.
>>>
>>> So I don't think this patch here adds any value, except faking that we
>>> might be supporting something we don't?
>>
>> Got it.
>>
>> My original motivation was simply that when I read this part of the code,
>> the comment felt a bit off to me, so I just fixed it casually. But if it
>> ends up creating the impression that "we might be supporting something
>> we don't," as you said, then I'm fine with dropping this patch.
>
> Can you rework your patch to only simplify the comment, and to clarify
> that we always expect a single section to cover full pages?

No problem.

>
> --
> Cheers,
>
> David
> On Apr 1, 2026, at 10:57, Muchun Song <muchun.song@linux.dev> wrote:
>
>
>
>> On Apr 1, 2026, at 04:29, David Hildenbrand (Arm) <david@kernel.org> wrote:
>>
>> On 3/31/26 13:30, Muchun Song wrote:
>>> The comment in mmzone.h states that the alignment requirement
>>> is the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT. However, the
>>> pointer arithmetic (mem_map - section_nr_to_pfn()) results in
>>> a byte offset scaled by sizeof(struct page). Thus, the actual
>>> alignment provided by the second term is PFN_SECTION_SHIFT +
>>> __ffs(sizeof(struct page)).
>>>
>>> Update the compile-time check and the mmzone.h comment to
>>> accurately reflect this mathematically guaranteed alignment by
>>> taking the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT +
>>> __ffs(sizeof(struct page)). This avoids the issue of the check
>>> being overly restrictive on architectures like powerpc where
>>> PFN_SECTION_SHIFT alone is very small (e.g., 6).
>>>
>>> Also, remove the exhaustive per-architecture bit-width list from the
>>> comment; such details risk falling out of date over time and may
>>> inadvertently be left un-updated, while the existing BUILD_BUG_ON
>>> provides sufficient compile-time verification of the constraint.
>>>
>>> No runtime impact so far: SECTION_MAP_LAST_BIT happens to fit within
>>> the smaller limit on all existing architectures.
>>>
>>> Fixes: def9b71ee651 ("include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer")
>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>> ---
>>> include/linux/mmzone.h | 24 +++++++++---------------
>>> mm/sparse.c | 3 ++-
>>> 2 files changed, 11 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index 7bd0134c241c..584fa598ad75 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -2073,21 +2073,15 @@ static inline struct mem_section *__nr_to_section(unsigned long nr)
>>> extern size_t mem_section_usage_size(void);
>>>
>>> /*
>>> - * We use the lower bits of the mem_map pointer to store
>>> - * a little bit of information. The pointer is calculated
>>> - * as mem_map - section_nr_to_pfn(pnum). The result is
>>> - * aligned to the minimum alignment of the two values:
>>> - * 1. All mem_map arrays are page-aligned.
>>> - * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT
>>> - * lowest bits. PFN_SECTION_SHIFT is arch-specific
>>> - * (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
>>> - * worst combination is powerpc with 256k pages,
>>> - * which results in PFN_SECTION_SHIFT equal 6.
>>> - * To sum it up, at least 6 bits are available on all architectures.
>>> - * However, we can exceed 6 bits on some other architectures except
>>> - * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
>>> - * with the worst case of 64K pages on arm64) if we make sure the
>>> - * exceeded bit is not applicable to powerpc.
>>> + * We use the lower bits of the mem_map pointer to store a little bit of
>>> + * information. The pointer is calculated as mem_map - section_nr_to_pfn().
>>> + * The result is aligned to the minimum alignment of the two values:
>>> + *
>>> + * 1. All mem_map arrays are page-aligned.
>>> + * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT lowest bits. Because
>>> + * it is subtracted from a struct page pointer, the offset is scaled by
>>> + * sizeof(struct page). This provides an alignment of PFN_SECTION_SHIFT +
>>> + * __ffs(sizeof(struct page)).
>>> */
>>> enum {
>>> SECTION_MARKED_PRESENT_BIT,
>>> diff --git a/mm/sparse.c b/mm/sparse.c
>>> index dfabe554adf8..c2eb36bfb86d 100644
>>> --- a/mm/sparse.c
>>> +++ b/mm/sparse.c
>>> @@ -269,7 +269,8 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
>>> {
>>> unsigned long coded_mem_map =
>>> (unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
>>> - BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
>>> + BUILD_BUG_ON(SECTION_MAP_LAST_BIT > min(PFN_SECTION_SHIFT + __ffs(sizeof(struct page)),
>>> + PAGE_SHIFT));
>>
>> If that would trigger, wouldn't the memmap of a memory section be
>> smaller than a single page?
>
> I don't think a memory section can be smaller than a page, because
> PFN_SECTION_SHIFT is defined as follows:
>
> #define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
>
> Therefore, PFN_SECTION_SHIFT must be greater than PAGE_SHIFT. On powerpc,
Sorry, I want to say memory section must be greater than page.
> PFN_SECTION_SHIFT is 6, PAGE_SHIFT is 18 (the worst combination).
>
> Sorry, but I didn't understand what your concern is. Could you elaborate
> a bit more?
>
>>
>> Is this really something we should be concerned about? :)
>>
>
> When we continuously increase SECTION_MAP_LAST_BIT, it may trigger issues,
> because I expect to catch problems as early as possible at compile time. That
> was the motivation behind my change.
>
> Thanks.
>
>> --
>> Cheers,
>>
>> David
On Tue, 31 Mar 2026 19:30:23 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
> The comment in mmzone.h states that the alignment requirement
> is the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT. However, the
> pointer arithmetic (mem_map - section_nr_to_pfn()) results in
> a byte offset scaled by sizeof(struct page). Thus, the actual
> alignment provided by the second term is PFN_SECTION_SHIFT +
> __ffs(sizeof(struct page)).
>
> Update the compile-time check and the mmzone.h comment to
> accurately reflect this mathematically guaranteed alignment by
> taking the minimum of PAGE_SHIFT and PFN_SECTION_SHIFT +
> __ffs(sizeof(struct page)). This avoids the issue of the check
> being overly restrictive on architectures like powerpc where
> PFN_SECTION_SHIFT alone is very small (e.g., 6).
>
> Also, remove the exhaustive per-architecture bit-width list from the
> comment; such details risk falling out of date over time and may
> inadvertently be left un-updated, while the existing BUILD_BUG_ON
> provides sufficient compile-time verification of the constraint.
>
> No runtime impact so far: SECTION_MAP_LAST_BIT happens to fit within
> the smaller limit on all existing architectures.
>
> ...
>
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -269,7 +269,8 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
> {
> unsigned long coded_mem_map =
> (unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
> - BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
> + BUILD_BUG_ON(SECTION_MAP_LAST_BIT > min(PFN_SECTION_SHIFT + __ffs(sizeof(struct page)),
> + PAGE_SHIFT));
> BUG_ON(coded_mem_map & ~SECTION_MAP_MASK);
> return coded_mem_map;
> }
In mm-stable this check was moved into mm/internal.h's new
sparse_init_one_section() by David's commit 6a2f8fb8ed2d ("mm/sparse: move
sparse_init_one_section() to internal.h"), so I did the obvious thing:
include/linux/mmzone.h | 24 +++++++++---------------
mm/internal.h | 3 ++-
2 files changed, 11 insertions(+), 16 deletions(-)
--- a/include/linux/mmzone.h~mm-sparse-fix-build_bug_on-check-for-section-map-alignment
+++ a/include/linux/mmzone.h
@@ -2068,21 +2068,15 @@ static inline struct mem_section *__nr_t
extern size_t mem_section_usage_size(void);
/*
- * We use the lower bits of the mem_map pointer to store
- * a little bit of information. The pointer is calculated
- * as mem_map - section_nr_to_pfn(pnum). The result is
- * aligned to the minimum alignment of the two values:
- * 1. All mem_map arrays are page-aligned.
- * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT
- * lowest bits. PFN_SECTION_SHIFT is arch-specific
- * (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the
- * worst combination is powerpc with 256k pages,
- * which results in PFN_SECTION_SHIFT equal 6.
- * To sum it up, at least 6 bits are available on all architectures.
- * However, we can exceed 6 bits on some other architectures except
- * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available
- * with the worst case of 64K pages on arm64) if we make sure the
- * exceeded bit is not applicable to powerpc.
+ * We use the lower bits of the mem_map pointer to store a little bit of
+ * information. The pointer is calculated as mem_map - section_nr_to_pfn().
+ * The result is aligned to the minimum alignment of the two values:
+ *
+ * 1. All mem_map arrays are page-aligned.
+ * 2. section_nr_to_pfn() always clears PFN_SECTION_SHIFT lowest bits. Because
+ * it is subtracted from a struct page pointer, the offset is scaled by
+ * sizeof(struct page). This provides an alignment of PFN_SECTION_SHIFT +
+ * __ffs(sizeof(struct page)).
*/
enum {
SECTION_MARKED_PRESENT_BIT,
--- a/mm/internal.h~mm-sparse-fix-build_bug_on-check-for-section-map-alignment
+++ a/mm/internal.h
@@ -972,7 +972,8 @@ static inline void sparse_init_one_secti
{
unsigned long coded_mem_map;
- BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
+ BUILD_BUG_ON(SECTION_MAP_LAST_BIT > min(PFN_SECTION_SHIFT + __ffs(sizeof(struct page)),
+ PAGE_SHIFT));
/*
* We encode the start PFN of the section into the mem_map such that
_
(boy that's an eyesore on an 80-col xterm!)
> On Apr 1, 2026, at 04:07, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 31 Mar 2026 19:30:23 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
>
>> ...
>>
>> --- a/mm/sparse.c
>> +++ b/mm/sparse.c
>> @@ -269,7 +269,8 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p
>> {
>> unsigned long coded_mem_map =
>> (unsigned long)(mem_map - (section_nr_to_pfn(pnum)));
>> - BUILD_BUG_ON(SECTION_MAP_LAST_BIT > PFN_SECTION_SHIFT);
>> + BUILD_BUG_ON(SECTION_MAP_LAST_BIT > min(PFN_SECTION_SHIFT + __ffs(sizeof(struct page)),
>> + PAGE_SHIFT));
>> BUG_ON(coded_mem_map & ~SECTION_MAP_MASK);
>> return coded_mem_map;
>> }
>
> In mm-stable this was moved into mm/internal.h's new
> sparse_init_one_section(). By David's 6a2f8fb8ed2d ("mm/sparse: move
> sparse_init_one_section() to internal.h")
Got it. I see it.
>
> I did the obvious thing:
Thank you for taking care of that for me.
On Tue, 31 Mar 2026 19:30:23 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
Thanks. So this can fix the build on some arch/kconfig combinations?
Do you think we should fix older kernels?
> Fixes: def9b71ee651 ("include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer")
Eight years ago so I'm going with "no".
On 3/31/26 21:55, Andrew Morton wrote:
> On Tue, 31 Mar 2026 19:30:23 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
>
>
> Thanks. So this can fix the build on some arch/kconfig combinations?
>
> Do you think we should fix older kernels?
>
>> Fixes: def9b71ee651 ("include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer")
>
> Eight years ago so I'm going with "no".
IIUC, there is nothing broken; it could only be revealed by some new
architecture.
So no stable :)
--
Cheers,
David
> On Apr 1, 2026, at 04:04, David Hildenbrand (Arm) <david@kernel.org> wrote:
>
> On 3/31/26 21:55, Andrew Morton wrote:
>> On Tue, 31 Mar 2026 19:30:23 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
>>
>>
>> Thanks. So this can fix the build on some arch/kconfig combinations?
>>
>> Do you think we should fix older kernels?
>>
>>> Fixes: def9b71ee651 ("include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer")
>>
>> Eight years ago so I'm going with "no".
>
> IIUC, there is nothing broken. Could only be revealed by some new
> architecture.
Or by increasing SECTION_MAP_LAST_BIT on powerpc in the future.
>
> So no stable :)
Absolutely right.
Thanks.
>
> --
> Cheers,
>
> David