We can now safely iterate over all pages in a folio, so there is no need
for pfn_to_page().

Also, as we already force the refcount in __init_single_page() to 1,
we can just set the refcount to 0 and avoid page_ref_freeze() +
VM_BUG_ON. Likely, in the future, we would just want to tell
__init_single_page() which value to initialize the refcount to.
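
To illustrate the refcount argument, here is a minimal userspace sketch, not kernel code. It assumes page_ref_freeze(page, 1) behaves like a compare-and-exchange from 1 to 0 and set_page_count() like a plain store. struct page_model and all model_* helpers are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page; only the refcount is modelled. */
struct page_model {
	atomic_int refcount;
};

/* Models the refcount part of __init_single_page(): the count starts at 1. */
static void model_init_single_page(struct page_model *page)
{
	atomic_init(&page->refcount, 1);
}

/* Models page_ref_freeze(): drop to 0 only if the count equals 'expected'. */
static bool model_ref_freeze(struct page_model *page, int expected)
{
	return atomic_compare_exchange_strong(&page->refcount, &expected, 0);
}

/* Models set_page_count(): an unconditional store. */
static void model_set_count(struct page_model *page, int count)
{
	atomic_store(&page->refcount, count);
}

/* Old sequence: initialize, freeze, and check that the freeze succeeded. */
static int old_tail_refcount(void)
{
	struct page_model page;
	bool frozen;

	model_init_single_page(&page);
	frozen = model_ref_freeze(&page, 1);
	assert(frozen);	/* stands in for VM_BUG_ON(!ret) */
	(void)frozen;	/* keeps NDEBUG builds warning-free */
	return atomic_load(&page.refcount);
}

/* New sequence: initialize, then store 0 directly. */
static int new_tail_refcount(void)
{
	struct page_model page;

	model_init_single_page(&page);
	model_set_count(&page, 0);
	return atomic_load(&page.refcount);
}
```

In this model both sequences leave the count at 0. The freeze can only fail if the count is not 1, which cannot happen right after the initialization, so the check adds nothing here.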

Further, adjust the comments to highlight that we are dealing with an
open-coded prep_compound_page() variant, and add another comment explaining
why we really need the __init_single_page() only on the tail pages.

Note that the current code was likely problematic, but we never ran into
it: prep_compound_tail() would have been called with an offset that might
exceed a memory section, and prep_compound_tail() would have simply
added that offset to the page pointer -- which would not have done the
right thing on sparsemem without vmemmap.
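
The section-crossing problem can be sketched with a small userspace model, not kernel code. It assumes a memmap that is allocated per section, so the struct pages of neighbouring sections are not adjacent in memory. struct mpage, the slot layout, and the model_* helpers are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGES_PER_SECTION 4UL
#define NR_SLOTS 3UL

/* Hypothetical stand-in for struct page. */
struct mpage {
	unsigned long flags;
};

/*
 * One backing array with three section-sized slots. Section 0 lives in
 * slot 0 and section 1 in slot 2; slot 1 is an unrelated gap, so the two
 * per-section memmaps are deliberately not contiguous.
 */
static struct mpage backing[NR_SLOTS * PAGES_PER_SECTION];
static const unsigned long section_slot[] = { 0, 2 };

/* Models pfn_to_page() without vmemmap: look up the section first. */
static struct mpage *model_pfn_to_page(unsigned long pfn)
{
	unsigned long section = pfn / PAGES_PER_SECTION;
	unsigned long idx = pfn % PAGES_PER_SECTION;

	assert(section < sizeof(section_slot) / sizeof(section_slot[0]));
	return &backing[section_slot[section] * PAGES_PER_SECTION + idx];
}

/* Models the tail lookup that adds the offset to the head page pointer. */
static struct mpage *model_head_plus_offset(unsigned long head_pfn,
					    unsigned long offset)
{
	return model_pfn_to_page(head_pfn) + offset;
}
```

With the head at pfn 2, an offset of 1 stays inside section 0 and both lookups agree. An offset of 2 crosses into section 1, and the pointer addition lands in the gap instead of on the struct page for pfn 4.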
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/hugetlb.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4a97e4f14c0dc..1f42186a85ea4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
 {
 	enum zone_type zone = zone_idx(folio_zone(folio));
 	int nid = folio_nid(folio);
+	struct page *page = folio_page(folio, start_page_number);
 	unsigned long head_pfn = folio_pfn(folio);
 	unsigned long pfn, end_pfn = head_pfn + end_page_number;
-	int ret;
-
-	for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
-		struct page *page = pfn_to_page(pfn);

+	/*
+	 * We mark all tail pages with memblock_reserved_mark_noinit(),
+	 * so these pages are completely uninitialized.
+	 */
+	for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
 		__init_single_page(page, pfn, zone, nid);
 		prep_compound_tail((struct page *)folio, pfn - head_pfn);
-		ret = page_ref_freeze(page, 1);
-		VM_BUG_ON(!ret);
+		set_page_count(page, 0);
 	}
 }

@@ -3257,12 +3258,15 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
 {
 	int ret;

-	/* Prepare folio head */
+	/*
+	 * This is an open-coded prep_compound_page() whereby we avoid
+	 * walking pages twice by initializing/preparing+freezing them in the
+	 * same go.
+	 */
 	__folio_clear_reserved(folio);
 	__folio_set_head(folio);
 	ret = folio_ref_freeze(folio, 1);
 	VM_BUG_ON(!ret);
-	/* Initialize the necessary tail struct pages */
 	hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
 	prep_compound_head((struct page *)folio, huge_page_order(h));
 }
--
2.50.1
* David Hildenbrand <david@redhat.com> [250827 18:06]:
> We can now safely iterate over all pages in a folio, so no need for the
> pfn_to_page().
>
> Also, as we already force the refcount in __init_single_page() to 1,
> we can just set the refcount to 0 and avoid page_ref_freeze() +
> VM_BUG_ON. Likely, in the future, we would just want to tell
> __init_single_page() to which value to initialize the refcount.
>
> Further, adjust the comments to highlight that we are dealing with an
> open-coded prep_compound_page() variant, and add another comment explaining
> why we really need the __init_single_page() only on the tail pages.
>
> Note that the current code was likely problematic, but we never ran into
> it: prep_compound_tail() would have been called with an offset that might
> exceed a memory section, and prep_compound_tail() would have simply
> added that offset to the page pointer -- which would not have done the
> right thing on sparsemem without vmemmap.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
> mm/hugetlb.c | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4a97e4f14c0dc..1f42186a85ea4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
> {
> enum zone_type zone = zone_idx(folio_zone(folio));
> int nid = folio_nid(folio);
> + struct page *page = folio_page(folio, start_page_number);
> unsigned long head_pfn = folio_pfn(folio);
> unsigned long pfn, end_pfn = head_pfn + end_page_number;
> - int ret;
> -
> - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
> - struct page *page = pfn_to_page(pfn);
>
> + /*
> + * We mark all tail pages with memblock_reserved_mark_noinit(),
> + * so these pages are completely uninitialized.
> + */
> + for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
> __init_single_page(page, pfn, zone, nid);
> prep_compound_tail((struct page *)folio, pfn - head_pfn);
> - ret = page_ref_freeze(page, 1);
> - VM_BUG_ON(!ret);
> + set_page_count(page, 0);
> }
> }
>
> @@ -3257,12 +3258,15 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
> {
> int ret;
>
> - /* Prepare folio head */
> + /*
> + * This is an open-coded prep_compound_page() whereby we avoid
> + * walking pages twice by initializing/preparing+freezing them in the
> + * same go.
> + */
> __folio_clear_reserved(folio);
> __folio_set_head(folio);
> ret = folio_ref_freeze(folio, 1);
> VM_BUG_ON(!ret);
> - /* Initialize the necessary tail struct pages */
> hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
> prep_compound_head((struct page *)folio, huge_page_order(h));
> }
> --
> 2.50.1
>
On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> We can now safely iterate over all pages in a folio, so no need for the
> pfn_to_page().
>
> Also, as we already force the refcount in __init_single_page() to 1,
Mega huge nit (ignore if you want), but maybe worth saying 'via
init_page_count()'.
> we can just set the refcount to 0 and avoid page_ref_freeze() +
> VM_BUG_ON. Likely, in the future, we would just want to tell
> __init_single_page() to which value to initialize the refcount.
Right yes :)
>
> Further, adjust the comments to highlight that we are dealing with an
> open-coded prep_compound_page() variant, and add another comment explaining
> why we really need the __init_single_page() only on the tail pages.
Ah nice another 'anchor' to grep for!
>
> Note that the current code was likely problematic, but we never ran into
> it: prep_compound_tail() would have been called with an offset that might
> exceed a memory section, and prep_compound_tail() would have simply
> added that offset to the page pointer -- which would not have done the
> right thing on sparsemem without vmemmap.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
LGTM, so:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> mm/hugetlb.c | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4a97e4f14c0dc..1f42186a85ea4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
> {
> enum zone_type zone = zone_idx(folio_zone(folio));
> int nid = folio_nid(folio);
> + struct page *page = folio_page(folio, start_page_number);
> unsigned long head_pfn = folio_pfn(folio);
> unsigned long pfn, end_pfn = head_pfn + end_page_number;
> - int ret;
> -
> - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
> - struct page *page = pfn_to_page(pfn);
>
> + /*
> + * We mark all tail pages with memblock_reserved_mark_noinit(),
> + * so these pages are completely uninitialized.
> + */
> + for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
> __init_single_page(page, pfn, zone, nid);
> prep_compound_tail((struct page *)folio, pfn - head_pfn);
> - ret = page_ref_freeze(page, 1);
> - VM_BUG_ON(!ret);
> + set_page_count(page, 0);
> }
> }
>
> @@ -3257,12 +3258,15 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
> {
> int ret;
>
> - /* Prepare folio head */
> + /*
> + * This is an open-coded prep_compound_page() whereby we avoid
> + * walking pages twice by initializing/preparing+freezing them in the
> + * same go.
> + */
> __folio_clear_reserved(folio);
> __folio_set_head(folio);
> ret = folio_ref_freeze(folio, 1);
> VM_BUG_ON(!ret);
> - /* Initialize the necessary tail struct pages */
> hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
> prep_compound_head((struct page *)folio, huge_page_order(h));
> }
> --
> 2.50.1
>
On 28.08.25 17:37, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>> We can now safely iterate over all pages in a folio, so no need for the
>> pfn_to_page().
>>
>> Also, as we already force the refcount in __init_single_page() to 1,
>
> Mega huge nit (ignore if you want), but maybe worth saying 'via
> init_page_count()'.
Will add, thanks!
--
Cheers
David / dhildenb
On Fri, Aug 29, 2025 at 01:59:19PM +0200, David Hildenbrand wrote:
> On 28.08.25 17:37, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> > > We can now safely iterate over all pages in a folio, so no need for the
> > > pfn_to_page().
> > >
> > > Also, as we already force the refcount in __init_single_page() to 1,
> >
> > Mega huge nit (ignore if you want), but maybe worth saying 'via
> > init_page_count()'.
>
> Will add, thanks!
Thanks!
>
> --
> Cheers
>
> David / dhildenb
>
>
On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> We can now safely iterate over all pages in a folio, so no need for the
> pfn_to_page().
>
> Also, as we already force the refcount in __init_single_page() to 1,
> we can just set the refcount to 0 and avoid page_ref_freeze() +
> VM_BUG_ON. Likely, in the future, we would just want to tell
> __init_single_page() to which value to initialize the refcount.
>
> Further, adjust the comments to highlight that we are dealing with an
> open-coded prep_compound_page() variant, and add another comment explaining
> why we really need the __init_single_page() only on the tail pages.
>
> Note that the current code was likely problematic, but we never ran into
> it: prep_compound_tail() would have been called with an offset that might
> exceed a memory section, and prep_compound_tail() would have simply
> added that offset to the page pointer -- which would not have done the
> right thing on sparsemem without vmemmap.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> mm/hugetlb.c | 20 ++++++++++++--------
> 1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4a97e4f14c0dc..1f42186a85ea4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
> {
> enum zone_type zone = zone_idx(folio_zone(folio));
> int nid = folio_nid(folio);
> + struct page *page = folio_page(folio, start_page_number);
> unsigned long head_pfn = folio_pfn(folio);
> unsigned long pfn, end_pfn = head_pfn + end_page_number;
> - int ret;
> -
> - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
> - struct page *page = pfn_to_page(pfn);
>
> + /*
> + * We mark all tail pages with memblock_reserved_mark_noinit(),
> + * so these pages are completely uninitialized.
^ not? ;-)
> + */
> + for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
> __init_single_page(page, pfn, zone, nid);
> prep_compound_tail((struct page *)folio, pfn - head_pfn);
> - ret = page_ref_freeze(page, 1);
> - VM_BUG_ON(!ret);
> + set_page_count(page, 0);
> }
> }
>
> @@ -3257,12 +3258,15 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
> {
> int ret;
>
> - /* Prepare folio head */
> + /*
> + * This is an open-coded prep_compound_page() whereby we avoid
> + * walking pages twice by initializing/preparing+freezing them in the
> + * same go.
> + */
> __folio_clear_reserved(folio);
> __folio_set_head(folio);
> ret = folio_ref_freeze(folio, 1);
> VM_BUG_ON(!ret);
> - /* Initialize the necessary tail struct pages */
> hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
> prep_compound_head((struct page *)folio, huge_page_order(h));
> }
> --
> 2.50.1
>
--
Sincerely yours,
Mike.
On 28.08.25 09:21, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>> We can now safely iterate over all pages in a folio, so no need for the
>> pfn_to_page().
>>
>> Also, as we already force the refcount in __init_single_page() to 1,
>> we can just set the refcount to 0 and avoid page_ref_freeze() +
>> VM_BUG_ON. Likely, in the future, we would just want to tell
>> __init_single_page() to which value to initialize the refcount.
>>
>> Further, adjust the comments to highlight that we are dealing with an
>> open-coded prep_compound_page() variant, and add another comment explaining
>> why we really need the __init_single_page() only on the tail pages.
>>
>> Note that the current code was likely problematic, but we never ran into
>> it: prep_compound_tail() would have been called with an offset that might
>> exceed a memory section, and prep_compound_tail() would have simply
>> added that offset to the page pointer -- which would not have done the
>> right thing on sparsemem without vmemmap.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> mm/hugetlb.c | 20 ++++++++++++--------
>> 1 file changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 4a97e4f14c0dc..1f42186a85ea4 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
>> {
>> enum zone_type zone = zone_idx(folio_zone(folio));
>> int nid = folio_nid(folio);
>> + struct page *page = folio_page(folio, start_page_number);
>> unsigned long head_pfn = folio_pfn(folio);
>> unsigned long pfn, end_pfn = head_pfn + end_page_number;
>> - int ret;
>> -
>> - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
>> - struct page *page = pfn_to_page(pfn);
>>
>> + /*
>> + * We mark all tail pages with memblock_reserved_mark_noinit(),
>> + * so these pages are completely uninitialized.
>
> ^ not? ;-)
Can you elaborate?
--
Cheers
David / dhildenb
On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
> On 28.08.25 09:21, Mike Rapoport wrote:
> > On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> > > We can now safely iterate over all pages in a folio, so no need for the
> > > pfn_to_page().
> > >
> > > Also, as we already force the refcount in __init_single_page() to 1,
> > > we can just set the refcount to 0 and avoid page_ref_freeze() +
> > > VM_BUG_ON. Likely, in the future, we would just want to tell
> > > __init_single_page() to which value to initialize the refcount.
> > >
> > > Further, adjust the comments to highlight that we are dealing with an
> > > open-coded prep_compound_page() variant, and add another comment explaining
> > > why we really need the __init_single_page() only on the tail pages.
> > >
> > > Note that the current code was likely problematic, but we never ran into
> > > it: prep_compound_tail() would have been called with an offset that might
> > > exceed a memory section, and prep_compound_tail() would have simply
> > > added that offset to the page pointer -- which would not have done the
> > > right thing on sparsemem without vmemmap.
> > >
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > > mm/hugetlb.c | 20 ++++++++++++--------
> > > 1 file changed, 12 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 4a97e4f14c0dc..1f42186a85ea4 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
> > > {
> > > enum zone_type zone = zone_idx(folio_zone(folio));
> > > int nid = folio_nid(folio);
> > > + struct page *page = folio_page(folio, start_page_number);
> > > unsigned long head_pfn = folio_pfn(folio);
> > > unsigned long pfn, end_pfn = head_pfn + end_page_number;
> > > - int ret;
> > > -
> > > - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
> > > - struct page *page = pfn_to_page(pfn);
> > > + /*
> > > + * We mark all tail pages with memblock_reserved_mark_noinit(),
> > > + * so these pages are completely uninitialized.
> >
> > ^ not? ;-)
>
> Can you elaborate?
Oh, sorry, I misread "uninitialized".
Still, I'd phrase it as
/*
* We marked all tail pages with memblock_reserved_mark_noinit(),
* so we must initialize them here.
*/
> --
> Cheers
>
> David / dhildenb
>
--
Sincerely yours,
Mike.
On 28.08.25 10:06, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
>> On 28.08.25 09:21, Mike Rapoport wrote:
>>> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>>>> We can now safely iterate over all pages in a folio, so no need for the
>>>> pfn_to_page().
>>>>
>>>> Also, as we already force the refcount in __init_single_page() to 1,
>>>> we can just set the refcount to 0 and avoid page_ref_freeze() +
>>>> VM_BUG_ON. Likely, in the future, we would just want to tell
>>>> __init_single_page() to which value to initialize the refcount.
>>>>
>>>> Further, adjust the comments to highlight that we are dealing with an
>>>> open-coded prep_compound_page() variant, and add another comment explaining
>>>> why we really need the __init_single_page() only on the tail pages.
>>>>
>>>> Note that the current code was likely problematic, but we never ran into
>>>> it: prep_compound_tail() would have been called with an offset that might
>>>> exceed a memory section, and prep_compound_tail() would have simply
>>>> added that offset to the page pointer -- which would not have done the
>>>> right thing on sparsemem without vmemmap.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>> mm/hugetlb.c | 20 ++++++++++++--------
>>>> 1 file changed, 12 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>> index 4a97e4f14c0dc..1f42186a85ea4 100644
>>>> --- a/mm/hugetlb.c
>>>> +++ b/mm/hugetlb.c
>>>> @@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
>>>> {
>>>> enum zone_type zone = zone_idx(folio_zone(folio));
>>>> int nid = folio_nid(folio);
>>>> + struct page *page = folio_page(folio, start_page_number);
>>>> unsigned long head_pfn = folio_pfn(folio);
>>>> unsigned long pfn, end_pfn = head_pfn + end_page_number;
>>>> - int ret;
>>>> -
>>>> - for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
>>>> - struct page *page = pfn_to_page(pfn);
>>>> + /*
>>>> + * We mark all tail pages with memblock_reserved_mark_noinit(),
>>>> + * so these pages are completely uninitialized.
>>>
>>> ^ not? ;-)
>>
>> Can you elaborate?
>
> Oh, sorry, I misread "uninitialized".
> Still, I'd phrase it as
>
> /*
> * We marked all tail pages with memblock_reserved_mark_noinit(),
> * so we must initialize them here.
> */
I prefer what I currently have, but thanks for the review.
--
Cheers
David / dhildenb
On Thu, Aug 28, 2025 at 10:18:23AM +0200, David Hildenbrand wrote:
> On 28.08.25 10:06, Mike Rapoport wrote:
> > On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
> > > On 28.08.25 09:21, Mike Rapoport wrote:
> > > > On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> > > > > + /*
> > > > > + * We mark all tail pages with memblock_reserved_mark_noinit(),
> > > > > + * so these pages are completely uninitialized.
> > > >
> > > > ^ not? ;-)
> > >
> > > Can you elaborate?
> >
> > Oh, sorry, I misread "uninitialized".
> > Still, I'd phrase it as
> >
> > /*
> > * We marked all tail pages with memblock_reserved_mark_noinit(),
> > * so we must initialize them here.
> > */
>
> I prefer what I currently have, but thanks for the review.
No strong feelings, feel free to add
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
--
Sincerely yours,
Mike.
On 28.08.25 10:37, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 10:18:23AM +0200, David Hildenbrand wrote:
>> On 28.08.25 10:06, Mike Rapoport wrote:
>>> On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
>>>> On 28.08.25 09:21, Mike Rapoport wrote:
>>>>> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>>>>>> + /*
>>>>>> + * We mark all tail pages with memblock_reserved_mark_noinit(),
>>>>>> + * so these pages are completely uninitialized.
>>>>>
>>>>> ^ not? ;-)
>>>>
>>>> Can you elaborate?
>>>
>>> Oh, sorry, I misread "uninitialized".
>>> Still, I'd phrase it as
>>>
>>> /*
>>> * We marked all tail pages with memblock_reserved_mark_noinit(),
>>> * so we must initialize them here.
>>> */
>>
>> I prefer what I currently have, but thanks for the review.
>
> No strong feelings, feel free to add
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>
I now have
"As we marked all tail pages with memblock_reserved_mark_noinit(), we
must initialize them ourselves here."
--
Cheers
David / dhildenb