We can now safely iterate over all pages in a folio, so no need for the
pfn_to_page().

Also, as we already force the refcount in __init_single_page() to 1,
we can just set the refcount to 0 and avoid page_ref_freeze() +
VM_BUG_ON. Likely, in the future, we would just want to tell
__init_single_page() to which value to initialize the refcount.

Further, adjust the comments to highlight that we are dealing with an
open-coded prep_compound_page() variant, and add another comment explaining
why we really need the __init_single_page() only on the tail pages.

Note that the current code was likely problematic, but we never ran into
it: prep_compound_tail() would have been called with an offset that might
exceed a memory section, and prep_compound_tail() would have simply
added that offset to the page pointer -- which would not have done the
right thing on sparsemem without vmemmap.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
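Not for the commit log: a userspace toy sketch of why "page pointer +
offset" can go wrong once the offset crosses a memory section on sparsemem
without vmemmap. Everything in it (toy_page, toy_pfn_to_page(),
toy_head_plus_offset(), the two-section layout with a hole) is a
hypothetical stand-in made up for illustration, not the kernel's
implementation.

```c
#include <assert.h>

/* Hypothetical toy model for illustration only; none of this is kernel code. */
#define PAGES_PER_SECTION	4UL
#define NR_SECTIONS		2UL
#define POISON_PFN		(~0UL)

struct toy_page {
	unsigned long pfn;
};

/*
 * One backing array with a hole between the two per-section memmaps, to
 * mimic memmaps that are not virtually contiguous across sections.
 */
static struct toy_page backing[3 * PAGES_PER_SECTION];
static struct toy_page *section_memmap[NR_SECTIONS];

/* Section-aware lookup: pick the section's memmap first, then index into it. */
static struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
	assert(pfn < NR_SECTIONS * PAGES_PER_SECTION);
	return &section_memmap[pfn / PAGES_PER_SECTION][pfn % PAGES_PER_SECTION];
}

/* Poison everything, place the two memmaps, then record each page's pfn. */
static void toy_memmap_init(void)
{
	unsigned long i, pfn;

	for (i = 0; i < 3 * PAGES_PER_SECTION; i++)
		backing[i].pfn = POISON_PFN;

	section_memmap[0] = &backing[0];
	section_memmap[1] = &backing[2 * PAGES_PER_SECTION];

	for (pfn = 0; pfn < NR_SECTIONS * PAGES_PER_SECTION; pfn++)
		toy_pfn_to_page(pfn)->pfn = pfn;
}

/* Plain pointer arithmetic from the head page, ignoring section boundaries. */
static struct toy_page *toy_head_plus_offset(struct toy_page *head,
					     unsigned long offset)
{
	return head + offset;
}
```

In this model, after toy_memmap_init(), an offset that stays inside the
first section gives the same page either way, while an offset into the
second section lands in the hole when computed as head + offset and only
the section-aware lookup finds the right entry.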
mm/hugetlb.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
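
Also not for the commit log: a minimal userspace sketch of the refcount
reasoning, assuming a freeze behaves like a compare-and-exchange from the
expected count to 0. The names (toy_refpage, toy_init_single_page(),
toy_ref_freeze(), toy_set_count()) are hypothetical and only model the
idea with C11 atomics; they are not the kernel helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the refcount field of a struct page. */
struct toy_refpage {
	atomic_int refcount;
};

/* Models init: a freshly initialized page starts with a refcount of 1. */
static void toy_init_single_page(struct toy_refpage *page)
{
	atomic_init(&page->refcount, 1);
}

/* Models a freeze: succeeds only if the count equals @expected, then sets 0. */
static bool toy_ref_freeze(struct toy_refpage *page, int expected)
{
	return atomic_compare_exchange_strong(&page->refcount, &expected, 0);
}

/* Models an unconditional store of the refcount. */
static void toy_set_count(struct toy_refpage *page, int value)
{
	atomic_store(&page->refcount, value);
}

static int toy_count(struct toy_refpage *page)
{
	return atomic_load(&page->refcount);
}
```

When the page was initialized a moment ago and nobody else can hold a
reference yet, the freeze cannot fail, so the unconditional store ends up
in the same state without the return value check.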
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4a97e4f14c0dc..1f42186a85ea4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3237,17 +3237,18 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
 {
 	enum zone_type zone = zone_idx(folio_zone(folio));
 	int nid = folio_nid(folio);
+	struct page *page = folio_page(folio, start_page_number);
 	unsigned long head_pfn = folio_pfn(folio);
 	unsigned long pfn, end_pfn = head_pfn + end_page_number;
-	int ret;
-
-	for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
-		struct page *page = pfn_to_page(pfn);
 
+	/*
+	 * We mark all tail pages with memblock_reserved_mark_noinit(),
+	 * so these pages are completely uninitialized.
+	 */
+	for (pfn = head_pfn + start_page_number; pfn < end_pfn; page++, pfn++) {
 		__init_single_page(page, pfn, zone, nid);
 		prep_compound_tail((struct page *)folio, pfn - head_pfn);
-		ret = page_ref_freeze(page, 1);
-		VM_BUG_ON(!ret);
+		set_page_count(page, 0);
 	}
 }
 
@@ -3257,12 +3258,15 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
 {
 	int ret;
 
-	/* Prepare folio head */
+	/*
+	 * This is an open-coded prep_compound_page() whereby we avoid
+	 * walking pages twice by initializing/preparing+freezing them in the
+	 * same go.
+	 */
 	__folio_clear_reserved(folio);
 	__folio_set_head(folio);
 	ret = folio_ref_freeze(folio, 1);
 	VM_BUG_ON(!ret);
-	/* Initialize the necessary tail struct pages */
 	hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
 	prep_compound_head((struct page *)folio, huge_page_order(h));
 }
--
2.50.1

* David Hildenbrand <david@redhat.com> [250827 18:06]:
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>

On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> We can now safely iterate over all pages in a folio, so no need for the
> pfn_to_page().
>
> Also, as we already force the refcount in __init_single_page() to 1,

Mega huge nit (ignore if you want), but maybe worth saying 'via
init_page_count()'.

> we can just set the refcount to 0 and avoid page_ref_freeze() +
> VM_BUG_ON. Likely, in the future, we would just want to tell
> __init_single_page() to which value to initialize the refcount.

Right yes :)

>
> Further, adjust the comments to highlight that we are dealing with an
> open-coded prep_compound_page() variant, and add another comment explaining
> why we really need the __init_single_page() only on the tail pages.

Ah nice another 'anchor' to grep for!

>
> Note that the current code was likely problematic, but we never ran into
> it: prep_compound_tail() would have been called with an offset that might
> exceed a memory section, and prep_compound_tail() would have simply
> added that offset to the page pointer -- which would not have done the
> right thing on sparsemem without vmemmap.
>
> Signed-off-by: David Hildenbrand <david@redhat.com>

LGTM, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

On 28.08.25 17:37, Lorenzo Stoakes wrote:
> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>> We can now safely iterate over all pages in a folio, so no need for the
>> pfn_to_page().
>>
>> Also, as we already force the refcount in __init_single_page() to 1,
>
> Mega huge nit (ignore if you want), but maybe worth saying 'via
> init_page_count()'.

Will add, thanks!

--
Cheers

David / dhildenb

On Fri, Aug 29, 2025 at 01:59:19PM +0200, David Hildenbrand wrote:
> On 28.08.25 17:37, Lorenzo Stoakes wrote:
> > On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> > > We can now safely iterate over all pages in a folio, so no need for the
> > > pfn_to_page().
> > >
> > > Also, as we already force the refcount in __init_single_page() to 1,
> >
> > Mega huge nit (ignore if you want), but maybe worth saying 'via
> > init_page_count()'.
>
> Will add, thanks!

Thanks!

>
> --
> Cheers
>
> David / dhildenb
>

On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> +	/*
> +	 * We mark all tail pages with memblock_reserved_mark_noinit(),
> +	 * so these pages are completely uninitialized.

                                ^ not? ;-)

> +	 */

--
Sincerely yours,
Mike.

On 28.08.25 09:21, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>> +	/*
>> +	 * We mark all tail pages with memblock_reserved_mark_noinit(),
>> +	 * so these pages are completely uninitialized.
>
>                                ^ not? ;-)

Can you elaborate?

--
Cheers

David / dhildenb

On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
> On 28.08.25 09:21, Mike Rapoport wrote:
> > On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> > > +	/*
> > > +	 * We mark all tail pages with memblock_reserved_mark_noinit(),
> > > +	 * so these pages are completely uninitialized.
> >
> >                                ^ not? ;-)
>
> Can you elaborate?

Oh, sorry, I misread "uninitialized".
Still, I'd phrase it as

	/*
	 * We marked all tail pages with memblock_reserved_mark_noinit(),
	 * so we must initialize them here.
	 */

> --
> Cheers
>
> David / dhildenb
>

--
Sincerely yours,
Mike.

On 28.08.25 10:06, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
>> On 28.08.25 09:21, Mike Rapoport wrote:
>>> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>>>> +	/*
>>>> +	 * We mark all tail pages with memblock_reserved_mark_noinit(),
>>>> +	 * so these pages are completely uninitialized.
>>>
>>>                                ^ not? ;-)
>>
>> Can you elaborate?
>
> Oh, sorry, I misread "uninitialized".
> Still, I'd phrase it as
>
> 	/*
> 	 * We marked all tail pages with memblock_reserved_mark_noinit(),
> 	 * so we must initialize them here.
> 	 */

I prefer what I currently have, but thanks for the review.

--
Cheers

David / dhildenb

On Thu, Aug 28, 2025 at 10:18:23AM +0200, David Hildenbrand wrote:
> On 28.08.25 10:06, Mike Rapoport wrote:
> > On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
> > > On 28.08.25 09:21, Mike Rapoport wrote:
> > > > On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
> > > > > +	/*
> > > > > +	 * We mark all tail pages with memblock_reserved_mark_noinit(),
> > > > > +	 * so these pages are completely uninitialized.
> > > >
> > > >                                ^ not? ;-)
> > >
> > > Can you elaborate?
> >
> > Oh, sorry, I misread "uninitialized".
> > Still, I'd phrase it as
> >
> > 	/*
> > 	 * We marked all tail pages with memblock_reserved_mark_noinit(),
> > 	 * so we must initialize them here.
> > 	 */
>
> I prefer what I currently have, but thanks for the review.

No strong feelings, feel free to add

Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

--
Sincerely yours,
Mike.

On 28.08.25 10:37, Mike Rapoport wrote:
> On Thu, Aug 28, 2025 at 10:18:23AM +0200, David Hildenbrand wrote:
>> On 28.08.25 10:06, Mike Rapoport wrote:
>>> On Thu, Aug 28, 2025 at 09:44:27AM +0200, David Hildenbrand wrote:
>>>> On 28.08.25 09:21, Mike Rapoport wrote:
>>>>> On Thu, Aug 28, 2025 at 12:01:17AM +0200, David Hildenbrand wrote:
>>>>>> +	/*
>>>>>> +	 * We mark all tail pages with memblock_reserved_mark_noinit(),
>>>>>> +	 * so these pages are completely uninitialized.
>>>>>
>>>>>                                ^ not? ;-)
>>>>
>>>> Can you elaborate?
>>>
>>> Oh, sorry, I misread "uninitialized".
>>> Still, I'd phrase it as
>>>
>>> 	/*
>>> 	 * We marked all tail pages with memblock_reserved_mark_noinit(),
>>> 	 * so we must initialize them here.
>>> 	 */
>>
>> I prefer what I currently have, but thanks for the review.
>
> No strong feelings, feel free to add
>
> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>

I now have

	"As we marked all tail pages with memblock_reserved_mark_noinit(),
	we must initialize them ourselves here."

--
Cheers

David / dhildenb