All pages were already initialized and set to PageReserved() with a
refcount of 1 by MM init code.
In fact, by using __init_single_page() we would be setting the refcount to
1 again, only to freeze it immediately afterwards.
So drop the __init_single_page() and use __ClearPageReserved() instead.
Adjust the comments to highlight that we are dealing with an open-coded
prep_compound_page() variant.
Further, as we can now safely iterate over all pages in a folio, let's
avoid the page-pfn dance and just iterate the pages directly.
Note that the current code was likely problematic, but we never ran into
it: prep_compound_tail() would have been called with an offset that might
exceed a memory section, and prep_compound_tail() would have simply
added that offset to the page pointer -- which would not have done the
right thing on sparsemem without vmemmap.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
mm/hugetlb.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d12a9d5146af4..ae82a845b14ad 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3235,17 +3235,14 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
 						   unsigned long start_page_number,
 						   unsigned long end_page_number)
 {
-	enum zone_type zone = zone_idx(folio_zone(folio));
-	int nid = folio_nid(folio);
-	unsigned long head_pfn = folio_pfn(folio);
-	unsigned long pfn, end_pfn = head_pfn + end_page_number;
+	struct page *head_page = folio_page(folio, 0);
+	struct page *page = folio_page(folio, start_page_number);
+	unsigned long i;
 	int ret;
 
-	for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
-		struct page *page = pfn_to_page(pfn);
-
-		__init_single_page(page, pfn, zone, nid);
-		prep_compound_tail((struct page *)folio, pfn - head_pfn);
+	for (i = start_page_number; i < end_page_number; i++, page++) {
+		__ClearPageReserved(page);
+		prep_compound_tail(head_page, i);
 		ret = page_ref_freeze(page, 1);
 		VM_BUG_ON(!ret);
 	}
@@ -3257,12 +3254,14 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
 {
 	int ret;
 
-	/* Prepare folio head */
+	/*
+	 * This is an open-coded prep_compound_page() whereby we avoid
+	 * walking pages twice by preparing+freezing them in the same go.
+	 */
 	__folio_clear_reserved(folio);
 	__folio_set_head(folio);
 	ret = folio_ref_freeze(folio, 1);
 	VM_BUG_ON(!ret);
-	/* Initialize the necessary tail struct pages */
 	hugetlb_folio_init_tail_vmemmap(folio, 1, nr_pages);
 	prep_compound_head((struct page *)folio, huge_page_order(h));
 }
--
2.50.1
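
The sparsemem concern in the commit message, as an illustrative sketch only (not part of the patch; tail_page_of() is a made-up name, and this is roughly what the kernel's nth_page() helper has traditionally done): pointer arithmetic on struct page is only valid when the memmap is virtually contiguous, otherwise the translation has to go through the pfn.

/*
 * Illustration only: why "head + n" is not generally safe.
 * With SPARSEMEM && !SPARSEMEM_VMEMMAP, the struct pages of different
 * memory sections may live in unrelated allocations, so an offset that
 * crosses a section boundary must be translated via the pfn.
 */
static inline struct page *tail_page_of(struct page *head, unsigned long n)
{
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
	return pfn_to_page(page_to_pfn(head) + n);	/* section-safe */
#else
	return head + n;				/* memmap is contiguous */
#endif
}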
On 8/21/25 23:06, David Hildenbrand wrote:

> All pages were already initialized and set to PageReserved() with a
> refcount of 1 by MM init code.

Just to be sure, how is this working with MEMBLOCK_RSRV_NOINIT, where MM
is supposed not to initialize struct pages?

[...]

--Mika
On 22.08.25 06:09, Mika Penttilä wrote:
> On 8/21/25 23:06, David Hildenbrand wrote:
>
>> All pages were already initialized and set to PageReserved() with a
>> refcount of 1 by MM init code.
>
> Just to be sure, how is this working with MEMBLOCK_RSRV_NOINIT, where MM
> is supposed not to initialize struct pages?

Excellent point, I did not know about that one.

Spotting that we don't do the same for the head page made me assume that
it's just a misuse of __init_single_page().

But the nasty thing is that we use memblock_reserved_mark_noinit() to only
mark the tail pages ...

Let me revert back to __init_single_page() and add a big fat comment why
this is required.

Thanks!

--
Cheers

David / dhildenb
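
For reference, the hugetlb bootmem path indeed marks only the range past the head page as noinit, roughly as follows (quoted from memory from __alloc_bootmem_huge_page(), may not match the exact tree):

	/*
	 * Skip the head page so that generic MM init still prepares it
	 * (it is used to look up zone/node information later); only the
	 * tail-page range is marked noinit.
	 */
	memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE),
				      huge_page_size(h) - PAGE_SIZE);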
On Fri, Aug 22, 2025 at 08:24:31AM +0200, David Hildenbrand wrote:
> On 22.08.25 06:09, Mika Penttilä wrote:
> > Just to be sure, how is this working with MEMBLOCK_RSRV_NOINIT, where MM
> > is supposed not to initialize struct pages?
>
> Excellent point, I did not know about that one.
>
> Spotting that we don't do the same for the head page made me assume that
> it's just a misuse of __init_single_page().
>
> But the nasty thing is that we use memblock_reserved_mark_noinit() to only
> mark the tail pages ...

And even nastier thing is that when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
disabled struct pages are initialized regardless of
memblock_reserved_mark_noinit().

I think this patch should go in before your updates:

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 753f99b4c718..1c51788339a5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3230,6 +3230,22 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	return 1;
 }
 
+/*
+ * Tail pages in a huge folio allocated from memblock are marked as 'noinit',
+ * which means that when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled their
+ * struct page won't be initialized
+ */
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+static void __init hugetlb_init_tail_page(struct page *page, unsigned long pfn,
+					  enum zone_type zone, int nid)
+{
+	__init_single_page(page, pfn, zone, nid);
+}
+#else
+static inline void hugetlb_init_tail_page(struct page *page, unsigned long pfn,
+					  enum zone_type zone, int nid) {}
+#endif
+
 /* Initialize [start_page:end_page_number] tail struct pages of a hugepage */
 static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
 						   unsigned long start_page_number,
 						   unsigned long end_page_number)
@@ -3244,7 +3260,7 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
 	for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
 		struct page *page = pfn_to_page(pfn);
 
-		__init_single_page(page, pfn, zone, nid);
+		hugetlb_init_tail_page(page, pfn, zone, nid);
 		prep_compound_tail((struct page *)folio, pfn - head_pfn);
 		ret = page_ref_freeze(page, 1);
 		VM_BUG_ON(!ret);

> Let me revert back to __init_single_page() and add a big fat comment why
> this is required.
>
> Thanks!

--
Sincerely yours,
Mike.
On 23.08.25 10:59, Mike Rapoport wrote:
> On Fri, Aug 22, 2025 at 08:24:31AM +0200, David Hildenbrand wrote:
[...]
> And even nastier thing is that when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
> disabled struct pages are initialized regardless of
> memblock_reserved_mark_noinit().
>
> I think this patch should go in before your updates:

Shouldn't we fix this in memblock code?

Hacking around that in the memblock_reserved_mark_noinit() user sounds wrong
-- and nothing in the doc of memblock_reserved_mark_noinit() spells that
behavior out.

--
Cheers

David / dhildenb
On Mon, Aug 25, 2025 at 02:48:58PM +0200, David Hildenbrand wrote:
> On 23.08.25 10:59, Mike Rapoport wrote:
[...]
> Shouldn't we fix this in memblock code?
>
> Hacking around that in the memblock_reserved_mark_noinit() user sounds wrong
> -- and nothing in the doc of memblock_reserved_mark_noinit() spells that
> behavior out.

We can surely update the docs, but unfortunately I don't see how to avoid
hacking around it in hugetlb.

Since it's used to optimise HVO even further to the point hugetlb open
codes memmap initialization, I think it's fair that it should deal with all
possible configurations.

--
Sincerely yours,
Mike.
On 25.08.25 16:32, Mike Rapoport wrote:
[...]
> We can surely update the docs, but unfortunately I don't see how to avoid
> hacking around it in hugetlb.
>
> Since it's used to optimise HVO even further to the point hugetlb open
> codes memmap initialization, I think it's fair that it should deal with all
> possible configurations.

Remind me, why can't we support memblock_reserved_mark_noinit() when
CONFIG_DEFERRED_STRUCT_PAGE_INIT is disabled?

--
Cheers

David / dhildenb
On Mon, Aug 25, 2025 at 04:38:03PM +0200, David Hildenbrand wrote:
[...]
> Remind me, why can't we support memblock_reserved_mark_noinit() when
> CONFIG_DEFERRED_STRUCT_PAGE_INIT is disabled?

When CONFIG_DEFERRED_STRUCT_PAGE_INIT is disabled we initialize the entire
memmap early (setup_arch()->free_area_init()), and we may have a bunch of
memblock_reserved_mark_noinit() afterwards

--
Sincerely yours,
Mike.
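
A rough ordering sketch of the above, for orientation only (function names approximate, not literal call chains):

/*
 * Without CONFIG_DEFERRED_STRUCT_PAGE_INIT:
 *
 *   setup_arch()
 *     free_area_init()     <- the whole memmap, reserved regions included,
 *                             is initialized right here
 *   ...                    <- memblock stays alive for a while; hugetlb's
 *                             boot-time allocations run in this window and
 *                             call memblock_reserved_mark_noinit(), which is
 *                             too late to affect the already-initialized
 *                             struct pages
 *   memblock_free_all()    <- memory is handed over to the buddy allocator
 *
 * With CONFIG_DEFERRED_STRUCT_PAGE_INIT most struct pages are instead
 * initialized later, so marking a region noinit in that window still works.
 */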
On 25.08.25 16:59, Mike Rapoport wrote:
[...]
> When CONFIG_DEFERRED_STRUCT_PAGE_INIT is disabled we initialize the entire
> memmap early (setup_arch()->free_area_init()), and we may have a bunch of
> memblock_reserved_mark_noinit() afterwards

Oh, you mean that we get effective memblock modifications after already
initializing the memmap.

That sounds ... interesting :)

So yeah, we have to document this for memblock_reserved_mark_noinit().

Is it also a problem for kexec_handover?

We should do something like:

diff --git a/mm/memblock.c b/mm/memblock.c
index 154f1d73b61f2..ed4c563d72c32 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1091,13 +1091,16 @@ int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
 
 /**
  * memblock_reserved_mark_noinit - Mark a reserved memory region with flag
- * MEMBLOCK_RSRV_NOINIT which results in the struct pages not being initialized
- * for this region.
+ * MEMBLOCK_RSRV_NOINIT which allows for the "struct pages" corresponding
+ * to this region not getting initialized, because the caller will take
+ * care of it.
  * @base: the base phys addr of the region
  * @size: the size of the region
  *
- * struct pages will not be initialized for reserved memory regions marked with
- * %MEMBLOCK_RSRV_NOINIT.
+ * "struct pages" will not be initialized for reserved memory regions marked
+ * with %MEMBLOCK_RSRV_NOINIT if this function is called before initialization
+ * code runs. Without CONFIG_DEFERRED_STRUCT_PAGE_INIT, it is more likely
+ * that this function is not effective.
  *
  * Return: 0 on success, -errno on failure.
  */

Optimizing the hugetlb code could be done, but I am not sure how high
the priority is (nobody complained so far about the double init).

--
Cheers

David / dhildenb
On Mon, Aug 25, 2025 at 05:42:33PM +0200, David Hildenbrand wrote:
> On 25.08.25 16:59, Mike Rapoport wrote:
[...]
> Oh, you mean that we get effective memblock modifications after already
> initializing the memmap.
>
> That sounds ... interesting :)

It's memmap, not the free lists. Without deferred init, memblock is active
for a while after the memmap is initialized and before the memory goes to
the free lists.

> So yeah, we have to document this for memblock_reserved_mark_noinit().
>
> Is it also a problem for kexec_handover?

With KHO it's also interesting, but it does not support deferred struct
page init for now :)

> We should do something like:
[...]

I have a different version :)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index b96746376e17..d20d091c6343 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -40,8 +40,9 @@ extern unsigned long long max_possible_pfn;
  * via a driver, and never indicated in the firmware-provided memory map as
  * system RAM. This corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED in the
  * kernel resource tree.
- * @MEMBLOCK_RSRV_NOINIT: memory region for which struct pages are
- * not initialized (only for reserved regions).
+ * @MEMBLOCK_RSRV_NOINIT: memory region for which struct pages don't have
+ * PG_Reserved set and are completely not initialized when
+ * %CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled (only for reserved regions).
  * @MEMBLOCK_RSRV_KERN: memory region that is reserved for kernel use,
  * either explictitly with memblock_reserve_kern() or via memblock
  * allocation APIs. All memblock allocations set this flag.
diff --git a/mm/memblock.c b/mm/memblock.c
index 154f1d73b61f..02de5ffb085b 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1091,13 +1091,15 @@ int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
 
 /**
  * memblock_reserved_mark_noinit - Mark a reserved memory region with flag
- * MEMBLOCK_RSRV_NOINIT which results in the struct pages not being initialized
- * for this region.
+ * MEMBLOCK_RSRV_NOINIT
+ *
  * @base: the base phys addr of the region
  * @size: the size of the region
  *
- * struct pages will not be initialized for reserved memory regions marked with
- * %MEMBLOCK_RSRV_NOINIT.
+ * The struct pages for the reserved regions marked %MEMBLOCK_RSRV_NOINIT will
+ * not have %PG_Reserved flag set.
+ * When %CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, setting this flags also
+ * completly bypasses the initialization of struct pages for this region.
  *
  * Return: 0 on success, -errno on failure.
  */

> Optimizing the hugetlb code could be done, but I am not sure how high
> the priority is (nobody complained so far about the double init).

--
Sincerely yours,
Mike.
>> We should do something like:
>>
>> [...]
>
> I have a different version :)
>
> [...]
> + * The struct pages for the reserved regions marked %MEMBLOCK_RSRV_NOINIT will
> + * not have %PG_Reserved flag set.
> + * When %CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, setting this flags also
> + * completly bypasses the initialization of struct pages for this region.

s/completly/completely.

I don't quite understand the interaction with PG_Reserved and why anybody
using this function should care.

So maybe you can rephrase in a way that is easier to digest, and rather
focuses on what callers of this function are supposed to do vs. have the
liberty of not doing?

--
Cheers

David / dhildenb
On Mon, Aug 25, 2025 at 06:23:48PM +0200, David Hildenbrand wrote:
>
> I don't quite understand the interaction with PG_Reserved and why anybody
> using this function should care.
>
> So maybe you can rephrase in a way that is easier to digest, and rather
> focuses on what callers of this function are supposed to do vs. have the
> liberty of not doing?
How about
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index b96746376e17..fcda8481de9a 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -40,8 +40,9 @@ extern unsigned long long max_possible_pfn;
* via a driver, and never indicated in the firmware-provided memory map as
* system RAM. This corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED in the
* kernel resource tree.
- * @MEMBLOCK_RSRV_NOINIT: memory region for which struct pages are
- * not initialized (only for reserved regions).
+ * @MEMBLOCK_RSRV_NOINIT: reserved memory region for which struct pages are not
+ * fully initialized. Users of this flag are responsible to properly initialize
+ * struct pages of this region
* @MEMBLOCK_RSRV_KERN: memory region that is reserved for kernel use,
* either explictitly with memblock_reserve_kern() or via memblock
* allocation APIs. All memblock allocations set this flag.
diff --git a/mm/memblock.c b/mm/memblock.c
index 154f1d73b61f..46b411fb3630 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1091,13 +1091,20 @@ int __init_memblock memblock_clear_nomap(phys_addr_t base, phys_addr_t size)
/**
* memblock_reserved_mark_noinit - Mark a reserved memory region with flag
- * MEMBLOCK_RSRV_NOINIT which results in the struct pages not being initialized
- * for this region.
+ * MEMBLOCK_RSRV_NOINIT
+ *
* @base: the base phys addr of the region
* @size: the size of the region
*
- * struct pages will not be initialized for reserved memory regions marked with
- * %MEMBLOCK_RSRV_NOINIT.
+ * The struct pages for the reserved regions marked %MEMBLOCK_RSRV_NOINIT will
+ * not be fully initialized to allow the caller optimize their initialization.
+ *
+ * When %CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, setting this flag
+ * completely bypasses the initialization of struct pages for such region.
+ *
+ * When %CONFIG_DEFERRED_STRUCT_PAGE_INIT is disabled, struct pages in this
+ * region will be initialized with default values but won't be marked as
+ * reserved.
*
* Return: 0 on success, -errno on failure.
*/
> --
> Cheers
>
> David / dhildenb
>
--
Sincerely yours,
Mike.
On 25.08.25 18:58, Mike Rapoport wrote:
> On Mon, Aug 25, 2025 at 06:23:48PM +0200, David Hildenbrand wrote:
>> So maybe you can rephrase in a way that is easier to digest, and rather
>> focuses on what callers of this function are supposed to do vs. have the
>> liberty of not doing?
>
> How about
[...]
> + * When %CONFIG_DEFERRED_STRUCT_PAGE_INIT is disabled, struct pages in this
> + * region will be initialized with default values but won't be marked as
> + * reserved.

Sounds good.

I am surprised regarding "reserved", but I guess that's because we don't
end up calling "reserve_bootmem_region()" on these regions in
memmap_init_reserved_pages().

--
Cheers

David / dhildenb
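
The mechanism behind that last remark, as a simplified sketch (not verbatim from mm/memblock.c): memmap_init_reserved_pages() skips NOINIT regions when applying reserve_bootmem_region(), which is where PG_reserved would otherwise be set.

/* Simplified sketch, not the literal implementation: */
static void __init memmap_init_reserved_pages(void)
{
	struct memblock_region *region;
	phys_addr_t start, end;
	int nid;

	for_each_reserved_mem_region(region) {
		/* NOINIT regions never get PG_reserved set here */
		if (region->flags & MEMBLOCK_RSRV_NOINIT)
			continue;

		nid = memblock_get_region_node(region);
		start = region->base;
		end = start + region->size;
		reserve_bootmem_region(start, end, nid);
	}
}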