[PATCHv2 02/14] mm/sparse: Check memmap alignment

Kiryl Shutsemau posted 14 patches 1 month, 3 weeks ago
Posted by Kiryl Shutsemau 1 month, 3 weeks ago
The upcoming changes in compound_head() require memmap to be naturally
aligned to the maximum folio size.

Add a warning if it is not.

A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the
kernel is still likely to be functional if this strict check fails.

Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
---
 include/linux/mmzone.h | 1 +
 mm/sparse.c            | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6cfede39570a..9f44dc760cdc 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -91,6 +91,7 @@
 #endif
 
 #define MAX_FOLIO_NR_PAGES	(1UL << MAX_FOLIO_ORDER)
+#define MAX_FOLIO_SIZE		(PAGE_SIZE << MAX_FOLIO_ORDER)
 
 enum migratetype {
 	MIGRATE_UNMOVABLE,
diff --git a/mm/sparse.c b/mm/sparse.c
index 17c50a6415c2..c5810ff7c6f7 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -600,6 +600,9 @@ void __init sparse_init(void)
 	BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
 	memblocks_present();
 
+	WARN_ON(!IS_ALIGNED((unsigned long)pfn_to_page(0),
+			    MAX_FOLIO_SIZE / sizeof(struct page)));
+
 	pnum_begin = first_present_section_nr();
 	nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
 
-- 
2.51.2
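To make the new check concrete, here is a minimal userspace sketch of the arithmetic behind the WARN_ON(). The EX_* constants (4 KiB pages, a 64-byte struct page, order 22 for the 16 GB maximum folio discussed later in the thread) and the ex_* helper names are assumptions for illustration only; the real values depend on architecture and configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed example values, not taken from the patch. */
#define EX_PAGE_SIZE		4096ULL
#define EX_STRUCT_PAGE_SIZE	64ULL
#define EX_MAX_FOLIO_ORDER	22
#define EX_MAX_FOLIO_SIZE	(EX_PAGE_SIZE << EX_MAX_FOLIO_ORDER)

/* Same test as IS_ALIGNED() for a power-of-two alignment. */
static bool ex_is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

/* Mirrors the expression inside the WARN_ON(): true means no warning. */
static bool ex_memmap_base_ok(uint64_t memmap_base)
{
	return ex_is_aligned(memmap_base,
			     EX_MAX_FOLIO_SIZE / EX_STRUCT_PAGE_SIZE);
}
```

Under these assumptions the required alignment works out to 256 MiB, so a memmap base that is only 2 MiB aligned would trip the warning.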
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Muchun Song 1 month, 2 weeks ago

On 2025/12/18 23:09, Kiryl Shutsemau wrote:
> The upcoming changes in compound_head() require memmap to be naturally
> aligned to the maximum folio size.
>
> Add a warning if it is not.
>
> A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the
> kernel is still likely to be functional if this strict check fails.

Different architectures default to 2 MB alignment (mainly to
enable huge mappings), which only accommodates folios up to
128 MB. Yet 1 GB huge pages are still fairly common, so
validating 16 GB (MAX_FOLIO_SIZE) alignment seems likely to
miss the most frequent case.

I’m concerned that this might plant a hidden time bomb: it
could detonate at any moment in later code, silently triggering
memory corruption or similar failures. Therefore, I don’t
think a WARNING is a good choice.
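The 2 MB to 128 MB figure above can be reproduced with a short calculation. This is a sketch assuming 4 KiB pages and a 64-byte struct page (neither is stated in the mail); ex_max_folio_bytes() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest folio, in bytes, whose struct pages fit in one naturally
 * aligned block when the memmap base is aligned to memmap_align bytes.
 */
static uint64_t ex_max_folio_bytes(uint64_t memmap_align,
				   uint64_t page_size,
				   uint64_t struct_page_size)
{
	assert(struct_page_size != 0);
	assert(memmap_align % struct_page_size == 0);

	/* Number of struct pages in the aligned block, times page size. */
	return memmap_align / struct_page_size * page_size;
}
```

With those inputs, a 2 MiB aligned memmap covers folios up to 128 MiB, which is below the 1 GiB huge page size, while a 16 GiB folio would need 256 MiB alignment.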

>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> ---
>   include/linux/mmzone.h | 1 +
>   mm/sparse.c            | 3 +++
>   2 files changed, 4 insertions(+)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6cfede39570a..9f44dc760cdc 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -91,6 +91,7 @@
>   #endif
>   
>   #define MAX_FOLIO_NR_PAGES	(1UL << MAX_FOLIO_ORDER)
> +#define MAX_FOLIO_SIZE		(PAGE_SIZE << MAX_FOLIO_ORDER)
>   
>   enum migratetype {
>   	MIGRATE_UNMOVABLE,
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 17c50a6415c2..c5810ff7c6f7 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -600,6 +600,9 @@ void __init sparse_init(void)
>   	BUILD_BUG_ON(!is_power_of_2(sizeof(struct mem_section)));
>   	memblocks_present();
>   
> +	WARN_ON(!IS_ALIGNED((unsigned long)pfn_to_page(0),
> +			    MAX_FOLIO_SIZE / sizeof(struct page)));
> +
>   	pnum_begin = first_present_section_nr();
>   	nid_begin = sparse_early_nid(__nr_to_section(pnum_begin));
>   

Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Kiryl Shutsemau 1 month, 2 weeks ago
On Mon, Dec 22, 2025 at 04:34:40PM +0800, Muchun Song wrote:
> 
> 
> On 2025/12/18 23:09, Kiryl Shutsemau wrote:
> > The upcoming changes in compound_head() require memmap to be naturally
> > aligned to the maximum folio size.
> > 
> > Add a warning if it is not.
> > 
> > A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the
> > kernel is still likely to be functional if this strict check fails.
> 
> Different architectures default to 2 MB alignment (mainly to
> enable huge mappings), which only accommodates folios up to
> 128 MB. Yet 1 GB huge pages are still fairly common, so
> validating 16 GB (MAX_FOLIO_SIZE) alignment seems likely to
> miss the most frequent case.

I don't follow. A 16 GB check is stricter than anything smaller.
How can it miss the most frequent case?

> I’m concerned that this might plant a hidden time bomb: it
> could detonate at any moment in later code, silently triggering
> memory corruption or similar failures. Therefore, I don’t
> think a WARNING is a good choice.

We can upgrade it to BUG_ON(), but I want to understand your logic here
first.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by David Hildenbrand (Red Hat) 1 month, 2 weeks ago
On 12/22/25 15:02, Kiryl Shutsemau wrote:
> On Mon, Dec 22, 2025 at 04:34:40PM +0800, Muchun Song wrote:
>>
>>
>> On 2025/12/18 23:09, Kiryl Shutsemau wrote:
>>> The upcoming changes in compound_head() require memmap to be naturally
>>> aligned to the maximum folio size.
>>>
>>> Add a warning if it is not.
>>>
>>> A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the
>>> kernel is still likely to be functional if this strict check fails.
>>
>> Different architectures default to 2 MB alignment (mainly to
>> enable huge mappings), which only accommodates folios up to
>> 128 MB. Yet 1 GB huge pages are still fairly common, so
>> validating 16 GB (MAX_FOLIO_SIZE) alignment seems likely to
>> miss the most frequent case.
> 
> I don't follow. A 16 GB check is stricter than anything smaller.
> How can it miss the most frequent case?
> 
>> I’m concerned that this might plant a hidden time bomb: it
>> could detonate at any moment in later code, silently triggering
>> memory corruption or similar failures. Therefore, I don’t
>> think a WARNING is a good choice.
> 
> We can upgrade it to BUG_ON(), but I want to understand your logic here
> first.

Definitely no BUG_ON(). I would assume this is something we would find 
early during testing, so even a VM_WARN_ON_ONCE() should be good enough?

This smells like a possible problem, though, as soon as some 
architecture wants to increase the folio size. What would be the 
expected step to ensure the alignment is done properly?

But OTOH, as I raised, Willy's work will make all of that here obsolete
either way, so maybe not worth worrying about that case too much.

-- 
Cheers

David
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Kiryl Shutsemau 1 month, 2 weeks ago
On Mon, Dec 22, 2025 at 03:18:29PM +0100, David Hildenbrand (Red Hat) wrote:
> On 12/22/25 15:02, Kiryl Shutsemau wrote:
> > On Mon, Dec 22, 2025 at 04:34:40PM +0800, Muchun Song wrote:
> > > 
> > > 
> > > On 2025/12/18 23:09, Kiryl Shutsemau wrote:
> > > > The upcoming changes in compound_head() require memmap to be naturally
> > > > aligned to the maximum folio size.
> > > > 
> > > > Add a warning if it is not.
> > > > 
> > > > A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the
> > > > kernel is still likely to be functional if this strict check fails.
> > > 
> > > Different architectures default to 2 MB alignment (mainly to
> > > enable huge mappings), which only accommodates folios up to
> > > 128 MB. Yet 1 GB huge pages are still fairly common, so
> > > validating 16 GB (MAX_FOLIO_SIZE) alignment seems likely to
> > > miss the most frequent case.
> > 
> > I don't follow. A 16 GB check is stricter than anything smaller.
> > How can it miss the most frequent case?
> > 
> > > I’m concerned that this might plant a hidden time bomb: it
> > > could detonate at any moment in later code, silently triggering
> > > memory corruption or similar failures. Therefore, I don’t
> > > think a WARNING is a good choice.
> > 
> > We can upgrade it to BUG_ON(), but I want to understand your logic here
> > first.
> 
> Definitely no BUG_ON(). I would assume this is something we would find early
> during testing, so even a VM_WARN_ON_ONCE() should be good enough?
> 
> This smells like a possible problem, though, as soon as some architecture
> wants to increase the folio size. What would be the expected step to ensure
> the alignment is done properly?

It depends on memory model and whether the arch has KASLR for memmap.

> But OTOH, as I raised, Willy's work will make all of that here obsolete
> either way, so maybe not worth worrying about that case too much.

Willy, what is the timeline here?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Muchun Song 1 month, 2 weeks ago

> On Dec 22, 2025, at 22:52, Kiryl Shutsemau <kas@kernel.org> wrote:
> 
> On Mon, Dec 22, 2025 at 03:18:29PM +0100, David Hildenbrand (Red Hat) wrote:
>>> On 12/22/25 15:02, Kiryl Shutsemau wrote:
>>> On Mon, Dec 22, 2025 at 04:34:40PM +0800, Muchun Song wrote:
>>>> 
>>>> 
>>>> On 2025/12/18 23:09, Kiryl Shutsemau wrote:
>>>>> The upcoming changes in compound_head() require memmap to be naturally
>>>>> aligned to the maximum folio size.
>>>>> 
>>>>> Add a warning if it is not.
>>>>> 
>>>>> A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the
>>>>> kernel is still likely to be functional if this strict check fails.
>>>> 
>>>> Different architectures default to 2 MB alignment (mainly to
>>>> enable huge mappings), which only accommodates folios up to
>>>> 128 MB. Yet 1 GB huge pages are still fairly common, so
>>>> validating 16 GB (MAX_FOLIO_SIZE) alignment seems likely to
>>>> miss the most frequent case.
>>> 
>>> I don't follow. A 16 GB check is stricter than anything smaller.
>>> How can it miss the most frequent case?
>>> 
>>>> I’m concerned that this might plant a hidden time bomb: it
>>>> could detonate at any moment in later code, silently triggering
>>>> memory corruption or similar failures. Therefore, I don’t
>>>> think a WARNING is a good choice.
>>> 
>>> We can upgrade it to BUG_ON(), but I want to understand your logic here
>>> first.
>> 
>> Definitely no BUG_ON(). I would assume this is something we would find early
>> during testing, so even a VM_WARN_ON_ONCE() should be good enough?
>> 
>> This smells like a possible problem, though, as soon as some architecture
>> wants to increase the folio size. What would be the expected step to ensure
>> the alignment is done properly?
> 
> It depends on memory model and whether the arch has KASLR for memmap.

Yes. Theoretically, the most correct approach is
to ensure that the randomly chosen offset at the
KASLR relocation site meets alignment
requirements, and it likely needs to be adapted
for each architecture—sounds rather tedious.
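As a rough illustration of what "the randomly chosen offset meets alignment requirements" could look like, here is a minimal sketch that rounds a random offset down to the required alignment. The function names, the parameters and the idea of a single region start plus offset are all assumptions; real architectures randomize the memmap location in their own ways.

```c
#include <assert.h>
#include <stdint.h>

/* Round value down to a power-of-two alignment. */
static uint64_t ex_align_down(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return value & ~(align - 1);
}

/*
 * Hypothetical placement helper: region_start is assumed to be aligned
 * already, so only the random part has to be rounded.
 */
static uint64_t ex_randomized_memmap_base(uint64_t region_start,
					  uint64_t random_offset,
					  uint64_t required_align)
{
	assert(ex_align_down(region_start, required_align) == region_start);
	return region_start + ex_align_down(random_offset, required_align);
}
```

Rounding down trades away the low bits of the random offset, so a larger required alignment leaves fewer possible positions inside the same region.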

> 
>> But OTOH, as I raised, Willy's work will make all of that here obsolete
>> either way, so maybe not worth worrying about that case too much.
> 
> Willy, what is the timeline here?
> 
> --
>  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Muchun Song 1 month, 2 weeks ago

> On Dec 22, 2025, at 22:18, David Hildenbrand (Red Hat) <david@kernel.org> wrote:
> 
> On 12/22/25 15:02, Kiryl Shutsemau wrote:
>>> On Mon, Dec 22, 2025 at 04:34:40PM +0800, Muchun Song wrote:
>>> 
>>> 
>>> On 2025/12/18 23:09, Kiryl Shutsemau wrote:
>>>> The upcoming changes in compound_head() require memmap to be naturally
>>>> aligned to the maximum folio size.
>>>> 
>>>> Add a warning if it is not.
>>>> 
>>>> A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the
>>>> kernel is still likely to be functional if this strict check fails.
>>> 
>>> Different architectures default to 2 MB alignment (mainly to
>>> enable huge mappings), which only accommodates folios up to
>>> 128 MB. Yet 1 GB huge pages are still fairly common, so
>>> validating 16 GB (MAX_FOLIO_SIZE) alignment seems likely to
>>> miss the most frequent case.
>> I don't follow. A 16 GB check is stricter than anything smaller.
>> How can it miss the most frequent case?
>>> I’m concerned that this might plant a hidden time bomb: it
>>> could detonate at any moment in later code, silently triggering
>>> memory corruption or similar failures. Therefore, I don’t
>>> think a WARNING is a good choice.
>> We can upgrade it to BUG_ON(), but I want to understand your logic here
>> first.
> 
> Definitely no BUG_ON(). I would assume this is something we would find early during testing, so even a VM_WARN_ON_ONCE() should be good enough?
> 
> This smells like a possible problem, though, as soon as some architecture wants to increase the folio size. What would be the expected step to ensure the alignment is done properly?
> 
> But OTOH, as I raised, Willy's work will make all of that here obsolete either way, so maybe not worth worrying about that case too much.

Hi David,

I hope you're doing well. I must admit I have limited knowledge of Willy's work, and I was wondering if you might be kind enough to share any publicly available links where I could learn more about the future direction of this project. I would be truly grateful for your guidance.
Thank you very much in advance.

Best regards,

> 
> --
> Cheers
> 
> David
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by David Hildenbrand (Red Hat) 1 month, 2 weeks ago
On 12/22/25 15:55, Muchun Song wrote:
> 
> 
>> On Dec 22, 2025, at 22:18, David Hildenbrand (Red Hat) <david@kernel.org> wrote:
>>
>> On 12/22/25 15:02, Kiryl Shutsemau wrote:
>>>> On Mon, Dec 22, 2025 at 04:34:40PM +0800, Muchun Song wrote:
>>>>
>>>>
>>>> On 2025/12/18 23:09, Kiryl Shutsemau wrote:
>>>>> The upcoming changes in compound_head() require memmap to be naturally
>>>>> aligned to the maximum folio size.
>>>>>
>>>>> Add a warning if it is not.
>>>>>
>>>>> A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the
>>>>> kernel is still likely to be functional if this strict check fails.
>>>>
>>>> Different architectures default to 2 MB alignment (mainly to
>>>> enable huge mappings), which only accommodates folios up to
>>>> 128 MB. Yet 1 GB huge pages are still fairly common, so
>>>> validating 16 GB (MAX_FOLIO_SIZE) alignment seems likely to
>>>> miss the most frequent case.
>>> I don't follow. A 16 GB check is stricter than anything smaller.
>>> How can it miss the most frequent case?
>>>> I’m concerned that this might plant a hidden time bomb: it
>>>> could detonate at any moment in later code, silently triggering
>>>> memory corruption or similar failures. Therefore, I don’t
>>>> think a WARNING is a good choice.
>>> We can upgrade it to BUG_ON(), but I want to understand your logic here
>>> first.
>>
>> Definitely no BUG_ON(). I would assume this is something we would find early during testing, so even a VM_WARN_ON_ONCE() should be good enough?
>>
>> This smells like a possible problem, though, as soon as some architecture wants to increase the folio size. What would be the expected step to ensure the alignment is done properly?
>>
>> But OTOH, as I raised, Willy's work will make all of that here obsolete either way, so maybe not worth worrying about that case too much.
> 
> Hi David,
> 

Hi! :)

> I hope you're doing well. I must admit I have limited knowledge of Willy's work, and I was wondering if you might be kind enough to share any publicly available links where I could learn more about the future direction of this project. I would be truly grateful for your guidance.
> Thank you very much in advance.

There is some information to be had at [1], but more at [2]. Take a look 
at [2] in "After those projects are complete - Then we can shrink struct 
page to 32 bytes:"

In essence, all pages (belonging to a memdesc) will have a "memdesc" 
pointer (that replaces the compound_head pointer).

"Then we make page->compound_head point to the dynamically allocated 
memdesc rather than the first page. Then we can transition to the above 
layout. "

The "memdesc" could be a pointer to a "struct folio" that is allocated 
from the slab.

So in the new memdesc world, all pages part of a folio will point at the 
allocated "struct folio", not the head page where "struct folio" 
currently overlays "struct page".
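A toy userspace model may help picture this. It is only a sketch of the idea as described in this mail: the md_* names and the tiny structures are made up for illustration and do not match the kernel's struct page or struct folio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; the real kernel structures are far larger. */
struct md_folio {
	unsigned long flags;
	unsigned int order;
};

struct md_page {
	struct md_folio *memdesc;	/* every page of the folio points here */
};

/* Allocate a separate descriptor and point all 2^order pages at it. */
static struct md_folio *md_folio_attach(struct md_page *pages,
					unsigned int order)
{
	struct md_folio *folio;
	size_t i;

	assert(pages != NULL);
	folio = calloc(1, sizeof(*folio));
	if (!folio)
		return NULL;

	folio->order = order;
	for (i = 0; i < ((size_t)1 << order); i++)
		pages[i].memdesc = folio;
	return folio;
}

/* With a direct pointer, first and later pages resolve identically. */
static struct md_folio *md_page_folio(const struct md_page *page)
{
	return page->memdesc;
}
```

In this model no page is special: the lookup never needs to know which page of the folio it started from.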

That would mean that the proposal in this patch set will have to be 
reverted again.


At LPC, Willy said that he wants to have something out there in the 
first half of 2026.

[1] https://kernelnewbies.org/MatthewWilcox/Memdescs
[2] https://kernelnewbies.org/MatthewWilcox/Memdescs/Path

-- 
Cheers

David
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Kiryl Shutsemau 1 month, 2 weeks ago
On Tue, Dec 23, 2025 at 10:38:26AM +0100, David Hildenbrand (Red Hat) wrote:
> On 12/22/25 15:55, Muchun Song wrote:
> > 
> > 
> > > On Dec 22, 2025, at 22:18, David Hildenbrand (Red Hat) <david@kernel.org> wrote:
> > > 
> > > On 12/22/25 15:02, Kiryl Shutsemau wrote:
> > > > > On Mon, Dec 22, 2025 at 04:34:40PM +0800, Muchun Song wrote:
> > > > > 
> > > > > 
> > > > > On 2025/12/18 23:09, Kiryl Shutsemau wrote:
> > > > > > The upcoming changes in compound_head() require memmap to be naturally
> > > > > > aligned to the maximum folio size.
> > > > > > 
> > > > > > Add a warning if it is not.
> > > > > > 
> > > > > > A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the
> > > > > > kernel is still likely to be functional if this strict check fails.
> > > > > 
> > > > > Different architectures default to 2 MB alignment (mainly to
> > > > > enable huge mappings), which only accommodates folios up to
> > > > > 128 MB. Yet 1 GB huge pages are still fairly common, so
> > > > > validating 16 GB (MAX_FOLIO_SIZE) alignment seems likely to
> > > > > miss the most frequent case.
> > > > I don't follow. A 16 GB check is stricter than anything smaller.
> > > > How can it miss the most frequent case?
> > > > > I’m concerned that this might plant a hidden time bomb: it
> > > > > could detonate at any moment in later code, silently triggering
> > > > > memory corruption or similar failures. Therefore, I don’t
> > > > > think a WARNING is a good choice.
> > > > We can upgrade it to BUG_ON(), but I want to understand your logic here
> > > > first.
> > > 
> > > Definitely no BUG_ON(). I would assume this is something we would find early during testing, so even a VM_WARN_ON_ONCE() should be good enough?
> > > 
> > > This smells like a possible problem, though, as soon as some architecture wants to increase the folio size. What would be the expected step to ensure the alignment is done properly?
> > > 
> > > But OTOH, as I raised, Willy's work will make all of that here obsolete either way, so maybe not worth worrying about that case too much.
> > 
> > Hi David,
> > 
> 
> Hi! :)
> 
> > I hope you're doing well. I must admit I have limited knowledge of Willy's work, and I was wondering if you might be kind enough to share any publicly available links where I could learn more about the future direction of this project. I would be truly grateful for your guidance.
> > Thank you very much in advance.
> 
> There is some information to be had at [1], but more at [2]. Take a look at
> [2] in "After those projects are complete - Then we can shrink struct page
> to 32 bytes:"
> 
> In essence, all pages (belonging to a memdesc) will have a "memdesc" pointer
> (that replaces the compound_head pointer).
> 
> "Then we make page->compound_head point to the dynamically allocated memdesc
> rather than the first page. Then we can transition to the above layout. "

I am not sure I understand how it is going to work.

32-byte layout indicates that flags will stay in the statically
allocated part, but most (all?) flags are in the head page and we would
need a way to redirect from tail to head in the statically allocated
pages.

> The "memdesc" could be a pointer to a "struct folio" that is allocated from
> the slab.
> 
> So in the new memdesc world, all pages part of a folio will point at the
> allocated "struct folio", not the head page where "struct folio" currently
> overlays "struct page".
> 
> That would mean that the proposal in this patch set will have to be reverted
> again.
> 
> 
> At LPC, Willy said that he wants to have something out there in the first
> half of 2026.

Okay, seems ambitious to me.

Last time I asked, we had no idea how much performance the additional
indirection would cost us. Do we have a clue?

I like the memdesc idea, but the indirection cost has always bothered me.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by David Hildenbrand (Red Hat) 1 month ago
>> "Then we make page->compound_head point to the dynamically allocated memdesc
>> rather than the first page. Then we can transition to the above layout. "
> 

Sorry for the late reply, it's been a bit crazy over here.

> I am not sure I understand how it is going to work.
> 

I don't recall all the details that Willy shared over the last years 
while working on folios, but I will try to answer as best as I can from 
the top of my head. (there are plenty of resources on the list, on the 
web, in his presentations etc.).

> 32-byte layout indicates that flags will stay in the statically
> allocated part, but most (all?) flags are in the head page and we would
> need a way to redirect from tail to head in the statically allocated
> pages.

When working with folios we will never go through the head page flags. 
That's why Willy has incrementally converted most folio code that worked 
on pages to work on folios.

For example, PageUptodate() does a

	folio_test_uptodate(page_folio(page));

The flags in the 32-byte layout will be used by some non-folio things 
for which we won't allocate memdescs (just yet) (e.g., free pages in the 
buddy and other things that do not require a lot of metadata). Some of
these flags will be moved into the memdesc pointer in the future as the
conversion proceeds.

> 
>> The "memdesc" could be a pointer to a "struct folio" that is allocated from
>> the slab.
>>
>> So in the new memdesc world, all pages part of a folio will point at the
>> allocated "struct folio", not the head page where "struct folio" currently
>> overlays "struct page".
>>
>> That would mean that the proposal in this patch set will have to be reverted
>> again.
>>
>>
>> At LPC, Willy said that he wants to have something out there in the first
>> half of 2026.
> 
> Okay, seems ambitious to me.

When the program was called "2025" I considered it very ambitious :) Now 
I consider it ambitious. I think Willy already shared early versions of 
the "struct slab" split and the "struct ptdesc" split recently on the list.

> 
> Last time I asked, we had no idea how much performance the additional
> indirection would cost us. Do we have a clue?

I raised that in the past, and I think the answer I got was that

(a) We always had this indirection cost when going from tail page to
     head page / folio.
(b) We must convert the code to do as little page_folio() as possible.
     That's why we saw so much code conversion to stop working on pages
     and only work on folios.

There are certainly cases where we cannot currently avoid the 
indirection, like when we traverse a page table and go

	pfn -> page -> folio

and cannot simply go

	pfn -> folio

On the bright side, we'll lose the head-page checks and can simply 
dereference the pointer.
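The difference between the two lookups can be sketched in a toy model. The tail-bit encoding below is a simplified picture of how tail pages are marked today (bit 0 set in compound_head); the memdesc side is a guess at the shape described in this thread. All ch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct ch_page {
	uintptr_t compound_head;	/* tail pages: head address | 1 */
	void *memdesc;			/* memdesc layout: direct pointer */
};

/* Mark tail as belonging to head by storing the head address with bit 0 set. */
static void ch_set_tail(struct ch_page *tail, struct ch_page *head)
{
	assert(((uintptr_t)head & 1) == 0);
	tail->compound_head = (uintptr_t)head | 1;
}

/* Today-style lookup: test the tail bit, then strip it. */
static struct ch_page *ch_compound_head(struct ch_page *page)
{
	uintptr_t head = page->compound_head;

	if (head & 1)
		return (struct ch_page *)(head - 1);
	return page;
}

/* Memdesc-style lookup: no head/tail distinction, just a load. */
static void *ch_page_memdesc(const struct ch_page *page)
{
	return page->memdesc;
}
```

Both paths cost one load from the page; the memdesc variant drops the conditional, which is the "lose the head-page checks" part of the argument.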

I don't know whether Willy has more information yet, but I would assume 
that in most cases this will be similar to the performance summary in 
your cover letter: "... has shown either no change or only a slight 
improvement within the noise.", just that it will be "only a slight 
degradation within the noise". :)

We'll learn I guess, in particular which other page -> folio conversions 
cannot be optimized out by caching the folio.


For quite some time there will be a magical config option that will 
switch between both layouts. I'd assume that things will get more 
complicated if we suddenly have a "compound_head/folio" pointer and a 
"compound_info" pointer at the same time.

But it's really Willy who has the concept in mind as he is very likely 
right now busy writing some of that code.

I'm just the messenger.

:)

[I would hope that Willy could share his thoughts]

-- 
Cheers

David
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Matthew Wilcox 4 weeks ago
On Thu, Jan 08, 2026 at 12:08:35AM +0100, David Hildenbrand (Red Hat) wrote:
> > > "Then we make page->compound_head point to the dynamically allocated memdesc
> > > rather than the first page. Then we can transition to the above layout. "
> > 
> 
> Sorry for the late reply, it's been a bit crazy over here.
> 
> > I am not sure I understand how it is going to work.
> > 
> 
> I don't recall all the details that Willy shared over the last years while
> working on folios, but I will try to answer as best as I can from the top of
> my head. (there are plenty of resources on the list, on the web, in his
> presentations etc.).
> 
> > 32-byte layout indicates that flags will stay in the statically
> > allocated part, but most (all?) flags are in the head page and we would
> > need a way to redirect from tail to head in the statically allocated
> > pages.
> 
> When working with folios we will never go through the head page flags.
> That's why Willy has incrementally converted most folio code that worked on
> pages to work on folios.

A little more detail here:

 - Zone/Node/Section stay in page->flags and are replicated to
   folio->flags
 - HWPoison stays in page->flags
 - Reserved stays in page->flags
 - AnonExclusive stays in page->flags
 - Writeback/Referenced/Uptodate/Dirty/LRU/Active/Workingset/Owner1/
   Owner2/Reclaim/Swapbacked/Unevictable/Dropbehind/MLocked/Young/Idle
   all exist only in folio->flags
 - Head/Private/Private2 all go away
 - Locked & Waiters are ... complicated.  I'll elaborate if there's
   demand.
 - I haven't put any effort into analyzing the Xen flags.
 - HasHWPoisoned/LargeRmappable/PartiallyMapped all move to folio->flags

> When the program was called "2025" I considered it very ambitious :) Now I
> consider it ambitious. I think Willy already shared early versions of the
> "struct slab" split and the "struct ptdesc" split recently on the list.

ptdesc, yes.  Slab is still in progress.

> For quite some time there will be a magical config option that will switch
> between both layouts. I'd assume that things will get more complicated if we
> suddenly have a "compound_head/folio" pointer and a "compound_info" pointer
> at the same time.

What I'm hoping to get to is a point where calling compound_head() on
a page which is part of a folio is a BUG.  You should only be calling
page_folio() on a page which is part of a folio -- because there's nothing
useful to find in the head page.  So compound_head (or compound_info) can
share space with page->memdesc.  For now I've actually put page->memdesc
adjacent to page->compound_head, for no reason that I can recall.

I had thought that calling page_folio() on a page that's not part of
a folio would also be a BUG(), but now I think it's better to quietly
return NULL.  That's based on my experience working with slab and ptdesc.
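
A minimal sketch of the "quietly return NULL" behaviour, assuming each page carries some tag saying what kind of descriptor it points to. The nf_* names and the explicit type field are hypothetical; how the kernel would actually encode the type is not specified in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor kinds a page might point to. */
enum nf_type { NF_NONE, NF_FOLIO, NF_SLAB, NF_PTDESC };

struct nf_page {
	enum nf_type type;	/* assumed tag for what memdesc points to */
	void *memdesc;
};

/* Return the folio for folio pages, NULL for everything else. */
static void *nf_page_folio(const struct nf_page *page)
{
	assert(page != NULL);
	if (page->type != NF_FOLIO)
		return NULL;
	return page->memdesc;
}
```

Callers that may see slab or page-table pages can then test the result instead of having to check the page type beforehand.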
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by David Hildenbrand (Red Hat) 3 weeks, 4 days ago
>> When the program was called "2025" I considered it very ambitious :) Now I
>> consider it ambitious. I think Willy already shared early versions of the
>> "struct slab" split and the "struct ptdesc" split recently on the list.
> 
> ptdesc, yes.  Slab is still in progress.

Ah, I could have sworn you sent something out, but maybe these were 
preparations only. :)

> 
>> For quite some time there will be a magical config option that will switch
>> between both layouts. I'd assume that things will get more complicated if we
>> suddenly have a "compound_head/folio" pointer and a "compound_info" pointer
>> at the same time.
> 
> What I'm hoping to get to is a point where calling compound_head() on
> a page which is part of a folio is a BUG.  You should only be calling
> page_folio() on a page which is part of a folio -- because there's nothing
> useful to find in the head page.  So compound_head (or compound_info) can
> share space with page->memdesc.  For now I've actually put page->memdesc
> adjacent to page->compound_head, for no reason that I can recall.
> 
> I had thought that calling page_folio() on a page that's not part of
> a folio would also be a BUG(), but now I think it's better to quietly
> return NULL.  That's based on my experience working with slab and ptdesc.

So once that is in, even if we only allocate "struct folio" separately, 
the whole fake-head stuff can go away either way, as it is 
hugetlb->folio material only.

Which leaves the question whether we should consider Kiryl's patch set 
in the meantime here as something to merge.

Willy, what is the rough timeline until we can expect to see at least 
"struct folio" get allocated separately, and would this patch set here 
get in the way of doing so, or doesn't it really matter?

-- 
Cheers

David
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Kiryl Shutsemau 1 month ago
On Thu, Jan 08, 2026 at 12:08:35AM +0100, David Hildenbrand (Red Hat) wrote:
> > > "Then we make page->compound_head point to the dynamically allocated memdesc
> > > rather than the first page. Then we can transition to the above layout. "
> > 
> 
> Sorry for the late reply, it's been a bit crazy over here.
> 
> > I am not sure I understand how it is going to work.
> > 
> 
> I don't recall all the details that Willy shared over the last years while
> working on folios, but I will try to answer as best as I can from the top of
> my head. (there are plenty of resources on the list, on the web, in his
> presentations etc.).
> 
> > 32-byte layout indicates that flags will stay in the statically
> > allocated part, but most (all?) flags are in the head page and we would
> > need a way to redirect from tail to head in the statically allocated
> > pages.
> 
> When working with folios we will never go through the head page flags.
> That's why Willy has incrementally converted most folio code that worked on
> pages to work on folios.
> 
> For example, PageUptodate() does a
> 
> 	folio_test_uptodate(page_folio(page));
> 
> The flags in the 32-byte layout will be used by some non-folio things for
> which we won't allocate memdescs (just yet) (e.g., free pages in the buddy
> and other things that do not require a lot of metadata). Some of these
> flags will be moved into the memdesc pointer in the future as the conversion
> proceeds.

Okay, makes sense.

> > > The "memdesc" could be a pointer to a "struct folio" that is allocated from
> > > the slab.
> > > 
> > > So in the new memdesc world, all pages part of a folio will point at the
> > > allocated "struct folio", not the head page where "struct folio" currently
> > > overlays "struct page".
> > > 
> > > That would mean that the proposal in this patch set will have to be reverted
> > > again.
> > > 
> > > 
> > > At LPC, Willy said that he wants to have something out there in the first
> > > half of 2026.
> > 
> > Okay, seems ambitious to me.
> 
> When the program was called "2025" I considered it very ambitious :) Now I
> consider it ambitious. I think Willy already shared early versions of the
> "struct slab" split and the "struct ptdesc" split recently on the list.
> 
> > 
> > Last time I asked, we had no idea how much performance the additional
> > indirection would cost us. Do we have a clue?
> 
> I raised that in the past, and I think the answer I got was that
> 
> (a) We always had this indirection cost when going from tail page to
>     head page / folio.
> (b) We must convert the code to do as little page_folio() as possible.
>     That's why we saw so much code conversion to stop working on pages
>     and only work on folios.
> 
> There are certainly cases where we cannot currently avoid the indirection,
> like when we traverse a page table and go
> 
> 	pfn -> page -> folio
> 
> and cannot simply go
> 
> 	pfn -> folio
> 
> On the bright side, we'll lose the head-page checks and can simply
> dereference the pointer.
> 
> I don't know whether Willy has more information yet, but I would assume that
> in most cases this will be similar to the performance summary in your cover
> letter: "... has shown either no change or only a slight improvement within
> the noise.", just that it will be "only a slight degradation within the
> noise". :)
> 
> We'll learn I guess, in particular which other page -> folio conversions
> cannot be optimized out by caching the folio.
> 
> 
> For quite some time there will be a magical config option that will switch
> between both layouts. I'd assume that things will get more complicated if we
> suddenly have a "compound_head/folio" pointer and a "compound_info" pointer
> at the same time.
> 
> But it's really Willy who has the concept in mind as he is very likely right
> now busy writing some of that code.
> 
> I'm just the messenger.
> 
> :)
> 
> [I would hope that Willy could share his thoughts]

If you or Willy think that this patch will impede memdesc progress, I am
okay not pushing this patchset upstream.

I was really excited when I found this trick to get rid of fake heads.
But ultimately, it is a cleanup. I failed to find the performance win I
had hoped for.

Also, I am trying to understand what the 32-byte layout means for fake heads.
_refcount in struct page is going to 0 and refcounting happens on folios.
So I wonder if we can make all pages identical (no tail pages per se) and
avoid fake heads this way?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by David Hildenbrand (Red Hat) 4 weeks, 1 day ago
>> For quite some time there will be a magical config option that will switch
>> between both layouts. I'd assume that things will get more complicated if we
>> suddenly have a "compound_head/folio" pointer and a "compound_info" pointer
>> at the same time.
>>
>> But it's really Willy who has the concept in mind as he is very likely right
>> now busy writing some of that code.
>>
>> I'm just the messenger.
>>
>> :)
>>
>> [I would hope that Willy could share his thoughts]
> 
> If you or Willy think that this patch will impede memdesc progress, I am
> okay not pushing this patchset upstream.

I pinged Willy.

> 
> I was really excited when I found this trick to get rid of fake heads.
> But ultimately, it is a clean up. I failed to find a performance win I
> hoped for.

I think it's quite nice as a cleanup, and if we didn't have memdescs
on the horizon that essentially change the code completely in another
direction (having all pages point to a struct folio, not just the tail
pages), I wouldn't be bringing this up :)

> 
> Also, I try to understand what 32-byte layout means for fake heads.
> _refcount in struct page is going to 0 and refcounting happens on folios.

Yes, for folios.

> So I wonder if we can make all pages identical (no tail pages per se) and
> avoid fake heads this way?

That's the ultimate goal, yes. Essentially, all pages will point to the
memdesc, and there will not be a reason to check for head/fake-head etc.

I think initially, the compound-page concept might still co-exist for
some memdescs that we won't initially allocate separately. But I don't
know the details of that.

I know that the transition phase is tricky :)

Regarding references and folios: yes, exactly. When trying to get a
reference, we'll spot in the memdesc field that this is a folio and try
on the folio instead.

In the future, most pages will either be permanently frozen and not have 
a refcount (e.g., struct ptdesc), or have a refcount in their memdesc. 
In the transition, the location of the refcount depends on memdesc type 
(in memdesc vs. in page).

-- 
Cheers

David
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Kiryl Shutsemau 1 month ago
On Thu, Jan 08, 2026 at 12:32:47PM +0000, Kiryl Shutsemau wrote:
> On Thu, Jan 08, 2026 at 12:08:35AM +0100, David Hildenbrand (Red Hat) wrote:
> > > > "Then we make page->compound_head point to the dynamically allocated memdesc
> > > > rather than the first page. Then we can transition to the above layout. "
> > > 
> > 
> > Sorry for the late reply, it's been a bit crazy over here.
> > 
> > > I am not sure I understand how it is going to work.
> > > 
> > 
> > I don't recall all the details that Willy shared over the last years while
> > working on folios, but I will try to answer as best as I can from the top of
> > my head. (there are plenty of resources on the list, on the web, in his
> > presentations etc.).
> > 
> > > 32-byte layout indicates that flags will stay in the statically
> > > allocated part, but most (all?) flags are in the head page and we would
> > > need a way to redirect from tail to head in the statically allocated
> > > pages.
> > 
> > When working with folios we will never go through the head page flags.
> > That's why Willy has incrementally converted most folio code that worked on
> > pages to work on folios.
> > 
> > For example, PageUptodate() does a
> > 
> > 	folio_test_uptodate(page_folio(page));
> > 
> > The flags in the 32-byte layout will be used by some non-folio things for
> > which we won't allocate memdescs (just yet) (e.g., free pages in the buddy
> > and other things that do not require a lot of metadata). Some of these
> > flags will be moved into the memdesc pointer in the future as the conversion
> > proceeds.
> 
> Okay, makes sense.
> 
> > > > The "memdesc" could be a pointer to a "struct folio" that is allocated from
> > > > the slab.
> > > > 
> > > > So in the new memdesc world, all pages part of a folio will point at the
> > > > allocated "struct folio", not the head page where "struct folio" currently
> > > > overlays "struct page".
> > > > 
> > > > That would mean that the proposal in this patch set will have to be reverted
> > > > again.
> > > > 
> > > > 
> > > > At LPC, Willy said that he wants to have something out there in the first
> > > > half of 2026.
> > > 
> > > Okay, seems ambitious to me.
> > 
> > When the program was called "2025" I considered it very ambitious :) Now I
> > consider it ambitious. I think Willy already shared early versions of the
> > "struct slab" split and the "struct ptdesc" split recently on the list.
> > 
> > > 
> > > Last time I asked, we had no idea how much performance would additional
> > > indirection cost us. Do we have a clue?
> > 
> > I raised that in the past, and I think the answer I got was that
> > 
> > (a) We always had this indirection cost when going from tail page to
> >     head page / folio.
> > (b) We must convert the code to do as little page_folio() as possible.
> >     That's why we saw so much code conversion to stop working on pages
> >     and only work on folios.
> > 
> > There are certainly cases where we cannot currently avoid the indirection,
> > like when we traverse a page table and go
> > 
> > 	pfn -> page -> folio
> > 
> > and cannot simply go
> > 
> > 	pfn -> folio
> > 
> > On the bright side, we'll lose the head-page checks and can simply
> > dereference the pointer.
> > 
> > I don't know whether Willy has more information yet, but I would assume that
> > in most cases this will be similar to the performance summary in your cover
> > letter: "... has shown either no change or only a slight improvement within
> > the noise.", just that it will be "only a slight degradation within the
> > noise". :)
> > 
> > We'll learn I guess, in particular which other page -> folio conversions
> > cannot be optimized out by caching the folio.
> > 
> > 
> > For quite some time there will be a magical config option that will switch
> > between both layouts. I'd assume that things will get more complicated if we
> > suddenly have a "compound_head/folio" pointer and a "compound_info" pointer
> > at the same time.
> > 
> > But it's really Willy who has the concept in mind as he is very likely right
> > now busy writing some of that code.
> > 
> > I'm just the messenger.
> > 
> > :)
> > 
> > [I would hope that Willy could share his thoughts]
> 
> If you or Willy think that this patch will impede memdesc progress, I am
> okay not pushing this patchset upstream.

The other option is to get this patchset upstream (I still need to fix/test
a few things) and revert it later when (if?) memdesc lands.

What do you think?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Muchun Song 4 weeks, 1 day ago

> On Jan 8, 2026, at 21:30, Kiryl Shutsemau <kas@kernel.org> wrote:
> 
> On Thu, Jan 08, 2026 at 12:32:47PM +0000, Kiryl Shutsemau wrote:
>> On Thu, Jan 08, 2026 at 12:08:35AM +0100, David Hildenbrand (Red Hat) wrote:
>>>>> "Then we make page->compound_head point to the dynamically allocated memdesc
>>>>> rather than the first page. Then we can transition to the above layout. "
>>>> 
>>> 
>>> Sorry for the late reply, it's been a bit crazy over here.
>>> 
>>>> I am not sure I understand how it is going to work.
>>>> 
>>> 
>>> I don't recall all the details that Willy shared over the last years while
>>> working on folios, but I will try to answer as best as I can from the top of
>>> my head. (there are plenty of resources on the list, on the web, in his
>>> presentations etc.).
>>> 
>>>> 32-byte layout indicates that flags will stay in the statically
>>>> allocated part, but most (all?) flags are in the head page and we would
>>>> need a way to redirect from tail to head in the statically allocated
>>>> pages.
>>> 
>>> When working with folios we will never go through the head page flags.
>>> That's why Willy has incrementally converted most folio code that worked on
>>> pages to work on folios.
>>> 
>>> For example, PageUptodate() does a
>>> 
>>> folio_test_uptodate(page_folio(page));
>>> 
>>> The flags in the 32-byte layout will be used by some non-folio things for
>>> which we won't allocate memdescs (just yet) (e.g., free pages in the buddy
>>> and other things that do not require a lot of metadata). Some of these
>>> flags will be moved into the memdesc pointer in the future as the conversion
>>> proceeds.
>> 
>> Okay, makes sense.
>> 
>>>>> The "memdesc" could be a pointer to a "struct folio" that is allocated from
>>>>> the slab.
>>>>> 
>>>>> So in the new memdesc world, all pages part of a folio will point at the
>>>>> allocated "struct folio", not the head page where "struct folio" currently
>>>>> overlays "struct page".
>>>>> 
>>>>> That would mean that the proposal in this patch set will have to be reverted
>>>>> again.
>>>>> 
>>>>> 
>>>>> At LPC, Willy said that he wants to have something out there in the first
>>>>> half of 2026.
>>>> 
>>>> Okay, seems ambitious to me.
>>> 
>>> When the program was called "2025" I considered it very ambitious :) Now I
>>> consider it ambitious. I think Willy already shared early versions of the
>>> "struct slab" split and the "struct ptdesc" split recently on the list.
>>> 
>>>> 
>>>> Last time I asked, we had no idea how much performance would additional
>>>> indirection cost us. Do we have a clue?
>>> 
>>> I raised that in the past, and I think the answer I got was that
>>> 
>>> (a) We always had this indirection cost when going from tail page to
>>>    head page / folio.
>>> (b) We must convert the code to do as little page_folio() as possible.
>>>    That's why we saw so much code conversion to stop working on pages
>>>    and only work on folios.
>>> 
>>> There are certainly cases where we cannot currently avoid the indirection,
>>> like when we traverse a page table and go
>>> 
>>> pfn -> page -> folio
>>> 
>>> and cannot simply go
>>> 
>>> pfn -> folio
>>> 
>>> On the bright side, we'll lose the head-page checks and can simply
>>> dereference the pointer.
>>> 
>>> I don't know whether Willy has more information yet, but I would assume that
>>> in most cases this will be similar to the performance summary in your cover
>>> letter: "... has shown either no change or only a slight improvement within
>>> the noise.", just that it will be "only a slight degradation within the
>>> noise". :)
>>> 
>>> We'll learn I guess, in particular which other page -> folio conversions
>>> cannot be optimized out by caching the folio.
>>> 
>>> 
>>> For quite some time there will be a magical config option that will switch
>>> between both layouts. I'd assume that things will get more complicated if we
>>> suddenly have a "compound_head/folio" pointer and a "compound_info" pointer
>>> at the same time.
>>> 
>>> But it's really Willy who has the concept in mind as he is very likely right
>>> now busy writing some of that code.
>>> 
>>> I'm just the messenger.
>>> 
>>> :)
>>> 
>>> [I would hope that Willy could share his thoughts]
>> 
>> If you or Willy think that this patch will impede memdesc progress, I am
>> okay not pushing this patchset upstream.
> 
> The other option is to get this patchset upstream (I still need to fix/test
> a few things) and revert it later when (if?) memdesc lands.
> 
> What do you think?

It seems the merge of memdesc is still some time away? If it’s going to
take a while, my personal preference is to merge it first and then decide
whether to revert the changes based on actual needs.

Thanks.

> 
> -- 
>  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Muchun Song 1 month, 2 weeks ago

> On Dec 23, 2025, at 17:38, David Hildenbrand (Red Hat) <david@kernel.org> wrote:
> 
> On 12/22/25 15:55, Muchun Song wrote:
>>> On Dec 22, 2025, at 22:18, David Hildenbrand (Red Hat) <david@kernel.org> wrote:
>>> 
>>> On 12/22/25 15:02, Kiryl Shutsemau wrote:
>>>>> On Mon, Dec 22, 2025 at 04:34:40PM +0800, Muchun Song wrote:
>>>>> 
>>>>> 
>>>>> On 2025/12/18 23:09, Kiryl Shutsemau wrote:
>>>>>> The upcoming changes in compound_head() require memmap to be naturally
>>>>>> aligned to the maximum folio size.
>>>>>> 
>>>>>> Add a warning if it is not.
>>>>>> 
>>>>>> A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the
>>>>>> kernel is still likely to be functional if this strict check fails.
>>>>> 
>>>>> Different architectures default to 2 MB alignment (mainly to
>>>>> enable huge mappings), which only accommodates folios up to
>>>>> 128 MB. Yet 1 GB huge pages are still fairly common, so
>>>>> validating 16 GB (MAX_FOLIO_SIZE) alignment seems likely to
>>>>> miss the most frequent case.
>>>> I don't follow. A 16 GB check is more strict than anything smaller.
>>>> How can it miss the most frequent case?
>>>>> I’m concerned that this might plant a hidden time bomb: it
>>>>> could detonate at any moment in later code, silently triggering
>>>>> memory corruption or similar failures. Therefore, I don’t
>>>>> think a WARNING is a good choice.
>>>> We can upgrade it to BUG_ON(), but I want to understand your logic here
>>>> first.
>>> 
>>> Definitely no BUG_ON(). I would assume this is something we would find early during testing, so even a VM_WARN_ON_ONCE() should be good enough?
>>> 
>>> This smells like a possible problem, though, as soon as some architecture wants to increase the folio size. What would be the expected step to ensure the alignment is done properly?
>>> 
>>> But OTOH, as I raised Willy's work will make all of that here obsolete either way, so maybe not worth worrying about that case too much,
>> Hi David,
> 
> Hi! :)
> 
>> I hope you're doing well. I must admit I have limited knowledge of Willy's work, and I was wondering if you might be kind enough to share any publicly available links where I could learn more about the future direction of this project. I would be truly grateful for your guidance.
>> Thank you very much in advance.
> 
> There is some information to be had at [1], but more at [2]. Take a look at [2] in "After those projects are complete - Then we can shrink struct page to 32 bytes:"
> 
> In essence, all pages (belonging to a memdesc) will have a "memdesc" pointer (that replaces the compound_head pointer).
> 
> "Then we make page->compound_head point to the dynamically allocated memdesc rather than the first page. Then we can transition to the above layout. "
> 
> The "memdesc" could be a pointer to a "struct folio" that is allocated from the slab.
> 
> So in the new memdesc world, all pages part of a folio will point at the allocated "struct folio", not the head page where "struct folio" currently overlays "struct page".
> 
> That would mean that the proposal in this patch set will have to be reverted again.
> 
> 
> At LPC, Willy said that he wants to have something out there in the first half of 2026.
> 
> [1] https://kernelnewbies.org/MatthewWilcox/Memdescs
> [2] https://kernelnewbies.org/MatthewWilcox/Memdescs/Path

Many thanks for taking the time to explain everything in detail and for providing
such valuable information. I plan to invest additional time to fully understand
the details you’ve shared.

Muchun,
Thanks.

> 
> -- 
> Cheers
> 
> David
Re: [PATCHv2 02/14] mm/sparse: Check memmap alignment
Posted by Muchun Song 1 month, 2 weeks ago

> On Dec 22, 2025, at 22:03, Kiryl Shutsemau <kas@kernel.org> wrote:
> On Mon, Dec 22, 2025 at 04:34:40PM +0800, Muchun Song wrote:
>> 
>> 
>> On 2025/12/18 23:09, Kiryl Shutsemau wrote:
>>> The upcoming changes in compound_head() require memmap to be naturally
>>> aligned to the maximum folio size.
>>> Add a warning if it is not.
>>> A warning is sufficient as MAX_FOLIO_ORDER is very rarely used, so the
>>> kernel is still likely to be functional if this strict check fails.
>> 
>> Different architectures default to 2 MB alignment (mainly to
>> enable huge mappings), which only accommodates folios up to
>> 128 MB. Yet 1 GB huge pages are still fairly common, so
>> validating 16 GB (MAX_FOLIO_SIZE) alignment seems likely to
>> miss the most frequent case.
> 
> I don't follow. A 16 GB check is more strict than anything smaller.
> How can it miss the most frequent case?

Sorry, I didn’t make myself clear. What I meant is that if this warning
triggers, it implies the largest-sized folio isn’t properly aligned, and
the 1 GB folios are probably misaligned too. Your commit message says
“MAX_FOLIO_ORDER is very rarely used,” but I want to stress that 1 GB
folios are actually common. If they’re also misaligned, we’re quietly
planting a land mine. That’s why I’m worried a mere warning isn’t enough:
it leaves a latent bug in the system.
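
To put rough numbers on the alignment concern, here is a user-space
sketch. It assumes 4 KiB pages and a 64-byte struct page (neither value
is stated in the patch), and the helper names are made up for
illustration; they are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only: 4 KiB pages, 64-byte struct page. */
#define SKETCH_PAGE_SIZE        4096ULL
#define SKETCH_STRUCT_PAGE_SIZE 64ULL

/* Bytes of memmap (array of struct page) that describe one folio. */
static uint64_t memmap_span(uint64_t folio_bytes)
{
	return folio_bytes / SKETCH_PAGE_SIZE * SKETCH_STRUCT_PAGE_SIZE;
}

/* Largest folio whose memmap span is naturally aligned for a given memmap alignment. */
static uint64_t max_aligned_folio(uint64_t memmap_align)
{
	return memmap_align / SKETCH_STRUCT_PAGE_SIZE * SKETCH_PAGE_SIZE;
}

/* True if a memmap starting at memmap_base is naturally aligned for folio_bytes. */
static bool memmap_aligned_for(uint64_t memmap_base, uint64_t folio_bytes)
{
	uint64_t span = memmap_span(folio_bytes);

	/* The mask test below only works for a non-zero power-of-two span. */
	assert(span != 0 && (span & (span - 1)) == 0);
	return (memmap_base & (span - 1)) == 0;
}
```

Under those assumptions, a 2 MB-aligned memmap covers folios up to
128 MB, a 1 GB folio needs the memmap aligned to 16 MB, and a 16 GB
folio needs 256 MB.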

If there’s a problem, we should stop right here, since this is the
earliest place where it will surface.

As David assumed, if we expect to catch the
problem during testing, then I think VM_BUG_ON
would be more appropriate.

Thanks.

> 
>> I’m concerned that this might plant a hidden time bomb: it
>> could detonate at any moment in later code, silently triggering
>> memory corruption or similar failures. Therefore, I don’t
>> think a WARNING is a good choice.
> 
> We can upgrade it to BUG_ON(), but I want to understand your logic here
> first.
> 
> --
>  Kiryl Shutsemau / Kirill A. Shutemov