This is a requirement for making PageOffline pages not have a refcount
in the long future ("frozen"), and for reworking non-folio page migration
in the near future.
I have patches mostly ready to go to handle the latter. For turning all
PageOffline() pages frozen, the non-folio page migration and memory
ballooning drivers will have to be reworked first, to no longer rely on
the refcount of PageOffline pages.
Introduce PG_offline_skippable that only applies to PageOffline() pages --
of course, reusing one of the existing PG_ flags for now -- and convert
virtio-mem to make use of the new way: to allow for skipping PageOffline
pages during memory offlining, treating them as if they would not be
allocated.
Note that the existing mechanism relied on the driver (virtio-mem)
dropping its reference during MEM_GOING_OFFLINE, which is complicated and
not compatible with the concept of frozen pages (no refcount).
Tested with virtio-mem on x86, including partially hotplugging a memory
block (hotplugging 64MiB with a 128 MiB memory block size), and repeatedly
onlining+offlining the memory block.
Cc: David Hildenbrand <david@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Eugenio Pérez" <eperezma@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
David Hildenbrand (2):
mm/memory_hotplug: PG_offline_skippable for offlining memory blocks
with PageOffline pages
mm/memory_hotplug: remove -EBUSY handling from scan_movable_pages()
drivers/virtio/virtio_mem.c | 111 +-----------------------------------
include/linux/page-flags.h | 29 +++++++---
mm/memory_hotplug.c | 22 ++-----
mm/page_alloc.c | 8 +--
mm/page_isolation.c | 21 +++----
5 files changed, 40 insertions(+), 151 deletions(-)
base-commit: 2f6baf8dadecc2bec7d6bc931f7e0d58d8443d76
--
2.49.0
On 14 May 2025, at 7:15, David Hildenbrand wrote:
> This is a requirement for making PageOffline pages not have a refcount
> in the long future ("frozen"), and for reworking non-folio page migration
> in the near future.
>
> I have patches mostly ready to go to handle the latter. For turning all
> PageOffline() pages frozen, the non-folio page migration and memory
> ballooning drivers will have to be reworked first, to no longer rely on
> the refcount of PageOffline pages.
>
> Introduce PG_offline_skippable that only applies to PageOffline() pages --
> of course, reusing one of the existing PG_ flags for now -- and convert
> virtio-mem to make use of the new way: to allow for skipping PageOffline
> pages during memory offlining, treating them as if they would not be
> allocated.
IIUC, based on Documentation/admin-guide/mm/memory-hotplug.rst,
to offline a page, the page first needs to be set PageOffline() to be
removed from page allocator. Next, the page is removed from its memory
block. When will PG_offline_skippable be used? The second phase when
the page is being removed from its memory block?
Thanks.
>
> Note that the existing mechanism relied on the driver (virtio-mem)
> dropping its reference during MEM_GOING_OFFLINE, which is complicated and
> not compatible with the concept of frozen pages (no refcount).
>
> Tested with virtio-mem on x86, including partially hotplugging a memory
> block (hotplugging 64MiB with a 128 MiB memory block size), and repeatedly
> onlining+offlining the memory block.
>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Cc: "Eugenio Pérez" <eperezma@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Brendan Jackman <jackmanb@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> David Hildenbrand (2):
> mm/memory_hotplug: PG_offline_skippable for offlining memory blocks
> with PageOffline pages
> mm/memory_hotplug: remove -EBUSY handling from scan_movable_pages()
>
> drivers/virtio/virtio_mem.c | 111 +-----------------------------------
> include/linux/page-flags.h | 29 +++++++---
> mm/memory_hotplug.c | 22 ++-----
> mm/page_alloc.c | 8 +--
> mm/page_isolation.c | 21 +++----
> 5 files changed, 40 insertions(+), 151 deletions(-)
>
>
> base-commit: 2f6baf8dadecc2bec7d6bc931f7e0d58d8443d76
> --
> 2.49.0
--
Best Regards,
Yan, Zi
On 14.05.25 15:45, Zi Yan wrote:
> On 14 May 2025, at 7:15, David Hildenbrand wrote:
>
>> This is a requirement for making PageOffline pages not have a refcount
>> in the long future ("frozen"), and for reworking non-folio page migration
>> in the near future.
>>
>> I have patches mostly ready to go to handle the latter. For turning all
>> PageOffline() pages frozen, the non-folio page migration and memory
>> ballooning drivers will have to be reworked first, to no longer rely on
>> the refcount of PageOffline pages.
>>
>> Introduce PG_offline_skippable that only applies to PageOffline() pages --
>> of course, reusing one of the existing PG_ flags for now -- and convert
>> virtio-mem to make use of the new way: to allow for skipping PageOffline
>> pages during memory offlining, treating them as if they would not be
>> allocated.
>
Thanks for taking a look!
> IIUC, based on Documentation/admin-guide/mm/memory-hotplug.rst,
> to offline a page, the page first needs to be set PageOffline() to be
PageOffline is not mentioned in there. :)
Note that PageOffline() is a bit confusing because it's "Memory block
online but page is logically offline (e.g., has a memmap that can be
touched, but the page content should not be touched)".
(memory block offline -> all pages offline and have effectively no state
because the memmap is stale)
> removed from page allocator.
Usually, all pages are freed back to the buddy (isolated pageblock ->
put onto the isolated list). Memory offlining code can then simply grab
these "free" pages from the buddy -- no PageOffline involved.
If something fails during memory offlining, these isolated pages are
simply put back on the appropriate migratetype list and become ordinary
free pages that can be allocated immediately.
Some PageOffline pages can be migrated using the non-folio migration:
this is done for memory ballooning (memory compaction). As they get
migrated, they are freed back to the buddy, PageOffline() is cleared --
they become PageBuddy() -- and the above applies.
Other PageOffline pages can be skipped during memory offlining
(virtio-mem use case, what we are doing here). We don't want them to ever
go through the buddy, especially because if memory offlining fails they
must definitely not be treated like free pages that can be allocated
immediately.
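As a rough illustration of the failure path described above, here is a userspace toy model (not code from this series, and not the kernel's data structures). The names `iso_page`, `iso_state` and `undo_isolation()` are hypothetical and exist only for this sketch. It shows that undoing isolation turns isolated free pages back into allocatable ones, while skippable PageOffline pages are left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy states; not the kernel's page types or flags. */
enum iso_state {
	ISO_FREE_ISOLATED,	/* free page parked on the isolated list */
	ISO_FREE_ALLOCATABLE,	/* ordinary free page in the buddy */
	ISO_OFFLINE_SKIPPABLE,	/* models PageOffline + skippable */
};

struct iso_page {
	enum iso_state state;
};

/*
 * Model of undoing isolation after a failed offlining attempt:
 * isolated free pages become allocatable again, skippable
 * PageOffline pages never reach the buddy.
 * Returns the number of pages that became allocatable.
 */
static size_t undo_isolation(struct iso_page *pages, size_t nr)
{
	size_t i, released = 0;

	assert(pages || nr == 0);
	for (i = 0; i < nr; i++) {
		if (pages[i].state == ISO_FREE_ISOLATED) {
			pages[i].state = ISO_FREE_ALLOCATABLE;
			released++;
		}
	}
	return released;
}
```

In this model, a range holding two isolated free pages and one skippable page releases exactly two pages on failure; the skippable page keeps its state.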
> Next, the page is removed from its memory
> block. When will PG_offline_skippable be used? The second phase when
> the page is being removed from its memory block?
PG_offline_skippable is used during memory offlining, while we look for
any pages that are not PageBuddy (... or hwpoisoned ...), to migrate
them off the memory so they get converted to PageBuddy.
PageOffline + PageOfflineSkippable are checked on that phase, such that
they don't require any migration.
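The check described above can be sketched as a userspace toy model. This is an assumption-laden illustration, not the code from the patches: `scan_page`, `scan_can_skip()` and `scan_range()` are hypothetical names, and the boolean fields merely stand in for PageBuddy, hwpoison, PageOffline and PageOfflineSkippable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy page; the fields only model the real page state. */
struct scan_page {
	bool buddy;		/* free, owned by the page allocator */
	bool hwpoison;		/* hardware-poisoned, not migrated */
	bool offline;		/* models PageOffline() */
	bool offline_skippable;	/* models PageOfflineSkippable() */
};

/* True if offlining can step over this page without migrating it. */
static bool scan_can_skip(const struct scan_page *page)
{
	if (page->buddy || page->hwpoison)
		return true;
	/* The skippable bit only counts on PageOffline pages. */
	return page->offline && page->offline_skippable;
}

/*
 * Return the index of the first page that still has to be migrated,
 * or -1 if no page in the range needs migration.
 */
static long scan_range(const struct scan_page *pages, size_t nr)
{
	size_t i;

	assert(pages || nr == 0);
	for (i = 0; i < nr; i++)
		if (!scan_can_skip(&pages[i]))
			return (long)i;
	return -1;
}
```

A PageOffline page without the skippable bit is reported as needing migration in this model; whether it can actually be migrated is a separate question.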
--
Cheers,
David / dhildenb
On 14 May 2025, at 10:12, David Hildenbrand wrote:
> On 14.05.25 15:45, Zi Yan wrote:
>> On 14 May 2025, at 7:15, David Hildenbrand wrote:
>>
>>> This is a requirement for making PageOffline pages not have a refcount
>>> in the long future ("frozen"), and for reworking non-folio page migration
>>> in the near future.
>>>
>>> I have patches mostly ready to go to handle the latter. For turning all
>>> PageOffline() pages frozen, the non-folio page migration and memory
>>> ballooning drivers will have to be reworked first, to no longer rely on
>>> the refcount of PageOffline pages.
>>>
>>> Introduce PG_offline_skippable that only applies to PageOffline() pages --
>>> of course, reusing one of the existing PG_ flags for now -- and convert
>>> virtio-mem to make use of the new way: to allow for skipping PageOffline
>>> pages during memory offlining, treating them as if they would not be
>>> allocated.
>>
>
> Thanks for taking a look!
>
>> IIUC, based on Documentation/admin-guide/mm/memory-hotplug.rst,
>> to offline a page, the page first needs to be set PageOffline() to be
>
> PageOffline is not mentioned in there. :)
Sorry, I was mixing the code with the documentation as I was reading
both.
>
> Note that PageOffline() is a bit confusing because it's "Memory block online but page is logically offline (e.g., has a memmap that can be touched, but the page content should not be touched)".
So PageOffline() is before memory block offline, which is the first phase of
memory hotunplug.
>
> (memory block offline -> all pages offline and have effectively no state because the memmap is stale)
What do you mean by memmap is stale? When a memory block is offline, memmap is
still present, so pfn scanner can see these pages. pfn scanner checks memmap
to know that it should not touch these pages, right?
>
>> removed from page allocator.
>
> Usually, all pages are freed back to the buddy (isolated pageblock -> put onto the isolated list). Memory offlining code can then simply grab these "free" pages from the buddy -- no PageOffline involved.
>
> If something fails during memory offlining, these isolated pages are simply put back on the appropriate migratetype list and become ordinary free pages that can be allocated immediately.
I am familiar with this part. Then, when is PageOffline used?
From the comment in page-flags.h, I see two examples: pages inflated by a balloon driver
and not-yet-onlined pages when onlining the section. These are two different operations:
1) inflated pages are going to be offline, 2) not-yet-onlined pages are going to be
online. But you mentioned above that memory offlining code does not involve
PageOffline, so inflating pages in a balloon driver is not part of the memory offlining
code, but a different way of offlining pages. Am I getting it right?
I read a little bit more on memory ballooning and virtio-mem and understand
that memory ballooning still keeps the inflated page but guest cannot allocate
and use it, whereas virtio-mem and memory hotunplug remove the page from
Linux completely (i.e., Linux no longer sees the memory).
It seems that I am mixing memory offlining and memory hotunplug. IIUC,
memory offlining means no one can allocate and use the offlined memory, but
Linux still sees it; memory hotunplug means Linux no longer sees it (no related
memmap and other metadata). Am I getting it right?
>
> Some PageOffline pages can be migrated using the non-folio migration: this is done for memory ballooning (memory compaction). As they get migrated, they are freed back to the buddy, PageOffline() is cleared -- they become PageBuddy() -- and the above applies.
After a PageOffline page is migrated, the destination page becomes PageOffline, right?
OK, I see it in balloon_page_insert().
>
> Other PageOffline pages can be skipped during memory offlining (virtio-mem use case, what we are doing here). We don't want them to ever go through the buddy, especially because if memory offlining fails they must definitely not be treated like free pages that can be allocated immediately.
What do you mean by "skipped during memory offlining"? Are you implying that when
virtio-mem is offlining some pages by marking them PageOffline and PG_offline_skippable,
someone else can do memory offlining in parallel?
>
>> Next, the page is removed from its memory
>> block. When will PG_offline_skippable be used? The second phase when
>> the page is being removed from its memory block?
>
> PG_offline_skippable is used during memory offlining, while we look for any pages that are not PageBuddy (... or hwpoisoned ...), to migrate them off the memory so they get converted to PageBuddy.
>
> PageOffline + PageOfflineSkippable are checked on that phase, such that they don't require any migration.
Hmm, if you just do not want PageOffline pages to get migrated, not setting
__PageMovable on them would work, right? PageOffline + __PageMovable is used by
ballooning, as these inflated pages can be migrated. PageOffline without
__PageMovable should be virtio-mem. Am I missing any other user?
--
Best Regards,
Yan, Zi
>>
>> Note that PageOffline() is a bit confusing because it's "Memory block online but page is logically offline (e.g., has a memmap that can be touched, but the page content should not be touched)".
>
> So PageOffline() is before memory block offline, which is the first phase of
> memory hotunplug.
Yes.
>
>>
>> (memory block offline -> all pages offline and have effectively no state because the memmap is stale)
>
> What do you mean by memmap is stale? When a memory block is offline, memmap is
> still present, so pfn scanner can see these pages. pfn scanner checks memmap
> to know that it should not touch these pages, right?
See pfn_to_online_page() for exactly that use case.
For an offline memory section (either because it was just added or
because it was just offlined), the memmap is assumed to contain garbage
and should not be touched.
See remove_pfn_range_from_zone() -> page_init_poison().
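To make the "stale memmap" point concrete, here is a userspace toy model of the lookup pattern. It is only a sketch under stated assumptions: the section size, `toy_sections`, `toy_pfn_to_online_page()` and `toy_count_inspectable()` are hypothetical and only mimic the idea behind pfn_to_online_page(), namely that a pfn walker gets no page pointer at all for an offline section and therefore never reads its memmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES_PER_SECTION	4UL	/* tiny, for illustration only */
#define TOY_NR_SECTIONS		2UL

struct toy_page {
	unsigned long flags;
};

struct toy_section {
	bool online;	/* memmap contents are only valid if set */
	struct toy_page memmap[TOY_PAGES_PER_SECTION];
};

static struct toy_section toy_sections[TOY_NR_SECTIONS];

/* Return the page for pfn, or NULL if its section is offline or absent. */
static struct toy_page *toy_pfn_to_online_page(unsigned long pfn)
{
	unsigned long nr = pfn / TOY_PAGES_PER_SECTION;

	if (nr >= TOY_NR_SECTIONS || !toy_sections[nr].online)
		return NULL;
	return &toy_sections[nr].memmap[pfn % TOY_PAGES_PER_SECTION];
}

/* Count the pages in [start, end) that a pfn walker may inspect. */
static unsigned long toy_count_inspectable(unsigned long start,
					   unsigned long end)
{
	unsigned long pfn, nr = 0;

	assert(start <= end);
	for (pfn = start; pfn < end; pfn++)
		if (toy_pfn_to_online_page(pfn))
			nr++;
	return nr;
}
```

With only the first toy section marked online, a walk over both sections inspects just the pages of the first one.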
>
>>
>>> removed from page allocator.
>>
>> Usually, all pages are freed back to the buddy (isolated pageblock -> put onto the isolated list). Memory offlining code can then simply grab these "free" pages from the buddy -- no PageOffline involved.
>>
>> If something fails during memory offlining, these isolated pages are simply put back on the appropriate migratetype list and become ordinary free pages that can be allocated immediately.
>
> I am familiar with this part. Then, when PageOffline is used?
>
> From the comment in page-flags.h, I see two examples: inflated pages by balloon driver
> and not onlined pages when onlining the section. These are two different operations:
> 1) inflated pages are going to be offline, 2) not onlined pages are going to be
> online. But you mentioned above that Memory off lining code does not involve
> PageOffline, so inflated pages by balloon driver is not part of memory offlining
> code, but a different way of offlining pages. Am I getting it right?
Yes. PageOffline means logically offline, for whatever reason someone
decides to turn pages logically offline.
Memory ballooning and virtio-mem are two users; there are more.
>
> I read a little bit more on memory ballooning and virtio-mem and understand
> that memory ballooning still keeps the inflated page but guest cannot allocate
> and use it, whereas virtio-mem and memory hotunplug remove the page from
> Linux completely (i.e., Linux no longer sees the memory).
In virtio-mem terms, they are considered "fake offline" -- memory
behaves as if it would never have been onlined, but there is a memmap
for it. Like a (current) memory hole.
>
> It seems that I am mixing memory offlining and memory hotunplug. IIUC,
> memory offlining means no one can allocate and use the offlined memory, but
> Linux still sees it; memory hotunplug means Linux no longer sees it (no related
> memmap and other metadata). Am I getting it right?
The doc has this "Phases of Memory Hotplug" description, where it is
roughly divided into that, yes.
>
>>
>> Some PageOffline pages can be migrated using the non-folio migration: this is done for memory ballooning (memory compaction). As they get migrated, they are freed back to the buddy, PageOffline() is cleared -- they become PageBuddy() -- and the above applies.
>
> After a PageOffline page is migrated, the destination page becomes PageOffline, right?
> OK, I see it in balloon_page_insert().
Yes.
>
>>
>> Other PageOffline pages can be skipped during memory offlining (virtio-mem use case, what we are doing here). We don't want them to ever go through the buddy, especially because if memory offlining fails they must definitely not be treated like free pages that can be allocated immediately.
>
> What do you mean by "skipped during memory offlining"? Are you implying when
> virtio-mem is offlining some pages by marking it PageOffline and PG_offline_skippable,
> someone else can do memory offlining in parallel?
It could happen (e.g., manually offline a Linux memory block using
sysfs), but that is not the primary use case.
virtio-mem unplugs memory in the following sequence:
1) alloc_contig_range() small blocks (e.g., 2 MiB)
2) Report the blocks to the hypervisor
3) Mark them fake-offline: PageOffline (+ PageOfflineSkippable now)
Once all small blocks that comprise a Linux memory block (e.g., 128 MiB)
are fake-offline, offline the memory block and remove the memory using
offline_and_remove_memory().
In that operation -- offline_and_remove_memory() -- memory offlining
code must be able to skip these PageOffline pages, otherwise
offline_and_remove_memory() will just fail, saying that there are
unmovable pages in there.
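The sequence above can be modelled in userspace roughly as follows. This is a toy sketch, not virtio-mem code: `toy_memory_block`, `toy_unplug_subblock()` and `toy_offline_and_remove()` are hypothetical names, the allocation and hypervisor steps are not modelled, and the sizes are the example values from the text (2 MiB sub-blocks, 128 MiB memory block). The model only mirrors the policy that removal is attempted once every sub-block is fake-offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SUBBLOCKS_PER_BLOCK	64	/* 128 MiB / 2 MiB, example sizes */

enum toy_sb_state {
	TOY_SB_PLUGGED,		/* still usable by Linux */
	TOY_SB_FAKE_OFFLINE,	/* models PageOffline + PageOfflineSkippable */
};

struct toy_memory_block {
	enum toy_sb_state sb[TOY_SUBBLOCKS_PER_BLOCK];
	bool removed;
};

/* Steps 1-3 for one sub-block; only the final marking is modelled. */
static void toy_unplug_subblock(struct toy_memory_block *mb, size_t idx)
{
	assert(idx < TOY_SUBBLOCKS_PER_BLOCK);
	mb->sb[idx] = TOY_SB_FAKE_OFFLINE;
}

/*
 * Final step of the model: remove the block once every sub-block is
 * fake-offline. Returns 0 on success, -1 if a sub-block is still plugged.
 */
static int toy_offline_and_remove(struct toy_memory_block *mb)
{
	size_t i;

	for (i = 0; i < TOY_SUBBLOCKS_PER_BLOCK; i++)
		if (mb->sb[i] != TOY_SB_FAKE_OFFLINE)
			return -1;
	mb->removed = true;
	return 0;
}
```

In the real kernel, offlining can also deal with free and movable pages; the toy model leaves that out to keep the skippable case visible.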
>
>>
>>> Next, the page is removed from its memory
>>> block. When will PG_offline_skippable be used? The second phase when
>>> the page is being removed from its memory block?
>>
>> PG_offline_skippable is used during memory offlining, while we look for any pages that are not PageBuddy (... or hwpoisoned ...), to migrate them off the memory so they get converted to PageBuddy.
>>
>> PageOffline + PageOfflineSkippable are checked on that phase, such that they don't require any migration.
>
> Hmm, if you just do not want to get PageOffline migrated, not setting it
> __PageMovable would work right? PageOffline + __PageMovable is used by
> ballooning, as these inflated pages can be migrated. PageOffline without
> __PageMovable should be virtio-mem. Am I missing any other user?
Sure. Just imagine !CONFIG_BALLOON_COMPACTION.
In summary, we have
1) Migratable PageOffline pages (balloon compaction)
2) Unmigratable PageOffline pages (e.g., XEN balloon, hyper-v balloon,
memtrace, in the future likely some memory holes, ... )
3) Skippable PageOffline pages (virtio-mem)
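The three cases can be written down as a small userspace sketch. It is a toy model with hypothetical names (`toy_offline_kind`, `toy_classify()`, `toy_blocks_offlining()`), and it assumes a PageOffline page is never both movable and skippable. It shows why "not movable" alone cannot identify virtio-mem pages: an unmigratable balloon page is not movable either, yet it must not be skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* The three kinds of PageOffline pages from the summary. */
enum toy_offline_kind {
	TOY_MIGRATABLE,		/* 1) balloon compaction */
	TOY_UNMIGRATABLE,	/* 2) e.g. balloon without compaction */
	TOY_SKIPPABLE,		/* 3) virtio-mem fake-offline pages */
};

/* Classify a PageOffline page from two modelled attributes. */
static enum toy_offline_kind toy_classify(bool movable, bool skippable)
{
	/* Model assumption: the two attributes are never combined. */
	assert(!(movable && skippable));
	if (skippable)
		return TOY_SKIPPABLE;
	return movable ? TOY_MIGRATABLE : TOY_UNMIGRATABLE;
}

/* Only the unmigratable kind prevents offlining the memory block. */
static bool toy_blocks_offlining(enum toy_offline_kind kind)
{
	return kind == TOY_UNMIGRATABLE;
}
```

Here `toy_classify(false, false)` and `toy_classify(false, true)` differ only in the skippable attribute, which is the distinction the new flag provides.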
--
Cheers,
David / dhildenb
On 14 May 2025, at 13:28, David Hildenbrand wrote:
> [...]
>
> In summary, we have
>
> 1) Migratable PageOffline pages (balloon compaction)
>
> 2) Unmigratable PageOffline pages (e.g., XEN balloon, hyper-v balloon,
>    memtrace, in the future likely some memory holes, ... )
>
> 3) Skippable PageOffline pages (virtio-mem)
Thank you for all the explanation. Now I understand how memory offline
and memory hotunplug work and shall begin to check the patches. :)
--
Best Regards,
Yan, Zi
> Thank you for all the explanation. Now I understand how memory offline
> and memory hotunplug work and shall begin to check the patches. :)
Sure, if you think the doc or some comments could be updated, I'm happy
to review such changes. It's always very helpful to receive feedback
from someone that's new to this code.
--
Cheers,
David / dhildenb